MissionRobo · Career roadmap

Perception / Computer Vision Engineer

Skydio · Anduril · Dedrone · Apptronik · 20 weeks

The 20-week path from "I can run YOLO on a webcam" to "I ship robust perception on a robot." Targets perception roles at the top autonomy companies, where the bar includes multi-view geometry, sensor fusion, and real-time deployment.

Advanced · ~20 weeks · 12 topics · 15 resources

01. CV foundations

Image formation, multi-view geometry, classical features.

Classical features (SIFT, ORB, FAST) · Recommended

Pre-deep-learning detection. Still used inside many SLAM stacks.

ORB-SLAM3 is built on classical features. You cannot debug it without knowing them.
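
As a rough illustration of what "knowing them" involves: ORB descriptors are binary strings, and two descriptors are compared by Hamming distance (the number of differing bits). The sketch below is a minimal brute-force matcher with a mutual-nearest-neighbour check. The function names, the `max_distance` threshold, and the toy 8-bit descriptors are all made up for illustration; real ORB descriptors are 256 bits, and this is not how any particular SLAM stack implements matching.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_a, desc_b, max_distance=2):
    """Brute-force matching with a mutual-nearest-neighbour (cross-check) test.

    Returns a list of (index_in_a, index_in_b, distance) tuples.
    max_distance is an illustrative threshold, not a recommended value.
    """
    def nearest(query, candidates):
        distances = [hamming(query, c) for c in candidates]
        best = min(range(len(candidates)), key=distances.__getitem__)
        return best, distances[best]

    matches = []
    for i, descriptor in enumerate(desc_a):
        j, dist = nearest(descriptor, desc_b)
        back, _ = nearest(desc_b[j], desc_a)
        # Keep the pair only if each is the other's best match and they are close.
        if back == i and dist <= max_distance:
            matches.append((i, j, dist))
    return matches


# Toy 8-bit descriptors (hypothetical values; real ORB descriptors are 256 bits).
frame_1 = [0b10110010, 0b01001101, 0b11110000]
frame_2 = [0b11110001, 0b10110011, 0b00000000]
matches = match_descriptors(frame_1, frame_2)
print(matches)
```

The second descriptor in `frame_1` has no close counterpart in `frame_2`, so it is dropped. That rejection step is usually where matching bugs hide when a feature-based tracker loses lock.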

02. Modern deep-learning vision

Detection, segmentation, depth, NeRF, transformers.

Object detection (YOLO → DETR) · Required

YOLO for speed, transformer-based detectors for quality.

YOLOv8/v9 is the workhorse on real robots. DETR variants are the research direction.
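
One practical difference between the two families is post-processing: many YOLO-style detectors emit overlapping candidate boxes that are filtered with non-maximum suppression (NMS), whereas DETR-style detectors predict a set of boxes directly and do not need it. The following is a minimal pure-Python sketch of greedy NMS, assuming boxes in `(x1, y1, x2, y2)` pixel format; the function names, threshold, and sample boxes are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicates and one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

This version is quadratic in the number of boxes and handles a single class; deployed pipelines typically run a batched, per-class variant on the GPU.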

Segmentation (Mask R-CNN, SAM 2) · Recommended

Per-pixel labeling for scene understanding.

SAM 2 (Meta, 2024) is the default for new pipelines. Mask R-CNN still ships in legacy stacks.

Depth estimation (stereo + mono) · Recommended

Stereo block matching, monocular depth networks (MiDaS, Depth Anything).

Stereo is the reliable workhorse; mono is the cheap option that's gotten surprisingly good.

  • Depth Anything V2 · FREE · Current best free monocular depth model. Plug-and-play for many use cases.
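
Part of why stereo is considered reliable is that, for a rectified pair, depth follows from simple geometry: Z = f · B / d, where f is the focal length in pixels, B the baseline in metres, and d the disparity in pixels. The sketch below assumes a rectified pinhole stereo pair; the function names and the camera numbers are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d.

    Returns None when the disparity is not positive (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Hypothetical camera: 700 px focal length, 12 cm baseline.
FOCAL_PX = 700.0
BASELINE_M = 0.12

for disparity in (84.0, 8.4, 0.0):
    print(disparity, depth_from_disparity(disparity, FOCAL_PX, BASELINE_M))

# Sensitivity to a one-pixel disparity error at 2 m and at 10 m.
for depth in (2.0, 10.0):
    print(depth, depth_error(depth, FOCAL_PX, BASELINE_M, 1.0))
```

The quadratic growth of the error term is why stereo range is limited by baseline. Monocular networks avoid the baseline constraint, but typically predict relative depth that needs a separate scale reference.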

NeRF and 3D reconstruction · Optional

Neural radiance fields, Gaussian splatting, novel view synthesis.

The hot research area. Production use is still emerging, but it is worth being fluent in for senior interviews.

03. Sensor fusion + calibration

Camera × LiDAR × IMU — what separates demo perception from production.

Multi-sensor calibration (Kalibr) · Required

How to align camera + IMU + LiDAR coordinate frames.

Kalibr is the open-source standard. Reading the source is half the job.
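
Whatever tool produces them, calibration results are usually consumed as rigid transforms between frames, and most day-to-day bugs come from chaining or inverting them in the wrong direction. The sketch below uses 4x4 homogeneous matrices in pure Python. The naming convention `T_a_b` (maps points from frame b into frame a) and all numeric values are assumptions for illustration, not output from a real calibration run.

```python
def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform: the inverse of [R | p] is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(t, point):
    """Apply a 4x4 rigid transform to a 3D point."""
    x, y, z = point
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3]
                 for i in range(3))


# Hypothetical extrinsics. T_a_b maps points expressed in frame b into frame a.
T_cam_imu = [[0.0, -1.0, 0.0, 0.05],
             [1.0, 0.0, 0.0, 0.00],
             [0.0, 0.0, 1.0, -0.02],
             [0.0, 0.0, 0.0, 1.0]]
T_imu_lidar = [[1.0, 0.0, 0.0, 0.10],
               [0.0, 1.0, 0.0, 0.00],
               [0.0, 0.0, 1.0, 0.20],
               [0.0, 0.0, 0.0, 1.0]]

# Chain: lidar -> imu -> cam. Adjacent frame names must match up.
T_cam_lidar = matmul4(T_cam_imu, T_imu_lidar)
T_lidar_cam = invert_rigid(T_cam_lidar)

p_lidar = (1.0, 2.0, 3.0)
p_cam = transform_point(T_cam_lidar, p_lidar)
p_back = transform_point(T_lidar_cam, p_cam)
print(p_cam, p_back)
```

A round-trip check like `p_back` against `p_lidar` is a cheap unit test to keep next to any code that loads calibration files.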

Camera + LiDAR fusion · Recommended

Project LiDAR points onto images, and lift image features into 3D.

Autonomous vehicles have done this for a decade. Robotics is catching up.
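
The LiDAR-to-image direction can be sketched with a pinhole model: move each point into the camera frame with the extrinsics, then apply u = fx · X/Z + cx and v = fy · Y/Z + cy, discarding points behind the camera or outside the image. The example below ignores lens distortion, and its function name, intrinsics, and points are hypothetical. It also uses an identity rotation for readability, whereas a real LiDAR frame usually needs an axis-swapping rotation into the camera's optical frame.

```python
def project_lidar_to_image(points_lidar, rotation, translation,
                           fx, fy, cx, cy, width, height):
    """Project 3D LiDAR points into pixel coordinates with a pinhole model.

    rotation (3x3) and translation (3,) map LiDAR-frame points into the
    camera frame. Returns a list of (u, v, depth_m) for visible points.
    """
    pixels = []
    for point in points_lidar:
        # LiDAR frame -> camera frame.
        xc, yc, zc = (
            sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
            for i in range(3)
        )
        if zc <= 0:
            continue  # behind the image plane
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, zc))
    return pixels


# Hypothetical setup: identity extrinsics, 640x480 camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
points = [
    (0.0, 0.0, 10.0),   # straight ahead, lands on the principal point
    (1.0, 0.5, 5.0),    # right of and below the centre
    (0.0, 0.0, -2.0),   # behind the camera, dropped
    (10.0, 0.0, 1.0),   # far off to the side, outside the image
]
projected = project_lidar_to_image(points, R, t, 500.0, 500.0,
                                   320.0, 240.0, 640, 480)
print(projected)
```

Overlaying the returned pixels on the image, coloured by depth, is a common visual check on calibration quality: edges in the point cloud should line up with edges in the picture.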

04. Production deployment

TensorRT, ONNX, real-time pipelines, edge inference.

TensorRT for edge inference · Recommended

NVIDIA's optimizer for running models fast on Jetson.

Senior perception engineers can convert a PyTorch model to TensorRT in a day. Junior engineers can't. Be the senior.

ONNX as the interchange format · Recommended

Convert models between PyTorch, TensorFlow, JAX, and edge runtimes.

ONNX is the duct tape that holds modern MLOps together for vision.
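
One common route for the deployment topics above is PyTorch → ONNX → TensorRT: export the model with `torch.onnx.export`, then build an engine on the target device with NVIDIA's `trtexec` tool. The sketch below is a minimal outline under several assumptions: a single-input model, fixed input shape, opset 17, and placeholder tensor and file names. The helper functions are hypothetical wrappers, and real models often need extra work such as dynamic axes or handling unsupported ops.

```python
def export_to_onnx(model, example_input, onnx_path, opset_version=17):
    """Export a PyTorch module to ONNX. Requires PyTorch to be installed.

    The input/output names are placeholders chosen for this sketch.
    """
    import torch  # imported lazily so the rest of the sketch runs without PyTorch

    model.eval()
    torch.onnx.export(
        model,
        (example_input,),
        onnx_path,
        input_names=["images"],
        output_names=["predictions"],
        opset_version=opset_version,
    )


def trtexec_command(onnx_path, engine_path, fp16=True):
    """Build the argument list for trtexec, to be run on the target device."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow half-precision kernels
    return cmd


# Print the command rather than running it, since trtexec may not be installed.
print(" ".join(trtexec_command("detector.onnx", "detector.engine")))
```

TensorRT engines are generally tied to the GPU and TensorRT version they were built with, so the engine build is usually run on the Jetson itself rather than on a development workstation.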