MissionRobo · Career roadmap

Perception / Computer Vision Engineer

Skydio · Anduril · Dedrone · Apptronik · 20 weeks

The 20-week path from "I can run YOLO on a webcam" to "I ship robust perception on a robot." Targets perception roles at the top autonomy companies, where the bar includes multi-view geometry, sensor fusion, and real-time deployment.

Advanced · ~20 weeks · 12 topics · 15 resources

01. CV foundations

Image formation, multi-view geometry, classical features.

Camera models, intrinsics, extrinsics · Required

Pinhole + distortion, what camera calibration actually computes.

You will read camera_info.yaml every day. Know what every field means.

OpenCV camera calibration tutorial · FREE · Official OpenCV docs. Run the example end-to-end with a checkerboard.
Computer Vision: Algorithms and Applications, Szeliski (free PDF) · FREE · The free CV textbook. Chapter 2 (image formation) covers camera models in depth.
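
A minimal sketch of what the intrinsics and distortion coefficients do to a 3D point, assuming the common pinhole + radial/tangential model with coefficients ordered (k1, k2, p1, p2, k3) as OpenCV does. The numbers in `K` and `dist` are made up for illustration; in practice they come from your calibration output (for example the camera_matrix and distortion_coefficients entries of a camera_info.yaml).

```python
import numpy as np

def project_points(points_cam, K, dist):
    """Project Nx3 camera-frame points to pixels (pinhole + radial/tangential distortion)."""
    k1, k2, p1, p2, k3 = dist
    # Pinhole step: divide by depth to get normalized image coordinates.
    x = points_cam[:, 0] / points_cam[:, 2]
    y = points_cam[:, 1] / points_cam[:, 2]
    r2 = x * x + y * y
    # Distortion step: radial polynomial plus tangential terms.
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # Intrinsics step: scale by focal length, shift by principal point.
    u = K[0, 0] * x_d + K[0, 2]
    v = K[1, 1] * y_d + K[1, 2]
    return np.stack([u, v], axis=1)

# Illustrative values only (not from a real calibration).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],   # on the optical axis
                   [1.0, 0.0, 2.0]])  # 1 m to the right, 2 m ahead

pixels_ideal = project_points(points, K, np.zeros(5))
pixels_distorted = project_points(points, K, np.array([-0.1, 0.0, 0.0, 0.0, 0.0]))
print(pixels_ideal)
print(pixels_distorted)
```

The point on the optical axis lands on the principal point either way; the off-axis point moves inward once a negative k1 (barrel distortion) is applied.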

Classical features (SIFT, ORB, FAST) · Recommended

Pre-deep-learning detection. Still used inside many SLAM stacks.

ORB-SLAM3 is built on classical features. You cannot debug it without knowing them.

OpenCV feature detection tutorials · FREE · Official walkthrough of SIFT, ORB, BRIEF, FAST with code samples.
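
To see why binary descriptors such as ORB are cheap to match, here is a rough NumPy sketch of brute-force Hamming matching with a cross-check. It assumes descriptors are packed as 32 uint8 bytes (256 bits), which is ORB's usual layout; the descriptors below are random stand-ins, not real ORB output. In OpenCV the same idea is exposed through `cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)`.

```python
import numpy as np

def hamming_distances(desc_a, desc_b):
    """Pairwise Hamming distance between two sets of packed binary descriptors."""
    # XOR every pair of descriptors, then count the set bits.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def match_cross_check(desc_a, desc_b):
    """Keep (i, j) only when a[i] and b[j] are each other's nearest neighbour."""
    d = hamming_distances(desc_a, desc_b)
    best_b = d.argmin(axis=1)  # nearest b for each a
    best_a = d.argmin(axis=0)  # nearest a for each b
    return [(i, int(j)) for i, j in enumerate(best_b) if best_a[j] == i]

# Random stand-in descriptors: the second set is the first one reversed,
# with a single bit flipped per descriptor to mimic noise.
rng = np.random.default_rng(0)
desc_a = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)
desc_b = desc_a[::-1].copy()
desc_b[:, 0] ^= 1

matches = match_cross_check(desc_a, desc_b)
print(matches)
```

The cross-check is one simple way to reject ambiguous matches; a ratio test on the two nearest neighbours is a common alternative.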

Epipolar geometry + triangulation · Required

How two cameras give you depth. The math behind stereo and SfM.

Hartley & Zisserman chapters 9-12 are the canonical reference. Heavy but worth it.

Multiple View Geometry, Hartley & Zisserman · The bible of multi-view geometry. Use it as a reference; don't read it cover-to-cover.
Cyrill Stachniss, Photogrammetric Computer Vision · FREE · Free Uni Bonn lectures. Clearest videos on epipolar geometry online.
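
The "two cameras give you depth" idea can be sketched with linear (DLT) triangulation: each pixel observation contributes two linear constraints on the 3D point, and the SVD gives the least-squares solution. This is a bare-bones version under ideal, noise-free assumptions; the camera matrices and the test point are invented for illustration.

```python
import numpy as np

def triangulate_dlt(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    # Each view gives two equations of the form u * P[2] - P[0] = 0, v * P[2] - P[1] = 0.
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Illustrative setup: identical cameras, the second one shifted 0.1 m along +x.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 2.0])
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With real, noisy matches the linear estimate is usually treated as a starting point and refined by minimizing reprojection error.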

02. Modern deep-learning vision

Detection, segmentation, depth, NeRF, transformers.

Deep learning for vision (Stanford CS231n) · Required

CNNs, attention, modern training tricks.

The Stanford course is still the best free intro. Lectures + assignments take ~6 weeks if you do them properly.

Stanford CS231n: Convolutional Neural Networks · FREE · Free Stanford lecture videos + assignments. The single best CV deep-learning course.
Deep Learning Specialization, Andrew Ng (Coursera) · Paid specialization. Companion to CS231n; lighter math, more breadth.

Object detection (YOLO → DETR) · Required

YOLO for speed, transformer-based detectors for quality.

YOLOv8/v9 is the workhorse on real robots. DETR variants are the research direction.

Ultralytics YOLO docs · FREE · Official YOLOv8/v9 docs. Training + deployment in one place.
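
One practical difference between the two families: YOLOv8/v9-style detectors rely on non-maximum suppression (NMS) to prune overlapping boxes, while DETR-style detectors predict a set directly and skip it. A plain NumPy sketch of greedy NMS follows, assuming boxes in (x1, y1, x2, y2) format; the boxes and scores are invented, and production stacks use an optimized library implementation instead.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop everything that overlaps it too much, repeat."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]
    return keep

# Invented detections: two near-duplicates and one separate box.
boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],
                  [20.0, 20.0, 30.0, 30.0]])
scores = np.array([0.9, 0.8, 0.7])

kept = nms(boxes, scores)
print(kept)
```

The IoU threshold trades duplicate detections against missed objects in crowded scenes, so it is worth tuning per deployment.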

Segmentation (Mask R-CNN, SAM 2) · Recommended

Per-pixel labeling for scene understanding.

SAM 2 (Meta, 2024) is the default for new pipelines. Mask R-CNN still ships in legacy stacks.

Segment Anything 2 (Meta) · FREE · The current state of the art for general segmentation. Read the paper + repo.

Depth estimation (stereo + mono) · Recommended

Stereo block matching, monocular depth networks (MiDaS, Depth Anything).

Stereo is the reliable workhorse; mono is the cheap option that's gotten surprisingly good.

Depth Anything V2 · FREE · Current best free monocular depth model. Plug-and-play for many use cases.
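
Whatever matcher produces the disparity map, a rectified stereo pair turns it into metric depth with Z = f · B / d. A small sketch of that conversion, assuming focal length in pixels and baseline in meters; the focal length, baseline, and disparities are illustrative values, not from a real rig.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Z = f * B / d for a rectified stereo pair; non-positive disparities become NaN."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(disparity_px, np.nan)
    valid = disparity_px > 0  # zero or negative disparity means no valid match
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Illustrative rig: 700 px focal length, 12 cm baseline.
depth_m = disparity_to_depth([42.0, 21.0, 0.0], focal_px=700.0, baseline_m=0.12)
print(depth_m)
```

Because depth is inversely proportional to disparity, a fixed matching error in pixels costs more depth accuracy the farther away the point is.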

NeRF and 3D reconstruction · Optional

Neural radiance fields, Gaussian splatting, novel view synthesis.

The hot research area. Production use is still emerging, but fluency here is worth having for senior interviews.

Gaussian Splatting (Inria) · FREE · The 2023 paper that shifted the NeRF research direction. Read it.

03. Sensor fusion + calibration

Camera × LiDAR × IMU — what separates demo perception from production.

Multi-sensor calibration (Kalibr) · Required

How to align camera + IMU + LiDAR coordinate frames.

Kalibr is the open-source standard for camera-camera and camera-IMU calibration; LiDAR extrinsics need a separate tool. Reading the source is half the job.

Kalibr (ETH Zurich) · FREE · The de-facto standard for multi-sensor calibration in robotics.
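
Aligning coordinate frames comes down to chaining 4x4 rigid transforms. The sketch below uses the `T_a_b` naming (maps points from frame b into frame a), which matches how Kalibr names its camera-IMU result `T_cam_imu`. The rotation and offsets are made-up numbers, and `T_imu_lidar` is a hypothetical extrinsic you would obtain from a separate LiDAR calibration.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_transform(T):
    """Invert a rigid transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Made-up extrinsics: a 90 degree yaw between IMU and camera, small offsets.
R_z90 = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
T_cam_imu = make_transform(R_z90, [0.05, 0.0, 0.0])
T_imu_lidar = make_transform(np.eye(3), [0.0, 0.0, 0.1])  # hypothetical

# Chain the transforms: lidar -> imu -> cam.
T_cam_lidar = T_cam_imu @ T_imu_lidar

p_lidar = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous point in the LiDAR frame
p_cam = T_cam_lidar @ p_lidar
print(p_cam[:3])
```

Writing the frame names into every variable makes a wrong chain visible at a glance: adjacent inner names must match.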

Camera + LiDAR fusion · Recommended

Project LiDAR points onto images, project image features into 3D.

Autonomous vehicles have done this for a decade. Robotics is catching up.

KITTI Vision Benchmark Suite · FREE · Free dataset + calibration files. Best playground for camera + LiDAR fusion.
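
Projecting LiDAR points onto an image is a three-step pipeline: move the points into the camera frame, drop what is behind the camera, then apply the intrinsics. A minimal sketch, assuming an undistorted image and a known extrinsic `T_cam_lidar`; the identity extrinsic and the points are purely illustrative (real sensors such as KITTI's need the axis change from the calibration files).

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K, image_size):
    """Return pixel coordinates and depths for LiDAR points that land inside the image."""
    width, height = image_size
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    # Points with non-positive depth are behind the camera and cannot be seen.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Pinhole projection with the intrinsic matrix.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
              (uv[:, 1] >= 0) & (uv[:, 1] < height))
    return uv[inside], pts_cam[inside, 2]

# Illustrative inputs only.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
T_cam_lidar = np.eye(4)  # assumes aligned axes, which real rigs rarely have
points_lidar = np.array([[0.0, 0.0, 5.0],    # straight ahead: visible
                         [0.0, 0.0, -5.0],   # behind the camera: dropped
                         [10.0, 0.0, 1.0]])  # far off to the side: outside the image

uv, depths = project_lidar_to_image(points_lidar, T_cam_lidar, K, (640, 480))
print(uv, depths)
```

The returned depths can be painted onto the image as a sparse depth map, which is a quick visual check that the extrinsic calibration is right.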

04. Production deployment

TensorRT, ONNX, real-time pipelines, edge inference.

TensorRT for edge inference · Recommended

NVIDIA's inference optimizer and runtime for running models fast on NVIDIA GPUs, including Jetson.

Senior perception engineers can convert a PyTorch model to TensorRT in a day. Junior engineers can't. Be the senior.

NVIDIA TensorRT documentation · FREE · Official NVIDIA docs. Includes Jetson-specific recipes.

ONNX as the interchange format · Recommended

Convert models between PyTorch, TensorFlow, JAX, and edge runtimes.

ONNX is the duct tape that holds modern MLOps together for vision.

ONNX official tutorial · FREE · Official tutorials for converting models from the major frameworks to ONNX.