Segmentation¶
Pixel-level object boundaries. Three flavors: semantic (pixel → class), instance (pixel → instance), panoptic (both). SAM changed everything in 2023.
Segmentation used to be a painful subtask that required large annotated datasets and task-specific models. SAM (Segment Anything, Meta, 2023) and SAM 2 (video, 2024) reset the field. For most use cases in 2026, the question is "which SAM-family model" rather than "train my own from scratch."
Recommended picks¶
| Tier | Pick | When to use |
|---|---|---|
| Edge / mobile | MobileSAM or EfficientSAM | On-device, low-latency interactive segmentation |
| Default (zero-shot, concept-prompted) | SAM 3 / SAM 3.1 (Meta, Nov 2025 / Mar 2026) | Native text + image exemplar prompts. Doubles SAM 2 accuracy on PCS. Video tracking at 32 FPS. |
| Default (trained task) | Mask2Former or YOLO26-seg | When you have labeled data and want instance segmentation in one pass |
| Max accuracy (research) | OneFormer or SAM 3 with concept prompts | Research benchmarks |
> [!WARNING]
> License notes for picks:
>
> - SAM 2 / 3 / 3.1: Apache-2.0 (Meta) — commercial-safe. Good news: segmentation's most capable open models ship commercial-clean.
> - YOLO26-seg: AGPL-3.0 (Ultralytics) — commercial use requires a paid Enterprise License. Use Mask2Former if you want commercial-safe trained instance seg.
> - Mask2Former: MIT (Meta) — commercial-safe.
> - OneFormer: MIT — commercial-safe.
> - MobileSAM / EfficientSAM / FastSAM: Apache-2.0 — commercial-safe.
Segmentation splits into "interactive / zero-shot" and "trained / one-shot" — those are different picks.
Zero-shot: SAM 3 / SAM 3.1 (current default)¶
Meta shipped SAM 3 in November 2025 and SAM 3.1 in March 2026. The jump over SAM 2 is large enough that SAM 3 replaces it as the default pick.
What SAM 3 added over SAM 2:

- Concept prompts — short noun phrases ("yellow school bus") or image exemplars, not just points/boxes. Open-vocabulary segmentation natively.
- ~2× accuracy on image and video PCS (Promptable Concept Segmentation) vs SAM 2.
- Trained via a data engine that auto-annotated 4M+ unique concepts, the largest open-vocab segmentation dataset to date.
What SAM 3.1 added over SAM 3 (March 2026):

- Object Multiplex — shared-memory joint multi-object tracking.
- 2× throughput on video — 16 → 32 FPS on H100 for medium-object-count videos.
Install: pip install git+https://github.com/facebookresearch/sam3.git or via Ultralytics (docs.ultralytics.com/models/sam-3/).
SAM 2 is still deployed in production pipelines and works fine; treat the move to SAM 3 as migrate-when-convenient, not urgent.
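If you stay on SAM 2 for now, the interactive flow is: embed the image once, then prompt it repeatedly with points or boxes. A minimal sketch, assuming the sam2 package's published image-predictor API; the Hugging Face checkpoint id and the click coordinates are illustrative:

```python
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Pull a pretrained SAM 2 predictor from the Hub (checkpoint id is illustrative).
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("street.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)                 # expensive step: run the image encoder once
    masks, scores, _ = predictor.predict(      # cheap step: prompt as many times as you like
        point_coords=np.array([[450, 300]]),   # one (x, y) foreground click
        point_labels=np.array([1]),            # 1 = foreground, 0 = background
        multimask_output=True,                 # return several candidate masks
    )

best_mask = masks[scores.argmax()]             # boolean HxW array for the top-scoring mask
```

The image embedding dominates the cost; every subsequent prompt reuses it, which is what makes SAM-family models cheap for interactive labeling.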
Trained task (Mask2Former / YOLO-seg default)¶
When you have 1,000+ labeled instances of the specific objects you care about and want one-pass inference:
- YOLO-seg (YOLOv8-seg / YOLOv11-seg / YOLO26-seg) — fast instance segmentation, same ecosystem as YOLO detection. Good for real-time, reasonable accuracy.
- Mask2Former (Meta) — unified architecture for semantic/instance/panoptic. Accuracy-leaning default.
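For the Mask2Former path, the Hugging Face transformers port is the lowest-friction way to run and fine-tune it. A minimal inference sketch, assuming the transformers Mask2Former classes; the checkpoint id and image path are illustrative, and fine-tuning on your own labels follows the usual transformers training loop:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

ckpt = "facebook/mask2former-swin-small-coco-instance"  # public COCO-instance checkpoint, id illustrative
processor = AutoImageProcessor.from_pretrained(ckpt)
model = Mask2FormerForUniversalSegmentation.from_pretrained(ckpt).eval()

image = Image.open("street.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.inference_mode():
    outputs = model(**inputs)

# Collapse the query predictions into per-instance masks at the original resolution.
result = processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]]  # (height, width)
)[0]
seg_map = result["segmentation"]        # HxW tensor of instance ids
for info in result["segments_info"]:    # class label + confidence per instance
    print(info["id"], model.config.id2label[info["label_id"]], round(info["score"], 3))
```

The same processor/model pair also exposes semantic and panoptic post-processing, which is why Mask2Former works as a single trained-model default across all three flavors.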
Edge segmentation¶
SAM is heavy. For on-device:

- MobileSAM (2023) — distilled SAM, ~10× faster, small accuracy drop. Pretrained weights available.
- EfficientSAM (2024) — similar goal, different distillation. Slightly better accuracy-speed frontier.
- YOLOv8n-seg — not SAM-family, but a fast instance segmenter for known classes.
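For the interactive on-device case, MobileSAM keeps the original segment_anything predictor interface, so swapping it in is mostly a checkpoint change. A minimal sketch, assuming the mobile_sam package from the MobileSAM repo and its "vit_t" registry key; the checkpoint path is illustrative:

```python
import numpy as np
import torch
from PIL import Image
from mobile_sam import sam_model_registry, SamPredictor

# Tiny-ViT image encoder; weights ship with the MobileSAM repo (path illustrative).
model = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
model.to("cuda" if torch.cuda.is_available() else "cpu").eval()

predictor = SamPredictor(model)
predictor.set_image(np.array(Image.open("frame.jpg").convert("RGB")))

masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # (x, y) tap location
    point_labels=np.array([1]),           # 1 = foreground
    multimask_output=True,
)
```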
When to pick something else¶
- Panoptic segmentation (combined instance + semantic) → Mask2Former panoptic head.
- Semantic only (road, sky, vegetation in autonomous driving) → DeepLabV3+, SegFormer, or Mask2Former semantic head.
- Medical imaging → MONAI ecosystem. nnU-Net for tight-annotation medical segmentation.
- Video segmentation with temporal consistency → SAM 3 (SAM 2 still fine). Or the XMem / Cutie family for mask propagation from a first-frame annotation.
- Browser / real-time → MediaPipe selfie segmentation, body segmentation.
The three questions to narrow¶
- Do you have labeled data for your classes? No → SAM 3 interactive / concept-prompted. Yes → trained model.
- Edge or server? Edge → MobileSAM / YOLO-seg. Server → SAM 3 or Mask2Former.
- Instance, semantic, or panoptic? Pick accordingly; SAM produces instance-flavored output by default.
The Dump¶
- U-Net (2015) — the encoder-decoder that started it all. Still the standard for medical.
- Mask R-CNN (2017) — instance segmentation via two-stage detection + mask head. Historical default.
- DeepLabV3+ (2018) — semantic segmentation. Still used in autonomous.
- PointRend (2020) — sharp boundaries via point-based refinement.
- SegFormer (2021) — ViT-based semantic. Good accuracy-efficiency. License: NVIDIA Source Code License — non-commercial; commercial use requires NVIDIA agreement.
- Mask2Former (2022) — unified architecture. The serious trained-model default.
- SAM (Meta, 2023) — open-vocabulary, promptable. Changed the field.
- SAM 2 (Meta, 2024) — video extension of SAM.
- SAM 3 (Meta, Nov 2025) — concept prompts (text + exemplars). 2× accuracy on PCS vs SAM 2.
- SAM 3.1 (Meta, Mar 2026) — Object Multiplex for joint MOT, 2× video throughput.
- MobileSAM (2023) — mobile/edge distilled SAM.
- EfficientSAM (Meta, 2024) — another SAM distillation.
- FastSAM (2023) — YOLOv8-seg adapted for SAM-style prompts. Significantly faster, less accurate.
- HQ-SAM (2023) — high-quality mask refinement on top of SAM.
- YOLOv8-seg / YOLOv11-seg / YOLO26-seg — instance segmentation with the YOLO ecosystem. License: AGPL-3.0; commercial use requires Ultralytics Enterprise License.
- OneFormer (2023) — unified architecture; one model for all three segmentation types.
- Grounded-SAM — Grounding DINO + SAM → text-prompt segmentation.
- SAM + CLIP — class-agnostic masks from SAM, then zero-shot classification of each crop with CLIP (see the sketch after this list).
- nnU-Net — self-configuring U-Net for medical. The medical default.
- MONAI — medical imaging AI ecosystem; includes segmentation networks and training pipelines.
- Panoptic-DeepLab — panoptic semantic + instance in one net.
- MediaPipe Selfie / Image Segmenter — browser/mobile, limited to specific classes.
- Apple Vision VNGeneratePersonSegmentationRequest — on-device iOS person segmentation.
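The SAM + CLIP entry above is a pipeline rather than a model, so a sketch helps: SAM's automatic mask generator proposes class-agnostic masks, then CLIP zero-shot classifies a crop around each mask. A minimal sketch, assuming the segment_anything and transformers packages; the checkpoint filename, image path, and label set are illustrative:

```python
import numpy as np
import torch
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry
from transformers import CLIPModel, CLIPProcessor

labels = ["a dog", "a bicycle", "a traffic cone"]   # your open-vocab classes (illustrative)
image = Image.open("scene.jpg").convert("RGB")

# 1) Class-agnostic mask proposals from SAM (checkpoint filename illustrative).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
masks = SamAutomaticMaskGenerator(sam).generate(np.array(image))

# 2) Zero-shot classify a crop around each mask with CLIP.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

for m in masks:
    x, y, w, h = m["bbox"]                           # XYWH box around the mask
    crop = image.crop((x, y, x + w, y + h))
    inputs = processor(text=labels, images=crop, return_tensors="pt", padding=True)
    with torch.inference_mode():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]
    print(labels[int(probs.argmax())], float(probs.max()))
```

Grounded-SAM is the more polished version of the same idea, with Grounding DINO providing text-conditioned boxes up front instead of classifying masks after the fact.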
Graveyard¶
- Mask R-CNN as a default for new work — retired by Mask2Former and SAM.
- Hand-crafted CRF post-processing — retired ~2019; modern models produce clean masks without it.
- FCN (fully convolutional, 2015) — superseded by U-Net and DeepLab family.
Last reviewed¶
2026-04-22. SAM 3.1 (March 2026) is the newest major release in this space as of writing.