Pose Estimation¶

Finding keypoints on people (and sometimes hands and faces). The default stack changed in 2023 when RTMPose distilled the ViTPose accuracy into a fast CNN, and again in 2024 as MediaPipe fully replaced OpenPose for browser/mobile.

Pose splits into two deployment flavors — server-side accuracy and on-device real-time — plus a research tier. Different picks for each.

Recommended picks¶

Tier	Pick	When to use
Browser / mobile	MediaPipe Pose	JS in the browser, iOS/Android, real-time, no server
Default	RTMPose (OpenMMLab)	Server-side real-time, production accuracy
Max accuracy	ViTPose	Research benchmarks, offline batch

Why RTMPose is the default¶

Published 2023 by OpenMMLab. Top-down architecture (detect box, then estimate keypoints inside) with a strong training recipe. Hits ViTPose-level accuracy at 5–10× the speed. ONNX-exportable. Ships with COCO and whole-body (COCO-WholeBody: face + body + hand) weights.

mmpose is the reference library. Install via pip install mmpose. Can be finicky with PyTorch versions; the InsightFace team's rtmlib packages RTMPose with fewer dependencies.

MediaPipe Pose — browser/mobile default¶

Google's on-device pose library. 33 body keypoints. TFLite-backed, runs in WASM (browser) or native (iOS/Android). ~30 FPS on a modern phone. Accuracy a few points below RTMPose on hard sets but the deployment story is unbeatable for client-side work.

If you're building anything that runs in a browser tab or a phone app without a server, this is your pick.

When to pick something else¶

Hand pose only → MediaPipe Hands (21 keypoints per hand) or OpenMMLab's hand-specific RTMPose variant.
Whole-body with face + hands in one pass → RTMPose with COCO-WholeBody weights (133 keypoints total).
3D pose from a single image → MediaPipe Pose already outputs pseudo-3D. For metric 3D, MotionBERT or the MMHuman3D family.
Crowded scenes → bottom-up methods (HRNet bottom-up, DEKR) handle many people better than top-down.
Action recognition from pose → pose first (RTMPose), then a downstream action model (STGCN++, PoseC3D).

The three questions to narrow¶

Server or client? Server → RTMPose. Client → MediaPipe.
Body only or whole-body? Body only → either pick works. Whole-body → RTMPose with WholeBody weights.
2D or 3D? 2D → any. Pseudo-3D → MediaPipe. Metric 3D → MotionBERT family.

The Dump¶

OpenPose (2017) — the original multi-person pose pipeline. Historical weight, supplanted. License: CMU non-commercial academic license — commercial use requires a paid license from CMU; this is the famous reason OpenPose stopped being a production default around 2020.
HRNet (2019) — high-resolution network. Still the backbone of many modern pose models.
AlphaPose (SJTU) — strong top-down system. Outpaced by RTMPose. License: non-commercial research license.
Detectron2 Keypoint R-CNN — Facebook's keypoint detection in Detectron2. Reasonable but dated.
MediaPipe Pose (Google) — TFLite, 33 keypoints, browser/mobile default.
MediaPipe Hands — 21 keypoints per hand, same ecosystem.
MediaPipe Holistic — body + face + hands together, same shot.
DEKR — bottom-up keypoint regression. Crowded scenes.
ViTPose (2022) — ViT backbone, strong accuracy. Expensive.
ViTPose++ — improved training recipe. Same compute class.
RTMPose (2023) — OpenMMLab's distilled real-time pose. The current default.
RTMW (OpenMMLab) — whole-body variant. First model to exceed 70 AP on COCO-WholeBody (RTMW-x at 70.2 AP).
RTMO — bottom-up variant from the same team.
DWPose (2023) — distilled whole-body, used as the ControlNet pose preprocessor.
MotionBERT — 3D human motion representation. Strong on video.
MMHuman3D (OpenMMLab) — SMPL body fitting from video.
Yolov8-pose / Yolov11-pose (Ultralytics) — YOLO with a keypoint head. Fast, good ecosystem, less accurate than RTMPose. License: AGPL-3.0; commercial use requires Ultralytics Enterprise License.
PoseNet (Google, TFJS) — the browser default before MediaPipe. Retired.
MMPose (OpenMMLab) — the reference training/inference library for top-down pose.
rtmlib — lightweight wrapper around RTMPose without the MMPose dependency chain.

Graveyard¶

OpenPose as a production default — retired ~2020. Licensing (non-commercial for original OpenPose weights) plus slower than modern alternatives.
PoseNet in the browser — retired when MediaPipe Pose launched. Less accurate.
Stacked Hourglass (2016) — historically important, no longer competitive.

Last reviewed¶

2026-04-22.