Edge Deployment
Running CV models on devices that aren't in a data center. Four broad tiers: accelerated embedded (Jetson), general embedded (Pi), dedicated TPU (Coral), and mobile (phones).
[!WARNING] Hardware landscape changes quickly. The specific board recommendations below reflect today's practical defaults (as of April 2026), not universal truths. New SKUs ship every quarter; prices and availability shift. Power budgets and model-class fit are more stable guidance than specific product picks. Verify current pricing and availability before committing to a BOM.
Edge deployment is where most CV products actually live. The constraints are different from server-side — you care about power, thermals, cost, and the specific accelerator more than raw FLOPs.
Recommended picks
| Tier | Pick | When to use |
|---|---|---|
| Premium embedded (GPU-ish) | Jetson Orin Nano Super (2024, $249) | Strong ML performance in a configurable 7–25W envelope |
| Budget embedded | Raspberry Pi 5 (8GB) | General-purpose, larger ecosystem, OK CV performance |
| Dedicated TPU | Google Coral USB/Dev Board | Efficient INT8 inference; but check current availability |
| Mobile flagship | iPhone + CoreML | Best-in-class on-device ML in the consumer device category |
Jetson — the embedded default
NVIDIA's embedded GPU line. The Orin Nano Super (launched Dec 2024) is the current sweet-spot board: up to 67 TOPS INT8, 8 GB RAM, $249, with configurable power modes from 7W to 25W (the peak TOPS figure applies in the highest mode). It runs TensorRT natively and can execute most modern CV models, including smaller VLMs.
Options in the Jetson family (in roughly increasing order of compute):
- Jetson Nano (original, 2019) — 472 GFLOPS FP16, 4 GB RAM. End-of-life; avoid for new work.
- Jetson Orin Nano (2023) — 40 TOPS INT8, 8 GB. Decent.
- Jetson Orin Nano Super (2024) — 67 TOPS INT8 via a JetPack software update that raises clocks and power limits on the same Orin Nano silicon, 8 GB, $249. The current default for consumer / maker edge.
- Jetson Orin NX — up to 100 TOPS, 8 or 16 GB. More expensive.
- Jetson AGX Orin — up to 275 TOPS, 32/64 GB. Premium.
- Jetson Thor (2025–2026) — ~2,070 FP4 TFLOPS (petaFLOP-class at the edge), which NVIDIA quotes as roughly 7.5× the AI compute of AGX Orin. Note the unit: FP4 TFLOPS is not directly comparable to the INT8 TOPS figures above. Targeted at robotics / autonomous / industrial; expensive; overkill for most CV products. Verify current availability and pricing before committing.
Pick Jetson when:
- You're training or evaluating on an NVIDIA cluster and want the same model to run at the edge with TensorRT optimization.
- You need to run models heavier than classical CV pipelines (small LLM, SAM, depth estimation) on the edge.
- Power budget is 10–25W.
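The family list above can be turned into a small lookup when sizing a project. The following is a minimal sketch, not an official NVIDIA tool: the `JetsonModule` class and `smallest_jetson` helper are hypothetical names, and the figures are simply the peak numbers quoted in this section. Real throughput depends on the model, precision, and power mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JetsonModule:
    name: str
    int8_tops: float  # peak INT8 TOPS as quoted in the list above
    max_ram_gb: int   # largest RAM option listed above


# Figures copied from the family list above. Jetson Nano (end-of-life) and
# Jetson Thor (rated in FP4 TFLOPS, not INT8 TOPS) are left out on purpose.
JETSON_FAMILY = [
    JetsonModule("Jetson Orin Nano", 40, 8),
    JetsonModule("Jetson Orin Nano Super", 67, 8),
    JetsonModule("Jetson Orin NX", 100, 16),
    JetsonModule("Jetson AGX Orin", 275, 64),
]


def smallest_jetson(required_tops: float, required_ram_gb: int) -> Optional[JetsonModule]:
    """Return the first (smallest) module meeting both requirements, or None."""
    for module in JETSON_FAMILY:  # ordered by increasing capability
        if module.int8_tops >= required_tops and module.max_ram_gb >= required_ram_gb:
            return module
    return None


if __name__ == "__main__":
    choice = smallest_jetson(required_tops=50, required_ram_gb=8)
    print(choice.name if choice else "No listed Jetson fits")
```

Peak TOPS is only a first filter; memory bandwidth and the power mode you can actually sustain often decide the final pick.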
Raspberry Pi — the budget default
Not accelerated, just a decent ARM CPU. Pi 5 (8GB) is good enough for classical CV (Haar, HOG), small CNNs (YOLOv8n INT8 at ~10 FPS), and anything where the pipeline doesn't saturate the CPU.
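To check whether a throughput figure like the ~10 FPS quoted above is enough for a given pipeline, a quick budget calculation helps. This is a rough sketch with hypothetical helper names; it assumes inference is the only bottleneck and that frames from several cameras are processed round-robin by one model instance, which ignores decode, pre-processing, and post-processing costs.

```python
import math


def frame_budget_ms(model_fps: float) -> float:
    """Time available per inference, in milliseconds, at a given throughput."""
    if model_fps <= 0:
        raise ValueError("model_fps must be positive")
    return 1000.0 / model_fps


def max_camera_streams(model_fps: float, per_stream_fps: float) -> int:
    """How many streams one device can serve if inference is the bottleneck."""
    if model_fps <= 0 or per_stream_fps <= 0:
        raise ValueError("frame rates must be positive")
    return math.floor(model_fps / per_stream_fps)


if __name__ == "__main__":
    # Using the ~10 FPS YOLOv8n INT8 figure from the paragraph above.
    print(frame_budget_ms(10))        # per-frame latency budget in ms
    print(max_camera_streams(10, 2))  # streams at 2 FPS each
```

If the result is below the number of cameras you need, that is usually the signal to add an accelerator or move up a tier.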
Pick Pi when:
- Cost is critical ($80).
- Power is critical (~5W).
- The CV model is light or legacy.
- You don't need hardware ML acceleration.
Add a Hailo accelerator over the Pi 5's PCIe interface to turn it into a credible edge AI platform. Two variants are sold as official Pi 5 add-ons:
- Hailo-8L (AI Kit, or the 13 TOPS AI HAT+) — 13 TOPS, cheaper.
- Hailo-8 (26 TOPS AI HAT+) — 26 TOPS, premium.
Pi 5 + Hailo-8 is competitive with Jetson Orin Nano at similar cost; Pi 5 + Hailo-8L is cheaper but noticeably slower.
Google Coral — the dedicated TPU
Two main products: the Coral USB Accelerator ($60) and the Coral Dev Board. The Edge TPU delivers 4 TOPS INT8 on a tiny module. It runs only fully INT8-quantized TFLite models compiled for the Edge TPU.
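Because the Coral path depends on INT8 quantization, it helps to see what that does to a value. The sketch below implements the standard affine scheme used by TFLite-style quantization, `real = (q - zero_point) * scale`, in plain Python. The function names are hypothetical and this is not the converter's actual code; real toolchains calibrate ranges per tensor or per channel from representative data.

```python
from typing import Tuple


def choose_int8_params(x_min: float, x_max: float) -> Tuple[float, int]:
    """Pick scale and zero-point so [x_min, x_max] maps onto [-128, 127]."""
    # The representable range must include 0.0 so that zero is exact.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:
        raise ValueError("range must have non-zero width")
    scale = (x_max - x_min) / 255.0
    zero_point = round(-128 - x_min / scale)
    return scale, zero_point


def quantize_int8(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to INT8, clamping values outside the range."""
    # Python's round() rounds halves to even; runtimes may round differently.
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize_int8(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value from its INT8 code."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale, zp = choose_int8_params(0.0, 2.55)
    q = quantize_int8(1.0, scale, zp)
    print(q, dequantize_int8(q, scale, zp))
```

The round trip loses at most half a quantization step for in-range values, and anything outside the calibrated range is clamped. That clamping is why a poorly chosen calibration set tends to hurt accuracy on INT8-only hardware.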
Status: uncertain in 2026. Google has announced no new Coral hardware since 2022, though the current products still ship. Treat Coral as a "mature, possibly slowly fading" option: fine for existing deployments, shaky for new commitments.
Mobile — CoreML (iOS) and NNAPI / QNN (Android)
Phones are the highest-shipping-volume edge devices. Deploy CV models as part of an app:
- iOS → export to CoreML. The Apple Neural Engine runs FP16 and INT8-quantized models at very high efficiency.
- Android → export to TFLite (since renamed LiteRT) or ONNX. TFLite runs on NNAPI (vendor-specific, and deprecated by Google as of Android 15) or the GPU delegate. Use Qualcomm QNN for Snapdragon-specific performance.
iPhone's Neural Engine is the strongest single-device mobile ML platform in 2026. If your product ships on iOS, CoreML is not optional.
When to pick something else
- Multi-camera office/retail → an Intel NUC + small dGPU costs roughly 4× as much as an embedded board but delivers about 10× the compute, i.e. about 2.5× the compute per dollar. Different form factor.
- Industrial / outdoor → look at ruggedized boards (NVIDIA IGX Orin, specific industrial PC vendors). Jetson in an enclosure works but isn't ruggedized.
- Low-power, always-on → sub-watt vision SoC (e.g., Sony IMX500 sensor with in-sensor ML, Himax WE-I Plus). Specialized, not plug-and-play.
- Existing browser-based product → you're not really edge; you're web. Different section.
The three questions to narrow
- Power budget? < 5W → Pi. 5–15W → Jetson Orin Nano Super. 15W+ → Orin NX / AGX.
- Model family? Classical CV / small CNN → Pi. Modern CNN / small transformer → Jetson Orin Nano Super. Large (SAM, VLM) → Orin NX / AGX or NUC+GPU.
- Ship volume? Low (< 100 devices) → developer-friendly (Jetson, Pi). High (> 10K devices) → custom board design with specific SoC.
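The three questions above can be combined into a simple decision helper. This is a sketch of one way to do it, with hypothetical names throughout. It assumes the model family determines the minimum tier and the power budget determines the maximum tier, a combination rule the text does not spell out. Exactly 15W is treated as the middle tier, and volumes between 100 and 10K units are reported as not covered.

```python
from dataclasses import dataclass
from typing import Optional

# Tiers in increasing order of power and capability, as in the questions above.
# For "large" models the text also lists a NUC + GPU as an alternative.
BOARDS = ("Raspberry Pi 5", "Jetson Orin Nano Super", "Jetson Orin NX / AGX Orin")

MODEL_TIER = {
    "classical": 0,
    "small_cnn": 0,
    "modern_cnn": 1,
    "small_transformer": 1,
    "large": 2,  # SAM, VLM
}


@dataclass(frozen=True)
class Recommendation:
    board: Optional[str]  # None when the model needs more power than allowed
    sourcing: str


def power_tier(watts: float) -> int:
    """Highest tier the power budget allows."""
    if watts < 5:
        return 0
    if watts <= 15:
        return 1
    return 2


def sourcing_for_volume(units: int) -> str:
    if units < 100:
        return "off-the-shelf developer board"
    if units > 10_000:
        return "custom board design around a specific SoC"
    return "mid volume: not covered by the three questions"


def recommend(power_w: float, model_family: str, units: int) -> Recommendation:
    needed = MODEL_TIER[model_family]  # raises KeyError on unknown families
    allowed = power_tier(power_w)
    board = BOARDS[needed] if needed <= allowed else None
    return Recommendation(board, sourcing_for_volume(units))


if __name__ == "__main__":
    print(recommend(power_w=10, model_family="modern_cnn", units=50))
```

A `None` board means the answers conflict: the model class needs a higher tier than the power budget permits, so either the model or the power envelope has to change.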
The Dump
Embedded boards
- NVIDIA Jetson family — Nano / Orin Nano / Orin Nano Super / Orin NX / AGX Orin.
- Raspberry Pi 4 / 5 — general-purpose ARM. Pi 5 8GB is the usable baseline.
- Orange Pi / Rock Pi / Banana Pi — Rockchip-based Pi alternatives. Some (RK3588) have decent NPUs.
- ASUS Tinker Board — Rockchip-based. Less community.
- BeagleBone AI-64 — TI-based. Niche.
- NVIDIA IGX Orin — industrial/medical hardened Jetson variant.
- NVIDIA Jetson Thor (2025–2026) — next-gen robotics / autonomous-machine platform.
Accelerators
- Hailo-8 / Hailo-10 M.2 — the winning Pi accelerator story. 26 / 40 TOPS.
- Google Coral (USB + board) — dedicated TPU. Aging.
- Intel Neural Compute Stick 2 — USB accelerator. Discontinued; Intel shifted focus to iGPU.
- Kneron KL720 / KL630 — dedicated SoC.
- Gyrfalcon Lightspeeur — niche.
- Axelera Metis — newer dedicated ML accelerator, interesting.
Mobile / consumer
- iPhone + CoreML / Neural Engine — top of the consumer mobile class.
- Android flagships + QNN (Qualcomm NPU) — strong in Snapdragon 8 Gen 3+.
- Google Pixel + Tensor chip — Google's own mobile chip with TPU.
Industrial / automotive
- NVIDIA DRIVE Orin / Thor — automotive.
- Qualcomm Ride / Flex — automotive.
- Ambarella CV-family — dashcam / auto / robotics vision SoCs.
- NVIDIA IGX Orin — industrial / medical-grade.
Cameras with built-in compute
- Luxonis OAK-D / OAK-1 — Intel Myriad X inside. Depth + detection in one device.
- Reolink smart cameras (e.g., Reolink Duo) — consumer CV cameras with basic on-device ML.
- Sony IMX500 — sensor with ML inside. Niche but interesting.
Software stacks
- NVIDIA JetPack — Jetson's SDK (CUDA, TensorRT, DeepStream, etc.).
- NVIDIA DeepStream — video pipeline + model serving for Jetson/servers.
- Picamera2 — Raspberry Pi camera API.
- libcamera — open-source camera stack for Linux embedded platforms; Picamera2 is built on top of it.
- GStreamer — media pipelines; the glue on most Linux embedded CV stacks.
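As an illustration of GStreamer acting as glue, the sketch below builds a capture pipeline description string for a V4L2 camera. The helper name and default values are assumptions for illustration. The elements (`v4l2src`, `videoconvert`, `appsink`) and the `drop` / `max-buffers` properties are standard GStreamer, but whether a given resolution, frame rate, or format negotiates depends on the camera and the plugins installed on the board.

```python
def build_capture_pipeline(
    device: str = "/dev/video0",
    width: int = 640,
    height: int = 480,
    fps: int = 30,
) -> str:
    """Return a GStreamer pipeline description for grabbing BGR frames."""
    if width <= 0 or height <= 0 or fps <= 0:
        raise ValueError("width, height and fps must be positive")
    stages = [
        f"v4l2src device={device}",
        # Ask the camera for a specific raw resolution and frame rate.
        f"video/x-raw,width={width},height={height},framerate={fps}/1",
        "videoconvert",
        # BGR is the channel order OpenCV expects.
        "video/x-raw,format=BGR",
        # Keep only the newest frame so slow inference does not build a backlog.
        "appsink drop=true max-buffers=1",
    ]
    return " ! ".join(stages)


if __name__ == "__main__":
    print(build_capture_pipeline())
```

If OpenCV is built with GStreamer support, a string like this can be passed to `cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)`. On Jetson or Pi camera modules the source element would typically differ, so treat `v4l2src` as a placeholder for a generic USB camera.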
Graveyard
- Jetson Nano (original 2019) — end-of-life by NVIDIA. Avoid for new projects.
- Intel Movidius Neural Compute Stick (NCS / NCS2) — discontinued.
- Coral TPU as a clearly-growing platform — uncertain status.
- Qualcomm Robotics RB3 — superseded by RB5, and attention moved to mobile SoCs.
Last reviewed
2026-04-22.