
Edge Deployment

Running CV models on devices that aren't in a data center. Four broad tiers: accelerated embedded (Jetson), general embedded (Pi), dedicated TPU (Coral), and mobile (phones).

[!WARNING] Hardware landscape changes quickly. The specific board recommendations below reflect today's practical defaults (as of April 2026), not universal truths. New SKUs ship every quarter; prices and availability shift. Power budgets and model-class fit are more stable guidance than specific product picks. Verify current pricing and availability before committing to a BOM.

Edge deployment is where most CV products actually live. The constraints are different from server-side — you care about power, thermals, cost, and the specific accelerator more than raw FLOPs.

| Tier | Pick | When to use |
| --- | --- | --- |
| Premium embedded (GPU-class) | Jetson Orin Nano Super (2024, $249) | Strong ML performance in a 15W envelope |
| Budget embedded | Raspberry Pi 5 (8GB) | General-purpose, larger ecosystem, OK CV performance |
| Dedicated TPU | Google Coral USB / Dev Board | Efficient INT8 inference; check current availability |
| Mobile flagship | iPhone + CoreML | Best-in-class on-device ML in the consumer device category |

Jetson — the embedded default

NVIDIA's embedded GPU line. The Orin Nano Super (launched Dec 2024) is the current sweet-spot board: 67 TOPS INT8, 8 GB RAM, 15W, $249. Runs TensorRT natively. Can execute most modern CV models including smaller VLMs.

Options in the Jetson family (roughly increasing power):

  • Jetson Nano (original, 2019) — 472 GFLOPS FP16, 4 GB RAM. End-of-life; avoid for new work.
  • Jetson Orin Nano (2023) — 40 TOPS INT8, 8 GB. Decent.
  • Jetson Orin Nano Super (2024) — 67 TOPS INT8 via firmware boost, 8 GB, $249. The current default for consumer / maker edge.
  • Jetson Orin NX — 100 TOPS, 8 or 16 GB. More expensive.
  • Jetson AGX Orin — 275 TOPS, 32/64 GB. Premium.
  • Jetson Thor (2025–2026) — ~2,070 FP4 TFLOPS (petaFLOP-class at the edge), roughly 7.5× the Orin generation. Targeted at robotics / autonomous / industrial; expensive; overkill for most CV products. Verify current availability and pricing before committing.

Pick Jetson when:

  • You're training or evaluating on an NVIDIA cluster and want the same model to run at the edge with TensorRT optimization.
  • You need to run models larger than classical CV (small LLM, SAM, depth estimation) at the edge.
  • Power budget is 10–25W.
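The usual TensorRT workflow on Jetson is: export your model to ONNX, then build a device-specific engine with trtexec (ships with JetPack). A minimal sketch — `model.onnx` is a placeholder for your own exported model, and flags like precision are choices, not requirements:

```shell
# Build an FP16 TensorRT engine from an ONNX export.
# The engine is specific to this GPU + TensorRT version; rebuild per target.
trtexec --onnx=model.onnx \
        --fp16 \
        --saveEngine=model_fp16.engine

# Benchmark the built engine to get realistic latency/throughput numbers.
trtexec --loadEngine=model_fp16.engine --iterations=200
```

For INT8 you'd add `--int8` plus a calibration step; measure on-device, since datasheet TOPS rarely predicts end-to-end FPS.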

Raspberry Pi — the budget default

Not accelerated, just a decent ARM CPU. Pi 5 (8GB) is good enough for classical CV (Haar, HOG), small CNNs (YOLOv8n INT8 at ~10 FPS), and anything where the pipeline doesn't saturate the CPU.

Pick Pi when:

  • Cost is critical ($80).
  • Power is critical (~5W).
  • The CV model is light or legacy.
  • You don't need hardware ML acceleration.

Add a Hailo M.2 accelerator to turn a Pi 5 into a credible edge AI platform. Two variants are sold as Pi 5 kits:

  • Hailo-8L (standard AI Kit) — 13 TOPS, cheaper.
  • Hailo-8 (AI+ Kit) — 26 TOPS, premium.

Pi 5 + Hailo-8 is competitive with Jetson Orin Nano at similar cost; Pi 5 + Hailo-8L is cheaper but noticeably slower.

Google Coral — the dedicated TPU

Coral USB Accelerator ($60) or Coral Dev Board. 4 TOPS INT8 on a tiny module. Runs only TFLite models quantized to INT8 and compiled for the Edge TPU.
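INT8 accelerators like Coral expect affine-quantized tensors: each float is mapped to an int8 via a scale and zero point. A minimal sketch of that arithmetic in plain Python (no TFLite dependency; the scale and values are illustrative, not from any real model):

```python
def quantize(x, scale, zero_point):
    """Affine quantization: real value -> int8 code."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Inverse mapping: int8 code -> approximate real value."""
    return (q - zero_point) * scale

# Example: a tensor with values in roughly [-1, 1], symmetric around 0.
scale, zero_point = 2.0 / 255, 0
q = quantize(0.5, scale, zero_point)    # -> 64
x = dequantize(q, scale, zero_point)    # -> ~0.502 (quantization error ~0.002)
```

Quantization error is bounded by half a step (scale / 2); whether that matters depends on the model, which is why post-training quantization is always validated against a calibration set.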

Status: uncertain in 2026. Google has announced no new Coral hardware since 2022, though the existing products still ship. Treat Coral as a "mature, possibly slowly fading" option — fine for existing deployments, shaky for new commitments.

Mobile — CoreML (iOS) and NNAPI / QNN (Android)

Phones are the highest-shipping-volume edge devices. Deploy CV models as part of an app:

  • iOS → export to CoreML. Apple Neural Engine runs INT8/INT16 quantized models at very high efficiency.
  • Android → export to TFLite (now branded LiteRT) or ONNX. TFLite runs via the GPU delegate or vendor delegates; the older NNAPI path is deprecated on recent Android versions. Use Qualcomm QNN for Snapdragon-specific performance.

iPhone's Neural Engine is the strongest single-device mobile ML platform in 2026. If your product ships on iOS, CoreML is not optional.

When to pick something else

  • Multi-camera office/retail → an Intel NUC-class mini PC + small dGPU delivers roughly 10× the compute of any embedded board at ~4× the price. Different form factor, but the better trade for multi-stream loads.
  • Industrial / outdoor → look at ruggedized boards (NVIDIA IGX Orin, specific industrial PC vendors). Jetson in an enclosure works but isn't ruggedized.
  • Low-power, always-on → sub-watt vision SoC (e.g., Sony IMX500 sensor with in-sensor ML, Himax WE-I Plus). Specialized, not plug-and-play.
  • Existing browser-based product → you're not really edge; you're web. Different section.

The three questions to narrow

  1. Power budget? < 5W → Pi. 5–15W → Jetson Orin Nano Super. 15W+ → Orin NX / AGX.
  2. Model family? Classical CV / small CNN → Pi. Modern CNN / small transformer → Jetson Orin Nano Super. Large (SAM, VLM) → Orin NX / AGX or NUC+GPU.
  3. Ship volume? Low (< 100 devices) → developer-friendly (Jetson, Pi). High (> 10K devices) → custom board design with specific SoC.
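The three questions collapse into a toy triage function. A sketch — the thresholds mirror the list above and the labels (`classical`, `modern`, `large`) are assumed shorthand, not official tiers:

```python
def pick_board(power_budget_w, model_class, ship_volume):
    """Heuristic board triage. model_class: 'classical' | 'modern' | 'large'."""
    if ship_volume > 10_000:
        # High volume justifies a custom board around a chosen SoC.
        return "custom board design"
    if model_class == "large" or power_budget_w > 15:
        return "Orin NX / AGX (or NUC + GPU)"
    if model_class == "classical" and power_budget_w < 5:
        return "Raspberry Pi 5"
    return "Jetson Orin Nano Super"

pick_board(4, "classical", 50)    # -> 'Raspberry Pi 5'
pick_board(12, "modern", 200)     # -> 'Jetson Orin Nano Super'
```

In practice the questions interact (a large model forces the power budget up), so treat the order as "volume first, then model, then power" rather than three independent filters.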

The Dump

Embedded boards

Accelerators

Mobile / consumer

Industrial / automotive

Cameras with built-in compute

Software stacks

  • NVIDIA JetPack — Jetson's SDK (CUDA, TensorRT, DeepStream, etc.).
  • NVIDIA DeepStream — video pipeline + model serving for Jetson/servers.
  • Picamera2 — Raspberry Pi camera API.
  • libcamera — cross-embedded camera library.
  • GStreamer — media pipelines; the glue on most Linux embedded CV stacks.

Graveyard

  • Jetson Nano (original 2019) — end-of-life by NVIDIA. Avoid for new projects.
  • Intel Movidius Neural Compute Stick / NCS2 — discontinued.
  • Coral TPU as a clearly-growing platform — uncertain status.
  • Qualcomm Robotics RB3 — superseded by RB5, and attention moved to mobile SoCs.

Last reviewed

2026-04-22.