Evaluation Criteria

A resource should not be Recommended only because it is popular, or because it has a strong benchmark number. "Recommended" in this playbook means a senior CV engineer shipping a product would pick it first. This page makes that judgment explicit.

Primary criteria (heavy weight)

These matter most. A pick without strong marks here doesn't go in the top tier.

  1. Practical adoption in 2026 production systems. Is this actually running in real products, or is it a research codebase with a strong paper?
  2. Maintenance status. Has the project shipped a release in the last 6 months? Are open issues being triaged? Abandoned projects go to the Graveyard, however good they once were (a mechanical check is sketched after this list).
  3. Deployment ease. A clean pip install, a one-line Docker run, or a clear ONNX export path. If getting it working is a two-week odyssey, it is not a Recommended pick.
  4. Documentation quality. Real tutorials and examples, not just a README with a PyPI badge.
  5. Ecosystem strength. Integrations, community support, how many other tools assume its existence.
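
Criterion 2 is mechanically checkable. A minimal sketch, assuming the third-party requests package and the public GitHub REST API (unauthenticated calls are rate-limited); the six-month cutoff and the helper name released_recently are illustrative choices, not part of the rubric:

```python
from datetime import datetime, timedelta, timezone

import requests


def released_recently(owner: str, repo: str, months: int = 6) -> bool:
    """True if the repo's latest GitHub release is newer than `months` months."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases/latest"
    resp = requests.get(url, timeout=10)
    if resp.status_code == 404:
        return False  # no releases at all: fails the maintenance bar outright
    resp.raise_for_status()
    # GitHub returns ISO 8601 with a trailing Z, e.g. "2025-11-03T14:21:07Z"
    published = datetime.fromisoformat(
        resp.json()["published_at"].replace("Z", "+00:00")
    )
    cutoff = datetime.now(timezone.utc) - timedelta(days=30 * months)
    return published > cutoff


print(released_recently("opencv", "opencv-python"))
```

This is a floor, not a verdict: a project can pass the release check and still have a dead issue tracker, so triage activity needs a human look.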

Secondary criteria (tie-break weight)

These break ties between otherwise comparable picks.

  1. License clarity. MIT / Apache 2.0 preferred. Restrictive licenses (non-commercial, SSPL, custom) are noted explicitly. Some picks may still be listed, but with a license warning.
  2. Hardware friendliness. Does it run on CPU + GPU + edge? Or GPU-only?
  3. Export paths. ONNX, TensorRT, CoreML, TFLite availability (see the export sketch after this list).
  4. Training support. Can the user fine-tune this easily if they need to?
  5. Benchmark relevance. Strong on realistic benchmarks, not just on a single saturated academic one.
  6. Reproducibility. If the paper claims X accuracy, does the public code reproduce X?
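
What a "clear export path" looks like in practice: the entire conversion should fit in a handful of lines. A minimal sketch, assuming PyTorch and torchvision are installed; resnet18 stands in for whatever model is under evaluation, and the file name, input shape, and opset version are arbitrary:

```python
import torch
import torchvision

# Stand-in model; any torch.nn.Module with a fixed input signature works.
model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)

torch.onnx.export(
    model,
    dummy,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # let batch size vary at runtime
    opset_version=17,
)
```

If a tool needs more ceremony than this (custom ops, patched forks, undocumented flags), that cost shows up in both the deployment-ease and export-paths scores.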

Recommendation tiers

Recommended

A pick that most senior engineers should reach for first, given the tier's constraint. Each Recommended entry must include:

  • A why — 1-3 sentences.
  • A when to avoid — at least one concrete failure mode or unfit scenario.
  • An install / try command or link.
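
As an illustration of the shape (the tool, verdict, and command below are entirely hypothetical, not a real pick):

HypotheticalDet (detection, edge tier)

  • Why: real-time on CPU, one-line ONNX export, monthly releases.
  • When to avoid: small-object-heavy scenes; recall collapses on objects under roughly 32 px.
  • Try: pip install hypotheticaldet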

Alternatives (the Dump)

Everything else that's worth knowing. The Dump isn't curated — contributions welcome. Each entry has a one-line honest verdict. No marketing blurbs.

Graveyard

Historically important, now retired. Format:

  • Name (year retired).
  • Why it was the pick before.
  • Why it's retired.

Anti-criteria — things that should NOT influence tiering

  • Hype / social media visibility. A project with 30K stars that last shipped in 2022 is still dead.
  • Single-benchmark wins. A model that beats the previous best on COCO by 0.5 AP at 3× the latency is research, not a production pick.
  • Author authority alone. Even a famous lab's code goes unmaintained. Check the commits.
  • "It's what we used before." Sunk cost is not a vote.

How to disagree

If you think the rubric produced the wrong pick:

  1. Open an issue with your counter-argument.
  2. Name which criterion you think was misjudged.
  3. Provide evidence.

Disagreements are welcome. Hand-wavy "I think X is better" is not.

Last reviewed

2026-04-22.