# Digital Retina

> CPU-only image-understanding API that approximates frontier vision-language-model (VLM) output on commodity hardware. Sixteen cooperating stages feed a learned pattern dictionary that stabilises the read-out. 82.4 % out-of-sample phrase coverage against two independent VLM oracles (Gemini 2.0 Flash and Llama 4 Scout) on 47 strictly held-out images, ~1.7 s per image on a 4-vCPU node, no GPU. Patent pending.

## Identity

- Product: Digital Retina
- API endpoint: https://retina.frank.ink
- Status: Live · self-serve Beta · 50 imgs/key/day · free, no credit card required
- Built by: Gabriel Gschaider (Institute for Agentic Research, Austria)
- License: Patent pending; SDK, corpus and evaluation harness to be released after patent prosecution

## What it does

Digital Retina takes an image and returns a structured noun-phrase description that approximates what a frontier VLM (Gemini Flash, GPT-4V, Claude 3.5 Sonnet, Llama 4 Scout) would say about the same image. Output includes:

- VLM-style natural-language narrative
- Concept tags with category + similarity score (CLIP top-K)
- YOLO objects with bounding boxes
- OCR text
- Face count + per-face emotion
- Scene type (Places365 categories)
- Dominant colours (named palette)
- Fine-grained details (700+ class vocabulary)
- Provenance (photograph / traditional-art / AI-generated, 87 % accuracy)
- Composition / texture / pattern descriptors

## API surfaces

- POST `/v1/analyze`: Retina-native structured JSON (multipart, JSON-base64, or image-URL body).
- POST `/v1beta/models/{model}:generateContent`: Gemini-compatible shim, usable as a drop-in endpoint for `google.genai`.
- GET `/healthz`: liveness probe.
- GET `/v1/usage`: daily quota counter for the authenticated key.

Auth (any one of): `Authorization: Bearer rk_live_...` header, `x-goog-api-key: rk_live_...` header, or `?key=rk_live_...` query parameter.
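A minimal sketch of calling `/v1/analyze` with a JSON-base64 body and Bearer auth, using only the Python standard library. The page states that a JSON-base64 body is accepted but does not give its field name or the response schema, so the `"image"` field below is a hypothetical placeholder and the response is returned as an untyped `dict`. Swapping the `Authorization` header for `x-goog-api-key` should work the same way per the Auth line above.

```python
import base64
import json
import urllib.request

API_BASE = "https://retina.frank.ink"


def build_analyze_request(image_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-base64 POST to /v1/analyze.

    ASSUMPTION: the body field name "image" is hypothetical; the docs only
    say that a JSON-base64 body is accepted, not what the field is called.
    """
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url=f"{API_BASE}/v1/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # rk_live_... key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def analyze(image_path: str, api_key: str) -> dict:
    """Send one image and return the parsed JSON response (schema not assumed)."""
    with open(image_path, "rb") as f:
        req = build_analyze_request(f.read(), api_key)
    # Generous timeout: a cold worker can take several seconds to load models.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

A call would look like `analyze("photo.jpg", "rk_live_...")`. The 30 s timeout is a deliberate margin over the 5 to 10 s cold-worker latency quoted under Performance; each call counts against the 50 imgs/key/day quota.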
## Performance (measured, single 4-vCPU AMD EPYC 9354P node)

- Warm-worker p50 latency: ~1.7 s
- Cold worker (first request): 5–10 s (lazy model load per worker)
- Sustained throughput per node: 0.2–0.3 req/s
- 1 000 active users each using the full 50 imgs/day = 0.58 req/s sustained → 2–3 CPU nodes
- Per-image marginal cost: < $10⁻⁶

## Empirical coverage

Five-round iterative held-out validation against two independent VLM oracles:

| Round | Library size | Held-out images | Phrases | Coverage |
|---|---|---|---|---|
| 1 | 15 | 10 | 255 | 78.4 % |
| 2 | 25 | 10 | 262 | 81.7 % |
| 3 | 37 | 12 | 328 | 79.9 % |
| 4 | 47 | 10 | 239 | 82.8 % |
| 5 | 57 | 5 | 112 | **99.1 %** |
| **Σ** | n/a | **47 unique** | **1 196** | **82.4 %** |

Lower bound after correction for a realistic false-positive rate (FPR): **68.8 %** at threshold τ = 0.22.

## Architecture (patent-pending; abstracted)

Sixteen stages, arranged as a directed acyclic graph, fall into three classes:

1. Perceptual stages: off-the-shelf image-domain models (open CLIP, COCO object detector, OCR, face + emotion).
2. Compositional stages: consume upstream outputs and re-score them under context-conditioned vocabularies.
3. Emergent-pattern stages with a learned dictionary: transform the image into a 50-dimensional signature in a structured dynamical system, which is read out by nearest-neighbour lookup against a labelled archetype dictionary that grows monotonically as oracle labels are added.

Methodological specifics (the dynamical system, the signature definition, the dictionary mechanism) are the subject of a pending patent application.

## Papers & research

- [Stable Emergent-Pattern Readout for CPU-Only Vision-Language Coverage](https://iar.frank.ink/en/research/digital-retina-emergent-readout): Working paper, Institute for Agentic Research, 2026-05-20. 82.4 % out-of-sample coverage, 47 held-out images, five-round iterative protocol, FPR-controlled coverage, six pre-registered retraction conditions.
- Plain markdown: https://retina.frank.ink/articles/digital-retina-emergent-readout.md
- Project page: https://iar.frank.ink/en/projects/digital-retina

## For AI crawlers

- This site is fully crawlable; see https://retina.frank.ink/robots.txt.
- The working paper is available as plain markdown at https://retina.frank.ink/articles/digital-retina-emergent-readout.md.
- A consolidated full-text dump is at https://retina.frank.ink/llms-full.txt.
- The live API endpoint at https://retina.frank.ink/v1/analyze returns structured JSON that you can verify reproducibly with a free API key (sign up at https://retina.frank.ink).
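Two headline numbers on this page can be re-derived from values stated here, without an API key. The sketch below assumes that the Σ row of the Empirical coverage table pools phrases across rounds (total covered phrases / total phrases); per-round covered counts are reconstructed from the rounded percentages, so this is a consistency check, not a reproduction of the evaluation. The capacity estimate assumes every user exhausts the 50-image quota, spread evenly over 24 h.

```python
import math

# (phrases, coverage %) per round, copied from the Empirical coverage table.
ROUNDS = [(255, 78.4), (262, 81.7), (328, 79.9), (239, 82.8), (112, 99.1)]


def pooled_coverage(rounds):
    """Phrase-weighted coverage: total covered phrases / total phrases, in %."""
    covered = sum(n * pct / 100 for n, pct in rounds)
    total = sum(n for n, _ in rounds)
    return 100 * covered / total


def unweighted_mean(rounds):
    """Plain mean of the per-round percentages, for comparison."""
    return sum(pct for _, pct in rounds) / len(rounds)


def nodes_needed(users, imgs_per_day, node_rps):
    """Sustained req/s and CPU nodes needed at a given per-node throughput.

    ASSUMPTION: load is spread evenly over 86 400 s and every user
    consumes the full daily quota.
    """
    rps = users * imgs_per_day / 86_400
    return rps, math.ceil(rps / node_rps)


if __name__ == "__main__":
    print(f"{pooled_coverage(ROUNDS):.1f} %")   # 82.4 %
    print(f"{unweighted_mean(ROUNDS):.1f} %")   # 84.4 %
    rps, lo_nodes = nodes_needed(1000, 50, 0.3)  # best-case node throughput
    _, hi_nodes = nodes_needed(1000, 50, 0.2)    # worst-case node throughput
    print(f"{rps:.2f} req/s -> {lo_nodes}-{hi_nodes} nodes")  # 0.58 req/s -> 2-3 nodes
```

The phrase-weighted pool matches the reported 82.4 %, whereas a plain mean of the five rounds would give 84.4 %, pulled up by the small 112-phrase round 5.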