# Digital Retina

> CPU-only image-understanding API that approximates frontier vision-language-model output on commodity hardware. 16 cooperating perceptual stages feed a learned pattern dictionary that stabilises read-out. 82.4 % out-of-sample phrase coverage against two independent VLM oracles (Gemini 2.0 Flash and Llama 4 Scout) on 47 strictly held-out images, ~1.7 s per image on a 4-vCPU node, no GPU. Patent pending.

## Identity

- Product: Digital Retina
- API endpoint: https://retina.frank.ink
- Status: Live · self-serve Beta · 50 imgs/key/day · free, no credit card
- Built by: Gabriel Gschaider (Institute for Agentic Research, Austria)
- License: Patent pending; SDK, corpus, and harness to be released after patent prosecution

## What it does

Digital Retina takes an image and returns a structured noun-phrase description that approximates what a frontier VLM (Gemini Flash, GPT-4V, Claude 3.5 Sonnet, Llama 4 Scout) would say about the same image. Output includes:

- VLM-style natural-language narrative
- Concept tags with category + similarity score (CLIP top-K)
- YOLO objects with bounding boxes
- OCR text
- Face count + per-face emotion
- Scene type (Places365 categories)
- Dominant colours (named palette)
- Fine-grained details (700+ class vocabulary)
- Provenance (photograph / traditional-art / AI-generated, 87 % accuracy)
- Composition / texture / pattern descriptors

## API surfaces

- POST `/v1/analyze` — Retina-native structured JSON (multipart, JSON-base64, or image-url body).
- POST `/v1beta/models/{model}:generateContent` — Gemini-compatible shim. Drop-in for `google.genai`.
- GET `/healthz` — liveness probe.
- GET `/v1/usage` — daily quota counter for the authenticated key.

Auth: send the key in any one of `Authorization: Bearer rk_live_...`, `x-goog-api-key: rk_live_...`, or `?key=rk_live_...`.
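
The two POST surfaces above can be sketched as request builders using only the Python standard library. This is a minimal sketch under stated assumptions, not the documented request schema: the JSON field name `image_url` and the model name `example-model` are hypothetical placeholders, and the shim body simply follows the public Gemini REST shape (`contents` / `parts` / `inline_data`), which the shim is assumed to accept. The builders construct requests without sending them.

```python
import base64
import json
import urllib.request

BASE_URL = "https://retina.frank.ink"


def build_analyze_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a native POST /v1/analyze request."""
    # ASSUMPTION: "image_url" is a hypothetical field name for the image-url body.
    body = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/analyze",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def build_gemini_request(
    image_bytes: bytes,
    prompt: str,
    api_key: str,
    model: str = "example-model",  # ASSUMPTION: placeholder model name
) -> urllib.request.Request:
    """Build (but do not send) a request for the Gemini-compatible shim."""
    # Body follows the public Gemini REST layout; the image is sent inline
    # as base64. The JPEG MIME type is assumed for illustration.
    body = {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": "image/jpeg",
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
    )
```

To send either request, pass it to `urllib.request.urlopen(req, timeout=30)`. A timeout well above 10 s is a sensible choice because a cold worker may take 5 to 10 s on its first request.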

## Performance (measured, single 4-vCPU AMD EPYC 9354P node)

- Warm-worker p50 latency: ~1.7 s
- Cold worker (first request): 5 – 10 s (lazy model load per worker)
- Sustained throughput per node: 0.2 – 0.3 req/s
- 1 000 active users each using the full 50 imgs/day quota = 50 000 imgs/day ≈ 0.58 req/s sustained → 2 – 3 CPU nodes
- Per-image marginal cost: < $10⁻⁶
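
The node estimate above follows from simple arithmetic, sketched here. The function names are illustrative, and the calculation assumes load is spread evenly across the day, which real traffic rarely is, so peak-hour sizing would need headroom.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(users: int, images_per_user_per_day: int) -> float:
    """Average request rate in req/s if load is spread evenly over a day."""
    return users * images_per_user_per_day / SECONDS_PER_DAY


def nodes_needed(rate: float, per_node_throughput: float) -> int:
    """Smallest whole number of nodes whose combined throughput covers the rate."""
    return math.ceil(rate / per_node_throughput)


rate = sustained_rate(1_000, 50)
best_case = nodes_needed(rate, 0.3)   # upper end of measured per-node throughput
worst_case = nodes_needed(rate, 0.2)  # lower end of measured per-node throughput
print(f"{rate:.2f} req/s -> {best_case} to {worst_case} nodes")
# prints "0.58 req/s -> 2 to 3 nodes"
```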

## Empirical coverage

Five-round iterative held-out validation against two independent VLM oracles:

| Round | Library size | Held-out images | Phrases | Coverage |
|---|---|---|---|---|
| 1 | 15 | 10 | 255 | 78.4 % |
| 2 | 25 | 10 | 262 | 81.7 % |
| 3 | 37 | 12 | 328 | 79.9 % |
| 4 | 47 | 10 | 239 | 82.8 % |
| 5 | 57 |  5 | 112 | **99.1 %** |
| **Σ** | — | **47 unique** | **1 196** | **82.4 %** |

Lower bound after correcting for a realistic false-positive rate (FPR): **68.8 %** at threshold τ = 0.22.
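
The pooled figure in the Σ row of the table can be reproduced as a phrase-weighted average of the per-round coverages. This sketch assumes the total is pooled over phrases rather than averaged over rounds; because the per-round percentages are already rounded, the recomputed value is an approximation.

```python
# (phrases, coverage in percent) for rounds 1 to 5, copied from the table.
rounds = [
    (255, 78.4),
    (262, 81.7),
    (328, 79.9),
    (239, 82.8),
    (112, 99.1),
]

total_phrases = sum(phrases for phrases, _ in rounds)

# Estimated number of covered phrases per round, summed over all rounds.
covered = sum(phrases * coverage / 100 for phrases, coverage in rounds)

pooled_coverage = 100 * covered / total_phrases
print(total_phrases, round(pooled_coverage, 1))
# prints "1196 82.4"
```

An unweighted mean of the five round percentages would give about 84.4 %, so the reported 82.4 % is consistent with phrase-level pooling, which keeps the small final round (112 phrases) from dominating.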

## Architecture (patent-pending; abstracted)

Sixteen stages arranged as a directed acyclic graph fall into three classes:

1. Perceptual stages — off-the-shelf image-domain models (open CLIP, COCO object detector, OCR, face + emotion).
2. Compositional stages — consume upstream outputs and re-score under context-conditioned vocabularies.
3. Emergent-pattern stages with learned dictionary — transform the image into a 50-dim signature in a structured dynamical system, read out by nearest-neighbour against a labelled archetype dictionary that grows monotonically with oracle labels.

Methodological specifics (the dynamical system, the signature definition, the dictionary mechanism) are the subject of a pending patent application.
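
For orientation only, nearest-neighbour read-out against a growing labelled dictionary can be sketched generically as below. This is not the patent-pending mechanism: the Euclidean metric, the function names, and the archetype labels are all assumptions made for illustration, and how the 50-dim signature is computed is not shown.

```python
import math
from typing import List, Sequence, Tuple

# One dictionary entry: (signature vector, label).
Entry = Tuple[Sequence[float], str]


def add_archetype(dictionary: List[Entry], signature: Sequence[float], label: str) -> None:
    """Append a labelled signature; entries are never removed, so the dictionary only grows."""
    dictionary.append((tuple(signature), label))


def nearest_label(dictionary: List[Entry], signature: Sequence[float]) -> str:
    """Return the label of the entry closest to `signature`.

    ASSUMPTION: Euclidean distance; the source does not name the metric.
    """
    if not dictionary:
        raise ValueError("dictionary is empty")
    closest = min(dictionary, key=lambda entry: math.dist(signature, entry[0]))
    return closest[1]


# Toy 50-dimensional signatures with hypothetical labels.
dictionary: List[Entry] = []
add_archetype(dictionary, [0.0] * 50, "archetype-a")
add_archetype(dictionary, [1.0] * 50, "archetype-b")

print(nearest_label(dictionary, [0.1] * 50))
# prints "archetype-a"
```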

## Papers & research

- [Stable Emergent-Pattern Readout for CPU-Only Vision-Language Coverage](https://iar.frank.ink/en/research/digital-retina-emergent-readout): Working paper, Institute for Agentic Research, 2026-05-20. 82.4 % out-of-sample coverage, 47 held-out images, five-round iterative protocol, FPR-controlled coverage, six pre-registered retraction conditions.
  - Plain markdown: https://retina.frank.ink/articles/digital-retina-emergent-readout.md
- Project page: https://iar.frank.ink/en/projects/digital-retina

## For AI crawlers

- This site is fully crawlable. See https://retina.frank.ink/robots.txt.
- The working paper is available as plain markdown at https://retina.frank.ink/articles/digital-retina-emergent-readout.md.
- A consolidated full-text dump is at https://retina.frank.ink/llms-full.txt.
- The live API endpoint at https://retina.frank.ink/v1/analyze returns structured JSON that you can verify reproducibly with a free API key (sign-up at https://retina.frank.ink).