# Morphēa Milestone Plan

This document expands the implementation plan beyond the first base-form
prototype. The long-term goal is a local, self-improving, semantic-first
raster-to-SVG research system.

## North Star

Morphēa should produce SVGs that feel intentionally constructed by a human:
simple shapes stay simple, strokes stay editable, cut-out lines stay clean,
and repeated structures become coherent groups. Visual similarity matters, but
editable semantic structure is the primary quality target.

The system should start with deterministic geometry, then add local AI, and
finally add training and self-learning loops.

The active step-by-step quality track for primary forms and primitive
compositions lives in [primitive-quality-roadmap.md](primitive-quality-roadmap.md).
The forward-looking high-bar track for real images lives in
[real-image-promotion-v10-roadmap.md](real-image-promotion-v10-roadmap.md).

## M0: Primitive Anchor Prototype

Status: implemented.

Purpose: prove that the repo can detect and process the first simple forms
end-to-end.

Implemented capabilities:

- Binary-mask connected components.
- Circle and dot anchors.
- Circle rings as `stroke_circle`.
- Smooth line strokes using principal-axis fitting.
- Perspective quads as editable polygons.
- Simple white cut-out gaps as overlay strokes.
- Near-flat RGB grouping via `--color-tolerance`.
- Plain SVG export and JSON recognition manifest.
- CLI path: `morphea vectorize input.png -o output.svg`.
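Principal-axis fitting for smooth line strokes can be sketched as follows. This sketch assumes pixel centers as input and takes the major axis of the pixel covariance as the stroke direction. It also computes the projection bounds in a single streaming pass, which is the optimization M1 describes later. `principal_axis_segment` is a hypothetical helper, not Morphēa's actual API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def principal_axis_segment(pixels: List[Point]) -> Tuple[Point, Point]:
    """Fit a straight segment through pixel centers (hypothetical helper).

    The segment lies on the major axis of the pixel covariance and spans the
    full range of pixel projections onto that axis.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("need at least one pixel")
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second central moments of the pixel cloud.
    sxx = sum((x - cx) ** 2 for x, _ in pixels)
    syy = sum((y - cy) ** 2 for _, y in pixels)
    sxy = sum((x - cx) * (y - cy) for x, y in pixels)
    # Orientation of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Projection bounds in one streaming pass, without a projection list.
    lo, hi = math.inf, -math.inf
    for x, y in pixels:
        t = (x - cx) * ux + (y - cy) * uy
        if t < lo:
            lo = t
        if t > hi:
            hi = t
    start = (cx + lo * ux, cy + lo * uy)
    end = (cx + hi * ux, cy + hi * uy)
    return start, end
```

For a horizontal run of pixels `(0, 5) ... (9, 5)`, the sketch returns endpoints near `(0, 5)` and `(9, 5)`. That pair of endpoints is the two-point editable centerline a straight stroke would export as.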

Acceptance evidence:

- `PYTHONPATH=src python3 -m unittest discover -s tests`
- Current suite covers masks, PNG fixtures, SVG export, CLI, manifests,
  cut-outs, rings, strokes, quads, and grid grouping.

## M1: Real-Image Preprocessing and Runtime Control

Status: implemented for the current real-image runtime baseline.

Purpose: make real generated images tractable before adding heavier ML.

Why this matters: real images contain antialiasing, palette drift, transparent
backgrounds, soft texture, and large components. The current pure-Python mask
pipeline can stall on full-size 1254px images.

Deliverables:

- Image normalization stage:
  - alpha flattening or transparent-region ignore policy
  - palette quantization with configurable max colors
  - near-white and near-background grouping
  - optional downsample-for-analysis with final-coordinate scaling
- Region-of-interest splitting:
  - process large color masks as bounded components
  - skip or defer components above a configured complexity threshold
  - emit warnings instead of hanging
- Runtime limits:
  - `--max-size`, `--max-component-area`, `--timeout-seconds`
  - manifest entries for skipped/deferred components
- Faster component implementation:
  - replace hot loops with array-backed operations where useful
  - preserve the current `BinaryMask` API as a testable interface
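The normalization deliverables above can be sketched as two small pieces. The first is an alpha policy that ignores fully transparent pixels and composites partial alpha onto a background. The second is a tolerant nearest-palette lookup that caches repeated source colors. Both functions and the Euclidean RGB distance are illustrative assumptions, not the repository's implementation.

```python
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def flatten_alpha(pixel: RGBA, background: RGB) -> Optional[RGB]:
    """Composite a pixel over the background; fully transparent pixels return None."""
    r, g, b, a = pixel
    if a == 0:
        return None  # ignore policy: transparent pixels never become candidates
    if a == 255:
        return (r, g, b)
    alpha = a / 255.0
    fr, fg, fb = (
        round(c * alpha + bg * (1.0 - alpha)) for c, bg in zip((r, g, b), background)
    )
    return (fr, fg, fb)


class PaletteMapper:
    """Nearest-palette lookup within a tolerance, cached per source color (hypothetical)."""

    def __init__(self, palette: List[RGB], tolerance: float) -> None:
        self.palette = palette
        self.tolerance_sq = tolerance * tolerance
        self._cache: Dict[RGB, Optional[int]] = {}

    def index_of(self, color: RGB) -> Optional[int]:
        if color in self._cache:
            return self._cache[color]  # repeated colors skip the palette scan
        best: Optional[int] = None
        best_dist = self.tolerance_sq
        for i, entry in enumerate(self.palette):
            # Squared Euclidean RGB distance (an assumed metric).
            dist = sum((c - p) ** 2 for c, p in zip(color, entry))
            if dist <= self.tolerance_sq and (best is None or dist < best_dist):
                best, best_dist = i, dist
        self._cache[color] = best
        return best
```

Caching pays off on real images because antialiased edges repeat the same handful of intermediate colors many times.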

Acceptance criteria:

- `terminaro-tweaked.png` completes a CLI run without hanging.
- Manifest reports major palette groups and skipped/deferred regions clearly.
- Large background/transparent areas do not become vector candidates.
- Tests include at least one antialiased and one transparent-background fixture.
- `morphea curated-check docs/real-images/suite.json --run` completes the
  current curated real-image suite with no failed expectations under an
  external timeout.

Implemented so far:

- `--max-size` analysis resize with anchor coordinate scaling.
- `--max-colors` palette quantization.
- `--max-component-area` color-mask/component deferral diagnostics.
- `--timeout-seconds` internal partial-run cutoff.
- manifest `diagnostics` entries.
- bounded `terminaro-tweaked.png` run completes under an external timeout.
- transparent-background regression fixture.
- transparent pixels are ignored with diagnostics, and partial-alpha pixels are
  flattened onto the inferred or configured background before color grouping.
- explicit `background` preprocessing is available through vectorize/profile
  configs, sweep run configs, curated recommended configs, and flat-color
  segment configs.
- large color masks emit `color_mask_split_for_components` and are split into
  connected components before oversized components are deferred.
- image component scanning checks `timeout_seconds` during traversal and avoids
  retaining pixel sets for oversized deferred components.
- `connected_components` and the bounded image component scanner use a
  bytearray-backed occupancy grid during BFS while preserving the public
  `BinaryMask` / `MaskComponent` API.
- component BFS scans neighbors inline instead of allocating per-pixel
  neighbor tuples in the hot loop.
- flat-color mask extraction scans image buffers sequentially via Pillow pixel
  data instead of calling `getpixel` for every pixel.
- flat-color mask extraction stores linear pixel indexes during palette
  grouping and materializes `(x, y)` coordinates only for retained masks.
- exact-color mask extraction bypasses palette membership scans, while
  tolerant grouping caches repeated source colors to avoid repeated nearest
  palette searches.
- mask components can carry bounds hints from component scanning, and
  `row_spans` computes per-row extents in one pass instead of rescanning the
  component once per row.
- `morphea profile input.png -o profile.json` records bounded vectorize timings,
  anchor counts, diagnostics, diagnostic stage counts, and min/mean/max elapsed
  summaries for repeated runs.
- `morphea profile-curated suite.json -o profile.json` profiles every available
  curated real-image case with its recommended config, keeps missing sources
  visible, and reports the slowest case plus per-case min/mean/max timings.
- component BFS in both generic masks and bounded image scanning uses direct
  8-neighbor index checks instead of nested per-pixel neighbor range loops.
- boundary-pixel detection and centroid calculation avoid repeated generator
  passes and per-pixel neighbor tuple allocation.
- raster edge metrics use compact integer luma buffers instead of float lists
  during preview/refinement comparisons.
- mask components cache derived centroid, boundary-pixel, and row-span
  geometry so primitive candidate generation does not rescan the same pixels
  for each simple-shape hypothesis.
- connected-component BFS now fills bounds, centroid, and row-span hints during
  the scan, including bounded image scans, so downstream primitive detection
  avoids repeated component passes.
- connected-component BFS now also fills boundary-pixel hints, including
  bounded image scans, so circle/ring/stroke candidate generation can reuse
  scanner-derived boundaries instead of rescanning retained components.
- freeform cut-out gap detection uses a local interior-gap component scanner
  for temporary background gaps, avoiding the heavier generic
  `connected_components` hint path during real-image profiling.
- principal-axis fitting computes projection bounds in one streaming pass
  instead of allocating per-component projection lists.
- temporary interior-gap scans avoid redundant seed-list allocation, and
  enclosed-gap bound checks use component bounds instead of rescanning every
  gap pixel.
- generic and bounded component scanners update bounds and row spans with
  direct comparisons instead of per-pixel `min()`/`max()` calls.
- the temporary interior-gap scanner inlines 8-neighbor enqueueing so
  freeform cut-out detection avoids tens of thousands of per-pixel helper
  calls on curated real images.
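Several of the items above share one scanner idea: a bytearray occupancy grid, direct 8-neighbor index checks, and bounds and row spans filled during the BFS with plain comparisons. A minimal sketch of that idea follows. It assumes a zero-padded copy of the grid so that the eight neighbors become constant index offsets with no per-pixel edge checks. `scan_components` and its dict output are hypothetical; the real `BinaryMask` / `MaskComponent` API is not shown in this document.

```python
from collections import deque
from typing import Dict, List, Tuple


def scan_components(width: int, height: int, occupied: bytearray) -> List[dict]:
    """Find 8-connected components in a row-major occupancy grid (hypothetical helper).

    Bounds and per-row spans are filled during the BFS, so later stages do not
    need another pass over each component's pixels.
    """
    pw = width + 2  # padded width; a zero border removes per-pixel edge checks
    grid = bytearray(pw * (height + 2))
    for y in range(height):
        dst = (y + 1) * pw + 1
        grid[dst:dst + width] = occupied[y * width:(y + 1) * width]
    # Constant index deltas for the 8 neighbors in the padded grid.
    offsets = (-pw - 1, -pw, -pw + 1, -1, 1, pw - 1, pw, pw + 1)
    components: List[dict] = []
    for start in range(len(grid)):
        if not grid[start]:
            continue
        grid[start] = 0  # clearing a cell marks it as visited
        sy, sx = divmod(start, pw)
        min_x = max_x = sx - 1
        min_y = max_y = sy - 1
        spans: Dict[int, List[int]] = {}
        pixels: List[Tuple[int, int]] = []
        queue = deque([start])
        while queue:
            idx = queue.popleft()
            py, px = divmod(idx, pw)
            x, y = px - 1, py - 1
            pixels.append((x, y))
            # Direct comparisons instead of per-pixel min()/max() calls.
            if x < min_x:
                min_x = x
            elif x > max_x:
                max_x = x
            if y < min_y:
                min_y = y
            elif y > max_y:
                max_y = y
            span = spans.get(y)
            if span is None:
                spans[y] = [x, x]
            elif x < span[0]:
                span[0] = x
            elif x > span[1]:
                span[1] = x
            for off in offsets:
                n = idx + off
                if grid[n]:
                    grid[n] = 0
                    queue.append(n)
        components.append({
            "pixels": pixels,
            "bounds": (min_x, min_y, max_x, max_y),
            "row_spans": {row: (a, b) for row, (a, b) in sorted(spans.items())},
        })
    return components
```

Padding costs one extra copy of the grid, but it lets the inner loop be a fixed offset walk. That is the kind of hot-loop simplification the profiling items above describe.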

Remaining:

- none for the current real-image runtime baseline; continue profile-guided
  hot-loop work from `profile-curated` reports as larger curated image
  families expose new bottlenecks.

## M2: Primitive Anchor Detection V2

Status: implemented for the current primitive-anchor baseline.

Purpose: make simple forms robust enough to be trusted before generic fitting.

Deliverables:

- Circle/ring detection:
  - robust center/radius estimation from boundary samples
  - roundness regularization
  - stroke-width stability metric
- Stroke detection:
  - polyline centerline extraction for curved strokes
  - width estimation along the path
  - cap/join classification
  - cut-out stroke detection for white/near-background gaps
- Quad and grid detection:
  - perspective tile grouping
  - row/column grouping
  - grid consistency metrics
  - vanishing-line diagnostics
- Candidate reservation:
  - anchors reserve pixels/regions before generic fitting
  - later fitting must not fragment reserved simple shapes

Acceptance criteria:

- Outer rings become `stroke_circle` or arc strokes.
- Dots become true circles.
- Straight and gently curved strokes export as stroked paths.
- Perspective table tiles export as quads and are grouped as a grid.
- Simple shape candidates beat noisier path candidates unless fidelity breaks
  materially.

Implemented so far:

- circle/ring metrics for roundness and stroke width.
- filled circle and ring candidates regularize center/radius from boundary
  samples with a deterministic algebraic fit and record
  `circle_fit_residual_error`.
- stroke width, smoothness, cut-out error metrics.
- quad edge/corner metrics and perspective grid consistency metric.
- quad detection adds numeric subtype markers for trapezoids and
  parallelograms while preserving `quad` as the editable primitive kind.
- stroke payloads preserve `cap_style` and `join_style`.
- straight high-coverage stroke components classify as `butt` caps.
- stroke-polyline candidates keep straight strokes as two-point editable
  centerlines, but add a conservative control point when the component visibly
  deviates from the principal-axis line.
- curved stroke and arc candidates record local `width_samples` at centerline
  support points instead of collapsing every editable stroke to one global
  width.
- sparse curved stroke components can classify as editable `arc` anchors with a
  three-point centerline when their bow is large enough to beat a straight
  stroke interpretation.
- diagonal and freeform thin interior gaps can be detected as editable cut-out
  overlay strokes when they are enclosed by the host shape.
- anti-aliased neutral UI rings can be recovered from a composite grayscale
  mask as `circle`/`stroke_circle` anchors when individual gray palette
  fragments are too small to pass per-color component thresholds.
- compact filled axis-aligned rectangles classify as `rect` and stay in the
  `filled_primitives` scene layer.
- simple rounded-rectangle silhouettes classify as `rounded_rect`; descriptive
  metrics such as `corner_radius` are excluded from candidate error scoring.
- anchors with a shared `parallel_group_id` are exposed as
  `parallel_stroke_group` scene groups with `parallel_spacing_error`.
- perspective-grid scene groups expose row/column counts and
  `vanishing_line_diagnostics` derived from quad edge pairs.
- reserved simple-shape anchors are exposed as a
  `primitive_anchor_reservation` scene group with reserved bounds area.
- segment proposals mark simple parametric anchors as reserved with a
  `simple_shape_anchor` reason and reserved bounds before later fitting stages
  can fragment them.
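The circle regularization item describes a deterministic algebraic fit over boundary samples with a recorded residual. One common choice is a Kåsa-style least-squares fit of `x² + y² + Dx + Ey + F = 0`, shown below on mean-centered samples. Whether Morphēa uses exactly this formulation is an assumption, and the RMS radial residual is only one plausible definition of `circle_fit_residual_error`.

```python
import math
from typing import List, Tuple


def fit_circle(points: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Kåsa algebraic circle fit; returns (cx, cy, radius, rms_radial_residual)."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three boundary samples")
    # Center the samples so the normal equations decouple F from D and E.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    suu = suv = svv = suz = svz = sz = 0.0
    for x, y in points:
        u, v = x - mx, y - my
        z = u * u + v * v
        suu += u * u
        suv += u * v
        svv += v * v
        suz += u * z
        svz += v * z
        sz += z
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("samples are collinear; no unique circle")
    # Solve [suu suv; suv svv] [D; E] = -[suz; svz] by Cramer's rule.
    d = (-suz * svv + suv * svz) / det
    e = (-suu * svz + suz * suv) / det
    f = -sz / n
    cu, cv = -d / 2.0, -e / 2.0
    radius = math.sqrt(cu * cu + cv * cv - f)  # f <= 0, so the argument is >= 0
    cx, cy = mx + cu, my + cv
    residual = math.sqrt(
        sum((math.hypot(x - cx, y - cy) - radius) ** 2 for x, y in points) / n
    )
    return cx, cy, radius, residual
```

The algebraic fit is closed-form and deterministic, which suits reproducible manifests. A geometric (orthogonal-distance) fit would be more accurate on short arcs but needs iteration.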

## M3: Scene Graph and Layer Semantics

Status: implemented for the current scene-graph baseline.

Purpose: move from a list of anchors to a coherent editable vector scene.

Deliverables:

- Canonical scene graph:
  - layers
  - groups
  - reserved anchors
  - source masks
  - confidence and provenance
- Stacking and cut-out semantics:
  - v1 white overlay strokes remain supported
  - add mask/negative-stroke option for cases where overlay strokes are wrong
- Shape merging:
  - merge same-color fragments when they form one semantic object
  - avoid Vectorizer.ai-style layer explosion when a simpler object is better
- Export policy:
  - plain editable SVG by default
  - optional debug SVG with source ids and confidence labels

Acceptance criteria:

- Manifest can explain why each output element exists.
- Same-color fragmentation is penalized and visible in reports.
- Cut-outs remain editable and do not silently become unstructured holes.

Implemented so far:

- anchor manifests include stable ids, layer names, confidence, reservation
  bounds, provenance, and export policy metadata.
- anchor manifests include stable `source_mask` proxies derived from reserved
  bounds so run artifacts and review workflows can refer back to mask sources.
- scene manifests include a top-level `layers` section with anchor indexes and
  counts per semantic layer.
- simple anchors are marked as reserved by `simple_shape_anchor`.
- cut-out strokes are assigned to a `cutout_overlays` layer.
- cut-out export policy records the default `overlay_stroke` strategy and
  whether the anchor is eligible for negative-mask SVG export.
- promotion SVG exports wrap each emitted shape in a stable metadata node with
  anchor id, anchor index, promotion state, source promotion region ids, and
  applied review-decision metadata when present.
- curated promotion sidecars use the same stable SVG metadata nodes for
  `promoted.svg` and `fallback.svg`, while preserving the configured cut-out
  export strategy.
- `morphea promotion-export --markdown promotion-export.md` writes a
  scan-friendly export report with promoted/fallback/rejected/deferred anchor
  and region counts plus missing-from-promoted rows with region reasons.
- `morphea vectorize --cutout-export negative_mask` writes editable cut-out
  strokes into an SVG mask instead of painting visible white overlay strokes;
  run directories apply the same export option to `output.svg`.
- vectorize config files can set `cutout_export`, with an explicit CLI flag
  taking precedence.
- scene metrics expose reserved simple-shape count, reserved bounds area, and
  reserved area ratio so later fitting can be audited against primitive
  reservations.
- scene metrics include `cutout_overlay_count` and
  `negative_mask_candidate_count` so reports can distinguish overlay exports
  from mask-capable cut-out semantics.
- `morphea vectorize --debug-svg` writes an inspectable SVG with anchor ids,
  bounds, and confidence labels.
- vectorize run directories include `debug.svg`.
- reports, eval summaries, and sweep summaries include layer counts.
- scene manifests include `parallel_stroke_group` entries for grouped strokes.
- scene manifests include `same_color_fragment_group` entries that identify
  same-color merge candidates instead of leaving fragmentation as only a scalar
  penalty.
- same-color fragment groups include a structured `merge_plan` with combined
  bounds, per-fragment bounds, bounds fill ratio, and a conservative merge or
  review action.
- same-color merge plans include `auto_merge_allowed` and `decision_reason` so
  automatic merges stay auditable instead of becoming opaque cleanup behavior.
- compact, same-color, axis-aligned `rect` fragments can be conservatively
  merged into one editable `rect` when their combined bounds contain no gap;
  gapped same-color fragments stay separate to preserve cut-out semantics.
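The conservative rect merge rule can be sketched as a coverage check. Fragments merge only when they fill their combined bounds completely, so a gap (a likely cut-out) blocks the merge. The half-open integer bounds and the output keys are assumptions that loosely mirror the `merge_plan`, `auto_merge_allowed`, and `decision_reason` fields named above. `plan_rect_merge` is hypothetical.

```python
from typing import List, Set, Tuple

Rect = Tuple[int, int, int, int]  # half-open (x0, y0, x1, y1) pixel bounds


def plan_rect_merge(rects: List[Rect]) -> dict:
    """Decide whether same-color axis-aligned rect fragments can become one rect."""
    if not rects:
        raise ValueError("need at least one fragment")
    bounds = (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )
    area = (bounds[2] - bounds[0]) * (bounds[3] - bounds[1])
    # Count distinct covered pixels so overlapping fragments are not double-counted.
    covered: Set[Tuple[int, int]] = set()
    for x0, y0, x1, y1 in rects:
        for y in range(y0, y1):
            for x in range(x0, x1):
                covered.add((x, y))
    fill_ratio = len(covered) / area if area else 0.0
    allowed = fill_ratio == 1.0
    return {
        "combined_bounds": bounds,
        "bounds_fill_ratio": fill_ratio,
        "auto_merge_allowed": allowed,
        "decision_reason": "no_gap_in_combined_bounds" if allowed else "gap_preserves_cutout",
    }
```

Two abutting `4x2` fragments merge into one `8x2` rect. A one-column gap between them drops the fill ratio below `1.0` and keeps them separate.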

## M4: Reports, Metrics, and Experiment Runs

Status: implemented for the current experiment-report baseline.

Purpose: make every iteration measurable.

Deliverables:

- Timestamped run directories:
  - input copy
  - effective config
  - palette and masks
  - anchors
  - final scene JSON
  - SVG
  - rasterized preview
  - metrics
  - HTML/Markdown report
- Metrics:
  - editability score
  - node/shape/parameter counts
  - simple-shape priority bonus
  - fragmentation penalty
  - circle roundness
  - line smoothness
  - stroke width variance
  - parallel spacing
  - quad/grid consistency
  - raster fidelity diagnostics
- Config sweeps:
  - compare preprocessing thresholds
  - compare anchor candidate thresholds
  - rank outputs by semantic-first score
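Semantic-first ranking means a sweep output with better editable structure outranks one with lower raster error. Raster error only breaks ties. The sketch below assumes one scalar semantic score per run (for example an editability score) and one raster error. `SweepRun` and `rank_runs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SweepRun:
    name: str
    semantic_score: float  # higher is better (e.g. an editability score)
    raster_error: float    # lower is better (e.g. raster L1 error)


def rank_runs(runs: List[SweepRun]) -> List[dict]:
    """Order runs semantic-first; raster error only breaks semantic ties."""
    ordered = sorted(runs, key=lambda r: (-r.semantic_score, r.raster_error, r.name))
    return [{"semantic_rank": i + 1, "run": r.name} for i, r in enumerate(ordered)]
```

With exact float comparison, raster error rarely breaks ties. A real ranking might bucket or round semantic scores first so that near-equal structures compare on fidelity.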

Acceptance criteria:

- A run can be inspected without rerunning the pipeline.
- Reports show where failures originate: palette, segmentation, fitting,
  cleanup, scoring, or export.

Implemented so far:

- timestamped vectorize run directories via `--run-dir`
- input copy
- `output.svg`
- `preview.png`
- `manifest.json`
- `config.json`
- `anchors.json`
- `palette.json`
- `mask-summary.json`
- `report.md`
- `report.html`
- report summaries for anchor types and diagnostics
- report summaries group diagnostics by pipeline stage so failures can be
  attributed to preprocessing, palette, segmentation, fitting, cleanup,
  scoring, export, runtime, or unknown sources.
- report summaries for scene groups, including same-color fragment groups
- report summaries include same-color merge actions and decision reasons when
  present.
- `morphea eval` JSON/Markdown summaries over run directories
- scene-level `metrics` in manifests
- `editability_score`
- `editability_components`, with formula-level score components exposed for
  reports, snapshots, and sweep summaries
- `editability_v10_components`, with review-level scores for shape identity,
  parameter/node economy, stroke stability, smoothness, topology, grouping,
  fragmentation, raster fidelity, provenance, and classifier-prior agreement
- `editability_review`, with accepted/manual-review/rejected decisions derived
  from promotion state, v10 component thresholds, gate-blocked components, and
  explicit regression-delta status
- `curated-check --baseline-snapshot`, which compares editability-review
  component scores against a previous curated snapshot
- `fragmentation_penalty`
- `unstructured_fragmentation_penalty`
- run-directory raster fidelity metrics: `raster_l1_error`,
  `raster_alpha_error`, `raster_edge_error`, and `raster_size_match`
- node, parameter, simple-shape, generic-path, cut-out, color-fragment, and
  unstructured-fragment counts
- aggregate anchor-quality summaries expose mean/max quality error and
  per-metric counts/means/maxima for primitive fit metrics such as circle
  roundness, line smoothness, stroke-width variance, and quad/grid errors.
- anchor manifests include `simple_shape_priority_bonus` and
  `semantic_anchor_score`, making simple-form preference visible in reports,
  reviews, and pseudo-label harvesting.
- scene metrics aggregate anchor scoring into `anchor_scoring_summary`, so
  runs expose total/mean simple-shape priority and semantic score envelopes.
- scene metrics expose `editability_components`, so score changes can be
  inspected as component deltas instead of opaque aggregate movement.
- scene metrics expose `editability_v10_components`, so RIP4 can grow toward
  the v10 contract without turning any single score into a promotion bypass.
- curated promotion gates cap matching v10 components with `gate_blocked`,
  `failed_gates`, and `uncapped_score`, so red topology, shape-class, grouping,
  visual-fidelity, provenance, or fragmentation gates cannot be averaged away.
- curated Markdown reports render failed promotion-gate details with case id,
  gate id, type, severity, and reason, so the main acceptance report explains
  why red/yellow cases are deferred or rejected.
- text-like fallback grouping only treats small glyph-sized cubic paths as
  structured text evidence; larger same-color organic fallback paths remain in
  unstructured fallback and fragmentation debt.
- curated reports derive `editability_review`, so accepted-output status is
  tied to component thresholds instead of raster fidelity alone.
- editability review can record `regression_deltas` and
  `regressed_components`, so accepted outputs can be downgraded when component
  quality regresses against a supplied baseline snapshot.
- checked promotion runs write `editability-review.md`, so accepted-output
  decisions, threshold failures, gate-blocked components, issue tags, and
  regression deltas are reviewable beside the promotion export artifacts.
- checked promotion runs write `review-decision.json`, so reviewers get a
  pending machine-readable decision record with suggested
  accepted/corrected/rejected/deferred outcome, issue tags, failed gates,
  component failures, regression evidence, and `review_artifacts` links back
  to the manifest, promotion-region JSON, promotion review, and editability
  review.
- checked promotion runs write
  `review-templates/{accepted,corrected,rejected,deferred}.json`, so reviewers
  can start from terminal decision templates that preserve the same evidence
  and indicate that reviewer/reason evidence is required for every terminal
  decision, with correction notes and artifacts required for `corrected`.
- suite-level `review-packet.json` / `review-packet.md` carry the same
  `review_requirements`, so reviewers can see terminal and corrected decision
  requirements without opening individual templates first.
- suite-level review-packet cases also carry failed-gate details with gate id,
  type, severity, and reason, and `review-packet.md` renders the same details
  per case so reviewers can see why a case is queued without opening raw JSON.
- queued review-packet cases carry per-decision `review_commands`, so
  reviewers can apply an edited terminal template without hand-assembling the
  manifest, applied-review JSON, or applied-review Markdown paths.
- pending, terminal, and applied promotion review records carry a
  `quality_label_policy` block with `mode: sidecar_only`, making
  `current_quality_label` a manual suite-metadata field rather than something
  accepted/corrected applied reviews update implicitly.
- curated output roots write `review-gallery.html`, a local static review
  gallery with contact sheets, quality labels, promotion/editability decisions,
  failed gates/components, failed-gate reasons, artifact links, terminal
  decision-template links, and per-decision apply commands for queued cases.
- `morphea promotion-apply-review` validates edited promotion review decisions,
  rejects pending records, requires reviewer/reason evidence, requires
  correction notes and corrected artifacts for `corrected` records, writes
  applied JSON/Markdown summaries with preserved `review_artifacts`, and can
  persist `review_decision_applied` into a run manifest. It can also apply
  generated terminal templates with explicit CLI evidence overrides such as
  `--reviewer`, `--reason`, `--correction-notes`, `--corrected-artifact`, and
  `--reviewed-region`. Accepted/corrected reviewed regions are validated
  against gate-ok manifest regions and only those selected anchors become
  review-promoted.
- `morphea harvest --require-applied-review` gates pseudo-label harvesting on
  applied promotion review decisions, so only `accepted` and `corrected`
  applied decisions become candidates while missing, invalid, rejected, and
  deferred decisions remain visible in `rejected_runs`. Promotion-annotated
  manifests also require at least one trusted `promotion_state: promoted`
  anchor, and only promoted anchors from that run can become trainable
  pseudo-labels.
- `morphea harvest-curated --require-applied-review` preserves existing applied
  review decisions across fresh curated reruns, restores them into regenerated
  manifests and curated JSON reports, and harvests only accepted/corrected
  applied decisions.
- `morphea review --accept-applied-reviews` maps harvested applied promotion
  reviews into the existing review/apply-review flow, accepting
  accepted/corrected decisions, rejecting rejected/deferred decisions, and
  preserving issue tags for reviewed-label artifacts.
- `morphea merge-labels` preserves `review` and `review_decision_applied`
  provenance in accepted pseudo-label manifests and carries review item id,
  review reason, applied review decision/case/source path, and issue tags in
  dataset samples while keeping rejected/deferred review items out of trainable
  datasets. The current region-scoped review plan is covered through this path:
  accepted reviewed region anchors become train examples, while deferred
  real-image evidence remains excluded.
- `morphea self-learn` separates retraining from acceptance: model acceptance
  now requires an accepted training comparison gate and, when configured,
  passing curated validation, with reviewed-label issue counts and
  applied-review decision counts in the cycle summary.
- self-learning Markdown reports include reviewed-label issue counts and
  provenance-field coverage for review item ids, review reasons, applied case
  ids, and source review-decision paths.
- training comparisons expose per-label validation accuracy deltas, and those
  label-level deltas feed the best/worst gate summary so primitive-family
  regressions can block acceptance.
- self-learning cycle summaries expose normalized suite-family validation
  across primitive label deltas, curated real-image family summaries, and
  optional Lucide family summaries; configured Lucide validation blocks
  acceptance on failure.
- `morphea self-learn --suite-family-baseline baseline.json` compares current
  suite-family validation with a fixed baseline and blocks acceptance only for
  newly introduced bad family outcomes while reporting carried
  `known_debt`.
- curated/Lucide suite failures that are covered by the baseline remain visible
  in `acceptance_gate.reasons` but are no longer treated as
  `blocking_reasons`.
- `morphea self-learn --suite-family-baseline-output next-baseline.json` writes
  accepted suite-family validation as a reusable baseline artifact and skips
  writes for rejected cycles.
- suite-family baseline snapshots require reviewer, reason, and changelog
  evidence before writing, so accepted baseline refreshes produce a review
  record and JSONL changelog entry.
- self-learning Markdown keeps baseline snapshots portable while surfacing the
  source cycle report, base dataset, reviewed-label file, and validation
  dataset beside the snapshot status for audit handoff.
- existing suite-family baseline output files are protected unless
  `--suite-family-baseline` points to the same path, preventing accidental
  overwrites of checked-in baseline artifacts.
- checked-in reviewed suite-family baseline at
  `docs/real-images/baselines/current-suite-family-baseline.json`, exercised
  through the real `morphea self-learn --suite-family-baseline` CLI path.
- metrics surfaced in reports, eval summaries, and sweep summaries
- diagnostic stage counts surfaced in reports, eval summaries, and sweep
  summaries for cross-run failure attribution.
- deterministic manifest preview renderer for current primitive types
- `morphea report` can render Markdown or HTML from an existing manifest
- sweep summaries include `semantic_rank` and a top-level `ranking` list using
  semantic-first score ordering before raster error.
- sweep run configs can pass `cutout_export` through to run-directory SVG
  export, so overlay and negative-mask exports can be compared in experiments.
- optional Markdown comparison reports for config sweeps
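Gate capping is what keeps a red gate from being averaged away. The component score is capped, the original stays visible as `uncapped_score`, and the review decision sees `gate_blocked`. The sketch below follows the field names listed above. The cap value, the threshold, and the mapping from blocked gates to `rejected` are hypothetical policy choices, not the documented behavior.

```python
from typing import Dict, List


def apply_gate_caps(
    scores: Dict[str, float], failed_gates: Dict[str, List[str]], cap: float = 0.0
) -> Dict[str, dict]:
    """Cap component scores whose gates failed, keeping the uncapped value for audit."""
    result = {}
    for name, score in scores.items():
        gates = failed_gates.get(name, [])
        result[name] = {
            "score": min(score, cap) if gates else score,
            "uncapped_score": score,
            "gate_blocked": bool(gates),
            "failed_gates": list(gates),
        }
    return result


def review_decision(components: Dict[str, dict], threshold: float = 0.7) -> str:
    # Hypothetical policy: any blocked gate rejects; otherwise every component must pass.
    if any(c["gate_blocked"] for c in components.values()):
        return "rejected"
    if all(c["score"] >= threshold for c in components.values()):
        return "accepted"
    return "manual_review"
```

A run whose topology gate fails stays rejected even if its other components are high. Because the capped component is zero, a mean over components can no longer hide the failure.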

## M5: Synthetic Dataset Generator

Status: implemented for the current synthetic-data baseline.

Purpose: create reliable labels for training and evaluation.

Scope:

- Flat-color only at first.
- No noise, blur, gradients, or photoreal texture in the first generator.
- Include overlaid shapes and cut-out-like white strokes.

Deliverables:

- Synthetic scene generator for:
  - circles, rings, dots
  - lines, arcs, curved strokes
  - rects, rounded rects
  - quads, trapezoids, parallelograms
  - grid/tile structures
  - simple logo-like compositions
- Ground-truth scene JSON.
- Rasterized PNG fixtures.
- Train/validation/test splits.
- Difficulty tiers.

Acceptance criteria:

- Generator can create thousands of labeled flat-color examples.
- The deterministic pipeline can be benchmarked against known ground truth.
- Failures are reproducible from seed/config.

Implemented so far:

- deterministic `morphea generate`
- labeled PNG + JSON manifest pairs
- `dataset.json` index
- `dataset.json` records aggregate and per-sample anchor-kind counts so
  generated training corpora can be audited without reopening every manifest
- deterministic `train` / `val` / `test` split folders
- core primitive ground truth for circles, point dots, circle strokes, line
  strokes, curved strokes, arc strokes, rects, rounded rects, quads, and
  perspective tile grids
- synthetic quad ground truth includes numeric subtype markers for trapezoids
  and parallelograms while preserving `quad` as the editable primitive kind
- cut-out-like white overlay strokes with editable stroke metadata
- preview/SVG coverage for generated `arc`, `rect`, and `rounded_rect`
  manifests
- `basic` and `dense` difficulty tiers; `dense` adds labeled parallel stroke
  groups while preserving deterministic seed behavior
- `logo` difficulty tier adds simple logo-like compositions with a ring mark,
  accent dot, diagonal stroke, and rounded wordmark bar
- `grid` difficulty tier adds a larger labeled perspective tile family with
  row/column metadata for table-like quad-grid training cases
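The acceptance criterion "failures are reproducible from seed/config" suggests two properties: each sample draws from its own seeded RNG, and split assignment depends only on a stable hash of the seed and sample id. A minimal sketch under those assumptions follows. `assign_split` and `generate_circle_sample` are hypothetical and do not describe how `morphea generate` is implemented.

```python
import hashlib
import random
from typing import Dict


def assign_split(sample_id: str, seed: int, val_fraction: float = 0.1, test_fraction: float = 0.1) -> str:
    """Stable split from a hash of (seed, sample id); reruns never reshuffle samples."""
    digest = hashlib.sha256(f"{seed}:{sample_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2 ** 48  # uniform in [0, 1)
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def generate_circle_sample(seed: int, index: int, size: int = 64) -> Dict[str, object]:
    """Ground truth for one flat-color circle; the same (seed, index) gives the same sample."""
    # A string seed is hashed deterministically, so runs do not depend on PYTHONHASHSEED.
    rng = random.Random(f"{seed}:{index}")
    radius = rng.randint(4, size // 4)
    cx = rng.randint(radius, size - radius - 1)  # keep the circle inside the canvas
    cy = rng.randint(radius, size - radius - 1)
    sample_id = f"circle-{seed}-{index:05d}"
    return {
        "id": sample_id,
        "split": assign_split(sample_id, seed),
        "anchors": [{"kind": "circle", "cx": cx, "cy": cy, "radius": radius}],
    }
```

Per-sample RNGs mean that regenerating sample 3 alone reproduces it exactly, without replaying samples 0 to 2.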

## M6: Local MLX Segmentation Layer

Status: implemented for the current local-segmentation baseline.

Purpose: add local AI as a proposal layer, not as the final source of truth.

Deliverables:

- MLX environment setup via `uv`.
- MLX SAM integration behind a segmenter interface.
- Classical segmenter remains available as a baseline.
- Segment proposal manifest:
  - mask id
  - confidence
  - source model
  - bounding box
  - downstream accepted/rejected status
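The segmenter interface and proposal manifest above can be pictured as a structural protocol plus a frozen record, with a geometry gate as the only path to `accepted`. This is an illustrative sketch only. The field names, the toy `ColorBoxSegmenter`, and `apply_geometry_gate` are hypothetical stand-ins for the repository's `Segmenter`, `SegmentProposal`, and `FlatColorSegmenter`, whose real signatures are not shown here.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Protocol, Tuple

RGB = Tuple[int, int, int]
BBox = Tuple[int, int, int, int]  # half-open (x0, y0, x1, y1) in source-image pixels


@dataclass(frozen=True)
class SegmentProposal:
    mask_id: str
    confidence: float
    source_model: str
    bbox: BBox
    downstream_status: str = "pending"  # pending | accepted | rejected | deferred
    rejection_reason: Optional[str] = None


class Segmenter(Protocol):
    name: str

    def propose(self, width: int, pixels: List[RGB]) -> List[SegmentProposal]: ...


class ColorBoxSegmenter:
    """Toy baseline: one proposal per distinct flat color, using its bounding box."""

    name = "toy-color-box"

    def propose(self, width: int, pixels: List[RGB]) -> List[SegmentProposal]:
        boxes: Dict[RGB, List[int]] = {}
        for i, color in enumerate(pixels):
            y, x = divmod(i, width)
            box = boxes.setdefault(color, [x, y, x + 1, y + 1])
            box[0], box[1] = min(box[0], x), min(box[1], y)
            box[2], box[3] = max(box[2], x + 1), max(box[3], y + 1)
        return [
            SegmentProposal(f"{self.name}-{k}", 1.0, self.name, (b[0], b[1], b[2], b[3]))
            for k, (_, b) in enumerate(sorted(boxes.items()))
        ]


def apply_geometry_gate(
    proposal: SegmentProposal, anchor_quality_error: Optional[float], max_error: float = 0.1
) -> SegmentProposal:
    """Proposals become accepted only through geometry scoring, never directly."""
    if anchor_quality_error is None:
        return replace(proposal, downstream_status="rejected", rejection_reason="no_primitive_fit")
    if anchor_quality_error > max_error:
        return replace(
            proposal,
            downstream_status="rejected",
            rejection_reason="anchor_quality_error_above_threshold",
        )
    return replace(proposal, downstream_status="accepted", rejection_reason=None)
```

Because `Segmenter` is a `Protocol`, an MLX/SAM adapter only needs a matching `name` and `propose` to be swappable with the classical baseline, and neither one can mark its own proposals as accepted.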

Acceptance criteria:

- MLX SAM can propose regions locally for selected test images.
- Pipeline can run with or without MLX and compare outcomes.
- AI proposals never bypass geometry scoring and editability metrics.
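Comparing classical and MLX/SAM outcomes needs a way to pair proposals whose ids come from different id spaces. Greedy matching by bounding-box IoU, which this milestone uses for spatial matches, can be sketched as below. The `{proposal_id: bbox}` input shape and the `min_iou` default are assumptions.

```python
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # half-open (x0, y0, x1, y1)


def bbox_iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_match(
    before: Dict[str, BBox], after: Dict[str, BBox], min_iou: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Pair proposals best-IoU-first; each proposal is used at most once."""
    pairs = [
        (bbox_iou(bb, ab), bid, aid)
        for bid, bb in before.items()
        for aid, ab in after.items()
    ]
    # Highest IoU first; ids break ties so the result is deterministic.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    used_before, used_after = set(), set()
    matches = []
    for iou, bid, aid in pairs:
        if iou < min_iou or bid in used_before or aid in used_after:
            continue
        used_before.add(bid)
        used_after.add(aid)
        matches.append((bid, aid, iou))
    return matches
```

Greedy matching is simple and deterministic, but it is not globally optimal. An assignment solver such as the Hungarian algorithm would maximize total IoU when overlaps are ambiguous.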

Implemented so far:

- `Segmenter` protocol
- `SegmentProposal` metadata schema
- `FlatColorSegmenter` baseline
- `MlxSamSegmenter` adapter placeholder with explicit not-configured error
- manifest-ready proposal serialization
- `morphea segment input.png -o proposals.json` writes segment proposal manifests
  from the flat-color baseline
- segment proposal manifests include backend availability/status metadata so
  flat-color and future MLX runs can be compared explicitly.
- flat-color segment proposals split connected components by default and can
  mark oversized components as `deferred` via `max_component_area`
- flat-color segment proposal bounds are scaled back to source-image
  coordinates after `max_size` analysis resizing, keeping side-by-side
  comparisons with MLX/SAM masks in the same coordinate space.
- segment proposals include `downstream_status` and `rejection_reason` so
  geometry/review stages can distinguish pending proposals from rejected ones.
- pending flat-color segment proposals include primitive `anchor_kind`,
  `anchor_metrics`, and `anchor_parameter_count` summaries from the geometry
  scorer, while deferred oversized proposals remain rejected without pretending
  to be accepted anchors.
- segment proposal manifests include aggregate proposal status, downstream
  status, anchor-kind, and reserved-anchor counts for quick scan/review.
- segment proposal manifests include `proposal_tile_grid` groups for regular
  2D arrangements of reserved `rect`/`quad` proposals, including row/column
  counts, occupancy, spacing errors, and proposal ids in grid order.
- `morphea segment --markdown proposals.md` renders a scan-friendly proposal
  report with backend status, aggregate counts, anchor kinds, and reservation
  reasons.
- optional segment geometry gating can turn pending proposals into
  `accepted` or `rejected` downstream decisions using primitive anchor quality
  error and reservation requirements.
- segment proposal manifests record `anchor_quality_error` and
  `downstream_decision_reason` so later review/training stages can explain why
  a proposal passed or failed the geometry gate.
- `morphea compare-segments before.json after.json -o comparison.json` compares
  segment proposal manifests from different configs or segmenter backends,
  including per-source proposal/downstream status deltas, summary-count deltas,
  config differences, added/removed proposals, changed downstream/anchor
  decisions, and added/removed/changed proposal groups such as
  `proposal_tile_grid`.
- segment comparison Markdown reports include Source Summaries and Source
  Deltas tables plus explicit Promotion Proxy Deltas, so classical and MLX/SAM
  proposal runs can be reviewed side by side without opening the raw JSON.
- segment comparisons include a source delta assessment that labels each
  side-by-side run as improved, mixed, noise, unchanged, or needing review,
  based on green-promotion, red-candidate, manual-review, and proposal-count
  deltas, and records whether that basis is true promotion-region state counts
  or downstream-status proxy counts.
- `compare-segments` CLI stdout now summarizes before/after sources, proposal
  counts, shared proposal count, source verdict, and green/red/manual-review
  deltas instead of only reporting shared proposal ids, keeping MLX/SAM
  side-by-side smoke runs understandable when sources use different id spaces.
- segment comparisons also report greedy spatial proposal matches by bbox IoU,
  so Flat-Color and MLX/SAM runs can show overlapping regions even when their
  proposal ids never intersect.
- segment comparisons summarize spatial matches with mean/min/max IoU and
  downstream/anchor transition counts, making prompt-strategy runs comparable
  without opening every raw match row.
- segment comparisons include `segment_comparison_audit`, a machine-readable
  RIP5 evidence block for classical-vs-MLX/SAM source pairs or explicit
  MLX/SAM prompt/runtime deltas, proposal provenance, source summaries,
  downstream geometry-gate visibility, promotion-proxy deltas, source assessment
  verdict/basis records, and spatial match evidence.
- segment configs accept future MLX runtime knobs for model path, score
  threshold, mask count, and runtime timeout while preserving the explicit
  not-configured failure path.
- MLX SAM status reporting distinguishes missing MLX package, missing model
  configuration, missing model file, and adapter-pending states without
  allowing AI proposals to bypass the geometry pipeline.
- MLX SAM status reports the adjacent `.safetensors.json` sidecar path and
  existence as non-blocking diagnostics, making quantized checkpoint setup
  inspectable without marking unquantized checkpoints unavailable.
- runtime status Markdown renders backend diagnostics such as MLX/SAM adapter,
  model path, model existence, sidecar path, and sidecar existence, so the
  default `morphea status` output exposes the same setup evidence as JSON.
- MLX SAM status includes per-capability diagnostics for the JSON proposal
  adapter and the optional live SAM model adapter.
- `MlxSamSegmenter` can consume local JSON proposal payloads through the same
  segment proposal schema, score threshold, mask limit, and downstream
  geometry gate; this gives M6 an operational adapter contract before live SAM
  weights are wired.
- JSON proposal payloads can carry either rectangular bounds/bboxes or
  mask-row payloads, so local adapter tests can exercise non-rectangular
  region proposals before the live SAM runtime is connected.
- when the optional `mlx-sam` package is available in a compatible Python
  environment and `mlx_model_path` points at a `.safetensors` checkpoint,
  `MlxSamSegmenter` can run bounded grid-point prompts and convert positive
  live SAM masks into the same proposal schema and geometry gate.
- MLX/SAM segment configs can choose `mlx_prompt_strategy=grid_points` or
  `flat_color_centers`; the guided strategy prompts SAM at centers of
  Flat-Color proposals that already pass the reserved-anchor geometry gate, so
  prompt experiments are repeatable without making Flat-Color the final source
  of truth.
- the first local live-SAM smoke used the quantized tiny SAM2.1 MLX checkpoint
  `sam2.1_hiera_tiny_image_segmenter_q8_trunk_mask_q4_memory.safetensors` on
  `assets/curated/terminaro-opaque-table-grid.png`. It produced 4 MLX/SAM
  proposals, all accepted by the geometry gate, which were then compared
  against the Flat-Color baseline with `compare-segments`. The comparison was
  intentionally recorded as a runtime proof rather than a quality claim: the
  green promotion proxy count dropped from 29 to 4 and the source assessment
  was `noise`.
- checked-in smoke configs under `docs/real-images/mlx-sam-smoke/` replay the
  same status, Flat-Color segment, MLX/SAM segment, and segment-comparison
  steps while keeping checkpoint weights and generated `/tmp` outputs out of
  git.
- MLX/SAM adapter masks now respect `max_component_area` before geometry
  gating, so oversized AI masks are deferred instead of being allowed to pass
  as coarse primitive anchors.
- `morphea segment --segmenter mlx_sam` exposes the explicit not-configured path
  until the local MLX/SAM runtime is installed and a checkpoint is configured.
- `morphea status` treats the `mlx_sam_package_available` adapter state as an
  available backend state, so a configured MLX/SAM package adapter is not
  reported as blocked merely because its status name is adapter-specific.
- `morphea status` now carries `next_action` hints through backend and
  capability rows, including concrete `uv` setup commands for MLX/MLX-SAM,
  checkpoint configuration, missing model paths, and JSON proposal payloads.
- runtime status Markdown now exposes MLX/SAM model configuration, thresholds,
  prompt settings, and MLX classifier core/autograd diagnostics in the Backend
  Diagnostics table so local runtime state can be audited from stdout.
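The greedy spatial matching and IoU summary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the `compare-segments` implementation: the `(x0, y0, x1, y1)` bbox tuples, the `min_iou` cutoff, and the names `bbox_iou`, `greedy_iou_matches`, and `summarize_matches` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical bbox layout: (x0, y0, x1, y1) in source-image pixels, x1 > x0, y1 > y0.
BBox = Tuple[float, float, float, float]
Match = Tuple[str, str, float]


def bbox_iou(a: BBox, b: BBox) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_iou_matches(
    before: Dict[str, BBox], after: Dict[str, BBox], min_iou: float = 0.1
) -> List[Match]:
    """One-to-one matches, highest IoU first; proposal ids are never compared."""
    pairs = [
        (bbox_iou(box_a, box_b), id_a, id_b)
        for id_a, box_a in before.items()
        for id_b, box_b in after.items()
    ]
    # Highest IoU first; ids only break ties so the result is deterministic.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    used_before, used_after = set(), set()
    matches: List[Match] = []
    for iou, id_a, id_b in pairs:
        if iou < min_iou:
            break  # every remaining pair overlaps even less
        if id_a in used_before or id_b in used_after:
            continue
        used_before.add(id_a)
        used_after.add(id_b)
        matches.append((id_a, id_b, iou))
    return matches


def summarize_matches(matches: List[Match]) -> dict:
    """Count plus mean/min/max IoU over matched pairs (zeros when nothing matched)."""
    ious = [m[2] for m in matches]
    if not ious:
        return {"count": 0, "mean_iou": 0.0, "min_iou": 0.0, "max_iou": 0.0}
    return {
        "count": len(ious),
        "mean_iou": sum(ious) / len(ious),
        "min_iou": min(ious),
        "max_iou": max(ious),
    }
```

Greedy matching is deterministic and cheap, but it does not maximize total IoU the way an optimal assignment (for example, the Hungarian algorithm) would; the difference tends to matter only when several proposals overlap the same region.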

## M7: Primitive Classifier Training

Status: implemented for the current classifier-training baseline.

Purpose: train the first local model that helps choose semantic primitive type.

Model target:

- Small from-scratch MLX Transformer.
- Input: mask/RGBA crop plus geometric features.
- Output: primitive/stroke class plus confidence.
- No direct geometry-parameter prediction in the first version.

Classes:

- `circle`
- `stroke_circle`
- `ellipse`
- `rect`
- `rounded_rect`
- `stroke_path`
- `stroke_polyline`
- `polygon`
- `quad`
- `arc`
- `star`
- `cubic_path`
- `unknown`

Deliverables:

- Training dataset from M5.
- Training command.
- Evaluation command.
- Confusion matrix.
- Integration into candidate ranking as a confidence prior.
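A confusion matrix over the class list above needs no ML dependency. The sketch below assumes rows are expected classes and columns are predictions, and that out-of-vocabulary labels are folded into `unknown`; both are illustrative choices rather than documented Morphēa behavior, and `confusion_matrix` and `accuracy` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def confusion_matrix(
    expected: Sequence[str], predicted: Sequence[str], classes: Sequence[str]
) -> List[List[int]]:
    """Rows are expected classes, columns are predicted classes.

    Labels outside `classes` are counted under "unknown", which must be listed.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if "unknown" not in classes:
        raise ValueError("classes must include 'unknown'")
    index: Dict[str, int] = {name: i for i, name in enumerate(classes)}
    unknown = index["unknown"]
    matrix = [[0] * len(classes) for _ in classes]
    for exp, pred in zip(expected, predicted):
        matrix[index.get(exp, unknown)][index.get(pred, unknown)] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Diagonal (correct) count divided by the total count."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```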

Acceptance criteria:

- Classifier improves candidate selection on synthetic validation data compared
  with heuristic-only ranking.
- Low-confidence predictions degrade safely to deterministic geometry.
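One way to read "degrade safely" is to apply the classifier as an additive prior only above a confidence threshold and otherwise rank by heuristic error alone. The sketch below assumes lower error is better, models the prior penalty as `1 - p(kind)` in the spirit of the `classifier_prior_error` manifest metric, and uses hypothetical `min_confidence` and `prior_weight` values; `Candidate` and `rank_candidates` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    kind: str
    heuristic_error: float  # lower is better (assumed convention)


def rank_candidates(
    candidates: List[Candidate],
    class_probs: Optional[Dict[str, float]],
    min_confidence: float = 0.6,
    prior_weight: float = 0.25,
) -> List[Candidate]:
    """Rank candidates, adding a classifier prior only when the classifier is confident.

    With no probabilities, or a top probability below `min_confidence`, ranking
    falls back to the deterministic heuristic error alone.
    """
    use_prior = bool(class_probs) and max(class_probs.values()) >= min_confidence

    def combined_error(candidate: Candidate) -> float:
        if not use_prior:
            return candidate.heuristic_error
        # Hypothetical prior error: 1 - p(kind); unseen kinds get the maximum penalty.
        prior_error = 1.0 - class_probs.get(candidate.kind, 0.0)
        return candidate.heuristic_error + prior_weight * prior_error

    # sorted() is stable, so ties keep the heuristic input order.
    return sorted(candidates, key=combined_error)
```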

Implemented so far:

- feature extraction from generated ground-truth manifests
- trainable centroid primitive-classifier baseline
- `morphea train dataset.json -o model.json`
- `morphea train-mlx dataset.json -o model.json` for the optional MLX
  Transformer classifier path
- train/val/test evaluation sections in model artifact
- confusion matrix output
- `morphea eval-classifier model.json dataset.json -o report.json` for
  standalone evaluation of an existing primitive classifier artifact
- `morphea eval-classifier --markdown report.md` for scan-friendly classifier
  evaluation summaries
- classifier artifacts and evaluation reports include centroid-spread
  `feature_importance`, making it visible when simple geometry and
  scene-group context signals separate primitive classes.
- optional `--classifier-model` prior during `morphea vectorize`
- `classifier_prior_error` metric in candidate manifests
- `morphea train` writes `ranking_evaluation` comparing heuristic-only candidate
  ranking with classifier-prior-assisted ranking on validation/test splits
- `morphea train-mlx --allow-unavailable` writes a deterministic artifact with
  MLX backend status plus centroid fallback weights, so the training pipeline
  remains runnable when MLX is not installed locally
- MLX classifier runtime status distinguishes missing MLX package from an
  available package that trains the
  `mlx_feature_raster_token_classifier` path, while keeping centroid fallback
  weights usable as the safe ranking prior.
- the available MLX training path no longer emits metadata-only hooks; it
  writes optimized feature-head, raster-token mixer, feature/raster fusion, and
  token-transformer components with auditable weights, normalization, and loss
  history.
- classifier feature extraction includes detected/generated
  `quad_subtype_code` values so trapezoid and parallelogram structure can
  influence candidate ranking without adding new top-level primitive classes.
- classifier feature extraction includes anchor `group_context` signals for
  perspective grids, parallel stroke groups, same-color fragment groups, and
  primitive reservations, so reviewed pseudo-labels can carry scene structure
  into centroid and MLX training examples.
- classifier training can extract fixed-size RGBA anchor-crop token sequences
  from synthetic dataset images and manifests.
- `morphea train-mlx --crop-size N` records the raster token size, token shape,
  channel order, and crop-token summary in the MLX training artifact.
- available MLX training artifacts include `raster_token_mixer_v1`, a
  trainable attention-style pooling block over RGBA crop tokens with its own
  normalized weights, bias, and loss history.
- available MLX training artifacts also include `mlx_feature_raster_fusion_v1`,
  a trainable fusion head over geometric primitive features plus raster-token
  attention embeddings. Runtime prediction prefers this learned fusion when
  crop tokens are available and falls back to the older feature/mixer logits
  when it is missing or malformed.
- available MLX training artifacts now include `mlx_token_transformer_v1`, a
  serialized small token encoder that pools RGBA crops into raster tokens,
  combines them with geometric feature tokens, runs scaled dot-product
  self-attention layers, and trains a classifier head on the pooled encoder
  embedding.
- `mlx_token_transformer_v1` now records a learned projection calibration over
  encoder dimensions, so token embeddings are no longer purely deterministic
  before classifier-head training.
- when real MLX autograd is available, `mlx_token_transformer_v1` now trains
  `mlx_token_projection_v1` token-to-hidden projection weights and the token
  classifier head together, records `training_status`, and uses those
  projection weights at runtime.
- runtime classifier loading prefers valid `mlx_token_transformer_v1` logits
  when crop tokens are available, then falls back through feature/raster fusion,
  raster-token mixer, feature head, and centroid fallback.
- `morphea eval-classifier` uses RGBA crop-token examples for direct
  accuracy/confusion when evaluating a valid MLX raster-token mixer artifact.
- `morphea eval-classifier` also uses RGBA crop-token examples for
  candidate-ranking evaluation when a valid MLX raster mixer, fusion head, or
  token-transformer block is present.
- `--classifier-model` can load `mlx_feature_head_v1` artifacts and use their
  serialized weights as the candidate-ranking prior, while malformed or
  unavailable MLX artifacts degrade to centroid fallback weights.
- vectorize candidate ranking now generates component-derived RGBA crop tokens
  for valid `raster_token_mixer_v1` artifacts, allowing runtime priors to fuse
  raster attention and geometric feature logits.
- MLX classifier runtime status reports trainable feature/raster/token
  capabilities and end-to-end token-projection training separately from the
  end-to-end attention-weight training capability.
- available MLX token-transformer artifacts now train
  `mlx_attention_diagonal_v1` per-layer query/key/value/output scales and
  output bias with MLX autograd, serialize them in `attention_parameters`, and
  use them during runtime prediction.
- checked-in configs under `docs/real-images/primitive-classifier-smoke/`
  replay the direct own-model primitive-classifier path: generate a small
  synthetic corpus, train `mlx_transformer_primitive_classifier`, and evaluate
  the trained model with raster-token direct and ranking paths.
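The runtime preference order described above (token transformer, then feature/raster fusion, raster-token mixer, feature head, and centroid fallback) can be expressed as an ordered walk over artifact components. This is a hedged sketch: the artifact layout, the `centroid_fallback` key, and the `is_valid_component` check are hypothetical stand-ins for whatever validation the real loader performs.

```python
from typing import Any, Dict, List, Tuple

# (component key, needs RGBA crop tokens) in the documented preference order.
# The key "centroid_fallback" and the artifact layout are hypothetical.
FALLBACK_ORDER: List[Tuple[str, bool]] = [
    ("mlx_token_transformer_v1", True),
    ("mlx_feature_raster_fusion_v1", True),
    ("raster_token_mixer_v1", True),
    ("mlx_feature_head_v1", False),
    ("centroid_fallback", False),
]


def is_valid_component(component: Any) -> bool:
    """Hypothetical validity check: a dict holding a non-empty list of numeric weights."""
    if not isinstance(component, dict):
        return False
    weights = component.get("weights")
    return (
        isinstance(weights, list)
        and len(weights) > 0
        and all(isinstance(w, (int, float)) for w in weights)
    )


def select_predictor(artifact: Dict[str, Any], has_crop_tokens: bool) -> str:
    """Return the first usable component; malformed or missing ones are skipped."""
    for name, needs_crop_tokens in FALLBACK_ORDER:
        if needs_crop_tokens and not has_crop_tokens:
            continue
        if is_valid_component(artifact.get(name)):
            return name
    raise ValueError("artifact has no usable predictor")
```

Keeping the order in one table makes the fallback chain auditable, and a malformed component simply drops the loader to the next entry instead of failing the whole run.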

Remaining:

- none for the current primitive-classifier milestone baseline; future quality
  work can replace diagonal attention parameters with richer projection
  matrices if real-image results justify the extra complexity.

## M8: Self-Learning Loop

Status: implemented for the current self-learning baseline.

Purpose: turn the pipeline into an iteration engine.

Deliverables:

- Batch runner over curated real images.
- Quality filters for pseudo-label candidates:
  - high editability score
  - acceptable raster diagnostics
  - stable anchor metrics
  - low fragmentation
- Pseudo-label export.
- Human review hooks:
  - accept/reject anchors
  - mark wrong primitive type
  - mark bad cut-out/stroke behavior
- Retraining loop:
  - synthetic pretraining
  - real pseudo-label fine-tuning
  - validation against fixed real-image suite
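The run-level and anchor-level pseudo-label filters can be sketched as a pure function over a run summary. The metric names match those used in this plan (`editability_score`, `fragmentation_penalty`, `raster_l1_error`, `raster_edge_error`, `classifier_prior_error`), but the run dictionary layout, the threshold defaults, and the reason strings are assumptions; the aggregate stable-metric filter is omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class HarvestThresholds:
    # Hypothetical default values; real gates would be configured per run.
    min_editability_score: float = 0.8
    max_fragmentation_penalty: float = 0.2
    max_raster_l1_error: float = 0.05
    max_raster_edge_error: float = 0.10
    max_classifier_prior_error: float = 0.3


def run_rejection_reasons(run: Dict[str, Any], t: HarvestThresholds) -> List[str]:
    """Run-level gates; a missing metric counts as a failure."""
    m = run.get("metrics", {})
    reasons: List[str] = []
    if run.get("warnings"):
        reasons.append("warning_diagnostics")
    if m.get("editability_score", 0.0) < t.min_editability_score:
        reasons.append("editability_score_below_min")
    if m.get("fragmentation_penalty", float("inf")) > t.max_fragmentation_penalty:
        reasons.append("fragmentation_penalty_above_max")
    if m.get("raster_l1_error", float("inf")) > t.max_raster_l1_error:
        reasons.append("raster_l1_error_above_max")
    if m.get("raster_edge_error", float("inf")) > t.max_raster_edge_error:
        reasons.append("raster_edge_error_above_max")
    return reasons


def harvest(
    run: Dict[str, Any], t: HarvestThresholds
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return (accepted pseudo-label anchors, run rejection reasons)."""
    reasons = run_rejection_reasons(run, t)
    if reasons:
        return [], reasons
    accepted = [
        anchor
        for anchor in run.get("anchors", [])
        if anchor.get("classifier_prior_error", float("inf")) <= t.max_classifier_prior_error
    ]
    return accepted, []
```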

Acceptance criteria:

- The system can collect its own high-confidence examples from real images.
- Retraining produces measurable improvement without using external vectorizer
  outputs as labels.
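A regression-tolerant accept/manual-review/reject decision, similar in spirit to a training gate, might compare per-split accuracy deltas between baseline and augmented training. The tolerance defaults, the rule that zero improvement needs manual review, and the `training_gate` function itself are illustrative assumptions, not documented thresholds.

```python
from typing import Dict


def training_gate(
    baseline: Dict[str, float],
    augmented: Dict[str, float],
    reject_tolerance: float = 0.02,
    review_tolerance: float = 0.0,
) -> Dict[str, object]:
    """Accept / manual-review / reject from per-split accuracy deltas."""
    deltas = {split: augmented[split] - baseline[split] for split in baseline}
    worst, best = min(deltas.values()), max(deltas.values())
    if worst < -reject_tolerance:
        decision = "reject"  # some split regressed beyond tolerance
    elif worst < -review_tolerance:
        decision = "manual-review"  # small regression: a human should look
    elif best <= 0.0:
        decision = "manual-review"  # no regression, but no measurable gain
    else:
        decision = "accept"
    return {"decision": decision, "deltas": deltas, "worst_delta": worst, "best_delta": best}
```

Requiring a strictly positive best delta encodes the "measurable improvement" criterion; a run that merely ties the baseline is routed to a human rather than silently accepted.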

Implemented so far:

- `morphea harvest` pseudo-label collection from run manifests
- run-level warning-diagnostic filter
- anchor-level `classifier_prior_error` filter
- run-level `editability_score` minimum filter
- run-level `fragmentation_penalty` maximum filter
- run-level `raster_l1_error` and `raster_edge_error` maximum filters
- anchor-level aggregate quality filter for unstable simple-shape metrics
- output pseudo-label index with source manifest provenance
- harvested pseudo-labels preserve scene `group_context` for groups that
  contain the accepted anchor, and `morphea merge-labels` carries that context
  into generated pseudo-sample manifests with source group provenance.
- `morphea harvest --markdown harvest.md` writes a scan-friendly pseudo-label
  quality-gate report with accepted labels, filters, and rejected runs
- `morphea harvest-curated suite.json --run-root runs/curated -o pseudo.json`
  runs bounded curated real-image cases and harvests pseudo-labels from the
  generated per-case manifests
- human-editable review queue via `morphea review`
- `morphea review --markdown review.md` writes a scan-friendly queue summary
  while keeping accept/reject decisions in JSON; review and apply-review
  Markdown reports surface harvested group context so reviewers can see when an
  anchor belongs to a grid, parallel stroke group, merge candidate, or
  reservation group.
- accepted/rejected/pending split via `morphea apply-review`
- `morphea apply-review --markdown accepted.md` writes a scan-friendly decision
  summary for accepted, rejected, and pending labels
- review items support `corrected_kind` and structured issue tags for wrong
  primitive type, cut-out, and stroke-behavior feedback
- review queue and apply-review artifacts aggregate `issue_counts` so repeated
  primitive-type, cut-out, and stroke-behavior problems are visible in JSON and
  Markdown summaries.
- accepted reviewed pseudo-labels can be merged into a classifier-compatible
  train split via `morphea merge-labels`
- `morphea compare-training` compares baseline classifier training against
  reviewed pseudo-label augmentation on a fixed validation/test dataset
- comparison reports include a scan-friendly augmentation verdict with
  best/worst accuracy deltas and train-example delta, so regressions are easier
  to spot before retraining is accepted.
- comparison reports include feature-importance spread deltas so reviewed
  pseudo-labels can be audited for which geometry or group-context signals they
  strengthen.
- `morphea compare-training --markdown comparison.md` writes a scan-friendly
  retraining comparison table derived from the JSON report.
- `morphea training-gate comparison.json -o gate.json` turns a retraining
  comparison into an accept/manual-review/reject decision using explicit
  regression tolerances
- checked-in reviewed-region configs replay standalone `compare-training` and
  `training-gate` runs from the generated base dataset plus accepted reviewed
  pseudo-label dataset, writing JSON and Markdown evidence under
  `/tmp/morphea-real-image-review-run/training-gate/`
- `morphea self-learn base/dataset.json --reviewed-labels reviewed.json -o cycle`
  runs merge-labels, compare-training, training-gate, and accepted-gate
  retraining as one repeatable reviewed-label cycle
- `morphea self-learn --curated-suite suite.json` validates an accepted
  retrained model against the fixed curated real-image suite by passing the
  model as `classifier_model`; when the gate is skipped, the report does not
  claim that validation ran
- `morphea self-learn --lucide-suite suite.json` validates an accepted
  retrained model against the curated Lucide benchmark with the same
  `classifier_model` override and reports the result beside primitive and
  real-image families
- `morphea lucide-check --markdown report.md` writes a Quality Ledger that
  separates semantic pass/fail from visual-review labels and keeps named yellow
  Lucide calibration cases visible
- `morphea lucide-corpus suite.json -o corpus.json --output-dir corpus/`
  renders the definitive Lucide SVG set into supervised PNG/SVG training
  examples with explicit shape, forbidden-shape, metric, bounded-region, and
  source-SVG structure targets for own-model work
- `morphea train-raster-targets corpus.json -o model.json` trains the generic
  `raster_target_classifier` from rendered Lucide PNG features to explicit
  shape targets, giving the definitive-shape corpus an actual MLX model
  artifact while keeping Lucide as a corpus adapter rather than a classifier
  name; `morphea train-lucide-targets` remains a compatible corpus-specific
  alias for the same generic trainer; on the checked-in 24-case Lucide corpus
  smoke, the default MLP over 12×12 raster features reaches 1.0 exact-match
  accuracy on the train split
- `morphea train-raster-targets --target-label-key ...` and config
  `target_label_key` can train the same generic raster-target model from a
  different corpus label map, so the own-model path is not hard-wired to
  Lucide anchor-kind labels
- raster-target training artifacts and Markdown reports include the same
  per-target expected/predicted, true/false positive and negative, precision,
  and recall diagnostics as standalone evaluation reports, so training data
  gaps are visible before a model artifact is promoted
- raster-target training artifacts also record full-corpus and per-split target
  summaries plus `untrained_target_summary`, making validation/test vocabulary
  gaps visible even before standalone evaluation gates run
- `morphea train-raster-targets --max-untrained-targets ...` writes an optional
  `training_gate` so untrained validation/test target vocabulary can reject a
  model artifact before it is treated as promotion evidence
- `morphea eval-raster-targets model.json corpus.json -o report.json`
  evaluates the generic model against a rendered target corpus and reports
  unknown expected targets explicitly, making future vocabulary expansion
  visible before model artifacts are accepted
- raster-target evaluation reports now propagate the model's
  `model_training_gate`; evaluation gates reject a model whose training gate
  already rejected it, even when no other eval thresholds are configured
- raster-target evaluation reports include per-target expected/predicted
  positive counts, true/false positive and negative counts, precision, and
  recall, making own-model data gaps visible target by target rather than only
  through aggregate accuracy
- `morphea eval-raster-targets --min-target-accuracy ...
  --min-exact-match-accuracy ... --min-target-precision ...
  --min-target-recall ... --max-unknown-expected-targets ...` writes an
  optional acceptance gate with accept/manual-review/reject decisions, so
  raster-target model artifacts can be blocked before they become promotion
  evidence; per-target precision/recall gates prevent aggregate accuracy from
  hiding rare target regressions
- raster-target evaluation reports include `raster_target_evaluation_audit`,
  and `morphea eval-raster-targets` surfaces `audit=pass|fail`, so a report
  whose evaluation metrics pass still fails the audit visibly when model
  training-source, training-gate, evaluation-gate, or target-diagnostic
  evidence is incomplete
- `morphea self-learn --suite-family-baseline baseline.json` distinguishes
  newly introduced family regressions from known baseline debt before accepting
  the cycle
- known baseline debt is reported as `known_debt` and separated from blocking
  acceptance reasons so reviewed suite debt can be carried without hiding new
  regressions
- `morphea self-learn --suite-family-baseline-output next-baseline.json`
  persists accepted `suite_family_validation` snapshots for the next baseline
  comparison
- `--suite-family-baseline-reviewer`, `--suite-family-baseline-reason`, and
  `--suite-family-baseline-changelog` make baseline refreshes auditable and
  prevent silent baseline replacement
- existing `--suite-family-baseline-output` paths require a matching
  `--suite-family-baseline` path before they can be overwritten
- `docs/real-images/baselines/current-suite-family-baseline.json` provides a
  checked-in reviewed accepted-cycle baseline for baseline-gated self-learning
  CLI runs
- `morphea retrain` writes an augmented primitive classifier model from base plus
  reviewed pseudo-label train examples, including source-dataset provenance and
  validation/test evaluation metrics
- `morphea retrain --config retrain.json` supports repeatable self-learning
  retraining runs and can optionally write the comparison report next to the
  model
- `morphea retrain --backend mlx` writes an augmented MLX classifier artifact
  from base plus reviewed pseudo-label train examples, using the MLX
  train/fallback path and recording the generated augmented dataset index.
- MLX retraining can consume reviewed pseudo-label manifests that do not carry
  source images: feature training includes those pseudo labels, while
  raster-token crops are trained from image-backed samples.
- reviewed-label MLX retraining now uses the same end-to-end token projection
  and attention-parameter training path as `morphea train-mlx` for image-backed
  examples.
- the checked-in region-review replay now runs through the final
  `docs/real-images/reviews/retrain-mlx-reviewed-regions.json` smoke when MLX
  is available, proving that 10 reviewed gold-circle pseudo labels become 40
  local MLX primitive-classifier train examples with raster pseudo-label
  coverage and trainable component summaries.
- `docs/real-images/reviews/self-learn-mlx-reviewed-regions.json` makes the
  same region-review replay repeatable through the higher self-learning cycle,
  requiring MLX raster pseudo-label evidence while preserving the normal
  training-gate rejection when the comparison regresses.
- accepted MLX self-learning cycles copy the trained component summary into
  `self-learning-cycle.json` and Markdown, including inference order,
  parameter counts, loss epochs, raster-token usage, and MLX-autograd component
  count.
- MLX retraining reports now separate semantic base/pseudo examples from
  raster-capable base/pseudo examples, so reviewed pseudo labels without source
  images are visible as feature-path learning rather than implied raster-token
  training.
- harvested run-directory pseudo labels now retain detected `source_image`
  provenance, and `merge-labels` copies valid reviewed source images into the
  pseudo dataset so reviewed image-backed labels can train MLX raster-token
  components.
- MLX self-learning can require `min_mlx_raster_pseudo_examples`, blocking
  acceptance with `mlx_raster_pseudo_examples_below_min` when reviewed pseudo
  labels have not actually trained raster-token components.
- self-learning cycle reports include `reviewed_label_loop_audit`, a
  machine-readable RIP7 block for accepted-only pseudo datasets, pseudo manifest
  artifacts, provenance summaries, training/acceptance gates, model-acceptance
  discipline, suite-family validation, baseline review evidence, and MLX raster
  pseudo-label minimums.
- self-learning cycle reports include `multi_family_regression_audit`, a
  machine-readable RIP8 block for primitive/real-image/Lucide family views,
  per-family outcomes, real-image pipeline-quality counts, baseline-regression
  blocking visibility, configured validation artifacts, contact-sheet evidence,
  failure severity, yellow drift, and comparison delta records.
- `morphea self-learn` stdout now surfaces `rip7_audit=pass|fail` and
  `rip8_audit=pass|fail` beside the training gate decision, so repeatable
  own-model smoke runs expose audit completeness without opening JSON.
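Per-target diagnostics for multi-label raster targets reduce to counting true/false positives and negatives per target across examples. The sketch below is hypothetical: it treats each example's targets as a set, defines precision or recall as 0.0 when its denominator is zero, and shows a per-target floor check of the kind the `--min-target-precision` and `--min-target-recall` gates describe.

```python
from typing import Dict, List, Set


def per_target_diagnostics(
    expected: List[Set[str]], predicted: List[Set[str]]
) -> Dict[str, Dict[str, float]]:
    """Per-target confusion counts plus precision/recall over multi-label examples."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    targets = set().union(*expected, *predicted)
    report: Dict[str, Dict[str, float]] = {}
    for target in sorted(targets):
        tp = sum(1 for e, p in zip(expected, predicted) if target in e and target in p)
        fp = sum(1 for e, p in zip(expected, predicted) if target not in e and target in p)
        fn = sum(1 for e, p in zip(expected, predicted) if target in e and target not in p)
        tn = len(expected) - tp - fp - fn
        report[target] = {
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            # Assumption: an empty denominator yields 0.0 rather than NaN.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report


def exact_match_accuracy(expected: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Fraction of examples whose full target set is predicted exactly."""
    if not expected:
        return 0.0
    return sum(e == p for e, p in zip(expected, predicted)) / len(expected)


def failing_targets(
    report: Dict[str, Dict[str, float]], min_precision: float, min_recall: float
) -> List[str]:
    """Targets that miss either floor; a rare target can fail here while
    aggregate accuracy still looks healthy."""
    return sorted(
        t for t, d in report.items()
        if d["precision"] < min_precision or d["recall"] < min_recall
    )
```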

Remaining:

- none for the current reviewed-label self-learning baseline; additional real
  pseudo-label data should drive future quality thresholds.

## M9: Differentiable and Local Refinement

Status: implemented for the built-in local and soft-raster backends.

Purpose: improve geometry after a good semantic initialization exists.

Deliverables:

- Refinement interface with strict time/iteration limits.
- Robust renderer for deterministic metrics.
- Optional differentiable backend:
  - DiffVG if practical
  - alternative renderer if DiffVG is too brittle on Apple Silicon
- Refinement only changes parameters of accepted semantic shapes unless a
  config explicitly allows structure changes.

Acceptance criteria:

- Refinement improves raster diagnostics without destroying editability.
- A true circle does not become a noisy path just to gain tiny pixel fidelity.
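A refinement loop with strict iteration and time limits that only moves the parameters of an accepted shape can be sketched in pure Python. This example refines just a circle's radius against a binary target raster with a weighted L1 plus edge-error objective; the pixel-center disk renderer, the 4-neighbour edge map, the step-halving search, and every default value are assumptions for illustration, and all function names are hypothetical.

```python
import time
from typing import List, Tuple

Grid = List[List[float]]


def render_disk(width: int, height: int, cx: float, cy: float, r: float) -> Grid:
    """Binary disk: 1.0 where the pixel center lies within radius r."""
    return [
        [
            1.0 if (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2 <= r * r else 0.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def edge_map(grid: Grid) -> Grid:
    """1.0 where a pixel differs from any in-bounds 4-neighbour, else 0.0."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] != grid[y][x]:
                    out[y][x] = 1.0
                    break
    return out


def mean_abs_diff(a: Grid, b: Grid) -> float:
    total = sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def refine_circle_radius(
    target: Grid,
    cx: float,
    cy: float,
    r0: float,
    l1_weight: float = 1.0,
    edge_weight: float = 0.5,
    step: float = 1.0,
    min_step: float = 0.05,
    max_iterations: int = 50,
    timeout_s: float = 2.0,
) -> Tuple[float, float, str]:
    """Adjust only the radius; the center and the primitive kind never change."""
    h, w = len(target), len(target[0])
    target_edges = edge_map(target)

    def objective(r: float) -> float:
        rendered = render_disk(w, h, cx, cy, r)
        return l1_weight * mean_abs_diff(rendered, target) + edge_weight * mean_abs_diff(
            edge_map(rendered), target_edges
        )

    start = time.monotonic()
    radius, best = r0, objective(r0)
    stopped = "max_iterations"
    for _ in range(max_iterations):
        if time.monotonic() - start > timeout_s:
            stopped = "timeout"
            break
        improved = False
        for candidate in (radius + step, radius - step):
            if candidate <= 0:
                continue
            value = objective(candidate)
            if value < best:
                radius, best, improved = candidate, value, True
                break
        if not improved:
            step /= 2  # no neighbouring radius helped: search more finely
            if step < min_step:
                stopped = "converged"
                break
    return radius, best, stopped
```

Because the circle stays a circle, the worst case is a radius that stops short of the optimum, never a noisier primitive.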

Implemented so far:

- `RefinementConfig`
- `morphea refine manifest.json -o refined.json`
- `local_metric` backend
- structure-preserving manifest output
- refinement metadata and per-anchor metrics
- top-level refinement `structure_audit` records source/refined anchor counts,
  preserved primitive kinds, geometry-change count, and editability preservation.
- optional `--source-image` refinement input
- first structure-preserving local optimizer for circle radius adjustment using
  rendered raster L1 error
- weighted refinement objective using both raster L1 and edge-error diagnostics
  via `--raster-l1-weight` and `--raster-edge-weight`
- structure-preserving local optimizer for quad-like primitives (`rect`,
  `rounded_rect`, `quad`) using bounded translation and scale parameter steps
- structure-preserving local optimizer for stroke-like primitives
  (`stroke_polyline`, `stroke_path`, `arc`) using bounded centerline
  translation and stroke-width steps
- recognized optional differentiable backend names (`differentiable`, `diffvg`)
  behind the same `morphea refine --backend ...` interface, with an explicit
  not-installed/not-configured failure path for any backend whose renderer is
  not yet wired
- `morphea refine --backend differentiable` now runs a built-in soft-raster
  gradient backend for structure-preserving circle-radius refinement and
  quad-like (`rect`, `rounded_rect`, `quad`) plus stroke-like
  (`stroke_polyline`, `stroke_path`, `arc`) transform refinement, including
  renderer metadata and objective deltas.
- `morphea refine --backend diffvg` remains the optional external adapter path
  with explicit not-installed/adapter-pending status until DiffVG is wired.
- refinement backend status reports distinguish active local metric refinement,
  missing optional renderer packages, and adapter-pending optional renderer
  states without allowing structure-changing refinement to run implicitly.
- refinement config validates iteration, timeout, and raster-weight limits, and
  optimizer metadata records elapsed seconds, timeout state, and stopped reason
- `morphea refinement-gate refined.json -o gate.json` turns structure audit and
  optimizer objective metrics into an accept/manual-review/reject decision so
  tiny pixel gains cannot silently break editability
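The refinement gate's core rule, that structure violations reject regardless of objective gains, might look like the following. The `structure_audit` keys, the relative-gain threshold, the reason strings, and the `refinement_gate` function are hypothetical; only the accept/manual-review/reject outcomes come from the description above.

```python
from typing import Dict, List, Tuple


def refinement_gate(
    structure_audit: Dict[str, object],
    objective_before: float,
    objective_after: float,
    min_relative_gain: float = 0.01,
) -> Tuple[str, List[str]]:
    """Structure violations reject outright; pixel gains cannot buy them back."""
    reasons: List[str] = []
    if structure_audit["source_anchor_count"] != structure_audit["refined_anchor_count"]:
        reasons.append("anchor_count_changed")
    if not structure_audit["primitive_kinds_preserved"]:
        reasons.append("primitive_kind_changed")
    if not structure_audit["editability_preserved"]:
        reasons.append("editability_not_preserved")
    if reasons:
        return "reject", reasons
    if objective_after > objective_before:
        return "reject", ["objective_regressed"]
    # Relative improvement of the weighted raster objective (lower is better).
    gain = (objective_before - objective_after) / objective_before if objective_before > 0 else 0.0
    if gain < min_relative_gain:
        return "manual-review", ["objective_gain_below_min"]
    return "accept", []
```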

Remaining:

- optionally wire DiffVG when it is practical in the target local environment.

## M10: Curated Real-Image Suite

Status: implemented for the current curated baseline.

Purpose: keep the system honest against actual target images.

Deliverables:

- Local fixture directory policy for real images.
- Per-image notes:
  - observed structures
  - expected anchors
  - known current failures
  - milestone that should address each failure
- Fixed regression runs.

Initial candidate:

- `/Users/sebastian/Desktop/terminaro-tweaked.png`

Acceptance criteria:

- Each curated image has a documented expected-shape checklist.
- CLI can produce SVG, manifest, and report for each image within runtime
  limits.
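An expected-shape checklist can be checked mechanically against the anchor kinds a run produces. The sketch assumes a checklist made of minimum counts per anchor kind plus a set of forbidden kinds; the data layout and the `check_expectations` name are hypothetical, not the `suite.json` schema.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def check_expectations(
    anchor_kinds: Iterable[str],
    expected_min: Dict[str, int],
    forbidden: Set[str],
) -> List[str]:
    """Return human-readable failures; an empty list means the checklist passed."""
    counts = Counter(anchor_kinds)
    failures: List[str] = []
    for kind, minimum in sorted(expected_min.items()):
        if counts[kind] < minimum:
            failures.append(f"{kind}: expected >= {minimum}, got {counts[kind]}")
    for kind in sorted(forbidden):
        if counts[kind] > 0:
            failures.append(f"{kind}: forbidden, got {counts[kind]}")
    return failures
```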

Implemented so far:

- `docs/real-images/suite.json` for local real-image metadata without checking
  large binaries into git.
- `morphea curated-check suite.json -o report.json` for suite validation.
- `morphea curated-check --config curated-check.json` for repeatable real-image
  suite validation inputs.
- `morphea promotion-review-run suite.json --output-dir review-run` for the
  review-oriented one-command suite run that writes default JSON, Markdown,
  snapshot, review packet, gallery artifacts, and a starter
  `promotion-review-harvest.json` config with per-case terminal decision
  template paths. The generated harvest command is surfaced in the final report
  JSON and Markdown.
- optional `--run` mode using each case's bounded `recommended_config`.
- per-case `output.svg`, `debug.svg`, `manifest.json`, `config.json`,
  `report.md`, `report.html`, and `preview.png` artifacts via `--output-dir`.
- curated artifacts are written through the same run writer as vectorize runs,
  including input copies and raster-fidelity metrics.
- expectation checks for anchor kinds and scene group kinds.
- metric expectation checks for curated cases, including editability,
  simple-shape ratio, and fragmentation envelopes.
- deterministic `morphea curated-check --snapshot snapshot.json` regression
  summaries for important commits/configurations.
- `morphea curated-check --markdown report.md` writes scan-friendly real-image
  suite reports with a Corpus Ledger for quality labels, stress families,
  expected promotion families, issue tags, licensing status, plus case status,
  derived current pipeline labels, failed expectations, key metrics, and
  artifact directories.
- curated reports include `corpus_audit`, a machine-readable RIP1 readiness
  block for source provenance, licensing, bounded config, human-readable intent,
  red/yellow/green labels, red/yellow issue tags, visual-audit status, and
  contact-sheet artifacts when a run writes an output directory.
- curated reports include `quality_gate_audit`, a machine-readable RIP2 gate
  coverage block for bounded region gates, shape-class gates, topology gates,
  fragmentation/layer thresholds, grouping, visual fidelity, per-family visual
  thresholds, contact-sheet gate records, and per-case gate coverage.
- curated reports include `promotion_pipeline_audit`, a machine-readable RIP3
  pipeline block for promotion decisions, configured region-state records,
  failed-gate visibility, review-decision records, review artifact links,
  promoted/fallback SVG exports, promotion-export partitions, and manifest
  promotion annotations.
- promotion-export artifacts include `promotion_export_audit`, a
  machine-readable RIP9 trust-boundary block for complete promoted/fallback
  partitioning, trusted SVG exclusion of fallback/rejected/deferred anchors,
  fallback SVG containment of non-promoted anchors, stable metadata wrappers,
  rejected/deferred visibility, missing-from-promoted records, non-promoted
  region reasons, and summary-count consistency.
- curated reports include `editability_review_audit`, a machine-readable RIP4
  review block for independent v10 component scores, threshold records,
  red-semantic-gate blocking visibility, regression-delta records,
  accepted-output contracts, editability sidecars, and manifest annotations.
- curated reports include `human_review_audit`, a machine-readable RIP6 review
  block for pending review-decision records, review-artifact links, terminal
  decision templates, template evidence, sidecar-only quality-label policy,
  reviewable region records, and suite-level review packet/gallery artifacts.
- curated reports and deterministic snapshots include
  `pipeline_quality_label`, so current green/yellow/red pipeline state can be
  consumed without parsing Markdown.
- the same Markdown report includes a Region Truth table for configured
  source-region gates, with stable region ids, state, bounds, expected kinds,
  actual matching counts, topology evidence including descriptor labels and
  nested-contour counts, per-region layer-depth evidence, selected-anchor kind
  profiles, candidate rejection counts, and source-vs-exported-SVG
  `visual_delta` metrics, thresholds, and failures for checked cases.
- source-region topology gates can require or forbid descriptor labels such as
  `single_component`, `multi_component`, or `nested_contours`.
- the UI radio-control source region uses descriptor gates to require one
  closed component and reject multi-component, holed, cut-out, or nested
  candidates.
- the Terminaro gold-circle source region uses descriptor gates to require
  multiple closed circle components and reject single-component, holed, cut-out,
  or nested substitutes.
- Lucide circle/badge calibration now includes explicit zero-match contracts:
  `circle` forbids a full-icon irregular `stroke_path`, and `badge-check`
  forbids a full-icon `stroke_circle`; report rows render those constraints as
  `= 0`, with regressions labeled `forbidden_matches`.
- checked promotion runs include Candidate Rejections in `promotion-review.md`,
  so shape/topology reject reasons are reviewable without opening raw JSON.
- checked promotion runs include region-level visual deltas in
  `promotion_regions`, manifests, exports, Region Truth, and
  `promotion-review.md`, so visual drift can be inspected at the same source
  region as the semantic gate.
- source-region gates can set `max_raster_l1_error` and
  `max_raster_edge_error`; the Terminaro gold-circle region now passes a red
  region visual-fidelity gate, while the UI radio-control crop carries a
  yellow region visual-fidelity failure after its topology contract passes.
- checked promotion runs write `region-overlay.png` and include the same
  red/yellow/green source-region outlines as a Contact Sheet panel, making the
  failed or deferred region visually inspectable without opening raw JSON.
- `morphea promotion-review-harvest` consumes a suite-level
  `review-packet.json`, applies only explicit terminal review decisions,
  persists `review_decision_applied` beside case manifests, reports applied,
  harvestable, and pending cases with available terminal template paths, and
  can write a `harvest-curated --config` file with
  `require_applied_review: true`; the CLI regression path proves the generated
  config harvests accepted applied reviews while excluding deferred applied
  reviews as `applied_review_not_accepted`.
- promotion-review-harvest reports include `review_harvest_audit`, a
  machine-readable RIP10 review-to-learning block for case accounting, explicit
  terminal reviewer decisions, reviewer/reason evidence, harvestable
  accepted/corrected gates, reviewed-region evidence, pending review
  visibility, terminal-template readiness, and generated
  `require_applied_review` harvest configs.
- snapshot comparisons include explicit `promotion_region_deltas` and Markdown
  rows that identify the changed, added, or removed source-region id.
- second documented curated case:
  `chatgpt-image-2026-06-11`, covering the opaque white-background version of
  the Greek-figures/table illustration.
- third documented curated case:
  `ui-radio-acceptance-screenshot`, adding a text-heavy UI screenshot family
  with a small radio-circle control and bounded text-fragment expectations.
- checked-in deterministic baseline snapshot at
  `docs/real-images/baselines/current-curated-snapshot.json`.
- the UI radio-control case now exercises neutral composite ring recovery so
  thin anti-aliased controls remain represented by simple circle primitives.
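
The source-region gates above combine descriptor requirements, zero-match contracts, and raster thresholds. The sketch below shows one way such a gate could be evaluated. The threshold names, descriptor labels, and the `forbidden_matches` label come from this plan; `RegionGate`, `evaluate_region`, and the other failure strings are hypothetical names chosen for illustration, not Morphēa's actual gate code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegionGate:
    """Hypothetical gate record; not the real suite.json schema."""

    region_id: str
    require_descriptors: set = field(default_factory=set)
    forbid_descriptors: set = field(default_factory=set)
    forbidden_kinds: set = field(default_factory=set)  # rendered as "= 0" contracts
    max_raster_l1_error: Optional[float] = None
    max_raster_edge_error: Optional[float] = None


def evaluate_region(gate, descriptors, selected_kinds, l1_error, edge_error):
    """Return named failures for one source region; an empty list means pass."""
    failures = [f"missing_descriptor:{d}" for d in sorted(gate.require_descriptors - descriptors)]
    failures += [f"forbidden_descriptor:{d}" for d in sorted(gate.forbid_descriptors & descriptors)]
    for kind in sorted(gate.forbidden_kinds):
        count = selected_kinds.count(kind)
        if count:
            failures.append(f"forbidden_matches:{kind}={count}")
    if gate.max_raster_l1_error is not None and l1_error > gate.max_raster_l1_error:
        failures.append("raster_l1_error")
    if gate.max_raster_edge_error is not None and edge_error > gate.max_raster_edge_error:
        failures.append("raster_edge_error")
    return failures


# Illustrative thresholds, not the checked-in suite values.
gate = RegionGate(
    "example-region",
    require_descriptors={"single_component"},
    forbid_descriptors={"multi_component", "nested_contours"},
    max_raster_l1_error=0.05,
)
print(evaluate_region(gate, {"single_component"}, ["stroke_circle"], 0.08, 0.01))
# → ['raster_l1_error']
```

Returning a list of named failures instead of a single boolean keeps every reject reason visible in reports, which matches how regressions surface as labeled rows rather than a bare pass/fail.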

Remaining:

- add more families as new representative local images become available.

## M11: Productized Research CLI

Status: implemented for the current research baseline.

Purpose: make the research loop pleasant enough to use repeatedly.

Deliverables:

- Commands:
  - `morphea generate`
  - `morphea train`
  - `morphea vectorize`
  - `morphea eval`
  - `morphea report`
  - `morphea sweep`
- Config files:
  - preprocessing
  - segmenters
  - anchor thresholds
  - scoring weights
  - training
- Stable output schema.
- Versioned experiment metadata.

Acceptance criteria:

- A user can run a full experiment without editing Python code.
- Results from different commits/configs can be compared.
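
Comparing results across commits or configs largely comes down to diffing keyed records. A minimal sketch of the changed/added/removed classification that `promotion_region_deltas` reports, assuming each snapshot can be reduced to a dict from source-region id to a comparable record; the real snapshot schema is not shown here, and `region_deltas` is a hypothetical name.

```python
def region_deltas(before: dict, after: dict) -> dict:
    """Classify source-region ids as added, removed, or changed between two snapshots."""
    before_ids, after_ids = set(before), set(after)
    return {
        "added": sorted(after_ids - before_ids),
        "removed": sorted(before_ids - after_ids),
        # Present in both snapshots, but with a different record.
        "changed": sorted(r for r in before_ids & after_ids if before[r] != after[r]),
    }


before = {"gold-circle": {"state": "green"}, "radio": {"state": "yellow"}}
after = {"gold-circle": {"state": "green"}, "radio": {"state": "red"}, "badge": {"state": "green"}}
print(region_deltas(before, after))
# → {'added': ['badge'], 'removed': [], 'changed': ['radio']}
```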

Implemented so far:

- `morphea generate`
- `morphea train`
- `morphea train-mlx`
- `morphea eval-classifier`
- `morphea vectorize`
- `morphea profile`
- `morphea eval`
- `morphea report`
- `morphea segment`
- `morphea sweep`
- `morphea merge-labels`
- `morphea harvest-curated`
- `morphea compare-training`
- `morphea training-gate`
- `morphea self-learn`
- `morphea retrain`
- `morphea refinement-gate`
- `morphea status`
- `morphea vectorize --config config.json` for repeatable input/output,
  artifact, and runtime knob files
- `morphea generate --config generate.json` for repeatable synthetic corpus
  generation
- `morphea train --config train.json` for repeatable classifier training inputs
- `morphea eval-classifier --config eval-classifier.json` for repeatable
  classifier evaluation reports
- `morphea eval --config eval.json` for repeatable run-directory summaries
- `morphea profile --config profile.json` for repeatable bounded runtime probes
- `morphea profile-curated --config profile-curated.json` for repeatable
  curated-family runtime profiling and Markdown summaries
- `morphea report --command-config report.json` for repeatable standalone report
  rendering from existing manifests
- `morphea harvest --config harvest.json` for repeatable pseudo-label quality
  gates
- `morphea harvest --markdown harvest.md` for scan-friendly pseudo-label quality
  reports
- `morphea harvest-curated --config harvest-curated.json` for repeatable
  curated real-image pseudo-label collection
- `morphea review --config review.json` and `morphea apply-review --config
  apply-review.json` for repeatable human-review queue processing
- `morphea review --markdown review.md` for scan-friendly review queue summaries
- `morphea apply-review --markdown accepted.md` for scan-friendly review decision
  summaries
- `morphea merge-labels --config merge-labels.json` for repeatable reviewed-label
  dataset export
- `morphea compare-training --config compare.json` for repeatable retraining
  comparisons
- `morphea compare-training --markdown compare.md` for scan-friendly retraining
  comparisons
- `morphea training-gate --config training-gate.json` for repeatable retraining
  acceptance decisions
- `morphea self-learn --config self-learn.json` for repeatable reviewed-label
  self-learning cycles, including optional curated-suite validation
- `morphea retrain --config retrain.json` for repeatable augmented model output
- `morphea refine --config refine.json` for repeatable bounded refinement runs
- `morphea refinement-gate --config refinement-gate.json` for repeatable
  structure-preserving refinement acceptance decisions
- `morphea status` for a Markdown runtime/backend availability report printed
  to stdout
- `morphea status -o status.json --markdown status.md` for a single
  machine-readable report of segmenter, classifier, and refinement backend
  availability/blockers
- `morphea status --config status.json` for repeatable runtime/backend
  availability checks
- `morphea status` reports blocked backend capabilities such as missing MLX/SAM
  checkpoints or optional end-to-end MLX training pieces separately from
  installed package status
- `morphea status` now reports the generic `raster_target` classifier runtime
  beside the primitive `mlx` classifier, so the own-model Lucide/real-image
  target path exposes MLX/autograd availability before training runs
- `morphea curated-check --config curated-check.json` for repeatable curated
  real-image suite validation
- `morphea promotion-review-run suite.json --output-dir review-run` for a
  review-ready curated run with default report, Markdown, snapshot, packet, and
  gallery artifacts plus a starter `promotion-review-harvest.json` config and
  report-level harvest follow-up command. The starter config also lists the
  terminal decision templates available for each queued review case while
  keeping `decisions` and `decision_overrides` empty until explicit reviewer
  selection and evidence entry. The review packet and gallery also surface
  per-case `decision_choice_commands` and evidence-flag hints after the starter
  config is written, so reviewers can choose a terminal outcome through
  `promotion-review-harvest --decision-choice` without editing JSON.
- `morphea segment --config segment.json` for repeatable input/output,
  report, and segment proposal runs
- `morphea segment --markdown proposals.md` for scan-friendly segment proposal
  reports
- `morphea compare-segments before.json after.json -o comparison.json` for
  comparing segment proposal outputs across configs or backends
- `morphea compare-segments --config compare-segments.json` for repeatable
  segment proposal comparisons
- checked-in `docs/real-images/mlx-sam-smoke/*.json` configs replay the
  current MLX/SAM runtime smoke against a Flat-Color baseline without
  committing local checkpoints or generated smoke reports
- checked-in MLX/SAM smoke configs also compare `grid_points` directly against
  `flat_color_centers`, so prompt-strategy work has a repeatable mixed-signal
  baseline rather than relying on ad hoc `/tmp` commands.
- segment configs include component splitting and `max_component_area`
- MLX/SAM adapter proposals respect `max_component_area`, preventing oversized
  AI masks from bypassing the same deferral path used by Flat-Color components
- segment configs include future MLX model/runtime knobs without requiring the
  MLX backend to be installed
- segment configs include `mlx_prompt_strategy` so grid and Flat-Color-guided
  live SAM prompt runs can be compared through the same proposal schema
- vectorize scoring weights for raster error, quality error, complexity, and
  simple-shape bonus
- vectorize anchor threshold config for circle/ring, stroke, quad, rect, and
  rounded-rect candidate gates
- `morphea compare-snapshots before.json after.json` for comparing saved
  summaries from different commits/configurations
- `morphea compare-snapshots --config compare-snapshots.json` for repeatable
  saved-summary comparisons
- snapshot comparisons surface promotion-region deltas when curated snapshots
  include `promotion_regions`
- `morphea compare-git-snapshots before_ref after_ref --path snapshot.json` for
  comparing the same checked-in snapshot file across git refs without changing
  the working tree
- `morphea compare-git-snapshots --config compare-git-snapshots.json` for
  repeatable git-ref snapshot comparisons
- `morphea snapshot-git-ref ref --suite suite.json -o snapshot.json` for
  generating curated snapshots from a detached temporary worktree without
  checking out the current working tree
- `morphea snapshot-git-ref --config snapshot-git-ref.json` for repeatable
  isolated git snapshot generation
- `morphea promotion-review-harvest review-packet.json -o review-harvest.json`
  for repeatable review-to-harvest preparation after terminal promotion review
  decisions have been selected
- `morphea promotion-review-harvest --config promotion-review-harvest.json` for
  repeatable review-to-harvest preparation with case-id decision maps,
  template-backed `decision_choices`, copy/paste decision-choice commands for
  pending cases, template-readiness labels, and case-scoped
  `decision_overrides` that pass reviewer/reason and corrected-review evidence
  into `promotion-apply-review`. The command can also load a portable
  `decision_plan` overlay with `decision_choices` and `decision_overrides`, so
  explicit reviewer decisions can be replayed against fresh run-local terminal
  templates without committing those paths. The same evidence can be supplied
  directly with case-scoped CLI flags such as `--reviewer case=name`,
  `--reason case=reason`, `--correction-notes case=notes`,
  `--corrected-artifact case=path`, and `--reviewed-region case=region-id`.
  When reviewer evidence is missing, the harvest-prep Markdown shows those
  evidence-flag hints and aggregate readiness counts beside the decision-choice
  commands. Pending rows preserve review-artifact links and failed-gate details
  from the packet, and applied rows show the reviewer, reason, source decision
  path, promoted-anchor count, reviewed region ids, review-promoted region ids,
  review-promoted anchor indexes, harvest block reason, and applied
  review-artifact links.
- `docs/real-images/reviews/current-deferred-decision-plan.json` records the
  current explicit reviewer outcome for the three real-image cases as
  `deferred`, keeping them visible as review evidence while excluding them from
  harvestable accepted/corrected training labels.
- `docs/real-images/reviews/current-region-decision-plan.json` records the
  current region-scoped accepted review evidence: the gold-circle shape-class
  and visual-fidelity regions of both the transparent Terminaro case and the
  checked-in opaque generated illustration are accepted via
  `reviewed_region_ids`, while the UI radio case remains deferred. A checked-in
  replay regression verifies two harvestable cases and 10 trusted pseudo-label
  anchors.
- `morphea sweep` configs can carry output roots and Markdown report paths for
  repeatable config comparisons
- schema-v1 sweep configs
- schema-v1 scene manifests
- `sweep-summary.json` experiment metadata
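
The `decision_plan` overlay described above can be pictured as a per-case merge of a portable plan onto a run-local config. In this sketch `apply_decision_plan` is hypothetical, and the merge semantics (plan choices win per case id, override evidence is merged field by field) are assumptions rather than the documented behavior of `promotion-review-harvest`.

```python
import copy


def apply_decision_plan(run_config: dict, plan: dict) -> dict:
    """Overlay a portable decision plan onto a run-local harvest config.

    Assumed semantics: plan choices win per case id, and override evidence
    is merged field by field. The input config is not mutated.
    """
    merged = copy.deepcopy(run_config)
    merged.setdefault("decision_choices", {}).update(plan.get("decision_choices", {}))
    overrides = merged.setdefault("decision_overrides", {})
    for case_id, evidence in plan.get("decision_overrides", {}).items():
        overrides.setdefault(case_id, {}).update(evidence)
    return merged


run_config = {"decision_choices": {}, "decision_overrides": {"case-a": {"reviewer": "alice"}}}
plan = {
    "decision_choices": {"case-a": "deferred"},
    "decision_overrides": {"case-a": {"reason": "needs a second look"}},
}
merged = apply_decision_plan(run_config, plan)
print(merged["decision_overrides"])
# → {'case-a': {'reviewer': 'alice', 'reason': 'needs a second look'}}
```

Keeping the plan separate from run-local template paths is what lets the same reviewer decisions be replayed against a fresh run.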

Remaining:

- add new schema entries when future milestones introduce new commands or real
  MLX model execution.

## M12: Primitive Fidelity Harness

Status: implemented for the current fixed-fixture and seeded-variant baseline.

Purpose: make the simplest shapes the primary quality gate before homepage or
curated-image polish.

Implemented so far:

- `morphea primitive-check` generates deterministic primitive raster fixtures,
  vectorizes them, renders the recognized scene back to pixels, and writes a
  machine-readable report.
- Optional per-case artifacts include `input.png`, `output.svg`, `debug.svg`,
  `manifest.json`, and `preview.png`.
- The fixed fixture set covers filled square, filled rectangle, filled circle,
  horizontal/vertical/diagonal strokes, outlined ring, rounded rectangle, and a
  simple quad.
- The report records pass/fail status, selected primitive kind, raster L1/edge
  errors, bounding-box IoU, geometry bounds, and concrete failure reasons.
- `primitive-check --variant-count N --variant-seed S` appends deterministic
  seeded variants for simple square, rectangle, circle, stroke, and quad
  families without changing the default fixed-fixture run or checked-in
  baseline discipline.
- seeded primitive variants now also cover diagonal strokes, outlined rings,
  and rounded rectangles with bounded random geometry that keeps the intended
  primitive family distinct from neighboring ellipse/path fallbacks.
- seeded primitive variants also cover arcs and filled ellipses; arcs sample
  from stable hand-verified parameter envelopes, while ellipses keep enough
  aspect-ratio separation to avoid collapsing into circle-like cases.
- primitive quality reports include `variant_summary` and per-case
  `variant_source`, so seeded variant coverage is visible separately from
  hand-authored fixed fixtures.
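
Seeded variants stay reproducible when each run draws from its own seeded random generator and from bounded parameter envelopes. A minimal sketch for the circle family; `seeded_variants`, the envelope bounds, and the record keys other than `variant_source` are illustrative assumptions.

```python
import random


def seeded_variants(count: int, seed: int, canvas: int = 64) -> list:
    """Deterministic filled-circle variants drawn from a bounded parameter envelope."""
    rng = random.Random(seed)  # private generator: same seed, same variants
    variants = []
    for index in range(count):
        radius = rng.randint(8, canvas // 4)
        margin = radius + 2  # keep the whole circle, plus a border, on the canvas
        center = (rng.randint(margin, canvas - margin), rng.randint(margin, canvas - margin))
        variants.append(
            {
                "case_id": f"circle-seed{seed}-{index}",
                "kind": "circle",
                "center": center,
                "radius": radius,
                "variant_source": "seeded",
            }
        )
    return variants
```

Using a private `random.Random` instance rather than the module-level functions keeps variant generation independent of any other code that touches the global random state, and `count=0` leaves the fixed-fixture run unchanged.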

Acceptance evidence:

- `PYTHONPATH=src python3 -m morphea.cli primitive-check -o /tmp/primitive.json`
- `PYTHONPATH=src python3 -m unittest tests.test_primitive_quality`

Remaining:

- expand seeded variants into cut-out and stroke-ellipse families only after
  those higher-variance families have similarly tight parameter envelopes.

## M13: Ground-Truth Primitive Specs

Status: implemented for the current hand-authored fixture baseline.

Purpose: keep primitive expectations explicit and separate from broad synthetic
training data.

Implemented so far:

- each built-in primitive fixture records canvas size, background, expected
  primitive kind, expected color, expected geometry, coordinate tolerance, raster
  thresholds, and minimum bounding-box IoU.
- fixtures are generated from hand-authored specs at runtime rather than
  checked in as binary assets.
- square/rectangle/quad contracts assert four-corner geometry; circle/ring
  contracts assert center/radius; stroke contracts assert two-point centerlines
  and width.
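
The contract fields listed above map naturally onto a frozen record. This sketch assumes a hypothetical `PrimitiveSpec` dataclass and a `circle_geometry_ok` helper for the center/radius contract; it is not Morphēa's actual spec type, and the numeric values are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimitiveSpec:
    """Hypothetical hand-authored fixture spec mirroring the fields listed above."""

    name: str
    canvas_size: tuple
    background: tuple
    expected_kind: str
    expected_color: tuple
    expected_geometry: dict
    coordinate_tolerance: float
    max_raster_l1_error: float
    min_bbox_iou: float


def circle_geometry_ok(spec: PrimitiveSpec, center: tuple, radius: float) -> bool:
    """Center/radius contract: center distance and radius difference within tolerance."""
    tol = spec.coordinate_tolerance
    center_ok = math.dist(spec.expected_geometry["center"], center) <= tol
    return center_ok and abs(spec.expected_geometry["radius"] - radius) <= tol


spec = PrimitiveSpec(
    "filled-circle", (64, 64), (255, 255, 255), "circle", (0, 0, 0),
    {"center": (32, 32), "radius": 16}, 1.5, 0.02, 0.9,
)
print(circle_geometry_ok(spec, (32.6, 31.8), 16.4))  # → True
print(circle_geometry_ok(spec, (35.0, 32.0), 16.0))  # → False
```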

Remaining:

- move specs to external JSON only if users need to edit the fixture set without
  touching Python.

## M14: Geometry Contract Tests

Status: implemented for the current primitive and topology contract baseline.

Purpose: fail on wrong semantic geometry even when aggregate scene metrics look
acceptable.

Implemented so far:

- primitive quality checks fail wrong primitive kinds, unexpected `cubic_path`
  fallbacks, loose coordinates, poor bounding-box IoU, out-of-canvas bounds, and
  visual round-trip regressions.
- regression tests cover square, circle, stroke, ring, and CLI report behavior.
- the ring/stroke regressions that produced oversized arc or curved-stroke
  candidates are now covered by focused detector tests.
- nested organic fallback cases assert expected even-odd hole counts, and
  primitive quality reports expose expected/actual hole topology plus the path
  anchors that carry those holes.
- cut-out fixture cases assert editable cut-out stroke counts separately from
  closed path holes, and the Markdown/JSON reports show cut-out topology without
  opening manifests.
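
Under the even-odd fill rule, a closed contour acts as a hole when it is nested inside an odd number of other contours. A sketch of counting holes that way, assuming simple, non-intersecting, non-touching polygon contours; the function names are hypothetical and this is not necessarily how Morphēa derives its reported hole counts.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: toggle on each polygon edge crossed by a ray toward +x."""
    x, y = point
    inside = False
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def even_odd_hole_count(contours):
    """Count contours nested inside an odd number of other contours.

    Because contours are assumed not to intersect or touch, testing one vertex
    per contour is enough to determine its nesting depth.
    """
    holes = 0
    for i, contour in enumerate(contours):
        depth = sum(
            point_in_polygon(contour[0], other) for j, other in enumerate(contours) if j != i
        )
        holes += depth % 2
    return holes


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
ring_hole = [(2, 2), (8, 2), (8, 8), (2, 8)]
island = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(even_odd_hole_count([outer, ring_hole, island]))  # → 1
```

The island at depth 2 fills again under even-odd, so only the middle contour counts as a hole.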

Remaining:

- add additional topology contracts only when new nested/cut-out regressions
  appear.

## M15: Visual Round-Trip Gates

Status: implemented for the current manifest-rendered preview baseline.

Purpose: compare source raster fixtures against rendered recognized scenes, not
just primitive counts.

Implemented so far:

- primitive-check records `raster_l1_error`, `raster_edge_error`,
  `raster_alpha_error`, `raster_size_match`, and bounding-box IoU per case.
- strict thresholds are used for filled square, rectangle, and quad; slightly
  looser thresholds are used for circles, rings, rounded rectangles, and
  diagonal strokes where rasterization differs by edge pixels.
- per-case artifacts make input/output inspection possible without rerunning
  the harness.
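
A minimal sketch of two of the per-case metrics, assuming same-sized 8-bit grayscale grids as nested lists and `(x0, y0, x1, y1)` bounding boxes. The normalization of the L1 error to 0..1 is an assumption; Morphēa's exact `raster_l1_error` definition, and its edge and alpha errors, may differ.

```python
def raster_l1_error(source, rendered):
    """Mean absolute difference of same-sized 8-bit grayscale grids, scaled to 0..1."""
    if len(source) != len(rendered) or any(len(a) != len(b) for a, b in zip(source, rendered)):
        raise ValueError("raster size mismatch")  # the case raster_size_match would flag
    total = sum(abs(a - b) for row_a, row_b in zip(source, rendered) for a, b in zip(row_a, row_b))
    pixels = sum(len(row) for row in source)
    return total / (255 * pixels) if pixels else 0.0


def bbox_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


print(raster_l1_error([[0, 255], [255, 0]], [[0, 255], [255, 255]]))  # → 0.25
print(bbox_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```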

Remaining:

- add an SVG-raster backend only if manifest-rendered previews diverge from the
  browser/editor SVG rendering path.

## M16: Detector Tightening Loop

Status: started and implemented for the first primitive failures.

Purpose: use failing primitive cases to tighten recognition before broader
real-image tuning.

Implemented so far:

- straight thick strokes no longer receive artificial control points from edge
  pixels; horizontal and vertical Pillow-style strokes remain two-point
  `stroke_polyline` anchors.
- arc candidates with width samples or visual stroke bounds far outside their
  source component are rejected, preventing outlined rings from becoming giant
  arc strokes.
- arc scoring avoids treating the intended bend of a three-point arc as line
  jitter while still preserving width-variance pressure.
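
The bounds-based arc rejection can be sketched as a containment test with a small allowance. `arc_within_component` and the `max_overshoot` fraction are hypothetical; the plan does not state the margin Morphēa actually uses.

```python
def arc_within_component(stroke_bounds, component_bounds, max_overshoot=0.25):
    """Accept an arc candidate only if its stroke bounds stay near its source component.

    Bounds are (x0, y0, x1, y1). max_overshoot is a hypothetical allowance,
    expressed as a fraction of the component's larger side.
    """
    cx0, cy0, cx1, cy1 = component_bounds
    allowance = max_overshoot * max(cx1 - cx0, cy1 - cy0)
    sx0, sy0, sx1, sy1 = stroke_bounds
    return (
        sx0 >= cx0 - allowance
        and sy0 >= cy0 - allowance
        and sx1 <= cx1 + allowance
        and sy1 <= cy1 + allowance
    )


ring_component = (10, 10, 50, 50)
print(arc_within_component((12, 12, 48, 48), ring_component))  # → True
print(arc_within_component((-100, -100, 160, 160), ring_component))  # → False
```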

Remaining:

- add rejection diagnostics if future failures need inspectable candidate-level
  reject reasons.
- keep each detector change paired with a primitive contract or focused
  detector regression test.

## M17: Honest Basic Gallery

Status: implemented for the current primitive-check-backed gallery baseline.

Purpose: publish only examples that are backed by passing primitive contracts.

Implemented so far:

- `morphea primitive-gallery` generates deterministic static QA pages from
  `primitive-check` artifacts.
- homepage hero and teaser panels select only passing primitive-check cases;
  failed QA cases can remain inspectable in the full gallery without being
  published as proof examples.
- homepage gallery links count passing cases, not total checked cases.
- gallery cards show bitmap input, exported SVG preview, primitive contract
  family, selected kind, anchor/node counts, raster errors, and detailed QA
  metrics.
- complex real-image illustrations remain out of the homepage until their own
  semantic and visual contracts are strong enough.
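
The passing-contract rule for the homepage reduces to filtering before selection. A sketch assuming case records with a hypothetical `status` field; `select_gallery` and the hero/teaser counts are illustrative.

```python
def select_gallery(cases, hero_count=1, teaser_count=3):
    """Publish only passing cases; keep every case in the full QA gallery."""
    passing = [case for case in cases if case["status"] == "pass"]
    return {
        "hero": passing[:hero_count],
        "teasers": passing[hero_count:hero_count + teaser_count],
        "passing_count": len(passing),  # the count homepage links report
        "full_gallery": list(cases),  # failed cases stay inspectable here
    }


cases = [
    {"id": "case-a", "status": "pass"},
    {"id": "case-b", "status": "fail"},
    {"id": "case-c", "status": "pass"},
]
selection = select_gallery(cases)
print([case["id"] for case in selection["hero"] + selection["teasers"]])
# → ['case-a', 'case-c']
print(selection["passing_count"])  # → 2
```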

Remaining:

- none for the current primitive gallery baseline; future public galleries
  should keep the same passing-contract rule when real-image cases become
  promotable.

## M18: Synthetic Chart Corpus

Status: implemented for the first deterministic `chart_basic` corpus slice.

Purpose: create more labeled model-training and evaluation data without waiting
for every new real-image family to be sourced, licensed, and reviewed.

The first target family is `chart_basic`: compact bar-chart scenes with known
source SVG structure, rendered PNG inputs, chart-specific target labels, and
text-like label regions that do not require OCR.

Implemented so far:

- `morphea synthetic-corpus` command with config support.
- `chart_basic` generator for source SVG, rendered PNG, source-object manifest,
  corpus JSON, and Markdown report.
- chart-specific `chart_structure_targets` labels for axes, bars, grid lines,
  legend swatches, plot area, and text-like labels.
- `anchor_kind_targets` and source-object summaries so the same corpus can feed
  generic raster-target diagnostics without a new model architecture.
- checked-in smoke configs under `docs/synthetic-corpus/chart-basic/`.
- MLX-expected `train-raster-targets` / `eval-raster-targets` smoke using
  `target_label_key: chart_structure_targets`; `--allow-unavailable` remains a
  diagnostic escape hatch, not quality evidence.
- Markdown reports and docs that separate synthetic training/evaluation
  evidence from curated real-image promotion evidence.

Acceptance evidence:

- `PYTHONPATH=src python3 -m unittest tests.test_synthetic_corpus`
- `PYTHONPATH=src python3 -m morphea.cli synthetic-corpus --config docs/synthetic-corpus/chart-basic/synthetic-corpus.json`
- `PYTHONPATH=src python3 -m morphea.cli train-raster-targets --config docs/synthetic-corpus/chart-basic/train-raster-targets.json --allow-unavailable`
- `PYTHONPATH=src python3 -m morphea.cli eval-raster-targets --config docs/synthetic-corpus/chart-basic/eval-raster-targets.json`

Acceptance criteria:

- generated chart corpus is deterministic by seed
- every example records source SVG, rendered PNG, split, target labels, and
  source-object provenance
- existing generic raster-target training/evaluation can train and report on
  chart targets without a new model architecture
- reports separate synthetic-family evidence from real-image promotion evidence
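
Seed determinism follows when every random draw comes from one seeded generator. This sketch renders a tiny bar chart as SVG text plus a parallel target-label list; `chart_basic_svg`, the layout constants, and the label names are illustrative and do not reflect the real `chart_basic` generator or its `chart_structure_targets` vocabulary.

```python
import random


def chart_basic_svg(seed: int, bars: int = 4, width: int = 160, height: int = 100):
    """Render a tiny bar chart as SVG text plus one target label per drawn object."""
    rng = random.Random(seed)  # all randomness flows from the seed
    slot = (width - 20) // bars
    bar_width = slot // 2
    baseline = height - 10
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<line x1="10" y1="{baseline}" x2="{width - 10}" y2="{baseline}" stroke="black"/>',
    ]
    labels = ["x_axis"]
    for index in range(bars):
        bar_height = rng.randint(10, height - 30)
        x = 10 + index * slot + (slot - bar_width) // 2
        parts.append(
            f'<rect x="{x}" y="{baseline - bar_height}" width="{bar_width}" '
            f'height="{bar_height}" fill="#3b82f6"/>'
        )
        labels.append("bar")
    parts.append("</svg>")
    return "\n".join(parts), labels


svg, labels = chart_basic_svg(seed=42)
print(labels)  # → ['x_axis', 'bar', 'bar', 'bar', 'bar']
```

Emitting labels alongside the SVG from the same loop keeps source structure and target labels aligned by construction.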

Design spec:

- [Synthetic Chart Corpus Design](superpowers/specs/2026-06-15-synthetic-chart-corpus-design.md)

## M19: Typography and Glyph Groups

Status: implemented for the first deterministic `typography_basic` corpus slice.

Purpose: preserve bitmap text as coherent editable vector groups without
requiring OCR, live SVG `<text>`, or font recognition.

Implemented so far:

- `morphea synthetic-corpus` supports `family: typography_basic`.
- synthetic typography corpus for deterministic sans/serif-like vector geometry
  samples without OCR, live SVG `<text>`, or font recognition.
- source SVG, rendered PNG, source-object manifest, group records, corpus JSON,
  and Markdown report per smoke run.
- `text_group_targets` for `glyph`, `split_glyph_fragment`, `word_group`,
  `text_line_group`, `isolated_glyph_fallback`, and `shape_distractor`.
- `anchor_kind_targets`, `group_kind_targets`, source-object summaries, and
  source-SVG summaries so the corpus can use the existing raster-target model
  path without a new architecture.
- checked-in smoke configs under `docs/synthetic-corpus/typography-basic/`.
- MLX-expected `train-raster-targets` / `eval-raster-targets` smoke using
  `target_label_key: text_group_targets`; `--allow-unavailable` remains a
  diagnostic fallback, not quality evidence.
- documentation that text remains grouped vector geometry, not recognized text.

Acceptance criteria:

- generated typography samples are deterministic by seed
- UI screenshot text-like handling remains green
- reports distinguish split glyphs, merged words, and accidental merges with
  boxes, icons, or chart marks
- no OCR dependency is introduced
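
Word groups can be formed from glyph bounding boxes without any OCR. The sketch below assumes the glyphs already belong to one text line and merges neighbors whose horizontal gap is at most `max_gap`; it is a hypothetical illustration of the idea behind `word_group` targets, not the corpus generator's grouping logic.

```python
def group_glyphs(boxes, max_gap):
    """Group glyph boxes (x0, y0, x1, y1) from one text line into word groups."""
    words = []
    for box in sorted(boxes, key=lambda b: b[0]):
        if words and box[0] - max(b[2] for b in words[-1]) <= max_gap:
            words[-1].append(box)  # small gap: same word
        else:
            words.append([box])  # large gap or first glyph: start a new word
    return words


glyphs = [(20, 0, 25, 10), (0, 0, 5, 10), (6, 0, 11, 10)]
print([len(word) for word in group_glyphs(glyphs, max_gap=3)])  # → [2, 1]
```

A single `max_gap` is the simplest possible policy; split glyphs and accidental merges with nearby marks are exactly the cases where such a threshold breaks down, which is why the reports track them separately.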

Acceptance evidence:

- `PYTHONPATH=src python3 -m unittest tests.test_synthetic_corpus`
- `PYTHONPATH=src python3 -m morphea.cli synthetic-corpus --config docs/synthetic-corpus/typography-basic/synthetic-corpus.json`
- `PYTHONPATH=src python3 -m morphea.cli train-raster-targets --config docs/synthetic-corpus/typography-basic/train-raster-targets.json --allow-unavailable`
- `PYTHONPATH=src python3 -m morphea.cli eval-raster-targets --config docs/synthetic-corpus/typography-basic/eval-raster-targets.json`
- `PYTHONPATH=src python3 -m morphea.cli curated-check docs/real-images/suite.json -o /tmp/morphea-m19-curated-report.json --output-dir /tmp/morphea-m19-curated-cases --markdown /tmp/morphea-m19-curated-report.md --run`

Design spec:

- [Synthetic Typography Corpus Design](superpowers/specs/2026-06-15-synthetic-typography-corpus-design.md)

GitHub issue:

- [#10 Add typography and glyph-group reconstruction track](https://github.com/sebastian-software/morphea/issues/10)

## M20: Brand Logo Corpus

Status: implemented for the first safe synthetic slice.

Purpose: add a reproducible brand-logo corpus flow, similar to the Lucide
corpus, so logo rasterization can be tested against known SVG structure.

Planned deliverables:

- safe synthetic `brand_basic` corpus for project-generated abstract marks
- future Simple Icons-style corpus adapter with pinned upstream package/version
  after license and trademark review
- small first subset of brand marks across curves, geometry, counters, thin
  strokes, and multi-part symbols
- local render-to-PNG flow and source-SVG-derived target labels
- reports for visual similarity, primitive class, node/parameter budget,
  negative-space handling, and editability proxies
- license/trademark documentation that keeps brand assets local-only unless
  redistribution is explicitly reviewed

Acceptance criteria:

- no third-party brand SVGs or rendered PNGs are committed without license and
  trademark review
- corpus results are reproducible from a pinned source package
- evaluation separates visual similarity from editable structure
- reports make trademark-sensitive output easy to keep local-only

Implemented first-slice evidence:

- `morphea synthetic-corpus` supports `family: brand_basic`.
- checked-in smoke configs under `docs/synthetic-corpus/brand-basic/`.
- `brand_mark_targets` labels cover mark body, counter, accent, strokes, and
  wordmark bars.
- generated source manifests record `project_generated_fixture` and
  `abstract_non_trademark_mark`.
- `tests/test_synthetic_corpus.py` covers deterministic generation, CLI config
  loading, target-label training acceptance, and checked-in smoke configs.
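
The commit rule in the acceptance criteria can be expressed as a manifest check. The manifest keys `provenance_tags` and `reviews` and the `"approved"` value are assumptions; only the two tag values come from the generated manifests described above.

```python
SAFE_TAGS = {"project_generated_fixture", "abstract_non_trademark_mark"}


def committable(manifest: dict) -> bool:
    """Allow committing an asset only if it is project-generated or fully reviewed.

    The keys "provenance_tags" and "reviews" are hypothetical manifest fields.
    """
    if SAFE_TAGS <= set(manifest.get("provenance_tags", [])):
        return True
    reviews = manifest.get("reviews", {})
    return reviews.get("license") == "approved" and reviews.get("trademark") == "approved"


print(committable({"provenance_tags": ["project_generated_fixture", "abstract_non_trademark_mark"]}))  # → True
print(committable({"reviews": {"license": "approved"}}))  # → False
```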

GitHub issue:

- [#7 Add Simple Icons-style brand-logo corpus flow](https://github.com/sebastian-software/morphea/issues/7)

## M21: Streamline Corpus Evaluation

Status: implemented for the first attributed PNG fixture.

Purpose: evaluate Streamline open/free icon and illustration sets as a
professional designer-authored corpus source for dual-tone icons and flat
multi-color illustration regions.

Planned deliverables:

- source audit for Streamline sets, licenses, attribution, and repository usage
- first subset from explicitly open-source or CC BY 4.0 Streamline vectors
- local SVG-to-PNG render flow and Morphēa reconstruction comparison
- reports for palette recovery, grouping, layer depth, primitive structure, and
  cut-out handling
- attribution text and source links when assets are committed or published

Acceptance criteria:

- free-but-not-open-source sets stay out of committed corpus assets until
  explicitly reviewed
- first subset includes at least one dual-tone icon and one flat multi-color
  illustration-like asset if available in the open-source set
- generated render/vectorize artifacts write to `/tmp` or configured output
  directories by default
- evaluation distinguishes source-SVG structure from recovered editable
  structure

GitHub issue:

- [#11 Evaluate Streamline open/free sets as corpus sources](https://github.com/sebastian-software/morphea/issues/11)

Current source-audit stance:

- a small checked-in Streamline subset is acceptable when the selected source is
  explicitly redistributable and attribution is recorded beside the assets
- `docs/source-audits/streamline-icons.md` records the inclusion rule and the
  current npm/source check
- proposed files should be staged under
  `assets/source-candidates/streamline/<set-name>/` with a completed
  `SOURCE.md` before fixture promotion
- `morphea source-candidate-audit` validates candidate metadata, selected SVG
  existence, and SVG parseability before any curated fixture promotion
- `streamline-icons@0.1.12` is not accepted as the source yet because its
  package metadata does not prove official Streamline provenance and the packed
  tarball lacks an inspectable `LICENSE` file
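
A minimal sketch of the three checks that `morphea source-candidate-audit` is described as performing (metadata present, selected SVGs exist, SVGs parse), using only the standard library. `audit_candidate` is hypothetical and treats the presence of `SOURCE.md` as the whole metadata check, which is a simplification of the real audit.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

SVG_TAGS = {"svg", "{http://www.w3.org/2000/svg}svg"}


def audit_candidate(candidate_dir: Path, selected: list) -> dict:
    """Check metadata presence, selected-file existence, and SVG parseability."""
    problems = []
    if not (candidate_dir / "SOURCE.md").is_file():
        problems.append("missing SOURCE.md")
    for name in selected:
        path = candidate_dir / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as error:
            problems.append(f"unparseable {name}: {error}")
            continue
        if root.tag not in SVG_TAGS:
            problems.append(f"not an SVG root: {name}")
    return {"ok": not problems, "problems": problems}
```

Collecting every problem before returning, rather than stopping at the first, gives a reviewer the full list in one pass. For untrusted downloads, a hardened parser such as `defusedxml` would be a safer choice than the standard-library parser.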

Implemented first-fixture evidence:

- `assets/curated/streamline-ultimate-colors-anchor/input.png`
- `assets/curated/streamline-ultimate-colors-anchor/COPYRIGHT.md`
- `assets/curated/streamline-ultimate-colors-anchor/source-manifest.json`
- `docs/real-images/suite.json` case `streamline-ultimate-colors-anchor`
- `docs/real-images/streamline-ultimate-colors-anchor.md`
- local source-candidate audit for `Streamline Ultimate Colors Free` is
  expected to remain `ok=true`
- `curated-check docs/real-images/suite.json --run` checks the Streamline case
  as one of 8 curated cases

## M22: Curated Diagram and Flowchart Cases

Status: implemented for the first fixture.

Purpose: add curated process diagram and flowchart cases that stress boxes,
connectors, arrows, labels, and scene grouping.

Planned deliverables:

- curated suite entries for simple but realistic flowcharts
- region contracts for boxes, rounded boxes, decision diamonds, connector
  strokes, arrowheads, and bounded text-like regions
- diagnostics for missing arrowheads, broken connectors, over-fragmented text,
  and wrong box geometry
- focused detector or grouping regression tests for any required behavior

Acceptance criteria:

- existing curated cases remain green
- new diagram cases report explicitly in `curated-check`
- text is bounded/grouped, not OCR-recognized
- connector and arrow failures are visible in JSON and Markdown reports

GitHub issue:

- [#1 Add curated diagram/flowchart reconstruction case](https://github.com/sebastian-software/morphea/issues/1)

Current implementation evidence:

- checked-in deterministic `synthetic-flowchart-basic` fixture under
  `assets/curated/synthetic-flowchart-basic/`, including source PNG, source SVG,
  and source-object manifest
- `docs/real-images/suite.json` case with shape, visual-fidelity, grouping,
  primitive-ratio, and fragmentation expectations
- `docs/real-images/synthetic-flowchart-basic.md` records source provenance and
  quality boundary
- detector regression coverage for filled decision diamonds as `quad` anchors
- review decision plans explicitly defer the synthetic case from accepted
  real-image harvests
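
Filled decision diamonds are kept as `quad` anchors, but a region contract may still need to tell a box from a diamond. One hypothetical test: a box has axis-aligned edges, while a diamond has axis-aligned diagonals. `classify_quad`, its labels, and its pixel tolerance are illustrative assumptions, not Morphēa's detector.

```python
def classify_quad(corners, tolerance=1.0):
    """Label four corners (in drawing order) as a box, a decision diamond, or a generic quad.

    tolerance is a hypothetical pixel slack for treating a segment as axis-aligned.
    """

    def axis_aligned(p, q):
        return abs(p[0] - q[0]) <= tolerance or abs(p[1] - q[1]) <= tolerance

    edges = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    diagonals = [(corners[0], corners[2]), (corners[1], corners[3])]
    if all(axis_aligned(p, q) for p, q in edges):
        return "box"
    if all(axis_aligned(p, q) for p, q in diagonals):
        return "decision_diamond"
    return "quad"


print(classify_quad([(0, 0), (40, 0), (40, 20), (0, 20)]))  # → box
print(classify_quad([(20, 0), (40, 15), (20, 30), (0, 15)]))  # → decision_diamond
```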

Current quality boundary:

- the fixture is project-generated and safe to publish, but remains yellow
  synthetic curated evidence rather than accepted real-image promotion evidence
- the start/process boxes, decision diamond, connector strokes, and connector
  grouping are under mechanical suite gates
- arrowheads and label marks are visible bounded review evidence; OCR remains
  out of scope

## M23: Multi-Color Illustration Cases

Status: implemented first fixture.

Purpose: stress Morphēa against richer flat-color illustration scenes with
overlapping regions, repeated colors, antialiasing, and foreground/background
structure.

Planned deliverables:

- curated suite entries for clean multi-color illustration-style images
- region-level expectations for palette grouping, simple shapes, organic
  fallback limits, and layer depth
- reports that expose same-color grouping versus over-splitting
- optional MLX/SAM proposal comparisons under the same geometry gates

Acceptance criteria:

- existing real-image cases remain green
- new illustration cases document review status before baseline snapshot updates
- reports separate palette recovery, layer depth, and editable structure
- AI proposals do not bypass editability or promotion gates

GitHub issue:

- [#2 Add curated multi-color vector illustration case](https://github.com/sebastian-software/morphea/issues/2)

Current implementation evidence:

- checked-in deterministic `synthetic-multicolor-illustration-basic` fixture
  under `assets/curated/synthetic-multicolor-illustration-basic/`, including
  source PNG, source SVG, and source-object manifest
- `docs/real-images/suite.json` case with primitive, fallback, same-color
  grouping, contact grouping, visual-fidelity, primitive-ratio, and
  fragmentation expectations
- `docs/real-images/synthetic-multicolor-illustration-basic.md` records source
  provenance and quality boundary
- review decision plans explicitly defer the synthetic case and keep it out of
  accepted real-image harvests

Current quality boundary:

- the fixture is project-generated and safe to publish, but remains yellow
  synthetic curated evidence rather than accepted real-image promotion evidence
- repeated colors, overlapping regions, and contact-pair grouping are under
  mechanical suite gates
- organic and occluded illustration regions remain bounded fallback evidence
  until richer layer-aware illustration handling is added

## M24: Chart and Infographic Curated Cases

Status: implemented for the first safe synthetic chart curated fixture.

Purpose: validate chart reconstruction on real or local screenshot-like
examples after the synthetic chart corpus establishes the first target labels.

Implemented so far:

- checked-in deterministic `synthetic-chart-basic` fixture under
  `assets/curated/synthetic-chart-basic/`, including source PNG, source SVG,
  and source-object manifest generated from the M18 `chart_basic` corpus.
- curated suite entry in `docs/real-images/suite.json` for a compact bar-chart
  scene with axes, grid lines, bars, legend swatch, plot area, and text-like
  label rectangles.
- expectations for editable rect marks, editable stroke linework,
  `parallel_stroke_group` linework grouping, primitive ratio, and bounded
  fragmentation.
- promotion metadata that keeps the case `yellow` as synthetic chart evidence
  rather than real-image promotion proof.
- source-region gates and visual thresholds for plot-region review artifacts.
- focused test coverage that the checked-in real-image suite includes the
  chart fixture and metadata.

Acceptance criteria:

- existing curated cases and primitive quality gates remain green
- chart reports expose whether repeated elements are preserved as coherent
  groups
- OCR remains out of scope for the first curated chart milestone
- synthetic chart evidence and curated chart evidence are reported separately

Acceptance evidence:

- `PYTHONPATH=src python3 -m morphea.cli curated-check docs/real-images/suite.json -o /tmp/morphea-m24-curated-report.json --output-dir /tmp/morphea-m24-curated-cases --markdown /tmp/morphea-m24-curated-report.md --run`
- `synthetic-chart-basic` reports `ok=true`, `pipeline_quality_label=yellow`,
  12 `rect` anchors, 6 editable stroke anchors across `stroke_polyline` and
  `stroke_path`, and 2 `parallel_stroke_group` groups.
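
The recorded counts can be checked mechanically against a manifest. This sketch
assumes a hypothetical manifest shape with `anchors` and `groups` lists whose
entries carry a `kind` field; the real recognition manifest layout may differ,
so treat those field names as placeholders.

```python
from collections import Counter
from typing import Any, Dict

# Hypothetical manifest layout: {"anchors": [{"kind": ...}], "groups": [{"kind": ...}]}.
EDITABLE_STROKE_KINDS = ("stroke_polyline", "stroke_path")
RECORDED_M24_COUNTS = {"rect": 12, "editable_strokes": 6, "parallel_stroke_group": 2}


def summarize_chart_manifest(manifest: Dict[str, Any]) -> Dict[str, int]:
    anchors = Counter(a["kind"] for a in manifest.get("anchors", []))
    groups = Counter(g["kind"] for g in manifest.get("groups", []))
    return {
        "rect": anchors["rect"],
        "editable_strokes": sum(anchors[kind] for kind in EDITABLE_STROKE_KINDS),
        "parallel_stroke_group": groups["parallel_stroke_group"],
    }


def matches_recorded_evidence(manifest: Dict[str, Any]) -> bool:
    # Exact comparison: any drift from the recorded counts should prompt review.
    return summarize_chart_manifest(manifest) == RECORDED_M24_COUNTS
```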

GitHub issue:

- [#3 Add curated chart/infographic reconstruction case](https://github.com/sebastian-software/morphea/issues/3)

## M25: Logo/Brand-Mark Curated Case

Status: implemented first fixture and the expanded 50-case fake-logo benchmark;
the local `--require-all-green` gate passes.

Purpose: add a safe, inspectable logo or brand-mark case that makes the
editability difference between traced paths and recovered structure easy to
understand.

Planned deliverables:

- curated logo or synthetic/internal brand-mark source image with safe licensing
- contracts for curve quality, symmetry, negative space, visual round-trip
  error, and node/parameter budget
- before/after report artifact suitable for future README or gallery use
- optional local Potrace comparison once publication and licensing are safe
- 50-case fake-logo benchmark using static project-generated PNG fixtures
- wordmark, badge, monogram, and full-logo grouping metrics for logo-heavy
  reconstruction quality
- final `all_50_green_gate` for the fake-logo set

Acceptance criteria:

- no third-party trademarked logos are published without explicit license and
  trademark review
- existing curated cases remain green
- output quality is judged by editable structure as well as visual similarity
- comparison language stays factual and artifact-based
- fake-logo PNG fixtures can be vectorized individually from
  `assets/curated/fake-logo-set/*/input.png`
- fake-logo outputs preserve the main wordmark, mark, badge, frame, underline,
  and accent structures where present
- final fake-logo support target is 50/50 green, 0 red cases, 0 missing main
  wordmarks, and at least one `logo_group` per accepted output

Implemented first-fixture evidence:

- `assets/curated/synthetic-brand-mark-basic/input.png`
- `assets/curated/synthetic-brand-mark-basic/source.svg`
- `assets/curated/synthetic-brand-mark-basic/source-manifest.json`
- `docs/real-images/suite.json` case `synthetic-brand-mark-basic`
- `docs/real-images/synthetic-brand-mark-basic.md`
- review plans defer the synthetic case and keep it out of accepted real-image
  harvests
- `curated-check docs/real-images/suite.json --run` checks 8 cases with the
  brand-mark and Streamline fixtures included

Fake-logo expansion evidence:

- `assets/curated/fake-logo-set/index.json` tracks 50 static logo cases.
- `assets/curated/fake-logo-set/fake-logo-001/input.png` through
  `assets/curated/fake-logo-set/fake-logo-050/input.png` are individually
  vectorizable PNG inputs.
- `assets/curated/fake-logo-set/contact-sheet.png` is a review sheet only, not
  the vectorizer source.
- `docs/fake-logo-wordmark-roadmap.md` defines FLW1-FLW7, including the
  refreshed 50-logo rebaseline and the final `all_50_green_gate`.
- `morphea fake-logo-benchmark assets/curated/fake-logo-set/index.json -o
  /tmp/morphea-fake-logo-benchmark-full --config
  /tmp/morphea-fake-logo-benchmark-config.json` is the FLW1 benchmark command.
- The June 15, 2026 FLW1 run vectorized 50/50 cases and produced the initial
  baseline: 50 red, 0 yellow, 0 green; mean `raster_l1_error=0.018347`, mean
  `editability_score=0.099455`.
- At that baseline the dominant blocker was semantic grouping, not crash
  handling: all 50 cases lacked `logo_group`/`wordmark_group` evidence under the
  new gate.
- After the first FLW3 benchmark-context grouping pass, the fake-logo baseline
  improved to 1 green, 45 yellow, and 4 red; the remaining red cases are
  `fake-logo-002`, `fake-logo-004`, `fake-logo-017`, and `fake-logo-049`.
- After the first FLW5 wordmark-aware scoring pass, the fake-logo baseline
  improved to 46 green, 0 yellow, and 4 red. The remaining red cases are still
  `fake-logo-002`, `fake-logo-004`, `fake-logo-017`, and `fake-logo-049`;
  `fake-logo-002` and `fake-logo-049` are visual-fidelity outliers, while
  `fake-logo-004` and `fake-logo-017` still have no detected anchors.
- After the FLW6/FLW7 repair pass, the local
  `morphea fake-logo-benchmark --require-all-green` gate passes with 50 green,
  0 yellow, and 0 red. `fake-logo-004` and `fake-logo-017` pass after
  low-contrast palette recovery; `fake-logo-002` and `fake-logo-049` pass via
  the high-fidelity retry attempt and remain the weakest visual-fidelity cases
  for future base-attempt quality work.
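
For reading the mean `raster_l1_error` figures above, one plausible definition
is the mean absolute per-channel difference between the input PNG and the
rasterized SVG, normalized to [0, 1] and then averaged across cases. This is an
assumption for illustration; the benchmark's exact metric (alpha handling,
color space, resampling) is not specified here.

```python
from statistics import fmean
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def raster_l1_error(expected: Image, actual: Image) -> float:
    """Mean absolute per-channel difference, normalized from 0-255 to 0-1."""
    if len(expected) != len(actual) or any(len(a) != len(b) for a, b in zip(expected, actual)):
        raise ValueError("images must have identical dimensions")
    return fmean(
        abs(ce - ca) / 255.0
        for row_e, row_a in zip(expected, actual)
        for pe, pa in zip(row_e, row_a)
        for ce, ca in zip(pe, pa)
    )


def mean_raster_l1_error(pairs: Iterable[Tuple[Image, Image]]) -> float:
    # Unweighted mean over cases, so small and large images count equally.
    return fmean(raster_l1_error(e, a) for e, a in pairs)
```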

GitHub issue:

- [#4 Add curated logo/brand-mark reconstruction case](https://github.com/sebastian-software/morphea/issues/4)

## M26: Artifact-Based Vectorizer Comparisons

Status: implemented first local comparison.

Purpose: turn the vectorizer comparison page into artifact-backed evidence
using small reproducible examples rather than broad marketing claims.

Planned deliverables:

- first comparison page or gallery entry with input bitmap, Morphēa output, and
  at least one local baseline such as Potrace
- metrics for primitive count, node/parameter count, layer/group structure, and
  visual error where available
- links from `docs/vectorizer-comparison.md`
- publication review for hosted or commercial tool outputs before including
  them
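
A minimal sketch of the structure metrics, assuming plain SVG input: it counts
primitive elements, `<g>` groups, and path commands in `d` attributes using only
the standard library. The path-command count is a rough node proxy chosen for
illustration (implicitly repeated commands are not counted separately), not
Morphēa's defined node/parameter metric.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict

PRIMITIVE_TAGS = {"circle", "ellipse", "rect", "line", "polyline", "polygon", "path"}
PATH_COMMAND = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]")


def svg_structure_metrics(svg_text: str) -> Dict[str, int]:
    root = ET.fromstring(svg_text)
    tags: Counter = Counter()
    path_commands = 0
    for el in root.iter():
        tag = el.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        tags[tag] += 1
        if tag == "path":
            # Letters mark commands; exponent "e" in numbers is not a command letter.
            path_commands += len(PATH_COMMAND.findall(el.get("d", "")))
    return {
        "primitives": sum(tags[t] for t in PRIMITIVE_TAGS),
        "paths": tags["path"],
        "path_commands": path_commands,
        "groups": tags["g"],
    }
```

Running the same function over the Morphēa and Potrace outputs gives directly
comparable counts, which keeps the comparison factual rather than rhetorical.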

Acceptance criteria:

- first example uses a synthetic or safely licensed input
- comparisons say where another tool is better when that is true
- hosted/commercial outputs are not published unless terms allow it
- evidence is reproducible from local artifacts where possible

GitHub issue:

- [#6 Add artifact-based vectorizer comparison examples](https://github.com/sebastian-software/morphea/issues/6)

Current implementation evidence:

- `docs/vectorizer-comparisons/synthetic-multicolor-illustration-basic/`
  contains a project-generated input reference, Morphēa output SVG, Morphēa
  manifest, Potrace PBM preprocessing artifact, Potrace output SVG, and
  `summary.json`
- `docs/vectorizer-comparison.md` links to the artifact-backed example
- the comparison says where Potrace is better: compact monochrome silhouette
  tracing after black-and-white preprocessing

Current quality boundary:

- this is a local artifact comparison, not a hosted/commercial output benchmark
- Potrace is used only as a monochrome baseline; palette and semantic primitive
  evidence are intentionally Morphēa-specific in this comparison

## M27: Public AI Story and Evidence Gallery

Status: implemented first evidence gallery.

Purpose: make Morphēa's AI-assisted workflow visible without turning the project
into a vague black-box "AI vectorizer" claim.

Planned deliverables:

- README and site copy that explain primitive classification, raster-target
  models, optional MLX/SAM proposals, reviewed pseudo-label learning, and
  synthetic training data
- simple diagram for the loop: generate/train, vectorize, harvest, review,
  compare/gate, retrain
- artifact-driven example once a stable curated or synthetic case is available
- homepage or gallery links to `docs/ai-in-morphea.md`

Implemented so far:

- README explains primitive classification, raster-target models, optional
  MLX/SAM proposals, reviewed pseudo-label learning, and synthetic training
  data without claiming universal vectorization quality.
- `docs/ai-in-morphea.md` distinguishes model-assisted reconstruction from a
  black-box converter and now names Lucide plus the synthetic chart and
  typography rendered-target corpora.
- `docs/ai-evidence-gallery.md` links to artifact-backed synthetic corpora,
  curated fixtures, runtime stance, and the local M26 comparison, with explicit
  quality boundaries for each entry.
- `docs/ai-in-morphea.md` includes a generate/train/vectorize/harvest/review/
  gate/retrain workflow diagram for the model-assisted loop.
- README links to both `docs/ai-in-morphea.md` and
  `docs/ai-evidence-gallery.md`, and site copy already avoids presenting
  complex real images as proof before their promotion gates are accepted.

Acceptance criteria:

- a newcomer can identify what AI does in Morphēa within one minute
- copy distinguishes model-assisted reconstruction from black-box generation
- no claims imply universal vectorization or guaranteed production quality
- examples include reports or review evidence, not only final SVG previews

GitHub issue:

- [#5 Clarify the public AI story for Morphēa](https://github.com/sebastian-software/morphea/issues/5)

## M28: MLX Quality Runtime Stance

Status: implemented for the current quality-runtime stance.

Purpose: clarify whether MLX is the expected quality runtime for model-backed
Morphēa workflows and reduce platform-dependent behavior.

Planned deliverables:

- README/setup docs that present Apple Silicon MLX as the recommended runtime
  for serious model-backed Morphēa development
- decision on core dependency versus explicit quality/dev install profile
- self-learning and retraining docs that default quality evidence to MLX paths
- docs that de-emphasize `allow_unavailable`, presenting it as a
  diagnostic/testing escape hatch rather than an equivalent quality mode
- separate MLX/SAM guidance that keeps SAM opt-in until checkpoint/runtime setup
  and promotion evidence are mature enough

Implemented so far:

- README presents the `.[mlx]` install as the recommended quality runtime for
  model-backed development.
- [MLX quality runtime](mlx-quality-runtime.md) records the current decision:
  core CLI remains portable, MLX is the intended quality path, and
  `allow_unavailable` is diagnostic fallback evidence rather than equivalent
  model-backed quality evidence.
- `docs/synthetic-corpus/chart-basic/train-raster-targets.json` follows the
  quality stance by omitting `allow_unavailable`; the README documents adding
  the flag only for local fallback diagnostics.
- MLX/SAM remains documented separately in `docs/mlx-sam-runtime.md` and the
  MLX/SAM smoke configs; it is not treated as a classifier or raster-target
  quality runtime.
- `morphea status` now writes `quality_runtime_audit`, a machine-readable M28
  readiness check that is `ready` only when both `classifier/mlx` and
  `classifier/raster_target` are available.
- `docs/schema.md` documents the `quality_runtime_audit` report shape and the
  fact that MLX/SAM availability does not satisfy classifier quality readiness.
- the checked-in model-backed quality smoke configs were audited: no checked-in
  JSON quality-smoke config silently enables `"allow_unavailable": true`.
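
The readiness rule can be sketched as below. The `classifier/mlx` and
`classifier/raster_target` names come from the audit description; the
`blocked` status string, the `missing` and `other_available` fields, and any
other runtime keys (such as the MLX/SAM one in the test) are assumptions, and
`docs/schema.md` remains the authority on the real report shape.

```python
from typing import Dict, List

REQUIRED_QUALITY_RUNTIMES = ("classifier/mlx", "classifier/raster_target")


def quality_runtime_audit(available: Dict[str, bool]) -> Dict[str, object]:
    """Report `ready` only when every required classifier runtime is available."""
    missing: List[str] = [name for name in REQUIRED_QUALITY_RUNTIMES if not available.get(name, False)]
    return {
        "status": "ready" if not missing else "blocked",
        "missing": missing,
        # Other runtimes (e.g. MLX/SAM) are listed but never count toward readiness.
        "other_available": sorted(
            name for name, ok in available.items() if ok and name not in REQUIRED_QUALITY_RUNTIMES
        ),
    }
```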

Remaining:

- add a formal ADR if the optional `.[mlx]` profile should become a stricter
  dev/quality profile or a core dependency in a later release.
- local MLX-backed quality smoke still requires running the same commands inside
  an environment with `.[mlx]` installed; this workspace currently reports a
  blocked quality runtime rather than silently accepting fallback evidence.

Acceptance criteria:

- new readers can tell which install path gives intended model-backed quality
  behavior
- model-backed smoke configs do not silently pass through non-MLX fallbacks
- status/docs clearly identify environments below the intended quality runtime
- MLX/SAM remains separate from the classifier/runtime stance

GitHub issue:

- [#8 Clarify and possibly require MLX for quality workflows](https://github.com/sebastian-software/morphea/issues/8)

## Commit Discipline

Continue using small Conventional Commits:

- `docs: ...` for plans, ADRs, and research notes
- `feat: ...` for new pipeline capability
- `test: ...` for test-only additions
- `fix: ...` for behavior corrections
- `perf: ...` for runtime improvements
- `chore: ...` for tooling or cleanup

Each milestone should land in multiple small commits when it naturally splits
into docs, core logic, CLI, tests, and reporting.
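
A commit-message hook could enforce this list. The sketch below checks only the
header line against the types listed above, allowing an optional scope and the
`!` breaking-change marker as the Conventional Commits format does. The function
name is hypothetical, and the sketch is not wired into any existing tooling.

```python
import re

ALLOWED_TYPES = ("docs", "feat", "test", "fix", "perf", "chore")
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$")


def check_commit_header(message: str) -> bool:
    """Accept `type(scope)!: subject` headers whose type is in the project list."""
    lines = message.splitlines()
    if not lines:
        return False
    match = HEADER.match(lines[0])
    return bool(match) and match.group("type") in ALLOWED_TYPES
```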
