ADR 0011: Add an NDJSON Storage Format for High-Level Catalog Workflows

Accepted architecture decision record: ndjson catalog storage format.

ADR 0011: Add an NDJSON Storage Format for High-Level Catalog Workflows

Status: Accepted
Date: 2026-03-18

Context

ferrocat already has a clean separation between:

the low-level PO core
the higher-level canonical catalog API

That split makes it possible to add a second storage format without rewriting the PO hot path or forcing the public catalog model back into gettext-centric shapes.

There is also growing interest in a storage format that is:

simpler to parse than PO
easy to diff and stream
easier to merge in large repositories because each message can live on its own line
friendlier to AI/pipeline-style tooling
already close to likely translation exchange payloads

For this use case, the important semantic center is not raw PO fidelity. It is the canonical catalog model built around:

msgid
optional msgctxt
current translation value
comments, origins, flags, and obsolete state
ICU/text-first native catalog semantics

Decision

Add a second high-level catalog storage format: CatalogStorageFormat::Ndjson.

This format is intentionally:

available only through the high-level catalog API
explicit rather than inferred from file extension
canonical-model-oriented rather than PO-roundtrip-oriented

The wire shape is:

a small frontmatter header between --- lines
one JSON object per message line

Required frontmatter:

format: ferrocat.ndjson.v1

Supported frontmatter keys in v1:

format
locale
source_locale

Message records use:

id for msgid
ctx for msgctxt
str for the translation string
optional comments, origin, obsolete, and extra

Plural messages stay flattened as ICU strings in id and str.

The later semantics split in ADR 0012 makes this more explicit:

NDJSON belongs to the CatalogSemantics::IcuNative path
classic gettext plural compatibility stays on the PO-only compat path

We explicitly do not introduce:

file-extension autodetect in v1
a low-level borrowed NDJSON document API
full PO header fidelity inside the NDJSON format

Consequences

Positive:

high-level catalog workflows can now read and write a simpler storage format
message payloads become easier to stream, diff, batch, and hand to external tooling
one-message-per-line records make pull request review and ordinary Git conflict handling more practical for large teams, without requiring custom merge drivers for routine catalog edits
the PO core stays specialized and performance-focused
NDJSON stays aligned with the canonical model instead of re-encoding gettext slot details
Palamedes and other host adapters get a storage option that fits generated app catalogs and large-team review flows better than PO in some projects

Negative:

NDJSON is a new public file contract that must now be versioned and documented
some PO-specific fidelity, especially arbitrary headers, is intentionally not represented
the public catalog API now carries an explicit storage-format switch

Alternatives Considered

Replace PO with NDJSON

Rejected because the low-level PO parser and serializer are still core product features with strong performance and compatibility goals.

Infer storage format from file extension

Rejected for v1 because explicit format choice is easier to reason about and safer for public APIs and programmatic callers.

Use YAML or TOML for the new high-level storage format

Rejected for this first experiment because line-delimited JSON is simpler to parse, easier to stream, and naturally aligned with one-message-per-record workflows.