
ADR 0005: Treat ICU as the Canonical Model and Gettext as a Compatibility Bridge

Accepted architecture decision record: ICU canonical model and gettext bridge.


  • Status: Accepted
  • Date: 2026-03-16

Note: The original direction in this ADR is still valid, but the concrete high-level API split is now refined by ADR 0012. The default path is ICU-native and raw-text-first; classic gettext plural handling is behind the explicit CatalogSemantics::GettextCompat bridge mode.

Update: the ferrocat-icu MessageFormat parser now exists and is part of the workspace. The durable decision remains that ICU-native catalog semantics are the long-term center, while classic gettext plural behavior is an explicit compatibility mode rather than the default semantic model.

Update: ADR 0015 adds ICU authoring diagnostics on top of this model. Ferrocat can now analyze ICU arguments, formatters, plurals, selects, and tags, and it can compare source and translation structure without taking ownership of locale-aware runtime formatting.
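One check of this kind is comparing the arguments a source message uses with those its translation uses. The sketch below extracts argument names from a simplified ICU MessageFormat v1 message and reports any arguments the translation dropped. It is not the `ferrocat-icu` parser or its diagnostics API. `argument_names` and `missing_in_translation` are hypothetical names, and the grammar deliberately ignores apostrophe quoting and plural `offset:`. It treats `#` inside plural branches as plain text.

```rust
use std::collections::BTreeSet;

/// Collects argument names from a *simplified* ICU MessageFormat v1 message.
/// Assumptions: no apostrophe quoting, no plural `offset:`; only
/// plural/select/selectordinal arguments contain nested sub-messages.
fn argument_names(msg: &str) -> Result<BTreeSet<String>, String> {
    let chars: Vec<char> = msg.chars().collect();
    let (mut pos, mut names) = (0, BTreeSet::new());
    parse_message(&chars, &mut pos, &mut names, false)?;
    Ok(names)
}

fn skip_ws(c: &[char], pos: &mut usize) {
    *pos += c[*pos..].iter().take_while(|ch| ch.is_whitespace()).count();
}

/// Reads until whitespace, end of input, or one of `stops`.
fn read_token(c: &[char], pos: &mut usize, stops: &[char]) -> String {
    let start = *pos;
    *pos += c[start..].iter().take_while(|&&ch| !ch.is_whitespace() && !stops.contains(&ch)).count();
    c[start..*pos].iter().collect()
}

/// Skips whitespace and consumes a closing '}' if one follows.
fn try_close(c: &[char], pos: &mut usize) -> bool {
    skip_ws(c, pos);
    let closed = c.get(*pos) == Some(&'}');
    *pos += usize::from(closed);
    closed
}

fn expect(c: &[char], pos: &mut usize, want: char) -> Result<(), String> {
    if c.get(*pos) != Some(&want) {
        return Err(format!("expected '{}' at offset {}", want, *pos));
    }
    *pos += 1;
    Ok(())
}

/// message := (text | '{' argument '}')*; a nested message stops at its '}'.
fn parse_message(c: &[char], pos: &mut usize, names: &mut BTreeSet<String>, nested: bool) -> Result<(), String> {
    while let Some(&ch) = c.get(*pos) {
        if ch == '}' {
            return if nested { Ok(()) } else { Err(format!("unmatched '}}' at offset {}", *pos)) };
        }
        *pos += 1;
        if ch == '{' {
            parse_argument(c, pos, names)?;
        }
    }
    if nested { Err("unterminated sub-message".to_string()) } else { Ok(()) }
}

fn parse_argument(c: &[char], pos: &mut usize, names: &mut BTreeSet<String>) -> Result<(), String> {
    let stops = [',', '{', '}'];
    skip_ws(c, pos);
    let name = read_token(c, pos, &stops);
    if name.is_empty() {
        return Err(format!("missing argument name at offset {}", *pos));
    }
    names.insert(name);
    if try_close(c, pos) {
        return Ok(()); // {name}
    }
    expect(c, pos, ',')?;
    skip_ws(c, pos);
    let kind = read_token(c, pos, &stops);
    if try_close(c, pos) {
        return Ok(()); // {name, number}
    }
    expect(c, pos, ',')?;
    if !matches!(kind.as_str(), "plural" | "select" | "selectordinal") {
        // Style such as {n, number, integer}: skip to the closing brace.
        let end = c[*pos..].iter().position(|&ch| ch == '}').ok_or("unterminated argument")?;
        *pos += end + 1;
        return Ok(());
    }
    // Options: `selector {sub-message}` pairs until the argument's closing '}'.
    while !try_close(c, pos) {
        if read_token(c, pos, &stops).is_empty() {
            return Err(format!("missing selector at offset {}", *pos));
        }
        skip_ws(c, pos);
        expect(c, pos, '{')?;
        parse_message(c, pos, names, true)?;
        expect(c, pos, '}')?;
    }
    Ok(())
}

/// Argument names used by the source but missing from the translation.
fn missing_in_translation(source: &str, translation: &str) -> Result<Vec<String>, String> {
    let (src, tr) = (argument_names(source)?, argument_names(translation)?);
    Ok(src.difference(&tr).cloned().collect())
}

fn main() {
    let source = "{count, plural, one {# file in {folder}} other {# files in {folder}}}";
    let translation = "{count, plural, one {# Datei} other {# Dateien}}";
    println!("{:?}", missing_in_translation(source, translation)); // Ok(["folder"])
}
```

The check only looks at names. It leaves locale-aware formatting alone, which matches the boundary described above. A fuller diagnostic would also compare plural and select branches.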

MessageFormat 2 is intentionally not part of this accepted model today. The project continues to target ICU MessageFormat v1 for catalog semantics because the MF2 implementation surface is still too transitional for Ferrocat's stable catalog contracts, and the near-term user value is better served by stronger MF1 validation, diagnostics, and runtime artifact checks.

Context

Ferrocat now has a high-level catalog API layered on top of the PO core:

  • parse_catalog
  • update_catalog
  • update_catalog_file

That API needs to handle two overlapping, but not identical, plural worlds:

  • ICU/CLDR-style plural categories and message structure
  • gettext Plural-Forms, msgid_plural, and msgstr[n]

These are related, but they are not the same semantic model.
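Polish makes the difference concrete. The sketch below restates two rule sets as plain functions: the widely published Polish `Plural-Forms` expression and the CLDR Polish plural rules, the latter for non-negative integers only. Neither function is Ferrocat or `icu_plurals` API. For every non-negative integer, gettext slots 0, 1, and 2 line up with CLDR `one`, `few`, and `many`. CLDR also has a fourth category, `other`, which covers fractional values, and gettext has no slot for it.

```rust
/// Gettext picks a msgstr[n] slot by evaluating the Plural-Forms expression on
/// an integer n. This mirrors the commonly used Polish header:
/// nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
fn gettext_slot_pl(n: u64) -> usize {
    if n == 1 {
        0
    } else if n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) {
        1
    } else {
        2
    }
}

/// ICU/CLDR picks a named category instead. Polish defines one, few, many and
/// other; for non-negative integers `other` never applies (CLDR uses it for
/// fractional values such as 1.5), so every remaining integer is `many`.
fn cldr_category_pl(n: u64) -> &'static str {
    if n == 1 {
        "one"
    } else if (2..=4).contains(&(n % 10)) && !(12..=14).contains(&(n % 100)) {
        "few"
    } else {
        "many"
    }
}

fn main() {
    // The slot order a bridge would have to assume for Polish: msgstr[0..=2].
    let slot_to_category = ["one", "few", "many"];
    for n in 0..10_000u64 {
        assert_eq!(slot_to_category[gettext_slot_pl(n)], cldr_category_pl(n));
    }
    // The models agree on integers, but CLDR's `other` has no gettext slot.
    println!("Polish: 3 gettext slots vs. 4 CLDR categories");
}
```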

ICU/MessageFormat is the richer long-term target:

  • structured plurals and selects
  • better future fit for modern i18n workflows
  • a stronger basis for validation and later compiler work

Gettext remains important because real projects still need to:

  • read and update existing .po catalogs
  • migrate from gettext-style plurals toward ICU
  • export back into gettext-based toolchains when required

The project already uses icu_plurals for locale-aware plural categories and now has a conservative PluralProfile bridge for gettext slot ordering and Plural-Forms handling.
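A bridge of this kind has to map gettext's numbered `msgstr[n]` slots onto named ICU categories. The sketch below rests on three assumptions. A hypothetical `SlotProfile` lists the CLDR category for each slot in order and stands in for whatever Ferrocat's `PluralProfile` actually stores. The slot texts already use ICU syntax (`#` rather than `%d`). No escaping is needed. In keeping with the conservative stance of this ADR, the sketch returns an error instead of guessing in two cases: the slot count does not match, or no slot maps to the `other` branch that ICU plurals require.

```rust
/// Hypothetical slot profile: the CLDR category for each gettext msgstr[n]
/// slot, in slot order. Illustrative only; not Ferrocat's PluralProfile.
struct SlotProfile {
    categories: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
enum BridgeError {
    /// The entry has a different number of msgstr slots than the profile.
    SlotCountMismatch { expected: usize, found: usize },
    /// No slot maps to `other`, which every ICU plural argument needs.
    MissingOther,
}

/// Converts gettext msgstr[n] slots into an ICU plural argument.
/// Assumes slot texts are already ICU-safe; no escaping is performed.
fn gettext_to_icu(arg: &str, msgstr: &[&str], profile: &SlotProfile) -> Result<String, BridgeError> {
    if msgstr.len() != profile.categories.len() {
        return Err(BridgeError::SlotCountMismatch {
            expected: profile.categories.len(),
            found: msgstr.len(),
        });
    }
    if !profile.categories.contains(&"other") {
        // Refuse to invent an `other` branch from another slot's text.
        return Err(BridgeError::MissingOther);
    }
    let branches: Vec<String> = profile
        .categories
        .iter()
        .zip(msgstr)
        .map(|(category, text)| format!("{} {{{}}}", category, text))
        .collect();
    Ok(format!("{{{}, plural, {}}}", arg, branches.join(" ")))
}

fn main() {
    let english = SlotProfile { categories: vec!["one", "other"] };
    let icu = gettext_to_icu("count", &["# file", "# files"], &english).unwrap();
    println!("{icu}"); // {count, plural, one {# file} other {# files}}

    let polish = SlotProfile { categories: vec!["one", "few", "many"] };
    let result = gettext_to_icu("count", &["# plik", "# pliki", "# plików"], &polish);
    println!("{result:?}"); // Err(MissingOther)
}
```

The Polish case produces a diagnostic rather than a rewrite. Copying the `many` text into `other` would be exactly the kind of speculative choice this ADR avoids.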

Decision

Treat ICU/MessageFormat v1 as the canonical internal model.

Treat gettext as a compatibility bridge around that model.

Concretely:

  • the high-level API stays ICU-first by default
  • gettext import and export remain supported, but conservatively
  • existing Plural-Forms metadata should be respected where possible
  • automatic gettext header generation should only happen for clearly safe cases
  • unclear, lossy, or mismatched gettext plural situations should produce diagnostics instead of speculative rewrites

We explicitly do not make full gettext plural/header parity a short-term architectural goal.
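The header rules above amount to a three-way choice:

- keep an existing, consistent `Plural-Forms` value;
- generate one only for a clearly safe locale;
- otherwise emit a diagnostic.

Here is a minimal sketch. It assumes a hypothetical `plan_plural_header` function and a deliberately tiny safe-locale table. The headers it uses are the standard GNU gettext values for English, German, and Japanese. Ferrocat's real rules and types may differ.

```rust
/// Hypothetical outcome when preparing a gettext Plural-Forms header.
#[derive(Debug, PartialEq)]
enum HeaderAction {
    KeepExisting(String),
    Generate(String),
    Diagnose(String),
}

/// Illustrative "clearly safe" table; a real bridge would define its own.
fn known_safe_header(locale: &str) -> Option<&'static str> {
    match locale {
        "en" | "de" => Some("nplurals=2; plural=(n != 1);"),
        "ja" => Some("nplurals=1; plural=0;"),
        _ => None,
    }
}

/// Extracts the nplurals value from a Plural-Forms header value.
fn nplurals_of(header: &str) -> Option<usize> {
    header
        .split(';')
        .map(str::trim)
        .find_map(|part| part.strip_prefix("nplurals="))
        .and_then(|value| value.trim().parse().ok())
}

/// Respect existing metadata, generate only for safe cases, otherwise diagnose.
fn plan_plural_header(locale: &str, existing: Option<&str>, catalog_slots: usize) -> HeaderAction {
    match existing {
        Some(header) => match nplurals_of(header) {
            Some(n) if n == catalog_slots => HeaderAction::KeepExisting(header.to_string()),
            Some(n) => HeaderAction::Diagnose(format!(
                "Plural-Forms declares {} forms but entries use {}",
                n, catalog_slots
            )),
            None => HeaderAction::Diagnose("Plural-Forms has no parsable nplurals".to_string()),
        },
        None => match known_safe_header(locale) {
            Some(header) if nplurals_of(header) == Some(catalog_slots) => {
                HeaderAction::Generate(header.to_string())
            }
            _ => HeaderAction::Diagnose(format!("no safe Plural-Forms default for {}", locale)),
        },
    }
}

fn main() {
    println!("{:?}", plan_plural_header("en", None, 2)); // Generate("nplurals=2; plural=(n != 1);")
    println!("{:?}", plan_plural_header("pl", None, 3)); // Diagnose("no safe Plural-Forms default for pl")
}
```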

Consequences

Positive:

  • the long-term semantic center of the library is clearer
  • ICU-focused future work has a cleaner foundation
  • gettext support stays useful without dominating internal design
  • diagnostics become the preferred tool for bridge ambiguity

Negative:

  • gettext support is intentionally "good and conservative", not maximal historical parity
  • some locales or headers will remain partially manual instead of fully auto-generated
  • roundtrip fidelity for gettext edge cases depends more on existing metadata than on aggressive inference