API Overview

Guide to Ferrocat's public API layers and when to use each entry point.


Ferrocat's public API is organized around jobs, not around command-line tool names. Start from the catalog problem you need to solve:

read or write translator-friendly catalog files
merge newly extracted messages into existing translations
keep AI translation metadata next to translated catalog entries
audit whether a source and target catalog set is ready to ship
compile runtime payloads with stable keys and explicit fallback behavior
analyze richer messages that contain placeholders, formatting, plurals, selects, or tags

Use this page when you know the catalog task you want to perform and need the right Rust entry point.

If you want the product-level view first, read Getting Started or Catalog Modes. If you already know established Gettext-style tooling and want to map those jobs to Ferrocat, read Gettext Task Landscape.

If you want the application-framework view, read Ferrocat And Palamedes. Palamedes owns JS/TS extraction, macros, framework adapters, and runtime integration; Ferrocat owns the catalog semantics those layers should share.

Supported Catalog Modes

At the high-level catalog layer, Ferrocat supports three explicit combinations of storage format and message semantics:

Mode	Storage format	Message model
Classic Gettext catalog mode	Gettext PO	Gettext-compatible plurals
ICU-native Gettext PO mode	Gettext PO	ICU MessageFormat
ICU-native NDJSON catalog mode	NDJSON catalog storage	ICU MessageFormat

NDJSON + Gettext-compatible plurals is intentionally unsupported. In API terms, that means CatalogStorageFormat::Ndjson is only available with CatalogSemantics::IcuNative.
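The storage/semantics rule can be written as an exhaustive match, which makes the one rejected pairing explicit. This is a minimal sketch: `StorageFormat`, `Semantics`, and `catalog_mode` are hypothetical local stand-ins that mirror `CatalogStorageFormat` and `CatalogSemantics`. They are not Ferrocat's own definitions.

```rust
// Hypothetical mirrors of the storage and semantics choices; illustrative
// only, not Ferrocat's own type definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StorageFormat {
    Po,
    Ndjson,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Semantics {
    IcuNative,
    GettextCompat,
}

/// Returns the mode name for a supported combination, or `None` for the
/// intentionally unsupported NDJSON + Gettext-compatible plurals pairing.
pub fn catalog_mode(storage: StorageFormat, semantics: Semantics) -> Option<&'static str> {
    match (storage, semantics) {
        (StorageFormat::Po, Semantics::GettextCompat) => Some("classic Gettext catalog mode"),
        (StorageFormat::Po, Semantics::IcuNative) => Some("ICU-native Gettext PO mode"),
        (StorageFormat::Ndjson, Semantics::IcuNative) => Some("ICU-native NDJSON catalog mode"),
        // NDJSON storage only pairs with ICU-native semantics.
        (StorageFormat::Ndjson, Semantics::GettextCompat) => None,
    }
}
```

Because the match has no wildcard arm, adding a new storage format or semantics variant would force every combination to be decided explicitly.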

NDJSON is the line-delimited JSON storage choice. It is useful when catalogs are maintained by larger teams, automation, or external systems that benefit from one message per line: diffs stay focused, merge conflicts are easier to isolate, and hosted Git review workflows do not need custom merge-driver support to make routine catalog edits manageable.

Quick Choice

If you want to...	Use
Parse a Gettext PO file into an owned Rust structure	`parse_po`
Parse a Gettext PO file while borrowing from the input string where possible	`parse_po_borrowed`
Turn a `PoFile` back into Gettext PO text	`stringify_po`
Merge fresh extracted Gettext messages into an existing Gettext PO file	`merge_catalog`
Combine multiple catalogs with deterministic conflict and selection rules	`combine_catalogs`
Read a Gettext PO or NDJSON catalog into the higher-level canonical catalog model	`parse_catalog`
Build keyed lookups and helpers on top of a parsed catalog	`ParsedCatalog::into_normalized_view`
Audit source and target catalogs for release readiness	`audit_catalogs`
Derive the default stable runtime key from `msgid` and `msgctxt`	`compiled_key`
Compile a normalized catalog into runtime lookup entries	`NormalizedParsedCatalog::compile`
Compile a requested-locale runtime artifact with fallbacks and missing reports	`compile_catalog_artifact`
Compile only a selected subset of compiled runtime IDs	`compile_catalog_artifact_selected`
Perform a full in-memory catalog update	`update_catalog`
Perform a full catalog update and write the result to disk only when changed	`update_catalog_file`
Compute the metadata hash for a machine-generated translation	`machine_translation_hash`
Parse ICU MessageFormat into a structural AST	`parse_icu`
Only validate ICU syntax	`validate_icu`
Summarize ICU arguments, formatters, plurals, selects, and tags	`analyze_icu`
Compare source and translated ICU message structure	`compare_icu_messages`
Extract only data argument names or only tag names	`extract_argument_names` / `extract_tag_names`
Extract variable names from a parsed ICU message	`extract_variables`

Gettext PO Core

`parse_po`

Use this when you want the normal, owned Rust representation of a Gettext PO file.

Typical use cases:

application code that wants a straightforward editable PoFile
transforms that keep parsed data around beyond the source input lifetime
tools where simplicity matters more than minimizing allocations

`parse_po_borrowed`

Use this when you want to parse without copying more than necessary.

Typical use cases:

read-heavy workflows
performance-sensitive inspection or transformation passes
benchmarks or pipelines where borrowing from the source text is helpful

Important input constraint: parse_po_borrowed currently requires LF-only input. Use parse_po for owned parsing with built-in CRLF normalization, or normalize line endings before calling the borrowed parser.
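One way to normalize line endings without giving up borrowing in the common case is a `Cow` that stays borrowed for LF-only input. The sketch below assumes a hypothetical helper, `to_lf`, which is not part of Ferrocat. It only rewrites `\r\n` pairs and leaves any lone `\r` untouched.

```rust
use std::borrow::Cow;

/// Converts CRLF line endings to LF. Input that contains no "\r\n" is
/// returned borrowed, so LF-only files cost no extra allocation.
pub fn to_lf(input: &str) -> Cow<'_, str> {
    if input.contains("\r\n") {
        Cow::Owned(input.replace("\r\n", "\n"))
    } else {
        Cow::Borrowed(input)
    }
}
```

The result can be passed on as `&to_lf(&raw)` to any parser that takes `&str`. The normalized buffer must then outlive whatever the borrowed parse result refers to.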

`stringify_po`

Use this when you already have a PoFile and want canonical Gettext PO output.

Typical use cases:

writing back modified parsed files
generating PO content from your own tooling
normalizing formatting after edits

Catalog Workflows

The high-level catalog request structs are intentionally borrowing-first:

string inputs such as catalog text and locales are accepted as &str
selected compiled IDs and fallback chains are accepted as borrowed slices
file-oriented updates accept &Path

That keeps the API ergonomic for callers while avoiding avoidable request-side allocation and clone pressure before the real catalog work even starts.

The option structs also provide new(...) constructors. These take the required content, path, locale, or input fields and leave every other field at its documented default, so typical call sites never start from an empty default value that would be invalid on its own.
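The pattern of a borrowing options struct with a required-fields constructor can be sketched as follows. `ExampleParseOptions` and its fields are hypothetical and only illustrate the shape described above. They are not the fields of Ferrocat's `ParseCatalogOptions` or any other real request struct.

```rust
// Hypothetical options struct illustrating the borrowing-first pattern;
// field names are illustrative, not Ferrocat's.
#[derive(Debug, Clone, Copy)]
pub struct ExampleParseOptions<'a> {
    /// Required: catalog text borrowed from an existing buffer.
    pub content: &'a str,
    /// Required: locale of this catalog.
    pub locale: &'a str,
    /// Optional: stays `None` unless the caller overrides it.
    pub source_locale: Option<&'a str>,
    /// Optional: borrowed fallback chain, empty by default.
    pub fallback_chain: &'a [&'a str],
}

impl<'a> ExampleParseOptions<'a> {
    /// Sets only the required fields; everything else keeps its default.
    pub fn new(content: &'a str, locale: &'a str) -> Self {
        Self { content, locale, source_locale: None, fallback_chain: &[] }
    }
}
```

Optional fields can then be overridden with struct update syntax, for example `ExampleParseOptions { source_locale: Some("en"), ..ExampleParseOptions::new(&text, "de") }`. No request string is copied in the process.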

`merge_catalog`

Use this for the basic Gettext-style merge step:

start from an existing Gettext PO catalog
feed in freshly extracted messages
keep matching translations
add new entries
mark removed entries as obsolete

This is the closest Ferrocat equivalent to the core "merge updated template/messages into an existing catalog" workflow that users often associate with GNU msgmerge.

Choose merge_catalog when you want the leaner, more direct merge operation and already have data in classic Gettext-like shapes.

In practice, this is the fast-path workflow API: it stays close to classic Gettext PO merge behavior and avoids the extra canonical catalog projection and post-processing done by update_catalog.

Ferrocat assumes exact catalog identity here: msgid plus optional msgctxt. That works for classic ID-style catalogs and for projects that use real product copy as msgid. Matching messages keep translations, new messages are added, and removed messages follow the obsolete strategy. Fuzzy matching is intentionally outside the default workflow.
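The exact-identity rule can be illustrated with an ordered map keyed by `(msgid, msgctxt)`. This is a minimal sketch, not Ferrocat's implementation. `Key`, `Entry`, and `merge_exact` are hypothetical, and the sketch hard-codes the "mark obsolete" strategy for removed messages.

```rust
use std::collections::BTreeMap;

/// Message identity: msgid plus optional msgctxt.
pub type Key = (String, Option<String>);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub msgstr: String,
    pub obsolete: bool,
}

/// Merges freshly extracted identities into an existing catalog: exact
/// matches keep their translation, new identities get an empty msgstr,
/// and identities that were not extracted again are marked obsolete.
pub fn merge_exact(existing: &BTreeMap<Key, Entry>, extracted: &[Key]) -> BTreeMap<Key, Entry> {
    let mut out = BTreeMap::new();
    for key in extracted {
        // No fuzzy matching: only an identical (msgid, msgctxt) pair matches.
        let msgstr = existing.get(key).map(|e| e.msgstr.clone()).unwrap_or_default();
        out.insert(key.clone(), Entry { msgstr, obsolete: false });
    }
    for (key, entry) in existing {
        if !out.contains_key(key) {
            out.insert(key.clone(), Entry { msgstr: entry.msgstr.clone(), obsolete: true });
        }
    }
    out
}
```

Because `msgctxt` is part of the key, `Hello` and `Hello` with context `menu` are treated as unrelated messages.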

`combine_catalogs`

Use this when you have multiple existing catalogs or templates and want one deterministic output catalog.

This is the Rust-native counterpart to the useful parts of GNU msgcat, msgcomm, and msguniq:

combine N catalogs in one call
treat msgid plus msgctxt as the message identity
keep the first translation by default, so existing catalogs can be listed before newer templates
choose UseLast or Error when overlay or strict-conflict behavior is more appropriate
select all, common, less-common, or unique identities with CatalogCombineSelection
skip obsolete definitions by default, with an explicit opt-in when obsolete entries should participate

Ferrocat does not emit GNU-style conflict-marker translations for differing strings. Non-empty translation conflicts are either resolved with a warning diagnostic or rejected with ApiError::Conflict, depending on CatalogConflictStrategy.
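The conflict handling can be sketched as a left-to-right fold over the input catalogs. This sketch makes several assumptions. `Conflict`, `UseFirst`, and `combine` are hypothetical names. Identities are plain msgid strings, and selection is fixed to "all". Empty translations are assumed never to conflict and to be filled by a later non-empty one. Warnings are plain strings rather than Ferrocat diagnostics.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-ins for the conflict strategies described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Conflict {
    UseFirst,
    UseLast,
    Error,
}

/// Combines catalogs (msgid -> translation) in input order. Differing
/// non-empty translations are resolved by the strategy (with a warning)
/// or rejected.
pub fn combine(
    catalogs: &[BTreeMap<String, String>],
    strategy: Conflict,
) -> Result<(BTreeMap<String, String>, Vec<String>), String> {
    let mut out: BTreeMap<String, String> = BTreeMap::new();
    let mut warnings = Vec::new();
    for catalog in catalogs {
        for (id, msgstr) in catalog {
            // Treat an existing empty translation like an absent one.
            let current = out.get(id).filter(|c| !c.is_empty()).cloned();
            match current {
                None => {
                    out.insert(id.clone(), msgstr.clone());
                }
                Some(current) if msgstr.is_empty() || current == *msgstr => {}
                Some(current) => match strategy {
                    Conflict::UseFirst => warnings.push(format!("kept first translation for `{id}`")),
                    Conflict::UseLast => {
                        warnings.push(format!("replaced `{current}` for `{id}`"));
                        out.insert(id.clone(), msgstr.clone());
                    }
                    Conflict::Error => return Err(format!("conflicting translations for `{id}`")),
                },
            }
        }
    }
    Ok((out, warnings))
}
```

Listing an existing catalog before a template keeps its translations under `UseFirst`, because the template's empty strings never override them.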

`parse_catalog`

Use this when you want more than raw Gettext PO syntax. It projects a Gettext PO or NDJSON catalog into Ferrocat's higher-level catalog model, with explicit storage and semantics choices.

Choose this when your application wants semantic catalog data rather than just PO syntax nodes.

ParseCatalogOptions borrows the source text and locale strings, so you can parse directly from existing buffers without first building owned request strings.

High-level catalog parsing now also takes an explicit storage_format:

CatalogStorageFormat::Po for classic Gettext PO catalogs
CatalogStorageFormat::Ndjson for Ferrocat's frontmatter + one-message-per-line NDJSON catalog storage

High-level parsing also takes an explicit CatalogSemantics:

CatalogSemantics::IcuNative is the default and keeps ICU/text messages raw
CatalogSemantics::GettextCompat is the explicit classic gettext plural mode

Important boundaries:

CatalogSemantics::IcuNative only supports PluralEncoding::Icu
CatalogSemantics::GettextCompat only supports PluralEncoding::Gettext
CatalogStorageFormat::Ndjson is available only with CatalogSemantics::IcuNative
native parsing no longer eagerly projects top-level ICU plurals into TranslationShape::Plural

That gives you exactly three supported modes:

classic Gettext catalog mode: Gettext PO + GettextCompat
ICU-native Gettext PO mode: Gettext PO + IcuNative
ICU-native NDJSON catalog mode: NDJSON + IcuNative

parse_catalog intentionally stays as the neutral parse step. If you want keyed lookups or effective-translation helpers, build a richer view explicitly with ParsedCatalog::into_normalized_view().

The normalized view supports both owned-key lookup with CatalogMessageKey and borrowed identity lookup with get_by_parts(msgid, msgctxt), which is useful in host adapters that already have borrowed source strings.

`NormalizedParsedCatalog::compile`

Use this when you want a runtime-facing lookup structure with stable compiled keys rather than raw gettext identities.

This sits one layer above parsed catalog lookup:

start with parse_catalog
build the normalized keyed view
compile to CompiledCatalog for runtime-oriented consumption

The default behavior keeps translations as they exist in the catalog. Optional source-locale fallback is explicit rather than automatic.

The built-in CompiledKeyStrategy::FerrocatV1 contract is intentionally compact:

SHA-256 over a versioned source-identity payload
truncated to 64 bits
encoded as unpadded Base64URL
no visible version prefix in the emitted key
hard compile failure on collisions
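The truncation and encoding steps of this contract can be sketched without a hashing dependency. The function takes a SHA-256 digest that has already been computed, because SHA-256 is not in Rust's standard library. It assumes that truncation keeps the leading bytes. Ferrocat's versioned source-identity payload is not documented here, so this does not reproduce real keys; use compiled_key for those. `base64url_unpadded` and `truncated_key` are hypothetical helpers.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encodes bytes as Base64URL (RFC 4648, section 5) without '=' padding.
pub fn base64url_unpadded(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 4 + 2) / 3);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into the high bits of a 24-bit group.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let group = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        // Unpadded: n input bytes in this chunk yield n + 1 characters.
        for i in 0..=chunk.len() {
            let index = (group >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[index as usize] as char);
        }
    }
    out
}

/// Keeps the first 64 bits (8 bytes) of a 32-byte digest and encodes them.
pub fn truncated_key(digest: &[u8; 32]) -> String {
    base64url_unpadded(&digest[..8])
}
```

Sixty-four bits encode to 11 characters, which is why compiled keys are short. The same encoder applied to 16 bytes yields the 22-character length used for a 128-bit value such as the machine-translation hash described further down.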

`audit_catalogs`

Use this when you want a read-only catalog QA report before a release or CI gate. The audit API accepts normalized catalogs, a required source locale, an optional target-locale filter, optional semantic metadata, and explicit check flags through CatalogAuditOptions.

Default checks cover:

source locale presence
requested target locale presence
missing and empty target translations
target-only active messages that look stale
visible fuzzy flags
obsolete entries
ICU syntax in active source and target messages
ICU source/translation compatibility
semantic metadata duplication, unknown source keys, and source conflicts

Diagnostics are machine-readable. Examples include:

{
  "severity": "error",
  "code": "catalog.missing_translation",
  "message": "Locale `de` is missing translation for `Hello {name}`."
}

{
  "severity": "warning",
  "code": "catalog.extra_translation",
  "message": "Locale `de` contains translation `Old CTA` that is not present in the source catalog."
}

{
  "severity": "info",
  "code": "catalog.fuzzy_flag",
  "message": "Message `Checkout` in locale `de` is marked fuzzy."
}

ICU and metadata checks reuse the stable icu.* and metadata.* diagnostic codes emitted by the underlying validation helpers. The audit layer does not compile runtime artifacts, infer fuzzy matches, use previous_msgid, or repair catalogs.
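A minimal sketch of the missing and extra translation checks shows how the diagnostic records above can be produced by a purely read-only comparison. `Diagnostic` and `audit_locale` are hypothetical. Messages are keyed by msgid only, omitting msgctxt, and the codes and message wording are copied from the examples above. The sketch assumes that empty translations are reported with the same code as missing ones; the real audit may distinguish them.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub severity: &'static str,
    pub code: &'static str,
    pub message: String,
}

/// Compares a source catalog with one target locale without modifying
/// either; both maps go from msgid to translation.
pub fn audit_locale(
    source: &BTreeMap<String, String>,
    target: &BTreeMap<String, String>,
    locale: &str,
) -> Vec<Diagnostic> {
    let mut diagnostics = Vec::new();
    // Source messages with no (or an empty) target translation.
    for msgid in source.keys() {
        let translated = target.get(msgid).is_some_and(|s| !s.is_empty());
        if !translated {
            diagnostics.push(Diagnostic {
                severity: "error",
                code: "catalog.missing_translation",
                message: format!("Locale `{locale}` is missing translation for `{msgid}`."),
            });
        }
    }
    // Target-only messages that look stale.
    for msgid in target.keys().filter(|id| !source.contains_key(*id)) {
        diagnostics.push(Diagnostic {
            severity: "warning",
            code: "catalog.extra_translation",
            message: format!(
                "Locale `{locale}` contains translation `{msgid}` that is not present in the source catalog."
            ),
        });
    }
    diagnostics
}
```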

`compiled_key`

Use this when a host adapter, source transform, or manifest builder needs the same default runtime key that Ferrocat emits during catalog compilation, but only has msgid and optional msgctxt available.

This is the public, host-facing helper for the current default key contract. It corresponds to CompiledKeyStrategy::FerrocatV1.

`compile_catalog_artifact`

Use this when you want the final host-neutral runtime artifact for one requested locale instead of one catalog's typed lookup payload.

This sits one step above NormalizedParsedCatalog::compile:

start from one or more normalized catalogs
choose a requested locale and source locale
optionally provide a fallback chain
compile a final runtime map from compiled key to ICU string
collect missing-message records for non-source locales
validate the final runtime strings as ICU messages

Choose this when your downstream tooling needs locale resolution semantics centralized in Ferrocat instead of rebuilding them in a host adapter.

This is the main boundary for Palamedes-style integrations. Ferrocat should decide the effective locale artifact; Palamedes or another host adapter should decide how that artifact becomes framework modules, sidecars, or runtime payloads.

CompileCatalogArtifactOptions borrows locale strings and the fallback-chain slice, which keeps host-side request assembly cheap even when compilation is performance-sensitive.

Important semantics:

only non-obsolete messages participate in artifact compilation
empty non-source translations are treated as unresolved and can fall through to the fallback chain
source fallback is explicit for non-source locale compilation
source-locale compilation always materializes empty source values from source text
plural messages are emitted as final ICU plural strings using the preserved plural variable
invalid final ICU strings become diagnostics by default and can become hard errors in strict mode
source/translation ICU compatibility diagnostics are opt-in with icu_compatibility
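The fallback rules in this list can be illustrated for a single message. This is a simplified sketch, not Ferrocat's implementation. `Catalog` and `resolve` are hypothetical. Obsolete entries are assumed to be filtered out already, and plural rendering and ICU validation are left out.

```rust
use std::collections::BTreeMap;

/// One catalog: its locale plus msgid -> translation.
pub struct Catalog<'a> {
    pub locale: &'a str,
    pub messages: BTreeMap<&'a str, &'a str>,
}

/// Resolves one message for `requested`. `None` means the message stays
/// missing and would be reported as a missing-message record.
pub fn resolve<'a>(
    msgid: &'a str,
    requested: &str,
    fallback_chain: &[&str],
    source_locale: &str,
    source_fallback: bool,
    catalogs: &[Catalog<'a>],
) -> Option<&'a str> {
    let lookup = |locale: &str| {
        catalogs
            .iter()
            .find(|c| c.locale == locale)
            .and_then(|c| c.messages.get(msgid).copied())
    };
    if requested == source_locale {
        // Source-locale compilation materializes empty values from source text.
        return Some(lookup(source_locale).filter(|s| !s.is_empty()).unwrap_or(msgid));
    }
    // Empty non-source translations are unresolved and fall through.
    for &locale in std::iter::once(&requested).chain(fallback_chain) {
        if let Some(text) = lookup(locale).filter(|s| !s.is_empty()) {
            return Some(text);
        }
    }
    // Falling back to the source text is explicit, never automatic.
    source_fallback.then_some(msgid)
}
```

For example, with `de-AT` requested and the chain `["de"]`, a message missing from `de-AT` but translated in `de` resolves to the German text. A message that is empty everywhere stays missing unless source fallback is enabled.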

`compile_catalog_artifact_selected`

Use this when a host adapter already knows the exact compiled runtime IDs it needs and wants only that slice of a requested-locale artifact.

This is the narrower companion to compile_catalog_artifact:

build or reuse a CompiledCatalogIdIndex
pass only the selected compiled IDs
keep the same fallback, missing, and ICU-validation semantics
return the same CompiledCatalogArtifact shape, but filtered to the requested subset

Choose this when a bundler/plugin layer has already mapped modules or chunks to the exact message IDs they require.

Like the broader artifact API, the request struct borrows locale data and selection slices, so callers can reuse existing vectors or arrays of compiled IDs without another owned wrapper.

`CompiledCatalogIdIndex`

Use this when you need stable compiled-ID metadata without compiling message payloads immediately.

Useful helpers now include:

iter for deterministic compiled-ID traversal
as_btreemap / into_btreemap when another tool wants the raw ordered mapping
describe_compiled_ids to ask which requested IDs are known, available in a given catalog set, and whether they are singular or plural

describe_compiled_ids returns a structured report:

described for IDs that are known to the index and present in the provided catalog set
unknown_compiled_ids for IDs that do not exist in the index at all
unavailable_compiled_ids for IDs that are known to the index but not present in the provided catalog set
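The three-way split can be sketched with plain sets. `IdReport` and `describe` are hypothetical. The real `described` entries also say whether a message is singular or plural, which this sketch leaves out.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct IdReport<'a> {
    pub described: Vec<&'a str>,
    pub unknown_compiled_ids: Vec<&'a str>,
    pub unavailable_compiled_ids: Vec<&'a str>,
}

/// Partitions requested IDs: unknown to the index, known and available in
/// the catalog set, or known but unavailable.
pub fn describe<'a>(
    requested: &[&'a str],
    index: &BTreeSet<&str>,
    available: &BTreeSet<&str>,
) -> IdReport<'a> {
    let mut report = IdReport::default();
    for &id in requested {
        if !index.contains(id) {
            report.unknown_compiled_ids.push(id);
        } else if available.contains(id) {
            report.described.push(id);
        } else {
            report.unavailable_compiled_ids.push(id);
        }
    }
    report
}
```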

`update_catalog`

Use this for the full high-level catalog update path in memory.

This goes beyond a raw merge. It can:

parse an existing catalog into the canonical model
merge extracted messages from either structured catalog input (CatalogUpdateInput::Structured) or source-first messages (CatalogUpdateInput::SourceFirst)
handle locale/plural logic
apply storage-specific defaults
preserve or report diagnostics
sort and export the final catalog as PO or NDJSON

Choose update_catalog when you want a complete update operation rather than just the lower-level merge step.

Compared with merge_catalog, this is the "full semantics" path. It is the better fit when catalog correctness and consistency matter more than taking the shortest merge route, for example in release pipelines or when you want predictable headers, ordering, plural handling, and diagnostics.

UpdateCatalogOptions borrows locale strings, optional existing content, and optional custom-header maps. The extracted message payload itself stays owned, because that is usually the natural shape for upstream extractor output.

Like parsing, updates now use an explicit storage_format. PO remains the default. NDJSON storage uses a small frontmatter header plus one JSON message record per line.

Choose NDJSON when collaboration and automation matter more than maximum PO fidelity. The one-record-per-line shape keeps large catalog diffs readable and makes ordinary Git conflict handling more practical, including hosted review flows where custom .gitattributes merge drivers may not be part of the web merge path.

Updates also use an explicit CatalogSemantics:

IcuNative is the default and writes raw ICU/text messages
GettextCompat is the explicit PO-interop mode and writes classic gettext plurals

In native mode, CatalogUpdateInput::SourceFirst stays source-text-first; it no longer auto-projects ICU strings into structured plural messages. Use CatalogUpdateInput::Structured when you want explicit plural structure.

In NDJSON storage, arbitrary gettext-style custom headers are intentionally out of scope for v1; only the explicit frontmatter metadata and per-record fields are persisted. Machine-translation metadata uses the optional mt record field and remains part of ferrocat.ndjson.v1.

Machine-translation metadata

CatalogMessage::machine_translation stores optional translation-side metadata for machine-generated catalog entries. The metadata contains model, optional modified, optional confidence, and hash.

Use machine_translation_hash to compute the hash for the current translation. Ferrocat uses SHA-256 over a canonical translation payload plus the fixed ferrocat:mt:v1 namespace, truncates the digest to 128 bits, and encodes it as unpadded Base64URL. This is a change-detection marker, not a security signature.

PO catalogs store the metadata as one compact metadata comment:

#@ ferrocat-mt model=openai/gpt-5.5-high modified=2026-05-12T10:30:00Z confidence=95 hash=...
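The PO comment line above is a flat list of `key=value` tokens. A minimal sketch for splitting it follows, assuming that values never contain whitespace and that no quoting is used. That holds for the example above but is not a documented guarantee. `parse_mt_comment` is hypothetical and is not Ferrocat's parser.

```rust
use std::collections::BTreeMap;

/// Splits a `#@ ferrocat-mt key=value ...` comment into key/value pairs.
/// Returns `None` for any other comment line or a malformed token.
pub fn parse_mt_comment(line: &str) -> Option<BTreeMap<&str, &str>> {
    let rest = line.strip_prefix("#@ ferrocat-mt")?;
    // Reject lines such as "#@ ferrocat-mtx" that only share the prefix.
    if !rest.is_empty() && !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let mut fields = BTreeMap::new();
    for token in rest.split_whitespace() {
        let (key, value) = token.split_once('=')?;
        fields.insert(key, value);
    }
    Some(fields)
}
```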

NDJSON catalogs store the same metadata under mt:

{"id":"Hello","str":"Hallo","mt":{"model":"openai/gpt-5.5-high","confidence":95,"hash":"..."}}

Parsing preserves metadata even when the hash no longer matches. High-level writers such as update_catalog verify the hash while rendering and omit the whole machine-translation metadata block when the translation has changed.
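The write-time rule amounts to recomputing the hash for the current translation and dropping the whole metadata block on mismatch. In the sketch below, `stand_in_hash` uses the standard library's `DefaultHasher` only to stay dependency-free. It is not Ferrocat's SHA-256 scheme and is not stable across Rust versions. `MtMeta` and `metadata_to_write` are also hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MtMeta {
    pub model: String,
    pub hash: String,
}

/// Stand-in change-detection hash. NOT Ferrocat's algorithm, which uses
/// SHA-256 over a canonical payload plus the `ferrocat:mt:v1` namespace.
pub fn stand_in_hash(translation: &str) -> String {
    let mut hasher = DefaultHasher::new();
    "stand-in:mt".hash(&mut hasher);
    translation.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Keeps the metadata only while it still describes the current translation;
/// otherwise the whole block is omitted on write.
pub fn metadata_to_write(translation: &str, meta: Option<&MtMeta>) -> Option<MtMeta> {
    meta.filter(|m| m.hash == stand_in_hash(translation)).cloned()
}
```

Because a hand-edited translation no longer matches its stored hash, it stops being labeled as machine-generated on the next write without any extra bookkeeping.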

`update_catalog_file`

Use this when you want the same high-level behavior as update_catalog, but against a file path.

It reads the current file if present, runs the full update, and only writes back when the result actually changed.
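The read-update-compare-write flow can be sketched with `std::fs` alone. `update_file_if_changed` is a hypothetical helper, and the `update` closure stands in for the full catalog update.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `update` over the current file content (`None` if the file does not
/// exist yet) and writes the result only when it differs. Returns whether a
/// write happened.
pub fn update_file_if_changed(
    path: &Path,
    update: impl FnOnce(Option<&str>) -> String,
) -> io::Result<bool> {
    let existing = match fs::read_to_string(path) {
        Ok(text) => Some(text),
        Err(e) if e.kind() == io::ErrorKind::NotFound => None,
        Err(e) => return Err(e),
    };
    let updated = update(existing.as_deref());
    // Skipping identical writes keeps file mtimes stable for watchers and build tools.
    if existing.as_deref() == Some(updated.as_str()) {
        return Ok(false);
    }
    fs::write(path, updated)?;
    Ok(true)
}
```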

Choose this for CLI tools, task runners, or build/dev pipelines that work directly on catalog files on disk.

Like update_catalog, it accepts CatalogUpdateInput, so extractor tooling can choose between a raw source-first path and an explicitly structured plural path without having to write PO/NDJSON itself.

UpdateCatalogFileOptions borrows both the path and the locale/header inputs, so file-based automation can call it without constructing throwaway owned request objects.

ICU MessageFormat

Ferrocat's ICU APIs currently target ICU MessageFormat v1. MessageFormat 2 is not parsed, validated, converted, or formatted by the public API. That is a deliberate scope boundary: MF2 is worth tracking, but the current catalog surface gets more practical value from robust MF1 parsing, authoring diagnostics, and artifact validation.

`parse_icu`

Use this when you need the parsed ICU AST.

Typical use cases:

inspecting plural/select structure
converting ICU messages into another internal representation
extracting semantic information from messages

`validate_icu`

Use this when you only need to know whether a message is syntactically valid ICU, with an error describing the problem when it is not.

`analyze_icu`

Use this after parse_icu when you need a structured summary of message arguments, formatter kinds and styles, plural/select selectors, and rich-text tags.

`compare_icu_messages`

Use this to compare source and translated ICU messages before shipping a runtime artifact. The report uses stable diagnostic codes for missing or extra arguments, formatter kind or style changes, tag mismatches, missing select or plural selectors, plural offset changes, and discouraged pattern-style formatters.

At the catalog artifact layer, set icu_compatibility: true on CompileCatalogArtifactOptions or CompileSelectedCatalogArtifactOptions to collect the same diagnostics while compiling the final locale artifact.
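The missing/extra argument part of such a comparison is a pair of set differences. This sketch covers only that part; formatter, tag, selector, and offset checks are left out. The argument-name sets are supplied by hand rather than derived from parsed messages. `ArgDiff` and `compare_args` are hypothetical.

```rust
use std::collections::BTreeSet;

/// Argument-level difference between a source and a translated message.
#[derive(Debug, PartialEq, Eq)]
pub struct ArgDiff<'a> {
    /// Arguments in the source that the translation dropped.
    pub missing: Vec<&'a str>,
    /// Arguments the translation introduced that the source lacks.
    pub extra: Vec<&'a str>,
}

pub fn compare_args<'a>(source: &BTreeSet<&'a str>, translated: &BTreeSet<&'a str>) -> ArgDiff<'a> {
    ArgDiff {
        missing: source.difference(translated).copied().collect(),
        extra: translated.difference(source).copied().collect(),
    }
}
```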

`extract_argument_names` / `extract_tag_names`

Use these when data variables and rich-text tags need to be handled separately. The older extract_variables helper still returns the historical mixed view, in which argument names and tag names are not separated.

`extract_variables`

Use this after parse_icu when you want the variable names referenced by the message.

Semantic message metadata

Use MessageMetadataInput and normalize_message_metadata when an extractor or host adapter wants to describe source-side facts around a message without inventing a separate stable ID scheme. The required identity is still msgid plus optional msgctxt; in Palamedes-style workflows the msgid is usually the source string, while ID-style catalogs can still use an opaque key as msgid.

The metadata shape is progressive. A simple record can be just:

{ "msgid": "Cart" }

Argument metadata can use a shorthand when the host knows only the broad kind:

{
  "msgid": "Hello {name}",
  "args": {
    "name": "string"
  }
}

For ICU MessageFormat v1 messages, Ferrocat can derive normalized arguments, rich-text tags, and select/plural selector metadata from the msgid. Use validate_message_metadata to report conflicts between explicit metadata and the parsed source message. Translations remain catalog data; msgstr is not part of this source-side metadata format.

Error Surface

ApiError and the lower-level ParseError are intentionally small today. ApiError distinguishes parse, I/O, invalid-argument, conflict, and unsupported-operation failures; PO ParseError currently carries a human-readable message but not a line/column position. Adding structured PO error kinds or source positions would be a semver-relevant API change and should ship with matching rustdoc, README/API docs, and an ADR update.

Practical Rule Of Thumb

Editing raw PO files: parse_po + stringify_po
Hot-path PO inspection: parse_po_borrowed
Classic Gettext merge step: merge_catalog
N-way catalog overlays and set operations: combine_catalogs
Full app-level catalog maintenance: update_catalog or update_catalog_file
Parsed catalog consumption with keyed accessors: parse_catalog + into_normalized_view
Locale-specific runtime artifact generation: compile_catalog_artifact
Selected locale artifact generation by compiled ID: compile_catalog_artifact_selected
Release QA across catalog sets: audit_catalogs
ICU analysis: parse_icu + analyze_icu
