Gettext Task Landscape
A workflow-level map of gettext-inspired catalog jobs, ecosystem patterns, and Ferrocat's product direction.
This is an advanced reference page. You do not need to know GNU gettext to use Ferrocat; start with the Guide if you want the product-level path first.
Ferrocat is not trying to be a command-line clone of GNU gettext. It is a new catalog engine that learns from the jobs established translation tooling already proved useful: extract messages, update catalogs, preserve translator context, check release readiness, and produce runtime data applications can load.
That distinction matters because catalog jobs are broader than one tool's command-line flags. Some projects use stable IDs, some use real source strings as msgid, and many need to preserve translator-facing PO metadata either way. Ferrocat's core promise is simpler: keep catalog identity explicit, preserve translations when catalogs evolve, surface conflicts clearly, and compile the result into something applications can load efficiently.
In the Palamedes ecosystem, that means Ferrocat is the catalog and semantics layer while Palamedes is the JS/TS i18n framework layer. The two projects should be read together: one handles the catalog jobs, the other handles application authoring, extraction, adapters, and runtime integration.
Task Matrix
| Task | Gettext idea | Common ecosystem pattern | Ferrocat today | Product direction |
|---|---|---|---|---|
| Extract source messages | xgettext scans code and writes POT files | Babel uses configurable extraction methods for Python apps and templates | Not part of the core crate | Keep extraction host-specific; define clean adapter inputs instead of embedding every language parser |
| Integrate app frameworks | GNU gettext leaves this mostly to language bindings and application code | Framework packages usually own macros, hooks, loaders, and routing conventions | Ferrocat stays host-neutral; Palamedes owns JS/TS framework integration | Keep Ferrocat focused on catalog semantics and let Palamedes bind those semantics into application frameworks |
| Parse and write PO files | PO is the human-editable catalog format | pofile, gettext-parser, and polib expose editable object models | parse_po, parse_po_borrowed, stringify_po | Add richer source positions only when the public error model is ready |
| Update a catalog from new source | msgmerge carries existing translations forward into a new template | Many projects script this around gettext binaries or PO libraries | merge_catalog, update_catalog, update_catalog_file | Prefer exact source-string identity and explicit obsolete strategies over fuzzy-first behavior |
| Combine catalogs | msgcat, msgcomm, and msguniq merge or select messages by occurrence | Usually handled by shell pipelines or ad hoc scripts | combine_catalogs with explicit conflict and selection rules | Expand selection and reporting when real workflows need it |
| Reduce Git merge pain | Usually handled outside gettext through file layout, merge drivers, or process rules | Large teams often rely on line-based review and hosted pull request merges | ICU-native NDJSON stores one message per line | Treat storage format as a collaboration feature, not just an interchange detail |
| Track AI translation metadata | Usually modeled in platform-specific sidecars or external TMS state | AI-assisted workflows often need model, confidence, timestamp, and stale-edit detection | MachineTranslationMetadata on catalog entries, stored in PO or NDJSON | Keep machine-translation provenance with the catalog while clearing it when the translated text changes |
| Filter or edit message attributes | msgattrib and msggrep select fuzzy, untranslated, obsolete, or matching messages | polib-style object traversal makes this easy but unstructured | Low-level parsed data is available; no high-level predicate API yet | Add structured filtering and transform APIs for CI and translator workflows |
| Check completeness | msgcmp checks whether a translation catalog covers a template | CI often shells out or writes custom checks | audit_catalogs reports missing locales, missing translations, empty translations, and target-only messages | Expand reporting only from concrete release workflows |
| Validate translations | msgfmt checks and compiles catalogs; Weblate, FormatJS, and Translate Toolkit add workflow checks | gettext-parser and polib can compile PO/MO; app stacks add their own checks | audit_catalogs reports ICU syntax, ICU compatibility, semantic metadata conflicts, obsolete entries, and visible fuzzy flags | Keep validation reports structured before considering full MO parity |
| Build runtime payloads | msgfmt produces MO files for libintl-style runtime lookup | Node/Python libraries often parse or compile PO/MO directly | compile_catalog_artifact emits host-neutral runtime maps | Keep MO support optional; prioritize modern bundle and runtime artifacts |
| Handle plural semantics | Gettext uses Plural-Forms, msgid_plural, and msgstr[n] | Libraries vary in how much plural logic they understand | GettextCompat for classic PO, IcuNative for ICU messages | Keep semantic mode choices explicit instead of guessing |
| Work with non-UTF-8 catalogs | GNU gettext has deep charset conversion support | gettext-parser handles charset conversion for Buffer input | Ferrocat is UTF-8-oriented today | Document charset boundaries before adding conversion complexity |
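
The "update a catalog from new source" row can be made concrete with a small sketch. This is not Ferrocat's `update_catalog` API; every type and function name below (`MessageKey`, `Entry`, `ObsoleteStrategy`, `update_exact`) is hypothetical and exists only to illustrate exact-identity matching with an explicit obsolete strategy, assuming identity is the pair of optional context and msgid.

```rust
use std::collections::BTreeMap;

/// Hypothetical message identity: optional context plus msgid.
/// Whether msgid is a stable key or real source copy, matching is exact.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct MessageKey {
    pub context: Option<String>,
    pub id: String,
}

/// Hypothetical catalog entry, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub translation: String,
    pub obsolete: bool,
}

/// What to do with entries that no longer appear in the template.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObsoleteStrategy {
    /// Keep the entry and its translation, flagged as obsolete.
    Mark,
    /// Drop the entry entirely.
    Delete,
}

pub type Catalog = BTreeMap<MessageKey, Entry>;

/// Carry translations forward by exact key only. A changed source string
/// is a new key with an empty translation; no fuzzy guess is made.
pub fn update_exact(
    template: &[MessageKey],
    existing: &Catalog,
    strategy: ObsoleteStrategy,
) -> Catalog {
    let mut updated = Catalog::new();

    for key in template {
        let translation = existing
            .get(key)
            .map(|entry| entry.translation.clone())
            .unwrap_or_default();
        updated.insert(key.clone(), Entry { translation, obsolete: false });
    }

    if strategy == ObsoleteStrategy::Mark {
        for (key, entry) in existing {
            if !updated.contains_key(key) {
                let kept = Entry { translation: entry.translation.clone(), obsolete: true };
                updated.insert(key.clone(), kept);
            }
        }
    }

    updated
}
```

Because the strategy is a required argument, a caller cannot update a catalog without deciding what happens to removed messages, which is the kind of explicitness the matrix argues for.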
Ecosystem Lessons
| Project | Main role | Useful lesson for Ferrocat |
|---|---|---|
| GNU gettext | Full reference toolchain for extraction, merge, manipulation, validation, MO compilation, and runtime lookup | Treat the tools as a catalog of jobs, not as an API shape to copy |
| FormatJS CLI | Extraction and verification for ICU-heavy JavaScript projects | A verify-style report is useful, but Ferrocat should keep it host-neutral |
| Weblate checks | Hosted translation checks with many catalog-quality rules | Checks should be visible, configurable diagnostics rather than hidden runtime fallback |
| Translate Toolkit pofilter | Predicate-style PO QA tests | Start with a small report API before adding a broad predicate language |
| Lingui | Application workflow tooling for message extraction, compile, and catalog status | Workflow ergonomics matter, but host-specific extraction belongs outside Ferrocat |
| Paraglide JS | Generated JS runtime modules from message catalogs | Runtime code generation belongs in host adapters; shared catalog semantics belong in Ferrocat |
| Tolgee CLI | Sync-oriented extraction and translation workflow tooling | Sync and repair flows are separate from a read-only shippability report |
| pofile | JavaScript PO parse and serialize library | A small editable PO object model is valuable when callers want direct control |
| gettext-parser | Node.js PO/MO parse and compile library | MO support, stream parsing, and charset handling are useful comparison points |
| Python polib | Mature PO/POT/MO manipulation library | Entry-level mutation and filtering are important everyday workflows |
| Babel | Python localization workflow with pluggable extraction | Extraction belongs near the host language and framework |
| Rust polib | Rust PO/MO load, manipulate, and save library | UTF-8-only PO/MO support is a pragmatic baseline to compare against |
| gettext-rs | Safe Rust bindings for gettext runtime APIs | Runtime lookup and catalog authoring are different layers and should stay separate |
Product Direction
Ferrocat should keep moving toward task-oriented catalog APIs with exact catalog identity, explicit diagnostics, and practical PO interoperability:
- Predictable catalog maintenance. Whether msgid is a stable key or real source copy, updates should use exact identity and clear conflicts. Fuzzy matching can be considered later, but it should not hide source changes or silently rewrite translator intent.
- Structured diagnostics over terminal text. Library users need data they can show in CI, editors, dashboards, and release tooling.
- Read-only audit before repair. Completeness, ICU, metadata, stale-message, obsolete, and fuzzy checks should be available as a report before Ferrocat grows any repair or fuzzy-matching behavior.
- Explicit conflict policy. Combine and update workflows should make precedence and failure behavior visible in options.
- Collaboration-friendly storage. NDJSON should be positioned as a practical large-team catalog format: one message per line, cleaner pull request diffs, narrower conflicts, and no dependency on custom merge handlers for everyday hosted Git workflows.
- AI translation metadata with cleanup. Machine-generated translations should carry compact model, confidence, modified-time, and hash metadata in the catalog itself, and stale metadata should disappear when the translation text changes.
- Palamedes as the application layer. Ferrocat should expose the catalog behavior Palamedes needs without absorbing framework-specific extraction, bundling, routing, or runtime concerns.
- Host-specific extraction. JavaScript, Rust, Python, Markdown, templates, and framework conventions deserve adapters, not one oversized parser in the core crate.
- Runtime output for modern apps. MO compatibility is useful to evaluate, but host-neutral compiled artifacts are a better default for bundlers, edge runtimes, and multi-language product stacks.
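The "AI translation metadata with cleanup" direction above can be sketched as follows. This is an illustration under stated assumptions, not Ferrocat's `MachineTranslationMetadata` type: the struct layout, the field names, the method names, and the choice of FNV-1a as the content hash are all hypothetical. The idea shown is only that provenance stores a hash of the text it describes, and is dropped once the text no longer matches that hash.

```rust
/// Hypothetical provenance record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct MachineMeta {
    pub model: String,
    pub confidence: f32,
    pub modified_unix: u64,
    /// Hash of the translation text this metadata was recorded for.
    pub text_hash: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Translation {
    pub text: String,
    pub machine: Option<MachineMeta>,
}

/// 64-bit FNV-1a. Chosen here because it is tiny and stable across
/// builds; a real catalog format might pick a different hash.
pub fn fnv1a64(text: &str) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in text.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

impl Translation {
    /// Record a machine-generated translation together with its provenance.
    pub fn from_machine(text: &str, model: &str, confidence: f32, modified_unix: u64) -> Self {
        Translation {
            text: text.to_string(),
            machine: Some(MachineMeta {
                model: model.to_string(),
                confidence,
                modified_unix,
                text_hash: fnv1a64(text),
            }),
        }
    }

    /// Drop provenance if the text no longer matches the recorded hash,
    /// for example after a human translator edited the string.
    pub fn drop_stale_metadata(&mut self) {
        let current = fnv1a64(&self.text);
        let stale = matches!(&self.machine, Some(meta) if meta.text_hash != current);
        if stale {
            self.machine = None;
        }
    }
}
```

Running the cleanup at write time keeps the catalog from claiming that a human-edited string came from a model, without needing any external sidecar state.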
Near-Term Backlog
The most promising gettext-inspired additions are:
- catalog filtering and attribute transforms inspired by msgattrib and msggrep
- catalog filtering and transforms on top of audit results, when the report vocabulary is stable enough
- optional MO read/write evaluation against existing Rust and Node libraries
- clearer extractor adapter contracts for both ID-based and source-as-msgid tools
- tighter Palamedes documentation links for extraction, runtime artifacts, and future chunk-aware sidecars
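To make the filtering and attribute-transform item in the backlog more tangible, here is one possible shape for a structured predicate API. It is a sketch only: `Entry`, `Select`, `filter_entries`, and `clear_fuzzy` are hypothetical names, not existing Ferrocat functions, and the exact definition of each selector (for instance, that an obsolete entry never counts as untranslated) is an assumption made for the example.

```rust
/// Hypothetical, simplified catalog entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub msgid: String,
    pub msgstr: String,
    pub fuzzy: bool,
    pub obsolete: bool,
}

/// Selectors loosely modeled on the kinds of choices msgattrib offers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Select {
    Fuzzy,
    Untranslated,
    Obsolete,
    Translated,
}

impl Select {
    /// Assumed semantics: each selector is a plain predicate on one entry.
    pub fn matches(self, entry: &Entry) -> bool {
        match self {
            Select::Fuzzy => entry.fuzzy,
            Select::Untranslated => !entry.obsolete && entry.msgstr.is_empty(),
            Select::Obsolete => entry.obsolete,
            Select::Translated => {
                !entry.msgstr.is_empty() && !entry.fuzzy && !entry.obsolete
            }
        }
    }
}

/// Return the entries that satisfy every selector in `all_of`.
pub fn filter_entries<'a>(entries: &'a [Entry], all_of: &[Select]) -> Vec<&'a Entry> {
    entries
        .iter()
        .filter(|entry| all_of.iter().all(|select| select.matches(entry)))
        .collect()
}

/// Attribute transform: clear the fuzzy flag and report how many
/// entries changed, so a caller can surface the count in CI output.
pub fn clear_fuzzy(entries: &mut [Entry]) -> usize {
    let mut changed = 0;
    for entry in entries.iter_mut().filter(|entry| entry.fuzzy) {
        entry.fuzzy = false;
        changed += 1;
    }
    changed
}
```

Keeping selectors as data rather than closures means the same vocabulary could later be serialized into configuration or reused by an audit report.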