Gettext Task Landscape

A workflow-level map of gettext-inspired catalog jobs, ecosystem patterns, and Ferrocat's product direction.

This is an advanced reference page. You do not need to know GNU gettext to use Ferrocat; start with the Guide if you want the product-level path first.

Ferrocat is not trying to be a command-line clone of GNU gettext. It is a new catalog engine that learns from the jobs that established translation tooling has already proved useful: extracting messages, updating catalogs, preserving translator context, checking release readiness, and producing runtime data that applications can load.

That distinction matters because catalog jobs are broader than one tool's command-line flags. Some projects use stable IDs, some use real source strings as msgid, and many need to preserve translator-facing PO metadata either way. Ferrocat's core promise is simpler: keep catalog identity explicit, preserve translations when catalogs evolve, surface conflicts clearly, and compile the result into something applications can load efficiently.
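As a rough illustration of "keep catalog identity explicit" and "preserve translations when catalogs evolve", the sketch below carries translations forward only on an exact key match and makes the handling of leftover entries an explicit choice. It does not use Ferrocat's API; `Key`, `Entry`, `ObsoletePolicy`, and `update_exact` are hypothetical names invented for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical catalog identity: optional context plus msgid, compared exactly.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Key {
    pub context: Option<String>,
    pub msgid: String,
}

/// Hypothetical catalog entry, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub translation: String,
    pub obsolete: bool,
}

/// Explicit choice for entries that no longer appear in the template.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObsoletePolicy {
    Mark,
    Drop,
}

/// Builds an updated catalog from a new template (assumed shape, not Ferrocat's).
pub fn update_exact(
    existing: &BTreeMap<Key, Entry>,
    template: &[Key],
    policy: ObsoletePolicy,
) -> BTreeMap<Key, Entry> {
    let mut out = BTreeMap::new();
    // Carry a translation forward only when the key matches exactly;
    // a changed msgid becomes a new, untranslated entry (no fuzzy matching).
    for key in template {
        let translation = existing
            .get(key)
            .map(|e| e.translation.clone())
            .unwrap_or_default();
        out.insert(key.clone(), Entry { translation, obsolete: false });
    }
    // Entries missing from the template are kept as obsolete or dropped,
    // depending on the policy the caller chose.
    if policy == ObsoletePolicy::Mark {
        for (key, entry) in existing {
            if !out.contains_key(key) {
                out.insert(
                    key.clone(),
                    Entry { translation: entry.translation.clone(), obsolete: true },
                );
            }
        }
    }
    out
}
```

With this shape, editing "Save" into "Save file" produces a fresh untranslated entry rather than a guessed match, which is the behavior the paragraph above argues for.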

In the Palamedes ecosystem, that means Ferrocat is the catalog and semantics layer while Palamedes is the JS/TS i18n framework layer. The two projects should be read together: one handles the catalog jobs, the other handles application authoring, extraction, adapters, and runtime integration.

Task Matrix

| Task | Gettext idea | Common ecosystem pattern | Ferrocat today | Product direction |
| --- | --- | --- | --- | --- |
| Extract source messages | xgettext scans code and writes POT files | Babel uses configurable extraction methods for Python apps and templates | Not part of the core crate | Keep extraction host-specific; define clean adapter inputs instead of embedding every language parser |
| Integrate app frameworks | GNU gettext leaves this mostly to language bindings and application code | Framework packages usually own macros, hooks, loaders, and routing conventions | Ferrocat stays host-neutral; Palamedes owns JS/TS framework integration | Keep Ferrocat focused on catalog semantics and let Palamedes bind those semantics into application frameworks |
| Parse and write PO files | PO is the human-editable catalog format | pofile, gettext-parser, and polib expose editable object models | parse_po, parse_po_borrowed, stringify_po | Add richer source positions only when the public error model is ready |
| Update a catalog from new source | msgmerge carries existing translations forward into a new template | Many projects script this around gettext binaries or PO libraries | merge_catalog, update_catalog, update_catalog_file | Prefer exact source-string identity and explicit obsolete strategies over fuzzy-first behavior |
| Combine catalogs | msgcat, msgcomm, and msguniq merge or select messages by occurrence | Usually handled by shell pipelines or ad hoc scripts | combine_catalogs with explicit conflict and selection rules | Expand selection and reporting when real workflows need it |
| Reduce Git merge pain | Usually handled outside gettext through file layout, merge drivers, or process rules | Large teams often rely on line-based review and hosted pull request merges | ICU-native NDJSON stores one message per line | Treat storage format as a collaboration feature, not just an interchange detail |
| Track AI translation metadata | Usually modeled in platform-specific sidecars or external TMS state | AI-assisted workflows often need model, confidence, timestamp, and stale-edit detection | MachineTranslationMetadata on catalog entries, stored in PO or NDJSON | Keep machine-translation provenance with the catalog while clearing it when the translated text changes |
| Filter or edit message attributes | msgattrib and msggrep select fuzzy, untranslated, obsolete, or matching messages | polib-style object traversal makes this easy but unstructured | Low-level parsed data is available; no high-level predicate API yet | Add structured filtering and transform APIs for CI and translator workflows |
| Check completeness | msgcmp checks whether a translation catalog covers a template | CI often shells out or writes custom checks | audit_catalogs reports missing locales, missing translations, empty translations, and target-only messages | Expand reporting only from concrete release workflows |
| Validate translations | msgfmt checks and compiles catalogs; Weblate, FormatJS, and Translate Toolkit add workflow checks | gettext-parser and polib can compile PO/MO; app stacks add their own checks | audit_catalogs reports ICU syntax, ICU compatibility, semantic metadata conflicts, obsolete entries, and visible fuzzy flags | Keep validation reports structured before considering full MO parity |
| Build runtime payloads | msgfmt produces MO files for libintl-style runtime lookup | Node/Python libraries often parse or compile PO/MO directly | compile_catalog_artifact emits host-neutral runtime maps | Keep MO support optional; prioritize modern bundle and runtime artifacts |
| Handle plural semantics | Gettext uses Plural-Forms, msgid_plural, and msgstr[n] | Libraries vary in how much plural logic they understand | GettextCompat for classic PO, IcuNative for ICU messages | Keep semantic mode choices explicit instead of guessing |
| Work with non-UTF-8 catalogs | GNU gettext has deep charset conversion support | gettext-parser handles charset conversion for Buffer input | Ferrocat is UTF-8-oriented today | Document charset boundaries before adding conversion complexity |
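To make the "Check completeness" row more concrete, here is a minimal sketch of a read-only completeness check that returns structured findings instead of terminal text. It is not the audit_catalogs implementation and does not cover missing locales or ICU validation; the `Finding` enum, the `audit` function, and the map-based catalog shape are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical finding kinds, loosely mirroring the completeness row above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Finding {
    MissingTranslation { locale: String, msgid: String },
    EmptyTranslation { locale: String, msgid: String },
    TargetOnly { locale: String, msgid: String },
}

/// Compares each locale's msgid -> translation map against the source msgids.
/// The function only reads its inputs; it never repairs a catalog.
pub fn audit(
    source_ids: &[&str],
    locales: &BTreeMap<String, BTreeMap<String, String>>,
) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (locale, messages) in locales {
        // Every source message should have a non-empty translation.
        for id in source_ids {
            match messages.get(*id) {
                None => findings.push(Finding::MissingTranslation {
                    locale: locale.clone(),
                    msgid: (*id).to_string(),
                }),
                Some(text) if text.trim().is_empty() => {
                    findings.push(Finding::EmptyTranslation {
                        locale: locale.clone(),
                        msgid: (*id).to_string(),
                    })
                }
                Some(_) => {}
            }
        }
        // Messages that exist only in the target catalog are reported too.
        for id in messages.keys() {
            if !source_ids.contains(&id.as_str()) {
                findings.push(Finding::TargetOnly {
                    locale: locale.clone(),
                    msgid: id.clone(),
                });
            }
        }
    }
    findings
}
```

Because the result is plain data, a CI job could fail on any `MissingTranslation` while a dashboard merely counts `TargetOnly` entries.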

Ecosystem Lessons

| Project | Main role | Useful lesson for Ferrocat |
| --- | --- | --- |
| GNU gettext | Full reference toolchain for extraction, merge, manipulation, validation, MO compilation, and runtime lookup | Treat the tools as a catalog of jobs, not as an API shape to copy |
| FormatJS CLI | Extraction and verification for ICU-heavy JavaScript projects | A verify-style report is useful, but Ferrocat should keep it host-neutral |
| Weblate checks | Hosted translation checks with many catalog-quality rules | Checks should be visible, configurable diagnostics rather than hidden runtime fallback |
| Translate Toolkit pofilter | Predicate-style PO QA tests | Start with a small report API before adding a broad predicate language |
| Lingui | Application workflow tooling for message extraction, compile, and catalog status | Workflow ergonomics matter, but host-specific extraction belongs outside Ferrocat |
| Paraglide JS | Generated JS runtime modules from message catalogs | Runtime code generation belongs in host adapters; shared catalog semantics belong in Ferrocat |
| Tolgee CLI | Sync-oriented extraction and translation workflow tooling | Sync and repair flows are separate from a read-only shippability report |
| pofile | JavaScript PO parse and serialize library | A small editable PO object model is valuable when callers want direct control |
| gettext-parser | Node.js PO/MO parse and compile library | MO support, stream parsing, and charset handling are useful comparison points |
| Python polib | Mature PO/POT/MO manipulation library | Entry-level mutation and filtering are important everyday workflows |
| Babel | Python localization workflow with pluggable extraction | Extraction belongs near the host language and framework |
| Rust polib | Rust PO/MO load, manipulate, and save library | UTF-8-only PO/MO support is a pragmatic baseline to compare against |
| gettext-rs | Safe Rust bindings for gettext runtime APIs | Runtime lookup and catalog authoring are different layers and should stay separate |

Product Direction

Ferrocat should keep moving toward task-oriented catalog APIs with exact catalog identity, explicit diagnostics, and practical PO interoperability:

  • Predictable catalog maintenance. Whether msgid is a stable key or real source copy, updates should use exact identity and clear conflicts. Fuzzy matching can be considered later, but it should not hide source changes or silently rewrite translator intent.
  • Structured diagnostics over terminal text. Library users need data they can show in CI, editors, dashboards, and release tooling.
  • Read-only audit before repair. Completeness, ICU, metadata, stale-message, obsolete, and fuzzy checks should be available as a report before Ferrocat grows any repair or fuzzy-matching behavior.
  • Explicit conflict policy. Combine and update workflows should make precedence and failure behavior visible in options.
  • Collaboration-friendly storage. NDJSON should be positioned as a practical large-team catalog format: one message per line, cleaner pull request diffs, narrower conflicts, and no dependency on custom merge handlers for everyday hosted Git workflows.
  • AI translation metadata with cleanup. Machine-generated translations should carry compact model, confidence, modified-time, and hash metadata in the catalog itself, and stale metadata should disappear when the translation text changes.
  • Palamedes as the application layer. Ferrocat should expose the catalog behavior Palamedes needs without absorbing framework-specific extraction, bundling, routing, or runtime concerns.
  • Host-specific extraction. JavaScript, Rust, Python, Markdown, templates, and framework conventions deserve adapters, not one oversized parser in the core crate.
  • Runtime output for modern apps. MO compatibility is useful to evaluate, but host-neutral compiled artifacts are a better default for bundlers, edge runtimes, and multi-language product stacks.
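The "AI translation metadata with cleanup" item can be sketched as follows: provenance stores a hash of the text it describes, and it is dropped as soon as the stored translation no longer matches that hash. This is an illustration only, not Ferrocat's MachineTranslationMetadata; `MtProvenance`, `Entry`, and the method names are hypothetical, the modified-time field is omitted, and `DefaultHasher` stands in for whatever stable hash a persisted catalog would actually need.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical provenance record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct MtProvenance {
    pub model: String,
    pub confidence: f32,
    /// Hash of the translation text this provenance describes.
    pub text_hash: u64,
}

/// Hypothetical catalog entry carrying optional machine provenance.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub msgid: String,
    pub translation: String,
    pub machine: Option<MtProvenance>,
}

/// Assumption: DefaultHasher is fine for an in-memory sketch, but its output
/// is not guaranteed stable across Rust releases, so a stored catalog would
/// need a fixed algorithm instead.
fn hash_text(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

impl Entry {
    /// Records a machine translation together with its provenance.
    pub fn set_machine_translation(&mut self, text: &str, model: &str, confidence: f32) {
        self.translation = text.to_string();
        self.machine = Some(MtProvenance {
            model: model.to_string(),
            confidence,
            text_hash: hash_text(text),
        });
    }

    /// Records a human edit and drops provenance if the text actually changed.
    pub fn set_human_translation(&mut self, text: &str) {
        self.translation = text.to_string();
        self.drop_stale_provenance();
    }

    /// Clears provenance whose hash no longer matches the stored translation.
    pub fn drop_stale_provenance(&mut self) {
        let current = hash_text(&self.translation);
        if self.machine.as_ref().map_or(false, |m| m.text_hash != current) {
            self.machine = None;
        }
    }
}
```

Checking the hash rather than trusting the edit path means an out-of-band change to the translation field can still be detected by calling `drop_stale_provenance` when the catalog is loaded.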

Near-Term Backlog

The most promising gettext-inspired additions are:

  • catalog filtering and attribute transforms inspired by msgattrib and msggrep
  • filtering and transforms driven by audit findings, once the report vocabulary is stable enough
  • optional MO read/write evaluation against existing Rust and Node libraries
  • clearer extractor adapter contracts for both ID-based and source-as-msgid tools
  • tighter Palamedes documentation links for extraction, runtime artifacts, and future chunk-aware sidecars
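For the first backlog item, one possible shape for structured filtering is a small set of composable predicates in the spirit of msgattrib and msggrep, plus one attribute transform. Nothing here exists in Ferrocat today; `Message`, `Filter`, `select`, and `clear_fuzzy` are hypothetical names, and "untranslated" is assumed to mean an empty msgstr.

```rust
/// Hypothetical flattened message, reduced to the attributes being filtered.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub msgid: String,
    pub msgstr: String,
    pub fuzzy: bool,
    pub obsolete: bool,
}

/// Hypothetical predicates, loosely modeled on msgattrib and msggrep selections.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Filter<'a> {
    Fuzzy,
    Untranslated,
    Obsolete,
    MsgidContains(&'a str),
}

impl Filter<'_> {
    /// Returns true when the message satisfies this single predicate.
    pub fn matches(&self, m: &Message) -> bool {
        match self {
            Filter::Fuzzy => m.fuzzy,
            // Assumption: an empty msgstr counts as untranslated.
            Filter::Untranslated => m.msgstr.is_empty(),
            Filter::Obsolete => m.obsolete,
            Filter::MsgidContains(needle) => m.msgid.contains(*needle),
        }
    }
}

/// Selects the messages that satisfy every filter (logical AND).
pub fn select<'m>(messages: &'m [Message], filters: &[Filter<'_>]) -> Vec<&'m Message> {
    messages
        .iter()
        .filter(|m| filters.iter().all(|f| f.matches(m)))
        .collect()
}

/// Attribute transform: clears the fuzzy flag and reports how many entries changed.
pub fn clear_fuzzy(messages: &mut [Message]) -> usize {
    let mut changed = 0;
    for m in messages.iter_mut().filter(|m| m.fuzzy) {
        m.fuzzy = false;
        changed += 1;
    }
    changed
}
```

Keeping predicates as data rather than closures would let the same filter list be serialized into CI configuration or shown in an editor, which fits the page's preference for structured diagnostics.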