PO File Format
Understanding the GNU gettext PO file structure and content formats
The PO (Portable Object) format has been the standard for software translation since the 1990s. pofile-ts fully supports this format.
What We Support
| Feature | Supported | Example |
|---|---|---|
| Singular translations | ✅ | msgid / msgstr |
| Plural forms | ✅ | msgid_plural / msgstr[0], msgstr[1], ... |
| Message context | ✅ | msgctxt |
| Comments | ✅ | # (translator), #. (extracted), #: (reference), #, (flags) |
| Flags | ✅ | fuzzy, no-wrap, etc. |
| Metadata | ✅ | #@ key: value |
| Obsolete entries | ✅ | #~ |
| All UTF-8 content | ✅ | — |
Content Agnostic
pofile-ts parses the PO structure — it doesn't interpret what's inside msgid or msgstr. Your strings can contain:
- Plain text: "Hello, World!"
- Unicode ICU MessageFormat (v1): "{count, plural, one {# item} other {# items}}"
- Unicode MessageFormat 2.0: .match {$count} one {{...}} * {{...}}
- Any other format your i18n library uses
This makes pofile-ts a universal PO parser that works with any translation workflow.
Important: There is no normalization between formats. Native Gettext plurals and ICU-embedded plurals result in different data structures.
// Native Gettext: msgid_plural set, msgstr has multiple entries
{ msgid_plural: "{count} items", msgstr: ["Ein Element", "{count} Elemente"] }
// ICU embedded: msgid_plural null, msgstr has one entry with ICU syntax
{ msgid_plural: null, msgstr: ["{count, plural, one {...} other {...}}"] }
Gettext Plurals vs ICU Plurals
There are two ways to handle pluralization in PO files.
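Because the two approaches are not normalized, code that consumes parsed items may need to tell them apart. The following is a minimal sketch, assuming items expose msgid_plural and msgstr in the shapes shown earlier. The PluralFields type, the detectPluralKind helper, and the regular expression are illustrative assumptions and are not part of pofile-ts.

```typescript
// Hypothetical local type mirroring only the two fields shown earlier;
// it is not imported from pofile-ts.
interface PluralFields {
  msgid_plural: string | null
  msgstr: string[]
}

type PluralKind = "gettext" | "icu" | "none"

// Heuristic: a non-null msgid_plural means native Gettext plurals.
// Otherwise, look for an ICU plural/selectordinal argument in the first msgstr.
function detectPluralKind(item: PluralFields): PluralKind {
  if (item.msgid_plural !== null) return "gettext"
  const icuPlural = /\{\s*\w+\s*,\s*(plural|selectordinal)\s*,/
  return icuPlural.test(item.msgstr[0] ?? "") ? "icu" : "none"
}

const nativeItem: PluralFields = {
  msgid_plural: "{count} items",
  msgstr: ["Ein Element", "{count} Elemente"]
}
const embeddedItem: PluralFields = {
  msgid_plural: null,
  msgstr: ["{count, plural, one {# Element} other {# Elemente}}"]
}
const plainItem: PluralFields = { msgid_plural: null, msgstr: ["Willkommen"] }

console.log(detectPluralKind(nativeItem)) // "gettext"
console.log(detectPluralKind(embeddedItem)) // "icu"
console.log(detectPluralKind(plainItem)) // "none"
```

The regular expression is only a rough check. A real ICU parser would be needed to handle escaped braces or nested arguments reliably.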
Native Gettext Plurals
The original Gettext plural system uses indexed forms defined by a formula in the header:
# Header defines plural rules
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
# German (2 forms: singular, plural)
msgid "One item"
msgid_plural "{count} items"
msgstr[0] "Ein Element"
msgstr[1] "{count} Elemente"
# Arabic (6 forms: zero, one, two, few, many, other)
msgstr[0] "..." # zero
msgstr[1] "..." # one
msgstr[2] "..." # two
msgstr[3] "..." # few
msgstr[4] "..." # many
msgstr[5] "..." # other
ICU MessageFormat (embedded in msgstr)
Libraries like Lingui embed ICU MessageFormat syntax directly in the msgstr:
# ICU plural embedded in single msgstr
msgid "{count, plural, one {# item} other {# items}}"
msgstr "{count, plural, one {# Element} other {# Elemente}}"
Comparison
| Feature | Gettext Plurals | ICU in msgstr |
|---|---|---|
| Plural forms | ✅ Index-based | ✅ Named |
| Select (gender) | ❌ | ✅ |
| SelectOrdinal | ❌ | ✅ |
| Nested constructs | ❌ | ✅ |
| TMS support | ✅ Native | ⚠️ Requires ICU-aware tools |
| Translator UX | ✅ Simple | ⚠️ Must understand ICU syntax |
TL;DR: Use native Gettext plurals for simple pluralization. Use ICU when you need select, selectordinal, or nested constructs.
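To make the index-based lookup of native Gettext plurals concrete, here is a small sketch of how a runtime might choose a form for the German example. It assumes the rule from "nplurals=2; plural=(n != 1);" is written by hand rather than parsed from the header. The names germanPluralIndex and pickPlural are hypothetical, and the {count} substitution is only for illustration, since Gettext itself does not interpolate placeholders.

```typescript
// Hand-written equivalent of "plural=(n != 1)" for a two-form language.
// Assumption: hard-coded here, not derived from the Plural-Forms header.
const germanPluralIndex = (n: number): number => (n !== 1 ? 1 : 0)

// Pick the indexed form, falling back to the last form if the index is
// out of range, then substitute the {count} placeholder (illustrative only).
function pickPlural(
  msgstr: string[],
  n: number,
  rule: (n: number) => number
): string {
  const form = msgstr[rule(n)] ?? msgstr[msgstr.length - 1] ?? ""
  return form.replace("{count}", String(n))
}

const germanForms = ["Ein Element", "{count} Elemente"]

console.log(pickPlural(germanForms, 1, germanPluralIndex)) // "Ein Element"
console.log(pickPlural(germanForms, 5, germanPluralIndex)) // "5 Elemente"
```

A language with more forms, such as the six-form Arabic case, would only need a different rule function and a longer msgstr array.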
Metadata Comments
pofile-ts supports custom metadata as key-value pairs on each translation entry. This is useful for tracking translation sources, timestamps, confidence scores, and other tool-specific data.
Format
Metadata uses the #@ comment prefix:
#@ origin: LLM
#@ modified: 2024-01-15
#@ confidence: 0.95
#@ reviewer: john.doe
msgid "Welcome"
msgstr "Willkommen"
Usage
import { createItem, parsePo } from "pofile-ts"
// Create item with metadata
const item = createItem()
item.msgid = "Welcome"
item.msgstr = ["Willkommen"]
item.metadata = {
origin: "LLM",
modified: "2024-01-15",
confidence: "0.95"
}
// Parse existing metadata (poContent is a string holding the PO file text)
const po = parsePo(poContent)
const origin = po.items[0]?.metadata.origin // "LLM"
Common Use Cases
| Key | Example Value | Description |
|---|---|---|
| origin | LLM, TMS, manual | Translation source |
| modified | 2024-01-15 | Last modification date |
| confidence | 0.95 | Machine translation confidence |
| reviewer | john.doe | Human reviewer |
| timestamp | 2024-01-15T10:30:00Z | ISO timestamp |
| model | gpt-4 | LLM model used |
| version | 2 | Translation revision number |
Note: The #@ prefix is a pofile-ts extension. Although these lines are valid PO syntax, other tools may not recognize them as structured metadata.
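For tools that do not understand the extension, the #@ key: value lines can still be read with a few lines of code. The sketch below is independent of pofile-ts and does not reflect its internal parser. The readMetadataComments helper and its regular expression are hypothetical, and it assumes one key-value pair per comment line.

```typescript
// Hypothetical helper, not part of pofile-ts:
// collect "#@ key: value" comment lines from the raw text of one entry.
function readMetadataComments(entryText: string): Record<string, string> {
  const metadata: Record<string, string> = {}
  for (const line of entryText.split("\n")) {
    // The key ends at the first colon; the value may contain further colons.
    const match = /^#@\s*([^:\s]+)\s*:\s*(.*)$/.exec(line.trim())
    if (match) metadata[match[1]] = match[2].trim()
  }
  return metadata
}

const entryText = [
  "#@ origin: LLM",
  "#@ timestamp: 2024-01-15T10:30:00Z",
  "#@ confidence: 0.95",
  'msgid "Welcome"',
  'msgstr "Willkommen"'
].join("\n")

const entryMetadata = readMetadataComments(entryText)

// Values are plain strings, so convert before comparing numerically.
const confidenceValue = Number(entryMetadata.confidence)

console.log(entryMetadata.origin) // "LLM"
console.log(confidenceValue >= 0.9) // true
```

Keeping values as strings matches the Usage example, where confidence is stored as "0.95". Any numeric or date interpretation is left to the consuming tool.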