Bundler-Aware Message Sidecars

Future-facing note about bundler-aware message sidecars and runtime distribution.

This note captures a future-facing idea that likely belongs primarily in Palamedes or another host adapter, but is directly informed by Ferrocat's catalog compilation work.

The central problem is not "how do we parse more PO files?" It is:

how do we keep client bundle size under control for large applications
without forcing manual message sharding onto application authors
while still using stable compiled message IDs

Problem Shape

The naive client-side approach is:

build one ESM catalog module per locale
dynamically import that locale module
return a large in-memory message map to the client runtime

That works, but scales poorly as applications grow:

the client often loads many more messages than the current UI actually needs
large locale modules become expensive to ship, parse, and keep resident
locale variants such as de-CH naturally want overlay semantics rather than another full copy of the world

At the same time, manual message code-splitting is not attractive:

authors should not need to decide which file or route "owns" a message
duplicate source strings across shards make conflict management harder
translators and extraction workflows should not be shaped by bundler internals

Observed Constraints

The most important product constraints for this direction are:

runtime language switching in the same loaded client is intentionally out of scope
a fast reload is preferable to trying to live-switch all translated UI state
top-level translation macro expansion is already considered an anti-pattern because locale selection and i18n bootstrapping can be async

That last point is especially important. If top-level translation lookups are already forbidden, message payloads do not need to exist before JavaScript chunk evaluation. They only need to exist before later render or handler code calls into the translation runtime.

This makes an async sidecar-loading model much more feasible.

Core Idea

Keep normal Vite/Rollup chunking as the primary source of truth, then attach message payloads to those chunks as locale-specific sidecars.

In rough terms:

translation macros continue to produce stable compiled IDs in application code
build tooling records which message IDs each module references
after final bundler chunking is known, the IDs are aggregated from module -> ids into chunk -> ids
for each (chunk, locale) pair, emit a compact message sidecar
emit a manifest that maps JS chunks to their locale-specific sidecars

The result is not "split messages by hand." It is:

let the bundler decide code chunk boundaries
derive message payload boundaries from those same chunk boundaries

This keeps message distribution aligned with the actual loading behavior of the application.

Why This Is Interesting

This approach potentially gives the best of both worlds:

application authors keep normal bundler-driven code splitting
message payloads can be loaded only for chunks that are actually used
locale-specific payloads stay much smaller than one giant per-locale catalog
no human-maintained "message shard files" are required

For locale overlays such as de-CH, the same model can later support:

a small de-CH sidecar
fallback to a larger de sidecar
optional fallback to source locale data only when needed

That is conceptually similar to the new compile_catalog_artifact semantics, but the delivery unit changes from "entire requested-locale artifact" to "chunk-addressable locale sidecar."
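To make the overlay idea concrete, here is a minimal sketch of a layered lookup. It assumes the layered-data variant (sidecars hold only what differs from their parent locale), which is still an open question in this note. The names `fallbackChain` and `resolveMessage` are hypothetical, and deriving the chain by trimming subtags is an assumption rather than Ferrocat's actual fallback semantics.

```typescript
// Hypothetical helpers; the subtag-trimming rule is an assumption,
// not Ferrocat's documented fallback behavior.
type Messages = Record<string, string>;

// Build an ordered chain, most specific locale first, ending at the source locale.
function fallbackChain(locale: string, sourceLocale: string): string[] {
  const chain: string[] = [];
  let current = locale;
  while (current.length > 0) {
    chain.push(current);
    const cut = current.lastIndexOf("-");
    current = cut === -1 ? "" : current.slice(0, cut);
  }
  if (chain.indexOf(sourceLocale) === -1) chain.push(sourceLocale);
  return chain;
}

// Resolve one compiled ID through the layers in chain order.
function resolveMessage(
  id: string,
  chain: string[],
  layers: Record<string, Messages>,
): string | undefined {
  for (const locale of chain) {
    const layer = layers[locale];
    if (layer !== undefined && Object.prototype.hasOwnProperty.call(layer, id)) {
      return layer[id];
    }
  }
  return undefined;
}
```

With this shape, a de-CH sidecar only needs the entries that actually differ from de. The alternative is to resolve the chain at build time and ship final strings, which trades payload size for a simpler runtime.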

Suggested Architecture

Build-Time Collection

Do not parse final JavaScript output to discover message usage if it can be avoided.

Prefer a two-stage model:

the macro/plugin layer records message IDs per source module
the bundler integration aggregates those IDs after final chunking is known

This is more robust than scraping emitted code: it is unaffected by chunk renaming, and tree-shaking can at worst cause some over-approximation of a chunk's message set (see Open Questions).
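The second stage can be sketched as a small pure function. The types below are hypothetical and only loosely modeled on the kind of information a bundler exposes after chunking; they are not taken from Palamedes, Vite, or Rollup.

```typescript
// Hypothetical shapes, not an existing Palamedes, Vite, or Rollup API.
type IdsByModule = Map<string, Set<string>>; // module path -> compiled message IDs

interface ChunkInfo {
  fileName: string;    // final emitted chunk file name
  moduleIds: string[]; // source modules that ended up in this chunk
}

// Aggregate module -> ids into chunk -> ids once final chunking is known.
function aggregateChunkMessageIds(
  idsByModule: IdsByModule,
  chunks: ChunkInfo[],
): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const chunk of chunks) {
    const ids = new Set<string>();
    for (const moduleId of chunk.moduleIds) {
      const found = idsByModule.get(moduleId);
      if (found === undefined) continue; // module references no messages
      for (const id of found) ids.add(id);
    }
    // Sort so the output is reproducible across builds.
    result.set(chunk.fileName, Array.from(ids).sort());
  }
  return result;
}
```

A message referenced by modules in two different chunks ends up in both chunk sets here. Whether to deduplicate such shared IDs is left open in this sketch.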

Emitted Artifacts

The likely outputs are:

normal JS chunks from Vite/Rollup
one locale-specific message sidecar per chunk
one manifest describing which sidecar belongs to which chunk and locale

Conceptually:

assets/
  app-ABC123.js
  checkout-XYZ999.js
  i18n-manifest.json
  i18n/de/app-ABC123.messages.json
  i18n/de/checkout-XYZ999.messages.json
  i18n/de-CH/app-ABC123.messages.json

The sidecar format does not need to be ESM. It could be JSON or another compact lookup-oriented artifact.
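One possible manifest shape, following the directory layout above, is sketched below. The `I18nManifest` structure, the `version` field, and the helper names are all assumptions made for illustration; the note does not fix a manifest format.

```typescript
// Hypothetical manifest shape: locale -> chunk file name -> sidecar path.
interface I18nManifest {
  version: 1;
  sidecars: Record<string, Record<string, string>>;
}

// Mirrors the layout above: i18n/<locale>/<chunk base name>.messages.json
function sidecarPath(chunkFile: string, locale: string): string {
  const base = chunkFile.replace(/\.js$/, "");
  return `i18n/${locale}/${base}.messages.json`;
}

// Record one entry per (chunk, locale) pair that actually has a sidecar.
function buildManifest(
  entries: Array<{ chunkFile: string; locale: string }>,
): I18nManifest {
  const sidecars: Record<string, Record<string, string>> = {};
  for (const { chunkFile, locale } of entries) {
    if (sidecars[locale] === undefined) sidecars[locale] = {};
    sidecars[locale][chunkFile] = sidecarPath(chunkFile, locale);
  }
  return { version: 1, sidecars };
}

// Returns undefined when no sidecar was emitted for that pair.
function lookupSidecar(
  manifest: I18nManifest,
  chunkFile: string,
  locale: string,
): string | undefined {
  const perLocale = manifest.sidecars[locale];
  return perLocale === undefined ? undefined : perLocale[chunkFile];
}
```

Keying by locale first means a client that has already picked its locale only needs one inner map, which could be shipped separately if the full manifest grows large.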

Runtime Flow

At runtime:

the application decides the locale once during boot
when a chunk is about to be used, its locale sidecar is loaded
sidecar messages are registered in the in-memory translation store
UI code that runs later can resolve t(id) as normal

Because top-level translation usage is intentionally disallowed, the runtime can remain async here without trying to beat chunk evaluation itself.
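The runtime steps above could look roughly like the following sketch. Everything here is hypothetical: `createMessageRuntime`, `ensureChunkMessages`, and the injected `fetchSidecar` are illustrative names, and falling back to the raw ID for unregistered messages is an assumption, not a decided behavior.

```typescript
// Hypothetical runtime sketch; not an existing Palamedes or Ferrocat API.
type Messages = Record<string, string>;
type FetchSidecar = (path: string) => Promise<Messages>;

function createMessageRuntime(
  sidecarsByChunk: Record<string, string>, // manifest entries for the booted locale
  fetchSidecar: FetchSidecar,              // injected so the loader stays testable
) {
  const messages = new Map<string, string>();
  const pending = new Map<string, Promise<void>>();

  // Call before code from `chunkFile` renders; repeated calls share one request.
  function ensureChunkMessages(chunkFile: string): Promise<void> {
    const existing = pending.get(chunkFile);
    if (existing !== undefined) return existing;
    const path = sidecarsByChunk[chunkFile];
    const load: Promise<void> =
      path === undefined
        ? Promise.resolve() // chunk has no messages for this locale
        : fetchSidecar(path).then((sidecar) => {
            for (const id of Object.keys(sidecar)) messages.set(id, sidecar[id]);
          });
    pending.set(chunkFile, load);
    return load;
  }

  // Assumption: unknown IDs fall back to the ID itself rather than throwing.
  function t(id: string): string {
    const text = messages.get(id);
    return text === undefined ? id : text;
  }

  return { ensureChunkMessages, t };
}
```

A host integration would await `ensureChunkMessages` alongside the dynamic import of the chunk, so that both have settled before the first render. This sketch caches failed requests as well; a real loader would need a retry policy.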

Dev Server Flow

The same idea becomes even more interesting in development:

JS chunk loads can trigger a corresponding sidecar lookup
the dev server can keep Rust-side catalogs resident and watched
changed translations can be pushed through hot reload without rebuilding a giant locale module

The development protocol might eventually become batch-oriented rather than key-oriented:

load all messages needed for chunk X in one request
instead of issuing one HTTP request per translation key
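A batch-oriented exchange might be shaped like the sketch below. The request and response types are invented for illustration, and the nested `Map` merely stands in for the resident Rust-side catalogs; no such protocol exists yet.

```typescript
// Hypothetical dev-protocol shapes; nothing here is an existing API.
interface BatchRequest {
  locale: string;
  chunk: string; // chunk or module the browser just loaded
  ids: string[]; // compiled message IDs that chunk references
}

interface BatchResponse {
  chunk: string;
  messages: Record<string, string>;
  missing: string[]; // IDs the dev server could not resolve
}

// `catalogs` stands in for resident, watched catalogs: locale -> id -> text.
function answerBatch(
  catalogs: Map<string, Map<string, string>>,
  request: BatchRequest,
): BatchResponse {
  const catalog = catalogs.get(request.locale) ?? new Map<string, string>();
  const messages: Record<string, string> = {};
  const missing: string[] = [];
  for (const id of request.ids) {
    const text = catalog.get(id);
    if (text === undefined) missing.push(id);
    else messages[id] = text;
  }
  return { chunk: request.chunk, messages, missing };
}
```

Reporting `missing` IDs explicitly would let the dev client surface untranslated messages without a second round trip.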

Where Ferrocat Fits

Ferrocat should probably stay focused on catalog semantics and artifact building, not own the bundler integration itself.

Useful Ferrocat responsibilities for this direction:

stable compiled message IDs
locale fallback semantics
host-neutral catalog artifacts
compact, reproducible message payload generation

Likely Palamedes or host-adapter responsibilities:

mapping modules to message IDs
mapping final chunks to those module-level IDs
emitting sidecars and manifests in bundler output
dev-server protocol and runtime chunk/sidecar loading

Current Ferrocat Status

The shape described in this note is no longer purely hypothetical. Ferrocat already exposes the core build-time primitives a host adapter would need to prototype chunk-addressable locale sidecars.

Available today:

compile_catalog_artifact for full requested-locale host-neutral runtime artifacts with fallback resolution, missing reports, and final ICU strings
CompiledCatalogIdIndex for deterministic compiled_id -> source_key indexing across one or more normalized catalogs
compile_catalog_artifact_selected for compiling only a selected subset of compiled runtime IDs into a locale artifact slice
compile_catalog_artifact_selected_report for the same selected compile flow with structured reporting of unknown or unavailable compiled IDs
CompiledCatalogIdIndex::describe_compiled_ids for lightweight metadata about requested compiled IDs, including available locales and singular vs plural shape
CompiledCatalogIdIndex::as_btreemap and into_btreemap for exporting the ordered compiled-ID mapping into host-side caches or manifests

This means the chunk-based ideal is already reachable at build time:

a host adapter collects compiled_ids per module or final chunk
Ferrocat builds or reuses a CompiledCatalogIdIndex
the host adapter calls compile_catalog_artifact_selected per (chunk, locale) pair
the result is emitted as a locale-specific sidecar payload
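On the host-adapter side, the per-pair loop could be sketched as follows. The `CompileSelected` callback is only a stand-in for whatever binding exposes compile_catalog_artifact_selected: this note does not give the real signature or return shape, so both are assumptions, as are the `emitSidecars` name and the JSON payload format.

```typescript
// Stand-in for a binding to Ferrocat's compile_catalog_artifact_selected.
// The real signature is not given in this note; this type is an assumption.
type CompileSelected = (locale: string, compiledIds: string[]) => Record<string, string>;

interface EmittedSidecar {
  fileName: string; // path relative to the bundler output directory
  source: string;   // serialized payload (JSON in this sketch)
}

// Emit one sidecar per (chunk, locale) pair that references any messages.
function emitSidecars(
  idsByChunk: Map<string, string[]>,
  locales: string[],
  compileSelected: CompileSelected,
): EmittedSidecar[] {
  const emitted: EmittedSidecar[] = [];
  for (const [chunkFile, ids] of idsByChunk) {
    if (ids.length === 0) continue; // chunks without messages need no sidecar
    const base = chunkFile.replace(/\.js$/, "");
    for (const locale of locales) {
      const payload = compileSelected(locale, ids);
      emitted.push({
        fileName: `i18n/${locale}/${base}.messages.json`,
        source: JSON.stringify(payload),
      });
    }
  }
  return emitted;
}
```

Keeping the compile step behind a callback leaves Ferrocat unaware of chunks and file names, which matches the split of responsibilities described earlier.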

What remains outside Ferrocat is mostly orchestration rather than missing catalog semantics:

module-to-ID collection in macros or build plugins
aggregation from modules into final bundler chunks
manifest generation and sidecar emission format
runtime registration/loading behavior
dev-server hot reload protocol

One optimization track intentionally remains out of scope for now: borrowed or lazy subset compilation that avoids fully materializing normalized catalogs before compiling a selected ID subset. That may become interesting for very large catalogs, but it is not required for the first chunk-sidecar prototype.

Non-Goals

This note is not proposing:

manual per-route message files
runtime language switching within the same loaded client
full locale-permutated builds as the only strategy
putting Rust directly in the browser

Locale-permutated builds are still interesting for some deployment models, but the sidecar concept is attractive precisely because it preserves normal client chunking while reducing message payload size.

Open Questions

what should the sidecar wire format be
how aggressively should shared-chunk message sets be deduplicated
should sidecars contain final locale-resolved strings or layered overlay data
how should locale overlays such as de-CH -> de -> en be represented
how should the runtime coordinate "chunk loaded" vs "messages registered"
how much over-approximation is acceptable when tree-shaking changes module contents after the macro/plugin layer recorded message IDs

Current Recommendation

Treat this as a serious follow-up direction for Palamedes:

keep Ferrocat responsible for semantic compile primitives
explore chunk-addressable locale sidecars in the host integration layer
prefer bundler-aware message distribution over manual message sharding

The idea appears especially promising for large applications where the current "one large locale catalog module" approach becomes increasingly wasteful.