ADR 0003: Use Byte-Oriented Scanning and Separate Structure From Semantics

  • Status: Accepted
  • Date: 2026-03-15

Context

Initial profiling showed major costs in:

  • line splitting and trimming through string pattern APIs
  • prefix checks on &str
  • early UTF-8 validation and string materialization

PO syntax is structurally ASCII even when message content is not. That makes byte-oriented structural scanning a natural fit.
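
To illustrate why this holds, here is a minimal sketch that assumes the input is UTF-8. In UTF-8, every byte of a multi-byte character has its high bit set, so an ASCII structural byte such as `"` or `\` can never occur inside one. The helper name `find_closing_quote` is hypothetical and not taken from the parser.

```rust
/// Hypothetical helper: given the bytes that follow an opening quote,
/// return the index of the closing unescaped `"`.
///
/// No decoding is needed. Assuming UTF-8 input, the bytes of non-ASCII
/// characters are all >= 0x80 and can never be mistaken for `"` or `\`.
pub fn find_closing_quote(body: &[u8]) -> Option<usize> {
    let mut i = 0;
    while i < body.len() {
        match body[i] {
            // A backslash escapes the next byte, so skip both.
            b'\\' => i += 2,
            b'"' => return Some(i),
            _ => i += 1,
        }
    }
    None
}
```

This reasoning does not carry over to legacy multi-byte encodings, where ASCII-range bytes can appear as trail bytes. That is one reason the UTF-8 invariants have to be stated explicitly.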

Decision

Implement the hot parsing path around byte scanning and keep structural helpers in dedicated scanner code.

This includes:

  • line scanning on &[u8]
  • keyword and comment classification on bytes
  • memchr-style structural searches
  • delaying string materialization as long as possible
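
The items above can be sketched roughly as follows. This is an illustration under assumptions, not the project's scanner: `LineKind`, `next_line`, `classify`, and `find_byte` are hypothetical names, and `find_byte` is a plain standard-library stand-in for a memchr-style search.

```rust
/// Hypothetical line categories; the real scanner may distinguish more.
#[derive(Debug, PartialEq, Eq)]
pub enum LineKind {
    Blank,
    Comment,
    Continuation,
    Msgctxt,
    Msgid,
    MsgidPlural,
    Msgstr,
    Other,
}

/// Stand-in for a memchr-style search: index of the first `needle` byte.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Split the next line off `input`, dropping a trailing "\n" or "\r\n".
/// Returns `None` once the input is exhausted.
pub fn next_line(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.is_empty() {
        return None;
    }
    let (line, rest) = match find_byte(b'\n', input) {
        Some(i) => (&input[..i], &input[i + 1..]),
        None => (input, &input[input.len()..]),
    };
    let line = line.strip_suffix(b"\r").unwrap_or(line);
    Some((line, rest))
}

/// Classify a line by its leading bytes. No `&str` is created.
pub fn classify(line: &[u8]) -> LineKind {
    // Skip leading ASCII spaces and tabs.
    let start = line
        .iter()
        .position(|&b| b != b' ' && b != b'\t')
        .unwrap_or(line.len());
    let line = &line[start..];
    match line.first() {
        None => LineKind::Blank,
        Some(b'#') => LineKind::Comment,
        Some(b'"') => LineKind::Continuation,
        // `msgid_plural` must be tested before its prefix `msgid`.
        _ if line.starts_with(b"msgid_plural") => LineKind::MsgidPlural,
        _ if line.starts_with(b"msgid") => LineKind::Msgid,
        _ if line.starts_with(b"msgstr") => LineKind::Msgstr,
        _ if line.starts_with(b"msgctxt") => LineKind::Msgctxt,
        _ => LineKind::Other,
    }
}
```

Keeping these helpers free of `&str` means a SIMD-backed search could later replace `find_byte` without touching the classification logic.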

Consequences

Positive:

  • materially better parser throughput
  • clearer future seam for SIMD/NEON backends
  • reduced dependence on general-purpose string search machinery

Negative:

  • more explicit UTF-8 invariants must be documented and upheld
  • code is lower-level than a straightforward string parser
  • some debugging tasks become more byte-centric