ADR 0051 — String literals and template-block delimiter boundary
- Status: accepted
- Date: 2026-05-22
- Spec target: XTL 0.1
- Affects: language.md (Template Blocks, Literals), grammar.ebnf, ADR-0021, ADR-0028, normalizer
Context
Audit of the reference implementation (src/normalizer.ts:3) found
that the template-block tokenizer is a single non-greedy regex —
/\{\{(.*?)\}\}/g — that is not aware of string literals. As a
consequence, the first }} encountered after a {{ always closes
the block, even when that }} sits inside a "..." literal.
Concrete shapes that silently fall through today:
{{ "abc}}def" }}— block closes at the first}}inside the string. The lexer handsexpression =(or a truncated form) downstream; the remainderdef" }}flows back into the cell as literal text. Author intent — a string containing}}— is unreachable.{{ TEXT(x, "MM}}DD") }}— same shape. The format-stringMM}}DDis impossible to express.{{ "x{{y" }}— block opens at the leftmost{{. The inner{{appears inside the matched expression. The next regex iteration re-scans from after the cell's first}}and may try to open a second, overlapping block. Output is silently wrong.{{ "{{nested}}" }}— nested-looking shapes are impossible to author because{{inside the string is consumed as a separator by the outer regex on the next scan pass.
ADR-0028 already pins string-literal internal rules ("no escape
sequences, matched pair only"); ADR-0021 already pins {{ }} as a
parse error. Neither addresses the delimiter-recognition boundary
inside a string literal, and grammar.ebnf is explicit that
delimiter scoping is out of scope ("the block delimiters
themselves … are described in evaluation.md") — yet evaluation.md is
also silent on this surface. This is the largest remaining silent
fallthrough in the lexer.
A parallel ambiguity exists for column_name (allowed to contain
}, }}, {, {{ per grammar.ebnf): a column header named
A}}B would, at a different parse layer, also collide with the
delimiter scan. The reference impl is unaffected because column
names are read from data cells (not template-block contents), but
the spec should name the interaction.
Considered Options
A. Pin "first }} wins" as normative; document the workaround.
The lexer scans for }} without string-literal awareness. Authors
who need a literal }} or {{ inside a value hold it in
__config__ / __inputs__ and reference it via
{{ __config__[key] }}. Mirrors ADR-0028's "-inside-string stance.
Smallest impl change; aligns with the "template is the handover
artifact" thesis (the cell content must be straightforwardly
readable).
B. String-literal-aware tokenizer. Make the lexer track quote
state so }} inside "..." does not close the block. Pro: authors
can express }} in values directly. Con: every port has to
reimplement the same state machine; ADR-0028's "no escape sequences"
stance means "..." cannot itself contain ", so quote tracking is
simpler than full Excel, but still non-trivial; one new ADR-shaped
surface to maintain.
C. Detect ambiguity and raise. Lex normally (option A), but if
the normalized expression body contains an unbalanced ", raise
xl3/parser/unbalanced-quote instead of silent acceptance. Pro:
fail-loud per ADR-0027 / ADR-0029 audit theme. Con: a user who
writes {{ "abc}}def" }} does not "see" the block boundary error —
they see "unbalanced quote" which is the symptom, not the cause.
Decision
Adopt A + C combined:
- First
}}wins is normative. The block delimiter scanner makes a single left-to-right pass and is NOT string-literal-aware. - Detect and raise on unbalanced literals to convert silent fallthrough into a stable error.
Normative rules (added to evaluation.md § "Template Blocks")
A template block is opened by the substring {{ and closed by the
first subsequent substring }} in cell-text order. The block
delimiter scanner MUST NOT track quote, bracket, or parenthesis
state — the lexical boundary is purely textual.
This means:
- A
}}sequence inside a"..."string literal CLOSES the block. - A
{{sequence inside a"..."string literal does NOT re-open a nested block; it appears in the expression body verbatim and triggers a parse error from the expression parser ({{is not a valid token in XTL expression grammar). - Authors who need a literal
}},{{, or{}pair inside a rendered value MUST hold the value in__config__[key]or__inputs__[name](cell content can contain any character) and reference it via{{ __config__[key] }}. This mirrors the"-inside-string workaround pinned by ADR-0028.
Detection — xl3/parser/unbalanced-literal
After block extraction, an expression body whose " character count
is odd raises xl3/parser/unbalanced-literal BEFORE expression
parsing proceeds. Diagnostic substring (stable for fixtures):
Template block contains an unbalanced string literal;
}}inside"..."does not close the block. Use__config__for values containing literal}}or{{.
This catches the most common silent-fallthrough shape:
{{ "abc}}def" }} parses as a block containing "abc (one quote,
unbalanced) → raise. The author sees the actionable cause, not the
downstream "unexpected token" symptom.
The error is also raised for the inverse — an expression body whose
trailing characters include an unmatched closing " from a literal
that crossed an erroneously-detected delimiter.
Precedence vs. ADR-0028 unbalanced quote
ADR-0028 § "String literal constraints" left unbalanced or
duplicated quote shapes ("a"b", "a) as implementation-
defined. This ADR introduces a more specific error
(xl3/parser/unbalanced-literal) for the same shape, fired from a
deterministic odd-quote-count check at the parser front-end.
Precedence rule: an expression body whose " count is odd raises
xl3/parser/unbalanced-literal BEFORE expression parsing
proceeds, in all implementations. This tightens ADR-0028's
"implementation-defined" stance to a uniform error code. ADR-0028
remains the source of truth for what the violation is; this ADR
adds which code fires.
An impl that previously accepted {{ "a"b" }} (treating it as
"first matched pair, rest literal") now errors. The author-fix is
either to balance the quotes or hold the value in __config__.
Column-name interaction
column_name per grammar.ebnf may contain {, }, {{, and
even }} as part of the column name itself (column names exclude
only ], CR, LF). Such names exist in source data cells and are
read correctly during source-header extraction — the data-cell
reader is independent of the template-block delimiter scanner.
BUT: such a column cannot be referenced from a template block
using the bracket form. The expression {{ [A}}B] }} parses as
block body [A (the outer scanner closes at the first inner
}}) followed by literal text B] }}, which is silently wrong.
The "first }} wins" rule (the ADR-0051 decision) makes any
column whose header text contains }} unreachable via the
[Column] shorthand.
Authors with such a column MUST rename upstream. Recommended fix:
strip }} from the header in the source workbook before
conversion. The reference impl emits a warning when a source
header contains }} (warnings MUST NOT change output semantics
per evaluation.md § "Errors"); ports SHOULD do the same.
A column whose header text contains { (but not }}) is
reachable: {{ [A{B] }} — the outer scanner closes at the
trailing }} and the bracket-field regex consumes the { as part
of the column name. Only the }} substring inside a column name
is problematic.
Consequences
- Templates that authored
{{ "abc}}def" }}(extremely rare; no known production template does this — the JXLS / cookbook corpus uses__config__for arbitrary literals) now error loudly at parse time with a precise diagnostic. The same author-fix applies as ADR-0028: hold the value in__config__. - One new error code (
xl3/parser/unbalanced-literal) added to the ADR-0015 catalog and snapshot. - The grammar's silent stance ("delimiters not modeled here") gains a normative companion clause in evaluation.md.
- Conformance fixtures pin three cases:
141-string-literal-with-embedded-delimiter-error—{{ "a}}b" }}raisesxl3/parser/unbalanced-literal.142-string-literal-with-embedded-open-delimiter-error—{{ "x{{y" }}raises the same error (the{{inside the matched body fails the expression parser; the unbalanced-quote pre-check catches it first because the body is{{ "x— odd quote count).143-config-workaround-for-literal-braces— a__config__key holding the literal string}}-marker-{{renders verbatim through{{ __config__[key] }}.
- Reference impl change is small: add a
countUnescapedQuoteshelper, run it on each extracted block body, throw the new code when odd. - Porters MUST implement the same scanner shape (first-
}}-wins) and the same unbalanced-quote pre-check. A more aggressive port that implements option B (literal-aware tokenizer) is a strict superset — it would accept more shapes — but the conformance corpus only asserts the option A baseline.
References
- ADR-0021 — Implementation-defined boundaries (empty-block error precedent)
- ADR-0027 — Reserved column names + directive validation (the "convert silent fallthrough into a coded error" pattern)
- ADR-0028 — Literal syntax constraints (the
"-inside-string workaround precedent this ADR mirrors) - ADR-0029 — Directive composition (audit-pass theme)
- grammar.ebnf § "Lexical convention" and § "string_literal"
src/normalizer.ts§TEMPLATE_BLOCK_RE- evaluation.md § "Template Blocks"