public final class SourceSpanLedger extends Object
SourceSpan.
This is the Codex Round-3 insight that makes "no byte ever lost" a
concrete guarantee. Pp2TokenStreamBuilder (S7) is allowed to
normalize whitespace and CRLF because the ledger preserves the raw
bytes; the assembler (S15) can restore them when emitting output.
For every ledger l produced by build(String, TSourceTokenList):
    l.getSpans().get(0).getStartOffset() == 0
    l.getSpans().get(N-1).getEndOffset() == input.length()
    span[i].getEndOffset() == span[i+1].getStartOffset()   // no gaps, no overlaps
    Σ span.getText() == original input                     // bytes recovered
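The four invariants above can be checked mechanically. The following is a minimal sketch, not the project's own test code: `LedgerInvariantSketch`, its `Span` record, and `tilesInput` are hypothetical stand-ins for `SourceSpan` and the ledger, and treating an empty span list as valid only for empty input is an assumption the source does not state.

```java
import java.util.List;

public class LedgerInvariantSketch {
    // Hypothetical stand-in for SourceSpan: half-open range [start, end) plus its raw text.
    public record Span(int start, int end, String text) {}

    /** Returns true when the spans tile the input exactly: no gaps, no overlaps, text recovered. */
    public static boolean tilesInput(String input, List<Span> spans) {
        if (spans.isEmpty()) {
            // Assumption: an empty ledger is only acceptable for an empty input.
            return input.isEmpty();
        }
        // First span starts at offset 0, last span ends at input.length().
        if (spans.get(0).start() != 0) return false;
        if (spans.get(spans.size() - 1).end() != input.length()) return false;

        StringBuilder rebuilt = new StringBuilder();
        for (int i = 0; i < spans.size(); i++) {
            Span s = spans.get(i);
            // Each span must end exactly where the next one starts.
            if (i + 1 < spans.size() && s.end() != spans.get(i + 1).start()) return false;
            rebuilt.append(s.text());
        }
        // Concatenating every span's text must give back the original input.
        return rebuilt.toString().equals(input);
    }
}
```

A check of this shape is useful in tests for any code path that normalizes whitespace or CRLF, since it fails as soon as a span boundary drifts from the raw input.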
Any range the tokenizer did not cover — usually a vendor-parser
tokenization gap, occasionally an unclosed string the lexer truncated —
is recorded as a SourceSpan.Kind.RAW_FALLBACK span with the
verbatim source text. The engine emits a FormatDiagnostic for
each such span so the result advertises the recovery. Legal SQL on the
five major vendors produces zero RAW_FALLBACK spans (verified
by SourceSpanLedgerTest).
Plan reference: §7.3/S8, §7.4/S8, §10.3 (the construction sketch).
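The gap-filling behaviour described above can be sketched as a single pass with a running cursor. This is an illustration under stated assumptions, not the construction from §10.3: `GapFillSketch`, `Tok`, `Span`, `fill`, and `join` are hypothetical names, diagnostics are plain strings rather than `FormatDiagnostic` objects, and the exact skip/clamp policy shown here is one plausible reading of "tolerated, not fatal".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class GapFillSketch {
    public enum Kind { TOKEN, RAW_FALLBACK }
    // Hypothetical token: half-open offset range [start, end) reported by a tokenizer.
    public record Tok(int start, int end) {}
    public record Span(Kind kind, int start, int end, String text) {}

    public static List<Span> fill(String source, List<Tok> tokens, List<String> diagnostics) {
        Objects.requireNonNull(source, "source");
        Objects.requireNonNull(tokens, "tokens");
        List<Span> spans = new ArrayList<>();
        int cursor = 0; // first offset not yet owned by any span
        for (Tok t : tokens) {
            // Skip tokens that cannot map onto the input at all.
            if (t.start() < 0 || t.start() >= source.length() || t.end() <= t.start()) {
                diagnostics.add("skipped out-of-range token " + t);
                continue;
            }
            int start = t.start();
            int end = Math.min(t.end(), source.length()); // clamp to the input boundary
            if (start < cursor) {
                // Overlapping or out-of-order token: clamp to the running cursor.
                diagnostics.add("clamped overlapping token " + t + " to offset " + cursor);
                start = cursor;
            }
            if (start >= end) continue; // nothing left after clamping
            if (start > cursor) {
                addFallback(source, cursor, start, spans, diagnostics);
            }
            spans.add(new Span(Kind.TOKEN, start, end, source.substring(start, end)));
            cursor = end;
        }
        if (cursor < source.length()) {
            // Trailing text the tokenizer never covered, e.g. a truncated string literal.
            addFallback(source, cursor, source.length(), spans, diagnostics);
        }
        return spans;
    }

    private static void addFallback(String source, int from, int to,
                                    List<Span> spans, List<String> diagnostics) {
        spans.add(new Span(Kind.RAW_FALLBACK, from, to, source.substring(from, to)));
        diagnostics.add("raw fallback for uncovered range [" + from + ", " + to + ")");
    }

    /** Concatenates every span's text, mirroring what a reconstruct-style method would do. */
    public static String join(List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        for (Span s : spans) sb.append(s.text());
        return sb.toString();
    }
}
```

Because every branch either advances the cursor or leaves it untouched, each input offset ends up in exactly one span, which is what lets the concatenation round-trip even when the token list is malformed.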
| Modifier and Type | Method and Description |
|---|---|
| static SourceSpanLedger | build(String source, TSourceTokenList tokens) Build a ledger from the original SQL string and the parser's token list. |
| static SourceSpan.Kind | classifyToken(ETokenType type) Classify a token type into a SourceSpan.Kind. |
| List<FormatDiagnostic> | getDiagnostics() Diagnostics emitted while building the ledger. |
| int | getRawFallbackCount() Number of SourceSpan.Kind.RAW_FALLBACK spans in the ledger. |
| String | getSource() The original input the ledger was built from. |
| List<SourceSpan> | getSpans() Unmodifiable, byte-order-stable list of every span. |
| String | reconstruct() Convenience: rebuild the source by concatenating every span's text. |
| String | toString() |
public static SourceSpanLedger build(String source, TSourceTokenList tokens)
offset order. Malformed tokenizer
output — negative offsets, tokens past the input boundary, or
overlapping/out-of-order spans (observed with CRLF line endings after an
unbalanced construct on Windows / git autocrlf) — is tolerated,
not fatal: out-of-range tokens are skipped or clamped, overlaps are
clamped to the running cursor, and a FormatDiagnostic records
each recovery. This keeps the fault-tolerant contract (no throw; every
input byte still owned by exactly one span via RAW_FALLBACK gap filling).
Throws: NullPointerException - if source or tokens is null
public static SourceSpan.Kind classifyToken(ETokenType type)
Classify a token type into a SourceSpan.Kind. Whitespace is
SourceSpan.Kind.TRIVIA; comments, string literals, and
quoted identifiers are SourceSpan.Kind.PROTECTED; everything
else is SourceSpan.Kind.TOKEN. Slice S9 extends this with
finer-grained protected-zone detection (hints, NO_FORMAT blocks).
public List<SourceSpan> getSpans()
Unmodifiable, byte-order-stable list of every span.
public int getRawFallbackCount()
Number of SourceSpan.Kind.RAW_FALLBACK spans in the ledger.
public List<FormatDiagnostic> getDiagnostics()
Diagnostics emitted while building the ledger: one FormatDiagnostic.Severity.WARNING entry per
SourceSpan.Kind.RAW_FALLBACK span; the engine appends these
to the per-call Pp2FormatResult.diagnostics list (S16).
Returns an unmodifiable view.
public String reconstruct()
Convenience: rebuild the source by concatenating every span's text.