public final class MultiWordKeywordMerger extends Object
Finds multi-word keyword spans (for example LEFT OUTER JOIN, UNION ALL, GROUP BY) so the
lexical island pipeline can treat each run as a single keyword unit.
The merger is read-only: it returns a list of MultiWordKeywordMerger.Match
spans and does not mutate any Pp2Token or its roles. (Role mutation
is reserved for the designated annotator stages — S9 zone detector, S21
clause annotator, S33 overlay — not this slice.) Later pipeline stages
consume the match list to know which token indices belong to one keyword.
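A minimal sketch of how a later stage might use the match list to decide which token indices belong to one keyword. `SpanSketch` and `ownerByToken` are hypothetical names for illustration; the real `MultiWordKeywordMerger.Match` accessors are not shown on this page, so the field names here are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: SpanSketch stands in for MultiWordKeywordMerger.Match.
final class SpanConsumerSketch {

    static final class SpanSketch {
        final int start;     // inclusive first token index (assumed field name)
        final int end;       // inclusive last token index (assumed field name)
        final String phrase; // canonical upper-cased phrase

        SpanSketch(int start, int end, String phrase) {
            this.start = start;
            this.end = end;
            this.phrase = phrase;
        }
    }

    // Maps each token index to the position of the span that covers it, or -1.
    // Nothing is mutated except the freshly allocated result array.
    static int[] ownerByToken(int tokenCount, List<SpanSketch> spans) {
        int[] owner = new int[tokenCount];
        Arrays.fill(owner, -1);
        for (int s = 0; s < spans.size(); s++) {
            SpanSketch span = spans.get(s);
            for (int i = span.start; i <= span.end; i++) {
                owner[i] = s;
            }
        }
        return owner;
    }

    public static void main(String[] args) {
        // Tokens: SELECT a FROM t GROUP BY a (indices 0..6)
        List<SpanSketch> spans = Arrays.asList(new SpanSketch(4, 5, "GROUP BY"));
        System.out.println(Arrays.toString(ownerByToken(7, spans)));
        // prints [-1, -1, -1, -1, 0, 0, -1]
    }
}
```

Keeping the lookup in a side array rather than on the tokens mirrors the read-only contract described above: the consumer derives what it needs without touching token roles.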
Only phrases listed in MultiWordKeywordTable merge. GROUP BY merges;
LEFT JOIN does not (only the three-word LEFT OUTER JOIN
form is in the table).
TSQLion lexical scan. The curated
MultiWordKeywordTable allowlist contains no phrases where an
earlier shorter match would preclude a later longer one, so the two
agree here. If a future table introduces such overlaps, revisit this
in favor of a candidate-generate-then-select-by-length strategy.
Plan reference: §7.3/S18, §7.4/S18.
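One possible shape of a candidate-generate-then-select-by-length strategy, sketched over plain strings. This is not the merger's actual implementation: `SelectByLengthSketch`, `Span`, `CURATED`, and the overlapping table in `main` are all hypothetical, and real tokens would come from a `Pp2TokenStream` rather than a `List<String>`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustrative only: not the real MultiWordKeywordMerger or MultiWordKeywordTable.
final class SelectByLengthSketch {

    // Hypothetical stand-in for one vendor's curated allowlist.
    static final List<List<String>> CURATED = Arrays.asList(
            Arrays.asList("LEFT", "OUTER", "JOIN"),
            Arrays.asList("UNION", "ALL"),
            Arrays.asList("GROUP", "BY"));

    static final class Span {
        final int start, end; // inclusive token-index range
        final String phrase;  // canonical upper-cased phrase
        Span(int start, int end, String phrase) {
            this.start = start; this.end = end; this.phrase = phrase;
        }
        int length() { return end - start + 1; }
    }

    // Step 1: every allowlisted phrase occurrence at every position is a candidate.
    static List<Span> candidates(List<String> tokens, List<List<String>> table) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (List<String> phrase : table) {
                if (matchesAt(tokens, i, phrase)) {
                    out.add(new Span(i, i + phrase.size() - 1, String.join(" ", phrase)));
                }
            }
        }
        return out;
    }

    static boolean matchesAt(List<String> tokens, int i, List<String> phrase) {
        if (i + phrase.size() > tokens.size()) return false;
        for (int k = 0; k < phrase.size(); k++) {
            if (!phrase.get(k).equals(tokens.get(i + k).toUpperCase(Locale.ROOT))) return false;
        }
        return true;
    }

    // Step 2: take longer candidates first and drop any that overlap an accepted span.
    static List<Span> select(List<Span> candidates) {
        List<Span> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Span s) -> -s.length())
                .thenComparingInt(s -> s.start));
        List<Span> kept = new ArrayList<>();
        for (Span c : sorted) {
            boolean overlaps = false;
            for (Span k : kept) {
                if (c.start <= k.end && k.start <= c.end) { overlaps = true; break; }
            }
            if (!overlaps) kept.add(c);
        }
        kept.sort(Comparator.comparingInt((Span s) -> s.start)); // back to stream order
        return kept;
    }

    static List<Span> find(List<String> tokens, List<List<String>> table) {
        return select(candidates(tokens, table));
    }

    public static void main(String[] args) {
        // Curated table: GROUP BY merges, LEFT JOIN does not.
        for (Span s : find(Arrays.asList("t", "left", "join", "u", "group", "by", "a"), CURATED)) {
            System.out.println(s.start + ".." + s.end + " " + s.phrase); // prints 4..5 GROUP BY
        }
        // Hypothetical overlapping table: the longer B C D wins over the earlier A B.
        List<List<String>> overlapping = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("B", "C", "D"));
        for (Span s : find(Arrays.asList("a", "b", "c", "d"), overlapping)) {
            System.out.println(s.start + ".." + s.end + " " + s.phrase); // prints 1..3 B C D
        }
    }
}
```

With the curated table, the selection step has nothing to arbitrate, which matches the note that the strategies agree today. The second call shows the case the note warns about: a first-match scan would accept `A B` at index 0 and never see `B C D`.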
| Modifier and Type | Class and Description |
|---|---|
| static class | MultiWordKeywordMerger.Match An immutable multi-word keyword span over a Pp2TokenStream: the inclusive token-index range and the canonical upper-cased phrase. |
| Constructor and Description |
|---|
| MultiWordKeywordMerger() |
| Modifier and Type | Method and Description |
|---|---|
| List<MultiWordKeywordMerger.Match> | findMatches(Pp2TokenStream stream, EDbVendor vendor) Find all multi-word keyword spans in stream for vendor. |
public MultiWordKeywordMerger()
public List<MultiWordKeywordMerger.Match> findMatches(Pp2TokenStream stream, EDbVendor vendor)
Find all multi-word keyword spans in stream for vendor.
Parameters:
stream - the token stream to scan; must not be null
vendor - the dialect whose phrase table to use; must not be null
Throws:
NullPointerException - if stream or vendor is null
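The null contract above could be enforced with a fail-fast check at the top of the method. The sketch below is an assumption about how such a check might look, not the class's actual code: `NullContractSketch` is a hypothetical name, and `List<String>` and `String` stand in for `Pp2TokenStream` and `EDbVendor`.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Illustrative only: parameter types are stand-ins for Pp2TokenStream and EDbVendor.
final class NullContractSketch {

    static List<String> findMatches(List<String> stream, String vendor) {
        // Reject null arguments before any scanning starts.
        Objects.requireNonNull(stream, "stream");
        Objects.requireNonNull(vendor, "vendor");
        return Collections.emptyList(); // scanning elided in this sketch
    }

    public static void main(String[] args) {
        try {
            findMatches(null, "mssql");
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage()); // prints rejected: stream
        }
    }
}
```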