public final class SemanticIRBuilder extends Object
Builds a SemanticProgram from a parsed and resolved
TSelectSqlStatement.
Current scope (after slice 9): SELECT with one or more base-table or
CTE sources, optional WHERE, optional JOIN of base tables with ON
conditions, optional GROUP BY (slice 6), optional WITH clause including
chained CTEs (each CTE sees the ones declared strictly before it),
optional FROM-clause subquery (slice 5), optional row-deduplication via
SELECT DISTINCT or Oracle's SELECT UNIQUE synonym
(slice 8 — see StatementGraph.isDistinct()), optional ORDER BY
over physical column references or column-bearing expressions
(slice 9 — see StatementGraph.getOrderByColumnRefs()).
Expression projections like salary * 2 AS doubled or
a.x + a.y are accepted and marked
OutputColumn.isDerived(); aggregate function calls (slice 6)
are flagged via OutputColumn.isAggregate().
Slice 9 lifts ORDER BY for sort keys that are physical
column references or expressions over them. The collected references
surface as StatementGraph.getOrderByColumnRefs(). Sort
direction (ASC/DESC) and null placement
(NULLS FIRST/NULLS LAST) are presentation metadata
and are not modelled. Ordinal forms (ORDER BY 1) and
projection-alias forms (SELECT id AS x ... ORDER BY x) are
rejected so the dependency information is never silently lost; a
later slice can model output-position references explicitly. The
canonical lineage model (slice 7) deliberately ignores ORDER BY —
sort order changes presentation, not column dependency or row-set
membership.
Row-limit clauses (LIMIT, TOP, OFFSET,
FETCH FIRST) are rejected statement-wide, including the
SQL Server-style ORDER BY ... OFFSET ... FETCH NEXT. With
a row-limit present, ORDER BY ceases to be presentation-only
and starts deciding which rows survive — the canonical-model
exclusion would no longer be sound, so the entire statement is out
of scope until a future slice models row-limit semantics.
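To make the row-limit argument concrete, the following minimal sketch shows how the sort key starts deciding row membership once a limit is applied. It uses a hypothetical `Emp` row type and a hypothetical `survivors` helper, neither of which is part of the builder; it only illustrates the reasoning, not how the builder detects row-limit clauses.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class RowLimitSketch {
    // Hypothetical row type, for illustration only.
    record Emp(int id, int salary) {}

    /** Ids of the rows that survive "ORDER BY <key> ... FETCH FIRST n ROWS ONLY". */
    static Set<Integer> survivors(List<Emp> rows, Comparator<Emp> orderBy, int n) {
        return rows.stream()
                .sorted(orderBy)   // the sort key ...
                .limit(n)          // ... now decides which rows are kept
                .map(Emp::id)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Emp> rows = List.of(new Emp(1, 300), new Emp(2, 100), new Emp(3, 200));
        // Same input, same limit, different ORDER BY key: different surviving rows.
        System.out.println(survivors(rows, Comparator.comparingInt(Emp::id), 2));
        System.out.println(survivors(rows, Comparator.comparingInt(Emp::salary), 2));
    }
}
```

Without the `limit` step both calls would return the same set of ids, which is why ORDER BY on its own can be left out of the canonical model.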
Slice 10 lifts HAVING: the predicate's column references
are collected into StatementGraph.getHavingColumnRefs() via
buildHavingColumnRefs(gudusoft.gsqlparser.stmt.TSelectSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider). The same visitor pattern as projection
and ORDER BY rejects subqueries (scalar, EXISTS, IN-SELECT, ANY/ALL/
SOME) and window functions before collectColumnRefs(gudusoft.gsqlparser.nodes.TParseTreeNode, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider) runs, so
inner-scope refs never leak. HAVING without GROUP BY is supported (the
parser still attaches a TGroupBy node with empty items).
HAVING influences which rows survive, but it does not contribute to the
canonical lineage model; see
StatementGraph.getHavingColumnRefs() for why.
Slice 11 lifts uncorrelated scalar subqueries in projection;
scalar bodies are extracted as their own statements via
extractScalarSubqueriesAsStatements(gudusoft.gsqlparser.stmt.TSelectSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider, java.util.List<gudusoft.gsqlparser.ir.semantic.StatementGraph>, java.util.List<gudusoft.gsqlparser.ir.semantic.LineageEdge>, java.util.Map<java.lang.String, java.lang.Integer>, gudusoft.gsqlparser.ir.semantic.builder.SemanticIRBuilder.EnclosingScope, boolean) with the synthetic-name
convention <scalar_subquery_<index>>.
Slice 12 lifts set operations (UNION / UNION ALL / INTERSECT /
INTERSECT ALL / MINUS / MINUS ALL / EXCEPT / EXCEPT ALL) at the top
level and as CTE bodies. Each branch becomes its own
StatementGraph with synthetic name
<set_op_branch_<index>>; the outer set-op statement carries
empty relations and lineage edges fan out per-position to
each branch. The flatten descends the left-leaning AST iteratively
(per CLAUDE.md — no recursion on leftStmt/rightStmt).
See buildSetOpProgram(gudusoft.gsqlparser.stmt.TSelectSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider, java.util.List<gudusoft.gsqlparser.ir.semantic.StatementGraph>, java.util.List<gudusoft.gsqlparser.ir.semantic.LineageEdge>, java.util.Map<java.lang.String, java.lang.Integer>, java.lang.String, boolean).
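The iterative flatten described above can be sketched with an explicit stack. This is an illustration only: `Node` is a hypothetical stand-in for the parser's set-op statement (with `left`/`right` playing the role of leftStmt/rightStmt), and the zero-based, left-to-right branch numbering is an assumption rather than something the documentation states.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

final class SetOpFlattenSketch {
    /** Hypothetical set-op AST node: a leaf SELECT, or an operator with two children. */
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label) { this(label, null, null); }

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    /**
     * Collects the leaf branches left to right without recursion, so a long
     * left-leaning chain cannot overflow the call stack.
     * Assumes every operator node has both children.
     */
    static Map<String, String> flattenBranches(Node root) {
        Map<String, String> branches = new LinkedHashMap<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.isLeaf()) {
                // Assumed numbering: position of the branch in left-to-right order.
                branches.put("<set_op_branch_" + branches.size() + ">", node.label);
            } else {
                // Push right first so the left child is popped (and numbered) first.
                pending.push(node.right);
                pending.push(node.left);
            }
        }
        return branches;
    }
}
```

For a chain such as `(a UNION b) UNION c`, the sketch yields the branches in the order a, b, c, which is the order a per-position fan-out of lineage edges would need.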
Slice 22 lifts window-function frame clauses
(ROWS/RANGE/GROUPS BETWEEN ...); the frame
unit, start bound, and optional end bound are captured in
WindowFrame hung off
WindowSpec.getFrame(). Frame info is presentation-only
(dlineage XML harvests no frame information) and does NOT contribute
to the canonical lineage model — same status as slice-13's
PARTITION BY / OVER ORDER BY refs. Per-bound EXCLUDE clauses
(Netezza-reachable) and non-constant offsets (PG
simple_object_name_t, ANSI parenthesis_t) are still
rejected.
Still rejected: WITH RECURSIVE, DISTINCT ON (...)
and other non-DISTINCT/UNIQUE row-filters,
scalar-body constant-only projections (zero column refs),
correlated scalar subqueries, scalar bodies with
subqueries in WHERE/JOIN ON/GROUP BY, multi-column scalar inner,
scalar subqueries embedded in larger projection expressions including
EXISTS-in-projection, embedded window functions in larger projection
expressions, window functions in scalar-subquery bodies, window
functions in WHERE/JOIN ON/GROUP BY/HAVING/ORDER BY, empty
OVER (), frame clauses with non-constant offsets (PG
simple_object_name_t, ANSI parenthesis_t), frame
EXCLUDE clauses (Netezza-reachable), named windows,
vendor-specific window extensions (FILTER (WHERE ...),
WITHIN GROUP,
KEEP DENSE_RANK, Hive DISTRIBUTE BY/CLUSTER BY/
SORT BY/PARTITION BY ... SORT (...)), non-physical
PARTITION BY / OVER ORDER BY refs (literals,
subqueries, function calls, expressions, expression-alias references),
window function names outside the slice-13 allowlist,
(slice 63 lifts explicit CROSS JOIN, slice 64 lifts
JOIN ... USING (...), and slice 66 lifts NATURAL JOIN
at outer / CTE-body / FROM-subquery-body call sites; all three stay
rejected inside scalar / set-op-branch / set-op-CTE / predicate bodies;
NATURAL additionally requires resolvable catalog metadata on both
sides, with a side-specific reject otherwise), duplicate aliases,
Oracle
ORDER SIBLINGS BY, Teradata ORDER BY ... RESET WHEN,
row-limit clauses, ORDER BY ordinals/aliases, Teradata QUALIFY
clause, set operations nested in FROM-subquery / scalar bodies,
mixed-operator and mixed-_ALL set-op chains, set-op outer
ORDER BY / row-limit clauses, set-op internal-node modifiers, branch
column-count mismatch, set-op branches with FROM-subquery / scalar
projection / their own CTE list, nested WITH on set-op CTE body. The
builder fails fast outside this scope so callers see the unsupported
case immediately rather than receiving a half-built IR.
| Modifier and Type | Class and Description |
|---|---|
| static class | SemanticIRBuilder.SemanticIRBuildException: Thrown when the input falls outside current builder scope or a binding fails. |
| Modifier and Type | Field and Description |
|---|---|
| static String | PREDICATE_BODY_PREFIX: Reserved name prefix for synthetic predicate-subquery body statements (slice 23: uncorrelated EXISTS extracted from outer-SELECT JOIN ON). |
| static String | SCALAR_BODY_PREFIX: Reserved name prefix for synthetic scalar-subquery body statements (slice 11). |
| static String | SET_OP_BRANCH_PREFIX: Reserved name prefix for synthetic set-op-branch body statements (slice 12). |
| Modifier and Type | Method and Description |
|---|---|
| static SemanticProgram | build(TSelectSqlStatement select, NameBindingProvider provider) |
| static SemanticProgram | buildCreateTable(TCreateTableSqlStatement create, NameBindingProvider provider): Slice 79 admits a single CREATE TABLE target [(c1, ...)] AS SELECT ... (CTAS) statement. |
| static SemanticProgram | buildCreateView(TCreateViewSqlStatement create, NameBindingProvider provider): Slice 79 admits a single CREATE [OR REPLACE] VIEW v [(c1, ...)] AS SELECT ... statement. |
| static SemanticProgram | buildDelete(TDeleteSqlStatement delete, NameBindingProvider provider): Slice 81 / slice 84 admit single-target and joined DELETE statements and produce a "DELETE"-kind StatementGraph (§8.1.4 row D11 follow-up via slice 84's joined-DELETE candidate (a)). |
| static SemanticProgram | buildInsert(TInsertSqlStatement insert, NameBindingProvider provider): Slice 78 admits a single INSERT INTO target SELECT ... statement. |
| static SemanticProgram | buildMerge(TMergeSqlStatement merge, NameBindingProvider provider): Slice 94 admits the single-target MERGE skeleton. |
| static SemanticProgram | buildUpdate(TUpdateSqlStatement update, NameBindingProvider provider): Slice 80 / 82 admit UPDATE target SET c1 = expr1, c2 = expr2, ... [FROM source_list] [WHERE pred] statements. |
| static boolean | isPredicateSubquerySyntheticName(String name): True iff name is a synthetic predicate-subquery-body name created by this builder (slice 23). |
| static boolean | isScalarSyntheticName(String name): True iff name is a synthetic scalar-subquery-body name created by this builder (slice 11). |
| static boolean | isSetOpBranchSyntheticName(String name): True iff name is a synthetic set-op-branch-body name created by this builder (slice 12). |
public static final String SCALAR_BODY_PREFIX
Reserved name prefix for synthetic scalar-subquery body
statements (slice 11). Synthetic names follow the pattern
"<scalar_subquery_<index>>"; the angle brackets ensure
no collision with real CTE names or FROM-clause aliases.
isScalarSyntheticName(String) is the only authorised
detector: both this builder and
SemanticIRProjector.BodyIndexes use it so the convention
lives in one place.
public static final String SET_OP_BRANCH_PREFIX
Reserved name prefix for synthetic set-op-branch body statements
(slice 12). Synthetic names follow the pattern
"<set_op_branch_<index>>";
the angle brackets ensure no collision with real CTE names or
FROM-clause aliases. isSetOpBranchSyntheticName(String) is
the only authorised detector: both this builder and
SemanticIRProjector.BodyIndexes use it so the convention
lives in one place (slice-11 process lesson #10 generalised).
public static final String PREDICATE_BODY_PREFIX
Reserved name prefix for synthetic predicate-subquery body statements
(slice 23). Synthetic names follow the pattern
"<predicate_subquery_<index>>"; the angle
brackets ensure no collision with real CTE names or FROM-clause aliases.
isPredicateSubquerySyntheticName(String) is the only authorised
detector: both this builder and SemanticIRProjector.BodyIndexes
use it so the convention lives in one place.
public static boolean isScalarSyntheticName(String name)
True iff name is a synthetic scalar-subquery-body name
created by this builder (slice 11). Used by
SemanticIRProjector.BodyIndexes to skip such bodies when
building the CTE/FROM-subquery name lookup tables — scalar
bodies are reached only via lineage edges, never via relations.
The match is strict: the name must be the full reserved
pattern <scalar_subquery_<digits>>. A real CTE alias
that happens to start with <scalar_subquery_ but
doesn't match the digits-and-closing-bracket suffix is NOT
skipped (codex impl-review round-1 SHOULD 2).
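The strict match described above can be approximated with an anchored regular expression. This is a minimal sketch, not the builder's actual code: the class name `SyntheticNameSketch`, the use of `java.util.regex.Pattern`, and treating `null` as a non-match are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class SyntheticNameSketch {
    // Full reserved pattern: prefix, one or more digits, closing bracket.
    private static final Pattern SCALAR_BODY =
            Pattern.compile("<scalar_subquery_\\d+>");

    /** Strict detector: the whole name must match, not just the prefix. */
    static boolean isScalarSyntheticName(String name) {
        // matches() anchors at both ends, so trailing characters are rejected.
        return name != null && SCALAR_BODY.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isScalarSyntheticName("<scalar_subquery_3>"));
        System.out.println(isScalarSyntheticName("<scalar_subquery_totals"));
    }
}
```

A prefix-only check such as `startsWith` would wrongly skip a real alias like `<scalar_subquery_totals`, which is the case the strict match is meant to keep.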
public static boolean isSetOpBranchSyntheticName(String name)
True iff name is a synthetic set-op-branch-body name
created by this builder (slice 12). Used by
SemanticIRProjector.BodyIndexes to skip such bodies when
building the CTE/FROM-subquery name lookup tables — set-op
branches are reached only via lineage edges, never via relations.
The match is strict: the name must be the full reserved
pattern <set_op_branch_<digits>>.
public static boolean isPredicateSubquerySyntheticName(String name)
True iff name is a synthetic predicate-subquery-body name
created by this builder (slice 23). Used by
SemanticIRProjector.BodyIndexes to skip such bodies when
building the CTE/FROM-subquery name lookup tables: predicate-subquery
bodies are unreachable from the outer statement (no relation edge, no lineage edge).
public static SemanticProgram build(TSelectSqlStatement select, NameBindingProvider provider)
public static SemanticProgram buildInsert(TInsertSqlStatement insert, NameBindingProvider provider)
Slice 78 admits a single INSERT INTO target SELECT ...
statement. Builds the source SELECT via build(gudusoft.gsqlparser.stmt.TSelectSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider) (reusing
the existing pipeline unchanged), then appends an "INSERT"-kind
StatementGraph carrying the target relation and
cross-statement lineage edges.
Admitted shape: INSERT INTO <target> [(c1, c2, ...)]
<subquery-SELECT>.
Rejections:
EInsertSource.values, values_empty, default_values, execute, values_function, values_multi_table, hive_query, values_oracle_record, set_column_value, value_table → DiagnosticCode.INSERT_SOURCE_NOT_SUPPORTED.
INSERT ALL / INSERT FIRST → DiagnosticCode.INSERT_MULTI_TABLE_NOT_SUPPORTED. Hive multi-insert (multiInsertStatements non-empty) is routed to buildHiveMultiInsert(gudusoft.gsqlparser.stmt.TInsertSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider) instead of being rejected.
DiagnosticCode.INSERT_TARGET_MISSING.
DiagnosticCode.INSERT_COLUMN_COUNT_MISMATCH.
The source SELECT is built first via build() and its
full SemanticProgram (CTE bodies + scalar bodies +
FROM-subquery bodies + outer SELECT + cross-stmt lineage) is
appended verbatim to the returned program. The INSERT
StatementGraph is appended LAST; its
relations lists the source
SELECT as a single RelationKind.SUBQUERY entry whose
qualifiedName is the source SELECT's outer-statement name
(synthesised when needed). All other column-ref lists stay empty
on the INSERT — an INSERT has no projection of its own.
Cross-statement LineageEdges for the INSERT are
from = TABLE_COLUMN(target_qname, target_col_i_name)
and to = STATEMENT_OUTPUT(selectIdx, source_output_i_name).
Target column names are the explicit INSERT column-list spellings
when supplied, else the source SELECT's positional output names.
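The edge construction just described can be sketched as follows. The `Edge` record and the string rendering of the two endpoints are hypothetical simplifications (the real LineageEdge and reference types are not reproduced here), and the `IllegalArgumentException` merely stands in for the column-count-mismatch rejection.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertLineageSketch {
    /** Hypothetical edge shape, with endpoints rendered as plain strings. */
    record Edge(String from, String to) {}

    /**
     * One edge per source output position. Target column names come from the
     * explicit INSERT column list when supplied, else from the source SELECT's
     * positional output names.
     */
    static List<Edge> insertEdges(String targetQName, List<String> explicitColumns,
                                  int selectIdx, List<String> sourceOutputs) {
        boolean hasExplicit = explicitColumns != null && !explicitColumns.isEmpty();
        if (hasExplicit && explicitColumns.size() != sourceOutputs.size()) {
            // Stand-in for the INSERT_COLUMN_COUNT_MISMATCH rejection.
            throw new IllegalArgumentException("INSERT column count mismatch");
        }
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < sourceOutputs.size(); i++) {
            String targetCol = hasExplicit ? explicitColumns.get(i) : sourceOutputs.get(i);
            edges.add(new Edge(
                    "TABLE_COLUMN(" + targetQName + ", " + targetCol + ")",
                    "STATEMENT_OUTPUT(" + selectIdx + ", " + sourceOutputs.get(i) + ")"));
        }
        return edges;
    }
}
```

With an explicit list `(a, b)` over source outputs `x, y`, the sketch pairs `a` with `x` and `b` with `y` purely by position; without a list, each target column reuses the source output name.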
public static SemanticProgram buildCreateTable(TCreateTableSqlStatement create, NameBindingProvider provider)
Slice 79 admits a single CREATE TABLE target [(c1, ...)] AS
SELECT ... (CTAS) statement. Builds the source SELECT via
build(gudusoft.gsqlparser.stmt.TSelectSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider) unchanged, then appends a "CREATE_TABLE"-
kind StatementGraph carrying the target relation and
cross-statement lineage edges (mirrors slice-78 INSERT).
Admitted shape: CREATE [OR REPLACE] TABLE target
[(c1, c2, ...)] AS <subquery-SELECT>. Plain
CREATE TABLE target (a INT, b VARCHAR) (column DDL with
no AS SELECT) is rejected via
DiagnosticCode.CREATE_AS_NO_SOURCE_SELECT. Explicit
column-list arity mismatch surfaces as
DiagnosticCode.CREATE_AS_COLUMN_COUNT_MISMATCH; a
missing / empty target name surfaces (defensively) as
DiagnosticCode.CREATE_AS_TARGET_MISSING.
For CTAS the explicit column-list spellings come from
TCreateTableSqlStatement.getColumnList() — only the bare
column name from each TColumnDefinition is consumed;
data-type tokens are ignored by slice 79.
public static SemanticProgram buildCreateView(TCreateViewSqlStatement create, NameBindingProvider provider)
Slice 79 admits a single CREATE [OR REPLACE] VIEW v [(c1, ...)] AS SELECT ...
statement. Mirrors buildCreateTable(gudusoft.gsqlparser.stmt.TCreateTableSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider) except the source
SELECT is fetched via TCreateViewSqlStatement.getSubquery()
(lowercase 'q'), the target name from
TCreateViewSqlStatement.getViewName(), and the explicit
column-list spellings from TViewAliasClause on the AST.
public static SemanticProgram buildUpdate(TUpdateSqlStatement update, NameBindingProvider provider)
Slice 80 / 82 admit UPDATE target SET c1 = expr1,
c2 = expr2, ... [FROM source_list] [WHERE pred] statements.
Emits one "UPDATE"-kind StatementGraph carrying
the target relation plus synthetic OutputColumn entries
per SET assignment (output name = SET LHS verbatim spelling;
sources = column refs collected from the RHS expression).
Optional WHERE refs surface on
StatementGraph.getFilterColumnRefs().
Slice 82 lifts the slice-80 UPDATE_JOINED_NOT_SUPPORTED
reject for the common PG / MSSQL / BigQuery / Snowflake / Redshift
FROM-side joined UPDATE shapes. The IR shape gains two slots:
relations[] now carries TABLE-kind RelationSources for
FROM-side sources (slice 80 left empty), and
joinColumnRefs[] now carries ON-clause column refs from
FROM-side JOINs. The target stays on
StatementGraph.getTarget(); a reference-identity filter
excludes the target's own TTable instance from relations[].
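A reference-identity filter of the kind mentioned above might look like the sketch below. `Table` is a hypothetical stand-in for the parser's TTable node, and the helper name `fromSideRelations` is invented for illustration. The point is only that the comparison uses `!=` on references rather than a name or `equals()` comparison, so a second node with the same qualified name is kept.

```java
import java.util.ArrayList;
import java.util.List;

final class IdentityFilterSketch {
    /** Hypothetical table node; as a record, equals() compares by name. */
    record Table(String qualifiedName) {}

    /** Keeps every table node except the target's own instance. */
    static List<Table> fromSideRelations(Table target, List<Table> tablesInStatement) {
        List<Table> relations = new ArrayList<>();
        for (Table t : tablesInStatement) {
            if (t != target) { // identity comparison, deliberately not equals()
                relations.add(t);
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        Table target = new Table("t");
        Table sameNameInFrom = new Table("t"); // distinct node, same qualified name
        Table other = new Table("s");
        System.out.println(fromSideRelations(target, List.of(target, sameNameInFrom, other)));
    }
}
```

A name-based filter would drop both `t` nodes here; the identity check removes only the node that is the target itself.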
Admitted shape:
relations[] stays empty.
UPDATE t SET ... FROM source (single FROM source).
UPDATE t SET ... FROM s1, s2, ... (comma-FROM list).
UPDATE t SET ... FROM s1 [INNER|LEFT|RIGHT|FULL OUTER] JOIN s2 ON ... (ON refs populate joinColumnRefs[]).
UPDATE t SET ... FROM t INNER JOIN s ON ... (target may appear in FROM; reference-identity filter excludes the target's own TTable instance from relations[]).
CROSS JOIN (no ON; semantically equivalent to comma-FROM).
EExpressionType.simple_object_name_t column reference (qualified t.x or bare x). Oracle tuple SET (a, b) = (...) (LHS = list_t) rejects via DiagnosticCode.UPDATE_TUPLE_ASSIGNMENT_NOT_SUPPORTED.
DiagnosticCode.UPDATE_SET_HAS_SUBQUERY_NOT_SUPPORTED; window functions reuse the existing DiagnosticCode.CLAUSE_WINDOW_FUNCTION_LEAK routed through rejectWindowFunctionInScope(gudusoft.gsqlparser.nodes.TParseTreeNode, java.lang.String).
containsAnySubquery(gudusoft.gsqlparser.nodes.TParseTreeNode) + rejectWindowFunctionInScope helpers used by SELECT WHERE.
Slice 82 reject scope, with slice 83 admitting subquery FROM
sources (the slice-82 UPDATE_FROM_SUBQUERY_NOT_SUPPORTED
code stays declared but unreached; slice-71/72
retain-for-documentation precedent):
processDirectSubqueryTable extractor, publishing a SUBQUERY-kind RelationSource and a cross-statement LineageEdge per subquery-bound output source.
DiagnosticCode.UPDATE_FROM_JOIN_USING_NOT_SUPPORTED.
DiagnosticCode.UPDATE_FROM_JOIN_NATURAL_NOT_SUPPORTED.
DiagnosticCode.UPDATE_JOIN_ON_HAS_SUBQUERY_NOT_SUPPORTED.
DiagnosticCode.CLAUSE_WINDOW_FUNCTION_LEAK via rejectWindowFunctionInScope(gudusoft.gsqlparser.nodes.TParseTreeNode, java.lang.String).
Deferred (rejected at the outer level before any SET processing):
DiagnosticCode.UPDATE_CTE_NOT_SUPPORTED.
DiagnosticCode.UPDATE_RETURNING_CLAUSE_NOT_SUPPORTED.
DiagnosticCode.UPDATE_OUTPUT_CLAUSE_NOT_SUPPORTED.
DiagnosticCode.UPDATE_ORDER_BY_OR_LIMIT_NOT_SUPPORTED.
DiagnosticCode.UPDATE_NO_SET_CLAUSE.
DiagnosticCode.UPDATE_TARGET_MISSING.
Cross-statement LineageEdges, one per SET assignment:
from = LineageRef.tableColumn(targetQName, target_col_i)
to = LineageRef.statementOutput(0, output_name_i)
Statement index 0 is the UPDATE statement itself: the synthetic output IS the per-assignment "projection" that flows into the target column. This is the slice-78 INSERT contract (TABLE_COLUMN → STATEMENT_OUTPUT) with the source SELECT replaced by the UPDATE's own per-assignment outputs; consumers read
outputs[i].sources to enumerate the RHS column refs that
feed the target column.
public static SemanticProgram buildDelete(TDeleteSqlStatement delete, NameBindingProvider provider)
Slice 81 / slice 84 admit single-target and joined
DELETE statements and produce a "DELETE"-kind
StatementGraph (§8.1.4 row D11 follow-up via slice 84's
joined-DELETE candidate (a)).
Structurally mirrors slice-80 + slice-82 + slice-83
buildUpdate(gudusoft.gsqlparser.stmt.TUpdateSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider) but with no SET clause and an empty
outputColumns list — DELETE has no projection of its
own (RETURNING / OUTPUT projections are deferred to a later
slice). The target relation is exposed via the slice-78
TargetRelation slot; its columns list is
intentionally empty because DELETE removes whole rows rather
than writing specific columns.
WHERE-side reads still surface on
StatementGraph.getFilterColumnRefs() so downstream
governance can audit "what predicates does this DELETE depend
on". Cross-statement LineageEdges are NOT emitted (the
slice-78 / slice-80 target.col_i ← STATEMENT_OUTPUT(…)
contract has no DELETE analogue: there is no source
projection).
Slice 84 admit scope (lifts slice-81's blanket joined-DELETE reject for the common PG / MSSQL FROM-side shapes; mirrors slice 82 + slice 83 onto DELETE):
DELETE FROM t USING source_list [WHERE]: source_list = simple table, comma-separated tables, or chain of explicit JOIN ... ON (driver is taken from referenceJoins).
DELETE FROM t FROM driver_table [JOIN other ON ...] [WHERE]: the target may itself appear in the FROM-FROM clause as a different TTable instance.
DELETE alias FROM t alias INNER JOIN ... ON …: the alias-form DELETE where target is matched by alias.
DELETE FROM t USING (SELECT …) s [WHERE]: FROM-subquery as a USING source; mirrors slice-83 UPDATE FROM-subquery extraction.
Slice 84 reject scope (preserves slice-81 reject coverage for shapes that still need a refinement slice):
DiagnosticCode.DELETE_JOINED_NOT_SUPPORTED: any shape with delete.getJoins().size() > 0 (MySQL multi-target DELETE T1, T2 FROM …, MySQL self-reference DELETE T1 FROM T1, MySQL multi-USING DELETE FROM T1 USING T1, T2). Candidates (c) and (d) in §8.1.4 lift these later.
DiagnosticCode.DELETE_FROM_JOIN_USING_NOT_SUPPORTED: USING(col1, col2) on a FROM-side join item; mirror of slice-82 UPDATE_FROM_JOIN_USING_*.
DiagnosticCode.DELETE_FROM_JOIN_NATURAL_NOT_SUPPORTED: NATURAL JOIN on a FROM-side join item.
DiagnosticCode.DELETE_FROM_NESTED_JOIN_NOT_SUPPORTED: defensive; TTable wrapping a TJoin in the FROM source (not reached by any observed parser path on supported dialects, but kept distinct from the subquery code per slice-80 message-text-discrimination contract).
DiagnosticCode.DELETE_JOIN_ON_HAS_SUBQUERY_NOT_SUPPORTED: subquery in a JOIN ON predicate.
Other rejected shapes (slice-81 baseline preserved):
DiagnosticCode.DELETE_CTE_NOT_SUPPORTED,
DiagnosticCode.DELETE_TARGET_MISSING,
DiagnosticCode.DELETE_RETURNING_CLAUSE_NOT_SUPPORTED,
DiagnosticCode.DELETE_OUTPUT_CLAUSE_NOT_SUPPORTED,
DiagnosticCode.DELETE_ORDER_BY_OR_LIMIT_NOT_SUPPORTED.
WHERE-side subqueries reuse the existing
DiagnosticCode.WHERE_HAS_SUBQUERY_NOT_SUPPORTED (no
new DELETE-side code) — consistent with slice-80 UPDATE WHERE
handling. Window functions in WHERE / ON reuse
DiagnosticCode.CLAUSE_WINDOW_FUNCTION_LEAK.
IR shape (slice 84 changes from slice 81):
relations[]: now carries TABLE-kind RelationSources for joined-DELETE FROM-side sources, plus SUBQUERY-kind sources for USING (SELECT …) extractions. Slice 81 left it empty. Reference-identity filter excludes the target's own TTable instance; the slice-82 walker-order swap (target before relations[] in SqlSemanticAnalyzer.collectCatalogMissWarnings(gudusoft.gsqlparser.ir.semantic.SemanticProgram, gudusoft.gsqlparser.ir.semantic.catalog.Catalog)) handles same-qualified-name target+driver collisions (e.g. MSSQL DELETE FROM t FROM t spqh JOIN sp).
joinColumnRefs[]: now carries ON-clause refs collected from each JoinItem under a per-DELETE LinkedHashSet for cross-JoinItem dedup (slice-82 codex round-1 Q2 BLOCKING precedent).
LineageEdges: empty outputColumns[] means there is no STATEMENT_OUTPUT(deleteIdx, …) anchor for slice-83's SUBQUERY-kind emitter. Extracted FROM-subqueries DO emit their own internal lineage edges via emitLineageForStatement inside processDirectSubqueryTable(gudusoft.gsqlparser.nodes.TTable, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider, java.util.List<gudusoft.gsqlparser.ir.semantic.StatementGraph>, java.util.List<gudusoft.gsqlparser.ir.semantic.LineageEdge>, java.util.Map<java.lang.String, java.lang.Integer>, java.util.Map<java.lang.String, java.util.List<java.lang.String>>, java.util.Map<java.lang.String, java.lang.Integer>).
public static SemanticProgram buildMerge(TMergeSqlStatement merge, NameBindingProvider provider)
Slice 94 admits the single-target MERGE skeleton:
MERGE INTO target [AS] tgt
USING (source_table | (SELECT ...) ) [AS] src
ON <join condition>
WHEN MATCHED [AND <cond>] THEN UPDATE SET c1 = expr1 [, ...]
WHEN NOT MATCHED [AND <cond>] THEN INSERT [(c1, ...)] VALUES (expr1, ...)
WHEN MATCHED [AND <cond>] THEN DELETE
Emits one "MERGE"-kind StatementGraph carrying:
TargetRelation on getTarget() only (slice 78/80 contract: target lives on the dedicated target slot, NOT in relations[]). The slice-77/79 catalog walker fires the kind-discriminated "MERGE target relation 'X'" message via targetWarnMessage("MERGE").
relations[] = one entry for the USING source (TABLE-kind base table or SUBQUERY-kind aliased subquery). The slice-77 FROM walker fires "FROM relation 'X'" for missing source.
outputColumns[] = empty (MERGE has no projection).
joinColumnRefs[] = ON condition refs + per-WHEN AND condition refs, LinkedHashSet-deduplicated (slice 82 pattern).
filterColumnRefs[] = per-WHEN action WHERE refs (UPDATE WHERE, UPDATE...DELETE WHERE, INSERT WHERE; slice 95). Empty when no action WHERE is present.
Per-WHEN action lineage:
WHEN MATCHED THEN UPDATE SET col_i = expr_i: emit one LineageEdge per (target col, RHS source ref) pair as TABLE_COLUMN(target,col) ← <ref>, direct, with no STATEMENT_OUTPUT intermediate (MERGE has no SELECT projection). Codex round-2 Q4 confirmed YES.
WHEN NOT MATCHED THEN INSERT (c1, ...) VALUES (e1, ...): same pattern, one edge per (insert col, source ref).
WHEN MATCHED THEN DELETE: no per-column lineage (slice 81 DELETE contract).
WHEN MATCHED [AND <cond>] THEN DO NOTHING (PG 15+, slice 96): admitted as a no-op action. No per-column lineage (slice 81 DELETE precedent). Per-WHEN AND condition refs still feed joinColumnRefs[] via the pre-dispatch block.
WHEN NOT MATCHED BY SOURCE [AND <cond>] THEN UPDATE SET ... | DELETE (SQL Server, slice 97): admitted with the SQL Server semantic invariant that SET RHS and per-WHEN AND cond may not reference USING source columns (no source row exists when the action fires). Source-side refs reject with DiagnosticCode.MERGE_NOT_MATCHED_BY_SOURCE_REFERENCES_SOURCE. INSERT on BY SOURCE is parser-admitted but semantically invalid; rejects with DiagnosticCode.MERGE_NOT_MATCHED_BY_SOURCE_INSERT_NOT_VALID. UPDATE target self-refs (t.a = t.b) emit no lineage edges (slice-94 alias-filter convention; codex round-1 Q2 confirmed). PG 17+ BY SOURCE syntax still parses as type 2 plain NOT MATCHED in parser 4.1.5.0; that parser gap is not addressed in slice 97.
For USING-subquery, the inner SELECT is built via build(gudusoft.gsqlparser.stmt.TSelectSqlStatement, gudusoft.gsqlparser.ir.semantic.binding.NameBindingProvider)
and appended as a preceding StatementGraph; its inner
lineage edges are rebased by the current statement-list offset so
STATEMENT_OUTPUT indices stay valid (slice 78 INSERT pattern).
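The rebasing step described above can be sketched as a pure function over edges. `Ref` and `Edge` here are hypothetical simplified shapes rather than the real lineage types, and the rule that only statement-output references are shifted (table-column references have no statement index) is an assumption drawn from the description.

```java
import java.util.ArrayList;
import java.util.List;

final class RebaseSketch {
    /** Hypothetical reference: either a statement output or a table column. */
    record Ref(boolean statementOutput, int statementIndex, String name) {
        Ref rebase(int offset) {
            // Table-column refs carry no statement index, so they pass through unchanged.
            return statementOutput ? new Ref(true, statementIndex + offset, name) : this;
        }
    }

    /** Hypothetical edge between two references. */
    record Edge(Ref from, Ref to) {}

    /**
     * Shifts every statement-output index by the number of statements that
     * already precede the inner program in the enclosing statement list.
     */
    static List<Edge> rebase(List<Edge> innerEdges, int offset) {
        List<Edge> rebased = new ArrayList<>(innerEdges.size());
        for (Edge e : innerEdges) {
            rebased.add(new Edge(e.from().rebase(offset), e.to().rebase(offset)));
        }
        return rebased;
    }
}
```

An inner program built in isolation numbers its statements from 0; when it is appended after two existing statements, an offset of 2 keeps each reference pointing at the same statement in the combined list.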
Resolver2 already handles MERGE via MergeScope
— both USING base tables and USING subqueries surface as
sourceTable + EXACT_MATCH bindings on RHS / VALUES /
ON / WHEN-AND refs. Codex round-2 Q5 BLOCKING fix: we install
an explicit slice-83-style published-column map only for
USING subqueries (deterministic; cheap; matches the SELECT-
side FROM-subquery pattern even when redundant).