public final class StatementGraph extends Object
name is non-null when this statement is the body of a named
CTE or a FROM-clause subquery. For top-level outer SELECTs it is null.
filterColumnRefs, joinColumnRefs,
groupByColumnRefs, havingColumnRefs, and
orderByColumnRefs are flat lists of column references that
appear in the WHERE, JOIN predicate (ON / USING), GROUP BY, HAVING,
and ORDER BY clauses respectively. For JOIN ... USING (k)
(slice 64) joinColumnRefs contains one ref per
(relation, key) pair on both sides — left side first via
catalog-aware narrowing, then the right side. The IR deliberately
does not model structured
Filter, Join, or GroupBy nodes with predicate
trees yet; later slices will add them. Listing the affected columns
is enough to answer the roadmap's questions about
filter/join/grouping/having/ordering influence.
orderByColumnRefs only ever contains references to physical
(base or in-statement) columns. Ordinal references (ORDER BY 1)
and bare-constant sort keys are rejected by the builder — emitting
[] for them would lose the dependency information silently.
Slice 9 (single-SELECT) rejects projection-alias references like
SELECT id AS x ... ORDER BY x. Slice 21 (set-op outer)
accepts alias references positionally against branch[0]'s
outputs — the alias IS the set-op output schema. The two paths
diverge intentionally; see
SemanticIRBuilder.buildOrderByColumnRefs (slice 9) versus
SemanticIRBuilder.buildSetOpOuterOrderByColumnRefs (slice 21).
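As a rough illustration of how a consumer might use the flat per-clause lists described above, the sketch below groups column references by the clause they came from. The nested Ref class is a hypothetical stand-in for ColumnRef (its real accessors are not shown on this page), and the lists are hand-built for illustration rather than produced by SemanticIRBuilder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClauseInfluenceSketch {

    // Hypothetical stand-in for ColumnRef; the real accessors are not shown here.
    static final class Ref {
        final String relationAlias;
        final String columnName;

        Ref(String relationAlias, String columnName) {
            this.relationAlias = relationAlias;
            this.columnName = columnName;
        }
    }

    // Builds a clause -> "alias.column" map, omitting clauses with no refs.
    static Map<String, List<String>> influence(List<Ref> filter, List<Ref> join,
            List<Ref> groupBy, List<Ref> having, List<Ref> orderBy) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        put(out, "filter", filter);
        put(out, "join", join);
        put(out, "groupBy", groupBy);
        put(out, "having", having);
        put(out, "orderBy", orderBy);
        return out;
    }

    private static void put(Map<String, List<String>> out, String clause, List<Ref> refs) {
        if (refs.isEmpty()) {
            return;
        }
        List<String> names = new ArrayList<>();
        for (Ref ref : refs) {
            names.add(ref.relationAlias + "." + ref.columnName);
        }
        out.put(clause, names);
    }

    // Hand-built lists standing in for what a builder might report for:
    //   SELECT d.name, COUNT(*) FROM emp e JOIN dept d ON e.dept_id = d.id
    //   WHERE e.active = 1 GROUP BY d.name
    static Map<String, List<String>> demo() {
        return influence(
                Arrays.asList(new Ref("e", "active")),
                Arrays.asList(new Ref("e", "dept_id"), new Ref("d", "id")),
                Arrays.asList(new Ref("d", "name")),
                Collections.<Ref>emptyList(),
                Collections.<Ref>emptyList());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the lists carry no predicate structure, this kind of grouping can say which columns influence a clause but not how they are combined.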
| Constructor and Description |
|---|
| StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit): Pre-slice-73 constructor preserved so hand-built test fixtures (e.g. SemanticIRProjectorBodyIndexesTest) continue to compile without touching every call site. |
| StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, List<ColumnRef> distinctOnColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit): Slice 73 constructor preserved so SELECT-kind production code that predates slice 78 keeps compiling unchanged. |
| StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, List<ColumnRef> distinctOnColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit, TargetRelation target): Slice 78 constructor preserved so production code that predates slice 85 keeps compiling unchanged. |
| StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<OutputColumn> returningColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, List<ColumnRef> distinctOnColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit, TargetRelation target): Slice 85 primary constructor. Adds the optional returningColumns slot for INSERT / UPDATE / DELETE RETURNING (PG / Oracle) and OUTPUT (SQL Server) projections. |
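The compatibility constructors above follow a telescoping pattern, where each older shape forwards to a wider one and supplies a default for the slot it lacks. The sketch below shows that pattern on a deliberately tiny, hypothetical class. Field types are simplified to strings, and the defaults (empty lists, null target) are assumptions modelled on the descriptions on this page, not a copy of the StatementGraph source.

```java
import java.util.Collections;
import java.util.List;

final class TelescopingSketch {
    final String name;
    final List<String> returningColumns;      // assumed never null
    final List<String> distinctOnColumnRefs;  // assumed never null
    final String target;                      // stand-in for TargetRelation; null when absent

    // Widest shape: every slot is passed explicitly.
    TelescopingSketch(String name, List<String> returningColumns,
            List<String> distinctOnColumnRefs, String target) {
        this.name = name;
        this.returningColumns = returningColumns;
        this.distinctOnColumnRefs = distinctOnColumnRefs;
        this.target = target;
    }

    // Older shape without returningColumns: forwards an empty list.
    TelescopingSketch(String name, List<String> distinctOnColumnRefs, String target) {
        this(name, Collections.<String>emptyList(), distinctOnColumnRefs, target);
    }

    // Older shape without target: forwards null.
    TelescopingSketch(String name, List<String> distinctOnColumnRefs) {
        this(name, distinctOnColumnRefs, null);
    }

    // Oldest shape: forwards an empty distinctOnColumnRefs list.
    TelescopingSketch(String name) {
        this(name, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        TelescopingSketch oldest = new TelescopingSketch("cte1");
        System.out.println(oldest.returningColumns.isEmpty());
        System.out.println(oldest.target == null);
    }
}
```

The benefit of this layout is that old call sites compile unchanged while every instance still ends up fully initialised by the widest constructor.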
| Modifier and Type | Method and Description |
|---|---|
| List<ColumnRef> | getDistinctOnColumnRefs(): Column references in the DISTINCT ON (cols) partition list (PostgreSQL / Greenplum). |
| List<ColumnRef> | getFilterColumnRefs() |
| List<ColumnRef> | getGroupByColumnRefs() |
| List<ColumnRef> | getHavingColumnRefs(): Column references that appear in the HAVING clause's predicate. |
| List<ColumnRef> | getJoinColumnRefs() |
| String | getKind() |
| String | getName(): Nullable: name for a CTE body or FROM-subquery alias, else null. |
| List<ColumnRef> | getOrderByColumnRefs(): Column references that appear in the ORDER BY clause's sort keys. |
| List<OutputColumn> | getOutputColumns() |
| List<RelationSource> | getRelations() |
| List<OutputColumn> | getReturningColumns(): Slice 85: RETURNING / OUTPUT projection columns for INSERT / UPDATE / DELETE statements. |
| RowLimit | getRowLimit(): Per-statement row-limit metadata (slice 70). |
| SetOperator | getSetOperator(): Set-operation kind for the outer statement of a set-op program (slice 12). |
| TargetRelation | getTarget(): Slice 78: write-side target for INSERT statements. |
| boolean | isDistinct(): Whether the statement applies row-deduplication. |
public StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<OutputColumn> returningColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, List<ColumnRef> distinctOnColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit, TargetRelation target)
Slice 85 primary constructor. Adds the optional
returningColumns slot for INSERT / UPDATE / DELETE RETURNING
(PG / Oracle) and OUTPUT (SQL Server) projections. The slot is
always non-null (use Collections.emptyList() when absent);
non-empty only on DML statements that supplied a RETURNING or
OUTPUT projection.
public StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, List<ColumnRef> distinctOnColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit, TargetRelation target)
Slice 78 constructor preserved so production code that predates
slice 85 keeps compiling unchanged. Delegates to the slice-85
constructor with an empty returningColumns.
public StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, List<ColumnRef> distinctOnColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit)
Slice 73 constructor preserved so SELECT-kind production code that
predates slice 78 keeps compiling unchanged. Delegates to the
slice-78 constructor with target=null.
public StatementGraph(String name, String kind, List<RelationSource> relations, List<OutputColumn> outputColumns, List<ColumnRef> filterColumnRefs, List<ColumnRef> joinColumnRefs, List<ColumnRef> groupByColumnRefs, List<ColumnRef> havingColumnRefs, List<ColumnRef> orderByColumnRefs, boolean distinct, SetOperator setOperator, RowLimit rowLimit)
Pre-slice-73 constructor preserved so hand-built test fixtures
(e.g. SemanticIRProjectorBodyIndexesTest) continue to
compile without touching every call site. Delegates to the
slice-73 constructor with an empty distinctOnColumnRefs
list. New production code should call the slice-85 primary
constructor directly.
public List<RelationSource> getRelations()
public List<OutputColumn> getOutputColumns()
public List<OutputColumn> getReturningColumns()
For PG / Oracle RETURNING, each entry's
OutputColumn.getName() is the explicit alias when present,
else the verbatim bare column spelling.
OutputColumn.getSources() lists the underlying column refs;
the relationAlias resolves through the same provider used
for SET RHS / WHERE / JOIN ON, so a joined-UPDATE with
RETURNING t.a, s.x produces refs against both target and
FROM-side relations.
For SQL Server OUTPUT pseudo-table refs (INSERTED.col,
DELETED.col), the relationAlias is preserved as the
uppercase pseudo-table name ("INSERTED" or
"DELETED") so consumers can distinguish post-write from
pre-write row state. Lineage edges still flow to
LineageRef.tableColumn(String, String) pointing at the
physical target table column — both INSERTED and DELETED ultimately
reference the same physical column; only the temporal phase differs.
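To make the INSERTED / DELETED distinction concrete, here is a minimal sketch of how a consumer might classify a RETURNING / OUTPUT source ref by its relationAlias. The Ref class and the phase labels are hypothetical; only the uppercase pseudo-table names "INSERTED" and "DELETED" come from the description above.

```java
final class OutputPhaseSketch {

    // Hypothetical stand-in for a ColumnRef taken from OutputColumn.getSources().
    static final class Ref {
        final String relationAlias;
        final String columnName;

        Ref(String relationAlias, String columnName) {
            this.relationAlias = relationAlias;
            this.columnName = columnName;
        }
    }

    // Classifies a ref by temporal phase; anything else is an ordinary relation ref.
    static String phase(Ref ref) {
        if ("INSERTED".equals(ref.relationAlias)) {
            return "post-write";
        }
        if ("DELETED".equals(ref.relationAlias)) {
            return "pre-write";
        }
        return "regular";
    }

    public static void main(String[] args) {
        // Refs as they might appear for: UPDATE t SET a = 1 OUTPUT DELETED.a, INSERTED.a
        System.out.println(phase(new Ref("DELETED", "a")));
        System.out.println(phase(new Ref("INSERTED", "a")));
        System.out.println(phase(new Ref("t", "a")));
    }
}
```

The comparison is case-sensitive on purpose, since the alias is described as being preserved in uppercase.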
public List<ColumnRef> getFilterColumnRefs()
public List<ColumnRef> getJoinColumnRefs()
public List<ColumnRef> getGroupByColumnRefs()
public List<ColumnRef> getHavingColumnRefs()
Column references that appear in the HAVING clause's
predicate. The list is per-statement and per-clause: a HAVING
predicate that names d.id contributes one entry; a HAVING
predicate over an aggregate (HAVING SUM(salary) > 1000)
contributes the underlying column (salary) — the same
convention used for projection-side aggregate arguments
(slice 6 OutputColumn.sources).
Subqueries in HAVING (scalar, EXISTS, IN-SELECT, ANY/ALL/SOME) and window functions in HAVING are rejected by the builder rather than silently captured, because the visitor would descend into inner scopes and leak refs (mirrors the slice-9 ORDER BY guards).
HAVING is row-influence semantically (it filters out groups),
but it deliberately does not contribute to the canonical
lineage model (slice 7 / CanonicalLineageEdge). The
canonical model is a parity contract between IR and dlineage, and
dlineage exposes no per-clause HAVING field — it folds HAVING refs
into aggregate-function fdr/fdd edges. Including HAVING-derived
canonical edges only on the IR side would manufacture
divergence-by-design. The havingColumnRefs field remains
useful for downstream consumers (SQL Guard, lineage explainers)
that don't depend on the dlineage parity contract.
public List<ColumnRef> getOrderByColumnRefs()
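Column references that appear in the ORDER BY clause's sort
keys. Only physical column references are recorded. Ordinal
(ORDER BY 1) forms are rejected by the builder, and so are
projection-alias (ORDER BY x) forms on the single-SELECT path
(slice 9); neither is silently emitted as []. The set-op outer
path (slice 21) instead accepts alias references positionally
against branch[0]'s outputs, as described in the class comment.
Sort direction (ASC/DESC) and null
placement (NULLS FIRST/NULLS LAST) are presentation
metadata and are not modelled.
The list is per-statement: in
WITH x AS (... ORDER BY id) SELECT id FROM x the inner
statement's orderByColumnRefs contains id while the
outer's is empty.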
public boolean isDistinct()
Whether the statement applies row-deduplication. True for
SELECT DISTINCT, Oracle's deprecated synonym
SELECT UNIQUE, and PostgreSQL / Greenplum
SELECT DISTINCT ON (cols); false for SELECT,
SELECT ALL, and the absence of any row-filter clause.
The flag is per-statement, never per-output.
For DISTINCT ON (cols) the partition keys live on
getDistinctOnColumnRefs(); the boolean here pins the
semantic invariant that the statement deduplicates rows
regardless of which key shape is used.
public List<ColumnRef> getDistinctOnColumnRefs()
Column references in the DISTINCT ON (cols) partition list
(PostgreSQL / Greenplum). Empty for plain SELECT DISTINCT,
SELECT UNIQUE, SELECT ALL, and the absence of any
row-filter clause.
Invariant: !distinctOnColumnRefs.isEmpty() implies
isDistinct() == true. The reverse does not hold
(plain DISTINCT also returns true).
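The invariant can be expressed as a one-line check. The sketch below is a hypothetical helper, not part of the library; it takes the two values a caller would read from isDistinct() and getDistinctOnColumnRefs(), with column refs simplified to strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DistinctInvariantSketch {

    // "non-empty DISTINCT ON list implies distinct" is the same as
    // "the list is empty, or distinct is true".
    static boolean holds(boolean distinct, List<String> distinctOnColumnRefs) {
        return distinctOnColumnRefs.isEmpty() || distinct;
    }

    public static void main(String[] args) {
        List<String> none = Collections.<String>emptyList();
        List<String> keys = Arrays.asList("a", "b");
        System.out.println(holds(true, none));   // plain DISTINCT
        System.out.println(holds(true, keys));   // DISTINCT ON (a, b)
        System.out.println(holds(false, none));  // no deduplication
        System.out.println(holds(false, keys));  // would violate the invariant
    }
}
```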
The list collects physical column refs the same way
groupByColumnRefs does: column refs inside compound
expressions (a + b, CASE WHEN ...) and aggregate
arguments (COUNT(x)) are descended into; subqueries and
window functions in DISTINCT ON are rejected by the
builder so they cannot leak inner-scope refs.
Oracle, MySQL, Redshift and other non-PG vendors silently
accept DISTINCT ON (...) as plain DISTINCT —
their parser drops the ON expression list, so this slot stays
empty for those vendors regardless of the surface SQL.
public SetOperator getSetOperator()
Set-operation kind for the outer statement of a set-op program
(slice 12). _ALL variants encode TSelectSqlStatement#isAll();
MINUS (Oracle / Spark / Hive) and EXCEPT
(PostgreSQL / SQL Server / standard) are kept distinct because the
parser exposes them as separate
ESetOperatorType values, even though
they are semantically equivalent.
public RowLimit getRowLimit()
Per-statement row-limit metadata (slice 70).
Some row-limit surfaces (among them LIMIT N OFFSET M, MySQL inline
LIMIT M, N, and set-op outer row-limit) continue to be rejected
by the builder with their existing diagnostic codes.
When non-null, the RowLimit.getKind() captures which
surface SQL form was used (LIMIT vs FETCH FIRST)
and RowLimit.getCount() captures the verbatim count text.
Row-limit metadata does not change column lineage. The
canonical lineage model (slice 7 / CanonicalLineageEdge)
deliberately ignores it: row-limit is presentation-time pruning,
not a column-flow influence. ORDER BY refs, output sources,
filter / join / group-by / having refs are all unaffected.
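Since RowLimit.getCount() is described as verbatim count text, a consumer that wants a number has to parse it. The sketch below shows one cautious way to do that. It is a hypothetical helper, and it assumes the text may sometimes be non-numeric (for example a bind parameter), which this page does not confirm.

```java
import java.util.OptionalLong;

final class RowLimitCountSketch {

    // Returns the count as a long when the verbatim text is a plain integer,
    // otherwise an empty OptionalLong.
    static OptionalLong parseCount(String countText) {
        if (countText == null) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(countText.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCount("10").isPresent());
        System.out.println(parseCount(":n").isPresent());
    }
}
```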
public TargetRelation getTarget()
Slice 78: write-side target for INSERT statements. Non-null on
"INSERT"-kind statements; null on every "SELECT"-kind
statement (whether the SELECT is an outer, CTE body, FROM-subquery
body, scalar-subquery body, or set-op branch).
When non-null, TargetRelation.getBinding() is the target
table (kind = RelationKind.TABLE) and
TargetRelation.getColumns() holds the verbatim SQL column-list
spellings (empty list when the SQL author omitted the column list).
Cross-statement LineageEdges for INSERT use
LineageRef.tableColumn(String, String) as the from
endpoint (target_table, target_col) and
LineageRef.statementOutput(int, String) as the to
endpoint (source SELECT body statement index + output name).
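The endpoint layout described above can be sketched with plain strings. The helper below is hypothetical and pairs target columns with source outputs by position, which is how INSERT ... SELECT matches columns in SQL; whether the library derives its edges the same way is an assumption. It also requires an explicit column list, so the empty-list case mentioned above is out of scope here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InsertEdgeSketch {

    // Builds one textual edge per (target column, source output) pair.
    // "from" is the target table column; "to" is the source statement output.
    static List<String> edges(String targetTable, List<String> targetColumns,
            int sourceStatementIndex, List<String> sourceOutputs) {
        if (targetColumns.size() != sourceOutputs.size()) {
            throw new IllegalArgumentException("column count mismatch");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < targetColumns.size(); i++) {
            out.add("from=" + targetTable + "." + targetColumns.get(i)
                    + " to=stmt[" + sourceStatementIndex + "]." + sourceOutputs.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // INSERT INTO audit (id, who) SELECT e.id, e.name FROM emp e
        for (String edge : edges("audit", Arrays.asList("id", "who"),
                1, Arrays.asList("id", "name"))) {
            System.out.println(edge);
        }
    }
}
```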