CASE WHEN Data Lineage Behavior¶
This page explains how the Data Lineage Analyzer (DataFlowAnalyzer) treats columns referenced inside a CASE WHEN expression, why the default behavior changed in version 4.1.0.11, and how to restore the earlier behavior when needed.
Summary¶
Starting with GSP Java 4.1.0.11 (released 2026-04-08), columns that appear in the condition of a CASE WHEN expression are classified as direct dataflow (fdd) by default. In earlier versions they were classified as indirect dataflow (fdr).
The new default aligns with what most users expect when analyzing lineage for auditing, impact analysis, and data-governance use cases: every column that contributes to determining the resulting value — including the columns that drive branch selection — is reported as a direct source.
If you rely on the previous behavior, a CLI flag (/showCaseWhenAsIndirect) and a Java API setter (Option.setShowCaseWhenAsDirect(false)) are available to restore it.
The anatomy of a CASE WHEN expression¶
A searched CASE WHEN expression has three logical parts:
1 2 3 4 | |
- Condition columns: columns referenced in the
WHENpredicate (e.g.status). They determine which branch is taken, but their value is not copied into the result. - Result columns: columns referenced in
THEN/ELSEexpressions (e.g. a column returned directly, or used in an arithmetic/string expression). Their value flows into the result.
GSP's data lineage model distinguishes two kinds of relationship:
| Relation type | Meaning |
|---|---|
fdd — Forward Direct Dataflow |
The source column's value flows directly into the target column. |
fdr — Forward Dataflow Reverse (indirect) |
The source column influences the target (used in WHERE, JOIN, GROUP BY, etc.) but its value does not flow into it. |
Before 4.1.0.11, condition columns were classified as fdr (indirect) because their value technically does not propagate into the output. Result columns were — and still are — classified as fdd (direct).
The behavior change¶
Before 4.1.0.11¶
1 2 | |
Lineage produced for the target column status:
| Source column | Relation type |
|---|---|
exams.score |
fdr (indirect) |
Since 'Pass' and 'Fail' are string literals, no fdd source would be produced for status at all — the target would appear to have no direct source, which is confusing for downstream tools that only render fdd edges.
From 4.1.0.11 onward¶
The same query now produces:
| Source column | Relation type |
|---|---|
exams.score |
fdd (direct) |
This ensures that every column that influences the result of a CASE WHEN expression — whether as a branch selector or as a result value — shows up as a direct lineage edge.
Why the default changed¶
The change was driven by consistency with how customers use column-level lineage in practice:
- Auditing and impact analysis: Governance tools need to know every source column that determines the target value, including those in the condition. A rename, type change, or deprecation of
statusclearly impacts the downstreamlabelcolumn, so the relationship should be direct. - Parity with Oracle
DECODE: Thefunction.propertiesmechanism that drivesDECODE(search_value, match_value, result_value)already classifies search/match arguments as direct. TreatingCASE WHENconditions as indirect madeCASEandDECODEreport different lineage for semantically-equivalent expressions. - Missing-source problem: When both
THENandELSEare literals (a common pattern, e.g.CASE WHEN x IS NULL THEN 0 ELSE 1 END), the target column had nofddsource under the old default. Many consumers of the lineage output expect every non-literal target to have at least one direct source.
How to restore the previous behavior¶
If your downstream system relies on the pre-4.1.0.11 classification, you can opt out on a per-analysis basis.
Command-line interface¶
1 | |
The /showCaseWhenAsIndirect flag reverts to treating CASE WHEN condition columns as indirect (fdr).
Java API¶
1 2 3 | |
Set Option.setShowCaseWhenAsDirect(false) before calling generateDataFlow().
Worked example¶
1 2 3 4 5 6 | |
Default (4.1.0.11+, showCaseWhenAsDirect = true)¶
| Target column | Source column | Relation type |
|---|---|---|
adjusted_salary |
employees.dept_id |
fdd (direct) |
adjusted_salary |
employees.salary |
fdd (direct) |
Opt-out (showCaseWhenAsDirect = false)¶
| Target column | Source column | Relation type |
|---|---|---|
adjusted_salary |
employees.dept_id |
fdr (indirect) |
adjusted_salary |
employees.salary |
fdd (direct) |
With the old behavior, only columns that flow through THEN/ELSE branches produce direct edges. dept_id is demoted to an indirect (fdr) source.
Scope¶
- Applies to SQL
CASE WHENexpressions represented byTCaseExpressionin the AST. - Applies to all supported database dialects (Oracle, SQL Server, PostgreSQL, Snowflake, BigQuery, and the rest of the 25+ dialects GSP supports).
- Does not change how Oracle's
DECODE()function is classified —DECODEis driven byfunction.properties. To adjust it, edit the corresponding entry indataflow/oracle/function.properties. - The SQLFlow web UI exposes the underlying option through the API but does not yet surface a dedicated toggle; the CLI and Java API are the current switches.
Related¶
- Configuration Options —
/showCaseWhenAsDirect - Configuration Options —
function.properties - Release Notes
Change history¶
| Version | Date | Change |
|---|---|---|
| 4.1.0.10 and earlier | up to 2026-03 | Option.showCaseWhenAsDirect defaults to false. Condition columns are reported as fdr (indirect). |
| 4.1.0.11 | 2026-04-08 | Default flipped to true. Condition columns are reported as fdd (direct). Demo CLI flag renamed from /showCaseWhenAsDirect (opt-in) to /showCaseWhenAsIndirect (opt-out). |