Skip to content

CASE WHEN Data Lineage Behavior

This page explains how the Data Lineage Analyzer (DataFlowAnalyzer) treats columns referenced inside a CASE WHEN expression, why the default behavior changed in version 4.1.0.11, and how to restore the earlier behavior when needed.

Summary

Starting with GSP Java 4.1.0.11 (released 2026-04-08), columns that appear in the condition of a CASE WHEN expression are classified as direct dataflow (fdd) by default. In earlier versions they were classified as indirect dataflow (fdr).

The new default aligns with what most users expect when analyzing lineage for auditing, impact analysis, and data-governance use cases: every column that contributes to determining the resulting value — including the columns that drive branch selection — is reported as a direct source.

If you rely on the previous behavior, a CLI flag (/showCaseWhenAsIndirect) and a Java API setter (Option.setShowCaseWhenAsDirect(false)) are available to restore it.

The anatomy of a CASE WHEN expression

A searched CASE WHEN expression has three logical parts:

1
2
3
4
CASE WHEN <condition>  -- e.g. status = 'A'
     THEN <then_value> -- e.g. 'Active'
     ELSE <else_value> -- e.g. 'Inactive'
END
  • Condition columns: columns referenced in the WHEN predicate (e.g. status). They determine which branch is taken, but their value is not copied into the result.
  • Result columns: columns referenced in THEN / ELSE expressions (e.g. a column returned directly, or used in an arithmetic/string expression). Their value flows into the result.

GSP's data lineage model distinguishes two kinds of relationship:

Relation type Meaning
fdd — Forward Direct Dataflow The source column's value flows directly into the target column.
fdr — Forward Dataflow Reverse (indirect) The source column influences the target (used in WHERE, JOIN, GROUP BY, etc.) but its value does not flow into it.

Before 4.1.0.11, condition columns were classified as fdr (indirect) because their value technically does not propagate into the output. Result columns were — and still are — classified as fdd (direct).

The behavior change

Before 4.1.0.11

1
2
SELECT CASE WHEN score >= 60 THEN 'Pass' ELSE 'Fail' END AS status
FROM exams;

Lineage produced for the target column status:

Source column Relation type
exams.score fdr (indirect)

Since 'Pass' and 'Fail' are string literals, no fdd source would be produced for status at all — the target would appear to have no direct source, which is confusing for downstream tools that only render fdd edges.

From 4.1.0.11 onward

The same query now produces:

Source column Relation type
exams.score fdd (direct)

This ensures that every column that influences the result of a CASE WHEN expression — whether as a branch selector or as a result value — shows up as a direct lineage edge.

Why the default changed

The change was driven by consistency with how customers use column-level lineage in practice:

  1. Auditing and impact analysis: Governance tools need to know every source column that determines the target value, including those in the condition. A rename, type change, or deprecation of status clearly impacts the downstream label column, so the relationship should be direct.
  2. Parity with Oracle DECODE: The function.properties mechanism that drives DECODE(search_value, match_value, result_value) already classifies search/match arguments as direct. Treating CASE WHEN conditions as indirect made CASE and DECODE report different lineage for semantically-equivalent expressions.
  3. Missing-source problem: When both THEN and ELSE are literals (a common pattern, e.g. CASE WHEN x IS NULL THEN 0 ELSE 1 END), the target column had no fdd source under the old default. Many consumers of the lineage output expect every non-literal target to have at least one direct source.

How to restore the previous behavior

If your downstream system relies on the pre-4.1.0.11 classification, you can opt out on a per-analysis basis.

Command-line interface

1
java -jar dataflowanalyzer.jar /f query.sql /t oracle /showCaseWhenAsIndirect

The /showCaseWhenAsIndirect flag reverts to treating CASE WHEN condition columns as indirect (fdr).

Java API

1
2
3
DataFlowAnalyzer dlineage = new DataFlowAnalyzer(sqlText, EDbVendor.dbvoracle, true);
dlineage.getOption().setShowCaseWhenAsDirect(false);   // pre-4.1.0.11 behavior
dlineage.generateDataFlow();

Set Option.setShowCaseWhenAsDirect(false) before calling generateDataFlow().

Worked example

1
2
3
4
5
6
SELECT
  CASE WHEN e.dept_id = 10 THEN e.salary
       WHEN e.dept_id = 20 THEN e.salary * 1.1
       ELSE 0
  END AS adjusted_salary
FROM employees e;

Default (4.1.0.11+, showCaseWhenAsDirect = true)

Target column Source column Relation type
adjusted_salary employees.dept_id fdd (direct)
adjusted_salary employees.salary fdd (direct)

Opt-out (showCaseWhenAsDirect = false)

Target column Source column Relation type
adjusted_salary employees.dept_id fdr (indirect)
adjusted_salary employees.salary fdd (direct)

With the old behavior, only columns that flow through THEN/ELSE branches produce direct edges. dept_id is demoted to an indirect (fdr) source.

Scope

  • Applies to SQL CASE WHEN expressions represented by TCaseExpression in the AST.
  • Applies to all supported database dialects (Oracle, SQL Server, PostgreSQL, Snowflake, BigQuery, and the rest of the 25+ dialects GSP supports).
  • Does not change how Oracle's DECODE() function is classified — DECODE is driven by function.properties. To adjust it, edit the corresponding entry in dataflow/oracle/function.properties.
  • The SQLFlow web UI exposes the underlying option through the API but does not yet surface a dedicated toggle; the CLI and Java API are the current switches.

Change history

Version Date Change
4.1.0.10 and earlier up to 2026-03 Option.showCaseWhenAsDirect defaults to false. Condition columns are reported as fdr (indirect).
4.1.0.11 2026-04-08 Default flipped to true. Condition columns are reported as fdd (direct). Demo CLI flag renamed from /showCaseWhenAsDirect (opt-in) to /showCaseWhenAsIndirect (opt-out).