Skip to content

Oracle: DB Sample Schemas

Metric Count
Total lineage found by LLM 28
Total lineage found by DFA 27
DFA matches (correct) 27
DFA crashes (lineage missed) 1
DFA silent misses 0
DFA robustness issues 3
Accuracy 96%

Source: Oracle DB Sample Schemas Date: 2026-04-02


Lineage Inventory

Views (21 found by both LLM and DFA)

# View Name Schema Source Tables Type Pattern
1 emp_details_view HR employees, departments, jobs, locations, countries, regions view Implicit joins, WHERE conditions
2 customer_order_products CO orders, order_items, customers, products view Explicit JOINs, SUM, LISTAGG
3 store_orders CO stores, orders, order_items view JOINs, GROUPING SETS, GROUPING_ID
4 product_orders CO orders, order_items, customers, products view JOINs, GROUP BY, SUM, COUNT
5 products OE product_information, product_descriptions view Outer join (+), CASE, TRANSLATE USING NCHAR_CS
6 sydney_inventory OE inventories, warehouses, products (view) view 3-table join, WHERE filter
7 bombay_inventory OE inventories, warehouses, products (view) view 3-table join, WHERE filter
8 toronto_inventory OE inventories, warehouses, products (view) view 3-table join, WHERE filter
9 product_prices OE product_information view GROUP BY, COUNT, MIN, MAX
10 oc_inventories OE inventories, warehouses object view OF type, warehouse_typ() constructor
11 oc_product_information OE product_information, oc_inventories object view CAST(MULTISET(SELECT...))
12 oc_customers OE customers, orders, order_items, oc_product_information object view Nested CAST(MULTISET), MAKE_REF, self-ref
13 oc_corporate_customers OE customers, orders, order_items, oc_customers, oc_product_information object view UNDER subtype, CAST(MULTISET), MAKE_REF
14 oc_orders OE orders, order_items, oc_customers, oc_product_information object view CAST(MULTISET), MAKE_REF
15 account_managers OE customers, countries view GROUP BY ROLLUP, object attribute access
16 customers_view OE customers, countries view Object dot-notation, UDF calls, SDO_GEOMETRY
17 orders_view OE orders view TO_DATE(TO_CHAR(...))
18 DATABASE_SUMMARY OE/XDB v$system_parameter, v$database, v$instance, nls_database_parameters view System dynamic performance views
19 database_summary OE/XDB DATABASE_SUMMARY, V$VERSION, ALL_REGISTRY_BANNERS XMLType view xmlElement, xmlForest, xmlAgg

Materialized Views (2 found by both)

# MV Name Schema Source Tables Pattern
20 cal_month_sales_mv SH sh.sales, sh.times GROUP BY, SUM, ENABLE QUERY REWRITE
21 fweek_pscat_sales_mv SH sh.sales, sh.times, sh.products 3-table join, GROUP BY, SUM

View (1 found by LLM only - DFA crash)

# View Name Schema Source Tables Pattern DFA Status
22 product_reviews CO products (JSON column) JSON_TABLE with NESTED PATH, AVG OVER window CRASH: NullPointerException

Triggers with Data Flow (5 found by both)

# Trigger Schema Source Target Pattern
23 update_job_history HR employees (:old) job_history (via add_job_history proc) AFTER UPDATE trigger, procedure INSERT
24 insert_ord_line OE order_items (SELECT MAX) order_items (:new) BEFORE INSERT trigger, self-referencing
25 orders_trg OE oc_orders (view :NEW) orders INSTEAD OF INSERT
26 orders_items_trg OE oc_orders nested table (:NEW) order_items INSTEAD OF INSERT, DEREF
27 no_dml_operations_allowed OE/XDB database_summary (view) (none - NULL body) INSTEAD OF, no-op

INSERT...SELECT (1 pattern, 22 statements - found by both)

# Source Target Pattern
28 oc_product_information, categories_tab categories_tab INSERT VALUES with CAST(MULTISET(SELECT...))

Critical Issues (DFA Crashes - Lineage Missed)

Issue 1: product_reviews view - JSON_TABLE with NESTED PATH

  • Target: product_reviews (view)
  • Expected lineage: products -> product_reviews (view)
  • Error: NullPointerException: Cannot invoke "gudusoft.gsqlparser.nodes.TObjectName.toString()" because "column" is null
  • Location: DFA analyze sql stmt, line 223, column 1 of co_create.sql
  • SQL pattern:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE OR REPLACE VIEW product_reviews AS
SELECT p.product_name,
       j.rating,
       ROUND(AVG(j.rating) OVER (PARTITION BY p.product_name), 2) avg_rating,
       j.review
FROM   products p,
       JSON_TABLE ( p.product_details, '$.reviews[*]'
         COLUMNS (
           rating  NUMBER        PATH '$.rating',
           review  VARCHAR2(100) PATH '$.review'
         )
       ) j;

The DFA crashes with NullPointerException when processing the JSON_TABLE function with COLUMNS clause. The column references inside JSON_TABLE COLUMNS clause produce a null TObjectName, causing the crash.


Robustness Issues

Issue R1: Sales History OOM on directory-level run

  • Error: OutOfMemoryError: Java heap space at DataFlowAnalyzer.analyzeAndOutputResult()
  • Workaround: Running individual files with -Xmx4g succeeds
  • Impact: No lineage missed - all SH views and MVs correctly detected per-file

Issue R2: Product Media JSON parsing errors

  • Error: RuntimeException: Unknown JSON value type. in TJSONSQLEnvParser.getJSONSQLEnv()
  • Impact: The PM schema has no data lineage (only DDL + LOB loading via SQL*Loader), so no lineage missed

Issue R3: xdbConfiguration ALTER View unsupported

  • Error: UnsupportedOperationException: Can't handle this alter view statement case. at line 206
  • Impact: The ALTER VIEW statement is not a lineage source - the CREATE VIEW statements were correctly processed. No lineage missed.

Side-by-Side: Where Each Approach Excels

Summary

Dimension Winner Evidence Impact
Column-level precision DFA DFA traces exact column-to-column fdd relationships across all 21 views, including through object type constructors and CAST(MULTISET) High
Multi-source expressions DFA DFA correctly traces computed columns like SUM(quantity * unit_price) to both source columns in store_orders and customer_order_products Medium
FK/ER relationships DFA DFA extracts FK relationships from ALTER TABLE ADD CONSTRAINT across HR (6 FKs), CO (9 FKs), OE (9+ FKs) Medium
fdd vs fdr distinction DFA DFA distinguishes direct dataflow (fdd: SELECT columns) from indirect (fdr: WHERE, JOIN ON, GROUP BY conditions) across all views Medium
Novel/vendor-specific syntax LLM LLM correctly analyzed JSON_TABLE with NESTED PATH (product_reviews) where DFA crashed High
Semantic understanding LLM LLM identified trigger -> procedure -> INSERT pipeline as a single logical data flow (update_job_history -> add_job_history -> job_history) Medium
Graceful degradation LLM DFA crashed on 1 SQL pattern (JSON_TABLE), LLM handled all 28 lineage patterns Medium
Aggregate/string functions Tie Both correctly identified LISTAGG, SUM, COUNT, AVG window functions as lineage sources Low
Trigger/procedure bodies Tie Both traced trigger data flow through :OLD/:NEW pseudo-records and INSTEAD OF triggers with DEREF Low

Detail

DFA Advantages in This Codebase

  1. Column-level precision: DFA traces exact column-to-column mappings. For example, in emp_details_view, DFA maps employees.first_name -> emp_details_view.first_name as a direct fdd relationship, while also tracking employees.department_id -> departments.department_id as an indirect fdr (JOIN condition). The LLM identified the same columns but cannot formally distinguish fdd from fdr.

  2. FK/ER extraction: DFA extracted all foreign key relationships from DDL across all schemas (e.g., employees.department_id -> departments.department_id, countries.region_id -> regions.region_id). These are structural relationships that the LLM explicitly excluded from its analysis.

  3. Object view column tracing: DFA traced column lineage through Oracle object type constructors (warehouse_typ, inventory_typ) and CAST(MULTISET) patterns in the complex object-relational views (oc_inventories, oc_product_information, oc_customers, etc.).

  4. INSERT...VALUES constant tracing: DFA traced all 88+ INSERT...VALUES statements in HR population data and 86+ in CO population data, mapping literal values to target columns.

LLM Advantages in This Codebase

  1. JSON_TABLE handling: LLM correctly analyzed the product_reviews view using JSON_TABLE with COLUMNS clause, identifying that products.product_details (JSON column) feeds rating and review via JSON path expressions $.reviews[*].rating and $.reviews[*].review. DFA crashed with NullPointerException on this pattern.

  2. Holistic trigger-procedure chains: LLM identified the complete data flow chain: employees table UPDATE -> update_job_history trigger -> add_job_history procedure -> job_history table INSERT, presenting this as a single logical lineage relationship. DFA found the individual components but required the user to assemble the chain.

  3. Business context: LLM noted that customers table is joined in product_orders view but no customer columns appear in output (unnecessary join), and that no_dml_operations_allowed is a no-op trigger. DFA processed these without noting the semantic observations.

Tie / Both Correct

  1. Standard views: Both approaches correctly identified all 18 standard/object views with their source tables.
  2. Materialized views: Both found profits, cal_month_sales_mv, and fweek_pscat_sales_mv with correct source tables.
  3. INSTEAD OF triggers: Both correctly traced data flow through the orders_trg and orders_items_trg INSTEAD OF triggers, including the DEREF pattern.
  4. CAST(MULTISET) INSERT...SELECT: Both identified the 22 INSERT statements into categories_tab using CAST(MULTISET(SELECT...FROM oc_product_information)).

Recommendations

  1. Fix JSON_TABLE support in DFA: The NullPointerException when processing JSON_TABLE COLUMNS clause is a parser/analyzer bug. The column references inside JSON_TABLE should be recognized as derived columns from the JSON source expression.

  2. Add ALTER XMLType view support: The UnsupportedOperationException on ALTER VIEW for XMLType views should be handled gracefully (or supported if possible).

  3. Increase default heap for directory-level runs: The SH schema OOM at 2GB suggests the default should be 4GB for directories with many INSERT...VALUES statements.

  4. Handle non-SQL files gracefully: The JSON parsing error in PM directory is caused by non-SQL files being processed. DFA should skip files that don't parse as SQL rather than throwing exceptions.