Snowflake: Mattermost Data Warehouse (dbt)¶
| Metric | Count |
|---|---|
| Total lineage found by LLM | 46 |
| Total lineage found by DFA | 0 |
| DFA matches (correct) | 0 |
| DFA crashes (lineage missed) | 0 |
| DFA silent misses | 46 |
| DFA robustness issues | 1 |
| Accuracy | 0% |
Source Repository: mattermost-data-warehouse MantisBT: #4387 Date: 2026-04-03
Performance Comparison¶
| Metric | LLM | DFA |
|---|---|---|
| Wall-clock time | 55s | 1s |
| Token usage | N/A | N/A |
| Agent count | 2 parallel agents | 1 process |
| Approach | Read & understand SQL + Jinja | Parse AST & trace dataflow |
Notes:
- DFA completed in 1 second because it immediately failed on all Jinja-templated SQL files.
- LLM analyzed 47 dbt model files and correctly identified 46 lineage relationships.
- This is a dbt project — SQL files contain Jinja templates ({{ ref() }}, {{ source() }}, {% if is_incremental() %}).
Root Cause: dbt/Jinja Incompatibility¶
DFA found zero lineage because dbt SQL files are incompatible with DFA in two ways:
1. Raw dbt SQL contains Jinja templates (not valid SQL)¶
1 2 3 4 5 6 7 8 9 10 | |
DFA's Snowflake parser cannot tokenize {{, }}, {%, %}, or Jinja expressions. Every file fails at the tokenizer level with no output.
2. Compiled dbt SQL lacks DDL context¶
dbt's compiled output (in target/compiled/) renders Jinja to pure SQL, but the result is a bare SELECT statement with no CREATE VIEW or INSERT INTO wrapper:
1 2 3 4 | |
DFA found 342 table references across compiled SQL but traced zero lineage relationships because there is no target object (no CREATE VIEW, INSERT INTO, or CTAS wrapping the SELECT).
All 46 LLM-Detected Lineage Relationships (DFA: 0 of 46)¶
Product Marts (28 models)¶
| # | Target Model | Source Models | Type |
|---|---|---|---|
| 1 | dim_board_customers | _int_server_installation_id_bridge, stg_stripe__subscriptions, stg_stripe__customers, int_boards_active_days_spined | view |
| 2 | fct_board_activity | int_boards_active_days_spined | table |
| 3 | fct_calls_activity | int_calls_active_days_spined, int_server_hosting_type | table |
| 4 | grp_copilot_tracks | stg_copilot__tracks | table |
| 5 | dim_daily_license | int_server_license_daily, int_known_licenses | view |
| 6 | dim_daily_server_config | int__config_oauth, int__config_ldap, int__config_saml, int__config_plugin, int__config_service | view |
| 7 | dim_daily_server_info | int_server_active_days_spined | view |
| 8 | dim_excludable_servers | int_excludable_servers | view |
| 9 | dim_installation_summary | int_server_active_days_spined | view |
| 10 | dim_server_info | int_user_active_days_spined, int_server_telemetry_summary, int_server_license_daily, int_known_licenses, int_server_hosting_type, int_server_ip_to_country, stg_stripe__subscriptions, stg_stripe__customers | view |
| 11 | dim_version | int_server_active_days_spined | view |
| 12 | dim_ip_daily_summary | stg_releases__log_entries | incremental |
| 13 | fct_downloads | stg_releases__log_entries | incremental |
| 14 | fct_active_servers | int_server_active_days_spined, int_server_license_daily | view |
| 15 | fct_active_users | int_user_active_days_spined, int_server_active_days_spined | view |
| 16 | fct_feature_daily_snapshot | int_feature_daily_spine, telemetry_days, int_server_active_days_spined | view |
| 17 | bdg_license_server | int_server_license_daily | view |
| 18 | dim_latest_server_customer_info | int_latest_server_customer_info | view |
| 19 | dim_license_customers | int_active_licenses | view |
| 20 | dim_license_type | stg_stripe__products | view |
| 21 | fct_active_production_licenses | int_active_licenses | view |
| 22 | fct_licenses | int_active_licenses | view |
| 23 | dim_known_licenses | int_known_licenses | view |
| 24 | fct_notification_stats | int_notifications_eu_hourly, int_notifications_us_hourly, int_notifications_test_hourly, int_notifications_logs_eu_hourly, int_notifications_logs_us_hourly, int_notifications_logs_test_hourly | view |
| 25 | fct_nps_feedback | int_nps_feedback, int_nps_score | view |
| 26 | fct_nps_score | int_user_nps_score_spined, int_nps_server_version_spined | view |
| 27 | rpt_tedau_at_day_28 | fct_active_users, dim_daily_server_info, dim_server_info | view |
| 28 | fct_trial_request_history | int_cloud_trial_requests_history, int_onprem_trial_requests_history, stg_salesforce__lead | view |
Sales Marts (7 models)¶
| # | Target Model | Source Models | Type |
|---|---|---|---|
| 29 | fct_subscription_history | stg_cws__subscription_history, stg_stripe__subscriptions, stg_stripe__products | view |
| 30 | dim_self_serve_customers | stg_stripe__customers | view |
| 31 | fct_in_product_trial_requests | stg_cws__trial_requests, country_codes, stg_salesforce__lead, stg_salesforce__campaign_member, int_onprem_trial_license_information | view |
| 32 | fct_cloud_trial_requests | int_cloud_trial_requests, stg_salesforce__lead, stg_salesforce__campaign_member | view |
| 33 | sync_opportunity_update_ending_arr | int_opportunity_ext, stg_salesforce__opportunity, int_opportunity_daily_arr | view |
| 34 | sync_account_arr_and_type | stg_salesforce__account, int_account_arr_seats | view |
| 35 | sync_lead_account_link | stg_salesforce__lead, stg_salesforce__account | view |
Release Marts (5 models)¶
| # | Target Model | Source Models | Type |
|---|---|---|---|
| 36 | fct_issues_daily_snapshot | stg_mattermost_jira__issues, stg_mattermost__version_release_dates | view |
| 37 | dim_labels | stg_mattermost_jira__issues | view |
| 38 | dim_releases | stg_mattermost__version_release_dates | view |
| 39 | dim_fix_versions | stg_mattermost_jira__issues, stg_mattermost__version_release_dates | view |
| 40 | dim_projects | stg_mattermost_jira__projects | view |
Common Marts (2 models)¶
| # | Target Model | Source Models | Type |
|---|---|---|---|
| 41 | dim_user_agent | int_user_agents | incremental |
| 42 | dim_date | telemetry_days | view |
Other Marts (4 models)¶
| # | Target Model | Source Models | Type |
|---|---|---|---|
| 43 | dim_events | int_events_aggregated_to_date | view |
| 44 | fct_events_daily_snapshot | int_events_aggregated_to_date | view |
| 45 | fct_security_newsletter | stg_cws__marketing, stg_salesforce__lead, stg_salesforce__campaign_member | view |
| 46 | fct_onprem_trial_requests_history | stg_cws__trial_requests, stg_salesforce__lead | view |
Robustness Issues¶
Issue 1: JSON SQL Environment parse failures¶
DFA's TJSONSQLEnvParser throws RuntimeException: Unknown JSON value type when processing dbt project directories. The dbt project contains .json and .yml files (e.g., manifest.json, schema.yml) that the SQL environment parser attempts to read but cannot handle. This generates warning-level stack traces but does not crash the process.
Issue 2: Compiled SQL syntax errors (8 files)¶
When run against compiled dbt SQL (in target/compiled/), 8 of 251 files had syntax errors. These are primarily in files using Snowflake-specific syntax patterns that the tokenizer doesn't handle (e.g., calls_events.sql with nested WITH clauses).
Side-by-Side: Where Each Approach Excels¶
Summary¶
| Dimension | Winner | Evidence | Impact |
|---|---|---|---|
| Column-level precision | N/A | DFA found 0 relationships; cannot compare | High |
| Multi-source expressions | N/A | No DFA output to evaluate | N/A |
| FK/ER relationships | N/A | No DDL with FK constraints in dbt models | N/A |
| fdd vs fdr distinction | N/A | DFA found 0 relationships | N/A |
| Novel/vendor-specific syntax | LLM | Jinja templates, dbt macros, incremental strategies | Critical |
| Semantic understanding | LLM | Understood ref()/source() as model dependencies | Critical |
| Graceful degradation | LLM | DFA produced 0 output; LLM found all 46 relationships | Critical |
| Aggregate/string functions | N/A | No DFA output to evaluate | N/A |
| Trigger/procedure bodies | N/A | No triggers/procedures in dbt models | N/A |
Detail¶
DFA Advantages in This Codebase¶
None. DFA cannot process dbt SQL in any form:
- Raw SQL with Jinja: tokenizer fails on {{, {%
- Compiled SQL without Jinja: no DDL context (bare SELECT, no CREATE VIEW)
If DFA were provided with properly wrapped SQL (e.g., CREATE VIEW model_name AS <compiled_select>), it would likely produce column-level lineage for many of these models. But that pre-processing step is not part of the current DFA workflow.
LLM Advantages in This Codebase¶
- Jinja comprehension: LLM read through
{{ ref('model_name') }}and{{ source('schema', 'table') }}to understand model dependencies without any template pre-processing - dbt conventions: LLM understood that each
.sqlfile IS the model definition, the filename is the target name, andref()calls are upstream dependencies - Incremental logic: LLM identified
{% if is_incremental() %}blocks and understood the delete+insert strategy - Macro expansion: LLM understood
{{ dbt_utils.star() }},{{ dbt_utils.pivot() }},{{ dbt_utils.generate_surrogate_key() }}as column-generating macros - Architecture: LLM identified the staging/intermediate/marts layered pattern, star schema design, and multi-source integration
Tie / Both Correct¶
- Both approaches correctly identified that dbt project directories contain non-SQL files — DFA warned about JSON parse failures, LLM skipped non-SQL files
- Neither approach can trace lineage through dynamic SQL or runtime-determined table names
Recommendations¶
- dbt pre-processor for DFA: Implement a dbt-aware pre-processing step that:
- Runs
dbt compileto generate rendered SQL (or reads fromtarget/compiled/) - Wraps each compiled model in
CREATE VIEW {model_name} AS ...using metadata frommanifest.json -
Feeds the wrapped SQL to DFA for column-level lineage analysis
-
Jinja-aware tokenizer: As a lighter alternative, add a pre-pass to the Snowflake lexer that strips or replaces Jinja expressions with SQL-safe placeholders (e.g.,
{{ ref('foo') }}->foo). -
manifest.json parser: dbt's
manifest.jsoncontains the complete dependency graph. DFA could parse this file to obtain model-level lineage without touching SQL at all, then use compiled SQL for column-level detail. -
No parser fixes needed for core SQL: The 8 syntax errors in compiled SQL are minor (nested WITH in specific Snowflake patterns) and affect only 3% of compiled files.