Class DatabricksSqlParser
- All Implemented Interfaces:
SqlParser
This parser handles Databricks-specific SQL syntax including:
- Databricks SQL dialect and extensions
- Databricks PL/SQL blocks
- Special handling for VALUES keyword in INSERT statements
- Datatype casting with literals (e.g., DATE '2021-2-1')
Implementation Status: MIGRATED
- Phase: Complete migration from delegation to full AbstractSqlParser implementation
- Current: Self-contained Databricks parser using AbstractSqlParser template
- Goal: No delegation to legacy TGSqlParser
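The template arrangement described above can be sketched as follows. This is a minimal illustration only: TemplateParserSketch and SemicolonDialectSketch are hypothetical stand-ins, not the real AbstractSqlParser or DatabricksSqlParser, and the hook names are borrowed from this page purely for readability. The assumption is that the base class fixes the order of the phases while the vendor subclass supplies tokenization and raw statement extraction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a template-style base parser; NOT the real AbstractSqlParser. */
abstract class TemplateParserSketch {
    /** Fixed pipeline: the base class owns the phase order, subclasses supply the steps. */
    final List<String> parse(String sql) {
        List<String> tokens = tokenizeVendorSql(sql);
        List<String> rawStatements = extractVendorRawStatements(tokens);
        List<String> parsed = new ArrayList<>();
        for (String raw : rawStatements) {
            parsed.add(raw);            // a real parser would build an AST here
            afterStatementParsed(raw);  // optional per-statement hook
        }
        return parsed;
    }

    protected abstract List<String> tokenizeVendorSql(String sql);

    protected abstract List<String> extractVendorRawStatements(List<String> tokens);

    /** Default hook does nothing; a subclass may override it. */
    protected void afterStatementParsed(String stmt) { }
}

/** Hypothetical vendor subclass that splits statements on the semicolon delimiter. */
class SemicolonDialectSketch extends TemplateParserSketch {
    @Override
    protected List<String> tokenizeVendorSql(String sql) {
        // Naive whitespace tokenization that keeps ';' as its own token.
        // It does not understand string literals or comments.
        List<String> tokens = new ArrayList<>();
        for (String piece : sql.replace(";", " ; ").trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    @Override
    protected List<String> extractVendorRawStatements(List<String> tokens) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tokens) {
            if (token.equals(";")) {
                if (current.length() > 0) {
                    statements.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token);
            }
        }
        if (current.length() > 0) {
            statements.add(current.toString()); // trailing statement without ';'
        }
        return statements;
    }
}
```

Calling new SemicolonDialectSketch().parse("SELECT 1; SELECT 2") walks both hooks in the order fixed by the base class and yields one entry per statement.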
- Since:
- 3.2.0.0
Nested Class Summary
Nested classes/interfaces inherited from class gudusoft.gsqlparser.parser.AbstractSqlParser
AbstractSqlParser.PreparedSqlReader -
Field Summary
Fields
Modifier and Type, Field, Description:
- flexer: The Databricks lexer used for tokenization
Fields inherited from class gudusoft.gsqlparser.parser.AbstractSqlParser
defaultDelimiterStr, delimiterChar, frameStack, globalContext, globalFrame, lexer, parserContext, sourcetokenlist, sqlcmds, sqlEnv, sqlstatements, syntaxErrors, vendor
Constructor Summary
Constructors
- DatabricksSqlParser(): Construct Databricks SQL parser.
Method Summary
Modifier and Type, Method, Description:
- protected void afterStatementParsed: Post-processing hook after each statement is parsed.
- protected void extractVendorRawStatements: Hook method: Extract raw Databricks SQL statements.
- protected TCustomLexer getLexer(ParserContext context): Return the Databricks lexer instance.
- protected TCustomParser getParser(ParserContext context, TSourceTokenList tokens): Return the Databricks SQL parser instance with updated token list.
- protected TCustomParser getSecondaryParser(ParserContext context, TSourceTokenList tokens): Databricks uses a single parser, no secondary parser needed.
- protected void handleCreateTableErrorRecovery: Handle error recovery for CREATE TABLE statements.
- protected void performInterpreter(ParserContext context, TStatementList statements): Perform interpretation (execute SQL in interpreter mode).
- protected TStatementList performParsing(ParserContext context, TCustomParser parser, TCustomParser secondaryParser, TSourceTokenList tokens, TStatementList rawStatements): Parse all raw statements to build AST.
- protected void performSemanticAnalysis(ParserContext context, TStatementList statements): Perform semantic analysis (resolve column-table relationships, etc.).
- protected void setupVendorParsersForExtraction: Hook method: Set up parsers for raw statement extraction.
- protected void tokenizeVendorSql: Hook method: Tokenize Databricks SQL by calling vendor-specific tokenization.
- String toString()
Methods inherited from class gudusoft.gsqlparser.parser.AbstractSqlParser
attemptErrorRecovery, copyErrorsFromStatement, doAfterTokenize, doExtractRawStatements, extractRawStatements, getanewsourcetoken, getDefaultDelimiterStr, getDelimiterChar, getErrorCount, getrawsqlstatements, getSyntaxErrors, getVendor, handleStatementParsingException, initializeGlobalContext, isDollarFunctionDelimiter, onRawStatementComplete, onRawStatementCompleteVendorSpecific, parse, performTokenization, prepareSqlReader, processTokensBeforeParse, processTokensInTokenTable, setTokenHandle, tokenize, towinlinebreak
-
Field Details
-
flexer
The Databricks lexer used for tokenization
-
-
Constructor Details
-
DatabricksSqlParser
public DatabricksSqlParser()
Construct Databricks SQL parser.
Configures the parser for Databricks database with default delimiter: semicolon (;)
Following the original TGSqlParser pattern, the lexer and parser are created once in the constructor and reused for all parsing operations.
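The create-once, reuse pattern mentioned here can be illustrated with a small sketch. LexerStandIn and CreateOnceParserSketch are hypothetical names invented for this example; they do not reflect the real lexer or parser classes. The sketch assumes that reuse works by resetting the existing lexer's state for each new input rather than allocating a new lexer.

```java
/** Hypothetical stand-in for a stateful lexer; NOT the real Databricks lexer class. */
final class LexerStandIn {
    private String input = "";
    private int position;

    /** Point the existing lexer at new input instead of allocating a new one. */
    void reset(String newInput) {
        input = newInput;
        position = 0;
    }

    /** Return the next whitespace-delimited token, or null at end of input. */
    String next() {
        while (position < input.length() && Character.isWhitespace(input.charAt(position))) {
            position++;
        }
        if (position >= input.length()) {
            return null;
        }
        int start = position;
        while (position < input.length() && !Character.isWhitespace(input.charAt(position))) {
            position++;
        }
        return input.substring(start, position);
    }
}

/** Hypothetical parser that creates its lexer once and hands out the same instance. */
final class CreateOnceParserSketch {
    private final LexerStandIn lexer = new LexerStandIn(); // created once, with the parser

    /** Always returns the instance built at construction time. */
    LexerStandIn getLexer() {
        return lexer;
    }

    /** Reuses the single lexer for every call by resetting its state first. */
    int countTokens(String sql) {
        lexer.reset(sql);
        int count = 0;
        while (lexer.next() != null) {
            count++;
        }
        return count;
    }
}
```

In this sketch a single parser instance carries mutable lexer state between calls, so sharing one instance across threads would need external synchronization. Whether the same holds for the real class is not stated on this page.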
-
-
Method Details
-
getLexer
Return the Databricks lexer instance.
The lexer is created once in the constructor and reused for all parsing operations.
- Specified by:
getLexer in class AbstractSqlParser
- Parameters:
context - parser context (not used, lexer already created)
- Returns:
the Databricks lexer instance created in constructor
-
getParser
Return the Databricks SQL parser instance with updated token list.
The parser is created once in the constructor and reused for all parsing operations.
- Specified by:
getParser in class AbstractSqlParser
- Parameters:
context - parser context (not used, parser already created)
tokens - source token list to parse
- Returns:
the Databricks SQL parser instance created in constructor
-
getSecondaryParser
Databricks uses a single parser, no secondary parser needed.
- Overrides:
getSecondaryParser in class AbstractSqlParser
- Parameters:
context - parser context
tokens - source token list
- Returns:
null (no secondary parser)
-
tokenizeVendorSql
Hook method: Tokenize Databricks SQL by calling vendor-specific tokenization.
- Specified by:
tokenizeVendorSql in class AbstractSqlParser
-
setupVendorParsersForExtraction
Hook method: Set up parsers for raw statement extraction. Injects sqlcmds and sourcetokenlist into the parser.
- Specified by:
setupVendorParsersForExtraction in class AbstractSqlParser
-
extractVendorRawStatements
Hook method: Extract raw Databricks SQL statements.
- Specified by:
extractVendorRawStatements in class AbstractSqlParser
- Parameters:
builder - the result builder to populate
-
performParsing
protected TStatementList performParsing(ParserContext context, TCustomParser parser, TCustomParser secondaryParser, TSourceTokenList tokens, TStatementList rawStatements)
Parse all raw statements to build AST.
This method iterates through all raw statements and calls parsestatement() on each one to build the Abstract Syntax Tree. It handles error recovery for CREATE TABLE statements and collects syntax errors.
- Specified by:
performParsing in class AbstractSqlParser
- Parameters:
context - parser context with configuration
parser - primary parser instance
secondaryParser - secondary parser (null for Databricks)
tokens - source token list
rawStatements - raw statements from extraction phase
- Returns:
statement list with parsed AST
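The iterate, parse, and collect-errors loop described for performParsing can be sketched as below. Everything here is a hypothetical stand-in: ParseLoopSketch, RawStatement, and the toy keyword check are invented for illustration and are not the real TStatementList or parsestatement() behavior. The one assumption taken from the description is that each raw statement is parsed in turn and that failures are recorded rather than aborting the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a per-statement parse loop; NOT the real performParsing. */
final class ParseLoopSketch {

    /** Hypothetical raw statement: its text plus the error count set by parsing. */
    static final class RawStatement {
        final String text;
        int errorCount;

        RawStatement(String text) {
            this.text = text;
        }

        /** Toy stand-in for parsestatement(): returns 0 on success, non-zero on failure. */
        int parse() {
            String upper = text.trim().toUpperCase(Locale.ROOT);
            boolean ok = upper.startsWith("SELECT")
                    || upper.startsWith("INSERT")
                    || upper.startsWith("CREATE");
            errorCount = ok ? 0 : 1;
            return errorCount;
        }
    }

    /** Parsed statements together with the syntax errors collected along the way. */
    static final class Result {
        final List<RawStatement> statements = new ArrayList<>();
        final List<String> syntaxErrors = new ArrayList<>();
    }

    static Result performParsing(List<String> rawStatements) {
        Result result = new Result();
        for (String raw : rawStatements) {
            RawStatement stmt = new RawStatement(raw);
            if (stmt.parse() != 0) {
                // Record the failure and continue with the next statement.
                result.syntaxErrors.add("syntax error in: " + raw);
            }
            result.statements.add(stmt); // failed statements are kept, with errorCount set
        }
        return result;
    }
}
```

Keeping failed statements in the result, as this sketch does, is a design choice made for the example so that statement positions stay aligned with the input.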
-
afterStatementParsed
Post-processing hook after each statement is parsed.
Default implementation does nothing. Override if needed for vendor-specific post-processing.
- Overrides:
afterStatementParsed in class AbstractSqlParser
- Parameters:
stmt - the statement that was just parsed
-
handleCreateTableErrorRecovery
Handle error recovery for CREATE TABLE statements.
This method attempts to recover from parse errors in CREATE TABLE statements by marking unparseable table properties (like ROW FORMAT, STORED AS, etc.) as sqlpluscmd and retrying.
Databricks/Hive DDL allows complex table properties after the column definition that may not be fully supported in the grammar. This error recovery allows partial parsing of the main table structure.
Extracted from TGSqlParser.doparse() lines 16916-16971
- Parameters:
stmt - the statement with errors
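The mark-and-retry idea can be sketched as follows. This is a toy model under stated assumptions, not the logic extracted from TGSqlParser: CreateTableRecoverySketch, its Token class, the ignored flag (standing in for marking a token as sqlpluscmd), and the two-entry PROPERTY_STARTERS list are all hypothetical. The toy grammar accepts only CREATE TABLE name ( ... ) with nothing after the closing parenthesis, so trailing properties cause a first-pass failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of CREATE TABLE error recovery; NOT the real implementation. */
final class CreateTableRecoverySketch {

    static final class Token {
        final String text;
        boolean ignored; // stand-in for marking a token as sqlpluscmd

        Token(String text) {
            this.text = text;
        }
    }

    // Hypothetical keywords that start a table property the toy grammar cannot parse.
    private static final List<String> PROPERTY_STARTERS = Arrays.asList("ROW", "STORED");

    /** Split on whitespace; parentheses must be space-separated in this toy model. */
    static List<Token> tokenize(String sql) {
        List<Token> tokens = new ArrayList<>();
        for (String piece : sql.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(new Token(piece));
            }
        }
        return tokens;
    }

    /** Toy grammar: CREATE TABLE name ( ... ) and nothing after the final ')'. */
    static boolean parse(List<Token> tokens) {
        List<String> live = new ArrayList<>();
        for (Token t : tokens) {
            if (!t.ignored) {
                live.add(t.text.toUpperCase(Locale.ROOT));
            }
        }
        return live.size() >= 5
                && live.get(0).equals("CREATE")
                && live.get(1).equals("TABLE")
                && live.get(3).equals("(")
                && live.get(live.size() - 1).equals(")");
    }

    /** On failure, ignore a trailing property clause after the column list and retry once. */
    static boolean parseWithRecovery(List<Token> tokens) {
        if (parse(tokens)) {
            return true;
        }
        int depth = 0;
        int end = -1; // index of the ')' that closes the column list
        for (int i = 0; i < tokens.size(); i++) {
            String text = tokens.get(i).text;
            if (text.equals("(")) {
                depth++;
            } else if (text.equals(")") && --depth == 0) {
                end = i;
                break;
            }
        }
        if (end < 0 || end + 1 >= tokens.size()) {
            return false; // no column list, or nothing after it to skip
        }
        String next = tokens.get(end + 1).text.toUpperCase(Locale.ROOT);
        if (!PROPERTY_STARTERS.contains(next)) {
            return false; // trailing text is not a recognized property clause
        }
        for (int i = end + 1; i < tokens.size(); i++) {
            tokens.get(i).ignored = true;
        }
        return parse(tokens);
    }
}
```

With the input CREATE TABLE t ( a INT ) STORED AS PARQUET, the first pass fails, the three trailing tokens are marked as ignored, and the retry succeeds on the column definition alone. Trailing text that does not start with a listed keyword is left untouched and the failure stands.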
-
performSemanticAnalysis
Perform semantic analysis (resolve column-table relationships, etc.).
This method runs the TSQLResolver to build semantic relationships between columns and tables, among other analysis.
- Overrides:
performSemanticAnalysis in class AbstractSqlParser
- Parameters:
context - parser context
statements - statement list to analyze
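As a rough illustration of what resolving column-table relationships means, the sketch below links an unqualified column name to the single in-scope table that declares it. ColumnResolverSketch is hypothetical and says nothing about how TSQLResolver actually works. It assumes the column lists of the tables in scope are already known and treats a column found in more than one table as unresolved.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical column-to-table resolver; NOT the real TSQLResolver algorithm. */
final class ColumnResolverSketch {

    /**
     * Return the name of the only in-scope table that declares the column,
     * or null when the column is unknown or declared by more than one table.
     */
    static String resolve(Map<String, List<String>> tablesInScope, String column) {
        String owner = null;
        for (Map.Entry<String, List<String>> entry : tablesInScope.entrySet()) {
            if (entry.getValue().contains(column)) {
                if (owner != null) {
                    return null; // ambiguous: declared by two tables
                }
                owner = entry.getKey();
            }
        }
        return owner;
    }
}
```

For a scope containing orders(id, customer_id) and customers(id, name), the column name resolves to customers, while id stays unresolved because both tables declare it.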
-
performInterpreter
Perform interpretation (execute SQL in interpreter mode).
This method runs the TASTEvaluator to interpret/execute the SQL.
- Overrides:
performInterpreter in class AbstractSqlParser
- Parameters:
context - parser context
statements - statement list to interpret
-
toString
- Overrides:
toString in class AbstractSqlParser
-