Class AbstractSqlParser

Object
gudusoft.gsqlparser.parser.AbstractSqlParser
All Implemented Interfaces:
SqlParser
Direct Known Subclasses:
AnsiSqlParser, AthenaSqlParser, BigQuerySqlParser, CouchbaseSqlParser, DatabricksSqlParser, DaxSqlParser, Db2SqlParser, GaussDbSqlParser, GreenplumSqlParser, HanaSqlParser, HiveSqlParser, ImpalaSqlParser, InformixSqlParser, MdxSqlParser, MssqlSqlParser, MySqlSqlParser, NetezzaSqlParser, OdbcSqlParser, OpenEdgeSqlParser, OracleSqlParser, PostgreSqlParser, PrestoSqlParser, RedshiftSqlParser, SnowflakeSqlParser, SoqlSqlParser, SparkSqlParser, SparksqlSqlParser, SybaseSqlParser, TeradataSqlParser, VerticaSqlParser

public abstract class AbstractSqlParser extends Object implements SqlParser
Abstract base class providing common logic and template methods for SQL parsing.

This class implements the Template Method Pattern, defining the skeleton of the parsing algorithm while allowing subclasses to override specific steps. It provides default implementations for common operations and hooks for vendor-specific customization.

Design Pattern: Template Method

Parsing Algorithm (Template Method):

  1. Get lexer (getLexer(ParserContext))
  2. Tokenize SQL (performTokenization(ParserContext, TCustomLexer))
  3. Process tokens (processTokensBeforeParse(ParserContext, TSourceTokenList))
  4. Get parser(s) (getParser(ParserContext, TSourceTokenList))
  5. Parse SQL (performParsing(ParserContext, TCustomParser, TCustomParser, TSourceTokenList, TStatementList))
  6. Semantic analysis (performSemanticAnalysis(ParserContext, TStatementList))

Subclass Responsibilities:

 public class OracleSqlParser extends AbstractSqlParser {
     public OracleSqlParser() {
         super(EDbVendor.dbvoracle);
         this.delimiterChar = '/';
     }

     // Must implement abstract methods
     protected TCustomLexer getLexer(ParserContext context) {
         return new TLexerOracle();
     }

     protected TCustomParser getParser(ParserContext context, TSourceTokenList tokens) {
         return new TParserOracleSql(tokens);
     }

     // ... other abstract methods

     // Optionally override hook methods
     protected void processTokensBeforeParse(ParserContext context, TSourceTokenList tokens) {
         // Oracle-specific token processing
     }
 }
 
Since:
3.2.0.0
  • Field Details

    • vendor

      protected final EDbVendor vendor
    • delimiterChar

      protected char delimiterChar
    • defaultDelimiterStr

    • syntaxErrors

    • sourcetokenlist

      Token list container - created once in constructor, cleared before each parse.

      This follows the component reuse pattern to avoid allocation overhead.

    • sqlstatements

      Statement list container - created once in constructor, cleared before each extraction.

      This follows the component reuse pattern to avoid allocation overhead.
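
      The reuse pattern described for sourcetokenlist and sqlstatements can be sketched as below. This is a minimal illustration only: ReusableTokenBufferSketch and its whitespace tokenizer are hypothetical stand-ins, not library classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: one container is allocated once and cleared per parse.
final class ReusableTokenBufferSketch {
    // Created once, like a container built in the constructor.
    private final List<String> tokens = new ArrayList<>();

    // Clears the shared container, refills it, and returns the same instance.
    List<String> tokenize(String sql) {
        tokens.clear();
        for (String part : sql.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

      One consequence of this design is that a list returned by an earlier call is overwritten by the next call, so callers that need the old contents would have to copy them first.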

    • parserContext

      Current parser context for the ongoing parse operation.

      Set at the beginning of each parse operation, contains input SQL and options.

    • sqlcmds

      protected ISqlCmds sqlcmds
      SQL command resolver for identifying statement types (SELECT, INSERT, etc.).

      Initialized lazily using SqlCmdsFactory.get(vendor) - vendor-specific implementation.

    • lexer

      protected TCustomLexer lexer
      The lexer instance used for tokenization.

      Subclasses should set this field in their constructor to their specific lexer instance. This allows common tokenization logic in AbstractSqlParser to access the lexer generically.

    • globalContext

      protected gudusoft.gsqlparser.compiler.TContext globalContext
      Global context for semantic analysis.

      Created during performParsing phase, contains SQL environment and statement references.

    • sqlEnv

      protected TSQLEnv sqlEnv
      SQL environment for semantic analysis.

      Vendor-specific environment configuration, used by resolver and semantic analysis.

    • frameStack

      protected Stack<gudusoft.gsqlparser.compiler.TFrame> frameStack
      Frame stack for scope management during parsing.

      Used to track nested scopes (global, statement, block-level) during parsing.

    • globalFrame

      protected gudusoft.gsqlparser.compiler.TFrame globalFrame
      Global frame pushed to frame stack during parsing.

      Represents the outermost scope, must be popped after parsing completes.

  • Constructor Details

    • AbstractSqlParser

      protected AbstractSqlParser(EDbVendor vendor)
      Construct a parser for the given database vendor.
      Parameters:
      vendor - the database vendor
  • Method Details

    • getVendor

      public EDbVendor getVendor()
      Description copied from interface: SqlParser
      Get the database vendor this parser handles.
      Specified by:
      getVendor in interface SqlParser
      Returns:
      the database vendor (e.g., dbvoracle, dbvmysql)
    • setTokenHandle

      public void setTokenHandle(ITokenHandle tokenHandle)
      Set an event handler that is invoked whenever the lexer creates a new source token during tokenization.
      Parameters:
      tokenHandle - the event handler that processes each newly created source token
    • parse

      public final SqlParseResult parse(ParserContext context)
      Template method for full parsing.

      This method defines the skeleton of the parsing algorithm. Subclasses should NOT override this method; instead, they should override the abstract methods and hook methods called by this template.

      Algorithm:

      1. Create lexer
      2. Tokenize (time tracked)
      3. Process tokens (vendor-specific preprocessing)
      4. Create parser(s)
      5. Parse (time tracked)
      6. Semantic analysis (time tracked)
      7. Interpreter (time tracked)
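
      The seven steps above can be sketched as a final template method that calls abstract steps and overridable hooks. This is a minimal, self-contained illustration: TemplateParserSketch, SemicolonParserSketch, and the string-based tokens and statements are hypothetical stand-ins for the library types, and timing is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the abstract base class; strings replace tokens and statements.
abstract class TemplateParserSketch {
    // Records which steps ran, in order (illustration only).
    final List<String> trace = new ArrayList<>();

    // Template method: final, so subclasses cannot change the step order.
    final List<String> parse(String sql) {
        trace.clear();
        trace.add("lexer");                                // 1. create lexer (nothing to build here)
        List<String> tokens = tokenize(sql);               // 2. tokenize
        trace.add("tokenize");
        processTokensBeforeParse(tokens);                  // 3. process tokens
        trace.add("processTokens");
        trace.add("parser");                               // 4. create parser(s) (nothing to build here)
        List<String> statements = performParsing(tokens);  // 5. parse
        trace.add("parse");
        performSemanticAnalysis(statements);               // 6. semantic analysis
        trace.add("semanticAnalysis");
        performInterpreter(statements);                    // 7. interpreter
        trace.add("interpreter");
        return statements;
    }

    // Steps every subclass must supply.
    protected abstract List<String> tokenize(String sql);
    protected abstract List<String> performParsing(List<String> tokens);

    // Hooks; in this sketch the defaults do nothing.
    protected void processTokensBeforeParse(List<String> tokens) { }
    protected void performSemanticAnalysis(List<String> statements) { }
    protected void performInterpreter(List<String> statements) { }
}

// Hypothetical "vendor": whitespace-separated tokens, ';' ends a statement.
class SemicolonParserSketch extends TemplateParserSketch {
    @Override
    protected List<String> tokenize(String sql) {
        List<String> tokens = new ArrayList<>();
        for (String part : sql.trim().split("\\s+")) {
            tokens.add(part);
        }
        return tokens;
    }

    @Override
    protected void processTokensBeforeParse(List<String> tokens) {
        tokens.removeIf(String::isEmpty); // drop the empty token produced by blank input
    }

    @Override
    protected List<String> performParsing(List<String> tokens) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tokens) {
            if (token.equals(";")) {
                if (current.length() > 0) {
                    statements.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token);
            }
        }
        if (current.length() > 0) {
            statements.add(current.toString());
        }
        return statements;
    }
}
```
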
      Specified by:
      parse in interface SqlParser
      Parameters:
      context - immutable context with all inputs
      Returns:
      immutable result with all outputs
    • tokenize

      public final SqlParseResult tokenize(ParserContext context)
      Template method for tokenization only (without full parsing).

      This method is used by getrawsqlstatements(), which needs only tokenization and raw statement extraction, not detailed syntax checking or semantic analysis.

      Algorithm:

      1. Get lexer
      2. Tokenize (time tracked)
      3. Extract raw statements (no parsing)
      Specified by:
      tokenize in interface SqlParser
      Parameters:
      context - immutable context with all inputs
      Returns:
      immutable result with tokens and raw statements
    • getrawsqlstatements

      Template method for extracting raw statements without full parsing.

      This method performs tokenization and raw statement extraction, but skips the expensive full parsing and semantic analysis steps.

      Algorithm:

      1. Tokenize SQL (via tokenize(ParserContext))
      2. Extract raw statements (via extractRawStatements(ParserContext, TSourceTokenList, TCustomLexer, long))
      3. Return result with tokens and raw statements

      Equivalent to legacy API: TGSqlParser.getrawsqlstatements()

      Specified by:
      getrawsqlstatements in interface SqlParser
      Parameters:
      context - immutable context with all inputs
      Returns:
      immutable result with tokens and raw statements (no AST)
    • getLexer

      protected abstract TCustomLexer getLexer(ParserContext context)
      Get the lexer for this vendor.

      Subclass Responsibility: Return vendor-specific lexer instance. The lexer may be created fresh or cached/reused for performance.

      Example:

       protected TCustomLexer getLexer(ParserContext context) {
           TLexerOracle lexer = new TLexerOracle();
           lexer.delimiterchar = delimiterChar;
           lexer.defaultDelimiterStr = defaultDelimiterStr;
           return lexer;
       }
       
      Parameters:
      context - the parser context
      Returns:
      configured lexer instance (never null)
    • getParser

      protected abstract TCustomParser getParser(ParserContext context, TSourceTokenList tokens)
      Get the main parser for this vendor.

      Subclass Responsibility: Return vendor-specific parser instance. The parser may be created fresh or cached/reused for performance. If reusing, the token list should be updated.

      Example:

       protected TCustomParser getParser(ParserContext context, TSourceTokenList tokens) {
           TParserOracleSql parser = new TParserOracleSql(tokens);
           parser.lexer = getLexer(context);
           return parser;
       }
       
      Parameters:
      context - the parser context
      tokens - the source token list
      Returns:
      configured parser instance (never null)
    • performTokenization

      Perform tokenization using vendor-specific lexer.

      Template Method: This method implements the tokenization algorithm shared by all database vendors. Subclasses customize it through a single hook, tokenizeVendorSql(), which calls the vendor-specific tokenization logic.

      Algorithm:

      1. Store parser context
      2. Prepare SQL reader (file/string with charset detection)
      3. Configure lexer with input reader and charset
      4. Reset lexer state
      5. Clear token list and reset position
      6. Reset token table cache
      7. Call tokenizeVendorSql() hook
      8. Return populated token list
      Parameters:
      context - parser context with SQL input configuration
      lexer - the lexer instance (same as this.lexer)
      Returns:
      token list populated by vendor-specific tokenization
      Throws:
      RuntimeException - if tokenization fails
    • tokenizeVendorSql

      protected abstract void tokenizeVendorSql()
      Call vendor-specific tokenization logic.

      Hook Method: Called by performTokenization(gudusoft.gsqlparser.parser.ParserContext, gudusoft.gsqlparser.TCustomLexer) to execute vendor-specific SQL-to-token conversion logic.

      Subclass Responsibility: Call the vendor-specific tokenization method (e.g., dooraclesqltexttotokenlist, domssqlsqltexttotokenlist) which reads from lexer and populates sourcetokenlist.

      Example (Oracle):

       protected void tokenizeVendorSql() {
           dooraclesqltexttotokenlist();
       }
       

      Example (MSSQL):

       protected void tokenizeVendorSql() {
           domssqlsqltexttotokenlist();
       }
       

      Example (PostgreSQL):

       protected void tokenizeVendorSql() {
           dopostgresqltexttotokenlist();
       }
       
    • doExtractRawStatements

      Extract raw statements without full parsing (public API).

      This public method allows external callers (like TGSqlParser) to extract raw statements from an already-tokenized source list without re-tokenization.

      Specified by:
      doExtractRawStatements in interface SqlParser
      Parameters:
      context - the parser context
      tokens - the source token list (already tokenized)
      Returns:
      statement list (never null)
      Since:
      3.2.0.0
    • extractRawStatements

      protected SqlParseResult extractRawStatements(ParserContext context, TSourceTokenList tokens, TCustomLexer lexer, long tokenizationTimeMs)
      Extract raw statements without full parsing.

      Template Method: This method implements the common algorithm for extracting raw statements across all database vendors. Subclasses customize the process through two hook methods: setupVendorParsersForExtraction() and extractVendorRawStatements(SqlParseResult.Builder).

      Algorithm:

      1. Create SqlParseResult.Builder
      2. Set common fields (lexer, tokens, tokenization time)
      3. Store context and tokens for extraction
      4. Initialize SQL command resolver
      5. Call setupVendorParsersForExtraction() hook
      6. Time the extraction
      7. Call extractVendorRawStatements(SqlParseResult.Builder) hook
      8. Set parsing time
      9. Build and return result
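
      The nine steps above can be sketched with a builder and two abstract hooks. This is an illustration under stated assumptions: RawResultSketch, RawExtractorSketch, and SemicolonExtractorSketch are hypothetical stand-ins, tokens and statements are plain strings, and step 4 (the SQL command resolver) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the immutable result and its builder.
final class RawResultSketch {
    final List<String> tokens;
    final List<String> rawStatements;
    final long tokenizationTimeMs;
    final long parsingTimeMs;

    private RawResultSketch(Builder b) {
        this.tokens = b.tokens;
        this.rawStatements = b.rawStatements;
        this.tokenizationTimeMs = b.tokenizationTimeMs;
        this.parsingTimeMs = b.parsingTimeMs;
    }

    static final class Builder {
        List<String> tokens = new ArrayList<>();
        final List<String> rawStatements = new ArrayList<>();
        long tokenizationTimeMs;
        long parsingTimeMs;

        RawResultSketch build() {
            return new RawResultSketch(this);
        }
    }
}

abstract class RawExtractorSketch {
    protected List<String> sourceTokens;
    protected boolean parsersReady;

    final RawResultSketch extractRawStatements(List<String> tokens, long tokenizationTimeMs) {
        RawResultSketch.Builder builder = new RawResultSketch.Builder(); // 1. create builder
        builder.tokens = tokens;                                         // 2. common fields
        builder.tokenizationTimeMs = tokenizationTimeMs;
        this.sourceTokens = tokens;                                      // 3. store tokens
        setupVendorParsersForExtraction();                               // 5. first hook
        long start = System.nanoTime();                                  // 6. time the extraction
        extractVendorRawStatements(builder);                             // 7. second hook
        builder.parsingTimeMs = (System.nanoTime() - start) / 1_000_000; // 8. set parsing time
        return builder.build();                                          // 9. build result
    }

    protected abstract void setupVendorParsersForExtraction();
    protected abstract void extractVendorRawStatements(RawResultSketch.Builder builder);
}

// Hypothetical "vendor": a ';' token closes the current raw statement.
class SemicolonExtractorSketch extends RawExtractorSketch {
    @Override
    protected void setupVendorParsersForExtraction() {
        parsersReady = true; // a real subclass would wire its parser(s) here
    }

    @Override
    protected void extractVendorRawStatements(RawResultSketch.Builder builder) {
        StringBuilder current = new StringBuilder();
        for (String token : sourceTokens) {
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
            if (token.equals(";")) {
                builder.rawStatements.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            builder.rawStatements.add(current.toString());
        }
    }
}
```
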
      Parameters:
      context - the parser context
      tokens - the source token list
      lexer - the lexer instance (for including in result)
      tokenizationTimeMs - tokenization time from tokenize() step
      Returns:
      complete SqlParseResult with raw statements and metadata
    • setupVendorParsersForExtraction

      protected abstract void setupVendorParsersForExtraction()
      Setup vendor-specific parsers for raw statement extraction.

      Hook Method: Called by extractRawStatements(gudusoft.gsqlparser.parser.ParserContext, gudusoft.gsqlparser.TSourceTokenList, gudusoft.gsqlparser.TCustomLexer, long) after initializing sqlcmds but before calling the vendor-specific extraction logic.

      Subclass Responsibility: Inject sqlcmds into vendor parser(s) and update their token lists. Examples:

      • Single parser (MSSQL): Inject into fparser only
      • Dual parsers (Oracle): Inject into both fparser and fplsqlparser

      Example (MSSQL):

       protected void setupVendorParsersForExtraction() {
           this.fparser.sqlcmds = this.sqlcmds;
           this.fparser.sourcetokenlist = this.sourcetokenlist;
       }
       

      Example (Oracle with dual parsers):

       protected void setupVendorParsersForExtraction() {
           this.fparser.sqlcmds = this.sqlcmds;
           this.fplsqlparser.sqlcmds = this.sqlcmds;
           this.fparser.sourcetokenlist = this.sourcetokenlist;
           this.fplsqlparser.sourcetokenlist = this.sourcetokenlist;
       }
       
    • extractVendorRawStatements

      protected abstract void extractVendorRawStatements(SqlParseResult.Builder builder)
      Call vendor-specific raw statement extraction logic.

      Hook Method: Called by extractRawStatements(gudusoft.gsqlparser.parser.ParserContext, gudusoft.gsqlparser.TSourceTokenList, gudusoft.gsqlparser.TCustomLexer, long) to execute the vendor-specific logic for identifying statement boundaries.

      Subclass Responsibility: Call the vendor-specific extraction method (e.g., dooraclegetrawsqlstatements, domssqlgetrawsqlstatements) passing the builder. The extraction method will populate the builder with raw statements.

      Example (Oracle):

       protected void extractVendorRawStatements(SqlParseResult.Builder builder) {
           dooraclegetrawsqlstatements(builder);
       }
       

      Example (MSSQL):

       protected void extractVendorRawStatements(SqlParseResult.Builder builder) {
           domssqlgetrawsqlstatements(builder);
       }
       
      Parameters:
      builder - the result builder to populate with raw statements
    • performParsing

      protected abstract TStatementList performParsing(ParserContext context, TCustomParser parser, TCustomParser secondaryParser, TSourceTokenList tokens, TStatementList rawStatements)
      Perform actual parsing with syntax checking.

      Subclass Responsibility: Parse SQL using vendor-specific parser and optional secondary parser (e.g., PL/SQL for Oracle).

      Important: This method receives raw statements that have already been extracted by getrawsqlstatements(ParserContext). Subclasses should NOT re-extract statements - just parse each statement to build the AST.

      Example:

       protected TStatementList performParsing(ParserContext context,
                                               TCustomParser parser,
                                               TCustomParser secondaryParser,
                                               TSourceTokenList tokens,
                                               TStatementList rawStatements) {
           // Use the passed-in rawStatements (DO NOT re-extract!)
           for (int i = 0; i < rawStatements.size(); i++) {
               TCustomSqlStatement stmt = rawStatements.get(i);
               stmt.parsestatement(...);  // Build AST for each statement
           }
           return rawStatements;
       }
       
      Parameters:
      context - the parser context
      parser - the main parser instance
      secondaryParser - secondary parser (may be null)
      tokens - the source token list
      rawStatements - raw statements already extracted (never null)
      Returns:
      statement list with parsed AST (never null)
    • getSecondaryParser

      Get secondary parser (e.g., PL/SQL for Oracle).

      Hook Method: Default implementation returns null. Override if vendor needs a secondary parser. The parser may be created fresh or cached/reused for performance.

      Example (Oracle):

       protected TCustomParser getSecondaryParser(ParserContext context, TSourceTokenList tokens) {
           TParserOraclePLSql plsqlParser = new TParserOraclePLSql(tokens);
           plsqlParser.lexer = getLexer(context);
           return plsqlParser;
       }
       
      Parameters:
      context - the parser context
      tokens - the source token list
      Returns:
      secondary parser instance, or null if not needed
    • doAfterTokenize

      protected void doAfterTokenize(TSourceTokenList tokens)
      Post-tokenization normalization.

      Handles matching parentheses wrapped around the SQL and marks any semicolon before a closing parenthesis so that it is ignored.

      Extracted from: TGSqlParser.doAfterTokenize() (lines 5123-5161)

      Parameters:
      tokens - the source token list (mutable)
    • processTokensInTokenTable

      protected void processTokensInTokenTable(ParserContext context, TCustomLexer lexer, TSourceTokenList tokens)
      Process tokens using token table (vendor-specific token code adjustments).

      Currently applies to BigQuery and Snowflake: DO keywords are converted to identifiers when there is no corresponding WHILE/FOR.
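
      One plausible reading of the DO rule is sketched below. The matching logic is an assumption: here a DO stays a keyword only if an earlier, still unmatched WHILE or FOR keyword exists, and the library's actual matching may differ. DoKeywordSketch, Token, and Kind are hypothetical names.

```java
import java.util.List;
import java.util.Locale;

final class DoKeywordSketch {
    enum Kind { KEYWORD, IDENTIFIER }

    // Hypothetical token: text plus a mutable classification.
    static final class Token {
        final String text;
        Kind kind;

        Token(String text, Kind kind) {
            this.text = text;
            this.kind = kind;
        }
    }

    // Reclassifies DO as an identifier when no WHILE/FOR is waiting for it.
    static void demoteUnmatchedDo(List<Token> tokens) {
        int openLoops = 0; // WHILE/FOR keywords seen that have not yet met a DO
        for (Token token : tokens) {
            if (token.kind != Kind.KEYWORD) {
                continue;
            }
            String upper = token.text.toUpperCase(Locale.ROOT);
            if (upper.equals("WHILE") || upper.equals("FOR")) {
                openLoops++;
            } else if (upper.equals("DO")) {
                if (openLoops > 0) {
                    openLoops--;                  // belongs to a loop header
                } else {
                    token.kind = Kind.IDENTIFIER; // no loop header: treat as a name
                }
            }
        }
    }
}
```
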

      Extracted from: TGSqlParser.processTokensInTokenTable() (lines 5186-5209)

      Parameters:
      context - the parser context
      lexer - the lexer (for accessing TOKEN_TABLE)
      tokens - the source token list (mutable)
    • processTokensBeforeParse

      protected void processTokensBeforeParse(ParserContext context, TSourceTokenList tokens)
      Process tokens before parsing (vendor-specific adjustments).

      Hook Method: Default implementation handles Snowflake consecutive semicolons. Override if vendor needs additional token preprocessing.

      Extracted from: TGSqlParser.processTokensBeforeParse() (lines 5165-5184)

      Example:

       protected void processTokensBeforeParse(ParserContext context, TSourceTokenList tokens) {
           super.processTokensBeforeParse(context, tokens); // Call base implementation
           // Add vendor-specific processing...
       }
       
      Parameters:
      context - the parser context
      tokens - the source token list (mutable)
    • performSemanticAnalysis

      protected void performSemanticAnalysis(ParserContext context, TStatementList statements)
      Perform semantic analysis on parsed statements.

      Hook Method: Default implementation does nothing. Override to provide vendor-specific semantic analysis.

      Typical Implementation:

      • Column-to-table resolution (TSQLResolver)
      • Dataflow analysis
      • Reference resolution
      • Scope resolution
      Parameters:
      context - the parser context
      statements - the parsed statements (mutable)
    • performInterpreter

      protected void performInterpreter(ParserContext context, TStatementList statements)
      Perform interpretation/evaluation on parsed statements.

      Hook Method: Default implementation does nothing. Override to provide AST interpretation/evaluation.

      Typical Implementation:

      • Execute simple SQL statements
      • Evaluate expressions
      • Constant folding
      • Static analysis
      Parameters:
      context - the parser context
      statements - the parsed statements (mutable)
    • copyErrorsFromStatement

      protected void copyErrorsFromStatement(TCustomSqlStatement statement)
      Copy error messages from a statement to the parser's error collection.

      This method should be called by performParsing implementations when a statement has syntax errors.

      Parameters:
      statement - the statement with errors
    • attemptErrorRecovery

      protected int attemptErrorRecovery(TCustomSqlStatement statement, int parseResult, boolean onlyNeedRawParseTree)
      Attempt error recovery for CREATE TABLE/INDEX statements with unsupported options.

      When parsing CREATE TABLE or CREATE INDEX statements, the parser may encounter vendor-specific options that are not in the grammar. This method implements the legacy error recovery behavior by marking unsupported tokens after the main definition as SQL*Plus commands (effectively ignoring them).

      Recovery Strategy:

      1. Find the closing ')' of the column/index definitions (the one that returns the nesting depth to 0)
      2. Mark all remaining tokens (except ';') as sqlpluscmd to ignore them
      3. Clear errors and re-parse the statement

      When to call: After parsing a statement that has errors. Only recovers if ENABLE_ERROR_RECOVER_IN_CREATE_TABLE is true.
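
      Steps 1 and 2 of the recovery strategy can be sketched on a plain token array. This is an illustration only: CreateTableRecoverySketch is a hypothetical name, the boolean array stands in for marking tokens as sqlpluscmd, and step 3 (clearing errors and re-parsing) is not shown.

```java
final class CreateTableRecoverySketch {
    // Returns a flag per token; true means "ignore this token on re-parse".
    static boolean[] markTrailingOptions(String[] tokens) {
        boolean[] ignored = new boolean[tokens.length];
        int depth = 0;
        int closeIndex = -1;

        // Step 1: find the ')' that brings the nesting depth back to 0.
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("(")) {
                depth++;
            } else if (tokens[i].equals(")")) {
                depth--;
                if (depth == 0) {
                    closeIndex = i;
                    break;
                }
            }
        }
        if (closeIndex < 0) {
            return ignored; // no balanced definition list: nothing to recover
        }

        // Step 2: ignore everything after it except the statement terminator.
        for (int i = closeIndex + 1; i < tokens.length; i++) {
            if (!tokens[i].equals(";")) {
                ignored[i] = true;
            }
        }
        return ignored;
    }
}
```

      In this sketch, for CREATE TABLE t ( a INT , b DECIMAL ( 10 , 2 ) ) STORAGE FAST ; only the hypothetical options STORAGE and FAST are flagged.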

      Parameters:
      statement - the statement to attempt recovery on
      parseResult - the result code from parsing (0 = success)
      onlyNeedRawParseTree - whether only raw parse tree is needed
      Returns:
      new parse result after recovery attempt, or original if no recovery
    • getSyntaxErrors

      Get the syntax errors collected during parsing.
      Returns:
      list of syntax errors (never null)
    • getErrorCount

      public int getErrorCount()
      Get the count of syntax errors.
      Returns:
      number of syntax errors
    • isDollarFunctionDelimiter

      protected boolean isDollarFunctionDelimiter(int tokencode, EDbVendor dbVendor)
      Check if a token is a dollar function delimiter ($$, $tag$, etc.) for PostgreSQL-family databases.

      Migrated from TGSqlParser.isDollarFunctionDelimiter() (lines 5074-5080).

      Dollar-quoted strings are used in PostgreSQL-family databases to delimit function bodies. Each vendor has its own delimiter token code.
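
      For readers unfamiliar with dollar quoting, the sketch below recognizes the delimiter forms at the text level. Note the difference from the method documented here, which compares vendor-specific token codes rather than token text. DollarDelimiterSketch is a hypothetical name, and the pattern is deliberately restricted to ASCII tags.

```java
import java.util.regex.Pattern;

final class DollarDelimiterSketch {
    // "$$" or "$tag$", where the tag looks like an unquoted ASCII identifier.
    private static final Pattern DOLLAR_DELIMITER =
            Pattern.compile("\\$([A-Za-z_][A-Za-z_0-9]*)?\\$");

    // True if the whole token text is a dollar-quote delimiter.
    static boolean looksLikeDollarDelimiter(String tokenText) {
        return tokenText != null && DOLLAR_DELIMITER.matcher(tokenText).matches();
    }
}
```
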

      Parameters:
      tokencode - the token code to check
      dbVendor - the database vendor
      Returns:
      true if the token is a dollar function delimiter for the given vendor
    • onRawStatementComplete

      protected void onRawStatementComplete(ParserContext context, TCustomSqlStatement statement, TCustomParser mainParser, TCustomParser secondaryParser, TStatementList statementList, boolean isLastStatement, SqlParseResult.Builder builder)
      Hook method called when a raw statement is complete.

      This method is called by vendor-specific raw statement extraction methods (e.g., dooraclegetrawsqlstatements) when a statement boundary is detected. It sets up the statement with parser references and adds it to the statement list.

      Parameters:
      context - parser context
      statement - the completed statement
      mainParser - main parser instance
      secondaryParser - secondary parser instance (may be null)
      statementList - statement list to add to
      isLastStatement - true if this is the last statement
      builder - optional result builder (used during raw statement extraction, may be null)
    • onRawStatementCompleteVendorSpecific

      Hook for vendor-specific logic when a raw statement is completed.

      Migrated from TGSqlParser.doongetrawsqlstatementevent() (lines 5129-5178).

      This method is called after basic statement setup but before adding to the statement list. Subclasses can override to add vendor-specific token manipulations or metadata.

      Default implementation handles PostgreSQL-family routine body processing.

      Parameters:
      statement - the completed statement
    • prepareSqlReader

      Throws:
      IOException
    • initializeGlobalContext

      protected void initializeGlobalContext()
      Initialize global context and frame stack for statement parsing.

      This method sets up the semantic analysis infrastructure required during the parsing phase. It creates:

      • Global context (TContext) for semantic analysis
      • SQL environment (TSQLEnv) with vendor-specific configuration
      • Frame stack for scope management
      • Global scope frame as the outermost scope

      When to call: At the beginning of performParsing(), before parsing statements.

      Cleanup required: Must call globalFrame.popMeFromStack(frameStack) after all statements are parsed to clean up the frame stack.
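
      The push/pop discipline described above can be sketched with a plain stack. This is an illustration only: FrameStackSketch is a hypothetical stand-in that uses strings for frames, and wrapping the cleanup in try/finally is a choice made for this sketch, not something the documentation states the library does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FrameStackSketch {
    final Deque<String> frameStack = new ArrayDeque<>();

    // Pushes a global frame, "parses" each statement in its own frame, then cleans up.
    int parseAll(String[] statements) {
        frameStack.push("global");         // outermost scope
        int parsed = 0;
        try {
            for (String statement : statements) {
                frameStack.push("statement");
                try {
                    if (statement.isEmpty()) {
                        throw new IllegalArgumentException("empty statement");
                    }
                    parsed++;
                } finally {
                    frameStack.pop();      // leave the statement scope
                }
            }
        } finally {
            frameStack.pop();              // global frame is always popped
        }
        return parsed;
    }
}
```

      With the finally blocks, the stack ends up empty whether parsing completes or an exception propagates.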

      Extracted from: Identical implementations in OracleSqlParser and MssqlSqlParser to eliminate ~16 lines of duplicate code per parser.

    • handleStatementParsingException

      protected void handleStatementParsingException(TCustomSqlStatement stmt, int statementIndex, Exception ex)
      Handle exceptions that occur during individual statement parsing.

      This method provides robust error handling that allows parsing to continue even when individual statements throw exceptions. It:

      • Creates a detailed TSyntaxError with exception information
      • Captures statement location (line, column) from first token
      • Includes statement number, exception type, and message
      • Optionally logs full stack trace if debugging is enabled
      • Adds error to syntaxErrors list for user feedback

      Benefits:

      • Parsing continues for remaining statements after exception
      • Users get complete error feedback for all statements
      • Developers get stack traces for debugging parser issues

      Example error message:
      "Exception during parsing statement 3: NullPointerException - Cannot invoke..."

      Extracted from: Identical implementations in OracleSqlParser and MssqlSqlParser to eliminate ~51 lines of duplicate code per parser.
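
      The continue-on-error behaviour described above can be sketched as a per-statement try/catch. This is an illustration only: ContinueOnErrorSketch is a hypothetical stand-in, errors are plain strings rather than TSyntaxError objects, and the statement number in the message is assumed to be the 0-based index plus one.

```java
import java.util.ArrayList;
import java.util.List;

final class ContinueOnErrorSketch {
    final List<String> syntaxErrors = new ArrayList<>();

    // Parses every statement; a failure in one does not stop the others.
    int parseAll(List<String> statements) {
        int parsed = 0;
        for (int i = 0; i < statements.size(); i++) {
            try {
                parseOne(statements.get(i));
                parsed++;
            } catch (Exception ex) {
                handleStatementParsingException(i, ex);
            }
        }
        return parsed;
    }

    // Hypothetical parse step: blank statements are treated as a failure.
    private void parseOne(String statement) {
        if (statement.trim().isEmpty()) {
            throw new IllegalStateException("empty statement");
        }
    }

    // Records statement number (assumed index + 1), exception type, and message.
    void handleStatementParsingException(int statementIndex, Exception ex) {
        syntaxErrors.add("Exception during parsing statement " + (statementIndex + 1)
                + ": " + ex.getClass().getSimpleName() + " - " + ex.getMessage());
    }
}
```
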

      Parameters:
      stmt - the statement that failed to parse
      statementIndex - 0-based index of the statement in the statement list
      ex - the exception that was thrown during parsing
    • afterStatementParsed

      Hook method for vendor-specific post-processing after a statement is parsed.

      This method is called after each statement is successfully parsed but before error recovery and error collection. Subclasses can override this to perform vendor-specific operations such as:

      • Checking for vendor-specific syntax errors in nested statements
      • Validating vendor-specific constraints
      • Collecting vendor-specific metadata

      Default implementation: Does nothing (no-op).

      Example override (Oracle):

      
       @Override
       protected void afterStatementParsed(TCustomSqlStatement stmt) {
           if (stmt.isoracleplsql()) {
               findAllSyntaxErrorsInPlsql(stmt);
           }
       }
       

      When called: After stmt.parsestatement() succeeds, before handleCreateTableErrorRecovery() and copyErrorsFromStatement().

      Parameters:
      stmt - the statement that was just parsed
    • getanewsourcetoken

      Get next source token from the lexer.

      This method wraps the lexer's yylexwrap() call and performs several important tasks:

      • Fetches the next raw token from the lexer
      • Combines consecutive whitespace/newline tokens for cleaner token stream
      • Sets token metadata (vendor, status, container, position in list)
      • Optionally calls token handler callback

      Token Consolidation Rules:

      • Whitespace after a newline is merged into the newline token
      • Consecutive newlines are merged into a single newline token
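
      The two consolidation rules can be sketched on a simplified token stream. This is an illustration only: TokenMergeSketch, Tok, and Type are hypothetical, and appending the merged text to the newline token is an assumption about how merged tokens keep their source text.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenMergeSketch {
    enum Type { WORD, SPACE, NEWLINE }

    // Hypothetical token: a type and the source text it covers.
    static final class Tok {
        final Type type;
        String text;

        Tok(Type type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Merges whitespace and newlines that directly follow a newline token into it.
    static List<Tok> consolidate(List<Tok> raw) {
        List<Tok> out = new ArrayList<>();
        for (Tok token : raw) {
            Tok previous = out.isEmpty() ? null : out.get(out.size() - 1);
            boolean followsNewline = previous != null && previous.type == Type.NEWLINE;
            boolean isBlank = token.type == Type.SPACE || token.type == Type.NEWLINE;
            if (followsNewline && isBlank) {
                previous.text += token.text;              // fold into the newline token
            } else {
                out.add(new Tok(token.type, token.text)); // copy, so the input stays untouched
            }
        }
        return out;
    }
}
```
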

      Implementation Note: This method is extracted from TGSqlParser.getanewsourcetoken() and made available to all database-specific parsers to avoid code duplication.

      Returns:
      next source token, or null if end of input
    • towinlinebreak

      protected String towinlinebreak(String s)
      Convert line breaks to Windows format.

      Currently returns the input unchanged. This method exists for compatibility with the original TGSqlParser implementation.

      Parameters:
      s - Input string
      Returns:
      String with Windows line breaks (currently unchanged)
    • getDelimiterChar

      public char getDelimiterChar()
      Get the delimiter character for this vendor.
      Returns:
      delimiter character (e.g., ';', '/', '$')
    • getDefaultDelimiterStr

      Get the default delimiter string for this vendor.
      Returns:
      default delimiter string
    • toString

      public String toString()
      Overrides:
      toString in class Object