Class TGSqlParser

Object
gudusoft.gsqlparser.TGSqlParser

public class TGSqlParser extends Object
This is the first class people start to use this SQL parser library. This class includes a lexer and a parser. The lexer is used to tokenize the input SQL, turn the input SQL text into a list of source tokens. The parser use this list source token as input, using the grammar rule of the specified database vendor to check the syntax of the input SQL and build the parse tree of this SQL if there is no syntax error in the input SQL.

Creating a SQL parser by specifing a database vendor, then set SQL script text via setSqltext(java.lang.String) method or reading the input SQL from a file via setSqlfilename(java.lang.String) method.

After that, call one of the following methods to achieve what you need:
  • tokenizeSqltext(), turns the input SQL into a sequence of token which is the basic lexis element of SQL syntax. Token is categorized as keyword, identifier, number, operator, whitespace and other types. All source tokens can be fetched via the getSourcetokenlist() method
  • getrawsqlstatements(), separates the SQL statements in the input SQL script without doing syntax check, use the getSqlstatements() method to get a list of SQL statements which is the sub-class of TCustomSqlStatement, get SQL statement type via the TCustomSqlStatement.sqlstatementtype field, and string representation of each SQL statement via the TParseTreeNode.toString() method. All source tokens in this SQL statement is available by using TCustomSqlStatement.sourcetokenlist filed. Since no parse tree is built by calling this method, no further detailed information about the SQL statement is available.
  • parse(), Check syntax of the input SQL, doing some kind of semantic analysis without connecting to a real database. This method will do a in-depth analysis of the input SQL such as building the link between table and columns. The parse tree of the input SQL is available after calling this method.
The parser checks the syntax of those SQL statements one by one. If syntax error is found in a SQL statement, an error will be logged, no parse tree will be built for this SQL statement, the error message can be fetched using the getErrormessage() method.

The syntax error in one SQL statement doesn't prevent the parser continue to check the syntax of the next SQL statement. After checking syntax of all SQL statements, use the getErrorCount() method to get the total number of errors.

A syntax error in a SQL stored procedure will cease this parser to check syntax of the rest SQL statements in this stored procedure.

Format SQL script can be done after calling parse(). int ret = sqlparser.parse(); if (ret == 0){ GFmtOpt option = GFmtOptFactory.newInstance(); String result = FormatterFactory.pp(sqlparser, option); System.out.println(result); }else{ System.out.println(sqlparser.getErrormessage()); }

After paring SQL script, all parse tree nodes are available for use, some of use cases are:

  • Table/column impact analysis
  • SQL rewriting
  • SQL translate between different databases
  • SQL migration analysis
  • Help to anti SQL injection
  • More use cases

Typically, SQL parse tree nodes generated by this SQL Parser were closely related to SQL elements defined in database vendor's SQL reference book. here is a brief summary of some most used SQL elements and corresponding classes defined in this SQL parser.

Some major SQL statements:

Stored procedure

For all available SQL parse tree node classes, please check the API reference.
  • Field Details

    • currentDBVendor

      public static EDbVendor currentDBVendor
    • sqltext

      public String sqltext
      The input SQL Text. If sqlfilename is specified, then this field will be ignored.
    • sqlfilename

      The input SQL will be read from this file If field is specified, then sqltext will be ignored. This must be the full path to the file, relative path doesn't work.
    • sourcetokenlist

      Tokens generated by lexer from the input SQL script. Tokens are always available even if there are syntax errors in input the SQL script.
    • sqlstatements

      SQL statements generated by this parser from the input SQL script. statements are always available even if there are syntax errors in input SQL script. if there is no syntax error in a statement, you can access the parse tree of the statement to fetch more information such as tables, columns, etc.
  • Constructor Details

    • TGSqlParser

      public TGSqlParser(EDbVendor pdbvendor)
      Class constructor, create a new instance of the parser by setting the database vendor
      Parameters:
      pdbvendor - the database vendor whose grammar rule will be used to validate the syntax of the input SQL
  • Method Details

    • setSqlCharset

      public void setSqlCharset(String sqlCharset)
    • getSqlCharset

    • getDBVendorByName

      public static EDbVendor getDBVendorByName(String dbVendorName)
      Turn the string name of database to dbvendor
      • access: EDbVendor.dbvaccess
      • ansi: EDbVendor.dbvansi
      • bigquery: EDbVendor.dbvbigquery
      • clickhouse: EDbVendor.dbvclickhouse
      • couchbase: EDbVendor.dbvcouchbase
      • dax: EDbVendor.dbvdax
      • db2: EDbVendor.dbvdb2
      • firebird: EDbVendor.dbvfirebird
      • generic: EDbVendor.dbvgeneric
      • greenplum: EDbVendor.dbvgreenplum
      • hana: EDbVendor.dbvhana
      • hive: EDbVendor.dbvhive
      • impala: EDbVendor.dbvimpala
      • informix: EDbVendor.dbvinformix
      • mdx: EDbVendor.dbvmdx
      • mssql or sqlserver: EDbVendor.dbvmssql
      • mysql: EDbVendor.dbvmysql
      • netezza: EDbVendor.dbvnetezza
      • odbc: EDbVendor.dbvodbc
      • openedge: EDbVendor.dbvopenedge
      • oracle: EDbVendor.dbvoracle
      • postgresql or postgres: EDbVendor.dbvpostgresql
      • redshift: EDbVendor.dbvredshift
      • snowflake: EDbVendor.dbvsnowflake
      • sybase: EDbVendor.dbvsybase
      • teradata: EDbVendor.dbvteradata
      • vertica: EDbVendor.dbvvertica
      Parameters:
      dbVendorName -
      Returns:
      dbvendor
    • getFrameStack

      public Stack<gudusoft.gsqlparser.compiler.TFrame> getFrameStack()
    • setFrameStack

      public void setFrameStack(Stack<gudusoft.gsqlparser.compiler.TFrame> frameStack)
    • getSourcetokenlist

      A sequence of source tokens created by the lexer after tokenize the input SQL
      Returns:
      a sequence of source tokens
    • getSqlstatements

      A list of SQL statements created by the parser. If this list is created after calling the getrawsqlstatements() method, the syntax of each SQL statement is not checked and the parse tree of each statement is not created. If the parse() method is called to build this SQL statement list, then every thing is ready.
      Returns:
      a list of SQL statement
    • setSqltext

      public void setSqltext(String sqltext)
      set the input SQL text, If sqlfilename is specified before this method, the parser will using the SQL text in this field instead of read SQL from sqlfilename.
      Parameters:
      sqltext - the input SQL text
    • getSqltext

      public String getSqltext()
      The SQL text that being processed.
      Returns:
      the SQL text that being processed
    • setSqlfilename

      public void setSqlfilename(String sqlfilename)
      set the filename from which the input SQL will be read.
      Parameters:
      sqlfilename - the SQL file name from which the input SQL will be read
    • getSqlfilename

      The input SQL filename. This parser can process the unicode encoded SQL file.
      Returns:
      the input SQL filename
    • setSqlInputStream

      public void setSqlInputStream(InputStream sqlInputStream)
      set the InputStream from which SQL will be read. If this method is called, sqlfilename and sqltext will be ignored.
      Parameters:
      sqlInputStream - the InputStream from which SQL will be read
    • getSqlInputStream

      the InputStream from which SQL will be read
      Returns:
      the InputStream from which SQL will be read
    • getResolver2

      Get the TSQLResolver2 instance used for name resolution. Returns null if resolver2 was not used or if parsing failed.
      Returns:
      the TSQLResolver2 instance or null
    • getResolverType

      Get the resolver type used for name resolution.
      Returns:
      the resolver type
    • setResolverType

      public void setResolverType(EResolverType resolverType)
      Set the resolver type to use for name resolution.

      This instance-level setting takes precedence over global TBaseType settings. When set to DEFAULT (the default value), behavior is determined by TBaseType.isEnableResolver() and TBaseType.isEnableResolver2().

      Usage Example:

       TGSqlParser parser = new TGSqlParser(EDbVendor.dbvoracle);
       parser.setResolverType(EResolverType.RESOLVER2);  // Use new resolver
       parser.sqltext = "SELECT * FROM employees";
       parser.parse();
      
       // Access resolver2 results
       TSQLResolver2 resolver = parser.getResolver2();
       
      Parameters:
      resolverType - the resolver type to use
      See Also:
    • getResolver2Config

      Get the TSQLResolverConfig used for resolver2. Returns null if not explicitly set (default config will be used during parsing).
      Returns:
      the resolver2 config or null
    • setResolver2Config

      public void setResolver2Config(TSQLResolverConfig config)
      Set the TSQLResolverConfig to use for resolver2.

      This allows customizing resolver2 behavior such as:

      • guessColumnStrategy - how to handle ambiguous columns
      • legacyCompatibilityEnabled - sync results to legacy structures
      • maxIterations - maximum iterations for iterative resolution

      If not set, a default configuration will be created during parsing with the database vendor automatically set.

      Usage Example:

       TGSqlParser parser = new TGSqlParser(EDbVendor.dbvoracle);
       parser.setResolverType(EResolverType.RESOLVER2);
      
       // Configure resolver2 to not pick ambiguous columns
       TSQLResolverConfig config = new TSQLResolverConfig();
       config.setGuessColumnStrategy(TSQLResolverConfig.GUESS_COLUMN_STRATEGY_NOT_PICKUP);
       parser.setResolver2Config(config);
      
       parser.sqltext = "SELECT id FROM users, orders";
       parser.parse();
       
      Parameters:
      config - the resolver2 configuration, or null for default
      See Also:
    • getSyntaxErrors

      The array of syntax error generated by the parser during checking the syntax of the input SQL, element of this list is type of TSyntaxError
      Returns:
      the array of errors
    • getDbVendor

      The database vendor specified when creating this parser. The grammar rule of this database vendor will be used to validate the syntax of the input SQL.
      Returns:
      the database vendor
    • setEnablePartialParsing

      public void setEnablePartialParsing(boolean enablePartialParsing)
      Deprecated.
      As of v1.4.3.4 enable GSP to parse the rest of sql statements inside stored procedure when a SQL statement in the stored procedure cannot be parsed

      Available to parse sybase stored procedure currently.

      Parameters:
      enablePartialParsing - set true to enable this partial parsing, default is false
    • setSinglePLBlock

      public void setSinglePLBlock(boolean singlePLBlock)
    • getUserName

      public static String getUserName()
      Not used.
      Returns:
      the user name
    • getMachineId

      public static String getMachineId()
      Not used.
      Returns:
      the machine id
    • getLicenseMessage

      public static String getLicenseMessage()
      Not used.
      Returns:
      the license message
    • getLicenseType

      public static String getLicenseType()
      Not used.
      Returns:
      trial license or developer license or distribution license
    • getFlexer

      The lexer which is used to tokenize the input SQL. For delegated vendors (MSSQL), lazily creates the vendor parser to get its lexer.
      Returns:
      the lexer
    • getDelimiterChar

      public char getDelimiterChar()
      Returns the delimiter character used to separate SQL statements. Uses the flexer's delimiter if available (lexer-dependent), otherwise falls back to parser's value.
      Returns:
      the delimiter character
    • setSqlStatementHandle

      public void setSqlStatementHandle(ISQLStatementHandle sqlStatementHandle)
    • setTokenHandle

      public void setTokenHandle(ITokenHandle tokenHandle)
      Set an event handler which will be fired when a new source token is created by the lexer during tokenize the input SQL.
      Parameters:
      tokenHandle - the event handler to process the new created source token
    • setTokenListHandle

      public void setTokenListHandle(ITokenListHandle tokenListHandle)
    • setMetaDatabase

      public void setMetaDatabase(IMetaDatabase metaDatabase)
      Deprecated.
      As of v2.0.3.1, please use getSqlEnv() instead set an instance of a class which implement the interface: IMetaDatabase. The parser will call IMetaDatabase.checkColumn(java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) method when it needs to know whether a column is belonged to a table.

      The class that implements the interface: IMetaDatabase usually fetch the metadata from the database by connecting to a database instance.

      If the class is not provided, the parser has to guess the relationship between a un-qualified column and table in the input SQL which may lead to a un-determined result between the column and table.
      Parameters:
      metaDatabase - a new instance of the class which implements the IMetaDatabase interface
      See Also:
    • getMetaDatabase

      Deprecated.
      As of v2.0.3.1, please use getSqlEnv() instead a new instance of the class which implements the IMetaDatabase interface
      Returns:
      a new instance of the class which implements the IMetaDatabase interface
      See Also:
    • freeParseTable

      public void freeParseTable()
      Not used.
    • parseSubquery

      Create a select statement object from the parameter: subquery
      Parameters:
      subquery - a plain text select statement which need to be converted to a TSelectSqlStatement object
      Returns:
      a select statement object, return null if the input select string is syntax invalid.
    • parseSubquery

      public static TSelectSqlStatement parseSubquery(EDbVendor dbVendor, String subquery)
      this method is thread safe. C:\prg\gsp_java\gsp_java_core\src\test\java\gudusoft\gsqlparser\ParseSubqueryThreadSafetyTest.java
    • parseExpression

      Create an expression object from the parameter: expr
      Parameters:
      expr - a plain text expression which will be converted to TExpression object
      Returns:
      an expression object, return null if the input expression string is not syntax valid.
    • parseExpression

      public static TExpression parseExpression(EDbVendor dbVendor, String expr)
    • parseFunctionCall

      public TFunctionCall parseFunctionCall(String newFunction)
      Create a function object from the parameter: newFunction
      Parameters:
      newFunction - a plain text function which will be converted to TFunctionCall object
      Returns:
      a function object, or return null if the input string is not a valid function call.
    • parseFunctionCall

      public static TFunctionCall parseFunctionCall(EDbVendor dbVendor, String newFunction)
    • parseObjectName

      public TObjectName parseObjectName(String newObjectName)
      Create a database objectName from the parameter: newObjectName
      Parameters:
      newObjectName - a plain text objectName which will be converted to TObjectName object
      Returns:
      a database objectName object, return null is the input string is not a valid objectName.
    • parseObjectName

      public static TObjectName parseObjectName(EDbVendor dbVendor, String newObjectName)
      Parameters:
      dbVendor -
      newObjectName -
      Returns:
    • parseConstant

      public TConstant parseConstant(String newConstant)
      Create an constant object from the parameter: newConstant
      Parameters:
      newConstant - a plian text constant which will be converted to TConstant object
      Returns:
      new constant object, returns null if the input is not a valid constant string.
    • parseConstant

      public static TConstant parseConstant(EDbVendor dbVendor, String newConstant)
    • getErrorCount

      public int getErrorCount()
      The total number of syntax errors founded in the input SQL script.
      Only the first syntax error in a SQL statement is counted if there are more than one syntax errors in a single SQL statement.
      In a SQL script, only the first syntax error in each SQL statement is counted.
      Returns:
      The total number of syntax errors founded in the input SQL script. Returns 0 if no syntax error is founded.
    • getErrormessage

      The text of error message generated by iterating all items in getSyntaxErrors(). User may generate error message in their own format by iterating all items in getSyntaxErrors().
      Returns:
      error message
    • checkSyntax

      public int checkSyntax()
      check syntax of the input SQL. This method works exactly the same as parse() method.
      Returns:
      0 means parse SQL script successfully, otherwise, use getErrorCount(), getErrormessage() to get detailed error information.
      See Also:
    • parse

      public int parse()
      Check syntax of the input SQL, doing some kind of semantic analysis without connecting to a real database.

      This method will do a in-depth analysis of the input SQL such as building the link between table and columns. The parse tree of the input SQL is available after calling this method. The parser checks the syntax of those SQL statements one by one. If syntax error is found in a SQL statement, an error will be logged, no parse tree will be built for this SQL statement, the error message can be fetched using the getErrormessage() method.

      The syntax error in one SQL statement doesn't prevent the parser continue to check the syntax of the next SQL statement. After checking syntax of all SQL statements, use the getErrorCount() method to get the total number of errors.

      A syntax error in a SQL stored procedure will cease this parser to check syntax of the rest SQL statements in this stored procedure.
      Returns:
      0 means parse SQL script successfully, otherwise, use getErrorCount(), getErrormessage() to get detailed error information.
      See Also:
    • validate

      public int validate()
    • getLastLineNoOfLastStatementBeenValidated

      Returns the 1-based line number of the end of the last SQL statement that was successfully recognized during raw-statement separation (e.g. via vendor-specific do*getrawsqlstatements routines).

      The value corresponds to the ending line of the last validated statement in the most recent parse operation. Any trailing, incomplete statement at the end of the input is intentionally excluded and will not affect this value.

      Notes:

      • The line number is relative to the current input provided to the parser (not an absolute position in an external, larger file).
      • This is useful when splitting huge SQL files by safe statement boundaries — callers can cut the source at this line without risking a partial statement.
      Returns:
      the 1-based line number of the last validated statement's ending line, or -1 if no statement has been validated yet
    • getrawsqlstatements

      public int getrawsqlstatements()
      separates the SQL statements in the input SQL script without doing syntax check.

      Use the getSqlstatements() method to get the list of SQL statements. The SQL statement object is the instance of the sub-class of TCustomSqlStatement, get SQL statement type via the TCustomSqlStatement.sqlstatementtype field, get string representation of each SQL statement via the TParseTreeNode.toString() method.

      All source tokens in this SQL statement is available by using TCustomSqlStatement.sourcetokenlist filed. Since no parse tree is built by calling this method, no further detailed information about the SQL statement is available.
      Returns:
      0 if get SQL statements successfully
    • tokenizeSqltext

      public void tokenizeSqltext()
      turns the input SQL into a sequence of token which is the basic lexis element of SQL syntax. Token is categorized as keyword, identifier, number, operator, whitespace and other types. All source tokens can be fetched via the getSourcetokenlist() method.
    • getSqlEnv

      public TSQLEnv getSqlEnv()
      SQL environment includes the database metadata such as procedure, function, trigger, table and etc. In order to link column to table correctly without connecting to database, we need to provide a class which implements TSQLEnv to TGSqlParser. this class tells TGSqlParser the relationship between column and table.

      Take this SQL for example:

       SELECT Quantity,b.Time,c.Description
       FROM
       (SELECT ID2,Time FROM bTab) b
       INNER JOIN aTab a on a.ID=b.ID
       INNER JOIN cTab c on a.ID=c.ID
       

      General SQL Parser can build relationship between column: ID2 and table: bTable correctly without metadata information from database because there is only one table in from clause. But it can't judge column: Quantity belong to table: aTab or cTab, since no table alias was prefixed to column: Quantity. If no metadata provided, General SQL Parser will link column: Quantity to the first valid table (here it is aTab)

      If we create a class TRealDatabaseSQLEnv implements TSQLEnv,then setSqlEnv(TSQLEnv), General SQL Parser can take this advantage to create a correct relationship between column and tables.

       class TSQLServerEnv extends TSQLEnv{
      
              public TSQLServerEnv(){
                      super(EDbVendor.dbvmssql);
                      initSQLEnv();
              }
      
          @Override
          public void initSQLEnv() {
      
                      // add a new database: master
                      TSQLCatalog sqlCatalog = createSQLCatalog("master");
                      // add a new schema: dbo
                      TSQLSchema sqlSchema = sqlCatalog.createSchema("dbo");
                      //add a new table: aTab
                      TSQLTable aTab = sqlSchema.createTable("aTab");
                      aTab.addColumn("Quantity1");
      
                      //add a new table: bTab
                      TSQLTable bTab = sqlSchema.createTable("bTab");
                      bTab.addColumn("Quantity2");
      
                      //add a new table: cTab
                      TSQLTable cTab = sqlSchema.createTable("cTab");
                      cTab.addColumn("Quantity");
      
          }
       }
       
      Returns:
      SQL environment
    • setOnlyNeedRawParseTree

      public void setOnlyNeedRawParseTree(boolean onlyNeedRawParseTree)
    • setSqlEnv

      public void setSqlEnv(TSQLEnv sqlEnv)
    • setEnableTimeLogging

      public void setEnableTimeLogging(boolean enable)
      Enable or disable time logging for parser steps
      Parameters:
      enable - true to enable time logging, false to disable
    • isTimeLoggingEnabled

      public boolean isTimeLoggingEnabled()
      Check if time logging is enabled
      Returns:
      true if time logging is enabled, false otherwise
    • resetTimeCounters

      public void resetTimeCounters()
      Reset all accumulated time counters to zero
    • getRawSqlStatementsTime

      public long getRawSqlStatementsTime()
      Get accumulated time spent getting raw SQL statements in milliseconds
      Returns:
      time in milliseconds
    • getParsingTime

      public long getParsingTime()
      Get accumulated time spent parsing in milliseconds
      Returns:
      time in milliseconds
    • getSemanticAnalysisTime

      public long getSemanticAnalysisTime()
      Get accumulated time spent on semantic analysis in milliseconds
      Returns:
      time in milliseconds
    • getInterpreterTime

      public long getInterpreterTime()
      Get accumulated time spent in interpreter in milliseconds
      Returns:
      time in milliseconds
    • getTotalTime

      public long getTotalTime()
      Get total accumulated time spent in all steps
      Returns:
      total time in milliseconds
    • prepareForReuse

      public void prepareForReuse()
      Prepare this parser for reuse by clearing cached state. This is useful when reusing a TGSqlParser instance (e.g., in TExecImmeStmt) to avoid state pollution between parsing operations. Clears: vendorParser, sqlEnv, sqlfilename, and resets parsing options.
      Since:
      3.2.0.0
    • getTimeStatistics

      Returns time statistics as a formatted string
      Returns:
      formatted string with time statistics
    • setTeradataUtilityType

      public void setTeradataUtilityType(TeradataUtilityType teradataUtilityType)
    • dispose

      public void dispose()
      Dispose of parser resources and clean up references.

      This method releases internal resources used by the parser. After calling this method, the parser should not be used further.

      If a ManagedSourceBuffer was set, the source text remains in the buffer so that tokens can still access it. The user is responsible for calling ManagedSourceBuffer#release() when all tokens are no longer needed.

      Note: Tokens created by this parser will remain usable after dispose() if a ManagedSourceBuffer was used, as they reference the buffer by ID rather than holding direct parser references.

      Since:
      3.1.0.9