Class IdentifierRules

Object
gudusoft.gsqlparser.sqlenv.IdentifierRules

public final class IdentifierRules extends Object
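Identifier rule quadruple (per vendor, per object group)

Defines a database vendor's identifier case rules, distinguishing how quoted and unquoted identifiers are handled.

Design source: dbobject_search.md, the senior designer's proposal

Usage example:

 // Oracle: unquoted identifiers fold to upper case and compare case-insensitively;
 // quoted identifiers keep their case and compare case-sensitively
 IdentifierRules oracleRules = IdentifierRules.forOracle();

 // ClickHouse: every identifier is case-sensitive
 IdentifierRules clickhouseRules = IdentifierRules.forClickHouse();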
 
Since:
3.1.0.9
  • Method Details

    • forOracle

      public static IdentifierRules forOracle()
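      Oracle identifier rules

      Actual database behavior (Oracle 12c+):

      • Unquoted: folded to upper case, compared case-insensitively (CREATE TABLE foo → stored as FOO, foo=FOO=Foo)
      • Quoted: preserved as written, compared case-sensitively (CREATE TABLE "foo" → stored as foo, "foo"!="FOO")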
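
The two bullets above can be sketched as a small normalization routine. This is only an illustrative model of the described behavior, not the IdentifierRules implementation: the class OracleFoldSketch and its methods normalize and sameObject are hypothetical names, and doubled quotes inside a quoted identifier are not handled.

```java
import java.util.Locale;

// Hypothetical sketch: models Oracle-style folding as described above.
// It is NOT part of gudusoft.gsqlparser.
final class OracleFoldSketch {

    // Returns the name in the form the catalog would store it.
    static String normalize(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"");
        if (quoted) {
            // Quoted: strip the delimiters and keep the case as written.
            return identifier.substring(1, identifier.length() - 1);
        }
        // Unquoted: fold to upper case.
        return identifier.toUpperCase(Locale.ROOT);
    }

    // Two references name the same object when their stored forms are equal.
    static boolean sameObject(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(normalize("foo"));                  // FOO
        System.out.println(normalize("\"foo\""));              // foo
        System.out.println(sameObject("foo", "Foo"));          // true
        System.out.println(sameObject("\"foo\"", "\"FOO\""));  // false
    }
}
```

Comparing stored forms rather than raw text is what makes foo, FOO, and Foo equivalent while "foo" and "FOO" stay distinct.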

      Compatibility with legacy TSQLEnv:

      • columnCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to UPPER)
      • functionCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to UPPER)
      • tableCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to UPPER)
      • catalogCollationCaseSensitive = false → ✅ COMPATIBLE (both fold to UPPER)

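      Test-case impact:

      • ✅ The new rule is correct: Oracle does fold unquoted identifiers to upper case
      • ⚠️ Old tests that expect the original case to be preserved (e.g. "myTable" stays "myTable") will fail
      • ⚠️ Update those expectations to upper case (e.g. "myTable" → "MYTABLE")

      IdentifierRules configuration: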

    • forPostgreSQL

      public static IdentifierRules forPostgreSQL()
      PostgreSQL / Redshift / Greenplum identifier rules

      Actual database behavior (PostgreSQL 12+):

      • Unquoted: folded to lower case, compared case-insensitively (CREATE TABLE MyTable → stored as mytable, MyTable=mytable=MYTABLE)
      • Quoted: preserved as written, compared case-sensitively (CREATE TABLE "MyTable" → stored as MyTable, "MyTable"!="mytable")

      Compatibility with legacy TSQLEnv:

      • columnCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • functionCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to LOWER)
      • tableCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to LOWER)
      • catalogCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to LOWER)

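      Test-case impact:

      • ✅ The new rule is correct: PostgreSQL does fold unquoted identifiers to lower case
      • ⚠️ Old tests that expect upper case (e.g. "MyTable" → "MYTABLE") will fail
      • ⚠️ Update those expectations to lower case (e.g. "MyTable" → "mytable")
      • ⚠️ Old tests that expect the original case to be preserved will also fail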
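
To check which old expectations survive, the lower-case folding described above can be modeled in a few lines. This is a hedged sketch, not library code: PostgresExpectationSketch, newFold, and expectationSurvives are hypothetical names, and only unquoted identifiers are considered.

```java
import java.util.Locale;

// Hypothetical sketch: checks old test expectations against lower-case folding.
// It is NOT part of gudusoft.gsqlparser.
final class PostgresExpectationSketch {

    // New behavior for unquoted identifiers: fold to lower case.
    static String newFold(String unquoted) {
        return unquoted.toLowerCase(Locale.ROOT);
    }

    // True when an expectation written for the legacy rules still holds.
    static boolean expectationSurvives(String input, String oldExpected) {
        return newFold(input).equals(oldExpected);
    }

    public static void main(String[] args) {
        System.out.println(expectationSurvives("MyTable", "MYTABLE")); // false
        System.out.println(expectationSurvives("MyTable", "MyTable")); // false
        System.out.println(expectationSurvives("MyTable", "mytable")); // true
    }
}
```

Only expectations that were already lower case pass unchanged; both the upper-case and the preserved-case expectations need updating.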

      IdentifierRules configuration:

    • forClickHouse

      public static IdentifierRules forClickHouse()
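      ClickHouse identifier rules

      Actual database behavior (ClickHouse 20+):

      • Unquoted: preserved as written, compared case-sensitively (CREATE TABLE MyTable → stored as MyTable, MyTable!=mytable)
      • Quoted: preserved as written, compared case-sensitively (CREATE TABLE `MyTable` → stored as MyTable, MyTable!=mytable)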

      Compatibility with legacy TSQLEnv:

      • columnCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)
      • functionCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)
      • tableCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)
      • catalogCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)

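      Test-case impact:

      • ✅ The new rule is correct: ClickHouse is a fully case-sensitive database
      • ⚠️ Old tests that expect "MyTable" → "MYTABLE" will fail
      • ⚠️ Update those expectations to preserve the original case (e.g. "MyTable" stays "MyTable")
      • ⚠️ Old code may have wrongly matched identifiers that differ only in case (foo matching FOO); the new code correctly rejects them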
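
The last bullet can be illustrated with a toy catalog that keys objects either by an upper-cased name (the legacy behavior described above) or by the exact spelling (the new behavior). This is a minimal sketch under that assumption; CaseSensitiveCatalogSketch and its members are hypothetical and do not mirror the library's internals.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: contrasts upper-case folding with exact-case lookup.
// It is NOT part of gudusoft.gsqlparser.
final class CaseSensitiveCatalogSketch {
    private final Set<String> tables = new HashSet<>();
    private final boolean foldToUpper;

    CaseSensitiveCatalogSketch(boolean foldToUpper) {
        this.foldToUpper = foldToUpper;
    }

    // Legacy mode folds the key to upper case; strict mode keeps it as written.
    private String key(String name) {
        return foldToUpper ? name.toUpperCase(Locale.ROOT) : name;
    }

    void register(String name) {
        tables.add(key(name));
    }

    boolean resolves(String reference) {
        return tables.contains(key(reference));
    }

    public static void main(String[] args) {
        CaseSensitiveCatalogSketch legacy = new CaseSensitiveCatalogSketch(true);
        legacy.register("foo");
        System.out.println(legacy.resolves("FOO")); // true: the false match

        CaseSensitiveCatalogSketch strict = new CaseSensitiveCatalogSketch(false);
        strict.register("foo");
        System.out.println(strict.resolves("FOO")); // false
        System.out.println(strict.resolves("foo")); // true
    }
}
```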

      IdentifierRules configuration:

    • forCouchbase

      public static IdentifierRules forCouchbase()
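      Couchbase N1QL identifier rules

      Actual database behavior (Couchbase N1QL):

      • Unquoted: preserved as written, compared case-sensitively (same as ClickHouse)
      • Quoted: preserved as written, compared case-sensitively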

      Compatibility with legacy TSQLEnv:

      • tableCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)
      • See forClickHouse() for details
    • forSQLServer

      public static IdentifierRules forSQLServer()
      SQL Server / Azure SQL identifier rules

      Actual database behavior (SQL Server 2019+):

      • Unquoted: preserved as written, comparison determined by the collation (CREATE TABLE MyTable → stored as MyTable)
      • Quoted: preserved as written, comparison determined by the collation (CREATE TABLE [MyTable] → stored as MyTable)
      • Default collation (SQL_Latin1_General_CP1_CI_AS): case-insensitive (MyTable=mytable=MYTABLE)
      • CS collation (SQL_Latin1_General_CP1_CS_AS): case-sensitive (MyTable!=mytable)
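
A rough way to picture "preserve the spelling, let the collation decide the comparison" is a sorted map whose comparator stands in for the collation. This is a simplified sketch: CollationLookupSketch is a hypothetical class, and a real collation also covers accent sensitivity and locale rules, which String.CASE_INSENSITIVE_ORDER does not model.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical sketch: a comparator plays the role of a CI or CS collation.
// It is NOT part of gudusoft.gsqlparser.
final class CollationLookupSketch {
    // Maps a name (compared by the collation) to the spelling used at creation.
    private final TreeMap<String, String> objects;

    CollationLookupSketch(boolean caseSensitiveCollation) {
        Comparator<String> collation = caseSensitiveCollation
                ? Comparator.<String>naturalOrder()
                : String.CASE_INSENSITIVE_ORDER;
        this.objects = new TreeMap<>(collation);
    }

    // Stores the name exactly as written; no folding takes place.
    void create(String name) {
        objects.putIfAbsent(name, name);
    }

    // Returns the stored spelling, or null when the collation finds no match.
    String lookup(String reference) {
        return objects.get(reference);
    }

    public static void main(String[] args) {
        CollationLookupSketch ci = new CollationLookupSketch(false);
        ci.create("MyTable");
        System.out.println(ci.lookup("MYTABLE")); // MyTable

        CollationLookupSketch cs = new CollationLookupSketch(true);
        cs.create("MyTable");
        System.out.println(cs.lookup("mytable")); // null
    }
}
```

Keeping the stored spelling separate from the comparison rule is why a lookup returns MyTable rather than a folded form under either collation.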

      Compatibility with legacy TSQLEnv:

      • columnCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)
      • functionCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)
      • tableCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)
      • catalogCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case)

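      Test-case impact:

      • ✅ The new rule is correct: SQL Server preserves the original case of identifiers and compares them using the collation
      • ⚠️ Old tests that expect "MyTable" → "MYTABLE" will fail
      • ⚠️ Update those expectations to preserve the original case (e.g. "MyTable" stays "MyTable")
      • ⚠️ This is the root cause of the processId changes in the dataflow tests!
      • 📝 Reference: investigation_findings_2025_10_20.md

      IdentifierRules configuration:

      Note: SQL Server's case behavior is determined entirely by the collation, so identifiers cannot simply be folded.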

    • forMySQL

      public static IdentifierRules forMySQL(int lowerCaseTableNames)
      MySQL identifier rules (table/schema names)

      Actual database behavior (MySQL 8.0+):

      Determined by the lower_case_table_names system variable:

      • 0 (Unix/Linux): case-sensitive, preserved as written (CREATE TABLE MyTable → stored as MyTable, MyTable!=mytable)
      • 1 (Windows): stored in lower case, compared case-insensitively (CREATE TABLE MyTable → stored as mytable, MyTable=mytable=MYTABLE)
      • 2 (macOS): stored as written, compared case-insensitively (CREATE TABLE MyTable → stored as MyTable, MyTable=mytable=MYTABLE)
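
The three modes above can be written out as a pair of functions, one for the stored form and one for the comparison. This is an illustrative sketch of the table, not what forMySQL(int) does internally; MySqlTableNameSketch, storedForm, and matches are hypothetical names.

```java
import java.util.Locale;

// Hypothetical sketch: models the three lower_case_table_names modes.
// It is NOT part of gudusoft.gsqlparser.
final class MySqlTableNameSketch {

    // How a table name is stored under each mode.
    static String storedForm(String name, int lowerCaseTableNames) {
        switch (lowerCaseTableNames) {
            case 0:  // stored as written
            case 2:  // stored as written, compared case-insensitively
                return name;
            case 1:  // stored in lower case
                return name.toLowerCase(Locale.ROOT);
            default:
                throw new IllegalArgumentException(
                        "lower_case_table_names must be 0, 1, or 2: " + lowerCaseTableNames);
        }
    }

    // Whether two table names refer to the same table under each mode.
    static boolean matches(String a, String b, int lowerCaseTableNames) {
        if (lowerCaseTableNames < 0 || lowerCaseTableNames > 2) {
            throw new IllegalArgumentException(
                    "lower_case_table_names must be 0, 1, or 2: " + lowerCaseTableNames);
        }
        // Only mode 0 compares case-sensitively.
        return lowerCaseTableNames == 0 ? a.equals(b) : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(storedForm("MyTable", 1));         // mytable
        System.out.println(storedForm("MyTable", 2));         // MyTable
        System.out.println(matches("MyTable", "mytable", 0)); // false
        System.out.println(matches("MyTable", "mytable", 2)); // true
    }
}
```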

      Compatibility with legacy TSQLEnv:

      • columnCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new doesn't fold or folds to LOWER)
      • functionCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new doesn't fold)
      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new behavior depends on lower_case_table_names)
      • catalogCollationCaseSensitive = true → ⚠️ PARTIAL (legacy preserved case, new is insensitive)

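      Test-case impact:

      • ✅ The new rule is correct: MySQL behavior does depend on the lower_case_table_names setting
      • ⚠️ Old tests that expect "MyTable" → "MYTABLE" will fail in modes 1 and 2
      • ⚠️ Mode 0: expect the original case to be preserved (e.g. "MyTable" stays "MyTable"), case-sensitive
      • ⚠️ Mode 1: expect lower case (e.g. "MyTable" → "mytable"), case-insensitive
      • ⚠️ Mode 2: expect the original case to be preserved (e.g. "MyTable" stays "MyTable"), case-insensitive

      IdentifierRules configuration:

      Parameters:
      lowerCaseTableNames - the lower_case_table_names value (0, 1, or 2)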
    • forMySQLColumn

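      MySQL column-name rules (always case-insensitive)

      Actual database behavior (MySQL 8.0+):

      • Column names are always case-insensitive and are not affected by lower_case_table_names
      • Stored as written, compared case-insensitively (SELECT MyColumn → stored as MyColumn, MyColumn=mycolumn)

      Compatibility with legacy TSQLEnv: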

      • columnCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new preserves case but is INSENSITIVE)
    • forMySQLRoutine

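      MySQL function-name rules (always case-insensitive)

      Actual database behavior (MySQL 8.0+):

      • Function and stored-procedure names are always case-insensitive
      • Stored as written, compared case-insensitively (same as column names)

      Compatibility with legacy TSQLEnv: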

      • functionCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new preserves case but is INSENSITIVE)
    • forBigQueryTable

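      BigQuery table-name rules (case-sensitive)

      Actual database behavior (BigQuery Standard SQL):

      • Unquoted: preserved as written, compared case-sensitively (CREATE TABLE MyTable → stored as MyTable, MyTable!=mytable)
      • Quoted: preserved as written, compared case-sensitively (CREATE TABLE `MyTable` → stored as MyTable, MyTable!=mytable)
      • Table, dataset, and project names are all case-sensitive

      Compatibility with legacy TSQLEnv: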

      • tableCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case and is SENSITIVE)
      • catalogCollationCaseSensitive = false → ❌ INCOMPATIBLE (legacy folded to UPPER, new preserves case and is SENSITIVE)

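      Test-case impact:

      • ✅ The new rule is correct: BigQuery table names are indeed case-sensitive
      • ⚠️ Old tests that expect "MyTable" → "MYTABLE" will fail
      • ⚠️ Update those expectations to preserve the original case (e.g. "MyTable" stays "MyTable")
      • ⚠️ Old code may have wrongly matched table names that differ only in case; the new code correctly rejects them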
    • forBigQueryColumn

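      BigQuery column-name rules (case-insensitive)

      Actual database behavior (BigQuery Standard SQL):

      • Unquoted: preserved as written, compared case-insensitively (SELECT MyColumn → stored as MyColumn, MyColumn=mycolumn=MYCOLUMN)
      • Quoted: preserved as written, compared case-insensitively (SELECT `MyColumn` → MyColumn=mycolumn)
      • Column names are case-insensitive (unlike table names)

      Compatibility with legacy TSQLEnv: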

      • columnCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new preserves case but is INSENSITIVE)

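      Test-case impact:

      • ✅ The new rule is correct: BigQuery column names are indeed case-insensitive
      • ⚠️ Old tests that expect "MyColumn" → "MYCOLUMN" will fail
      • ⚠️ Update those expectations to preserve the original case (e.g. "MyColumn" stays "MyColumn")
      • ✅ Comparisons should still ignore case (MyColumn = mycolumn = MYCOLUMN)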
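
BigQuery shows why rules are kept per object group: the same vendor compares table names and column names differently. The sketch below only models that contrast; BigQueryGroupSketch and its ObjectGroup enum are hypothetical and are not the library's types.

```java
// Hypothetical sketch: one vendor, two object groups, two comparison rules.
// It is NOT part of gudusoft.gsqlparser.
final class BigQueryGroupSketch {

    enum ObjectGroup { TABLE, COLUMN }

    // Table names compare exactly; column names ignore case.
    static boolean matches(ObjectGroup group, String a, String b) {
        return group == ObjectGroup.TABLE ? a.equals(b) : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(matches(ObjectGroup.TABLE, "MyTable", "mytable"));    // false
        System.out.println(matches(ObjectGroup.COLUMN, "MyColumn", "mycolumn")); // true
    }
}
```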
    • forDB2

      public static IdentifierRules forDB2()
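      DB2 / Netezza / Exasol identifier rules (same as Oracle)

      Actual database behavior (DB2 11+):

      • Unquoted: folded to upper case, compared case-insensitively (same as Oracle)
      • Quoted: preserved as written, compared case-sensitively

      Compatibility with legacy TSQLEnv: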

      • DB2 tableCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to UPPER)
      • DB2 catalogCollationCaseSensitive = true → ❌ INCOMPATIBLE (legacy preserved case, new folds to UPPER)
      • See forOracle() for details
    • forSnowflake

      public static IdentifierRules forSnowflake()
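      Snowflake identifier rules (same as Oracle)

      Actual database behavior (Snowflake):

      • Unquoted: folded to upper case, compared case-insensitively (same as Oracle)
      • Quoted: preserved as written, compared case-sensitively

      Compatibility with legacy TSQLEnv: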

      • tableCollationCaseSensitive = false → ✅ COMPATIBLE (both fold to UPPER)
      • columnCollationCaseSensitive = false → ✅ COMPATIBLE (both fold to UPPER)
      • See forOracle() for details
    • forHANA

      public static IdentifierRules forHANA()
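      SAP HANA identifier rules (same as Oracle)

      Actual database behavior (SAP HANA):

      • Unquoted: folded to upper case, compared case-insensitively (same as Oracle)
      • Quoted: preserved as written, compared case-sensitively

      Compatibility with legacy TSQLEnv: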

      • tableCollationCaseSensitive = false → ✅ COMPATIBLE (both fold to UPPER)
      • See forOracle() for details
    • forPresto

      public static IdentifierRules forPresto()
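      Presto / Trino identifier rules

      Actual database behavior (Presto/Trino):

      • Unquoted: folded to lower case, compared case-insensitively (CREATE TABLE MyTable → stored as mytable)
      • Quoted: preserved as written, but follows the same rule as unquoted identifiers (comparison is still case-insensitive)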
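
The notable point above is that quoting does not make a name case-sensitive. A minimal sketch of that comparison follows, assuming double-quote delimiters; PrestoMatchSketch, unquote, and matches are hypothetical names rather than library API.

```java
// Hypothetical sketch: quoted and unquoted names compare the same way.
// It is NOT part of gudusoft.gsqlparser.
final class PrestoMatchSketch {

    // Removes one pair of surrounding double quotes, if present.
    static String unquote(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"");
        return quoted ? identifier.substring(1, identifier.length() - 1) : identifier;
    }

    // Case is ignored whether or not either side was quoted.
    static boolean matches(String a, String b) {
        return unquote(a).equalsIgnoreCase(unquote(b));
    }

    public static void main(String[] args) {
        System.out.println(matches("\"MyTable\"", "mytable")); // true
        System.out.println(matches("MyTable", "MYTABLE"));     // true
        System.out.println(matches("\"MyTable\"", "Other"));   // false
    }
}
```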

      Compatibility with legacy TSQLEnv:

      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • columnCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)

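      Test-case impact:

      • ✅ The new rule is correct: Presto/Trino folds to lower case, and quoted identifiers follow the same rule as unquoted ones
      • ⚠️ Old tests that expect "MyTable" → "MYTABLE" will fail
      • ⚠️ Update those expectations to lower case (e.g. "MyTable" → "mytable")

      IdentifierRules configuration: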

    • forVertica

      public static IdentifierRules forVertica()
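      Vertica identifier rules (same as Presto)

      Actual database behavior (Vertica):

      • Unquoted: folded to lower case, compared case-insensitively (same as Presto)
      • Quoted: preserved as written, but follows the same rule as unquoted identifiers

      Compatibility with legacy TSQLEnv: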

      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • See forPresto() for details
    • forHive

      public static IdentifierRules forHive()
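      Hive / SparkSQL / Impala identifier rules (same as PostgreSQL)

      Actual database behavior (Hive 3+, SparkSQL 3+):

      • Unquoted: folded to lower case, compared case-insensitively (same as PostgreSQL)
      • Quoted: preserved as written, compared case-sensitively

      Compatibility with legacy TSQLEnv: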

      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • See forPostgreSQL() for details
    • forTeradata

      public static IdentifierRules forTeradata()
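      Teradata identifier rules (same as PostgreSQL)

      Actual database behavior (Teradata 16+):

      • Unquoted: folded to lower case, compared case-insensitively (same as PostgreSQL)
      • Quoted: preserved as written, compared case-sensitively

      Compatibility with legacy TSQLEnv: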

      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • See forPostgreSQL() for details
    • forAthena

      public static IdentifierRules forAthena()
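      Athena identifier rules (same as Presto)

      Actual database behavior (AWS Athena):

      • Unquoted: folded to lower case, compared case-insensitively (same as Presto; Athena is based on Trino/Presto)
      • Quoted: preserved as written, but follows the same rule as unquoted identifiers

      Compatibility with legacy TSQLEnv: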

      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • See forPresto() for details
    • forGaussDB

      public static IdentifierRules forGaussDB()
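      GaussDB identifier rules (same as PostgreSQL)

      Actual database behavior (Huawei GaussDB):

      • Unquoted: folded to lower case, compared case-insensitively (same as PostgreSQL, on which GaussDB is based)
      • Quoted: preserved as written, compared case-sensitively

      Compatibility with legacy TSQLEnv: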

      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • See forPostgreSQL() for details
    • forDatabricks

      public static IdentifierRules forDatabricks()
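      Databricks identifier rules (same as Hive)

      Actual database behavior (Databricks SQL):

      • Unquoted: folded to lower case, compared case-insensitively (same as Hive/SparkSQL)
      • Quoted: preserved as written, compared case-sensitively

      Compatibility with legacy TSQLEnv: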

      • tableCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • See forHive() and forPostgreSQL() for details
    • forGeneric

      public static IdentifierRules forGeneric()
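      Generic rules (default: same as PostgreSQL)

      Notes:

      • Used when the database type is unknown or not in the supported list
      • Defaults to PostgreSQL behavior (fold to lower case, compare case-insensitively)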
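
The fallback described above amounts to a lookup with a lower-case default. The sketch below shows that shape only: GenericFallbackSketch, its Fold enum, and the vendor strings are hypothetical, and the vendor-to-fold pairs are taken from the behavior listed on this page rather than from the library's dispatch code.

```java
import java.util.Locale;

// Hypothetical sketch: chooses a folding mode, defaulting to lower case.
// It is NOT part of gudusoft.gsqlparser.
final class GenericFallbackSketch {

    enum Fold { UPPER, LOWER, NONE }

    // Vendor strings here are illustrative labels, not library constants.
    static Fold foldFor(String vendor) {
        if (vendor == null) {
            return Fold.LOWER; // unknown database type: generic default
        }
        switch (vendor.toLowerCase(Locale.ROOT)) {
            case "oracle":
            case "db2":
            case "snowflake":
                return Fold.UPPER;
            case "clickhouse":
            case "couchbase":
                return Fold.NONE;
            default:
                // Unknown or unsupported vendors get PostgreSQL-style folding.
                return Fold.LOWER;
        }
    }

    public static void main(String[] args) {
        System.out.println(foldFor("Oracle"));     // UPPER
        System.out.println(foldFor("ClickHouse")); // NONE
        System.out.println(foldFor("somedb"));     // LOWER
    }
}
```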

      Compatibility with legacy TSQLEnv:

      • defaultCollationCaseSensitive = false → ⚠️ PARTIAL (legacy folded to UPPER, new folds to LOWER)
      • See forPostgreSQL() for details
    • toString

      public String toString()
      Overrides:
      toString in class Object