
PartiQL Transpiler

Scribe is a compiler framework for the PartiQL SQL dialect. It is considered experimental and is under active development.

Local Build

This project uses a git submodule to pull in partiql-lang-kotlin. The easiest way to pull everything in is to clone the repository recursively:

git clone --recursive https://github.com/partiql/partiql-scribe.git
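
If you have already cloned without the --recursive flag, you can fetch the submodule afterward with standard git commands:

git submodule update --init --recursive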

Terms

  • SQL — Specifically the SQL-99 Data Query Language specification, colloquially select-from-where

  • Dialect — An implementation of the SQL language specification

  • Target — A representation of some computation; the output form to which Scribe compiles a query plan (e.g., Trino SQL text)

  • Catalog — Schemas, tables, types, functions, and operators available in a target

Usage

Scribe leverages partiql-lang-kotlin’s SPI and planner to produce a resolved and typed logical query plan. This plan is passed to a target implementation to be transformed into the domain-specific output. See the partiql-lang-kotlin v1 docs on how to use the new SPI and planner interfaces.

Note
Much of the transpiler involves manipulating both the AST and the Plan, which are PartiQL intermediate representations. The PartiQL AST documentation has tips on working with these structures.
Creating the Transpiler
// Scribe provides an interface to handle additional configuration such as error handling
val scribeContext = MyScribeContext()

// Instantiate the transpiler once. It can be re-used!
val transpiler = Scribe(scribeContext = scribeContext)
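
For instance, a context might simply fail fast on any reported problem. The sketch below is hypothetical: only getProblemListener() is assumed here (it is the accessor used in the walkthrough below), and ScribeProblemListener stands in for whatever listener type the interface actually declares.

// A hypothetical fail-fast ScribeContext. ScribeProblemListener is a
// stand-in name; consult the ScribeContext interface for the real type.
class MyScribeContext : ScribeContext {
    private val listener = object : ScribeProblemListener {
        override fun report(problem: ScribeProblem) {
            error("Scribe problem: $problem") // abort on the first problem
        }
    }
    override fun getProblemListener(): ScribeProblemListener = listener
}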

Suppose you have some table:

CREATE TABLE orders (
    order_id   STRING PRIMARY KEY, -- PartiQL STRING type
    ordered_at TIMESTAMP NOT NULL   -- PartiQL TIMESTAMP type
);

How do you express this query for a different SQL engine like Trino?

// Get all orders in the last 30 days
val query = """
    SELECT order_id FROM orders
    WHERE ordered_at > date_diff(day, -30, UTCNOW())
"""

// Initialize error handling listeners for parsing and planning
val problemCollector = PErrorCollector()
val partiqlContext = Context.of(problemCollector)

// PartiQL TEXT -> AST (parser per partiql-lang-kotlin v1, e.g. PartiQLParser.standard())
val parserResult = parser.parse(query, partiqlContext)
val parsedStatements = parserResult.statements

if (parsedStatements.size != 1) {
    scribeContext.getProblemListener().report(
        ScribeProblem.simpleError(
            code = ScribeProblem.UNSUPPORTED_OPERATION,
            "Encountered error(s) during parsing: ${problemCollector.errors}",
        ),
    )
}
val ast = parsedStatements[0]

// PartiQL's SPI `Session` and `Catalog` are how you provide tables, schemas, and functions to the planner (à la Trino).
val session = MyPartiQLSession()

// AST -> PLAN (planner per partiql-lang-kotlin v1, e.g. PartiQLPlanner.standard())
val plannerResult = planner.plan(ast, session, partiqlContext)
val plan = plannerResult.plan

// TrinoTarget holds the translation rules from a PartiQL plan to Trino SQL
val target = TrinoTarget() // Extends `SqlTarget`

// Invoke the transpiler
val result = transpiler.compile(plan, target, session)

println(result.output.value)
// Output:
//   SELECT orders.order_id AS order_id FROM orders AS orders
//   WHERE orders.ordered_at > date_add('day', -30, at_timezone(current_timestamp, 'UTC'))
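
Because the transpiler and the plan are both reusable, retargeting the same query is just another compile call. A brief sketch, assuming RedshiftTarget is constructed like TrinoTarget (both are listed under Producing SQL below):

// Reuse the same plan and session with a different target
val redshiftResult = transpiler.compile(plan, RedshiftTarget(), session)
println(redshiftResult.output.value)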

Overview

Scribe is a framework for plugging in different compilation backends. Perhaps this project should be renamed to BYOB (bring your own backend). For now, we only provide SQL source-to-source compilation (hence "transpile"), but you could conceive of several non-SQL targets as well.

Producing SQL

For now, Scribe provides four simple SQL text targets: PartiQL, Redshift, SparkSQL, and Trino. Each dialect is quite similar (hence "dialect"), so much of the base translation from PartiQL’s logical plan to an SQL AST is captured by PlanToAst.

This transforms relational algebra into an SQL AST, much like Calcite’s RelToSqlConverter, though it is currently more limited than Calcite’s.

Many of the differences between dialects come down to scalar functions; each dialect often has functions with similar behavior albeit different names or signatures, as shown in the earlier UTCNOW() example.
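
Conceptually, each such rewrite maps a PartiQL call onto the target dialect's rendering. The standalone Kotlin sketch below illustrates only the idea; it does not use the real SqlCalls API (CallRewriter is a hypothetical stand-in):

// Hypothetical illustration of per-dialect call rewriting; not the SqlCalls API.
fun interface CallRewriter {
    fun rewrite(args: List<String>): String
}

val trinoCalls: Map<String, CallRewriter> = mapOf(
    // PartiQL's UTCNOW() has no direct Trino builtin; compose existing ones.
    "utcnow" to CallRewriter { _ -> "at_timezone(current_timestamp, 'UTC')" },
    // PartiQL's date_diff(part, n, ts) renders as Trino's date_add('part', n, ts),
    // as in the earlier example.
    "date_diff" to CallRewriter { args -> "date_add('${args[0]}', ${args[1]}, ${args[2]})" },
)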

Common Interfaces

The most useful interfaces to implement for an SQL target are:

  • ScribeTarget<T> — Base transpiler target interface

  • SqlTarget — Base ScribeTarget<String> implementation for an SQL dialect target

  • SqlCalls — Ruleset for rewriting scalar function calls and operators

  • PlanToAst — Ruleset for plan to AST conversion

  • RelConverter — Ruleset for Rel plan to AST ExprQuerySet conversion

  • RexConverter — Ruleset for Rex plan to AST Expr conversion

  • AstToSql — Ruleset for AST to SQL conversion

Development

Let’s work through an example of developing our own SQL target, using SQLite. How might we transpile the following query?

SELECT CAST(a AS STRING) FROM T

With basic familiarity with SQLite, we know that STRING is not a valid type name and should be replaced with TEXT. How do we express this in a transpilation target?

Tutorial

Extend SqlTarget
public object SQLiteTarget : SqlTarget() {

    override val target: String = "SQLite"

    // Using SQLite3
    override val version: String = "3"

    // Override the default AstToSql with the SQLiteAstToSql ruleset
    override fun getAstToSql(context: ScribeContext): AstToSql = SQLiteAstToSql(context)

    // No need to rewrite the plan for this example, return as is
    override fun rewrite(plan: Plan, context: ScribeContext) = plan
}
Defining a Dialect
public open class SQLiteAstToSql(context: ScribeContext) : AstToSql(context) {
    /**
     * AstToSql has many open functions which you can extend to override for edge cases.
     */
    override fun visitDataType(node: DataType, tail: SqlBlock): SqlBlock {
        return when (node.code()) {
            DataType.STRING -> tail concat "TEXT" // override "STRING" printing to "TEXT"
            else -> super.visitDataType(node, tail) // use default behavior for other type conversions
        }
    }
}

This rewrites all occurrences of STRING to TEXT in CAST expressions, with the added benefit that the rewrite also applies to any other expression using the DataType AST class, such as the IS <type> operator.

The AstToSql interface is extensible to allow for additional AST to text rewrites of any AST node.
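
Putting it together, compiling a plan for SELECT CAST(a AS STRING) FROM T with the new target is a single call (a sketch reusing the parse/plan pipeline from the Usage walkthrough, against a session whose catalog contains T):

// `transpiler`, `plan`, and `session` built as in the Usage section
val result = transpiler.compile(plan, SQLiteTarget, session)
println(result.output.value)
// The cast should now render as CAST(... AS TEXT); exact column aliasing
// follows the default SqlTarget rendering rules.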

Testing

PartiQL Scribe has a simple testing framework whereby each target asserts its desired output against a shared set of input queries (defined in test/resources/inputs/).

If you wish to add a new test, please add it to one of the .sql files in test/resources/inputs/ and give it a unique name.

Inputs

All tests within a directory are flattened; you may define multiple tests in one file.

-- Tests are named with the macro `--#[my-test-name]`

--#[readme-example-00]
SELECT header FROM readme;

-- be sure to terminate a statement with `;`

--#[readme-example-01]
SELECT x, y, z FROM T
WHERE x BETWEEN y AND z;

Outputs

Similar to inputs, expected test outputs are stored in test/resources/outputs. The default test suite will produce a JUnit test for each expected output. You may implement additional JUnit tests for negative testing.

Please see PartiQLTargetSuite as an example.

Appendix

I. PartiQL Value Schema Language

Testing schemas are described using a modified version of the Avro JSON schema. The changes are that (1) it is Ion rather than JSON and (2) we use the PartiQL type names.

Basic Type Schema Examples
// type name atomic types
"int"

// type list for union types
[ "int", "null" ]

// Collection Type
{
  type: "bag",  // valid values "bag", "list", "sexp"
  items: type
}

// Struct Type
{
  type: "struct",
  fields: [
    {
      name: "foo",
      type: type
    },
    // ....
  ]
}

These schemas will be converted to corresponding `PType`s during catalog construction.
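
For example, the orders table from the Usage section could be described with the constructs above. A sketch (whether a table file describes the row struct or a bag of rows should follow the repository's existing test schemas):

// orders.ion (hypothetical file; see the directory convention below)
{
  type: "bag",
  items: {
    type: "struct",
    fields: [
      { name: "order_id", type: "string" },
      { name: "ordered_at", type: "timestamp" },
    ]
  }
}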

II. PartiQL Session and Catalog Loading

The PartiQL SessionProvider builds a catalog from an in-memory directory tree. It is implemented here.

Note
Directories are nested schemas; files represent table schemas, where the table name is the file name (without .ion).
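
For example, a catalog directory might be laid out like this (hypothetical names):

catalogs/
└── my_catalog/
    ├── orders.ion        -- table "orders" at the catalog root
    └── sales/
        └── returns.ion   -- schema "sales" containing table "returns"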
