Scribe is a compiler framework for the PartiQL SQL dialect. It is considered experimental and is under active development.
This project uses a Git submodule to pull in partiql-lang-kotlin. The easiest way to pull everything in is to clone the repository recursively:
git clone --recursive https://github.com/partiql/partiql-scribe.git
- SQL — Specifically the SQL-99 Data Query Language specification, colloquially select-from-where
- Dialect — An implementation of the SQL language specification
- Target — A representation of some computation
- Catalog — Schemas, tables, types, functions, and operators available in a target
Scribe leverages partiql-lang-kotlin’s SPI and planner to produce a resolved and typed logical query plan. This plan is passed to a target implementation to be transformed into the domain-specific output. See the partiql-lang-kotlin v1 docs for how to use the new SPI and planner interfaces.
Note: Much of the transpiler involves manipulating both the AST and the Plan, which are PartiQL intermediate representations. The PartiQL AST documentation has tips on working with these structures.
// Scribe provides an interface to handle additional configuration such as error handling
val scribeContext = MyScribeContext()
// Instantiate the transpiler once. It can be re-used!
val transpiler = Scribe(scribeContext = scribeContext)
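The MyScribeContext used above is user-defined. Below is a minimal sketch of such a context; the listener type name (ScribeProblemListener) and the exact ScribeContext members are assumptions inferred from how the context is used later in this README, so check the Scribe sources for the real interfaces.
// Hypothetical sketch: a context whose problem listener collects reported problems
// instead of throwing. The interface names and members here are assumptions, not
// the authoritative Scribe API.
class MyScribeContext : ScribeContext {
    val problems = mutableListOf<ScribeProblem>()

    private val listener = object : ScribeProblemListener {
        override fun report(problem: ScribeProblem) {
            problems.add(problem) // collect for later inspection
        }
    }

    override fun getProblemListener(): ScribeProblemListener = listener
}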
Suppose you have some table
CREATE TABLE orders (
order_id STRING PRIMARY KEY, -- PartiQL STRING type
ordered_at TIMESTAMP NOT NULL -- PartiQL TIMESTAMP type
);
How do you express this query for a different SQL engine like Trino?
// Get all orders in the last 30 days
val query = """
SELECT order_id FROM orders
WHERE ordered_at > date_diff(day, -30, UTCNOW())
"""
// Initialize error handling listeners for parsing and planning
val problemCollector = PErrorCollector()
val partiqlContext = Context.of(problemCollector)
// PartiQL text -> AST (assuming partiql-lang-kotlin's standard parser)
val parser = PartiQLParser.standard()
val parserResult = parser.parse(query, partiqlContext)
val parsedStatements = parserResult.statements
if (parsedStatements.size != 1) {
scribeContext.getProblemListener().report(
ScribeProblem.simpleError(
code = ScribeProblem.UNSUPPORTED_OPERATION,
"Encountered error(s) during parsing: ${problemCollector.errors}",
),
)
}
val ast = parsedStatements[0]
// PartiQL's SPI `Session` and `Catalog` are how you provide tables, schemas, and functions to the planner (à la Trino).
val session = MyPartiQLSession()
// AST -> Plan (assuming partiql-lang-kotlin's standard planner)
val planner = PartiQLPlanner.standard()
val plannerResult = planner.plan(ast, session, partiqlContext)
val plan = plannerResult.plan
// TrinoTarget holds the translation rules from a PartiQL plan to Trino SQL
val target = TrinoTarget() // Extends `SqlTarget`
// Invoke the transpiler
val result = transpiler.compile(plan, target, session)
println(result.output.value)
// Output:
// SELECT orders.order_id AS order_id FROM orders AS orders
// WHERE orders.ordered_at > date_add('day', -30, at_timezone(current_timestamp, 'UTC'))
Scribe is a framework for plugging in different compilation backends. Perhaps this project should be renamed BYOB (bring your own backend). For now, we only provide SQL source-to-source compilation (hence "transpile"), but you could conceive of several non-SQL targets such as:
- Apache Beam Transform
For now, Scribe provides four simple SQL text targets: PartiQL, Redshift, SparkSQL, and Trino. Each dialect is quite similar (hence dialect), so much of the base translation from PartiQL’s logical plan to an SQL AST is captured by PlanToAst. This applies a transformation from relational algebra to an SQL AST, much like Calcite’s RelToSqlConverter, although it is currently more limited than Calcite’s.
Many of the differences between dialects come down to scalar functions; each dialect often has functions with similar functionality albeit different names, as shown in the earlier UTCNOW() example.
The most useful interfaces to implement for an SQL target are:
- ScribeTarget<T> — Base transpiler target interface
- SqlTarget — Base ScribeTarget<String> implementation for an SQL dialect target
- SqlCalls — Ruleset for rewriting scalar function calls and operators
- PlanToAst — Ruleset for plan to AST conversion
- RelConverter — Ruleset for Rel plan to AST ExprQuerySet conversion
- RexConverter — Ruleset for Rex plan to AST Expr conversion
- AstToSql — Ruleset for AST to SQL conversion
Let’s work through an example of developing our own SQL target, using SQLite as the target. How might we transpile the following?
SELECT CAST(a AS STRING) FROM T
With basic familiarity of SQLite, we know that STRING is not a valid type name, and we should replace it with TEXT. How do we express this in a transpilation target?
public object SQLiteTarget : SqlTarget() {
override val target: String = "SQLite"
// Using SQLite3
override val version: String = "3"
// Override the default AstToSql with the SQLiteAstToSql ruleset
override fun getAstToSql(context: ScribeContext): AstToSql = SQLiteAstToSql(context)
// No need to rewrite the plan for this example, return as is
override fun rewrite(plan: Plan, context: ScribeContext) = plan
}
public open class SQLiteAstToSql(context: ScribeContext) : AstToSql(context) {
/**
     * AstToSql has many open functions which you can override for edge cases.
*/
override fun visitDataType(node: DataType, tail: SqlBlock): SqlBlock {
return when (node.code()) {
DataType.STRING -> tail concat "TEXT" // override "STRING" printing to "TEXT"
else -> super.visitDataType(node, tail) // use default behavior for other type conversions
}
}
}
This will rewrite all occurrences of STRING to TEXT in CAST expressions, with the added benefit of performing the same rewrite for other expressions that use the DataType AST class, such as the IS <type> operator.
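As a quick check, the new target can be handed to the same compile call used in the earlier walkthrough. The output comment below is only illustrative; exact aliasing and formatting depend on the target's printer.
// Given a plan produced (as before) for: SELECT CAST(a AS STRING) FROM T
val sqliteResult = transpiler.compile(plan, SQLiteTarget, session)
println(sqliteResult.output.value)
// The CAST now prints TEXT, e.g. something resembling:
// SELECT CAST(T.a AS TEXT) FROM T AS T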
The AstToSql interface is extensible to allow additional AST-to-text rewrites of any AST node.
PartiQL Scribe has a simple testing framework whereby each target asserts its desired output against a shared set of input queries (defined in test/resources/inputs/). If you wish to add a new test, please add it to one of the .sql files in test/resources/inputs/ with a unique name. All tests within a directory are flattened; you may define multiple tests in one file.
-- Tests are named with the macro `--#[my-test-name]`
--#[readme-example-00]
SELECT header FROM readme;
-- be sure to terminate a statement with `;`
--#[readme-example-01]
SELECT x, y, z FROM T
WHERE x BETWEEN y AND z;
Similar to inputs, you’ll see that expected test outputs are stored in test/resources/outputs. The default test suite will produce a JUnit test for each expected output. You may implement additional JUnit tests for negative testing; please see PartiQLTargetSuite as an example.
Testing schemas are described using a modified version of the Avro JSON schema. The changes are (1) it’s Ion and (2) we use the PartiQL type names.
// type name for atomic types
"int"
// type list for union types
[ "int", "null" ]
// Collection Type
{
type: "bag", // valid values "bag", "list", "sexp"
items: type
}
// Struct Type
{
type: "struct",
fields: [
{
name: "foo",
type: type
},
// ....
]
}
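For instance, the orders table from the earlier example could be described by a file along these lines (a sketch; the lowercase type names "string" and "timestamp" are assumed to be the PartiQL names expected by the catalog loader):
// Hypothetical orders.ion combining the pieces above
{
  type: "bag",
  items: {
    type: "struct",
    fields: [
      { name: "order_id",   type: "string" },
      { name: "ordered_at", type: "timestamp" }
    ]
  }
}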
These schemas are converted to the corresponding `PType`s during catalog construction.
The PartiQL SessionProvider builds a catalog from an in-memory directory tree. It is implemented here.
Note: Directories are nested schemas; files represent table schemas, where the table name is the file name (without .ion).
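For example, a hypothetical layout such as the following would expose a schema named default containing the tables orders and readme:
catalog/
  default/          // schema "default"
    orders.ion      // table "orders"
    readme.ion      // table "readme"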