Scribe is a compiler framework for the PartiQL SQL dialect. It is considered experimental and is under active development.
Scribe will depend on a PartiQL major version, but for now we shadow partiql-lang-kotlin into a single JAR dependency.
# From partiql-lang-kotlin produce a JAR
./gradlew :partiql-lang:shadowJar
cp ./partiql-lang/build/libs/partiql-lang-kotlin-version.jar ./path/to/Scribe/libs
# Also copy the local plugin (we should consider bundling with SPI)
./gradlew :plugins:partiql-local:installDist
cp ./plugins/partiql-local/build/libs/partiql-local-version.jar ./path/to/Scribe/libs
- SQL — Specifically the SQL-99 Data Query Language specification, colloquially select-from-where
- Dialect — An implementation of the SQL language specification
- Target — A representation of some computation
- Catalog — Schemas, tables, types, functions, and operators available in a target
Scribe leverages PartiQL’s plugin system and planner to produce a resolved and typed logical query plan. This plan is passed to a target implementation to be transformed into domain-specific output.
Note: Much of the transpiler involves manipulating both the AST and Plan, which are PartiQL intermediate representations. The PartiQL AST README has tips on working with these structures.
// PartiQL's plugin system is how you provide tables and schemas to the planner (à la Trino).
val plugin = MyPlugin()
// Instantiate the transpiler once. It can be re-used!
val transpiler = PartiQLTranspiler(listOf(plugin))
Suppose you have some table:
CREATE TABLE orders (
    order_id   STRING PRIMARY KEY,  -- PartiQL STRING type
    ordered_at DATETIME NOT NULL    -- PartiQL DATETIME type
);
How do you express the following query for a different SQL engine like Trino?
// Get all orders in the last 30 days
val query = """
    SELECT order_id FROM orders
    WHERE ordered_at > date_diff(day, -30, UTCNOW())
"""
// TrinoTarget holds the translation rules from a PartiQL plan to Trino SQL
val target = TrinoTarget()
// Planner session, assuming your table `orders` exists in the "default" catalog
val session = PartiQLPlanner.Session(
    queryId = "readme_query_id",
    userId = "readme_user_id",
    currentCatalog = "default",
)
// Invoke the transpiler
val result = transpiler.transpile(query, target, session)
println(result.value)
// Output:
// SELECT orders.order_id AS order_id FROM orders AS orders
// WHERE orders.ordered_at > date_add('day', -30, at_timezone(current_timestamp, 'UTC'))
There’s a shell which can be useful for rapid experimentation. You can also attach a debugger to this shell.
Note: You may load the TPC-DS schemas in catalogs/ or provide your own catalog.
# Install
./gradlew install
# Execute
./build/install/scribe/bin/scribe --help
Usage: scribe [-hV] [--catalog=catalog] -d=DIR
The PartiQL Transpiler Debug REPL
This REPL is used for debugging the transpiler
      --catalog=catalog   Catalog, use `default` .. by default
                            Default: default
  -d=DIR                  Database root directory
  -h, --help              Show this help message and exit.
  -V, --version           Print version information and exit.
./build/install/scribe/bin/scribe -d ./catalogs --catalog=tpc_ds
____ ____ ____ ____ ____ ____
||S |||C |||R |||I |||B |||E ||
||__|||__|||__|||__|||__|||__||
|/__\|/__\|/__\|/__\|/__\|/__\|
scribe ▶ .h
.h              Print this message
.s              Print command history
.q              Disconnect
.d              Describe catalog
.dt table       Describe table
.debug on|off   Toggle debug printing
.t target       Get/Set the transpiler target
.clear          Clear screen
Note: You can attach a debugger with JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=localhost:5050 …
Scribe is a framework for plugging in different compilation backends. Perhaps this project should be renamed BYOB (bring your own backend). For now, we only provide SQL source-to-source compilation (hence "transpile"), but you could conceive of several non-SQL targets such as:
- Substrait
- Apache Beam Transform
For now, Scribe provides two simple SQL text targets.
Each dialect is quite similar (hence "dialect"), so much of the base translation from PartiQL’s logical plan to an SQL AST is captured by org.partiql.transpiler.sql.SqlTransform. This applies a transformation from relational algebra to an SQL AST, much like Calcite’s RelToSqlConverter, though it is currently more limited than Calcite’s. Many of the differences between dialects come down to scalar functions; it is often the case that dialects have functions with similar functionality albeit different names. This is shown in the earlier UTCNOW() example.
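For example, here is a minimal sketch reusing the transpiler, session, and Trino target from the earlier example; the output shape in the comment is illustrative, not verbatim.
// Reuses `transpiler`, `session`, and TrinoTarget from the earlier example
val mapped = transpiler.transpile("SELECT UTCNOW() FROM orders", TrinoTarget(), session)
println(mapped.value)
// Illustrative output: PartiQL's UTCNOW() is spelled with Trino built-ins, e.g.
//   SELECT at_timezone(current_timestamp, 'UTC') ... FROM orders AS orders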
Let’s work through developing our own SQL target, using SQLite. How might we transpile the following query?
SELECT CAST(a AS STRING) FROM T
With basic familiarity with SQLite, we know that STRING is not a valid type name and should be replaced with TEXT.
How do we express this in a transpilation target?
public object SQLiteTarget : SqlTarget() {

    override val target: String = "SQLite"

    // Using SQLite3
    override val version: String = "3"

    // Override the default call ruleset with the SQLiteCalls ruleset
    override fun getCalls(onProblem: ProblemCallback): SqlCalls = SQLiteCalls()

    // No need to rewrite the plan; return it as is
    override fun rewrite(plan: PartiQLPlan, onProblem: ProblemCallback) = plan
}
Note: I’m conflicted on how to pass the problem handler to SqlCalls, so that’s subject to change.
@OptIn(PartiQLValueExperimental::class)
public class SQLiteCalls : SqlCalls() {

    /**
     * SqlCalls has many open functions which you can override for edge cases.
     */
    override fun rewriteCast(type: PartiQLValueType, args: SqlArgs): Expr {
        if (type == PartiQLValueType.STRING) {
            // do something special for `CAST(.. AS STRING)`
            return Ast.create { exprCast(args[0].expr, typeCustom("TEXT")) }
        }
        return super.rewriteCast(type, args)
    }
}
This is reasonable, but what about replacing all occurrences of STRING with TEXT? It would be cumbersome to track down every place a type might appear (the IS special form is another example). We can also extend how SQL is rendered to text via an extensible query-printing framework; see Pull #1183. You provide the pretty-printer with a Dialect, which contains the base behavior for translating an AST to a Block tree, where the Block tree is a basic formatting structure.
Let’s implement SQLiteDialect and wire it into our SQLiteTarget.
public object SQLiteDialect : SqlDialect() {

    override fun visitTypeString(node: Type.String, head: SqlBlock) =
        SqlBlock.Link(head, SqlBlock.Text("TEXT"))
}
All this says is that, during the fold from an AST to a Block tree, we append the text "TEXT" whenever a STRING type node is visited. We can use this dialect in our target by overriding the dialect field.
public object SQLiteTarget : SqlTarget() {

    // ... same as before

    // hook up the pretty-printer rules
    override val dialect = SQLiteDialect
}
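With the call rewriting and dialect wired in, the target emits TEXT for our motivating query. Here is a minimal usage sketch, reusing the PartiQLTranspiler and session from the Trino example; the exact output text is illustrative.
// Assumes `transpiler` and `session` from the earlier example, with a catalog containing table T
val result = transpiler.transpile("SELECT CAST(a AS STRING) FROM T", SQLiteTarget, session)
println(result.value)
// Illustrative output:
//   SELECT CAST(T.a AS TEXT) AS a FROM T AS T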
PartiQL Scribe has a simple testing framework whereby each target asserts its desired output against a shared set of input queries (defined in test/resources/inputs/). If you wish to add a new test, please add it to one of the .sql files in test/resources/inputs/ with a unique name. All tests within a directory are flattened; you may define multiple tests in one file.
-- Tests are named with the macro `--#[my-test-name]`
--#[readme-example-00]
SELECT header FROM readme;
-- be sure to terminate a statement with `;`
--#[readme-example-01]
SELECT x, y, z FROM T
WHERE x BETWEEN y AND z;
Similar to inputs, expected test outputs are stored in test/resources/outputs. The default test suite produces a JUnit test for each expected output. You may implement additional JUnit tests for negative testing; see test / org.partiql.scribe.targets.PartiQLTargetSuite as an example.
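As a sketch, a handwritten JUnit test might reuse the transpile API from the earlier example; the plugin, session values, and expected string below are illustrative rather than actual fixtures.
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class ReadmeExampleTest {

    private val transpiler = PartiQLTranspiler(listOf(MyPlugin())) // hypothetical plugin
    private val session = PartiQLPlanner.Session(
        queryId = "readme_test_query",
        userId = "readme_test_user",
        currentCatalog = "default",
    )

    @Test
    fun readmeExample00() {
        val result = transpiler.transpile("SELECT header FROM readme", TrinoTarget(), session)
        // Compare against the corresponding file under test/resources/outputs
        assertEquals("SELECT readme.header AS header FROM readme AS readme", result.value.toString())
    }
}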
Testing schemas are described using a modified version of the Avro JSON schema syntax. The changes are (1) it’s Ion rather than JSON, and (2) we use PartiQL type names.
// Type name for atomic types
"int"

// Type list for union types
[ "int", "null" ]

// Collection type
{
  type: "bag", // valid values are "bag", "list", "sexp"
  items: type
}

// Struct type
{
  type: "struct",
  fields: [
    {
      name: "foo",
      type: type
    },
    // ....
  ]
}
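For instance, here is a hypothetical schema file for the orders table from the earlier example, assuming a table is modeled as a bag of structs (constraints such as PRIMARY KEY are omitted, per the note below):
// orders.ion (hypothetical)
{
  type: "bag",
  items: {
    type: "struct",
    fields: [
      { name: "order_id", type: "string" },
      { name: "ordered_at", type: "datetime" },
    ]
  }
}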
Important: For now, we omit constraints such as open/closed structs.
The PartiQL FS Plugin builds a catalog from a directory tree. It is implemented at https://github.com/partiql/partiql-lang-kotlin/tree/transpile/partiql-planner/src/testFixtures/kotlin/org/partiql/planner/test/plugin.
Note: Directories are nested schemas; files represent table schemas, where the table name is the file name (without .ion).
It will eventually be merged with mockdb. The only difference is that it memoizes parsed schemas and uses the PartiQL Value Schema syntax.
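For illustration, here is a hypothetical directory layout matching the REPL invocation shown earlier (all names other than tpc_ds are made up):
catalogs/
├── default/
│   └── orders.ion        // table `orders` in schema `default`
└── tpc_ds/
    ├── store_sales.ion   // table `store_sales` in schema `tpc_ds`
    └── ...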