
Example using SQLite via SQLx for secondary indexes #1

Merged: 21 commits into main from sqlx on Jun 18, 2024

Conversation

adriangb (Collaborator) commented Jun 7, 2024:

No description provided.

sqlx-sqlite/Cargo.toml: resolved comment (outdated)

SQLite is used as a stand-in for an external remote relational database; it should be easy to adapt this example to use another database.

This example should be considered incomplete: it does not try to handle **many** edge cases or push down filters as much as possible.
adriangb (Collaborator, Author):

In particular, I am quite concerned that the filter pushdown might be incorrect in some ways; I don't intend for people to copy this or use it directly. Happy to add some tests to increase confidence, then tone down this warning.

Contributor:

I'll point out any obvious errors I see

sqlx-sqlite/src/main.rs: resolved comment
FileStatistics::RowGroupCount,
])
.column(ColumnStatistics::RowGroup)
.distinct() // could be distinct_on(vec![ColumnStatistics::FileId, ColumnStatistics::RowGroup]) if the backing store supports it
adriangb (Collaborator, Author):

There are a couple of things along these lines that are just limitations of SQLite.
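For a backing store that does support it, the sea_query side is a small change. A minimal sketch against Postgres; the Iden enum here is a stand-in for the example's actual ColumnStatistics type, with variant names assumed:

```rust
use sea_query::{Iden, PostgresQueryBuilder, Query};

// Stand-in for the example's statistics table; variant names assumed.
#[derive(Iden)]
enum ColumnStatistics {
    Table,
    FileId,
    RowGroup,
}

fn main() {
    // DISTINCT ON deduplicates per (file_id, row_group) key pair, which
    // SQLite lacks but Postgres supports natively.
    let sql = Query::select()
        .column(ColumnStatistics::RowGroup)
        .distinct_on(vec![ColumnStatistics::FileId, ColumnStatistics::RowGroup])
        .from(ColumnStatistics::Table)
        .to_string(PostgresQueryBuilder);
    // Prints something like:
    //   SELECT DISTINCT ON ("file_id", "row_group") "row_group" FROM "column_statistics"
    println!("{sql}");
}
```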

sqlx-sqlite/src/index.rs: resolved comment (outdated)
adriangb (Collaborator, Author) commented Jun 7, 2024:

cc @alamb

adriangb (Collaborator, Author):

This is just an empty sqlite file

adriangb requested a review from alamb on June 7, 2024 at 19:59
adriangb self-assigned this on Jun 7, 2024
alamb (Contributor) left a comment:

Thank you @adriangb -- this looks very cool. Thank you for sharing.


/// Push down a simple binary expression to the index
/// Only a subset of expressions are supported since `a = 1` has to be rewritten as `a_int_max_value >= 1 AND a_int_min_value <= 1`
fn push_down_binary_filter(value: &ScalarValue, op: &Operator) -> Option<SimpleExpr> {
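For readers skimming the thread, here is a sketch of what a body for this function could look like, handling only Int64 literals and assuming the sparse int_min_value/int_max_value statistics columns discussed below; it is not the PR's actual code:

```rust
use datafusion::logical_expr::Operator;
use datafusion::scalar::ScalarValue;
use sea_query::{Alias, Expr as SeaExpr, SimpleExpr};

fn push_down_binary_filter(value: &ScalarValue, op: &Operator) -> Option<SimpleExpr> {
    // Only Int64 literals are handled in this sketch.
    let ScalarValue::Int64(Some(v)) = value else {
        return None;
    };
    let min = SeaExpr::col(Alias::new("int_min_value"));
    let max = SeaExpr::col(Alias::new("int_max_value"));
    match op {
        // `a = 1` can only hold in a row group whose [min, max] range contains 1.
        Operator::Eq => Some(max.gte(*v).and(min.lte(*v))),
        // `a < 1` can only hold in a row group whose minimum is below 1.
        Operator::Lt => Some(min.lt(*v)),
        Operator::Gt => Some(max.gt(*v)),
        // Anything else: don't push down, let DataFusion evaluate it.
        _ => None,
    }
}
```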
Contributor:

It might be clearer to call this "rewrite_filter" or "rewrite_binary_expr", as it isn't really "pushing down" the filter in my mind -- it is creating a SimpleExpr that will be "pushed down" (into SQLite).

Contributor:

BTW this rewrite looks like a combination of:

  1. The expr rewrite that PruningPredicate does internally
  2. A translation from the DataFusion Exprs to sea_query

If you separated the two passes the logic might be clearer, and maybe you could reuse some existing code. Some thoughts:

  1. Consider using the rewritten PruningPredicate expr directly (there has been a lot of effort put into testing that rewrite for correctness); see the sketch below
  2. Use the Expr --> SQL code in expr_to_sql instead of sea_query (as the code already exists)
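A sketch of suggestion 1, showing how the PruningPredicate rewrite produces a predicate over min/max statistics columns. Module paths and method names here are from roughly contemporary DataFusion and may differ by version:

```rust
use std::sync::Arc;
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::common::DFSchema;
use datafusion::execution::context::ExecutionProps;
use datafusion::physical_expr::create_physical_expr;
use datafusion::physical_optimizer::pruning::PruningPredicate;
use datafusion::prelude::{col, lit};

fn main() -> datafusion::error::Result<()> {
    // Schema of the data being pruned: a single Int64 column `a`.
    let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int64, false)]));
    let df_schema = DFSchema::try_from(schema.clone())?;

    // Lower the logical predicate `a = 1` to a physical expression.
    let props = ExecutionProps::new();
    let expr = create_physical_expr(&col("a").eq(lit(1i64)), &df_schema, &props)?;

    // PruningPredicate rewrites it into a check over `a_min` / `a_max`
    // statistics columns, e.g. `a_min@0 <= 1 AND 1 <= a_max@1`.
    let pruning = PruningPredicate::try_new(expr, schema)?;
    println!("{}", pruning.predicate_expr());
    Ok(())
}
```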

adriangb (Collaborator, Author), Jun 10, 2024:

This is great, I was wondering if there's existing tested code to do these conversions. I see that I can build a PruningPredicate and call PruningPredicate::predicate_expr to get the rewritten expr, so I can then run that against SQLite and not have to materialize the statistics in memory (which is what PruningPredicate seems to be used for generally).

However, the way the rewriting happens, it expects the schema of the statistics to match the schema of the table: a column `value` needs statistics columns `value_min` and `value_max` of the correct type. That makes sense when your statistics are stored with the actual data or in memory, but it breaks down if you use a traditional RDBMS and want to support more than one table schema, because you don't want to make a super wide table and add dynamically generated columns.

The schema I'm using here gets around that by storing a sparse table of values (one min and one max column for each type) with (file_id, row_group, column_name) as the primary key.

What do you think generally of the statistics schema being used here? Any idea how we can generalize what PruningPredicate does without requiring the schema of the statistics table to depend on the schema of the data?
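For concreteness, the sparse layout described above, sketched as sea_query DDL. The column names and the set of typed min/max pairs are illustrative, not the PR's exact schema:

```rust
use sea_query::{ColumnDef, Iden, Index, SqliteQueryBuilder, Table};

#[derive(Iden)]
enum ColumnStatistics {
    Table,
    FileId,
    RowGroup,
    ColumnName,
    IntMinValue,
    IntMaxValue,
    StringMinValue,
    StringMaxValue,
}

fn main() {
    // One row per (file, row group, column); exactly one typed min/max
    // pair is non-NULL depending on the column's data type.
    let ddl = Table::create()
        .table(ColumnStatistics::Table)
        .col(ColumnDef::new(ColumnStatistics::FileId).integer().not_null())
        .col(ColumnDef::new(ColumnStatistics::RowGroup).integer().not_null())
        .col(ColumnDef::new(ColumnStatistics::ColumnName).string().not_null())
        .col(ColumnDef::new(ColumnStatistics::IntMinValue).big_integer())
        .col(ColumnDef::new(ColumnStatistics::IntMaxValue).big_integer())
        .col(ColumnDef::new(ColumnStatistics::StringMinValue).string())
        .col(ColumnDef::new(ColumnStatistics::StringMaxValue).string())
        .primary_key(
            Index::create()
                .col(ColumnStatistics::FileId)
                .col(ColumnStatistics::RowGroup)
                .col(ColumnStatistics::ColumnName),
        )
        .to_string(SqliteQueryBuilder);
    println!("{ddl}");
}
```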

adriangb (Collaborator, Author):

Looking a bit further into what PruningPredicate is doing, it seems to me that the LiteralGuarantees part is not necessary if I'm pushing work to another database. Let me know if that sounds right or not.

adriangb (Collaborator, Author), Jun 10, 2024:

If we wanted to keep the current schema and re-use existing logic, I think we'd need to make RequiredColumns::stat_column_expr pluggable.

I'll also note that I use the type of the value to determine which column to query (since they're named int_max_value and such), which seems wrong as soon as someone does `date = '2020-01-01'`. I think it would be better to use the approach the existing code seems to use of getting the type from the schema.

Review comment:

I think here you mean `col` can never be `x` and `y` at the same time 👀

Contributor:

I see -- yes, if you want to push the evaluation down into the lower-level database without a column per statistic (which could indeed result in a large number of columns), it would be hard to reuse the existing pruning predicate logic.

adriangb (Collaborator, Author):

I think we may still be able to make it work by doing a join per column, which I don't love, but it should work and avoids any dynamic schemas in the index database. I'll give that a try and loop back.

adriangb (Collaborator, Author), Jun 11, 2024:

I'm going to rework this to be simple and hardcoded with the index's schema having a min/max column per column in the actual schema. It was getting out of hand to try to make this fancier for an example.

@alamb is there a way to go from a PhysicalExpr to an Expr if I want to use expr_to_sql? It's another thing where it would be simpler for the example but not universal (I assume it generates DataFusion-flavored SQL; there's no guarantee that's valid for the lower-level index database).

adriangb (Collaborator, Author):

@alamb I've reworked this to use PruningPredicate to do the rewriting. The statistics table now has largely the same schema as what PruningPredicate expects.

I did not use expr_to_sql because that generates DataFusion-flavored SQL, which would not be compatible with many index databases. Instead I opted to implement a function to convert from PhysicalExpr -> sea_query::SimpleExpr. I think this is valuable to have as an example for folks: it covers a wide range of index databases (since SeaQuery can generate SQL for SQLite, Postgres, MySQL, or be extended) and is not that much code (at least for the cases I chose to handle).
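A condensed sketch of the shape of such a conversion, handling only AND/OR and `column <op> literal` comparisons; the PR's actual function covers more cases:

```rust
use std::sync::Arc;
use datafusion::logical_expr::Operator;
use datafusion::physical_plan::expressions::{BinaryExpr, Column, Literal};
use datafusion::physical_plan::PhysicalExpr;
use datafusion::scalar::ScalarValue;
use sea_query::{Alias, Expr as SeaExpr, SimpleExpr};

fn physical_expr_to_sea_query(expr: &Arc<dyn PhysicalExpr>) -> Option<SimpleExpr> {
    let bin = expr.as_any().downcast_ref::<BinaryExpr>()?;

    // Recurse into conjunctions and disjunctions.
    if matches!(bin.op(), Operator::And | Operator::Or) {
        let left = physical_expr_to_sea_query(bin.left())?;
        let right = physical_expr_to_sea_query(bin.right())?;
        return Some(match bin.op() {
            Operator::And => left.and(right),
            _ => left.or(right),
        });
    }

    // Base case: `column <op> literal`.
    let column = bin.left().as_any().downcast_ref::<Column>()?;
    let literal = bin.right().as_any().downcast_ref::<Literal>()?;
    let value: sea_query::Value = match literal.value() {
        ScalarValue::Int64(Some(v)) => (*v).into(),
        ScalarValue::Utf8(Some(s)) => s.clone().into(),
        _ => return None, // unsupported literal type: don't push down
    };
    let col = SeaExpr::col(Alias::new(column.name()));
    Some(match bin.op() {
        Operator::Eq => col.eq(value),
        Operator::Lt => col.lt(value),
        Operator::LtEq => col.lte(value),
        Operator::Gt => col.gt(value),
        Operator::GtEq => col.gte(value),
        _ => return None,
    })
}
```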

// TODO: we could aggregate the row groups into an array in the query to transmit less data over the wire
// (and maybe avoid the join), leaving that as a TODO since it introduces more complexity and coupling to the index's backing store
// Result is in the form of (file_name, file_size, row_group_count, row_group_to_scan)
let row_groups: Vec<(String, i64, i64, i64)> = sqlx::query_as_with(&sql, values)
Contributor:

FWIW you could probably use SQLite to do the aggregations.

Like add a SELECT DISTINCT ... to the query so you wouldn't have to handle duplicates in the Rust code (handling duplicates is fine, I am just pointing it out).

adriangb (Collaborator, Author):

There's already a distinct in there (currently line 91).
I do think these queries could be a bit fancier, e.g. I'd like it to just return file_name string, file_size int, row_groups int[], but I'm not sure that's possible with SQLite; I'm most familiar with Postgres. But I do plan on giving it another shot before merging this.
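One possibility along those lines, sketched with SQLite's json_group_array. The view name and column set here are assumptions, and the JSON array comes back as text that would still need parsing:

```rust
use sqlx::SqlitePool;

// Hypothetical aggregated variant: one row per file, with the row groups
// to scan packed into a JSON array by SQLite's json_group_array.
async fn files_to_scan(pool: &SqlitePool) -> sqlx::Result<Vec<(String, i64, i64, String)>> {
    sqlx::query_as(
        "SELECT file_name, file_size, row_group_count, \
                json_group_array(row_group) AS row_groups \
         FROM file_statistics_view \
         GROUP BY file_name, file_size, row_group_count",
    )
    .fetch_all(pool)
    .await
}
```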

Contributor:

Cool -- sorry I didn't see that. Nice!

Review comment:

Possibly don't want to commit the DB to the repo?

adriangb (Collaborator, Author):

I actually do. Running this is idempotent and it's one less setup command. The empty db is also 24kB.

adriangb changed the title from "Example using SQLx + SQLite" to "Example using SQLx + SQLite for secondary indexes" on Jun 11, 2024
adriangb changed the title from "Example using SQLx + SQLite for secondary indexes" to "Example using SQLite via SQLx for secondary indexes" on Jun 11, 2024
samuelcolvin:

Looks great in general, I'll read this more later.

I do wonder if we could do something similar using a bloom filter stored in the external index - either using the bloom crate, then serializing that and storing the data in SQLite (that's what I prototyped in logfire), or better, using a bloom filter that's natively part of the database.

A bloom filter will only make sense on some columns, and even then on relatively large row groups or entire files, but still it could be very effective for stuff like trace_id where min/max won't help and there are not that many distinct values.

adriangb (Collaborator, Author) commented Jun 11, 2024:

> I do wonder if we could do something similar using a bloom filter stored in the external index

Yes, absolutely! I think that's beyond the scope of this example though.

samuelcolvin:

yes definitely.

alamb (Contributor) commented Jun 11, 2024:

> I do wonder if we could do something similar using a bloom filter stored in the external index - either using the bloom crate, then serialize that and store the data in sqlite (that's what I prototyped in logfire), or better using a bloom filter that's natively part of the database.

BTW bloom filters (and similar structures) are the use case for PruningStatistics::contains.

Specifically, that will tell you the constant values to check for in your bloom filter. We use this code in the parquet reader to evaluate parquet bloom filters.

adriangb (Collaborator, Author):

> BTW bloom filters (and similar structures) are the use case for PruningStatistics::contains

It'd be really cool to have an example (again, I think not this one) where PruningStatistics::contains gets used to get the constant values to check, and those constant values are checked against a secondary index instead of bloom filters in the parquet file (either using the index's native filtering, or pulling the bloom filter as a blob from the index and filtering against it in memory).

sqlx-sqlite/src/index.rs: four resolved comments (outdated)
let null_counts = null_counts.as_primitive::<UInt64Type>();

for row_group in 0..metadata.num_row_groups() {
    match field.data_type() {

Review comment:

You could probably get some re-use here with two generic functions (one for IntXType and one for strings).
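For reference, the kind of generic helper being suggested might look like this; the Into<Value> bound is an assumption about how the example maps Arrow native values to sea_query values:

```rust
use datafusion::arrow::array::PrimitiveArray;
use datafusion::arrow::datatypes::ArrowPrimitiveType;
use sea_query::Value;

/// Convert a statistics array of any primitive Arrow type into sea_query
/// values, mapping Arrow NULLs to SQL NULLs.
fn primitive_stats_to_values<T>(array: &PrimitiveArray<T>) -> Vec<Value>
where
    T: ArrowPrimitiveType,
    Option<T::Native>: Into<Value>,
{
    array.iter().map(Into::into).collect()
}
```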

Review comment:

Ok I nerd-sniped myself into trying this; davidhewitt@9fa9e07 - feel free to pull the commit if you want it :)

adriangb (Collaborator, Author):

I ended up going with e88641b, which also means we do the downcasts once instead of for each value in the stats arrays. Let me know if that sounds good to you.

Review comment:

Ah definitely better!

sqlx-sqlite/src/index.rs: resolved comment (outdated)
let values = match array.data_type() {
    DataType::Int8 => {
        let array = array.as_primitive::<datatypes::Int8Type>();
        array.iter().map(|v| {

Review comment:

Can the map closure here be simplified to:

array.iter().map(Value::TinyInt).collect()

and similar for the other data types?

adriangb (Collaborator, Author):

Yes, at least those that don’t require a box

adriangb (Collaborator, Author):

I think this is in a good enough state to merge and be iterated upon in the future.

adriangb merged commit ae4edd4 into main on Jun 18, 2024
adriangb deleted the sqlx branch on June 18, 2024 at 18:38