SQLite: Validate expected indexes when attaching local datasets #88

Merged
merged 3 commits into from
Sep 7, 2024

Conversation

sgrebnov
Collaborator

@sgrebnov sgrebnov commented Sep 6, 2024

This PR adds verification of expected indexes when attaching local datasets to SQLite. The check compares the schema of the local copy of a table against the expected configuration and logs a warning when they do not match.

Note: generated composite index names can be complex. The index names shown in the output are easy enough to troubleshoot, IMO, so converting an index name back to its original configuration for tracing has not been added. That conversion is also not always possible or trivial. For example, the index i_taxi_trips_fare_amount_passenger_count_total_amount_tpep_dropoff_datetime_tpep_pickup_datetime_trip_distance is generated from the test configuration '(tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, fare_amount, passenger_count, total_amount)': unique
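
To illustrate why the reverse mapping is hard, here is a minimal sketch of a naming scheme that is consistent with the example above: `i_`, then the table name, then the column names sorted and joined with `_`. The helper `index_name` is hypothetical and the scheme is inferred from that single example, not taken from the crate's source.

```rust
/// Hypothetical helper (not the crate's API): builds an index name as
/// `i_<table>_<columns sorted and joined by '_'>`.
/// The scheme is an assumption inferred from the example in this PR description.
fn index_name(table: &str, columns: &[&str]) -> String {
    // Copy the column names so the caller's slice is left untouched.
    let mut cols: Vec<&str> = columns.to_vec();
    // Sort so the name does not depend on the order used in the configuration.
    cols.sort_unstable();
    format!("i_{}_{}", table, cols.join("_"))
}
```

Under this assumed scheme, both the column order and the boundaries between names are lost. Table `taxi_trips` with column `fare` and table `taxi` with column `trips_fare` would both produce `i_taxi_trips_fare`, so a name cannot always be split back into its parts.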

Sample output:

2024-09-06T21:46:12.752546Z  WARN datafusion_table_providers::sqlite: Schema mismatch detected for table 'taxi_trips'. The following expected indexes are missing: ["i_taxi_trips_passenger_count"].
2024-09-06T21:46:12.752596Z  WARN datafusion_table_providers::sqlite: Schema mismatch detected for table 'taxi_trips'. The table contains unexpected indexes not defined in the configuration: ["i_taxi_trips_VendorID"].
2024-09-06T21:46:12.752620Z  WARN datafusion_table_providers::sqlite: The schema of the local copy of table 'taxi_trips' does not match the expected configuration. To correct this, you can drop the existing local copy at '/Users/sg/spice/spiceai/.spice/data/taxi_trips_sqlite.db'. A new table will be automatically created with the correct schema upon the first access.
2024-09-06T21:46:12.753283Z  INFO runtime: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), acceleration (sqlite:file), results cache enabled.
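
The first two warnings above correspond to two set differences: expected index names that are absent from the local file, and index names in the local file that the configuration does not define. A minimal sketch of that comparison follows. `IndexDiff` and `diff_indexes` are hypothetical names used for illustration and are not the functions added in `src/sqlite.rs`.

```rust
use std::collections::HashSet;

/// Hypothetical result type: index names that differ between the
/// configuration and the local SQLite file.
#[derive(Debug, PartialEq)]
struct IndexDiff {
    /// Expected by the configuration but not present locally.
    missing: Vec<String>,
    /// Present locally but not defined in the configuration.
    unexpected: Vec<String>,
}

/// Hypothetical helper: compares expected and actual index names.
fn diff_indexes(expected: &HashSet<String>, actual: &HashSet<String>) -> IndexDiff {
    let mut missing: Vec<String> = expected.difference(actual).cloned().collect();
    let mut unexpected: Vec<String> = actual.difference(expected).cloned().collect();
    // Sort for stable, readable log output (HashSet iteration order is arbitrary).
    missing.sort();
    unexpected.sort();
    IndexDiff { missing, unexpected }
}
```

With `i_taxi_trips_passenger_count` as the only expected index and `i_taxi_trips_VendorID` as the only actual one, `missing` and `unexpected` each hold one name, matching the two warnings in the sample output.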

In progress: unit test for get_indexes

Note: a similar improvement is needed for primary_key.

@sgrebnov sgrebnov self-assigned this Sep 6, 2024
This pull request was closed.
3 participants