Read test cases checklist #7
@wjones127 - Your list looks great. I'd add a Delta Lake table that's constructed with different save modes to the list:

```python
df = spark.range(0, 3)
df.write.format("delta").save("/tmp/delta-table")

df2 = spark.range(4, 6)
df2.write.mode("overwrite").format("delta").save("/tmp/delta-table")
```

This test will make sure that the Delta Lake reader isn't just reading all the Parquet files.
Some notes for implementing each of these (see the creation sketch after this list):

- A Delta Lake table with all data types: example from delta-rs tests: https://github.com/delta-io/delta-rs/blob/fae50cca528446e27c5401818a4f31b5a97e8ad2/python/tests/conftest.py#L30-L53
- A table with a checkpoint: set
- A table which has had a schema change: overwrite with
- A table with stats as struct: turn
- A table with id-based column mapping: set
- A table with name-based column mapping: set
- A table with multi-part checkpoint: use setting https://github.com/delta-io/delta/pull/946/files
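For the truncated notes above, one plausible way to create a few of these fixtures with PySpark is sketched below. The table properties used (`delta.checkpointInterval`, `delta.checkpoint.writeStatsAsStruct`, `delta.columnMapping.mode`) and the `overwriteSchema` option are standard Delta Lake settings, but mapping them to these specific list items is my assumption; the paths and values are only illustrative.

```python
# Illustrative fixture creation; assumes a SparkSession with Delta Lake enabled.

# A table with a checkpoint: a low checkpoint interval forces a checkpoint
# file to be written after only a couple of commits.
spark.sql("""
    CREATE TABLE delta.`/tmp/table-with-checkpoint` (id LONG) USING DELTA
    TBLPROPERTIES ('delta.checkpointInterval' = '2')
""")
for i in range(4):
    spark.range(i * 10, i * 10 + 10).write.format("delta") \
        .mode("append").save("/tmp/table-with-checkpoint")

# A table which has had a schema change: overwrite with a different schema.
spark.range(0, 3).write.format("delta").save("/tmp/table-with-schema-change")
spark.range(0, 3).withColumnRenamed("id", "new_id") \
    .write.format("delta").mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save("/tmp/table-with-schema-change")

# A table with stats as struct: write checkpoint stats as a struct column.
spark.sql("""
    ALTER TABLE delta.`/tmp/table-with-checkpoint`
    SET TBLPROPERTIES ('delta.checkpoint.writeStatsAsStruct' = 'true')
""")

# A table with name-based column mapping (id-based mapping would use 'id'
# instead of 'name'); column mapping requires reader v2 / writer v5.
spark.sql("""
    CREATE TABLE delta.`/tmp/table-with-name-mapping` (id LONG, value STRING)
    USING DELTA
    TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")
```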
@wjones127 - are you cool with separate reference tables for "A Delta Lake table with all data types"? I think this will make it more obvious which types aren't supported by each connector. Suppose a connector doesn't support 5 data types. One failing test wouldn't explain the gap as clearly as 5 failing tests would. Thoughts?
IMO that doesn't seem fully necessary. But perhaps we can separate the primitive types from the nested types (struct, list, map).
@wjones127 - separating the primitive types from the complex types seems like a nice balance 👍
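To make that split concrete, here is one possible shape for the two reference tables; the column names, values, and paths are my own examples, not from the thread:

```python
# A reference table covering only primitive types.
primitives = spark.sql("""
    SELECT
        CAST(1 AS INT)                 AS int_col,
        CAST(1 AS BIGINT)              AS long_col,
        CAST(1.5 AS DOUBLE)            AS double_col,
        CAST(1.5 AS DECIMAL(10, 2))    AS decimal_col,
        'a'                            AS string_col,
        true                           AS bool_col,
        DATE'2022-01-01'               AS date_col,
        TIMESTAMP'2022-01-01 00:00:00' AS timestamp_col
""")
primitives.write.format("delta").save("/tmp/reference-primitive-types")

# A separate reference table covering the nested types.
nested = spark.sql("""
    SELECT
        named_struct('a', 1, 'b', 'x') AS struct_col,
        array(1, 2, 3)                 AS list_col,
        map('k1', 1, 'k2', 2)          AS map_col
""")
nested.write.format("delta").save("/tmp/reference-nested-types")
```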
These table ideas look very good to me.
I will think of more and keep adding to this thread. :D
What is our philosophy of test cases? Do we care about each individual feature? Or are we collecting a set of cases that have maximal coverage of important common and corner cases? I'm assuming the latter for this draft list.
Reader protocol v1:
Reader protocol v2: