Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SQLite converter into Recap #418

Closed
mjperrone opened this issue Jan 19, 2024 · 14 comments · Fixed by #424
Closed

Support SQLite converter into Recap #418

mjperrone opened this issue Jan 19, 2024 · 14 comments · Fixed by #424
Assignees

Comments

@mjperrone
Copy link
Contributor

SQLite schematab docs:

Every SQLite database contains a single "schema table" that stores the schema for that database. The schema for a database is a description of all of the other tables, indexes, triggers, and views that are contained within the database. The schema table looks like this:

CREATE TABLE sqlite_schema(
  type text,
  name text,
  tbl_name text,
  rootpage integer,
  sql text
);

The sql field is the SQL DDL that would create the table.

To implement this, one would create a recap SQLite client to grab that DDL and then create a recap SQLite converter to turn it into recap types.

@mjperrone mjperrone changed the title Support SQLite Support SQLite converter into Recap Jan 19, 2024
@criccomini
Copy link
Contributor

Oh, gosh, I can't believe I didn't think of adding this! 🤣 Seems so obvious now that you opened the issue. 😝

@mjperrone
Copy link
Contributor Author

Using the DDL to represent the schema is an interesting (lazy?) choice for SQLite. But thinking about it a bit more... if we implement that for SQLite, we might be able to reuse that code for pulling schema from other databases assuming the DDL dumps are compatible. Potentially replace postgresql and mysql client/converter code using that.

@criccomini
Copy link
Contributor

criccomini commented Jan 20, 2024

Perhaps sqlglot can be of use?

@criccomini
Copy link
Contributor

assuming the DDL dumps are compatible

I think this is the key part of your statement. I suspect information_schema will be more robust. Perhaps start with SQLite impl only, and see where that leads...

@criccomini
Copy link
Contributor

Poked around a bit. Looks like there are a few options:

https://www.sqlitetutorial.net/sqlite-describe-table/

The pragma option looks pretty parsable.

@criccomini
Copy link
Contributor

@mjperrone do y'all want to take this or should I? I don't mind doing it, but I don't want to start if y'all want to take it for learning experience or whatever. =-)

@mjperrone
Copy link
Contributor Author

We aren't in a rush to implement this one, so I dont think we will get to it before you do if you're interested in it

@criccomini
Copy link
Contributor

Nice! I've been hankering to write a bit of code, so I'll take a crack at this.

@criccomini criccomini self-assigned this Jan 25, 2024
@criccomini
Copy link
Contributor

Starting to poke at this now.

@criccomini
Copy link
Contributor

criccomini commented Jan 31, 2024

@mjperrone been doing some digging on SQLite data types.

  1. SQLite supports two modes: STRICT or default. Tables created without STRICT use type affinities, which are recommendations that SQLite ignores.
  2. sqlite2 and sqlite3 differ
  3. In non-STRICT tables, everything in () is ignored (e.g. INTEGER(123) is just INTEGER to SQLite). The parenthesis are still stored in the pragma description, so it's accessible.
  4. For non-STRICT tables, SQLite uses some kind of wacky affinity matching rules.

Here's what I recommend:

  1. Make the converter work with STRICT tables only.
  2. Make the converter work with sqlite3.

This is a very straight-forward to implement. Is this OK with you?

Alternatively, we could add support for both STRICT and non-STRICT tables. This gets a lot messier. Dates, for example, are a mess in SQLite; it accepts both strings or ints in the column. 😢

WDYT? What is your use case here? Do you have any practical examples of SQLite schemas you're dealing with?

/cc @adrianisk

@criccomini
Copy link
Contributor

criccomini commented Jan 31, 2024

Some more details on affinity matching:

https://medium.com/@SullivanArielle/type-affinity-5936cee17c35

Note that there are two things happening here:

  1. The type of the column
  2. Coercing values

We're only interested in column types (1). As such, after further reading, I think the converter can handle both affinity and STRICT types. I'll include a flag similar to the PG converter's enforce_array_dimensions flag to specify whether the converter should return STRICT RecapTypes (e.g. all integers get treated as IntType(bits=64)) or affinity types (e.g. a TINYINT would return a BoolType (or maybe IntType(bits=1, variable=False)).

I also plan on implementing scale/precision support and char octet length. They'll only be used when use_affinity_rules==True.

@mjperrone
Copy link
Contributor Author

@criccomini I'll try to get some more details for you.

criccomini added a commit that referenced this issue Feb 1, 2024
Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

1. Any column can store any type.
2. SQLite has 5 storage classes (null, int, real, text, blob).
3. STRICT forces column types to be the storage types.
4. non-STRICT tables allow any strings for column types.
5. non-STRICT column types are hints to coerce types as they're written to disk.
6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

1. Unknown types are treated as "ANY", which is a union of all storage types.
2. SQLiteConverter pays attention to precision/scale for REAL, etc.
3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418
@criccomini
Copy link
Contributor

@mjperrone I have a PR up here:

#424

It's still WIP for tests, but I want feedback. I'm going to add CLI tests for recap schema and recap ls and do some minor tweaks.

criccomini added a commit that referenced this issue Feb 1, 2024
Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

1. Any column can store any type.
2. SQLite has 5 storage classes (null, int, real, text, blob).
3. STRICT forces column types to be the storage types.
4. non-STRICT tables allow any strings for column types.
5. non-STRICT column types are hints to coerce types as they're written to disk.
6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

1. Unknown types are treated as "ANY", which is a union of all storage types.
2. SQLiteConverter pays attention to precision/scale for REAL, etc.
3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418
criccomini added a commit that referenced this issue Feb 2, 2024
Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

1. Any column can store any type.
2. SQLite has 5 storage classes (null, int, real, text, blob).
3. STRICT forces column types to be the storage types.
4. non-STRICT tables allow any strings for column types.
5. non-STRICT column types are hints to coerce types as they're written to disk.
6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

1. Unknown types are treated as "ANY", which is a union of all storage types.
2. SQLiteConverter pays attention to precision/scale for REAL, etc.
3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418
criccomini added a commit that referenced this issue Feb 2, 2024
Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

1. Any column can store any type.
2. SQLite has 5 storage classes (null, int, real, text, blob).
3. STRICT forces column types to be the storage types.
4. non-STRICT tables allow any strings for column types.
5. non-STRICT column types are hints to coerce types as they're written to disk.
6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

1. Unknown types are treated as "ANY", which is a union of all storage types.
2. SQLiteConverter pays attention to precision/scale for REAL, etc.
3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418
criccomini added a commit that referenced this issue Feb 2, 2024
Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

1. Any column can store any type.
2. SQLite has 5 storage classes (null, int, real, text, blob).
3. STRICT forces column types to be the storage types.
4. non-STRICT tables allow any strings for column types.
5. non-STRICT column types are hints to coerce types as they're written to disk.
6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

1. Unknown types are treated as "ANY", which is a union of all storage types.
2. SQLiteConverter pays attention to precision/scale for REAL, etc.
3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418
@mjperrone
Copy link
Contributor Author

I may be able to give this a look on Monday

criccomini added a commit that referenced this issue Feb 5, 2024
Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

1. Any column can store any type.
2. SQLite has 5 storage classes (null, int, real, text, blob).
3. STRICT forces column types to be the storage types.
4. non-STRICT tables allow any strings for column types.
5. non-STRICT column types are hints to coerce types as they're written to disk.
6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

1. Unknown types are treated as "ANY", which is a union of all storage types.
2. SQLiteConverter pays attention to precision/scale for REAL, etc.
3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants