Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create temporary tables with SQL not supported. #12363

Closed
ncclementi opened this issue Sep 6, 2024 · 12 comments · Fixed by #12439
Closed

Create temporary tables with SQL not supported. #12363

ncclementi opened this issue Sep 6, 2024 · 12 comments · Fixed by #12439
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@ncclementi
Copy link

ncclementi commented Sep 6, 2024

I'm trying to create a temporary table and a view but it seems they can't be created via SQL, I found this comment in the codebase but there is no documentation about it

I was playing around on the CLI trying to generate tables and views, and temp versions of them, and I noticed that adding TEMPORARY to the create statement still resulted in a table or view being categorized with table_type under information_schema.tables as BASE_TABLE and VIEW respectively, and not as LOCAL_TEMPORARY as I would expected, given that LOCAL_TEMPORARY exists as atable_type see here

Does this mean, that I can't create temporary tables or views via SQL? if so is that plan to be supported, or at least documented somewhere?

If not would it be possible to raise an exception instead of silently creating a table and ignoring the TEMPORARY aspect.?

@alamb
Copy link
Contributor

alamb commented Sep 9, 2024

Hi @ncclementi -- thanks for filing this issue

By default, the DataFusion engine implements all tables as "in memory ephemeral" tables -- they don't persist after the session

Since DataFusion is an engine it doesn't really have the notion of transactions

Does this mean, that I can't create temporary tables or views via SQL? if so is that plan to be supported, or at least documented somewhere?

You should be able to create VIEWs just fine and there are several tests that verify this

DataFusion CLI v41.0.0
> create view foo as SELECT 1+3;
0 row(s) fetched.
Elapsed 0.040 seconds.

> select * from foo;
+---------------------+
| Int64(1) + Int64(3) |
+---------------------+
| 4                   |
+---------------------+
1 row(s) fetched.
Elapsed 0.028 seconds.

As for temporary tables, no I don't think datafusion itself supports them (systems built on top of DataFusion do). It would be great to document this somewhere

It does appear that the TEMPORARY keyword is simply ignored by DataFusion:

> create temporary table bar (x int) as values (1);
0 row(s) fetched.
Elapsed 0.003 seconds.

> select * from bar;
+---+
| x |
+---+
| 1 |
+---+
1 row(s) fetched.
Elapsed 0.001 seconds.

That would likely be better if it returned an explicit "not supported" error or a reported the temporary nature correctly via information tables

@alamb alamb added the bug Something isn't working label Sep 9, 2024
@ncclementi
Copy link
Author

@alamb thank you very much for the reply.

By default, the DataFusion engine implements all tables as "in memory ephemeral" tables -- they don't persist after the session
Since DataFusion is an engine it doesn't really have the notion of transactions

Yes, thanks for pointing this put.

You should be able to create VIEWs just fine and there are several tests that verify this

Yes, this works as expected 👍

As for temporary tables, no I don't think datafusion itself supports them (systems built on top of DataFusion do). It would be great to document this somewhere
It does appear that the TEMPORARY keyword is simply ignored by DataFusion

Yes I thought, I pasted this but I forgot. Here is a an extended reproducer.

> CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

0 row(s) fetched. 
Elapsed 0.004 seconds.

> CREATE VIEW my_view AS
SELECT id, name FROM my_table;
0 row(s) fetched. 
Elapsed 0.002 seconds.

> CREATE TEMPORARY TABLE my_temp_table (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
0 row(s) fetched. 
Elapsed 0.005 seconds.

> CREATE TEMPORARY VIEW my_temp_view AS
SELECT id, name FROM my_table;
0 row(s) fetched. 
Elapsed 0.002 seconds.

> SELECT * FROM information_schema.tables;
+---------------+--------------------+---------------+------------+
| table_catalog | table_schema       | table_name    | table_type |
+---------------+--------------------+---------------+------------+
| datafusion    | public             | my_temp_table | BASE TABLE |
| datafusion    | public             | my_temp_view  | VIEW       |
| datafusion    | public             | my_table      | BASE TABLE |
| datafusion    | public             | my_view       | VIEW       |
| datafusion    | information_schema | tables        | VIEW       |
| datafusion    | information_schema | views         | VIEW       |
| datafusion    | information_schema | columns       | VIEW       |
| datafusion    | information_schema | df_settings   | VIEW       |
| datafusion    | information_schema | schemata      | VIEW       |
+---------------+--------------------+---------------+------------+

That would likely be better if it returned an explicit "not supported" error or a reported the temporary nature correctly via information tables

Yes, if this is possible, that would be great! I'm happy to open a separate issue if needed to request for this.

@alamb
Copy link
Contributor

alamb commented Sep 10, 2024

Yes, if this is possible, that would be great! I'm happy to open a separate issue if needed to request for this.

Its definitely possible -- someone just needs to write the code.

I don't think there is any reason to open a separate ticket

Given that the request is clear and there is a test case, I think this would make a good first issue and will mark it as such to see if anyone is interested in trying to code it up.

Basically:

  1. Add a .slt test showing that CREATE TEMPORARY VIEW and CREATE TEMPORARY TABLE are ignored. sqllogictesting is documented here
  2. Add code to return a Not supported error when the TEMPORARY keyword is encountered

@alamb alamb added the good first issue Good for newcomers label Sep 10, 2024
@hailelagi
Copy link
Contributor

take

@comphead
Copy link
Contributor

comphead commented Sep 12, 2024

@alamb why we think this needs to be fixed? The DF ignores word TEMPORARY but in fact temporary table is in-memory session scoped table which DF exactly creates correctly.

I would say CREATE TABLE is alias in this case for CREATE TEMPORARY TABLE. I'm not really sure something needs to be fixed here

@ncclementi
Copy link
Author

would say CREATE TABLE is alias in this case for CREATE TEMPORARY TABLE. I'm not really sure something needs to be fixed here

@comphead But in this case shouldn't show up in the information schema with type=LOCAL_TEMPORARY instead of BASE_TABLE. ?

I think it's confusing for the user (it was to me) that CREATE TABLE and CREATE TEMPORARY TABLE lead silently to the same result.

@alamb
Copy link
Contributor

alamb commented Sep 12, 2024

I agree with @ncclementi -- the semantics of TEMPORARY tables are different than tables so silently ignoring the keyword is likely to be confusing (e.g. as they aren't distinguished in the information_schema). Btw I found a bunch of other syntax that is also ignored -- see #12450

If someone wants to implement temporary tables that sounds like a good idea to me -- I think that would mean propagating the temporary flag into the CreateTable statement and then into the table provider / info schema that would seem like a good change to me

Hopefully I didn't get too excited merging PRs earlier today. We can back out #12439 if you prefer or think we should have some more discussion about it

@comphead
Copy link
Contributor

we could have this documented. in fact tables created in DF are local temporary tables and as long it lives in memory.

The same way we can state the CREATE TABLE might be confusing to the user as it creates a temp table instead of permanent.

in DUCKDB https://duckdb.org/docs/sql/statements/create_table.html they support both syntaxes, but my feeling it is the same mechanism

@ncclementi
Copy link
Author

we could have this documented. in fact tables created in DF are local temporary tables and as long it lives in memory.

This, at the moment is not reflected in the information schema table, and that is what in my opinion is causing confusion. I personally like the addition of the PR #12439 because it makes it explicit.

In DUCKDB https://duckdb.org/docs/sql/statements/create_table.html they support both syntaxes, but my feeling it is the same mechanism

In duckdb a CREATE TABLE table statement results in a table with type=BASE TABLE in the information schema and it's located in the current database. If you CREATE TEMPORARY TABLE the table shows as type=LOCAL TEMPORARY in the information schema, and it will be located in the temp system database.

Something similar happens with VIEWS, happy to go into details if needed.

@comphead
Copy link
Contributor

Correct, thats what I meant, the difference is how the catalog treats the table, but physically they are the same.
If we care about the correct metadata handling in catalog then yes, we need to warn the user that TEMPORARY tables are not supported although they are supported.

I believe we can do the same as duckDB having the temporary and tweak the catalog based on the flag to comply with standard

@hailelagi
Copy link
Contributor

hailelagi commented Sep 12, 2024

would be happy to implement temporary tables in the catalog if it makes sense and there's another issue for it

@alamb
Copy link
Contributor

alamb commented Sep 16, 2024

I left some thoughts on #12463

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants