Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snowflake: cannot call GetObjects with null catalog #2171

Open
davidhcoe opened this issue Sep 19, 2024 · 12 comments
Open

snowflake: cannot call GetObjects with null catalog #2171

davidhcoe opened this issue Sep 19, 2024 · 12 comments
Labels
Type: bug Something isn't working

Comments

@davidhcoe
Copy link
Contributor

What happened?

The recent updates to the Snowflake driver using SQL-based queries do not allow for querying across catalogs and tables. For example, per

/// \param[in] catalog Only show tables in the given catalog. If NULL,
, I should be able to pass a NULL value for the catalog and ignore filtering by catalog to locate all tables , regardless of catalog, and if the value is an empty string, be able to find the tables that don't belong to a catalog.

The current SQL requires that I already have a database context, and the information_schema is queried that is tied to that catalog that is passed in.

Stack Trace

No response

How can we reproduce the bug?

No response

Environment/Setup

No response

@davidhcoe davidhcoe added the Type: bug Something isn't working label Sep 19, 2024
@davidhcoe
Copy link
Contributor Author

fyi @joellubi, @zeroshade

@joellubi
Copy link
Member

Hi @davidhcoe, can you share the error you're getting?

We explicitly test this case (note that the catalogFilter field is missing which means we're passing a nil pointer, and we expect to find all resources in this case).

I'm wondering if this could be related to snowflake insisting that a default database is set, and perhaps the session is missing one.

@davidhcoe
Copy link
Contributor Author

yes, the specific error is:

[Snowflake] 090105 (22000): Cannot perform SELECT. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.

But looking at the SQL query it doesn't appear to allow for the empty strings or null entries, but I will setup and run the test you pointed to and try it that way as well.

@davidhcoe
Copy link
Contributor Author

davidhcoe commented Sep 20, 2024

The test works, but I think only because the URI already has the database in it which sets the context.

@joellubi
Copy link
Member

The test works, but I think only because the URI already has the database in it which sets the context.

This is true. I believe this is a Snowflake quirk where a default database must be set in order to use unqualified schema names (see USE DATABASE). This makes sense in most cases, but doesn't seem like it should be necessary for special schemas like information_schema.

The only ways I have been able to get this to work have been to use a default database for the session, or to qualify the schema in the query (e.g. database_name.information_schema). Snowflake requires the database name in both cases.

Was this working before? Prior to this change we used SHOW TERSE DATABASES to get the catalogs, which may not have required a database to be set for the session.

I think this can present issues for users aiming to use GetObjects to discover the databases available before connecting to one. Would it make sense to allow users with no database currently set for their session to issue GetObjects(depth=Catalog) and get the result of SHOW TERSE DATABASES to help bridge this gap before they can add a database to the session?

@lidavidm
Copy link
Member

I think that's reasonable. Is it also possible to detect this case and issue an error if the user tries to query the catalog (beyond the databases) without a session set?

Either way it would be good to make sure this is also explained in the documentation

@joellubi
Copy link
Member

Yes I think we can document the behavior and add some informative errors to guide users toward the correct usage.

@davidhcoe
Copy link
Contributor Author

It did work previously. In our scenario, the user can browse their databases and schemas before connecting to a table.

But shouldn’t you be able to call GetObjects with null values even at the column depth and have it bring in all columns across all tables (even if it’s not performant)?

@joellubi
Copy link
Member

You're right, it appears that information_schema.databases must be qualified with a valid database name but shows all databases regardless while information_schema.schemata only enumerates objects within the current database.

I was hoping we could make this a thinner wrapper around existing snowflake functionality, but it doesn't appear that the existing APIs have quite the right semantics. I can work on a change for the next release that allows the search to fan out to all databases matching the search.

@davidhcoe
Copy link
Contributor Author

Thanks. The metadata calls have been a consistent performance bottleneck, particularly as we look to switch from ODBC to ADBC. In ODBC, these calls are consistently less than 1 second but the ADBC calls are usually 2.5+ seconds.

@joellubi
Copy link
Member

Are you able to tell, either from the snowflake query log or from the driver source code if you have access, what APIs the ODBC driver is currently using to achieve this performance?

@davidhcoe
Copy link
Contributor Author

I’ll try to pull together the specifics on Monday.

joellubi pushed a commit that referenced this issue Oct 2, 2024
…atalog depth (#2194)

This PR partially addresses
#2171 by calling SHOW TERSE
DATABASES if GetObjects is called with the catalog depth and a null
catalog is passed.

It does not address other parts of
#2171 where a catalog can be
null at other depths.

---------

Co-authored-by: David Coe <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants