Prefer explicitly provided dataset over default dataset in lookup #53
@@ -199,9 +199,12 @@ def test_engine_with_dataset(engine_using_test_dataset):
     rows = table_one_row.select().execute().fetchall()
     assert list(rows[0]) == ONE_ROW_CONTENTS_EXPANDED
 
-    # Table name shouldn't include dataset
-    with pytest.raises(Exception):
-        table_one_row = Table('test_pybigquery.sample_one_row', MetaData(bind=engine_using_test_dataset), autoload=True)

This is another change in behavior; however, it seems to me that this is just another bug fix.

That's correct. The lookup code is the same between

+    table_one_row = Table('test_pybigquery.sample_one_row', MetaData(bind=engine_using_test_dataset), autoload=True)

Probably better to check that the expected result is returned.

Could we add an assertion on some expected property of the Table? In this example, we want to make sure that the dataset ID associated with the table == "test_pybigquery" and != whatever the default dataset ID is, right?

+    rows = table_one_row.select().execute().fetchall()
+    # verify that we are pulling from the specifically-named dataset,
+    # instead of pulling from the default dataset of the engine (which
+    # does not have this table at all)
+    assert list(rows[0]) == ONE_ROW_CONTENTS_EXPANDED
 
 
 def test_dataset_location(engine_with_location):

@@ -478,6 +481,14 @@ def test_has_table(engine, engine_using_test_dataset):
     assert engine.has_table('sample', 'test_pybigquery') is True
     assert engine.has_table('test_pybigquery.sample') is True
 
+    assert engine.has_table('sample_alt', 'test_pybigquery_alt') is True
+    assert engine.has_table('test_pybigquery_alt.sample_alt') is True
+
     assert engine_using_test_dataset.has_table('sample') is True
-    with pytest.raises(Exception):
-        assert engine_using_test_dataset.has_table('test_pybigquery.sample') is True
+    assert engine_using_test_dataset.has_table('sample', 'test_pybigquery') is True
+    assert engine_using_test_dataset.has_table('test_pybigquery.sample') is True
+
+    assert engine_using_test_dataset.has_table('sample_alt') is False
+
+    assert engine_using_test_dataset.has_table('sample_alt', 'test_pybigquery_alt') is True
+    assert engine_using_test_dataset.has_table('test_pybigquery_alt.sample_alt') is True

I ran the full test suite with these changes and it passed.
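
For readers following the `has_table` assertions, the lookup precedence they exercise can be sketched in isolation. This is only an illustration under stated assumptions: `resolve_table_ref` is a hypothetical helper, not pybigquery's actual code, and the ordering (dataset in the table name, then the schema argument, then the engine's default dataset) is inferred from the tests and the discussion rather than confirmed from the implementation.

```python
from typing import Optional, Tuple


def resolve_table_ref(
    table_name: str,
    schema: Optional[str] = None,
    default_dataset: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (dataset, table) for a lookup.

    Hypothetical helper for illustration; not part of pybigquery.
    """
    # A dotted table name carries its own dataset, which is used first.
    if "." in table_name:
        dataset, _, table = table_name.rpartition(".")
        return dataset, table
    # Otherwise an explicitly passed schema names the dataset.
    if schema is not None:
        return schema, table_name
    # Only when nothing explicit was given does the default apply.
    return default_dataset, table_name
```

Under this sketch, `resolve_table_ref('sample_alt', default_dataset='test_pybigquery')` resolves against the default dataset, which mirrors why `has_table('sample_alt')` is expected to be `False` on `engine_using_test_dataset`.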

Is giving the dataset in the table name precedence over the schema the behavior in other SQLAlchemy dialects?

Unlike BigQuery, Postgres has a two-level hierarchy (schema and table).
Postgres doesn't seem to do any interpretation of schema from the tablename parameter: https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/dialects/postgresql/base.py#L2599-L2635
Same situation with MySQL:
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/dialects/mysql/base.py#L2458-L2491
Arguably BigQuery is a special situation given its three-level hierarchy (project/dataset/table). Other common databases let you run different database instances on the same server with the same binaries, but BigQuery lets all relevant project_ids be reached over the same connection and joined with each other, so the project and dataset together act more like a two-level schema name than anything else.

Makes sense.
What do you think the behavior should be for BQ?
Raising an error in that case sounds legit to me; the problem is that we would be changing a fundamental behavior.
Anyway, this should probably be documented.

I mean, ideally I think the parsing of the table name should be dropped entirely, and we should only parse the schema. Given that you already parse the table name, it would probably be rude to stop doing so, since there are probably code bases that rely on it.
That said, if we maintain compatibility there, I do think raising an exception when the user gives you conflicting instructions is safer than arbitrarily choosing which one to obey, and wouldn't break any sane existing code.
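
A minimal sketch of the "raise on conflicting instructions" idea, assuming a two-part `dataset.table` name. The names `dataset_for_lookup` and `ConflictingDatasetError` are hypothetical and exist only for this illustration; they are not part of pybigquery or SQLAlchemy.

```python
from typing import Optional


class ConflictingDatasetError(ValueError):
    """The table name and the schema argument name different datasets."""


def dataset_for_lookup(table_name: str, schema: Optional[str] = None) -> Optional[str]:
    """Pick the dataset for a lookup, refusing to guess on a conflict.

    Hypothetical helper for illustration only.
    """
    # Dataset embedded in the table name, if any ("dataset.table").
    embedded = table_name.rpartition(".")[0] or None
    if embedded and schema and embedded != schema:
        raise ConflictingDatasetError(
            f"table name says {embedded!r} but schema says {schema!r}"
        )
    # Agreeing or single-source inputs keep working as before.
    return embedded or schema
```

Existing callers that pass the dataset in only one place, or in both places consistently, would be unaffected under this sketch; only a call such as `dataset_for_lookup('a.sample', 'b')` would raise.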