Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(mssql): support case-sensitive collations #9700

Merged
merged 9 commits into from
Jul 29, 2024

Conversation

gforsyth
Copy link
Member

@gforsyth gforsyth commented Jul 26, 2024

This should enable support for users working with Microsoft Fabric Warehouse

Description of changes

This updates a few internal table and view calls to use the
case-sensitive names which will allows us to (hopefully) support both
case-sensitive and case-insensitive collations.

The correct casing for the tables and views we use often (and the
corresponding columns):

Info schema tables:

  • INFORMATION_SCHEMA.COLUMNS
  • INFORMATION_SCHEMA.SCHEMATA
  • INFORMATION_SCHEMA.TABLES
    Temp table location: tempdb.dbo
    Catalogs: sys.databases
    Databases: sys.schemas

I also had to swap out a stored procedure that really didn't like
being used with the case-sensitive collation, but I think the
replacement is nicer anyway (we can push down some column selection to
the engine).

Issues closed

@gforsyth gforsyth added the mssql The Microsoft SQL Server backend label Jul 26, 2024
@cpcloud
Copy link
Member

cpcloud commented Jul 26, 2024

Brain is hurting at the prospect of debugging why the 3.10 job passed but not the 3.12 one 🧠

@cpcloud
Copy link
Member

cpcloud commented Jul 26, 2024

Well, locally 3.12.4 (the same version as in CI) is all green for me with this PR 🤔

@cpcloud
Copy link
Member

cpcloud commented Jul 26, 2024

Deleting my sqlserver image, because ms is known to alter the image under that tag without notice.

@cpcloud
Copy link
Member

cpcloud commented Jul 26, 2024

Re-running CI, to see if that was a fluke run. Not sure if a successful run is more concerning, but we'll have some additional information!

@gforsyth
Copy link
Member Author

I'm also having some weird behavior locally. Initially things were green on my local, now they're not. I might add in a few checks to ensure that the collation is being respected

@gforsyth
Copy link
Member Author

Not sure if a successful run is more concerning, but we'll have some additional information!

well... it succeeded this time...

@gforsyth gforsyth marked this pull request as draft July 26, 2024 16:47
@gforsyth
Copy link
Member Author

I'm marking this as a draft until I can figure out how to make our testing setup behave deterministically w.r.t. collation

@gforsyth
Copy link
Member Author

Wow I hate everything.

We use a docker health-check in the compose.yaml file to create the ibis_testing catalog so that it is there for us to insert data into when we start pytest.

mssql doesn't set collation until it's done restoring/creating various backups, so if the health check fires before the collation is set, then it gets set to the default collation, and you can't change that collation without removing the volume.

@cpcloud
Copy link
Member

cpcloud commented Jul 29, 2024

Thanks for tracking that down. @ncclementi and I observed this behavior but struggled to get to the bottom of it in a reasonable amount of time.

What is the MS recommended sequence of steps for creating a database with a specific collation such that it isn't defeated by some other MS SQL process?

@gforsyth
Copy link
Member Author

What is the MS recommended sequence of steps for creating a database with a specific collation such that it isn't defeated by some other MS SQL process?

I really don't know. Right now my only plan is to delay the health check...

@gforsyth
Copy link
Member Author

This is a semi-brittle fix to the collation race condition, but it seems to work consistently on my machine and CI?

@gforsyth gforsyth marked this pull request as ready for review July 29, 2024 18:49
The default collation in MSSQL is SQL_Latin1_General_CP1_CI_AS
where the CI stands for Case Insensitive.
We use a case-sensitive collation for testing so that we don't
break users with case-sensitive collations.
This updates a few internal table and view calls to use the
case-sensitive names which will allows us to (hopefully) support both
case-sensitive and case-insensitive collations.

The correct casing for the tables and views we use often (and the
corresponding columns):

Info schema tables:
- INFORMATION_SCHEMA.COLUMNS
- INFORMATION_SCHEMA.SCHEMATA
- INFORMATION_SCHEMA.TABLES
Temp table location: tempdb.dbo
Catalogs: sys.databases
Databases: sys.schemas

I also had to swap out a stored procedure that _really_ didn't like
being used with the case-sensitive collation, but I think the
replacement is nicer anyway (we can push down some column selection to
the engine).
Yes this is incredibly gross, but even specifying the collation in the
`CREATE DATABASE` call doesn't work reliably.
compose.yaml Outdated Show resolved Hide resolved
@cpcloud
Copy link
Member

cpcloud commented Jul 29, 2024

I have an idea: let's try to create database as part of the conftest.py setup instead of the health check hack (guilty as charged on that front).

@cpcloud
Copy link
Member

cpcloud commented Jul 29, 2024

Ok, I think I might have something working!

Comment on lines +41 to +45
def _load_data(self, *, database: str = IBIS_TEST_MSSQL_DB, **_):
with self.connection._safe_raw_sql(
"IF DB_ID('ibis_testing') is NULL BEGIN CREATE DATABASE [ibis_testing] END"
):
pass
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I was getting cryptic user errors when I tried this, but I must've made a syntax error somewhere

Copy link
Member

@cpcloud cpcloud Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably was the other thing I had to change (if the issue was connecting), database=None doesn't do what you might expect when connecting.

@cpcloud
Copy link
Member

cpcloud commented Jul 29, 2024

Locally all passing with

pytest -m mssql -n auto --dist loadgroup --snapshot-update

@@ -35,11 +36,23 @@
("DATETIME", dt.Timestamp(scale=3)),
# Characters strings
("CHAR", dt.string),
("TEXT", dt.string),
param(
"TEXT",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know that the TEXT type doesn't support modern text.

@cpcloud
Copy link
Member

cpcloud commented Jul 29, 2024

This seems like a bug fix or a feature given that a thing that was previously not working is now working, also 🥳 for that!

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, ship it!

@cpcloud cpcloud changed the title refactor(mssql): support case-sensitive collations feat(mssql): support case-sensitive collations Jul 29, 2024
@cpcloud cpcloud enabled auto-merge (squash) July 29, 2024 21:02
@cpcloud cpcloud merged commit 9382a0e into ibis-project:main Jul 29, 2024
83 checks passed
@gforsyth gforsyth deleted the mssql_case_sensitive branch July 30, 2024 14:17
@cpcloud cpcloud added this to the 9.3 milestone Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mssql The Microsoft SQL Server backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Support for Microsoft Fabric Warehouse
2 participants