Skip to content

Fix escaping in metadata calls to SQL Server driver#14286

Merged
hashhar merged 1 commit intotrinodb:masterfrom
vlad-lyutenko:vlad-lyutenko/get-columns-escape-brackets
Oct 4, 2022
Merged

Fix escaping in metadata calls to SQL Server driver#14286
hashhar merged 1 commit intotrinodb:masterfrom
vlad-lyutenko:vlad-lyutenko/get-columns-escape-brackets

Conversation

@vlad-lyutenko
Copy link
Copy Markdown
Contributor

Description

I was able to rename table to contain special brackets:
alter table test2 rename to "[test2]";
show tables;
Table
----------
[test2]

But looks like we don't support to query such tables:
select * from "[test2]";
uery 20220924_171606_00044_winm8 failed: Table 'dbo.[test2]' has no supported columns (all 0 columns are not supported)

I checked that we cannot select tables with brackets in names because we didn't escape properly these table names:
because in metadata getColumns method, this parameter is used as SQL Like pattern and we should escape special characters

So I made a simple fix, by escaping brackets and now was I able to query such tables in trino:
trino:dbo> SELECT * from sqlserver.dbo."[test3]";
column1 | column2

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Comment on lines 1155 to 1156
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not needed for any SQL spec-conformant database.
It looks like something required for SQL Server to work correctly.
Am i correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi findepi changed the title Escape properly brackets in get columns call Fix escaping in metadata calls to SQL Server driver Sep 29, 2022
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the call to super is correct only because you know what exactly the implementation is doing.
since you know this, you can as well inline

i'd inline here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under inline you mean:
return super.escapeObjectNameForMetadataQuery(name, escape)
.replace("]", escape + "]")
.replace("[", escape + "[");
?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant replacing super.escapeObjectNameForMetadataQuery with the contents of that method.

cc @hashhar

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW is there some SQL Server docs documenting the deviation from JDBC specification?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM.
Should this PR include any test changes as well?

@vlad-lyutenko
Copy link
Copy Markdown
Contributor Author

Code LGTM.
Should this PR include any test changes as well?

Yes I think, we should add some tests, which for example create table with brackets in table name and then query this table, it was the fail scenario before this PR

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about renaming and reusing io.trino.testing.BaseConnectorTest#testColumnNameTestData?
What about having this as a generic test (in BCT) to guard all connectors against "tricky table names"?

We probably would want separate method to filterColumnNameTestData method, since column naming rules are not necessarily same as table naming rules in the remote system. For example, some databases store tables in files with names after table names, so file system naming constraints apply.

cc @hashhar @ebyhr @kokosing

Copy link
Copy Markdown
Contributor Author

@vlad-lyutenko vlad-lyutenko Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about renaming and reusing io.trino.testing.BaseConnectorTest#testColumnNameTestData?

I think these rules a little bit different for table name and column name, so data provider should be different for such cases and tests should be different I plan to use this test (column name test) in this PR:
#14272

By overriding testColumnNameTestData

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about having this as a generic test (in BCT) to guard all connectors against "tricky table names"?

Maybe we can move this test to BCT and then just override Data Provider for SQLServer

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test tricky names for all connectors - would help make sure of catching other "special" databases as well.

I'd propose keeping test + data provider both in BCT and see if something else fails.

As for column and table name rules being different we can have single data provider + two different filter methods so that other connectors can either filter for column names or table names depending on what is not supported.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these rules a little bit different for table name and column name, so data provider should be different for such cases and tests should be different

yes, but source of "tricky names" can be the same

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. I mean single data provider + two filter methods (one for table and one for columns) - both do not filter anything in default impl.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely, no unified filtering (#14286 (comment))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay to doing this in a followup.
let's add a TODO comment on this method.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: to match existing tests having similar name (makes finding related tests easier).

Suggested change
public void testCreateTableWithTrickyName(String tableName)
public void testCreateTableWithSpecialCharacterName(String tableName)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to BaseConnectorTest (either in this PR or as follow-up - since I expect this to fail in lots of places (ClickHouse, Hive/Iceberg/Delta etc.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep let's do it in follow up PR

Comment on lines 194 to 197
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'd recommend inlining this since there are lot of other tests which do same thing.

As a follow-up you can add a static helper + replace other usages in one go.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this example the brackets don't seem significant. i.e. i think this is testing ; instead of brackets.

i.e. a test case's significance should either be obvious or added as a comment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also test unpaired brackets i.e. only single [ or single ].

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about " since they are quoting characters in lot of databases.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for ` since it's quoting character in some databases as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you know whether SQL Server supports create table, so this condition is redundant

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testCreate -> testCreateAndDrop -- you test also that DROP works, and it's reasonable to test that

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.replace("\"\"", "\"") looks fishy here.
QueryRunner.tableExists() should be given a table name as it is, with no "SQL-like formatting" or any other escapes applied

are you working around some bug?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it's kind of workaround because for this test :
.add("tri\"\"cky")

When we create table using trino if we want create table with such name:tri"ckywe need to pass to trino double double quotes like tri""cky in another way trino parser will not pass it and then it replace double double quotes with double quotes "" -> " so final table name in Database will be tri"cky as we want. But when we query this table with :
assertTrue(getQueryRunner().tableExists(getSession(), tableName));

We need to pass strict table name which we expect because it's not going through the trino parser engine it's kind of checking table metadata, so it expect strict column name with one double quote

Copy link
Copy Markdown
Contributor Author

@vlad-lyutenko vlad-lyutenko Oct 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in BCT we have smth like this, I think It could help also in my case:

    private static String toColumnNameInSql(String columnName, boolean delimited)
    {
        String nameInSql = columnName;
        if (delimited) {
            nameInSql = "\"" + columnName.replace("\"", "\"\"") + "\"";
        }
        return nameInSql;
    }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we need to distinguish "actual table name" and "table name in SQL" in this test.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some " + " are redundant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay to doing this in a followup.
let's add a TODO comment on this method.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, why not immediately add all the test cases from

        return ImmutableList.<String>builder()
                .add("lowercase")
                .add("UPPERCASE")
                .add("MixedCase")
                .add("an_underscore")
                .add("a-hyphen-minus") // ASCII '-' is HYPHEN-MINUS in Unicode
                .add("a space")
                .add("atrailingspace ")
                .add(" aleadingspace")
                .add("a.dot")
                .add("a,comma")
                .add("a:colon")
                .add("a;semicolon")
                .add("an@at")
                .add("a\"quote")
                .add("an'apostrophe")
                .add("a`backtick`")
                .add("a/slash`")
                .add("a\\backslash`")
                .add("adigit0")
                .add("0startwithdigit")
                .build();

?

are these cases superior over the list of cases from io.trino.testing.BaseConnectorTest#testColumnNameTestData?

having these two lists out of sync today will make test unification harder tomorrow, since someone will need to think about every each of these cases and think whether to add them to a merged result

it would be better to do this hard thinking part within this PR

  • enhance cases in io.trino.testing.BaseConnectorTest#testColumnNameTestData method covering names with brackets
    • i don't think we need cases containing brackets AND apostrophes at the same time, but if you think these are valuable, let's add them too
  • have a test here with exact same list of test cases to cover and with a TODO comment to move it to BCT.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this is used for table names only, so let's name it as such

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"filter" as in "include" or "exclude"?

but why?
is table name upper/lower case story actually different from column name upper/lower?
Until we do #17, all names are effectively lowercased.

Copy link
Copy Markdown
Contributor Author

@vlad-lyutenko vlad-lyutenko Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have such check in MetadataUtil.tableExists :

 public static void checkObjectName(String catalogName, String schemaName, String objectName)
    {
        checkLowerCase(catalogName, "catalogName");
        checkLowerCase(schemaName, "schemaName");
        checkLowerCase(objectName, "objectName");
    }

so we should have only lower case in table name, or we got exception

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of filtering, we can know the table name will be effectively lowercased and pass that to tableExists call
this way we would add test coverage for the future when #17 is complete.
(i doubt we can hope person doing #17 would know to update the test code here and remove this filtering)

Copy link
Copy Markdown
Contributor Author

@vlad-lyutenko vlad-lyutenko Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi
So should we lowercase table name in test and pass it to table exists?
Because in table exists code, we didn't lowercase anything, we just check is it lower case or not and if not, throw exception.

If we lower case it in test, the person after doing #17, should also remove this lowercasing in test.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can do this

String tableNameInSql = "\"" + tableName.replace("\"", "\"\"") + "\"";
// Until https://github.com/trinodb/trino/issues/17 the table name is effectively lowercase
tableName = tableName.toLowerCase(ENGLISH);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concat here will make addressing the above //TODO move test to BaseConnectorTest and merge this data provider with testColumnNameDataProvider harder.
A person doing that will need to think why table names have more test cases than column names, and what do with that.

Which elements from testTableNameDataProvider() actually add more value compared with what testColumnNameDataProvider() already provides?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add the comment like this:
//TODO merge TableNameDataProvider and ColumnNameDataProvider to ObjectNameDataProvider in BaseConnectorTest

So I expect that person will move the test to BCT and also merge this 2 data provider in one common ObjectNameDataProvider and then just applied 2 filters to column and table names.
I think it could break a lot in other connectors, maybe lets do it in follow up PR and now just don't touch BCT

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I expect that person will move the test to BCT and also merge this 2 data provider in one common ObjectNameDataProvider

i still don't know why would i merge the list instead of using the longer (better?) one everywhere.

Which elements from testTableNameDataProvider() actually add more value compared with what testColumnNameDataProvider() already provides?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By merging I mean that person will take 2 DataProviders, replace it with one ObjectNameDataProvider and replace 2 lists of special characters with one big list with all cases, which will be used everywhere

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this add over "a space" case?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a closing bracket, but the word says "open"

Suggested change
.add("single]openbrackets")
.add("open[bracket")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this add over an opening bracket and closing bracket cases below?

(we don't want to run more cases than necessary, since each takes time)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//TODO move test to BaseConnectorTest
// TODO move test to BaseConnectorTest

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//TODO replace TableNameDataProvider and ColumnNameDataProvider with ObjectNameDataProvider
// TODO replace TableNameDataProvider and ColumnNameDataProvider with ObjectNameDataProvider

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case all other lines will not be highlighted as TODO in Intellij

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can intend them with one more space (// blah ...) and then they will

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, didn't know

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/get-columns-escape-brackets branch 2 times, most recently from a6854ed to e2c5f33 Compare October 3, 2022 12:34
@findepi
Copy link
Copy Markdown
Member

findepi commented Oct 3, 2022

@vlad-lyutenko please squash the commits. The test and the fix should typically be in same commit.

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/get-columns-escape-brackets branch from e2c5f33 to 7a35e2a Compare October 3, 2022 14:48
Copy link
Copy Markdown
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. LGTM.

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/get-columns-escape-brackets branch from 7a35e2a to c350eaa Compare October 3, 2022 17:57
@hashhar hashhar merged commit 19394a6 into trinodb:master Oct 4, 2022
@github-actions github-actions bot added this to the 399 milestone Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants