Skip to content

Support sheet table function for google sheets connector#15829

Merged
findepi merged 5 commits intotrinodb:masterfrom
grantatspothero:gn/sheetsTableFunction
Mar 6, 2023
Merged

Support sheet table function for google sheets connector#15829
findepi merged 5 commits intotrinodb:masterfrom
grantatspothero:gn/sheetsTableFunction

Conversation

@grantatspothero
Copy link
Copy Markdown
Contributor

@grantatspothero grantatspothero commented Jan 24, 2023

Description

Supports sheet table function for the google sheets connector.

Allows querying google sheets directly by id instead of requiring a metadata sheet mapping.

Also makes the gsheets.metadata-sheet-id config optional since the sheet table function makes it no longer strictly required.

Additional context and related issues

N/A

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Google Sheets
* Add `sheet` table function to the connector. ({issue}`12502`)

@cla-bot cla-bot bot added the cla-signed label Jan 24, 2023
@github-actions github-actions bot added the docs label Jan 24, 2023
@grantatspothero grantatspothero force-pushed the gn/sheetsTableFunction branch 3 times, most recently from c22f047 to caa428b Compare January 25, 2023 20:13
@grantatspothero grantatspothero changed the title Support query table function for google sheets connector Support sheet table function for google sheets connector Jan 25, 2023
@grantatspothero grantatspothero force-pushed the gn/sheetsTableFunction branch 3 times, most recently from 42170f1 to 29aee06 Compare January 25, 2023 20:40
@grantatspothero grantatspothero force-pushed the gn/sheetsTableFunction branch 3 times, most recently from 80ee121 to c776428 Compare January 25, 2023 20:55
@grantatspothero
Copy link
Copy Markdown
Contributor Author

Ready for another review.

@grantatspothero grantatspothero force-pushed the gn/sheetsTableFunction branch 2 times, most recently from 679472e to ffa0595 Compare January 25, 2023 22:25
@grantatspothero grantatspothero force-pushed the gn/sheetsTableFunction branch 2 times, most recently from 7e968a4 to d8abfb7 Compare February 1, 2023 20:17
@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Feb 8, 2023

Could you confirm and fix the commit boundaries? check-commit failed due to a missing symbol.

@grantatspothero
Copy link
Copy Markdown
Contributor Author

Could you confirm and fix the commit boundaries? check-commit failed due to a missing symbol.

Removed toSchemaTableName in one commit too early, fixed.

@anusudarsan
Copy link
Copy Markdown
Member

anusudarsan commented Feb 14, 2023

@kasiafi or @martint can you please do a final review/approve (and merge) this?

@grantatspothero
Copy link
Copy Markdown
Contributor Author

@wendigo mind giving a final set of eyes? @ebyhr already gave a detailed review.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Engine is responsible for serializing the handles, but i don't think it cares about their common interfaces.
Thus i think we can separate SheetsConnectorTableHandle serialization and ignore (not test) whether jsonCodec(SheetsConnectorTableHandle) is able to find the subclasses.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just confirmed, you are correct this bit is not necessary.

Will remove and just test that the specific table handles can be serialized/deserialized.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast of expected is redundant. why not just declare the variable with a more specific type

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed after the above refactor.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does googleSheetIdHere typically look like? are they numeric, alphanumeric?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See TestSheetsPlugin for an example sheet id. For example: 1S625j2oTptRepg6Yci68fCYE1269tdoSjljNOmTgQ3U

Comment on lines 157 to 161
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks verbose. @mosabua @m57lyra is it how we typically format SQL in docs?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the Mozilla SQL style ideally .. in this case I would confirmhow we format in all other docs .. but I think this is right

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this formatting compatible with how Trino SQL formatter would format this?

Copy link
Copy Markdown
Contributor Author

@grantatspothero grantatspothero Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this in a debugger:

var statement = new SqlParser().createStatement("SELECT * FROM TABLE(example.system.sheet(id => 'googleSheetIdHere', range => 'TabName!A1:B4'))", new ParsingOptions());

SqlFormatter.formatSql(statement);

The formatter decides to write the query like this:

SELECT *
FROM
  TABLE(example.system.sheet(
      id => 'googleSheetIdHere',
      range => 'TabName!A1:B4'))

I did not know that formatter existed, good to know. Changed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does it take to run this test class?

Copy link
Copy Markdown
Contributor Author

@grantatspothero grantatspothero Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty fast, ~15 seconds on my local machine. most of which is creating the catalog.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about adding a test case in TestSheetsPlugin (instead?)

Copy link
Copy Markdown
Contributor Author

@grantatspothero grantatspothero Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that helps, TestSheetsPlugin runs in 1-2 seconds currently.

Moving the test case in there will just make TestSheetsPlugin take ~15 seconds instead of ~1-2 seconds. I need to not just verify the connector can startup without the metadata sheet id, but also that the connector can run a table function query successfully without a metadata id.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grantatspothero i understand the approach currently taken is best from test coverage perspective.
@nineinchnick and I were recently debating our inherently limited ability to have test coverage, and how much we can test in total on the CI. Looks like there are and will always be limits to that. This made me more cautious about time being added to tests with respect to additional confidence gained when doing so. Ultimately, we need to accept the fact we cannot test everything, which is especially sad to me.

(no change requested here, but I'd still consider not having this test class)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to mention that this is the example catalog..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@grantatspothero grantatspothero force-pushed the gn/sheetsTableFunction branch from 2cafc75 to 154492a Compare March 2, 2023 21:04
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what 2 and 5 are referring to?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error codes for: SHEETS_UNKNOWN_TABLE_ERROR and NOT_FOUND respectively. changed the comment to use the names instead of the codes to be clearer.

@grantatspothero grantatspothero force-pushed the gn/sheetsTableFunction branch 3 times, most recently from d0309f3 to 61c8c5b Compare March 2, 2023 22:10
Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

thanks @grantatspothero for quick response loop

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, now i can read the comment at least :)

however, i don't understand why we would want to throw "not found" within IOException catch block
to me, the current SHEETS_UNKNOWN_TABLE_ERROR looks fine (the IOException is just some error)

if we want to detect that IOException conveys "specific sheet doesn't exist", we could do that (but we wouldnt' say TableNotFound even then, because the requested sheet may not be a table)

if it was up to me to decide, and if no new information is provided to me, i would just delete this TODO comment and worry about that later, and only if we learn later there is something to worry about

@findepi findepi force-pushed the gn/sheetsTableFunction branch from 61c8c5b to d3ed6c2 Compare March 6, 2023 09:00
@findepi
Copy link
Copy Markdown
Member

findepi commented Mar 6, 2023

(rebased to resolve conflict)

@findepi
Copy link
Copy Markdown
Member

findepi commented Mar 6, 2023

/test-with-secrets sha=d3ed6c25f1b3add9cdd97d9590780e24ecc3a38d

@findepi
Copy link
Copy Markdown
Member

findepi commented Mar 6, 2023

CI green. Waiting for secrets run to finish.

@grantatspothero
Copy link
Copy Markdown
Contributor Author

grantatspothero commented Mar 6, 2023

Could not find the secret run here: https://github.com/trinodb/trino/actions?query=event%3Arepository_dispatch

I see other secrets runs from other PRs, but not for the commit d3ed6c25f1b3add9cdd97d9590780e24ecc3a38d

@findepi
Copy link
Copy Markdown
Member

findepi commented Mar 6, 2023

@grantatspothero thanks for reminding me we don't need a run with secrets for this PR

@findepi findepi merged commit cf1960d into trinodb:master Mar 6, 2023
@github-actions github-actions bot added this to the 410 milestone Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

7 participants