feat: add table sampling support #1421

Merged: 7 commits merged on Mar 19, 2024


jczhong84 (Collaborator)

Here we add two approaches to table sampling:

  • Sampled Table
    A separate, sampled version of the original table is created.

  • Bucketing
    No new table is created. Instead, the original table is reorganized so that a query with additional filters reads only a small portion of the dataset. This approach uses the SQL language's native TABLESAMPLE clause.

  • If a table supports sampling, mark it in the custom_properties column of the data_table_information table:

    • sampling: bool
      Indicating if this table supports sampling or not.
    • sampled_table: str
      Name of the sampled version of the table. This takes priority: if sampled_table is not provided while sampling is true, we use the TABLESAMPLE clause instead.
  • If any table in the query supports sampling, we show the Sample selector to the right of the query engine selector.

  • If a sample rate is selected, we transform the query before running it: each sampling-enabled table is either replaced with its sampled table or given a TABLESAMPLE clause.
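The per-table decision described in the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not Querybook's implementation: the helper names (`supports_sampling`, `has_sampling_tables`, `sampled_reference`, `sampling_replacements`) are hypothetical, the sample rate is assumed to be a percentage, and the Hive/Spark-style `TABLESAMPLE (n PERCENT)` syntax is an assumption, since the exact TABLESAMPLE form differs between engines.

```python
from typing import Any, Dict, Iterable, Mapping, Optional

# Assumed shape of data_table_information.custom_properties for one table,
# e.g. {"sampling": True, "sampled_table": "db.events_sampled"}.
TableProperties = Mapping[str, Any]


def supports_sampling(props: Optional[TableProperties]) -> bool:
    """True only when custom_properties explicitly sets sampling to True."""
    return bool(props) and props.get("sampling") is True


def has_sampling_tables(
    tables: Iterable[str], props_by_table: Mapping[str, TableProperties]
) -> bool:
    """Show the Sample selector if any table referenced by the query supports sampling."""
    return any(supports_sampling(props_by_table.get(t)) for t in tables)


def sampled_reference(
    table: str, props: Optional[TableProperties], sample_rate: Optional[float]
) -> str:
    """Return the text that should stand in for `table` when the query runs."""
    if not sample_rate or not supports_sampling(props):
        return table  # no rate selected, or the table does not support sampling
    sampled_table = props.get("sampled_table")
    if sampled_table:
        return sampled_table  # a pre-built sampled table takes priority
    # Fallback: TABLESAMPLE on the original table. The Hive/Spark-style
    # percentage syntax is an assumption; other engines spell this differently.
    return f"{table} TABLESAMPLE ({sample_rate} PERCENT)"


def sampling_replacements(
    tables: Iterable[str],
    props_by_table: Mapping[str, TableProperties],
    sample_rate: Optional[float],
) -> Dict[str, str]:
    """Map each table whose reference changes to its replacement (hypothetical helper)."""
    replacements = {}
    for table in tables:
        new_ref = sampled_reference(table, props_by_table.get(table), sample_rate)
        if new_ref != table:
            replacements[table] = new_ref
    return replacements


if __name__ == "__main__":
    props = {
        "db.events": {"sampling": True, "sampled_table": "db.events_sampled"},
        "db.users": {"sampling": True},
        "db.orders": {},
    }
    tables = ["db.events", "db.users", "db.orders"]
    print(has_sampling_tables(tables, props))  # True
    print(sampling_replacements(tables, props, 10))
    # {'db.events': 'db.events_sampled', 'db.users': 'db.users TABLESAMPLE (10 PERCENT)'}
```

A real rewrite would apply these replacements to parsed table nodes (for example sqlglot expressions, which transform.py already works with) rather than to raw query text, so that aliases and partial name matches are handled correctly.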

table_sampling.mp4

Comment on lines +22 to +35
def format_query(query: str, language: Optional[str] = None):
    dialect = _get_sqlglot_dialect(language)
    statements = transpile(
        query,
        read=dialect,
        write=dialect,
        pretty=True,
    )

    return _statements_to_query(statements)


def get_select_statement_limit(
    statement: Union[exp.Expression, str],

Collaborator Author

Added these functions for query limiting, but they are not used yet.

querybook/server/lib/query_analysis/transform.py (5 outdated, resolved review threads)
Comment on lines 105 to 113
const tableSamplingDOM =
    !disabled && TableSamplingConfig.enabled && hasSamplingTables ? (
        <TableSamplingSelector
            sampleRate={sampleRate}
            setSampleRate={onSampleRateChange}
            tooltipPos={runButtonTooltipPos}
        />
    ) : null;

Collaborator

Can this live outside of QueryRunButton? I don't think it should control the UI for TableSamplingSelector; that's the job of the data cell or the query composer.

Collaborator Author

Will address it in a separate PR to use backend limit logic.

}))
);
const DEFAULT_SAMPLE_RATE = TableSamplingConfig.default_sample_rate;
const TableSamplingSelector: React.FC<{
Collaborator

Move this to a separate component file.

Collaborator Author

Will address it in a separate PR to use backend limit logic.

querybook/webapp/const/metastore.ts (outdated, resolved)
requirements/parser/sqlglot.txt (outdated, resolved)
czgu previously approved these changes on Mar 19, 2024
jczhong84 merged commit 4287294 into pinterest:master on Mar 19, 2024
5 checks passed
2 participants