feat: add table sampling support #1421
def format_query(query: str, language: Optional[str] = None):
    dialect = _get_sqlglot_dialect(language)
    statements = transpile(
        query,
        read=dialect,
        write=dialect,
        pretty=True,
    )

    return _statements_to_query(statements)


def get_select_statement_limit(
    statement: Union[exp.Expression, str],
Added these functions for query limiting, but they are not used yet.
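The diff cuts get_select_statement_limit off at its signature, so its body is not shown here. Purely as a loose, hypothetical illustration of what query limiting involves (not the PR's sqlglot-based code), the stdlib-only sketch below reads a trailing LIMIT n from a single SELECT statement and appends or tightens it. The names get_trailing_limit and cap_select_limit are made up, and the regex approach deliberately ignores OFFSET, subqueries, and comments.

```python
import re
from typing import Optional

# Hypothetical: matches only a trailing "LIMIT <n>", optionally followed by ";".
_LIMIT_RE = re.compile(r"\bLIMIT\s+(\d+)\s*;?\s*$", re.IGNORECASE)


def get_trailing_limit(sql: str) -> Optional[int]:
    """Return n if `sql` ends with LIMIT n, else None."""
    match = _LIMIT_RE.search(sql)
    return int(match.group(1)) if match else None


def cap_select_limit(sql: str, max_rows: int) -> str:
    """Make a SELECT return at most max_rows rows (naive text rewrite).

    SELECT-like statements come back without a trailing semicolon;
    anything else is returned unchanged.
    """
    body = sql.strip().rstrip(";").rstrip()
    if not body.lower().startswith(("select", "with")):
        return sql  # only SELECT-like statements are limited
    current = get_trailing_limit(body)
    if current is None:
        return f"{body} LIMIT {max_rows}"  # no limit yet: add one
    if current <= max_rows:
        return body  # existing limit is already tight enough
    return _LIMIT_RE.sub(f"LIMIT {max_rows}", body)  # tighten the limit
```

Working on text keeps the sketch dependency-free; operating on the parsed expression, as the Union[exp.Expression, str] parameter in the diff suggests, is the more robust route.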
const tableSamplingDOM =
    !disabled && TableSamplingConfig.enabled && hasSamplingTables ? (
        <TableSamplingSelector
            sampleRate={sampleRate}
            setSampleRate={onSampleRateChange}
            tooltipPos={runButtonTooltipPos}
        />
    ) : null;
Can this live outside of QueryRunButton? I don't think it should control the UI for TableSamplingSelector; that's the job of the DataCell or QueryComposer.
Will address it in a separate PR that uses the backend limit logic.
    }))
);
const DEFAULT_SAMPLE_RATE = TableSamplingConfig.default_sample_rate;
const TableSamplingSelector: React.FC<{
Move this to another component file.
Will address it in a separate PR that uses the backend limit logic.
Here we add two approaches to table sampling:

Sampled Table
A new, separate sampled version of the original table is created.

Bucketing
No new table is created. Instead, the original table is reorganized so that a small portion of the dataset can be queried with additional filters. This uses the query language's official TABLESAMPLE clause.

If a table supports sampling, we can declare it in the custom_properties column of tabledata_table_information:
Indicates whether this table supports sampling.
The name of the sampled version of the table. This takes priority: if sampled_table is not provided while sampling is true, we'll use the TABLESAMPLE clause.

If any table in the query supports sampling, we'll show the Sample selector to the right of the query engine selector. If a sample rate is selected, we'll transform the query before running it, either by replacing the table with its sampled table or by adding the TABLESAMPLE clause.

table_sampling.mp4
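The flow described above (check custom_properties, prefer a sampled table, otherwise fall back to TABLESAMPLE) can be sketched in Python. This is a hypothetical, stdlib-only illustration, not the PR's code. TableInfo, supports_sampling, has_sampling_tables, sampled_reference, and apply_sampling are made-up names. The custom_properties keys "sampling" and "sampled_table" are inferred from the description's wording. sample_rate is assumed to be a percentage. The emitted clause uses the Trino/Presto form TABLESAMPLE BERNOULLI (p); Hive and Spark spell sampling differently.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableInfo:
    """Hypothetical stand-in for a row of tabledata_table_information."""
    name: str  # fully qualified, e.g. "db.events"
    custom_properties: Dict[str, object]


def supports_sampling(table: TableInfo) -> bool:
    # Assumed key: custom_properties["sampling"] is truthy when sampling is allowed.
    return bool(table.custom_properties.get("sampling"))


def has_sampling_tables(tables: List[TableInfo]) -> bool:
    # The Sample selector is only shown if at least one table qualifies.
    return any(supports_sampling(t) for t in tables)


def sampled_reference(table: TableInfo, sample_rate: float) -> str:
    """Return the text that replaces a reference to `table`."""
    sampled_table = table.custom_properties.get("sampled_table")
    if sampled_table:
        # A pre-built sampled table takes priority over TABLESAMPLE.
        return str(sampled_table)
    # Fallback: Trino/Presto-style clause; sample_rate is a percentage here.
    return f"{table.name} TABLESAMPLE BERNOULLI ({sample_rate:g})"


def apply_sampling(
    query: str, tables: List[TableInfo], sample_rate: Optional[float]
) -> str:
    """Rewrite each sampling-enabled table reference in `query`.

    Naive text substitution: it does not parse SQL, so it does not handle
    aliases or quoting, and it also rewrites matches inside string literals
    or comments.
    """
    if not sample_rate:
        return query  # no rate selected: run the query unchanged
    for table in tables:
        if not supports_sampling(table):
            continue
        # Match the exact name, not part of a longer identifier.
        pattern = rf"(?<![\w.]){re.escape(table.name)}(?![\w.])"
        replacement = sampled_reference(table, sample_rate)
        # A function replacement avoids backslash escapes being interpreted.
        query = re.sub(pattern, lambda _m: replacement, query, flags=re.IGNORECASE)
    return query
```

A production version would rather rewrite the parsed query (for example with sqlglot, which the backend already uses for formatting) so that aliases, quoted identifiers, and string literals are handled correctly.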