-
Notifications
You must be signed in to change notification settings - Fork 0
Alamb/suggestions #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…e#19622) ## Which issue does this PR close? - Closes apache#19591 ## Rationale for this change Explained in issue itself ## What changes are included in this PR? - Removed coalesce batches rule - Deprecate `CoalesceBatchesExec` ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, added a deprecation tag on `CoalesceBatchesExec`
… indexing (apache#19590) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> This PR improves the performance of the `substring_index` function by optimizing delimiter search and substring extraction: - Single-byte fast path: introduces a specialized byte-based search for single-byte delimiters (e.g. `.`, `,`), avoiding UTF-8 pattern matching overhead. - Efficient index discovery: replaces the split-and-sum-length approach with direct index location using `match_indices` / `rmatch_indices`. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - Added a fast path for `delimiter.len() == 1` using byte-based search. - Refactored the general path to use `match_indices` and `rmatch_indices` for more efficient positioning. ### Benchmarks - Single-byte delimiter benchmarks show ~2–3× speedup across batch sizes. - Multi-byte delimiters see a consistent ~10–15% improvement. ``` group main_substrindex perf_substrindex ----- ---------------- ---------------- substr_index/substr_index_10000_long_delimiter 1.12 548.2±18.39µs ? ?/sec 1.00 488.4±15.62µs ? ?/sec substr_index/substr_index_10000_single_delimiter 2.14 543.4±15.12µs ? ?/sec 1.00 254.0±7.80µs ? ?/sec substr_index/substr_index_1000_long_delimiter 1.12 43.1±1.63µs ? ?/sec 1.00 38.6±2.03µs ? ?/sec substr_index/substr_index_1000_single_delimiter 3.51 46.4±2.21µs ? ?/sec 1.00 13.2±0.99µs ? ?/sec substr_index/substr_index_100_long_delimiter 1.01 3.7±0.18µs ? ?/sec 1.00 3.7±0.20µs ? ?/sec substr_index/substr_index_100_single_delimiter 2.15 3.6±0.14µs ? ?/sec 1.00 1675.9±79.15ns ? ?/sec ``` ## Are these changes tested? - Yes, Existing unit tests pass. - New benchmarks added to verify performance improvement. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? No. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
## Which issue does this PR close? - Closes apache#19623 ## Rationale for this change Explained in issue ## What changes are included in this PR? Implement baseline metrics for `AsyncFuncExec` operator ## Are these changes tested? Have added a test with PR ## Are there any user-facing changes? There are user facing changes, now `explain analyze` will show metrics of `AsyncFuncExec` too, do we need to update any documentation regarding this? Asking cause I couldn't find anything on a quick search. --------- Co-authored-by: Yongting You <[email protected]>
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Part of apache#14763. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? Replace `TypeSignature::Exact` patterns with cleaner APIs: - `isnan/iszero`: `Signature::coercible` with `TypeSignatureClass::Float` - `nanvl`: `Signature::uniform(2, [Float32, Float64])` <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Yes <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Jeffrey Vo <[email protected]>
…ructors (apache#19668) ### Which issue does this PR close? Closes apache#19664 --- ### Rationale for this change After apache#18880, `BatchPartitioner::try_new` gained additional parameters that are only relevant for round-robin repartitioning. This made the constructor API confusing, as hash repartitioning received parameters it does not use. Splitting the constructor improves clarity and avoids passing round-robin–specific parameters to hash partitioning. --- ### What changes are included in this PR? - Introduce `BatchPartitioner::try_new_hash` - Introduce `BatchPartitioner::try_new_round_robin` - Refactor callers to use the specialized constructors - Retain `BatchPartitioner::try_new` as a delegator for backward compatibility This is a pure refactor; behavior is unchanged. --- ### Are these changes tested? Yes. Existing tests cover this code path. All builds pass locally. --- ### Are there any user-facing changes? No. This change is internal only and does not affect user-facing behavior or APIs. --------- Co-authored-by: Your Name <[email protected]>
…ng decoding (apache#19545) ## Which issue does this PR close? * Closes apache#18560. ## Rationale for this change DataFusion’s Parquet row-level filter pushdown previously rejected all nested Arrow types (lists/structs), which prevented common and performance-sensitive filters on list columns (for example `array_has`, `array_has_all`, `array_has_any`) from being evaluated during Parquet decoding. Enabling safe pushdown for a small, well-defined set of list-aware predicates allows Parquet decoding to apply these filters earlier, reducing materialization work and improving scan performance, while still keeping unsupported nested projections (notably structs) evaluated after batches are materialized. ## What changes are included in this PR? * Allow a registry of list-aware predicates to be considered pushdown-compatible: * `array_has`, `array_has_all`, `array_has_any` * `IS NULL` / `IS NOT NULL` * Introduce `supported_predicates` module to detect whether an expression tree contains supported list predicates. * Update Parquet filter candidate selection to: * Accept list columns only when the predicate semantics are supported. * Continue rejecting struct columns (and other unsupported nested types). * Switch Parquet projection mask construction from root indices to **leaf indices** (`ProjectionMask::leaves`) so nested list filters project the correct leaf columns for decoding-time evaluation. * Expand root column indices to leaf indices for nested columns using the Parquet `SchemaDescriptor`. * Add unit tests verifying: * List columns are accepted for pushdown when used by supported predicates. * Struct columns (and mixed struct+primitive predicates) prevent pushdown. * `array_has`, `array_has_all`, `array_has_any` actually filter rows during decoding using a temp Parquet file. * Add sqllogictest coverage proving both correctness and plan behavior: * Queries return expected results. * `EXPLAIN` shows predicates pushed into `DataSourceExec` for Parquet. ## Are these changes tested? Yes. * Rust unit tests in `datafusion/datasource-parquet/src/row_filter.rs`: * Validate pushdown eligibility for list vs struct predicates. * Create a temp Parquet file and confirm list predicates prune/match the expected rows via Parquet decoding row filtering. * SQL logic tests in `datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt`: * Add end-to-end coverage for `array_has`, `array_has_all`, `array_has_any` and combinations (OR / AND with other predicates). * Confirm pushdown appears in the physical plan (`DataSourceExec ... predicate=...`). ## Are there any user-facing changes? Yes. * Parquet filter pushdown now supports list columns for the following predicates: * `array_has`, `array_has_all`, `array_has_any` * `IS NULL`, `IS NOT NULL` This can improve query performance for workloads that filter on array/list columns. No breaking changes are introduced; unsupported nested types (for example structs) continue to be evaluated after decoding. ## LLM-generated code disclosure This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested. --------- Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Qi Zhu <[email protected]>
…001` (apache#19666) ## Which issue does this PR close? - Part of apache#19656 ## Rationale for this change rust_decimal is a one person crate and is released somewhat infrequently * https://github.com/paupino/rust-decimal * https://crates.io/crates/rust_decimal It also uses a non trivial number of dependencies, including https://crates.io/crates/rkyv, some sort of zero copy deserialization framework that was recently subject to a RUSTSEC security advisory, see apache#19656 / apache#19657 Since `rust_decimal` is only used for sqllogictests to parse the results from postgres, we can pretty easily remove the dependency on `rust_decimal` and inline the very small amount functionality we need for sqllogictests This will both decrease the build time and our dependency trail. ## What changes are included in this PR? Removes the `rust_decimal` dependency from DataFusion and inlines the smallest required subset of decimal functionality we need for sqllogictests (which turns out to be pretty small) ## Are these changes tested? Yes by CI ## Are there any user-facing changes? No, this is all internal testing infrastructure
) (apache#19319) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #apache#19141. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Sergey Zhukov <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
… SimplifyInfo (apache#19505) In trying to fix apache#19418 I kept getting turned around about what was needed where. The `SimplifyInfo` trait made it extra hard to understand. I ended up realizing that the main reason for the trait to exist was tests. Removing the trait and adding a builder style API to `SimplifyContext` made it IMO more ergonomic for tests and other call sites, easier to track the code (no trait opaqueness) and clearer what simplification capabilities are available in each site. This got rid of e.g. some places where we were calling `ExecutionProps::new()` just to pass that into `SimplifyContext` which in turn would hand out references to the default query time, a default `ContigOptions`, etc; or in `datafusion/core/src/execution/session_state.rs` where we did `let dummy_schema = DFSchema::empty()`. This let me solve several problems: - Can store optimized logical plans for prepared statements - Users can optionally run an optimizer pass on logical plans without evaluating time functions Compared to apache#19426 this avoids adding a config option and is actually less lines of code (negative diff). Fixes apache#19418, closes apache#19426 (replaces it). --------- Co-authored-by: Claude Opus 4.5 <[email protected]>
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Part of apache#19025. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? - Added Time64/Time32 signatures to date_trunc - Added time truncation logic (hour, minute, second, millisecond, microsecond) - Error for invalid granularities (day, week, month, quarter, year) <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Yes <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Jeffrey Vo <[email protected]>
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> `ParquetOpener::open()` is a critical function for parquet planning, it's the entry point for many major steps like row-group/file pruning. It has almost 400 lines of code now, this PR adds some markers to the code blocks/important steps, to make this function easier to navigate. (though I may have overlooked some critical steps) Ideally, we should break these blocks into utilities. I tried extracting some of them with AI, but the resulting utilities still have unclear semantics, with many input arguments and output items. Overall, the complexity doesn’t seem reduced after the change. I think it’s possible to factor them into helper functions with clear semantics, but that likely requires someone who understands the implementation details very well. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
## Which issue does this PR close? Closes apache#19347 ## Rationale for this change Native decimal log() (added in apache#17023) only supports integer bases. Non-integer bases like log(2.5, x) error out, which is a regression from the previous float-based implementation. ## What changes are included in this PR? Changes : - Fallback to f64 computation when base is non-integer - Integer bases (2, 10, etc.) still use efficient ilog() algorithm Refactoring: - Unified log_decimal32, log_decimal64, log_decimal128 into single generic log_decimal<T> using num_traits::ToPrimitive. - Used ToPrimitive::to_f64() and to_u128() - Invalid bases (≤1, NaN, Inf) now return NaN instead of erroring - matches f64::log behavior Large Decimal256 values that don't fit in i128 now work via f64 fallback. ## Are these changes tested? Yes: - All existing log_decimal* unit tests pass - Updated test_log_decimal128_invalid_base - expects NaN instead of error - Updated test_log_decimal256_large - now succeeds via fallback ## Are there any user-facing changes? Yes: ```sql -- Previously errored, now works SELECT log(2.5, 100::decimal(38,0)); -- Invalid base now returns NaN instead of error (consistent with float behavior) SELECT log(-2, 64::decimal(38,0)); -- Returns NaN --------- Co-authored-by: Jeffrey Vo <[email protected]>
…che#19369) ## Which issue does this PR close? Closes apache#19348 ## Rationale for this change Previously, pow() on decimal types would error for negative exponents and non-integer exponents with messages like: - Arrow error: Arithmetic overflow: Unsupported exp value: -5 - Compute error: Cannot use non-integer exp - This was a regression from when decimals were cast to float before pow(). The efficient integer-based algorithm for computing power on scaled integers cannot handle these cases. ## What changes are included in this PR? - Modified pow_decimal_int to fallback to pow_decimal_float for negative exponents - Modified pow_decimal_float to use an efficient integer path for non-negative integer exponents, otherwise fallback to f64 computation Added pow_decimal_float_fallback function that: - Converts the decimal to f64 - Computes powf(exp) - Converts back to the original decimal type with proper scaling - Added decimal_from_i128 helper to convert i128 results back to generic decimal types (needed for Decimal256 support) - Updated sqllogictests to expect success for negative/non-integer exponents ## Are these changes tested? Yes: Unit tests for pow_decimal_float_fallback covering negative exponents, fractional exponents, cube roots Updated SQL logic tests in decimal.slt ## Are there any user-facing changes? Yes. The following queries now work instead of returning errors: ```sql -- Negative exponent SELECT power(4::decimal(38, 5), -1); -- Returns 0.25 -- Non-integer exponent SELECT power(2.5, 4.2); -- Returns 46.9 -- Square root via power SELECT power(4::decimal, 0.5); -- Returns 2
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Builds on apache#19388 - Closes apache#19573 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> This PR explores one way to make `ListFilesCache` table scoped. A session level cache is still used, but the cache key is made a "table-scoped" path, for which a new struct ``` pub struct TableScopedPath(pub Option<TableReference>, pub Path); ``` is defined. `TableReference` comes from `CreateExternalTable` passed to `ListingTableFactory::create`. Additionally, when a table is dropped, all entries related to a table is dropped by modifying `SessionContext::find_and_deregister` method. Testing (change on adding `list_files_cache()` for cli is included for easier testing). - Testing cache reuse on a single table. ``` > \object_store_profiling summary ObjectStore Profile mode set to Summary > create external table test stored as parquet location 's3://overturemaps-us-west-2/release/2025-12-17.0/theme=base/'; 0 row(s) fetched. Elapsed 14.878 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2) Summaries: +-----------+----------+-----------+-----------+-------------+-------------+-------+ | Operation | Metric | min | max | avg | sum | count | +-----------+----------+-----------+-----------+-------------+-------------+-------+ | Get | duration | 0.030597s | 0.209235s | 0.082396s | 36.254189s | 440 | | Get | size | 204782 B | 857230 B | 497304.88 B | 218814144 B | 440 | | List | duration | 0.192037s | 0.192037s | 0.192037s | 0.192037s | 1 | | List | size | | | | | 1 | +-----------+----------+-----------+-----------+-------------+-------------+-------+ > select table, path, unnest(metadata_list) from list_files_cache() limit 1; +-------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | table | path | UNNEST(list_files_cache().metadata_list) | +-------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | test | release/2025-12-17.0/theme=base | {file_path: release/2025-12-17.0/theme=base/type=bathymetry/part-00000-dd0f2f50-b436-4710-996f-f1b06181a3a1-c000.zstd.parquet, file_modified: 2025-12-17T22:32:50, file_size_bytes: 40280159, e_tag: "15090401f8f936c3f83bb498cb99a41d-3", version: NULL} | +-------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.058 seconds. Object Store Profiling > select count(*) from test where type = 'infrastructure'; +-----------+ | count(*) | +-----------+ | 142969564 | +-----------+ 1 row(s) fetched. Elapsed 0.028 seconds. Object Store Profiling ``` - Test separate cache entries are created for two tables with same path ``` > create external table test2 stored as parquet location 's3://overturemaps-us-west-2/release/2025-12-17.0/theme=base/'; 0 row(s) fetched. Elapsed 14.798 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2) Summaries: +-----------+----------+-----------+-----------+-------------+-------------+-------+ | Operation | Metric | min | max | avg | sum | count | +-----------+----------+-----------+-----------+-------------+-------------+-------+ | Get | duration | 0.030238s | 0.350465s | 0.073670s | 32.414654s | 440 | | Get | size | 204782 B | 857230 B | 497304.88 B | 218814144 B | 440 | | List | duration | 0.133334s | 0.133334s | 0.133334s | 0.133334s | 1 | | List | size | | | | | 1 | +-----------+----------+-----------+-----------+-------------+-------------+-------+ > select count(*) from test2 where type = 'bathymetry'; +----------+ | count(*) | +----------+ | 59963 | +----------+ 1 row(s) fetched. Elapsed 0.009 seconds. Object Store Profiling > select table, path from list_files_cache(); +-------+---------------------------------+ | table | path | +-------+---------------------------------+ | test | release/2025-12-17.0/theme=base | | test2 | release/2025-12-17.0/theme=base | +-------+---------------------------------+ 2 row(s) fetched. Elapsed 0.004 seconds. ``` - Test cache associated with a table is dropped when table is dropped, and the other table with same path is unaffected. ``` > drop table test; 0 row(s) fetched. Elapsed 0.015 seconds. Object Store Profiling > select table, path from list_files_cache(); +-------+---------------------------------+ | table | path | +-------+---------------------------------+ | test2 | release/2025-12-17.0/theme=base | +-------+---------------------------------+ 1 row(s) fetched. Elapsed 0.005 seconds. Object Store Profiling > select count(*) from list_files_cache() where table = 'test'; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row(s) fetched. Elapsed 0.014 seconds. > select count(*) from test2 where type = 'infrastructure'; +-----------+ | count(*) | +-----------+ | 142969564 | +-----------+ 1 row(s) fetched. Elapsed 0.013 seconds. Object Store Profiling ``` - Test that dropping a view does not remove cache ``` > create view test2_view as (select * from test2 where type = 'infrastructure'); 0 row(s) fetched. Elapsed 0.103 seconds. Object Store Profiling > select count(*) from test2_view; +-----------+ | count(*) | +-----------+ | 142969564 | +-----------+ 1 row(s) fetched. Elapsed 0.094 seconds. Object Store Profiling > drop view test2_view; 0 row(s) fetched. Elapsed 0.002 seconds. Object Store Profiling > select table, path from list_files_cache(); +-------+---------------------------------+ | table | path | +-------+---------------------------------+ | test2 | release/2025-12-17.0/theme=base | +-------+---------------------------------+ 1 row(s) fetched. Elapsed 0.007 seconds. ``` ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
|
|
||
| for (path, entry) in list_files_cache.list_entries() { | ||
| table_arr.push(path.table.map_or("NULL".to_string(), |t| t.to_string())); | ||
| table_arr.push(path.table.map(|t| t.to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be nicer to us actual nulls rather than a "NULL" string here
| table_path = table_path | ||
| .with_glob(glob.as_ref())? | ||
| .with_table_ref(cmd.name.clone()); | ||
| table_path = table_path.with_glob(glob.as_ref())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated with_glob to preserve the other fields too
| Self::try_new(self.url, Some(glob)) | ||
| pub fn with_glob(mut self, glob: &str) -> Result<Self> { | ||
| self.glob = | ||
| Some(Pattern::new(glob).map_err(|e| DataFusionError::External(Box::new(e)))?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self::try_new doesn't do anything to the url, so there is no need to reparse the url
c5d173e to
8a9bf43
Compare
|
Superceded by apache#19703 |
These are some code suggestions I had while reviewing the following PR in DataFusion
Merging this PR will update the PR in datafusion
I think it ok to merge apache#19616 without this PR and then merge it as a follow on PR in DataFusion