Merge string-view2 branch: reading from parquet up to 2x faster for some ClickBench queries (not on by default) #11667

Merged: 20 commits, Jul 29, 2024

Commits on Jul 16, 2024

  1. 987e33b
  2. Update for deprecated method

    alamb committed Jul 16, 2024
    2c808fb

Commits on Jul 19, 2024

  1. Add a config to force using string view in benchmark (#11514)

    * add a knob to force string view in benchmark
    
    * fix sql logic test
    
    * update doc
    
    * fix ci
    
    * fix ci only test
    
    * Update benchmarks/src/util/options.rs
    
    Co-authored-by: Andrew Lamb
    
    * Update datafusion/common/src/config.rs
    
    Co-authored-by: Andrew Lamb
    
    * update tests
    
    ---------
    
    Co-authored-by: Andrew Lamb
    XiangpengHao and alamb authored Jul 19, 2024
    8d8732c
  2. Add String view helper functions (#11517)

    * add functions
    
    * add tests for hash util
    XiangpengHao authored Jul 19, 2024
    8e0ca1a
  3. Add ArrowBytesViewMap and ArrowBytesViewSet (#11515)

    * Update `string-view` branch to arrow-rs main (#10966)
    
    * Pin to arrow main
    
    * Fix clippy with latest arrow
    
    * Uncomment test that needs new arrow-rs to work
    
    * Update datafusion-cli Cargo.lock
    
    * Update Cargo.lock
    
    * tapelo
    
    * merge
    
    * update cast
    
    * consistent dep
    
    * fix ci
    
    * add more tests
    
    * make doc happy
    
    * update new implementation
    
    * fix bug
    
    * avoid unused dep
    
    * update dep
    
    * update
    
    * fix cargo check
    
    * update doc
    
    * pick up the comments change again
    
    ---------
    
    Co-authored-by: Andrew Lamb
    XiangpengHao and alamb authored Jul 19, 2024
    db65772

Commits on Jul 20, 2024

  1. Enable GroupValueBytesView for aggregation with StringView types (#11519)
    
    * add functions
    
    * Update `string-view` branch to arrow-rs main (#10966)
    
    * Pin to arrow main
    
    * Fix clippy with latest arrow
    
    * Uncomment test that needs new arrow-rs to work
    
    * Update datafusion-cli Cargo.lock
    
    * Update Cargo.lock
    
    * tapelo
    
    * merge
    
    * update cast
    
    * consistent dep
    
    * fix ci
    
    * avoid unused dep
    
    * update dep
    
    * update
    
    * fix cargo check
    
    * better group value view aggregation
    
    * update
    
    ---------
    
    Co-authored-by: Andrew Lamb
    XiangpengHao and alamb authored Jul 20, 2024
    efcf5c6

Commits on Jul 22, 2024

  1. Initial support for regex_replace on StringViewArray (#11556)

    * initial support for string view regex
    
    * update tests
    XiangpengHao authored Jul 22, 2024
    34d42bc
  2. Add support for Utf8View for date/temporal codepaths (#11518)

    * Add StringView support for date_part and make_date funcs
    
    * run cargo update in datafusion-cli
    
    * cargo fmt
    
    ---------
    
    Co-authored-by: Andrew Lamb
    a10y and alamb authored Jul 22, 2024
    bb780b3

Commits on Jul 25, 2024

  1. GC StringViewArray in CoalesceBatchesStream (#11587)

    * gc string view when appropriate
    
    * make clippy happy
    
    * address comments
    
    * make doc happy
    
    * update style
    
    * Add comments and tests for gc_string_view_batch
    
    * better heuristic
    
    * update test
    
    * Update datafusion/physical-plan/src/coalesce_batches.rs
    
    Co-authored-by: Andrew Lamb
    
    ---------
    
    Co-authored-by: Andrew Lamb
    XiangpengHao and alamb authored Jul 25, 2024
    2b58fd5

Commits on Jul 26, 2024

  1. 2b2b8ab
  2. ea11a9d
  3. [Bug] fix bug in return type inference of utf8_to_int_type (#11662)

    * fix bug in return type inference
    
    * update doc
    
    * add tests
    
    ---------
    
    Co-authored-by: Andrew Lamb
    XiangpengHao and alamb authored Jul 26, 2024
    f13bb82
  4. fb79638
  5. Fix clippy

    alamb committed Jul 26, 2024
    281fbed

Commits on Jul 27, 2024

  1. Increase ByteViewMap block size to 2MB (#11674)

    * better default block size
    
    * fix related test
    XiangpengHao authored Jul 27, 2024
    5690712
  2. Change --string-view to only apply to parquet formats (#11663)

    * use inferred schema, don't load schema again
    
    * move config to parquet-only
    
    * update
    
    * update
    
    * better format
    
    * format
    
    * update
    XiangpengHao authored Jul 27, 2024
    322c3d2
  3. Implement native StringView support for character length (#11676)

    * native support for character length
    
    * Update datafusion/functions/src/unicode/character_length.rs
    
    ---------
    
    Co-authored-by: Andrew Lamb
    XiangpengHao and alamb authored Jul 27, 2024
    ab8005d

Commits on Jul 29, 2024

  1. 561aee8
  2. Remove unneeded patches

    alamb committed Jul 29, 2024
    2e9c8a0
  3. cargo fmt

    alamb committed Jul 29, 2024
    f1f22fa