[FEAT] Ellipsize glob scan paths #2809

Merged

Conversation

anmolsingh20
Contributor

@anmolsingh20 anmolsingh20 commented Sep 8, 2024

Resolves #2709

  • Previously, when the logical plan contained many URLs, they cluttered the output of 'df.explain' with a wall of text. Now the path list is ellipsized when there are more than six, which improves readability.
  • The current test emulates multiple URLs by repeating the same test fixture, 'mvp.parquet'. Ideally there would be a test fixture consisting of multiple parts of a Parquet file.
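
For illustration, the ellipsizing described above can be sketched roughly as follows. This is a minimal sketch and not the code in this PR: the helper `ellipsize_paths`, its `threshold` and `keep` parameters, the `"..."` marker, and the example `s3://` paths are all hypothetical, and it assumes the first and last few paths are the ones kept.

```rust
/// Hypothetical helper (not the PR's implementation): if there are more than
/// `threshold` paths, keep the first and last `keep` entries and replace the
/// middle with a single "..." marker.
fn ellipsize_paths(paths: &[String], threshold: usize, keep: usize) -> Vec<String> {
    // Short lists are shown in full. The second check guards against the
    // head and tail slices overlapping when `keep` is large.
    if paths.len() <= threshold || paths.len() <= 2 * keep {
        return paths.to_vec();
    }
    let mut out = Vec::with_capacity(2 * keep + 1);
    out.extend_from_slice(&paths[..keep]);
    out.push("...".to_string());
    out.extend_from_slice(&paths[paths.len() - keep..]);
    out
}

fn main() {
    // Made-up paths, for demonstration only.
    let paths: Vec<String> = (1..=10)
        .map(|i| format!("s3://bucket/part-{i}.parquet"))
        .collect();
    for line in ellipsize_paths(&paths, 6, 3) {
        println!("{line}");
    }
}
```

With ten input paths, a threshold of six and `keep = 3`, this prints the first three paths, one `...` line, and the last three paths.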

@anmolsingh20 anmolsingh20 changed the title ellipsize glob scan paths [FEAT] ellipsize glob scan paths Sep 8, 2024
@github-actions github-actions bot added the enhancement New feature or request label Sep 8, 2024

codspeed-hq bot commented Sep 8, 2024

CodSpeed Performance Report

Merging #2809 will degrade performances by 56.78%

Comparing anmolsingh20:ellipsize_glob_paths_2709 (deb577f) with main (58d1856)

Summary

❌ 2 regressions
✅ 14 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | anmolsingh20:ellipsize_glob_paths_2709 | Change |
| --- | --- | --- | --- |
| test_explain[100 Small Files] | 36.8 ms | 43.5 ms | -15.27% |
| test_show[100 Small Files] | 49.6 ms | 114.7 ms | -56.78% |


codecov bot commented Sep 8, 2024

Codecov Report

Attention: Patch coverage is 97.95918% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (main@7fe3dbc). Learn more about missing BASE report.
Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/daft-scan/src/glob.rs | 91.66% | 1 Missing ⚠️ |
Additional details and impacted files


@@           Coverage Diff           @@
##             main    #2809   +/-   ##
=======================================
  Coverage        ?   63.29%           
=======================================
  Files           ?     1007           
  Lines           ?   114181           
  Branches        ?        0           
=======================================
  Hits            ?    72276           
  Misses          ?    41905           
  Partials        ?        0           

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/daft-scan/src/lib.rs | 62.01% <100.00%> (ø) |
| src/daft-scan/src/glob.rs | 87.35% <91.66%> (ø) |

@anmolsingh20 anmolsingh20 changed the title [FEAT] ellipsize glob scan paths [FEAT] Ellipsize glob scan paths Sep 8, 2024
Member

@samster25 samster25 left a comment


Welcome and thanks for the contribution! Just a minor fix and we should be good to go!

@@ -272,9 +272,28 @@ impl ScanOperator for GlobScanOperator {
    }

    fn multiline_display(&self) -> Vec<String> {
        let condensed_glob_paths = if self.glob_paths.len() <= 6 {
Member


This should be 7; otherwise we take the same number of lines and omit 1 file.
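
The arithmetic can be checked with a small sketch. It assumes a layout of three leading paths, one ellipsis line, and three trailing paths, which is consistent with the comment but not shown in the diff above; the helpers `displayed_lines` and `hidden_paths` are hypothetical and exist only for this illustration.

```rust
/// Lines printed for `n` paths under the assumed layout: all of them when
/// `n <= threshold`, otherwise `keep` head lines, one "..." line, and
/// `keep` tail lines.
fn displayed_lines(n: usize, threshold: usize, keep: usize) -> usize {
    if n <= threshold {
        n
    } else {
        2 * keep + 1
    }
}

/// Paths hidden behind the "..." line under the same assumed layout.
fn hidden_paths(n: usize, threshold: usize, keep: usize) -> usize {
    if n <= threshold {
        0
    } else {
        n.saturating_sub(2 * keep)
    }
}

fn main() {
    let (n, keep) = (7, 3);
    // Compare the original threshold (6) with the suggested one (7).
    for threshold in [6, 7] {
        println!(
            "threshold {threshold}: {} lines shown, {} path(s) hidden",
            displayed_lines(n, threshold, keep),
            hidden_paths(n, threshold, keep)
        );
    }
}
```

Under these assumptions, seven paths take seven lines with either threshold, but a threshold of 6 hides one path without saving a line. Ellipsizing only starts to shorten the output at eight paths.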

Contributor Author


done


        for _ in 0..num_sources {
            sources.push(format!("../../tests/assets/parquet-data/mvp.parquet"));
            // sources.push(format!("File {}", i + 1));
Member


Commented-out line.

Contributor Author


Removed the commented-out line.

On a side note, I am not sure what is causing the failing benchmark tests. The numbers also seem to vary widely, even though there was no change in functionality.

Collaborator


@anmolsingh20 The benchmarks are a known issue right now; you can ignore them for the time being.

@samster25 samster25 merged commit d30e62a into Eventual-Inc:main Sep 9, 2024
34 of 35 checks passed
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

ellipsize glob paths similar to https://github.com/Eventual-Inc/Daft/pull/2695
3 participants