Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make display of interval types more pretty #6006

Merged
merged 6 commits into from
Jul 16, 2024

Conversation

Rachelint
Copy link
Contributor

@Rachelint Rachelint commented Jul 5, 2024

Which issue does this PR close?

Closes #5914

Rationale for this change

Make display of interval types more pretty.

What changes are included in this PR?

don't print the part in interval when it is zero.

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 5, 2024
@Rachelint Rachelint marked this pull request as ready for review July 5, 2024 13:51
@Rachelint Rachelint changed the title Make interval more pretty Make interval types display more pretty Jul 5, 2024
@Rachelint Rachelint changed the title Make interval types display more pretty Make display of interval types more pretty Jul 5, 2024
@tustvold tustvold added the api-change Changes to the arrow API label Jul 6, 2024
@alamb
Copy link
Contributor

alamb commented Jul 7, 2024

Thanks @Rachelint -- this looks great. I think now that we have merged #6012 (thanks again BTW!) this PR could be merged / rebased and CI would pass

@Rachelint
Copy link
Contributor Author

Rachelint commented Jul 7, 2024

Thanks @Rachelint -- this looks great. I think now that we have merged #6012 (thanks again BTW!) this PR could be merged / rebased and CI would pass

Oh, forget to push yesterday... Have updated now.
Thanks for the review.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Rachelint -- I think this looks great personally.

As @tustvold has signaled, I think this PR changes the semantics enough that we should treat it as a "public API" change and thus hold it for the next major release: #6016

Comment on lines +755 to +771
if first_part {
write!(
f,
"{}{}.{:09} secs",
secs_sign,
secs.abs(),
nanoseconds.abs()
)?;
} else {
write!(
f,
" {}{}.{:09} secs",
secs_sign,
secs.abs(),
nanoseconds.abs()
)?;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably avoid some repetition here if with a constant str. Something like

Suggested change
if first_part {
write!(
f,
"{}{}.{:09} secs",
secs_sign,
secs.abs(),
nanoseconds.abs()
)?;
} else {
write!(
f,
" {}{}.{:09} secs",
secs_sign,
secs.abs(),
nanoseconds.abs()
)?;
}
let prefix = if first_part { " " } else { "" }
write!(
f,
"{prefix}{}{}.{:09} secs",
secs_sign,
secs.abs(),
nanoseconds.abs()
)

However that might be slightly slower now it has to check prefix. You could possibly even use a field like prefix: &'static str instead of bool first_part if you wanted to make it even more concise.

A similar comment applies to the other clauses

I don't think this is required, I just wanted to mention it

@alamb alamb added the next-major-release the PR has API changes and it waiting on the next major version label Jul 7, 2024
@alamb
Copy link
Contributor

alamb commented Jul 7, 2024

Marking as waiting on next major release. @Rachelint feel free to make changes as well; I am marking this as a draft until main opens for 53

@alamb
Copy link
Contributor

alamb commented Jul 16, 2024

Per discussion on #6050 I am going to merge this branch to the 53.0.0-dev feature branch, where we are collecting PRs with breaking changes before master opens

@alamb alamb marked this pull request as ready for review July 16, 2024 22:09
@alamb alamb changed the base branch from master to 53.0.0-dev July 16, 2024 22:09
@alamb alamb merged commit bb5f12b into apache:53.0.0-dev Jul 16, 2024
26 checks passed
@Rachelint
Copy link
Contributor Author

Rachelint commented Jul 17, 2024

Per discussion on #6050 I am going to merge this branch to the 53.0.0-dev feature branch, where we are collecting PRs with breaking changes before master opens

Oh, sorry... forget to push the commit about eliminating the repeated codes, I make a new pr to make it.
Make it in #6080

@alamb
Copy link
Contributor

alamb commented Jul 17, 2024

Thank you @Rachelint

alamb added a commit that referenced this pull request Jul 26, 2024
* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (#6041)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`

Signed-off-by: Bugen Zhao <[email protected]>

* fix example tests

Signed-off-by: Bugen Zhao <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>

* Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (#6043)

* deprecate auto copy, ask explicit reference

* update comments

* make cargo doc happy

* Make display of interval types more pretty (#6006)

* improve dispaly for interval.

* update test in pretty, and fix display problem.

* tmp

* fix tests in arrow-cast.

* fix tests in pretty.

* fix style.

* Update snafu (#5930)

* Update Parquet thrift generated structures (#6045)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (#5933)

This reverts commit 22e0b44.

* Revert "Update snafu (#5930)" (#6069)

This reverts commit 756b1fb.

* Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (#6075)

* Update pyo3 requirement from 0.21.1 to 0.22.1

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.1)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* refactor: remove deprecated `FromPyArrow::from_pyarrow`

"GIL Refs" are being phased out.

* chore: update `pyo3` in integration tests

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove repeated codes to make the codes more concise. (#6080)

* Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (#6068)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* add support for unencoded_byte_array_data_bytes

* add comments

* change sig of ColumnMetrics::update_variable_length_bytes()

* rename ParquetOffsetIndex to OffsetSizeIndex

* rename some functions

* suggestion from review

Co-authored-by: Andrew Lamb <[email protected]>

* add Default trait to ColumnMetrics as suggested in review

* rename OffsetSizeIndex to OffsetIndexMetaData

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Update pyo3 requirement from 0.21.1 to 0.22.2 (#6085)

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.2)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (#6095)

* deprecate read_page_locations

* add to_thrift() to OffsetIndexMetaData

* Update parquet/src/column/writer/mod.rs

Co-authored-by: Ed Seidl <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Bugen Zhao <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Jesse <[email protected]>
Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Marco Neumann <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
alamb added a commit that referenced this pull request Jul 26, 2024
…aData` (#6105)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (#6041)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`

Signed-off-by: Bugen Zhao <[email protected]>

* fix example tests

Signed-off-by: Bugen Zhao <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>

* Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (#6043)

* deprecate auto copy, ask explicit reference

* update comments

* make cargo doc happy

* Make display of interval types more pretty (#6006)

* improve dispaly for interval.

* update test in pretty, and fix display problem.

* tmp

* fix tests in arrow-cast.

* fix tests in pretty.

* fix style.

* Update snafu (#5930)

* Update Parquet thrift generated structures (#6045)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (#5933)

This reverts commit 22e0b44.

* Revert "Update snafu (#5930)" (#6069)

This reverts commit 756b1fb.

* Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (#6075)

* Update pyo3 requirement from 0.21.1 to 0.22.1

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.1)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* refactor: remove deprecated `FromPyArrow::from_pyarrow`

"GIL Refs" are being phased out.

* chore: update `pyo3` in integration tests

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove repeated codes to make the codes more concise. (#6080)

* Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (#6068)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* add support for unencoded_byte_array_data_bytes

* add comments

* change sig of ColumnMetrics::update_variable_length_bytes()

* rename ParquetOffsetIndex to OffsetSizeIndex

* rename some functions

* suggestion from review

Co-authored-by: Andrew Lamb <[email protected]>

* add Default trait to ColumnMetrics as suggested in review

* rename OffsetSizeIndex to OffsetIndexMetaData

---------

Co-authored-by: Andrew Lamb <[email protected]>

* deprecate read_page_locations

* add level histograms to metadata

* add to_thrift() to OffsetIndexMetaData

* Update pyo3 requirement from 0.21.1 to 0.22.2 (#6085)

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.2)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (#6095)

* deprecate read_page_locations

* add to_thrift() to OffsetIndexMetaData

* move valid test into ColumnIndexBuilder::append_histograms

* move update_histogram() inside ColumnMetrics

* Update parquet/src/column/writer/mod.rs

Co-authored-by: Ed Seidl <[email protected]>

* Implement LevelHistograms as a struct

* formatting

* fix error in docs

---------

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Bugen Zhao <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Jesse <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Marco Neumann <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
alamb added a commit that referenced this pull request Aug 2, 2024
…the footer (#6117)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (#6041)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`

Signed-off-by: Bugen Zhao <[email protected]>

* fix example tests

Signed-off-by: Bugen Zhao <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>

* Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (#6043)

* deprecate auto copy, ask explicit reference

* update comments

* make cargo doc happy

* Make display of interval types more pretty (#6006)

* improve dispaly for interval.

* update test in pretty, and fix display problem.

* tmp

* fix tests in arrow-cast.

* fix tests in pretty.

* fix style.

* Update snafu (#5930)

* Update Parquet thrift generated structures (#6045)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (#5933)

This reverts commit 22e0b44.

* Revert "Update snafu (#5930)" (#6069)

This reverts commit 756b1fb.

* Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (#6075)

* Update pyo3 requirement from 0.21.1 to 0.22.1

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.1)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* refactor: remove deprecated `FromPyArrow::from_pyarrow`

"GIL Refs" are being phased out.

* chore: update `pyo3` in integration tests

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove repeated codes to make the codes more concise. (#6080)

* Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (#6068)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* add support for unencoded_byte_array_data_bytes

* add comments

* change sig of ColumnMetrics::update_variable_length_bytes()

* rename ParquetOffsetIndex to OffsetSizeIndex

* rename some functions

* suggestion from review

Co-authored-by: Andrew Lamb <[email protected]>

* add Default trait to ColumnMetrics as suggested in review

* rename OffsetSizeIndex to OffsetIndexMetaData

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Update pyo3 requirement from 0.21.1 to 0.22.2 (#6085)

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.2)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (#6095)

* deprecate read_page_locations

* add to_thrift() to OffsetIndexMetaData

* no longer write inline column metadata

* Update parquet/src/column/writer/mod.rs

Co-authored-by: Ed Seidl <[email protected]>

* suggestion from review

Co-authored-by: Andrew Lamb <[email protected]>

* add some more documentation

* remove write_metadata from PageWriter

---------

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Bugen Zhao <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Jesse <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Marco Neumann <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
alamb added a commit that referenced this pull request Aug 6, 2024
* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (#6041)

* bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`

Signed-off-by: Bugen Zhao <[email protected]>

* fix example tests

Signed-off-by: Bugen Zhao <[email protected]>

---------

Signed-off-by: Bugen Zhao <[email protected]>

* Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (#6043)

* deprecate auto copy, ask explicit reference

* update comments

* make cargo doc happy

* Make display of interval types more pretty (#6006)

* improve dispaly for interval.

* update test in pretty, and fix display problem.

* tmp

* fix tests in arrow-cast.

* fix tests in pretty.

* fix style.

* Update snafu (#5930)

* Update Parquet thrift generated structures (#6045)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (#5933)

This reverts commit 22e0b44.

* Revert "Update snafu (#5930)" (#6069)

This reverts commit 756b1fb.

* Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (#6075)

* Update pyo3 requirement from 0.21.1 to 0.22.1

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.1)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* refactor: remove deprecated `FromPyArrow::from_pyarrow`

"GIL Refs" are being phased out.

* chore: update `pyo3` in integration tests

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove repeated codes to make the codes more concise. (#6080)

* Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (#6068)

* update to latest thrift (as of 11 Jul 2024) from parquet-format

* pass None for optional size statistics

* escape HTML tags

* don't need to escape brackets in arrays

* add support for unencoded_byte_array_data_bytes

* add comments

* change sig of ColumnMetrics::update_variable_length_bytes()

* rename ParquetOffsetIndex to OffsetSizeIndex

* rename some functions

* suggestion from review

Co-authored-by: Andrew Lamb <[email protected]>

* add Default trait to ColumnMetrics as suggested in review

* rename OffsetSizeIndex to OffsetIndexMetaData

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Update pyo3 requirement from 0.21.1 to 0.22.2 (#6085)

Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.21.1...v0.22.2)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (#6095)

* deprecate read_page_locations

* add to_thrift() to OffsetIndexMetaData

* Update parquet/src/column/writer/mod.rs

Co-authored-by: Ed Seidl <[email protected]>

* Upgrade protobuf definitions to flightsql 17.0 (#6133)

* Update FlightSql.proto to version 17.0

Adds new message CommandStatementIngest and removes `experimental` from
other messages.

* Regenerate flight sql protocol

This upgrades the file to version 17.0 of the protobuf definition.

* Add `ParquetMetadataWriter` allow ad-hoc encoding of `ParquetMetadata`

* fix loading in test by etseidl

Co-authored-by: Ed Seidl <[email protected]>

* add rough equivalence test

* one more check

* make clippy happy

* separate tests that require arrow into a separate module

* add histograms to to_thrift()

---------

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Bugen Zhao <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Jesse <[email protected]>
Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Marco Neumann <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Douglas Anderson <[email protected]>
Co-authored-by: Ed Seidl <[email protected]>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate next-major-release the PR has API changes and it waiting on the next major version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the display of interval types
3 participants