Skip to content

Conversation

@westonpace
Copy link
Owner

No description provided.

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@westonpace westonpace force-pushed the feature/ARROW-17966--dont-require-optional-args-2 branch from 676be24 to 4c02302 Compare October 20, 2022 21:46
@westonpace
Copy link
Owner Author

The arrow-dataset-scanner-test timeout is probably relevant to the branch as a whole, but not to this PR. I'll try and dig into it when I get a chance.

@westonpace
Copy link
Owner Author

@westonpace westonpace merged commit 73efef8 into feature/bamboo-demo Oct 20, 2022
westonpace added a commit that referenced this pull request Oct 25, 2022
…nts (#15)

* ARROW-17966: Updated to latest Substrait version.  Switched from optional enum args to proper options.  Added check for minimum Substrait version

* ARROW-17966: Add version to python substrait examples.  Fix version handling to check major version and not just minor

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Display the available choices when a user enters a valid substrait option that Acero doesn't support

* ARROW-17966: Simplify parsing boilerplate per review comments

* ARROW-17966: Gracefully error if the user does not supply any preferences for an option

* ARROW-17966: Prefer range loops where possible

* ARROW-17966: Rebase cleanup

* ARROW-17966: Minor fix to failing unit tests: remove enum="unspecified"

* ARROW-17966: Minor lint fix

* ARROW-17966: Cmake format

Co-authored-by: Benjamin Kietzman <[email protected]>
westonpace added a commit that referenced this pull request Apr 1, 2025
…ache#41152)

### Rationale for this change

An error is received installing R duckdb:

```
#15 18.13 > remotes::install_github('duckdb/duckdb-r', build = FALSE)
#15 18.27 Error: Failed to install 'unknown package' from **GitHub:**
#15 18.27   Line starting 'Roxyg ...' is malformed!
```

Some searching seems to suggest that this is because R cannot process UTF-8 characters in DESCRIPTION files if the `LANG` is set to `C`.

### What changes are included in this PR?

The `LANG` is set to `C.UTF-8` in the dockerfile for this CI job

### Are these changes tested?

The change only affects a test

### Are there any user-facing changes?

No
* GitHub Issue: apache#41145

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
westonpace pushed a commit that referenced this pull request Apr 1, 2025
…n timezone (apache#45051)

### Rationale for this change

If the timezone database is present on the system, but does not contain a timezone referenced in a ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen for example on Ubuntu 24.04 where some timezone aliases have been removed from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read a ORC file that references e.g. the "US/Pacific" timezone would crash.

Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40633

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants