[Bug]: Mitigate impact of CVE-2023-47248 for Apache Beam #29392
Labels: bug, done & done, P1, python
What happened?
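There is a recently disclosed vulnerability affecting the PyArrow dependency, https://nvd.nist.gov/vuln/detail/CVE-2023-47248 (deserialization of untrusted data in the IPC and Parquet readers of PyArrow 0.14.0 through 14.0.0 allows arbitrary code execution). This might be a concern for Beam users who read Parquet files from untrusted sources.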
To address this, we have applied the mitigation provided by https://pypi.org/project/pyarrow-hotfix/ in Beam 2.52.0, and will upgrade Beam to support `pyarrow==14` in a future release.

Users of Beam 2.51.0 or below who use pyarrow in their pipelines and are concerned about CVE-2023-47248 can apply the following workaround:
1. Install the `pyarrow-hotfix` package on the workers.
2. Add `import pyarrow_hotfix` in the pipeline code. If the pipeline consists of only one module, add the `--save_main_session` pipeline option. If the pipeline comprises multiple files and uses `--setup_file`, add the import in the pipeline package files, for example in the `__init__.py` file.

Issue Priority
Priority: 1 (major)