Skip to content

Advanced file options#14

Draft
corwinjoy wants to merge 90 commits intomainfrom
advanced_file_options
Draft

Advanced file options#14
corwinjoy wants to merge 90 commits intomainfrom
advanced_file_options

Conversation

@corwinjoy
Copy link
Owner

Description

Re-review of PR vs. main to add encryption support (+ other features) to delta-rs

corwinjoy and others added 30 commits August 13, 2025 15:27
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
… constructor

Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
…hanges through.

Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
…o examples.

Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
…sionContext

Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
Signed-off-by: Corwin Joy <corwin.joy@gmail.com>
# Conflicts:
#	crates/core/src/operations/mod.rs
#	crates/core/src/operations/update.rs
adamreeve and others added 26 commits September 2, 2025 15:09
# Conflicts:
#	crates/core/src/delta_datafusion/mod.rs
#	crates/core/src/operations/mod.rs
#	crates/deltalake/Cargo.toml
Pass FileFormatOptions through to DataFusion scan for update and delete
Make build work without DataFusion feature
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds encryption support and other advanced file options to delta-rs by implementing a comprehensive file format options framework. The changes enable users to configure encryption settings, customize writer properties, and apply file-level formatting options when reading and writing Delta tables.

  • Introduces a FileFormatOptions trait and related infrastructure to handle file-specific configurations
  • Adds support for both simple property-based encryption and KMS-based encryption through new factory patterns
  • Updates all operation builders to accept and propagate file format options throughout the write/read pipeline

Reviewed Changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
python/src/lib.rs Updates Python bindings to pass None for new file format options parameter
python/src/merge.rs Adds None parameter to MergeBuilder constructor for file format options
crates/deltalake/examples/basic_operations_encryption.rs New comprehensive example demonstrating encryption capabilities
crates/deltalake/Cargo.toml Adds dev dependencies for encryption testing
crates/core/src/table/file_format_options.rs New core module implementing file format options framework
crates/core/src/table/builder.rs Adds file format options support to table builder
crates/core/src/table/mod.rs Updates DeltaTable to include file format options field
crates/core/src/operations/*.rs Updates all operation builders to accept and use file format options
crates/core/src/writer/*.rs Refactors writers to use WriterPropertiesFactory pattern
crates/core/src/delta_datafusion/*.rs Updates DataFusion integration for encryption support
crates/core/Cargo.toml Adds encryption dependencies
Cargo.toml Updates to use DataFusion fork with encryption support

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

) -> PyResult<String> {
let (table, metrics) = py.allow_threads(|| {
let mut cmd = UpdateBuilder::new(self.log_store()?, self.cloned_state()?)
let mut cmd = UpdateBuilder::new(self.log_store()?, self.cloned_state()?, None)
Copy link

Copilot AI Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consistent use of None for file_format_options in Python bindings suggests this parameter should be optional or have a builder pattern. Consider adding a with_file_format_options method to the Python API to expose this functionality to Python users.

Copilot uses AI. Check for mistakes.
file_path: &Path,
_file_schema: &Arc<ArrowSchema>,
) -> DeltaResult<WriterProperties> {
info!("Called create_writer_properties for file: {file_path}");
Copy link

Copilot AI Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The info! log statement in create_writer_properties may be too verbose for production use since this method is called for every file written. Consider using debug! or removing this log statement for production builds.

Suggested change
info!("Called create_writer_properties for file: {file_path}");
debug!("Called create_writer_properties for file: {file_path}");

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@adamreeve adamreeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This looks good to me. Some of the comments from #9 are still relevant as potential improvements, but I think we should make an upstream PR to get feedback first before polishing this too much, once delta-rs has updated to the new DataFusion release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants