@argmarco-tkd
Collaborator

Rationale for this change

In this PR, I'm creating the machinery (utility classes) for:

  • Loading an implementation of the DataBatchProtectionAgent from a shared library (e.g. a *.so file)
  • Wrapping that implementation in a Decorator that ensures the shared library handle is properly disposed of.

These changes are described in this doc
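As a rough illustration of the loading-plus-decorator design described above, here is a minimal C++ sketch, not the PR's actual code. It assumes a hypothetical, heavily simplified stand-in for DataBatchProtectionAgentInterface (a single Encrypt method), since the real interface is still being finalized, and a hypothetical DynamicLibraryAgentDecorator class. The library-closing function is injected as a callback; in real use it would wrap dlclose() (or the new "CloseDynamicLibrary" helper), which also keeps the sketch testable without building a *.so file.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// HYPOTHETICAL stand-in for the still-evolving DataBatchProtectionAgentInterface.
class DataBatchProtectionAgentInterface {
 public:
  virtual ~DataBatchProtectionAgentInterface() = default;
  virtual std::vector<std::uint8_t> Encrypt(const std::vector<std::uint8_t>& plaintext) = 0;
};

// HYPOTHETICAL decorator: forwards every call to the agent that was created by the
// shared library, and owns the library handle so it is closed exactly once, after
// the agent is gone.
class DynamicLibraryAgentDecorator : public DataBatchProtectionAgentInterface {
 public:
  // In real use this would wrap dlclose() / FreeLibrary(); injected here for testing.
  using LibraryCloser = std::function<void(void*)>;

  DynamicLibraryAgentDecorator(std::unique_ptr<DataBatchProtectionAgentInterface> inner,
                               void* library_handle, LibraryCloser closer)
      : inner_(std::move(inner)), handle_(library_handle), closer_(std::move(closer)) {}

  // Non-copyable: a copy would close the same library handle twice.
  DynamicLibraryAgentDecorator(const DynamicLibraryAgentDecorator&) = delete;
  DynamicLibraryAgentDecorator& operator=(const DynamicLibraryAgentDecorator&) = delete;

  ~DynamicLibraryAgentDecorator() override {
    // Destroy the wrapped agent first: its destructor and vtable live in the library.
    inner_.reset();
    if (handle_ != nullptr && closer_) {
      closer_(handle_);
    }
  }

  std::vector<std::uint8_t> Encrypt(const std::vector<std::uint8_t>& plaintext) override {
    return inner_->Encrypt(plaintext);
  }

 private:
  std::unique_ptr<DataBatchProtectionAgentInterface> inner_;
  void* handle_;
  LibraryCloser closer_;
};
```

The main design point is destruction order: the wrapped object's code is mapped from the shared library, so it has to be destroyed before the handle is closed; closing the library first would leave the process calling into unmapped code.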

What changes are included in this PR?

  • Mostly new files, plus their respective tests.
  • A relatively important, but hopefully uncontroversial, change is the addition of the "CloseDynamicLibrary" function to Arrow's io_util.cc, the same file that already contained functionality for dealing with dynamic libraries.
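For context, a platform-aware close helper of the kind described above might look roughly like the following sketch. The name CloseDynamicLibrarySketch and the std::string error return are assumptions for illustration only; the function actually added to io_util.cc may have a different signature (for example, returning an arrow::Status).

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

#include <string>

// HYPOTHETICAL helper modelled on the "CloseDynamicLibrary" addition; the real
// signature in io_util.cc may differ.
// Returns an empty string on success, or an error message on failure.
std::string CloseDynamicLibrarySketch(void* handle) {
  if (handle == nullptr) {
    return "null library handle";
  }
#ifdef _WIN32
  // FreeLibrary returns zero on failure.
  if (FreeLibrary(static_cast<HMODULE>(handle)) == 0) {
    return "FreeLibrary failed with error " + std::to_string(GetLastError());
  }
#else
  // dlclose returns non-zero on failure; dlerror() describes the most recent error.
  if (dlclose(handle) != 0) {
    const char* err = dlerror();
    return err != nullptr ? err : "dlclose failed";
  }
#endif
  return "";
}
```

On POSIX systems, dlopen(nullptr, RTLD_NOW) returns a handle to the main program, which is a convenient way to exercise such a helper without building a separate library; older glibc versions may require linking with -ldl.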

Are these changes tested?

  • Yes, unit tests have been created.

Additional Notes

  • The interface for DataBatchProtectionAgentInterface is still being worked on. I have created a temporary interface (parquet/encryption/external/DBPAInterface.h), which is hopefully close enough to the final version. Once the interface is finalized, we will incorporate it here and send the final PR.
  • This PR shows the changes between two branches: dev_dll_work (feature branch) and dev-miniapp.
  • GitHub is being special and reports diffs for these files even though they are identical. Please ignore: Dockerfile.miniApp, build-clean.sh, build.sh, build-tests.sh
  • Please ignore the changes to internal_file_encryptor.cc; this diff will disappear in the final (i.e. non-draft) PR.

argmarco-tkd and others added 29 commits July 8, 2025 10:47
…yptor related stuff. Begin moving things around.
@argmarco-tkd argmarco-tkd changed the title [Draft Review] Machinery for loading of dynamic libraries. [Draft] Machinery for loading of dynamic libraries. Aug 7, 2025
@github-actions

github-actions bot commented Aug 7, 2025

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

Collaborator

@avalerio-tkd avalerio-tkd left a comment

Thanks for the changes. The PR is quite large. I gave it an initial look; we can discuss my comments offline, or dig deeper there.

)

# DBPATestAgent configuration
target_link_libraries(DBPATestAgent PUBLIC
Collaborator

Are these "test" targets the unit tests for the loading module, or a temporary testing module we're adding as an intermediate step?

Collaborator Author

This is a test target for unit testing. The library built here will never be 'released' (which is why the library does not have an 'install' target).

const KmsConnectionConfig& kms_connection_config,
const DecryptionConfiguration& decryption_config, const std::string& file_path,
const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
std::cout << "Getting file decryption properties!!" << std::endl;
Collaborator

Add a TODO note to remove these printouts or move them to logging.debug. Same elsewhere.

Collaborator Author

@argmarco-tkd argmarco-tkd Aug 8, 2025

This file should not be here, but will do where pertinent.


int32_t ExternalDecryptorImpl::Decrypt(span<const uint8_t> ciphertext, span<const uint8_t> key,
span<const uint8_t> aad, span<uint8_t> plaintext) {
std::cout << "ExternalDecryptorImpl::Decrypt called" << std::endl;
Collaborator

Now that I think of it: are these your printouts, or some you inherited when rebasing your client? If they are not yours, maybe you'd like to rebase again so the diff contains only your changes?

Collaborator Author

Yeah. There was a bit of GitHub trickery involved in this PR. There are a lot more diffs than I expected, not because of file changes, but because of my branching setup combined with this: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-comparing-branches-in-pull-requests

std::cout << "Here I would call the external decryption service. Hold for params." << std::endl;
}


Collaborator

Nit: extra whitespace. Same elsewhere.

@@ -0,0 +1,47 @@
// Licensed to the Apache Software Foundation (ASF) under one
Collaborator

I'd remove the license note for now and replace it with a TODO to find the right license note later. We can also put something in the backlog regarding this if you'd like.

Collaborator Author

Good call. Will do. Both.

@@ -0,0 +1,95 @@
// Licensed to the Apache Software Foundation (ASF) under one
Collaborator

Same here for the license.


// Interface for loadable encryptors that can be dynamically loaded from shared libraries
// This extends the base EncryptorInterface with initialization capabilities
class PARQUET_EXPORT LoadableEncryptorInterface : public parquet::encryption::EncryptorInterface {
Collaborator

Is this the EncryptorInterface from the mini-app initial refactor, or is this a new one? Should this be added when the refactor is done instead of here?

Collaborator Author

This should not be here.

encryption_properties->app_context(), false);
} else {
//TODO: move this elsewhere.
bool use_dll_encryptor = false;
Collaborator

This seems a bit hardcoded at the moment. That's OK if it is; let's just add a short comment on what it should look like when integrated with the dev_phase2 branch.

Collaborator Author

will do.

@argmarco-tkd
Collaborator Author

argmarco-tkd commented Aug 8, 2025

Thanks for the comments, @avalerio-tkd. I'm closing this PR, as it contains more files than it should and is, I'll admit, a bit messy (I'm blaming the tooling this time). I will address your comments and send out a new, cleaner PR.

This is the new PR: #37

argmarco-tkd added a commit that referenced this pull request Aug 8, 2025
@argmarco-tkd argmarco-tkd changed the title [Draft] Machinery for loading of dynamic libraries. [Abandoned-Replaced] Machinery for loading of dynamic libraries. Sep 2, 2025
4 participants