-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add padding to base64 before decoding #769
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Introduce Integration Tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 89b0afb in 32 seconds
More details
- Looked at
72
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. r2r/base/abstractions/document.py:46
- Draft comment:
The implementation ofdecode_base64
method correctly handles the addition of necessary padding to base64 strings before decoding, which should resolve the issue with strings having lengths not multiple of four. Good use of regex for cleaning the input and exception handling for robust error feedback. - Reason this comment was not posted:
Confidence changes required:0%
The PR introduces a method to add padding to base64 strings before decoding. This is generally a good practice as base64 encoding requires the string length to be a multiple of 4. The method first checks if the data is a string and encodes it to ASCII, then removes any non-base64 characters, and finally adds the necessary padding before attempting to decode it. The exception handling is appropriate, raising a ValueError if the decoding fails. Overall, the implementation seems correct and should resolve the issue described in the PR.
Workflow ID: wflow_vxBupECWIxy4sCpb
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes issue where strings with multiples of four characters caused "invalid start byte" errors during document ingestion.
Summary:
Added base64 padding and normalization in
Document
class to prevent decoding errors.Key points:
Document.decode_base64
method to handle base64 padding and normalization inr2r/base/abstractions/document.py
.Document.__init__
to usedecode_base64
for decoding base64 strings.Generated with ❤️ by ellipsis.dev