Add padding to base64 before decoding #769

NolanTrem · 2024-07-26T06:33:45Z

Fixes an issue where base64 strings whose length is not a multiple of four characters caused "invalid start byte" errors during document ingestion.

🚀	This description was created by Ellipsis for commit `89b0afb`

Summary:

Added base64 padding and normalization in Document class to prevent decoding errors.

Key points:

Added Document.decode_base64 method to handle base64 padding and normalization in r2r/base/abstractions/document.py.
Updated Document.__init__ to use decode_base64 for decoding base64 strings.
Ensures proper handling of base64 strings whose length is not a multiple of four characters, preventing "invalid start byte" errors.
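
The padding rule behind these key points can be illustrated with a minimal sketch using only Python's standard `base64` module. It is not code from the PR: the sample value and variable names are made up for illustration, and the error raised here comes from `base64.b64decode` itself, whereas the "invalid start byte" error mentioned above arose somewhere in the ingestion path that this page does not show.

```python
import base64
import binascii

raw = b"hello"
encoded = base64.b64encode(raw)      # b"aGVsbG8=" (8 characters)
stripped = encoded.rstrip(b"=")      # b"aGVsbG8"  (7 characters, padding lost)

# Decoding the unpadded value fails because its length is not a multiple of four.
error = None
try:
    base64.b64decode(stripped)
except binascii.Error as exc:
    error = str(exc)                 # typically "Incorrect padding"

# Number of "=" characters needed to reach the next multiple of four.
missing = -len(stripped) % 4
restored = stripped + b"=" * missing

decoded = base64.b64decode(restored)
```

With the padding restored, `decoded` equals the original `raw` bytes.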

Generated with ❤️ by ellipsis.dev


ellipsis-dev

👍 Looks good to me! Reviewed everything up to 89b0afb in 32 seconds

More details

Looked at 72 lines of code in 1 file
Skipped 0 files when reviewing.
Skipped posting 1 drafted comment based on config settings.

1. r2r/base/abstractions/document.py:46

Draft comment:
The decode_base64 method correctly adds the necessary padding to base64 strings before decoding, which should resolve the issue with strings whose length is not a multiple of four. Good use of regex for cleaning the input and of exception handling for robust error feedback.
Reason this comment was not posted:
Confidence changes required: 0%
The PR introduces a method to add padding to base64 strings before decoding. This is generally a good practice as base64 encoding requires the string length to be a multiple of 4. The method first checks if the data is a string and encodes it to ASCII, then removes any non-base64 characters, and finally adds the necessary padding before attempting to decode it. The exception handling is appropriate, raising a ValueError if the decoding fails. Overall, the implementation seems correct and should resolve the issue described in the PR.
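
The steps described in this review can be sketched roughly as follows. This is a hypothetical reconstruction from the description above, not the code merged in `89b0afb`: the standalone function form, the exact regex, and the error message are assumptions.

```python
import base64
import binascii
import re


def decode_base64(data):
    """Hypothetical sketch of the described steps, not the PR's actual method."""
    # Step 1: accept str input by encoding it to ASCII bytes.
    if isinstance(data, str):
        data = data.encode("ascii")
    # Step 2: drop every character outside the standard base64 alphabet
    # (this also strips whitespace and any existing "=" padding).
    data = re.sub(rb"[^A-Za-z0-9+/]", b"", data)
    # Step 3: re-pad so the length is a multiple of four.
    data += b"=" * (-len(data) % 4)
    # Step 4: decode, surfacing failures as ValueError.
    try:
        return base64.b64decode(data)
    except binascii.Error as exc:
        raise ValueError(f"Invalid base64 input: {exc}") from exc
```

Under these assumptions, `decode_base64("aGVsbG8")` and `decode_base64("aGVs\nbG8=")` both return `b"hello"`. Padding cannot rescue every input: a cleaned string whose length is one more than a multiple of four, such as `"a"`, still fails and raises `ValueError`.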

Workflow ID: wflow_vxBupECWIxy4sCpb


NolanTrem added 30 commits July 25, 2024 10:04

Add integration test workflow

40705a4

Edit tests

f8fd07b

Run on all PRs

7562b10

Check status

c290f57

Print logs

35d906e

Env vars

5a790a7

Iterate

e9adfbd

Add health check

a549eda

OAI

41cdbc2

again

1d3a6cb

more mods

b6cb1c9

test

eae72d1

rebase to main

dc97ec2

up

e869a9e

Remove wait

2d7b1b5

Install R2R CLI

78908c7

use venv

942067f

pip instead of pip3

ca06ad0

remove pip update

2ba71e1

Try to get pip to work

485f501

Add remaining endpoints

7d10949

echo not ehco

2456125

echo not ehco

d75e347

Remove analytics test

c679ba3

Add Windows Integration tests

1bed81c

Make sure names match

ae7d2b7

Local LLM Workflow

72ab9d2

Remove it flag

5f2f058

Rename parallel network

3653594

Increase disk size

a5e7ae8

NolanTrem added 9 commits July 25, 2024 16:48

Move to python over python3

0c90d22

Explicity set OS

85ee589

Clean up venvs

07c7315

Clean up venvs

3cba905

Fix memory

e49042a

More clean up

e63d0fe

Add in CICD for Linux [skip ci]

6a0b506

Merge pull request #761 from SciPhi-AI/Nolan/IntegrationTests

c11f9e1

Introduce Integration Tests

Add padding to base64 before decoding

89b0afb

ellipsis-dev bot reviewed Jul 26, 2024


emrgnt-cmplxty changed the base branch from main to dev July 26, 2024 14:15

emrgnt-cmplxty merged commit b3657ee into dev Jul 26, 2024
3 of 4 checks passed

NolanTrem deleted the Nolan/FixBase64Decoding branch July 26, 2024 23:14
