Add pdf2parquet transform #416
Commits on Jul 25, 2024
Signed-off-by: Michele Dolfi <[email protected]>
Commit a0d485e
add docling for converting pdf documents to md
Signed-off-by: Michele Dolfi <[email protected]>
Commit f6bed21
Signed-off-by: Michele Dolfi <[email protected]>
Commit 81ee45e
fix Makefile, improve dockerfile and address feedback
Signed-off-by: Michele Dolfi <[email protected]>
Commit f92010d
add ray, simplify CLI params parsing, use Pdf2Md as name
Signed-off-by: Michele Dolfi <[email protected]>
Commit 58aa5d8
Signed-off-by: Michele Dolfi <[email protected]>
Commit 9031fdf
add /src in root dir and avoid double COPY
Signed-off-by: Michele Dolfi <[email protected]>
Commit d5dcc08
add transform_class in TransformConfiguration
Signed-off-by: Michele Dolfi <[email protected]>
Commit e938f88
transform_class arg for ray, simplify cli args, remove download models (done automatically), cleanup prints
Signed-off-by: Michele Dolfi <[email protected]>
Commit bca0aaa
Commit b378ad1
Commit 32a0f40
fix test_pdf2md.py input/output
Signed-off-by: Michele Dolfi <[email protected]>
Commit 134c48c
Signed-off-by: Michele Dolfi <[email protected]>
Commit 6a5a466
ignore verbose contents since we check its content hash
Signed-off-by: Michele Dolfi <[email protected]>
Commit 526484a
Signed-off-by: Michele Dolfi <[email protected]>
Commit a6a0950
update expected results to match container versions
Signed-off-by: Michele Dolfi <[email protected]>
Commit 2a386e9
temporary add another CI test to validate pdf2parquet independently
Signed-off-by: Michele Dolfi <[email protected]>
Commit 4d65403
Commit 52df94f
Signed-off-by: Michele Dolfi <[email protected]>
Commit 9efae7d
Signed-off-by: Michele Dolfi <[email protected]>
Commit 4531cc3
add free up space for test-universal
Signed-off-by: Michele Dolfi <[email protected]>
Commit 7e1624a
Commit ec2b469
Signed-off-by: Michele Dolfi <[email protected]>
Commit 172760d
Signed-off-by: Michele Dolfi <[email protected]>
Commit 6d27e73
Commit 7c5efd7
Signed-off-by: Michele Dolfi <[email protected]>
Commit 0e12d6e
Commit e31318b
Signed-off-by: Michele Dolfi <[email protected]>
Commit b07f46e
remove blacklisting of contents
Signed-off-by: Michele Dolfi <[email protected]>
Commit 916eb1b
Signed-off-by: Michele Dolfi <[email protected]>
Commit 8c57e1a
Signed-off-by: Michele Dolfi <[email protected]>
Commit c07d01e
Commit 340a0cc
add more tests for do_ocr and pdf2parquet_do_table_structure
Signed-off-by: Michele Dolfi <[email protected]>
Commit 2fdbf83
Signed-off-by: Michele Dolfi <[email protected]>
Commit 4d9a985
extend timeout for the language images
Signed-off-by: Michele Dolfi <[email protected]>
Commit 8e06993