Hi,
Before I begin, thank you for creating such a great package!
I'm trying to load a bunch of PDFs from Google Drive into Google Colab and extract their tables.
When I run it on a single PDF (thus using the load_from_file functionality), everything works fine, but when I give the load_pdfs_images function a directory path, I get the following error:
In surya/input/load.py, load_from_folder (as shown in the error above) calls the load_pdf function in the same module to assign a value to the text_lines variable.
Inside the load_pdf function, the following code leaves the text_lines variable set to None:
    if load_text_lines:
        from surya.input.pdflines import get_page_text_lines # Putting import here because pypdfium2 causes warnings if its not the top import
        text_lines = get_page_text_lines(
            pdf_path,
            page_indices,
            [i.size for i in images]
        )
It seems that get_page_text_lines returns None when it reaches an empty PDF page, and this raises an error that aborts the whole run instead of skipping the empty page.
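To illustrate the behaviour I would expect, here is a minimal sketch of a guard that skips a None result instead of failing the whole batch. It does not use surya at all: collect_text_lines, load_lines, and fake_loader are hypothetical names I made up for the example, and the assumption that the loader returns either a list or None is based only on what I observed above.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def collect_text_lines(
    pdf_paths: Sequence[str],
    load_lines: Callable[[str], Optional[list]],
) -> Tuple[list, List[str]]:
    """Gather text lines from several PDFs, tolerating loaders that return None.

    `load_lines` is a hypothetical stand-in for whatever function extracts
    the text lines of one PDF; it is assumed to return a list or None.
    """
    all_lines: list = []
    skipped: List[str] = []
    for path in pdf_paths:
        lines = load_lines(path)
        if lines is None:
            # Record the file and move on instead of aborting the batch.
            skipped.append(path)
            continue
        all_lines.extend(lines)
    return all_lines, skipped


def fake_loader(path: str) -> Optional[list]:
    """Stand-in loader: pretends that files named 'empty.pdf' yield None."""
    if path.endswith("empty.pdf"):
        return None
    return [f"line from {path}"]


lines, skipped = collect_text_lines(["a.pdf", "empty.pdf", "b.pdf"], fake_loader)
print(lines)
print(skipped)
```

Returning the skipped paths alongside the collected lines makes it easy to see afterwards which files produced nothing, rather than silently dropping them.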