I have the same problem, and I got logs like this:
So I read the source and found this code:
sections = []
if re.search(r"\.docx?$", filename, re.IGNORECASE):
    callback(0.1, "Start to parse.")
    for txt in Docx()(filename, binary):
        sections.append(txt)
    callback(0.8, "Finish parsing.")
elif re.search(r"\.pdf$", filename, re.IGNORECASE):
    pdf_parser = Pdf() if kwargs.get(
        "parser_config", {}).get(
        "layout_recognize", True) else PlainParser()
    for txt, poss in pdf_parser(filename if not binary else binary,
                                from_page=from_page, to_page=to_page, callback=callback)[0]:
        sections.append(txt + poss)
elif re.search(r"\.txt$", filename, re.IGNORECASE):
    callback(0.1, "Start to parse.")
    txt = ""
    if binary:
        txt = binary.decode("utf-8")
    else:
        with open(filename, "r") as f:
            while True:
                l = f.readline()
                if not l:
                    break
                txt += l
    sections = txt.split("\n")
    sections = [l for l in sections if l]
    callback(0.8, "Finish parsing.")
else:
    raise NotImplementedError(
        "file type not supported yet(docx, pdf, txt supported)")
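In fact, .doc is not supported (the error message itself lists only docx, pdf and txt). But the check r"\.docx?$" makes the "x" optional, so it also lets .doc files through to the Docx parser.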
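To make the point about the extension check concrete, here is a minimal sketch that runs the pattern quoted above against a few file names. The names LOOSE, STRICT and matches are made up for illustration, and the stricter pattern is only one possible fix, not necessarily what the project should adopt.

```python
import re

# Pattern quoted from the source above: the trailing "?" makes the "x" optional.
LOOSE = re.compile(r"\.docx?$", re.IGNORECASE)
# Hypothetical stricter pattern that accepts only .docx (an assumption, not the project's code).
STRICT = re.compile(r"\.docx$", re.IGNORECASE)


def matches(pattern, filename):
    """Return True if the file name ends with an extension the pattern accepts."""
    return pattern.search(filename) is not None


if __name__ == "__main__":
    for name in ("report.docx", "report.doc", "REPORT.DOC"):
        print(name, "loose:", matches(LOOSE, name), "strict:", matches(STRICT, name))
```

With the loose pattern a .doc file takes the docx branch, while the strict one would let it fall through to the NotImplementedError branch instead.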
Is there an existing issue for the same bug?
Branch name
main
Commit ID
fsjfj23ir23rpwfkwfke
Other environment information
Actual behavior
After ragflow starts normally, upload a .doc file. Parsing fails and reports "File is not a zip file".
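The error is consistent with a legacy .doc being handed to a docx reader: a .docx file is a zip container, while a .doc file is an OLE2 compound file. Below is a rough sketch, using only the Python standard library, of how the two can be told apart from the raw bytes before parsing. The helper looks_like_docx and the sample byte strings are hypothetical and are not part of ragflow.

```python
import io
import zipfile


def looks_like_docx(binary: bytes) -> bool:
    """Return True if the bytes form a zip container, as every .docx file does."""
    return zipfile.is_zipfile(io.BytesIO(binary))


# Stand-in for a legacy .doc: the OLE2 compound-file signature plus padding.
fake_doc = bytes.fromhex("d0cf11e0a1b11ae1") + b"\x00" * 64

# Stand-in for a .docx: a tiny in-memory zip archive with one member.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
fake_docx = _buf.getvalue()

if __name__ == "__main__":
    print("doc is zip:", looks_like_docx(fake_doc))
    print("docx is zip:", looks_like_docx(fake_docx))
    try:
        # Opening non-zip bytes as a zip archive raises BadZipFile.
        zipfile.ZipFile(io.BytesIO(fake_doc))
    except zipfile.BadZipFile as exc:
        print("error:", exc)
```

A check like this could be used to reject .doc uploads with a clearer message, though whether to reject or convert them is a design choice for the maintainers.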
Expected behavior
No response
Steps to reproduce
Additional information
No response