[Bug]: pdf law parsing error: IndexError: list index out of range #423

Closed
1 task done
nexussdad opened this issue Apr 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@nexussdad

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

453c291

Other environment information

OS type: CentOS 7.6
Deployment method: Docker Compose

Actual behavior

Parsing PDFs with the law method: most files succeeded, but one failed.

The logs show:

Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 130, in build
    cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/app/laws.py", line 106, in chunk
    for txt, poss in pdf_parser(filename if not binary else binary,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/app/laws.py", line 80, in __call__
    return [(b["text"], self._line_tag(b, zoomin))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/app/laws.py", line 80, in <listcomp>
    return [(b["text"], self._line_tag(b, zoomin))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/deepdoc/parser/pdf_parser.py", line 831, in _line_tag
    while bott * ZM > self.page_images[pn[-1] - 1].size[1]:
                      ~~~~~~~~~~~~~~~~^^^^^^^^^^^^
IndexError: list index out of range
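
For readers following the traceback: the failing line indexes `self.page_images` with `pn[-1] - 1` inside a `while` condition. One plausible reading is that a text box's bottom edge extends past the last rendered page, so the index runs off the end of the list. The sketch below reproduces that pattern in isolation and shows one way to guard it. It is a minimal illustration under assumptions, not the fix from the referenced commit: `unguarded_pages_spanned`, `pages_spanned`, and the `page_heights` list (standing in for `page_images[i].size[1]`) are hypothetical names.

```python
from typing import List, Sequence


def unguarded_pages_spanned(bottom: float, zoom: float,
                            page_heights: Sequence[int],
                            first_page: int) -> List[int]:
    """Mimics the loop shape seen in the traceback (assumed, simplified)."""
    pages = [first_page]  # 1-based page numbers
    # Raises IndexError once pages[-1] - 1 passes the last available page.
    while bottom * zoom > page_heights[pages[-1] - 1]:
        bottom -= page_heights[pages[-1] - 1] / zoom
        pages.append(pages[-1] + 1)
    return pages


def pages_spanned(bottom: float, zoom: float,
                  page_heights: Sequence[int],
                  first_page: int) -> List[int]:
    """Same loop, but stops before indexing past the last page."""
    pages = [first_page]
    while (pages[-1] - 1 < len(page_heights)
           and bottom * zoom > page_heights[pages[-1] - 1]):
        bottom -= page_heights[pages[-1] - 1] / zoom
        pages.append(pages[-1] + 1)
    # Drop any page number that has no corresponding page.
    return [p for p in pages if p - 1 < len(page_heights)]


if __name__ == "__main__":
    heights = [1000, 1000]  # two rendered pages, heights in pixels
    try:
        unguarded_pages_spanned(800, 3, heights, first_page=2)
    except IndexError as exc:
        print("unguarded:", exc)
    print("guarded:", pages_spanned(800, 3, heights, first_page=2))
```

Whether clamping to the last page or skipping the block entirely is the right behavior depends on how the resulting position tag is consumed downstream, which the traceback alone does not show.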

Expected behavior

No response

Steps to reproduce

Parse PDFs using the law method.

Additional information

No response

@nexussdad nexussdad added the bug Something isn't working label Apr 18, 2024
KevinHuSh added a commit that referenced this issue Apr 18, 2024
### What problem does this PR solve?

#423 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
@KevinHuSh
Collaborator

Fixed.
