rm page number exception for pdf parser (#424)
### What problem does this PR solve?

#423 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
KevinHuSh committed Apr 18, 2024
1 parent 453c291 commit 0499a3f
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions deepdoc/parser/pdf_parser.py
@@ -830,6 +830,7 @@ def _line_tag(self, bx, ZM):
         pn = [bx["page_number"]]
         top = bx["top"] - self.page_cum_height[pn[0] - 1]
         bott = bx["bottom"] - self.page_cum_height[pn[0] - 1]
+        if pn[-1] - 1 >= len(self.page_images): return ""
         while bott * ZM > self.page_images[pn[-1] - 1].size[1]:
             bott -= self.page_images[pn[-1] - 1].size[1] / ZM
             pn.append(pn[-1] + 1)
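
For reviewers, the effect of the guard can be sketched in isolation. This is a minimal stand-alone illustration, not the code in `deepdoc/parser/pdf_parser.py`: the function name `pages_spanned`, the plain `page_heights` list (standing in for `self.page_images[i].size[1]`), and the `None` return value (the real method returns `""`) are all assumptions made for the example. The second check inside the loop is also not part of this commit; it is included only to show that the same out-of-range index could presumably occur after `pn.append(...)`.

```python
from typing import List, Optional


def pages_spanned(page_number: int, bottom: float,
                  page_heights: List[float], zoom: float) -> Optional[List[int]]:
    """Return the 1-based pages a box covers, or None if a page has no image.

    Hypothetical simplification: page_heights[i] plays the role of
    self.page_images[i].size[1]; bottom is measured from the top of the
    starting page in un-zoomed units. Heights are assumed to be positive.
    """
    pn = [page_number]
    # Guard mirrored from the commit: the starting page was never rendered,
    # so indexing page_heights would raise IndexError.
    if pn[-1] - 1 >= len(page_heights):
        return None
    while bottom * zoom > page_heights[pn[-1] - 1]:
        # The box runs past the current page; carry the remainder forward.
        bottom -= page_heights[pn[-1] - 1] / zoom
        pn.append(pn[-1] + 1)
        # Assumption, NOT in the commit: re-check after advancing a page.
        if pn[-1] - 1 >= len(page_heights):
            return None
    return pn


if __name__ == "__main__":
    heights = [100.0, 100.0]  # two rendered pages
    print(pages_spanned(1, 150.0, heights, 1.0))  # box spills onto page 2
    print(pages_spanned(3, 10.0, heights, 1.0))   # page 3 has no image
```

Returning early keeps the caller from crashing on boxes that reference pages beyond the rendered range, at the cost of silently dropping the position tag for those boxes.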