fix: bumped the glm version and adjusted the tests (#83)
* bumped the glm version and adjusted the tests

Signed-off-by: Peter Staar <[email protected]>

* updated the poetry lock

Signed-off-by: Peter Staar <[email protected]>

* fix hooks

Signed-off-by: Michele Dolfi <[email protected]>

* fixed the tests

Signed-off-by: Peter Staar <[email protected]>

* reformatted the code

Signed-off-by: Peter Staar <[email protected]>

* added the tests for tables

Signed-off-by: Peter Staar <[email protected]>

---------

Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>
PeterStaar-IBM and dolfim-ibm authored Sep 18, 2024
1 parent 8242bce commit 442443a
Showing 11 changed files with 406 additions and 361 deletions.
736 changes: 385 additions & 351 deletions poetry.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -25,7 +25,7 @@ python = "^3.10"
pydantic = "^2.0.0"
docling-core = "^1.3.0"
docling-ibm-models = "^1.2.0"
-deepsearch-glm = "^0.21.0"
+deepsearch-glm = "^0.21.1"
filetype = "^1.2.0"
pypdfium2 = "^4.30.0"
pydantic-settings = "^2.3.0"
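The only dependency change raises the lower bound of the `deepsearch-glm` caret constraint. Poetry documents caret requirements as allowing any update that does not change the left-most non-zero version component, so `^0.21.0` meant `>=0.21.0,<0.22.0` and `^0.21.1` means `>=0.21.1,<0.22.0`: the bump excludes 0.21.0 and leaves the upper bound unchanged. A minimal sketch of that rule, assuming plain numeric `X.Y.Z` versions (no pre-release or local tags); `caret_range` and `satisfies_caret` are hypothetical helpers, not Poetry's API.

```python
def _pad(parts: list[int]) -> tuple[int, ...]:
    """Pad a version to at least three numeric components: [0, 21] -> (0, 21, 0)."""
    return tuple(parts + [0] * (3 - len(parts)))


def caret_range(version: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (lower, upper) for a caret constraint ^version, meaning lower <= v < upper.

    The upper bound increments the left-most non-zero component that was
    written; if every written component is zero, it increments the last one.
    """
    parts = [int(p) for p in version.split(".")]
    lower = _pad(parts)
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: idx + 1]  # slicing copies, so `parts` is not modified
    upper[idx] += 1
    return lower, _pad(upper)


def satisfies_caret(candidate: str, version: str) -> bool:
    """True if `candidate` falls inside ^version (numeric versions only)."""
    lower, upper = caret_range(version)
    cand = _pad([int(p) for p in candidate.split(".")])
    return lower <= cand < upper


# The old and new constraints differ only in the lower bound.
print(caret_range("0.21.0"))  # ((0, 21, 0), (0, 22, 0))
print(caret_range("0.21.1"))  # ((0, 21, 1), (0, 22, 0))
print(satisfies_caret("0.21.0", "0.21.1"))  # False
```

Comparing padded integer tuples keeps the sketch dependency-free; a real resolver also has to order pre-releases, which this deliberately ignores.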
2 changes: 1 addition & 1 deletion tests/data/2203.01017v2.json
2 changes: 1 addition & 1 deletion tests/data/2206.01062.json
2 changes: 1 addition & 1 deletion tests/data/2305.03393v1-pg9.json
2 changes: 1 addition & 1 deletion tests/data/2305.03393v1-pg9.pages.json
2 changes: 1 addition & 1 deletion tests/data/2305.03393v1.json
2 changes: 1 addition & 1 deletion tests/data/redp5110.json
2 changes: 1 addition & 1 deletion tests/data/redp5110.pages.json
2 changes: 1 addition & 1 deletion tests/data/redp5695.json

Large diffs are not rendered by default.

13 changes: 12 additions & 1 deletion tests/verify_utils.py
@@ -96,10 +96,17 @@ def verify_tables(doc_pred: DsDocument, doc_true: DsDocument):
        for i, row in enumerate(true_item.data):
            for j, col in enumerate(true_item.data[i]):

                # print("true: ", true_item.data[i][j])
                # print("pred: ", pred_item.data[i][j])

                assert (
                    true_item.data[i][j].text == pred_item.data[i][j].text
                ), "table-cell does not have the same text"

                assert (
                    true_item.data[i][j].obj_type == pred_item.data[i][j].obj_type
                ), "table-cell does not have the same type"

    return True


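In the updated `verify_tables`, each cell must match the ground truth on both `text` and `obj_type`, and the first difference fails the assertion. A minimal sketch of the same cell-by-cell rule that collects every mismatch instead of stopping at the first, which may help when reviewing regenerated test data; `Cell` and `table_mismatches` are hypothetical, and only the two compared attribute names come from the diff.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical stand-in for a table cell; only the attribute names
    # `text` and `obj_type` mirror what verify_tables compares.
    text: str
    obj_type: str


def table_mismatches(
    true_rows: list[list[Cell]], pred_rows: list[list[Cell]]
) -> list[tuple[int, int, str]]:
    """Return (row, col, field) for every differing cell; -1 marks a shape mismatch."""
    if len(true_rows) != len(pred_rows):
        return [(-1, -1, "num_rows")]
    mismatches: list[tuple[int, int, str]] = []
    for i, (true_row, pred_row) in enumerate(zip(true_rows, pred_rows)):
        if len(true_row) != len(pred_row):
            mismatches.append((i, -1, "num_cols"))
            continue
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t.text != p.text:
                mismatches.append((i, j, "text"))
            if t.obj_type != p.obj_type:
                mismatches.append((i, j, "obj_type"))
    return mismatches


# Illustrative data; the obj_type labels are placeholders, not docling-core values.
truth = [[Cell("name", "header"), Cell("value", "header")],
         [Cell("a", "body"), Cell("1", "body")]]
pred = [[Cell("name", "header"), Cell("value", "body")],
        [Cell("a", "body"), Cell("2", "body")]]
print(table_mismatches(truth, pred))  # [(0, 1, 'obj_type'), (1, 1, 'text')]
```

A test can still fail hard with `assert not table_mismatches(...)`, while the returned list shows every differing cell at once.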
@@ -156,9 +163,13 @@ def verify_conversion_result(
    ), f"Mismatch in PDF cell prediction for {input_path}"

    # assert verify_output(
-    # doc_pred, doc_true
+    # doc_pred, doc_true
    # ), f"Mismatch in JSON prediction for {input_path}"

    assert verify_tables(
        doc_pred, doc_true
    ), f"verify_tables(doc_pred, doc_true) mismatch for {input_path}"

    assert verify_md(
        doc_pred_md, doc_true_md
    ), f"Mismatch in Markdown prediction for {input_path}"
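The updated `verify_conversion_result` runs its comparisons in a fixed order (PDF cells, then tables, then Markdown), keeps the JSON comparison commented out, and tags each failure with `input_path`. A minimal sketch of that ordered, first-failure-wins pattern; `Check` and `run_checks` are hypothetical names, and the lambdas only stand in for the real verifier calls.

```python
from pathlib import Path
from typing import Callable

# Hypothetical check record: (label, enabled, zero-argument callable returning True on a match).
Check = tuple[str, bool, Callable[[], bool]]


def run_checks(input_path: Path, checks: list[Check]) -> list[str]:
    """Run enabled checks in order; raise AssertionError naming the first failure.

    Returns the labels of the checks that actually ran.
    """
    ran: list[str] = []
    for label, enabled, check in checks:
        if not enabled:
            continue  # plays the role of a check that is commented out in the test
        ran.append(label)
        assert check(), f"Mismatch in {label} for {input_path}"
    return ran


checks: list[Check] = [
    ("PDF cell prediction", True, lambda: True),
    ("JSON prediction", False, lambda: False),  # disabled, like verify_output
    ("table prediction", True, lambda: True),
    ("Markdown prediction", True, lambda: True),
]
print(run_checks(Path("example.pdf"), checks))
# ['PDF cell prediction', 'table prediction', 'Markdown prediction']
```

Keeping a disabled check in the list with an explicit flag, rather than commenting it out, is one way to make it visible which comparisons a run skipped.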
