📦 v2023-12-07 #400

kba · 2023-12-06T11:38:07Z

Updates core to v2.59.1, which includes the workflow endpoint, additional features for chunking, additional output formats for `ocrd workspace list-page`, fixed file naming in the bagger, and filtering by file group for clone, zip bag, etc.

@stweil significantly improved the page2img script in format-converters.

@mikegerber did some housecleaning on dinglehopper and ocrd_calamari.

ocrd_pagetopdf should now work properly on macOS and supports the METS Server.

workflow-configuration contains additional XSLT to detect ID clashes and to add missing confidence values, supports pretty-printing XML in the CLIs, and supports the METS Server.

tesseract is also updated to the latest state of master.

~~I will merge this tomorrow, let me know if I missed something.~~ I forgot to click on "Create pull request". Will merge ASAP once the CI is fixed.

stweil · 2023-12-06T12:36:38Z

It looks like CI has problems with ocr-fileformat, maybe because of stricter tests.

stweil · 2023-12-06T12:45:50Z

Yes, the problem is in textract2page. cc @rue-a.

```
textract2page$ pip install .
Looking in indexes: https://pypi.org/simple, https://code.bib.uni-mannheim.de/api/packages/stweil/pypi/simple/
Processing /UB-Mannheim/ocr-fileformat/vendor/textract2page
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [212 lines of output]
      /tmp/pip-build-env-rrv2e39h/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `description` defined outside of `pyproject.toml` is ignored.
      !!
      
              ********************************************************************************
              The following seems to be defined outside of `pyproject.toml`:
      
              `description = 'Convert AWS Textract JSON to PRImA PAGE XML'`
      
              According to the spec (see the link below), however, setuptools CANNOT
              consider this value unless `description` is listed as `dynamic`.
      
              https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
      
              To prevent this problem, you can list `description` under `dynamic` or alternatively
              remove the `[project]` table from your file and rely entirely on other means of
              configuration.
              ********************************************************************************
      
      !!
        _handle_missing_dynamic(dist, project_table)
      /tmp/pip-build-env-rrv2e39h/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `readme` defined outside of `pyproject.toml` is ignored.
      !!
[...]
```

kba · 2023-12-06T13:00:02Z

Yeah, and I can reproduce it locally; will prepare a PR after the tech call.

stweil · 2023-12-06T13:09:03Z

See slub/textract2page#13 for a hackish fix.

kba · 2023-12-06T16:54:03Z

Now updating ocrd_fileformat to include UB-Mannheim/ocr-fileformat#171, which in turn includes slub/textract2page#13, to test the CI.

📦 v2023-12-06

4590edf

stweil mentioned this pull request Dec 6, 2023

Fix broken pip install slub/textract2page#13

Closed

update ocrd_fileformat to include OCR-D/ocrd_fileformat#51

5282cd0

kba changed the title ~~📦 v2023-12-06~~ 📦 v2023-12-07 Dec 7, 2023

kba merged commit 1126724 into master Dec 7, 2023
1 check passed

kba deleted the update-2023-12-06 branch December 7, 2023 11:12
