more intuitive ID for output file, OCR-D#26
kba committed Dec 6, 2018
1 parent b6e1f59 commit 33542f0
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions ocrd_tesserocr/recognize.py
@@ -113,11 +113,11 @@ def process(self):
if not regions:
log.warning("Page contains no text regions")
self._process_regions(regions, maxlevel, tessapi)
-ID = concat_padded(self.output_file_grp, n)
+ID = concat_padded(self.output_file_grp, int(re.replace('[^\d]', '', input_file.ID)))

finkf Dec 6, 2018

Could this lead to problems if the input file ID contains digits other than the sequence number (for example internal IDs or a year)?
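
The concern can be made concrete with a small sketch. The file IDs below are hypothetical, and `re.sub` is used because Python's `re` module has no `replace` function. Stripping every non-digit merges all digit runs in the ID, while taking only the trailing run (an alternative assumed here, not part of the commit) keeps just the sequence number.

```python
import re


def digits_only(file_id):
    """Strip every non-digit character, as the new line in the diff intends."""
    return re.sub(r'[^\d]', '', file_id)


def trailing_number(file_id):
    """Alternative (an assumption, not from the commit): use only the last digit run."""
    match = re.search(r'(\d+)\D*$', file_id)
    return int(match.group(1)) if match else None


# Hypothetical input file IDs
simple_id = 'OCR-D-IMG_0001'
dated_id = 'OCR-D-IMG-2018_0001'

print(int(digits_only(simple_id)))  # 1
print(int(digits_only(dated_id)))   # 20180001
print(trailing_number(dated_id))    # 1
```

Note that an ID without any digits makes `digits_only` return an empty string, so the surrounding `int(...)` call would raise a `ValueError`.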

finkf Dec 6, 2018

Or maybe even better:

ID = concat_padded(self.output_file_grp, os.path.basename(input_file.url)[:-4])
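
A sketch of this suggestion, using hypothetical file URLs: the `[:-4]` slice assumes a dot followed by a three-letter extension, so `os.path.splitext` is shown alongside as an extension-agnostic alternative (an assumption here, not part of the proposal).

```python
import os


def stem_by_slice(url):
    """The suggestion as written: drop the last four characters of the basename."""
    return os.path.basename(url)[:-4]


def stem_by_splitext(url):
    """Alternative: drop whatever the final extension is."""
    return os.path.splitext(os.path.basename(url))[0]


# Hypothetical file URLs
print(stem_by_slice('OCR-D-IMG/page_0001.tif'))      # page_0001
print(stem_by_slice('OCR-D-IMG/page_0001.tiff'))     # page_0001.
print(stem_by_splitext('OCR-D-IMG/page_0001.tiff'))  # page_0001
```

Either variant ties the output ID to the input file name rather than to a counter, which is what the comment is after.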
self.workspace.add_file(
ID=ID,
file_grp=self.output_file_grp,
-basename=ID + '.xml',
+basename="%s.xml" % ID,
mimetype=MIMETYPE_PAGE,
content=to_xml(pcgts),
)