forked from OCR-D/PAGE-XML


bertsky/PAGE-XML

 
 

PAGE-XML

PAGE XML format collection for document image page content and more

For an introduction, please see the following publication: http://www.primaresearch.org/publications/ICPR2010_Pletschacher_PAGE

The most actively used XML formats are:

  • PAGE XML for page content (regions, text lines, words, glyphs, reading order, text content, ...)
  • PAGE XML for layout analysis evaluation (evaluation profiles, evaluation results, ...)
  • PAGE XML for document image dewarping (dewarping grids)
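
To make the page content format more concrete, here is a minimal sketch of reading text lines from a PAGE XML document with Python's standard library. The namespace URI, the element names (`TextRegion`, `TextLine`, `Coords`, `TextEquiv`, `Unicode`) and the inline sample are assumptions based on the 2019-07-15 page content schema and are not taken from this README; the sample is trimmed for brevity and is not schema-valid. The helper names `parse_points` and `read_lines` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the 2019-07-15 page content schema. Check the xmlns
# attribute of your own files: other schema versions use a different date.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Trimmed illustrative sample (hypothetical data, Metadata element omitted,
# so this would not pass schema validation).
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_0001.tif" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,400 100,400"/>
      <TextLine id="r1_l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <TextEquiv><Unicode>First line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1_l2">
        <Coords points="100,220 900,220 900,320 100,320"/>
        <TextEquiv><Unicode>Second line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points):
    """Turn a points string such as '1,2 3,4' into [(1, 2), (3, 4)]."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_lines(xml_text):
    """Return (region id, line id, polygon, text) for every text line."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        for line in region.iterfind("pc:TextLine", NS):
            coords = line.find("pc:Coords", NS)
            polygon = parse_points(coords.get("points")) if coords is not None else []
            text = line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS)
            lines.append((region.get("id"), line.get("id"), polygon, text))
    return lines


if __name__ == "__main__":
    for region_id, line_id, polygon, text in read_lines(SAMPLE):
        print(region_id, line_id, polygon[0], text)
```

The sketch only walks regions and lines. Words, glyphs and reading order follow the same nesting pattern and could be read with further `iterfind` calls.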

Each format is defined by an XML schema, hosted officially on primaresearch.org:

  • Page content: http://www.primaresearch.org/schema/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
  • Layout analysis evaluation: http://www.primaresearch.org/schema/PAGE/eval/layout/2013-07-15/layouteval.xsd
  • Document image dewarping: http://www.primaresearch.org/schema/PAGE/gts/dewarping/2014-08-26/dewarping.xsd

Please see the wiki for more information.

Note: The master branch contains the proposed changes for the next release.

Page Content

Proposed media type for page content: "application/vnd.prima.page+xml"
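
As an illustration of how the proposed media type might be used, the sketch below checks whether a `Content-Type` header value names it, ignoring letter case and parameters such as `charset`. The function name `is_page_xml` is hypothetical, and nothing here implies that the type is registered or that any particular server sends it.

```python
from email.message import Message

# Media type string as proposed in this README.
PAGE_MEDIA_TYPE = "application/vnd.prima.page+xml"


def is_page_xml(content_type):
    """Return True if a Content-Type header value names the PAGE media type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lowercases the value and drops any parameters.
    return msg.get_content_type() == PAGE_MEDIA_TYPE


if __name__ == "__main__":
    print(is_page_xml("application/vnd.prima.page+xml; charset=UTF-8"))
    print(is_page_xml("application/xml"))
```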
