Docker fixes #33

Closed
wants to merge 4 commits into from
Dockerfile (4 changes: 3 additions & 1 deletion)
@@ -5,10 +5,12 @@ MAINTAINER Konstantin Baierer
EXPOSE 8080
ADD . /ocr-fileformat
WORKDIR /ocr-fileformat
-RUN apk add --no-cache openjdk8-jre php7 php7-json py-lxml git make ca-certificates wget \
+RUN apk add --no-cache openjdk8-jre php7 php7-json py-lxml git make ca-certificates wget bash \
&& update-ca-certificates \
&& make install \
&& mv web /ocr-fileformat-web \
&& rm -rf /ocr-fileformat \
&& apk del git make wget
VOLUME /data
WORKDIR /data
CMD php7 -S $(hostname -i):8080 -t /ocr-fileformat-web
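The `RUN` line change adds `bash` to the packages installed with `apk`. The image is Alpine-based, and Alpine's `/bin/sh` is BusyBox `ash`, which has no arrays, while `lib.sh` relies on bash associative arrays such as `declare -Ax OCR_VALIDATORS=()` and on key expansion like `${!OCR_VALIDATORS[*]}`; that is presumably why bash is needed. A minimal sketch of those two features, using a hypothetical map and hypothetical paths:

```bash
#!/usr/bin/env bash
# Requires bash 4+: associative arrays are not available in BusyBox ash.
# Hypothetical schema-name -> validator map, in the style of lib.sh's OCR_VALIDATORS.
declare -A validators=(
  [page-2013]=/hypothetical/xsd/page-2013.xsd
  [alto-2-0]=/hypothetical/xsd/alto-2-0.xsd
)

# "${!validators[@]}" expands to the keys in unspecified order,
# so sort them (in the C locale) before printing.
list_schema_names() {
  printf '%s\n' "${!validators[@]}" | LC_ALL=C sort
}

list_schema_names
```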
Makefile (4 changes: 4 additions & 0 deletions)
@@ -1,5 +1,6 @@
PKG_NAME = ocr-fileformat
PKG_VERSION = 0.1.0
+DOCKER_IMAGE = ubma/ocr-fileformat

CP = cp -r
LN = ln -sf
@@ -73,6 +74,9 @@ clean:
realclean: clean
$(MAKE) -C vendor clean

+docker:
+docker build -t "$(DOCKER_IMAGE)" .
+
release:
$(RM) $(PKG_NAME)_$(PKG_VERSION)
$(MKDIR) $(PKG_NAME)_$(PKG_VERSION)
README.md (35 changes: 33 additions & 2 deletions)
@@ -1,15 +1,18 @@
# ocr-fileformat

-[![Build Status](https://travis-ci.org/UB-Mannheim/ocr-fileformat.svg?branch=master)](https://travis-ci.org/UB-Mannheim/ocr-fileformat)
+[![Build Status](https://travis-ci.org/UB-Mannheim/ocr-fileformat.svg?branch=master)](https://travis-ci.org/UB-Mannheim/ocr-fileformat) [![ocr-fileformat Docker build](https://img.shields.io/docker/automated/ubma/ocr-fileformat.svg?maxAge=2592000&style=plastic)](https://hub.docker.com/r/ubma/ocr-fileformat)

Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader)

![Screenshot GUI](./screenshot.png)

<!-- vim :GenTocGFM -->
<!-- BEGIN-MARKDOWN-TOC -->
* [Installation](#installation)
* [Docker](#docker)
* [System-wide](#system-wide)
* [Usage](#usage)
* [CLI](#cli)
* [GUI](#gui)
* [API](#api)
* [Transformation](#transformation)
* [Transformation CLI](#transformation-cli)
@@ -23,8 +26,31 @@ Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader)
* [Supported Validation Formats](#supported-validation-formats)
* [License](#license)

<!-- END-MARKDOWN-TOC -->

## Installation

### Docker

You can run the [command line scripts](#cli) and [web interface](#gui) as a
[Docker container](https://hub.docker.com/r/ubma/ocr-fileformat); the only
requirement is a working Docker installation.

To start the web interface on [http://localhost:8080](http://localhost:8080):

```sh
docker run --rm -it -p 8080:8080 ubma/ocr-fileformat
```

To run the command line scripts, mount the directory containing your input
files into the container's `/data` directory:

```sh
docker run --rm -it -v "$PWD:/data" ubma/ocr-fileformat ocr-transform alto2.0 hocr somefile.alto
```

### System-wide

To install system-wide to `/usr/local`:

```sh
@@ -63,6 +89,11 @@ script (CLI), using a web interface (GUI) or in your own tools (API)
* [`ocr-transform`](./blob/master/bin/ocr-transform.sh): Transformation of OCR output between OCR formats
* [`ocr-validate`](./blob/master/bin/ocr-validate.sh): Validation of OCR output against OCR format schemas

### GUI

The web interface is meant for trying out validations and transformations. You
can upload a file or specify an input file by URL.

### API

* [`$PREFIX/share/ocr-fileformat/xslt`](./xslt) - XSLT stylesheets
lib.sh (10 changes: 5 additions & 5 deletions)
@@ -48,7 +48,7 @@ declare -Ax OCR_VALIDATORS=()
setup_transformations () {
declare -a transformers=($(
find "$SHAREDIR/xslt" "$SHAREDIR/script/transform" \
- ! -type d \( -name '*.xsl' -or -executable \) \
+ ! -type d \( -name '*.xsl' -or -perm -005 \) \
))
local in_fmt out_fmt
for path in "${transformers[@]}";do
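`-executable` is a GNU `find` extension that asks whether the *current user* may execute the file, whereas `-perm -005` is POSIX and only checks mode bits: it matches files on which *others* have both read (4) and execute (1) permission. The two are not equivalent; for instance, a script with mode `700` is matched by `-executable` for its owner but not by `-perm -005`. A sketch with hypothetical file names showing which entries the patched expression selects (it writes `-o`, the POSIX spelling of `-or`):

```bash
#!/usr/bin/env bash
# Print the basenames of non-directory entries under $1 that either end in
# .xsl or carry read+execute permission for "others" (the patched expression).
list_transformers() {
  find "$1" ! -type d \( -name '*.xsl' -o -perm -005 \) -exec basename {} \; \
    | LC_ALL=C sort
}

# Build a throwaway directory with hypothetical files of different modes.
make_demo_dir() {
  local dir
  dir="$(mktemp -d)"
  mkdir "$dir/subdir"                 # directories are excluded by ! -type d
  touch "$dir/hocr2alto.xsl" "$dir/world-exec" "$dir/owner-only" "$dir/notes.txt"
  chmod 755 "$dir/subdir" "$dir/world-exec"
  chmod 700 "$dir/owner-only"         # executable by its owner only: not matched
  chmod 644 "$dir/hocr2alto.xsl" "$dir/notes.txt"
  printf '%s\n' "$dir"
}

demo="$(make_demo_dir)"
list_transformers "$demo"   # prints hocr2alto.xsl, then world-exec
rm -rf "$demo"
```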
@@ -69,8 +69,8 @@ setup_transformations () {
setup_validations () {
declare -a validators=($(
find "$SHAREDIR/xsd" "$SHAREDIR/script/validate" \
- ! -type d \( -name '*.xsd' -or -executable \) \
- |sort -h))
+ ! -type d \( -name '*.xsd' -or -perm -005 \) \
+ |sort))
local path fmt
for path in "${validators[@]}";do
fmt=${path##*/}
@@ -90,7 +90,7 @@ setup
# show_schemas ()
show_schemas() {
local schema schemagroup
- declare -a sorted=($(IFS=$'\n'; echo "${!OCR_VALIDATORS[*]}"|sort -V))
+ declare -a sorted=($(IFS=$'\n'; echo "${!OCR_VALIDATORS[*]}"|sort -t- -nk2 -k1))
for schema in "${sorted[@]}";do
[[ -n "$schemagroup" && "$schemagroup" != ${schema%%-*} ]] && echo
echo -n "$schema "
@@ -106,7 +106,7 @@ show_transformations() {
for out_fmt in "${out_fmts[@]}";do
echo "${in_fmt} ${out_fmt}";
done
- done|sort -V
+ done|sort
}

# show_saxon_options ()
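`sort -V` (version sort) and `sort -h` (human-readable numbers) are GNU extensions, not POSIX options. The replacement in `show_schemas`, `sort -t- -nk2 -k1`, splits each name on `-` and uses the leading number of the second field as its primary numeric key, so a version `10` sorts after `9`; plain `sort`, now used in `show_transformations`, compares text, where `10` sorts before `9`. A sketch with hypothetical schema names, run in the C locale so the order is deterministic:

```bash
#!/usr/bin/env bash
# Sort schema names like the patched show_schemas does:
# split on '-', primary key = leading number of field 2, compared numerically.
sort_schemas() {
  LC_ALL=C sort -t- -nk2 -k1
}

# Plain byte-wise text sort, as now used by show_transformations.
sort_plain() {
  LC_ALL=C sort
}

names='alto-10
alto-9
alto-2'

printf '%s\n' "$names" | sort_schemas   # alto-2, alto-9, alto-10
printf '%s\n' "$names" | sort_plain     # alto-10, alto-2, alto-9
```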
vendor/Makefile (2 changes: 1 addition & 1 deletion)
@@ -1,7 +1,7 @@
CP = cp -rv
MKDIR = mkdir -p
RM = rm -rfv
-UNZIP = unzip -u
+UNZIP = unzip -o
WGET = wget --progress=bar:force --no-verbose
GIT_CLONE = git clone --depth 1
