Building Infrastructure for Data-Driven Research

Speaker: Philipp Zumstein
Venue: Social Science Data Lab, MZES, Mannheim
Date: March 15th, 2017, at 12 noon
Location: MZES, A-231

Abstract

Most methods for data-driven research (including Big Data, Data Science, and Digital Humanities) work primarily on text data or numbers. However, a lot of information is available only in printed books or newspapers. This information first has to be digitized and then processed further to extract the text or data. The main focus of the talk is optical character recognition (OCR). We will look at the OCR workflow in general, discuss some OCR software, and show how you can use these tools in practice. Building such an infrastructure or performing these initial steps may require a considerable amount of time and resources, or may even be a project in itself. The Mannheim University Library has several infrastructure projects in this area, which are briefly mentioned.

Keywords

optical character recognition (OCR)
infrastructure for research
data-driven research

Slides

View the HTML presentation online:
- on GitHub pages: https://socialsciencedatalab.github.io/building-infrastructure-for-data-driven-research/
- on slides.com: http://slides.com/zuphilip/ssdl-2017/#/
View and download the PDF:
- on speakerdeck: https://speakerdeck.com/zuphilip/building-infrastructure-for-data-driven-research
- direct: /pdf/ssdl2017.pdf
Source files:
- /docs/index.html content (CC-BY)
- /docs/css/theme/white-blue-montserrat.css layout info
- all other files in /docs from reveal.js (MIT)

Links

OCR Software
OCR in general
- Links to awesome OCR projects: https://github.com/kba/awesome-ocr
- Collected list of OCR publications by @OCR-D-project: https://www.zotero.org/groups/ocr-d/items/
Some of our projects

Feedback, Questions, Discussion

Feel free to ask questions here by opening a new issue, and we can continue the discussion there.
