Building Infrastructure for Data-Driven Research

  • Speaker: Philipp Zumstein
  • Venue: Social Science Data Lab, MZES, Mannheim
  • Date: March 15th, 2017, at 12 noon
  • Location: MZES, A-231

Abstract

Most methods for data-driven research (including Big Data, Data Science, and Digital Humanities) work primarily on text data or numbers. However, a lot of information is available only in printed books or newspapers. This information must first be digitized and then processed further to extract the text or data. The talk focuses on optical character recognition (OCR): we will look at the general OCR workflow, discuss some OCR software, and show how these tools can be used in practice. Building such an infrastructure, or carrying out these initial steps, can take a substantial amount of time and resources, and may even be a project in its own right. The Mannheim University Library runs several infrastructure projects in this area, which will be mentioned briefly.

Keywords

Slides

Links

Feedback, Questions, Discussion

Feel free to ask questions here by opening a new issue, and we can continue the discussion there.