Tracing re-use in audiovisual collections, documenting datascopes and publishing results in enhanced publications
- Status: In Progress
- Type: Generic
- Work Package: WP5
- Research Coordinators: Jasmijn Van Gorp, Nanne van Noord, Victor de Boer
- Coordinators for CLARIAH: Roeland Ordelman
- Participating Institutes: UU, NISV, UvA, VU
- End-users: Journalists, Communication Scientists, Humanities Researchers
- Developers: NISV
- Interest Groups: Workflows, Annotation, Audiovisual Processing
- Task IDs: (zero or more task IDs if this is addressed in existing CLARIAH-PLUS tasks)
In this project we have three goals:
- trace the re-use of video, speech and images in large audiovisual archives through DANE
- develop methods to document data manipulations (linked datascopes) and publish them in Media Suite datastories
- investigate the possibilities of enhanced publications (Media Suite datastories) that include audiovisual, privacy-protected and copyright-protected data
The research consists of four phases:
A. Pilot phase: we work with a comprehensive body of broadcasts from one week. In this small subset of the NISV collections, multiple types of re-use will appear. Since DANE is not in place yet, this pilot research will be conducted both manually (manual annotation) and computationally/experimentally.
B. Documentation phase (parallel to A): in order to document datascopes, we will look into different methods from the humanities and the social sciences.
C. Testing DANE phase: once DANE is in place, the research will be conducted again as a test of the DANE pipeline.
D. Publication phase: we publish the results in an enhanced publication, a Media Suite data story in a multimodal format, and incorporate the documentation of datascopes (Hoekstra and Koolen, 2019). Copyright and privacy questions need to be addressed as part of the publication process.
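The documentation of datascopes mentioned in phases B and D could, for instance, take the form of a machine-readable log of each data manipulation. The Python sketch below is only an illustration under assumptions, not the project's method: the class names `ScopeStep` and `DataScopeLog`, the field names, the activity labels and the example values are all hypothetical, and the linked data model of De Boer et al. (2020) is not reproduced here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ScopeStep:
    """One documented manipulation of the data (hypothetical structure)."""
    activity: str      # illustrative label, e.g. "selection" or "annotation"
    description: str   # free-text account of what was done and why
    items_in: int      # number of items before the step
    items_out: int     # number of items after the step


@dataclass
class DataScopeLog:
    """Ordered record of the steps that shaped a research dataset."""
    title: str
    steps: list = field(default_factory=list)

    def record(self, activity, description, items_in, items_out):
        self.steps.append(ScopeStep(activity, description, items_in, items_out))

    def to_json(self):
        # asdict() also converts the nested ScopeStep objects to dicts
        return json.dumps(asdict(self), indent=2)


# Example values are invented for illustration only.
log = DataScopeLog("Pilot: one week of broadcasts")
log.record("selection", "Kept programmes broadcast in the pilot week", 5000, 400)
log.record("annotation", "Manually marked segments that re-use footage", 400, 35)
print(log.to_json())
```

A log of this kind could accompany a data story, so that readers can see which selections and annotations led to the published results.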
The main goal of this project is tracing the re-use of images and speech throughout the archive.
The pilot study use case is television programming and instances of re-use of television. The use case is generic and can be broadened to any multimodal (framing) analysis in which tracing the re-use of speech, images, text and video is key. It can, therefore, also function as a test and a pilot for other projects that would like to use the NISV collection in DANE.
Methodologically, the research starts from datascopes as a method (Hoekstra & Koolen, 2019). The method will be further discussed in relation to the implementation of linked datascopes in the Media Suite (cf. De Boer et al., 2020).
It is a relevant use case for, among others, (1) media studies, communication science, history and journalism studies, and (2) journalists, media professionals and heritage institutes that would like to find instances of re-use in their archives.
- Annotation has to be done manually.
- Automated detection of programmes, scenes and segments and their modalities through DANE is not in place yet.
- It is unclear what kind of data can be published in publications, and in Media Suite Data Stories in particular, and under which conditions. We need an overview of what is and is not allowed in terms of privacy and copyright for audiovisual data published in datastories and in other publication formats.
- A Jupyter Notebook for tracing re-use.
- With this small-scale research project, we develop methods that can be used to trace instances of re-use in the entire audiovisual collection, and also across different collections, as part of the DANE pipeline.
- Query-by-image, query-by-speech: searching with an image, a speech transcript or a speech file to detect instances of re-use in the archive.
- Expertise on copyright and privacy in relation to audiovisual data.
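In its simplest form, tracing re-use through speech transcripts can be approximated by comparing word sequences between programmes. The Python sketch below is a hedged illustration, not DANE functionality: it assumes that plain-text transcripts are already available (for example from ASR), uses word trigram overlap with Jaccard similarity as a stand-in technique, and all function names, programme identifiers, transcript texts and the threshold are hypothetical.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a transcript."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_reuse(transcripts, threshold=0.3, n=3):
    """Return (id_1, id_2, score) for transcript pairs scoring at or above threshold."""
    sets = {pid: shingles(text, n) for pid, text in transcripts.items()}
    hits = []
    for p1, p2 in combinations(sorted(sets), 2):
        score = jaccard(sets[p1], sets[p2])
        if score >= threshold:
            hits.append((p1, p2, round(score, 2)))
    return hits


# Invented transcripts, for illustration only.
transcripts = {
    "news_1": "the prime minister said today that the budget will be cut",
    "talkshow_7": "earlier the prime minister said today that the budget will be cut sharply",
    "sports_3": "the match ended in a draw after extra time",
}
print(find_reuse(transcripts))
```

Comparing every pair of transcripts scales quadratically, so a sketch like this suits a one-week pilot subset rather than the whole archive. Query-by-image would need a different technique, such as comparing visual features of frames.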
- NISV collection Television
- DANE: Computer vision enrichments
- DANE: ASR enrichments
- Media Suite Search
- Media Suite Resource Viewer
- Media Suite Datastories
- DANE
- Media Suite
(How do we evaluate the solution to the problem?)
References to related resources and publications and especially links to related use-cases:
- Rik Hoekstra and Marijn Koolen (2019). Data scopes for digital history research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 52(2), 79-94. DOI: 10.1080/01615440.2018.1484676
- Victor de Boer, Ivette Bonestroo, Rik Hoekstra and Marijn Koolen (2020). A Linked Data Model for Data Scopes. In Proceedings of the 14th International Conference on Metadata and Semantics Research.