August 9, 2017
This meeting is a hybrid teleconference and IRC chat. Anyone is welcome to join. Here is the info:
- Time: 3:00pm-4:00pm Eastern Daylight Time US (UTC-4)
- Dial-in Number: (641) 715-3570
- Participant Code: 304589#
- International numbers: [[Conference Call Information]]
- Web Access: https://www.freeconferencecallhd.com/wp-content/themes/responsive/flashphone/flash-phone.php
- IRC:
- Join the #islandora chat room via Freenode Web IRC (enter a unique nick)
- Or point your IRC client to #islandora on irc.freenode.net
- Marcus Barnes (Chair)
- Mark McFate (Grinnell College)
- Melissa Anez, Islandora Foundation
- Natkeeran Ledchumykanthan
- Derek Merleaux
- Megan Kudzia
- Updates and announcements
- Showcase of tools
- Guest speaker: Megan Kudzia, University of Michigan - Transcriptinator + audiogrep
- Module customizations
- Standards
- Next Meeting
-
Updates and announcements - Mark McFate - Selected Oral History objects will be made online by Grinnell College (Mark McFate). Grinnell College has done UI work related to presentation. Interested people can contact Mark McFate or Marcus Barnes.
-
Megan Kudzia - Megan Kudzia is a programmer/librarian from University of Michigan. She is the developer of the Transcriptinator. Transcriptinator tool creates an XML output taking audiogrep generated transcripts as the input. The generated XML formatted transcript can be ingested by the Islandora OH module. She made the following presentation: (Working with Islandora’s Oral Histories Module + Transcriptinator)[https://docs.google.com/presentation/d/1Qx_nio2tq7Ybim0HrDWVdu-tNiySNoihakQ-hWHo5P4/edit#slide=id.p].
-
Megan Kudzia - We were interested in generating transcripts for the Women's Overseas Service League oral histories project. We needed to generate a first go time-coded transcript automatically. I looked at using (CMUSphinx)[https://cmusphinx.github.io/] directly, but found the documentation lacking. Then went with AudioGrep, which is designed to create audio supercuts, but also generates transcripts. There is also VideoGrep.
-
Megan Kudzia - Note that we have a custom Python based batch upload process for Islandora. We do the derivative generation separately, then do the ingest.
-
Megan Kudzia - To install the Transcriptinator, you will need Audiogrep, Python 3 and LXML. Some people have had issues installing LXML. It is a command line tool. Data file is usually one folder down. File paths needs to be updated based on local setup.
-
Megan Kudzia - Audiogrep can take long time to execute, approximately half the audio length of the audio. The Transcriptinator does not take much time. The quality of the Audiogrep output is not high. One alternative maybe to use youtube transcript feature. I can possibly explore that option.
-
Megan Kudzia - We are not using OH module at this time. We can use OH for some content types such as Michigan Supreme Court collection's interviews.
-
Derek Merleaux - Can you pls provide examples of where OH modules are being used.
-
Megan Kudzia - Cesare sneaks into Jane's bedroom is a good example. Other known installations can be found here.
-
Mark McFate - We used the Transcriptinator tool for several months. But, the quality of the audiogrep output is not acceptable. Is there any way to train audiogrep.
-
Megan Kudzia - Possibly yes, but we have not tried that.
-
Mark McFate - We are looking to see if we can create synchronized images (video) from tagged XML.
-
Marcus - A tool like Transcriptinator can possibly be developed do create video from tagged transcript. Interesting idea for presentation. It is a non-trivial problem though. Perhaps a student project.
-
Mark McFate - Yes, can be part of a post-processing workflow. Interesting area to explore.
-
Marcus - I am not sure about the agenda item "Module customizations". Will skip that one. Also, the standards discussion is something we would have to revisit again as well.
-
Marcus - We here are focused on the technical side of oral histories. There are associations in US and Canada dedicated to Oral Histories. The following are links of oral histores associations.
-
Next Meeting
- Our September meeting will happen on Monday, September 11, 2017 so we can bring in a guest speaker from the Samvera community. Our guest speaker is Ben Armintor, Head of Development, Infrastructure and Applications at Columbia University and Fedora and Samvera committer (https://github.com/barmintor). Ben will speak about the archive project funded by the Carnegie Corporation of New York to develop software to support oral histories, and to talk about how other Hydra/Samvera projects support oral histories and other time-based media.