Implement processor to split LENA recordings #326

Open
lucasgautheron opened this issue Nov 3, 2021 · 3 comments

Comments

@lucasgautheron
Collaborator

Is your feature request related to a problem? Please describe.

Users may want to split LENA recordings into contiguous blocks, as in EL1000.
This involves splitting the recordings in the metadata and splitting the audio accordingly.

Describe the solution you'd like

Implement a processor (in pipelines.processors) that would:

  • fill lena_recording_num
  • set date_iso and start_time properly for each block (increment the original date by the correct amount for each block)
  • set session_id and session_offset (what if they already exist?)
  • work when the dataset also has non-LENA recordings
  • be idempotent and work when some of the recordings have already been split
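
To make the requirements above more concrete, here is a minimal sketch of the metadata side only, not a proposed implementation. It assumes `start_time` is stored as HH:MM:SS, that LENA recordings are marked with `recording_device_type == "lena"`, and that the block boundaries have already been extracted. The function name `split_recording`, the `blocks` tuples and the `_<num>` filename suffix are all hypothetical. For the open question about existing values, it arbitrarily keeps an existing `session_id` and adds to an existing `session_offset`.

```python
from datetime import datetime, timedelta


def split_recording(recording, blocks):
    """Return one metadata row (dict) per block of a single recording.

    `blocks` is a hypothetical list of (audio_onset_ms, audio_offset_ms,
    elapsed_ms) tuples, where elapsed_ms is the wall-clock time elapsed
    since the start of the original recording.
    """
    # Leave non-LENA recordings untouched.
    if recording.get("recording_device_type") != "lena":
        return [recording]
    # Idempotency: a row that already has lena_recording_num was split before.
    if recording.get("lena_recording_num") not in (None, ""):
        return [recording]

    # Assumption: start_time is HH:MM:SS.
    start = datetime.strptime(
        f"{recording['date_iso']} {recording['start_time']}", "%Y-%m-%d %H:%M:%S"
    )
    # Assumption: the filename has an extension.
    stem, _, ext = recording["recording_filename"].rpartition(".")
    # One possible choice: keep an existing session_id, else use the filename.
    session_id = recording.get("session_id") or recording["recording_filename"]
    base_offset = int(recording.get("session_offset") or 0)

    rows = []
    for num, (onset_ms, offset_ms, elapsed_ms) in enumerate(blocks, start=1):
        # Shift the original date and time by the wall-clock delay of the block.
        block_start = start + timedelta(milliseconds=elapsed_ms)
        row = dict(recording)
        row.update(
            recording_filename=f"{stem}_{num}.{ext}",
            lena_recording_num=num,
            date_iso=block_start.strftime("%Y-%m-%d"),
            start_time=block_start.strftime("%H:%M:%S"),
            session_id=session_id,
            session_offset=base_offset + elapsed_ms,
            duration=offset_ms - onset_ms,
        )
        rows.append(row)
    return rows
```

For example, a recording started at 23:00:00 on 2021-11-03 whose second block begins two hours later (wall-clock) would yield a second row dated 2021-11-04 with `start_time` 01:00:00. Calling the function again on an output row returns it unchanged, which is what makes the sketch idempotent.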
@lucasgautheron lucasgautheron added the "enhancement" (New feature or request) label Nov 3, 2021
@lucasgautheron lucasgautheron self-assigned this Nov 3, 2021
@lucasgautheron lucasgautheron changed the title from "split LENA recordings" to "Implement processor to split LENA recordings" Nov 3, 2021
@MarvinLvn
Contributor

MarvinLvn commented Nov 3, 2021

Ideally:

  1. Existing metadata and annotations (vtc, lena, etc) should be cut accordingly.
  2. recordings.csv shouldn't be erased. I think having a sessions.csv instead would be useful (you may want to work on your dataset at the longform level, or at the session-level)
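
As a rough illustration of what "cutting annotations accordingly" (point 1) could mean, the sketch below clips each segment to the block it overlaps and re-expresses its timestamps relative to the start of that block. It assumes segments carry `segment_onset` and `segment_offset` in milliseconds since the start of the original audio. The function name `cut_segments` and the `blocks` input are hypothetical, and splitting a segment that straddles two blocks is only one possible policy.

```python
def cut_segments(segments, blocks):
    """Distribute annotation segments over blocks.

    `segments`: list of dicts with segment_onset / segment_offset (ms,
    relative to the start of the original audio).
    `blocks`: hypothetical list of (onset_ms, offset_ms) audio ranges.
    Returns {block number (1-based): segments with block-relative times}.
    """
    result = {num: [] for num in range(1, len(blocks) + 1)}
    for segment in segments:
        for num, (onset, offset) in enumerate(blocks, start=1):
            # Clip the segment to the block; skip it if there is no overlap.
            clipped_onset = max(segment["segment_onset"], onset)
            clipped_offset = min(segment["segment_offset"], offset)
            if clipped_onset >= clipped_offset:
                continue
            cut = dict(segment)
            # Timestamps become relative to the start of the block.
            cut["segment_onset"] = clipped_onset - onset
            cut["segment_offset"] = clipped_offset - onset
            result[num].append(cut)
    return result
```

This does not address how to keep the original, uncut annotations available for a later re-import.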

@lucasgautheron
Collaborator Author

lucasgautheron commented Nov 3, 2021

> Ideally:
>
>   1. Existing metadata and annotations (vtc, lena, etc) should be cut accordingly.
>   2. recordings.csv shouldn't be erased. I think having a sessions.csv instead would be useful (you may want to work on your dataset at the longform level, or at the session-level)

  1. Oh yeah, this one is going to be a PITA too. What if you want to re-import annotations at a later stage? Need to think about this...
  2. Currently, this is done by grouping by session_id. For instance, some of ChildProject's features (like sampling) already let the user decide which level to work at. Do we need a separate metadata file for that?
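
To illustrate point 2, a session-level view can be derived from the recordings metadata on the fly, without a separate file. This is a minimal stdlib sketch, assuming each row has `recording_filename`, `duration` (ms) and optionally `session_id` and `session_offset`. The function name `sessions_from_recordings` is hypothetical. With pandas, the same idea would be a `DataFrame.groupby("session_id")`.

```python
from collections import defaultdict


def sessions_from_recordings(recordings):
    """Group recording rows (dicts) by session_id and summarise each session."""
    sessions = defaultdict(list)
    for rec in recordings:
        # Assumption: a recording without session_id is its own session.
        key = rec.get("session_id") or rec["recording_filename"]
        sessions[key].append(rec)

    summary = []
    for session_id, recs in sessions.items():
        # Order the blocks of a session by their offset within the session.
        recs = sorted(recs, key=lambda r: r.get("session_offset") or 0)
        summary.append(
            {
                "session_id": session_id,
                "recordings": [r["recording_filename"] for r in recs],
                "duration": sum(r["duration"] for r in recs),
            }
        )
    return summary
```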

@MarvinLvn
Contributor

MarvinLvn commented Nov 3, 2021

  1. That seems like a huge amount of work
  2. Agree, it's the simplest approach


2 participants