-
Notifications
You must be signed in to change notification settings - Fork 0
harvested wac files keep wav file as original, closes #35 #49
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution, we just need to consider a few edge cases before this is merged.
target = File.join(File.dirname(source), File.basename(source, '.wac') + '.wav') | ||
|
||
# only try to create the wav file if it doesn't already exist | ||
@audio.modify(source, target) unless File.size?(target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cofiem - why this unless
condition? Couldn't this lead to non-idempotent behaviour on repeated runs?
I guess you're assuming an identical .wav file may have already been uploaded and you don't want to run wac2wav if that is the case... in which case, that check should be explicit.
Adding to that, the output name of the .wav file should be unique so duplicate runs don't mistake converted wav files as original wav files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file = wav_file | ||
rescue StandardError => e | ||
@logger.error(@class_name) { | ||
"Could not convert .wac file #{file} using utc offset '#{utc_offset}': #{format_error(e)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming the utc_offset
part of the exception has some relevance?
"Source .wac file #{wac_file} was not renamed." | ||
} | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this block itself is fine, but the implication is disturbing: if we harvest files in WAac then we'll use triplicate storage until harvester_to_do
is cleaned up. Given #50 & #36 this pattern will become more frequent as well.
I think we should keep source file in harvest_to_do
, and delete the converted file out of harvester_to_do
after it is copied to original_audio
. Thoughts?
@@ -183,10 +183,10 @@ | |||
password = 'password' | |||
auth_token = 'auth token this is' | |||
|
|||
file_hash = 'SHA256::c6d561c91664a92a1598bda3b79734d5ee29266ad0411ffaea1188f29d5b6439' | |||
file_hash = 'SHA256::b06458c96ac2de75d77b83e1b3acb3e64b84a4bfe671c794153855bf7b5e96fd' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the changes below as well: what information is stored in the database?
It appears the hash, original file name, and relative path all refer to the wav file.
- The hash maps to the
original_audio
file so it should be the hash of the file that is stored - the WAV file 👍 - However, other two fields refer to the original source file. Up until Allow storing audio files in FLAC when harvesting. #36, and When harvesting allow decoding .wac files #35 the source files have never been converted so their definition has been unambiguous. I'm leaning towards changing their meaning to mean "original_file_name before conversion". Thoughts?
- it would be nice for the behaviour to be the same as the FLAC use case in Allow storing audio files in FLAC when harvesting. #36 however, I think WAac is a throwaway format and the use case might be slightly different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely do not change the meaning of existing fields. Do this properly. Otherwise it's a quick way to shoot yourself in the foot.
First of all, why do you care what format the file used to be?
If you must store info about pre-conversion, what info do you want to store?
Add new fields to audio_recording
to store this information. Or, maybe, store it in the existing machine readable notes field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's less changing what the fields mean and more clarifying what they mean in a new scenario. Does original file name refer to the source file or the converted file? That needs to be defined.
Why do we care? The only reason we have both of those fields (name and path (in notes)) is so we can map back to the original files in other storage mediums (E.g. Availae or a portable hdd). Storing the name of the converted file breaks that.
@atruskie please review