Any explanation on feature window re-ordering? #49

Jackbennett · 2021-12-10T14:50:50Z

Hi, I'm looking at shrinking the processing window so that it no longer has to cover the entire audio file at once.

Could you shed any light on this line?

Line 19 in d9f1ada

    
    feature = np.concatenate((np.roll(feature, 1, axis=0), feature, np.roll(feature, -1, axis=0)), axis=1)

Why does it use np.roll to shift the frames forward and backward, and then join all three copies together to widen each sample?

I've spread out the one-liner as below to try to figure it out.

    import numpy as np

    def window_features(feature):  # wrapper name is illustrative only
        rollup   = np.roll(feature, 1, axis=0)   # shift down by one: last frame becomes first
        rolldown = np.roll(feature, -1, axis=0)  # shift up by one: first frame becomes last

        combined = np.concatenate((rollup, feature, rolldown), axis=1)  # each row: previous, current, next frame
        windowed = combined[::3]  # keep every 3rd row, dropping the overlapping windows

        return windowed

It seems to merge each frame with its neighbours into one wider sample and then drop the overlaps by taking every 3rd item. But why the np.roll?


xinjli · 2021-12-11T21:08:46Z

Hi, thanks for your question.

The intention here is to make each frame cover a longer audio span. The roll lets each frame pick up its neighboring features.

Let's say you originally have features [1,2,3,4,5,6].

Rolling up and down creates two more feature sequences, [6,1,2,3,4,5] and [2,3,4,5,6,1].
Concatenating them gives you [[6,1,2], [1,2,3], ..., [4,5,6], [5,6,1]].
Dropping the overlapping ones then leaves you with [[6,1,2], [3,4,5]].

So now you have fewer features (6 -> 2), but each feature covers a longer range (1 -> 3).

There are some errors at the beginning and at the end because the roll wraps around (6 should not come before 1), but this is usually a small error and can be ignored.
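The walk-through above can be reproduced with a toy array. This is only a sketch: it assumes the features are a 2-D array of shape (frames, features), and it uses a single feature per frame with values 1 to 6 so the rows are easy to read. The variable names `prev_frame` and `next_frame` are made up for the example.

```python
import numpy as np

# Toy input: 6 frames with 1 feature each, values 1..6 (shape (6, 1)).
feature = np.arange(1, 7).reshape(-1, 1)

prev_frame = np.roll(feature, 1, axis=0)   # rows become 6,1,2,3,4,5
next_frame = np.roll(feature, -1, axis=0)  # rows become 2,3,4,5,6,1

# Each row is now [previous, current, next].
combined = np.concatenate((prev_frame, feature, next_frame), axis=1)

# Keep every 3rd row so the remaining windows no longer overlap.
windowed = combined[::3]

print(combined.tolist())
# [[6, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 1]]
print(windowed.tolist())
# [[6, 1, 2], [3, 4, 5]]
```

The first row shows the wrap-around: frame 6 appears as the "previous" neighbor of frame 1.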

willstott101 · 2022-07-26T08:46:03Z

We're experimenting with creating a live-streaming version of this project.

Would you accept a PR to change this logic to work better for live-streaming?

[1,2,3,4,5,6,7] -> [[1,2,3],[4,5,6],[7,7,7]]
[1,2,3,4,5,6,7,8] -> [[1,2,3],[4,5,6],[7,8,8]]

Would this change have any ramifications for phoneme timings? If so, and if they're not easily resolvable, perhaps this would work better:

[1,2,3,4,5,6,7] -> [[1,1,2],[3,4,5],[6,7,7]]
[1,2,3,4,5,6,7,8,9] -> [[1,1,2],[3,4,5],[6,7,8],[9,9,9]]
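Both proposed layouts can be produced by one small routine, which may make them easier to compare. This is a rough sketch, not code from the project: `chunk_with_edge_padding` and its `width` and `lead` parameters are hypothetical names, and it assumes a non-empty 2-D array of shape (frames, features). It repeats the first frame `lead` times at the front, repeats the last frame until the length is a multiple of `width`, then stacks each run of `width` frames into one row.

```python
import numpy as np

def chunk_with_edge_padding(frames, width=3, lead=0):
    """Hypothetical helper (not part of the project).

    frames: non-empty array of shape (n_frames, n_features).
    lead:   how many copies of the first frame to prepend.
    """
    frames = np.asarray(frames)
    # Copies of the last frame needed to reach a multiple of `width`.
    tail = -(len(frames) + lead) % width
    padded = np.pad(frames, ((lead, tail), (0, 0)), mode="edge")
    # Row-major reshape puts `width` consecutive frames side by side.
    return padded.reshape(len(padded) // width, -1)

seven = np.arange(1, 8).reshape(-1, 1)
print(chunk_with_edge_padding(seven).tolist())
# [[1, 2, 3], [4, 5, 6], [7, 7, 7]]
print(chunk_with_edge_padding(seven, lead=1).tolist())
# [[1, 1, 2], [3, 4, 5], [6, 7, 7]]
```

With `lead=1` the windows are centered on frames 1, 4, 7, the same centers the existing `[::3]` slice keeps. With `lead=0` the centers move to frames 2, 5, 8, one frame later. Whether that shift matters for phoneme timings is not something this sketch answers.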
