Any explanation on feature window re-ordering? #49

Open
Jackbennett opened this issue Dec 10, 2021 · 2 comments

Comments

@Jackbennett

Jackbennett commented Dec 10, 2021

Hi, I'm looking at shrinking the processing window down from the entire audio file at once.

Could you shed any light on this line?

feature = np.concatenate((np.roll(feature, 1, axis=0), feature, np.roll(feature, -1, axis=0)), axis=1)

Why does it use np.roll to shift the frames by one in each direction, and then join all three copies together to widen each sample?

I spread the one-liner out as below to try to figure it out.

    rollup   = np.roll(feature, 1, axis=0)   # shift frames down by one: the last frame wraps to the front
    rolldown = np.roll(feature, -1, axis=0)  # shift frames up by one: the first frame wraps to the end

    # join previous, current and next frame along the feature axis
    combined = np.concatenate((rollup, feature, rolldown), axis=1)
    windowed = combined[::3]  # keep every third row, dropping the overlapping windows

    return windowed

It seems to merge each frame with its two neighbours into one wider sample, and then drop the overlaps by keeping only every 3rd item. But why the np.roll?

@xinjli
Owner

xinjli commented Dec 11, 2021

Hi, thanks for your question.

The intention here is to make each frame cover a longer audio span; np.roll is what lets each frame pick up its neighboring features.

Let's say you originally have features [1,2,3,4,5,6]:

  • rolling up and rolling down creates two shifted copies, [6,1,2,3,4,5] and [2,3,4,5,6,1],
  • concatenating them with the original gives you [[6,1,2], [1,2,3], ..., [4,5,6], [5,6,1]],
  • dropping the overlapping ones then leaves you with [[6,1,2], [3,4,5]].

So now you have fewer features (6 -> 2), but each feature covers a longer range (1 -> 3).

The windows at the beginning and at the end are slightly wrong because of the wrap-around (6 should not come before 1), but the error is usually small and can be ignored.
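
The steps above can be reproduced with a small sketch. It assumes a toy input of 6 frames with a single feature dimension each (shape `(6, 1)`); the variable names are illustrative and not taken from the project.

```python
import numpy as np

# Toy input: 6 frames, 1 feature dimension per frame -> shape (6, 1).
feature = np.arange(1, 7).reshape(-1, 1)

# Row i of prev_frame holds frame i-1 (row 0 wraps around to frame 6).
prev_frame = np.roll(feature, 1, axis=0)
# Row i of next_frame holds frame i+1 (the last row wraps around to frame 1).
next_frame = np.roll(feature, -1, axis=0)

# Each row becomes [previous, current, next] -> shape (6, 3).
combined = np.concatenate((prev_frame, feature, next_frame), axis=1)

# Keep every third row so the remaining windows do not overlap -> shape (2, 3).
windowed = combined[::3]

print(combined.tolist())  # [[6, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 1]]
print(windowed.tolist())  # [[6, 1, 2], [3, 4, 5]]
```

Note that frame 6 shows up in the first window only because `np.roll` wraps around, which is the boundary error mentioned above.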

@willstott101
Contributor

willstott101 commented Jul 26, 2022

We're experimenting with creating a live-streaming version of this project.

Would you accept a PR to change this logic to work better for live-streaming?

[1,2,3,4,5,6,7] -> [[1,2,3],[4,5,6],[7,7,7]]
[1,2,3,4,5,6,7,8] -> [[1,2,3],[4,5,6],[7,8,8]]

Would changing this have any ramifications for phoneme timings? If so, and if they're not easily resolvable, perhaps this would work better:

[1,2,3,4,5,6,7] -> [[1,1,2],[3,4,5],[6,7,7]]
[1,2,3,4,5,6,7,8,9] -> [[1,1,2],[3,4,5],[6,7,8],[9,9,9]]
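
For reference, here is a rough sketch of one way both mappings above could be produced. The function name `chunk_frames` and its parameters are hypothetical and not part of the project. It assumes the second variant amounts to duplicating the first frame before chunking, which matches the examples given but is only a guess at the intended rule.

```python
import numpy as np

def chunk_frames(feature, width=3, prepend_first=False):
    """Group consecutive frames into non-overlapping windows of `width` frames.

    The last frame is repeated to fill an incomplete final window. With
    prepend_first=True the first frame is duplicated before chunking.
    Hypothetical helper for illustration only.
    """
    feature = np.asarray(feature)
    if prepend_first:
        # Duplicate the first frame so the first window reads [1, 1, 2].
        feature = np.concatenate((feature[:1], feature), axis=0)
    # Number of frames missing from the final window.
    pad = (-len(feature)) % width
    if pad:
        tail = np.repeat(feature[-1:], pad, axis=0)
        feature = np.concatenate((feature, tail), axis=0)
    # Flatten each run of `width` frames into one wider row.
    return feature.reshape(len(feature) // width, -1)

print(chunk_frames(np.arange(1, 8)).tolist())
# [[1, 2, 3], [4, 5, 6], [7, 7, 7]]
print(chunk_frames(np.arange(1, 10), prepend_first=True).tolist())
# [[1, 1, 2], [3, 4, 5], [6, 7, 8], [9, 9, 9]]
```

Unlike the `np.roll` version, nothing wraps around here, so no window mixes frames from opposite ends of the audio. In an actual streaming setup the tail padding would presumably only be applied once the stream ends.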
