Convert MusicLDM #4579

Merged
sanchit-gandhi merged 22 commits into huggingface:main from convert-musicldm on Aug 25, 2023


sanchit-gandhi (Contributor) commented Aug 11, 2023

What does this PR do?

Adds the conversion script for MusicLDM and a new pipeline class, closely based on the existing AudioLDM pipeline.

Changes compared to the existing AudioLDM pipeline:

  1. AudioLDM only uses the CLAP text branch. MusicLDM uses the full CLAP model (text + audio branch) for similarity scoring: the cosine similarity is computed between the generated waveforms and the text inputs, and the audios are ranked by these scores (most similar -> least similar). For MusicLDM, this scoring noticeably affects the quality of the returned audios when num_waveforms_per_prompt>1.
  2. Addition of the CLAP feature extractor for pre-processing the audio waveforms for the CLAP audio branch: the feature extractor is registered as a new module in the __init__ and is used in the score_waveforms method.
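
The ranking step in point 1 can be sketched as follows. This is a minimal illustration, not the pipeline's actual score_waveforms implementation: the toy 2-D vectors stand in for CLAP text and audio embeddings, and the helper names cosine_similarity and rank_waveforms are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_waveforms(text_embedding, audio_embeddings):
    """Return (indices ordered most -> least similar to the text, raw scores)."""
    scores = [cosine_similarity(text_embedding, e) for e in audio_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy stand-ins for embeddings (assumption: real CLAP embeddings are
# much higher-dimensional and come from the text and audio branches).
text_embedding = [1.0, 0.0]
audio_embeddings = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

order, scores = rank_waveforms(text_embedding, audio_embeddings)
print(order)  # [1, 2, 0]
```

With num_waveforms_per_prompt=3, the waveform at index 1 would be returned first because its embedding points in the same direction as the text embedding.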

TODO:

  • Finalise design: are we happy with adding a new pipeline as described above, or do we want to make the existing AudioLDM pipeline compatible with the two changes, possibly at the expense of greater code complexity (every call to the text_encoder would need to be conditioned)?
  • Add tests & update docs

cc @Vaibhavs10 @sayakpaul

sayakpaul (Member) commented

Depending on the complexity, I think it's okay to add a separate pipeline for this.

patrickvonplaten (Contributor) left a comment

Very clean! Only left some nits

sanchit-gandhi merged commit b1290d3 into huggingface:main Aug 25, 2023
sanchit-gandhi deleted the convert-musicldm branch August 25, 2023 12:31
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* from audioldm

* fix vae

* move to new pipeline

* copied from audioldm

* remove redundant control flow

* iterate

* fix docstring

* finish pipeline

* tests: from audioldm2

* iterate

* finish fast tests

* finish slow integration tests

* add docs

* remove dtype test

* update toctree

* "copied from" in conversion (where possible)

* Update docs/source/en/api/pipelines/musicldm.md

Co-authored-by: Patrick von Platen <[email protected]>

* fix docstring

* make nightly

* style

* fix dtype test

---------

Co-authored-by: Patrick von Platen <[email protected]>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024