Msanii: High Fidelity Music Synthesis on a Shoestring Budget

A novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently.

Abstract

In this paper, we present Msanii, a novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently. Our model combines the expressiveness of mel spectrograms, the generative capabilities of diffusion models, and the vocoding capabilities of neural vocoders. We demonstrate the effectiveness of Msanii by synthesizing tens of seconds (190 seconds) of stereo music at high sample rates (44.1 kHz) without the use of concatenative synthesis, cascading architectures, or compression techniques. To the best of our knowledge, this is the first work to successfully employ a diffusion-based model for synthesizing such long music samples at high sample rates. Our demo can be found here and our code here.

Disclaimer

This is a work in progress and has not been finalized. The results and approach presented are subject to change and should not be considered final.

Samples

See more here.

  • Midnight Melodies
  • Echoes of Yesterday
  • Rainy Day Reflections
  • Starlight Sonatas

Setup

Set up your virtual environment using conda or venv.

Install package from git

    pip install -q git+https://github.com/Kinyugo/msanii.git

Install package in editable mode

    git clone https://github.com/Kinyugo/msanii.git
    cd msanii
    pip install -q -r requirements.txt
    pip install -e .

Training

Notebook

Open In Colab

CLI

To train via the CLI, you need to define a config file. Sample config files are in the conf directory.

    wandb login
    python -m msanii.scripts.training <path-to-your-config.yml-file>
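Before launching a run, it can help to see which sample configs are available. The following is a minimal sketch, assuming the repository has been cloned so that the conf directory exists relative to the working directory, and that config files end in `.yml` or `.yaml`. `list_configs` is a hypothetical helper for illustration, not part of msanii.

```python
from pathlib import Path


def list_configs(conf_dir="conf"):
    """Return sorted paths of YAML files found anywhere under conf_dir."""
    root = Path(conf_dir)
    if not root.is_dir():
        # Assumption: a missing directory simply means no configs to show.
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )
```

Calling `list_configs()` from the repository root returns the candidate files; pass one of them as the config path in the training command above.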

Inference

Notebook

Open In Colab

CLI

Msanii supports the following inference tasks:

  • sampling
  • audio2audio
  • interpolation
  • inpainting
  • outpainting

Each task requires a different config file. Check the conf directory for samples.

    gdown 1G9kF0r5vxYXPSdSuv4t3GR-sBO8xGFCe # model checkpoint
    python -m msanii.scripts.inference <task> <path-to-your-config.yml-file>
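When several tasks are run from a script, it can be convenient to validate the task name before invoking the CLI. The sketch below wraps the inference command shown above; the task names come from the list in this README, while `build_inference_command` and `run_inference` are hypothetical helpers, not part of msanii.

```python
import subprocess
import sys

# Task names as listed in this README.
TASKS = ("sampling", "audio2audio", "interpolation", "inpainting", "outpainting")


def build_inference_command(task, config_path):
    """Build the argument list for the inference CLI, rejecting unknown tasks."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {', '.join(TASKS)}")
    # sys.executable keeps the call inside the currently active environment.
    return [sys.executable, "-m", "msanii.scripts.inference", task, str(config_path)]


def run_inference(task, config_path):
    """Run one inference task; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_inference_command(task, config_path), check=True)
```

For example, `run_inference("sampling", "<path-to-your-config.yml-file>")` is equivalent to typing the command above with `sampling` as the task.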

Demo

HF Spaces & Notebook

Hugging Face Spaces Open In Colab

CLI

To run the demo via the CLI, you need to define a config file. Sample config files are in the conf directory.

    gdown 1G9kF0r5vxYXPSdSuv4t3GR-sBO8xGFCe # model checkpoint
    python -m msanii.demo.demo <path-to-your-config.yml-file>

Contribute to the Project

We are always looking for ways to improve and expand our project, and we welcome contributions from the community. Here are a few ways you can get involved:

  • Bug Fixes and Feature Requests: If you find any issues with the project, please open a GitHub issue or submit a pull request with a fix.
  • Data Collection: We are always in need of more data to improve the performance of our models. If you have any relevant data that you would like to share, please let us know.
  • Feedback: We value feedback from our users and would love to hear your thoughts on the project. Please feel free to reach out to us with any suggestions or comments.
  • Funding: If you find our project helpful, consider supporting us through GitHub Sponsors. Your support will help us continue to maintain and improve the project.
  • Computational Resources: If you have access to computational resources such as GPU clusters, you can help us by providing access to these resources to run experiments and improve the project.
  • Code Contributions: If you are a developer and want to contribute to the codebase, feel free to open a pull request.
  • Documentation: If you have experience with documentation and want to help improve the project's documentation, please let us know.
  • Promotion: Help increase the project's visibility and attract more contributors by sharing it with your friends and colleagues and on social media.
  • Educational Material: If you are an educator or content creator, you can help by creating tutorials, guides, or other educational material that helps others understand the project better.
  • Discussing and Sharing Ideas: Even if you don't have the time or technical skills to contribute directly to the code or documentation, you can still help by sharing and discussing ideas with the community. This can help identify new features or use cases, or find ways to improve existing ones.
  • Ethical Review: Help us ensure that the project follows ethical standards by reviewing data and models for potential infringements. Additionally, please do not use the project or its models to train on or generate copyrighted works without proper authorization.
