
Video2Music: What is this project?

The main objective of this project is to recommend music based on emotions captured from a short frontal-face video of the user. This objective is achieved in two parts: 1) a computer vision pipeline that takes a short video and returns the top 3 predicted emotions, and 2) a recommendation system that suggests music based on the captured emotions.
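
A minimal sketch of how the two parts fit together, assuming hypothetical names (`model`, `transform`, `labels` are placeholders; the real entry points live in the scripts described below):

```python
# Sketch of the two-part design: per-frame emotion scores are averaged over the
# clip, and the top 3 emotions drive the recommender. All names here are
# placeholders, not the project's actual API.
import torch
from torchvision.io import read_video

def top3_emotions(video_path, model, transform, labels):
    frames, _, _ = read_video(video_path, pts_unit="sec")   # (T, H, W, C) uint8
    frames = frames.permute(0, 3, 1, 2).float() / 255.0     # (T, C, H, W) in [0, 1]
    with torch.no_grad():
        probs = model(transform(frames)).softmax(dim=1)     # per-frame class probs
    top3 = probs.mean(dim=0).topk(3).indices.tolist()       # average over frames
    return [labels[i] for i in top3]
```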

What datasets and libraries are used?

This project is built on the PyTorch framework (torch and torchvision), together with scikit-learn, for developing, evaluating, and integrating both pipelines. The Pillow (PIL), NumPy, Pandas, Matplotlib, and Hugging Face Transformers libraries support development and result visualization. The FER-2013 dataset from Kaggle is used to develop the face classification pipeline, while the MusicCaps dataset is used to train the recommendation pipeline.
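
For reference, a minimal sketch of loading FER-2013 via torchvision's built-in dataset class, assuming the Kaggle CSV has been placed under ./data/fer2013/ (paths and batch size are illustrative):

```python
# Hedged sketch: torchvision ships a FER2013 dataset class that reads the
# Kaggle fer2013.csv file from <root>/fer2013/.
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import FER2013

transform = T.ToTensor()  # 48x48 grayscale PIL image -> float tensor
train_set = FER2013(root="./data", split="train", transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
```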

How to use this program?

To test the program:

  1. Download Vid2Music.ipynb, Vid2MusicRecommendation.py, and Video_Classification_Pipeline.py and put them in a common folder.
  2. Get the appropriate model weights from the links in weights.txt and update the corresponding path in Video_Classification_Pipeline.py.
  3. Upload a short frontal-face video and get some amazing music recommendations (see the sketch after this list).
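
As a rough illustration of step 3 (the actual flow lives in Vid2Music.ipynb; `classify_video` and `recommend` are hypothetical names for the functions in the two scripts):

```python
# Hypothetical glue code; the real function names in the scripts may differ.
from Video_Classification_Pipeline import classify_video  # hypothetical name
from Vid2MusicRecommendation import recommend              # hypothetical name

emotions = classify_video("my_face_video.mp4")  # e.g. ["happy", "surprise", "neutral"]
for track in recommend(emotions):
    print(track)
```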

To train on your own datasets:

  1. In addition to step 1 above, download Img_Emotion_Classifier.py and re-train the image classifier (or swap in a different model) on your own dataset by updating the sections marked in the code comments; a fine-tuning sketch follows this list.
  2. Use the newly saved model with Video_Classification_Pipeline.py.
  3. To re-train the recommendation pipeline, update Vid2MusicRecommendation.py with the appropriate text-to-music dataset and use the new weights for future predictions.
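
A minimal re-training sketch, assuming the classifier is a standard torchvision CNN and the dataset follows a one-folder-per-class layout (the project's actual model choices live in Img_Emotion_Classifier.py):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms as T

# One-folder-per-emotion layout; the path and hyperparameters are illustrative.
transform = T.Compose([T.Grayscale(num_output_channels=3),
                       T.Resize((224, 224)), T.ToTensor()])
data = datasets.ImageFolder("path/to/your_dataset", transform=transform)
loader = DataLoader(data, batch_size=64, shuffle=True)

# Replace the final layer so the pretrained backbone predicts your emotion classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "emotion_classifier.pt")  # plug into step 2
```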

Future Development

Currently, the face classification pipeline achieves 67% accuracy on its task, while the state of the art on this dataset is 72%, so hyperparameter tuning or experimenting with different models is still needed. More class-balanced datasets for both pipelines could also improve performance significantly; further research here is worthwhile, since the Vid2Music domain is relatively under-explored despite its many useful applications. Lastly, a more robust and thoroughly tested recommendation system, along with better metrics for evaluating both pipelines, would help standardize the results, so research in that direction should also be explored.
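
As one concrete direction for the class-balance issue, a sketch of oversampling rare emotions with PyTorch's WeightedRandomSampler (dataset layout and path are assumptions, as in the re-training sketch above):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms as T

# Assumed one-folder-per-class layout; ImageFolder.samples holds (path, label).
data = datasets.ImageFolder("path/to/your_dataset", transform=T.ToTensor())
targets = np.array([label for _, label in data.samples])

# Inverse-frequency weights: rare emotion classes are drawn more often per epoch.
counts = np.bincount(targets)
weights = torch.as_tensor(1.0 / counts[targets], dtype=torch.double)
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
balanced_loader = DataLoader(data, batch_size=64, sampler=sampler)
```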

About

This is our CS5804 Spring 2023 Group Project
