AVSA-SEP introduces an approach to sound separation in audiovisual scenes, focusing on the challenge of separating sounds whose sources are not directly visible within the video frames. The project is based on research presented at the ICCV 2023 Workshop on AV4D: Visual Learning of Sounds in Spaces, and aims to improve sound separation through a deeper understanding of complex audiovisual scenes.
For a detailed exploration of our approach and findings, consult our paper, "Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation."
Authors: Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, and Chenliang Xu.
We welcome contributions to improve AVSA-SEP. Please submit an issue or pull request with your proposed changes or enhancements.
This project is released under the MIT License. See the LICENSE file for more details.
Please cite our work if you use AVSA-SEP in your research:
@article{su2023separating,
  title={Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation},
  author={Su, Yiyang and Vosoughi, Ali and Deng, Shijian and Tian, Yapeng and Xu, Chenliang},
  journal={Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshop on AV4D: Visual Learning of Sounds in Spaces},
  year={2023}
}
For further information and support, please contact us.