From Hearing to Seeing: Linking Auditory and Visual Place Perceptions with Soundscape-to-Image Generative Artificial Intelligence
If you use this algorithm in your research or applications, please cite this source:
@article{ZHUANG2024102122,
title = {From hearing to seeing: Linking auditory and visual place perceptions with soundscape-to-image generative artificial intelligence},
journal = {Computers, Environment and Urban Systems},
volume = {110},
pages = {102122},
year = {2024},
issn = {0198-9715},
doi = {10.1016/j.compenvurbsys.2024.102122},
url = {https://www.sciencedirect.com/science/article/pii/S0198971524000516},
author = {Yonggai Zhuang and Yuhao Kang and Teng Fei and Meng Bian and Yunyan Du},
keywords = {Soundscape, Street view images, Sense of place, Stable diffusion, Generative AI, LLMs},
}
People experience the world through multiple senses simultaneously, contributing to our sense of place. Prior quantitative geography studies have mostly emphasized human visual perceptions, neglecting human auditory perceptions of place due to the challenges of characterizing the acoustic environment vividly. Also, few studies have synthesized the two dimensions of perception (auditory and visual) in understanding the human sense of place. To bridge these gaps, we propose the Soundscape-to-Image Stable Diffusion model, a generative Artificial Intelligence (AI) model supported by Large Language Models (LLMs), which aims to visualize soundscapes through the generation of street view images. By creating audio-image pairs, acoustic environments are first represented as high-dimensional semantic audio vectors. Our proposed Soundscape-to-Image Stable Diffusion model, which contains a Low-Resolution Diffusion Model and a Super-Resolution Diffusion Model, can then translate those semantic audio vectors into visual representations of place effectively. We evaluated the proposed model using both machine-based and human-centered approaches and demonstrated that the generated street view images align with common human perceptions and accurately recreate several key street elements of the original soundscapes. The results also show that soundscapes provide sufficient visual information about places. This study stands at the forefront of the intersection between generative AI and human geography, demonstrating how human multi-sensory experiences can be linked. We aim to enrich geospatial data science and AI studies with human experiences. This work has the potential to inform multiple domains such as human geography, environmental psychology, and urban design and planning, as well as to advance our knowledge of human-environment relationships.
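The repository bundles a torchvggish folder, which suggests VGGish-style features underlie the semantic audio vectors mentioned above. As a rough, hedged sketch (not the authors' exact audio encoder), the snippet below extracts such vectors for a single clip using the public torch.hub release of torchvggish; the file name is hypothetical.

```python
# Hedged sketch: turn one soundscape clip into a sequence of semantic audio
# vectors with the publicly released VGGish model from torch.hub. The actual
# pipeline may use the locally bundled torchvggish copy and a trained encoder.
import torch

vggish = torch.hub.load("harritaylor/torchvggish", "vggish")
vggish.eval()

with torch.no_grad():
    # One 128-dimensional embedding per ~0.96 s frame of audio,
    # returned as a tensor of shape (num_frames, 128).
    audio_vectors = vggish.forward("testaudio/example.wav")  # hypothetical file

print(audio_vectors.shape)
```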
Environment: Python 3.9 or newer
See requirements.txt
We combine our audio encoder with the Imagen decoder. We recommend training the audio encoder first.
- Install imagen_pytorch and change its diffusion decoder's max_text_len to 343 and max_seq_len to 768.
- Put your training audio set and image set in ./extract/audio and ./extract/image.
- Train with train.py.
- Put your test audio in ./testaudio.
- Sample your results with sample.py (see the illustrative sketch after this list for the underlying calls).
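For orientation, here is a minimal, hedged sketch of what the training and sampling steps above boil down to when using imagen_pytorch directly. It is not the authors' train.py or sample.py: the UNet sizes and hyperparameters are illustrative, the audio embeddings are random placeholders assumed to have shape (batch, 343, 768) based on the max_text_len/max_seq_len notes above, and max_text_len is passed to the UNets here instead of editing the library.

```python
# Hedged sketch of a soundscape-conditioned Imagen cascade (not the authors' code).
import torch
from imagen_pytorch import Unet, Imagen, ImagenTrainer

# Low-resolution (64 px) base diffusion model.
unet1 = Unet(
    dim=32,
    cond_dim=512,
    dim_mults=(1, 2, 4, 8),
    num_resnet_blocks=3,
    layer_attns=(False, True, True, True),
    layer_cross_attns=(False, True, True, True),
    max_text_len=343,  # assumed sequence length of the audio embeddings
)

# Super-resolution (64 px -> 256 px) diffusion model.
unet2 = Unet(
    dim=32,
    cond_dim=512,
    dim_mults=(1, 2, 4, 8),
    num_resnet_blocks=(2, 4, 8, 8),
    layer_attns=(False, False, False, True),
    layer_cross_attns=(False, False, False, True),
    max_text_len=343,
)

# Two-stage cascade; audio embeddings are fed where text embeddings would go.
# With the defaults, the conditioning embedding dimension is 768, matching the note above.
imagen = Imagen(
    unets=(unet1, unet2),
    image_sizes=(64, 256),
    timesteps=1000,
    cond_drop_prob=0.1,
)

trainer = ImagenTrainer(imagen)

# Dummy batch: semantic audio vectors from the audio encoder plus paired images.
audio_embeds = torch.randn(4, 343, 768)   # assumed (batch, max_text_len, embed_dim)
images = torch.randn(4, 3, 256, 256)

# Each UNet of the cascade is trained separately.
for unet_number in (1, 2):
    loss = trainer(images, text_embeds=audio_embeds, unet_number=unet_number)
    trainer.update(unet_number=unet_number)

# Sampling: generate street view images conditioned on (here random) audio embeddings.
generated = trainer.sample(text_embeds=audio_embeds, cond_scale=3.0)
print(generated.shape)  # (4, 3, 256, 256)
```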
The folders and files are organized as follows.
Put your training audio and images in the extract folder, and after training, put your test audio in the testaudio folder. One possible convention for pairing audio and image files is sketched after the tree below.
project
|-- extract
| |-- audio
| |-- image
|-- torchvggish
|-- testaudio
|-- testresult
|-- train.py
|-- sample.py
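The training script expects paired data, so a natural layout is one audio clip per street view image. Below is a hedged sketch of one possible pairing convention, assuming matching filename stems (e.g. 0001.wav and 0001.jpg); the convention actually used by train.py may differ.

```python
# Hedged sketch: list audio-image pairs from ./extract by shared filename stem.
from pathlib import Path

audio_dir = Path("extract/audio")
image_dir = Path("extract/image")

pairs = []
for audio_path in sorted(audio_dir.glob("*.wav")):
    # Look for an image with the same stem and a common extension.
    for ext in (".jpg", ".jpeg", ".png"):
        image_path = image_dir / (audio_path.stem + ext)
        if image_path.exists():
            pairs.append((audio_path, image_path))
            break

print(f"Found {len(pairs)} audio-image pairs")
```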
Yonggai Zhuang: [email protected]