JOANNA: Music Generation Using Weird Autoencoder and LSTM Architecture in Keras

I was able to generate music by training a NN model over Joanna Newsom's song "Sapokanikan" (here's the video for context and comparison, although in my opinion you should really listen to it anyways because it's gorgeous):

A 90-second sample of the song, at a sample rate of 16000 Hz, is sliced into overlapping frames of 2048 samples, and each frame is STFT'd; the phase and magnitude of the frequencies are used as input. A 16-layer denoising autoencoder is then trained, with ELU activation followed by dropout layers for the encoder layers and linear activation for the decoder layers. Each encoder layer reduces the dimension by 250. I then put 2 LSTM layers between the last encoder layer and the first decoder layer (i.e. at the deepest point of the autoencoder) and train the LSTM model on overlapping encoded data. Unsurprisingly, the LSTM layers add a lot more capacity to the autoencoder.
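The slicing and STFT step described above can be sketched roughly as follows. This is only an illustration, not the code from encoder.py: the function name `stft_frames`, the Hann window, the use of a real FFT, and reading an overlap of 2 as a hop of half a frame are all assumptions.

```python
import numpy as np

def stft_frames(signal, frame_size=2048, overlap=2):
    """Slice `signal` into overlapping frames and return (magnitude, phase)."""
    hop = frame_size // overlap              # assumed: overlap=2 means 50% overlap
    window = np.hanning(frame_size)          # assumed window; the README names none
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_size] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)   # shape: (n_frames, frame_size // 2 + 1)
    return np.abs(spectrum), np.angle(spectrum)

# One second of a 440 Hz test tone at 16000 Hz stands in for the real source.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
magnitude, phase = stft_frames(np.sin(2 * np.pi * 440 * t))
```

Under these assumptions, magnitude and phase are two arrays of the same shape, one row per frame, which matches the separate phase and magnitude autoencoders named in config.py.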

To build the model that generates new data, I feed the encoded output of the last encoder layer (i.e. before it is processed by the LSTM layers) into a stack of LSTM layers, and then top it off with the LSTM layers from the autoencoder, with their weights frozen (i.e. untrainable). Instead of the shifted blocks of input that I used last time, I use shifted blocks of the autoencoder's output after the LSTM layers.
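As a rough illustration of what "shifted blocks" could look like, the sketch below builds overlapping blocks from a sequence of frames and pairs each block with the same block shifted by one frame. The helper `shifted_blocks`, the one-frame shift, and the toy data are assumptions; in the setup described above, the targets would come from the autoencoder's output after its LSTM layers rather than from the input sequence itself.

```python
import numpy as np

def shifted_blocks(sequence, block_count):
    """Return (inputs, targets): overlapping blocks and the same blocks shifted by one frame."""
    n = len(sequence) - block_count
    inputs = np.stack([sequence[i:i + block_count] for i in range(n)])
    targets = np.stack([sequence[i + 1:i + 1 + block_count] for i in range(n)])
    return inputs, targets

# Toy stand-in for encoded data: 10 frames with 2 features each.
encoded = np.arange(20, dtype=float).reshape(10, 2)
inputs, targets = shifted_blocks(encoded, block_count=4)
```

Here each target block is the input block that follows it, so a model trained on these pairs learns to predict one frame ahead.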

I use 2 methods of generation: appended parts and whole generation. Appended parts appends the last block of each prediction to the results, while whole generation just uses the whole output as the next input.
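The difference between the two methods can be sketched like this. It is one possible reading, not the code from coded-lstm-generator.py: `generate_appended`, `generate_whole`, and `toy_predict` are hypothetical names, `toy_predict` stands in for the trained model, and treating the "last block" as the last frame of a prediction, as well as sliding the input window forward, are assumptions.

```python
import numpy as np

def generate_appended(predict, seed, steps):
    """Keep only the last frame of each prediction and slide the input window forward."""
    window, result = seed.copy(), []
    for _ in range(steps):
        prediction = predict(window)
        result.append(prediction[-1])
        window = np.vstack([window[1:], prediction[-1:]])
    return np.stack(result)

def generate_whole(predict, seed, steps):
    """Use each whole predicted block as the next input and keep all of it."""
    window, result = seed.copy(), []
    for _ in range(steps):
        window = predict(window)
        result.append(window)
    return np.concatenate(result)

def toy_predict(block):
    """Hypothetical stand-in for the model: adds 1 to every value."""
    return block + 1.0

seed = np.zeros((3, 2))                      # 3 frames, 2 features
appended = generate_appended(toy_predict, seed, steps=3)
whole = generate_whole(toy_predict, seed, steps=2)
```

In this reading, appended parts produces one new frame per prediction, while whole generation produces a full block per prediction.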

Part Generated:

Whole Generated:

I also tried different generation sequence length for comparison (listen with headphones):

You need the following dependencies:

Keras
Theano
numpy, scipy
pyyaml
HDF5 and h5py (optional, required if you use model saving/loading functions)
Optional but recommended if you use CNNs: cuDNN.

Before you run the script, configure it by editing config.py:

settings = {}
settings['source'] = './sources/sapo-160.wav' #this is the source wav material
settings['overlap'] = 2 #overlapping number for STFT
settings['coded-file'] = 'coded-file' #encoded output from autoencoder model
settings['lstm-file'] = 'lstm-file' #lstm model output
settings['phase-encoder'] = 'phase-encoder' #phase autoencoder weight file name
settings['magnitude-encoder'] = 'magnitude-encoder' #magnitude autoencoder weight file name
settings['phase-result'] = 'phase-result' #phase result file name
settings['magnitude-result'] = 'magnitude-result' #magnitude result file name
settings['lstm-epoch'] = 200
settings['ae-epoch'] = 200
settings['section-count'] = 8 #sample number
settings['ae-iteration'] = 40
settings['lstm-iteration'] = 30
settings['block-count'] = 175 #sample dimension
settings['layer-count'] = 4 #autoencoder layers count / 2
settings['sample-rate'] = 16000
settings['dim-decrease'] = 250 #dimensionality decrease for autoencoder
settings['load-weights'] = True #whether to load weights (usually for testing)
settings['dropout'] = 0.4 #dropout rate
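To see how 'layer-count' and 'dim-decrease' interact, the sketch below computes the encoder layer widths they would imply. The helper `encoder_layer_sizes` is hypothetical, and the input width of 1025 (the number of real-FFT bins for a 2048-sample frame) is an assumption; the actual scripts may use a different input dimension.

```python
def encoder_layer_sizes(input_dim, layer_count, dim_decrease):
    """Return the width of the input followed by the width of each encoder layer."""
    sizes = [input_dim - i * dim_decrease for i in range(layer_count + 1)]
    if sizes[-1] <= 0:
        raise ValueError("dim-decrease is too large for this input size and layer count")
    return sizes

# Values copied from the config above; 1025 is an assumed input width.
settings = {'layer-count': 4, 'dim-decrease': 250}
sizes = encoder_layer_sizes(1025, settings['layer-count'], settings['dim-decrease'])
decoder_sizes = sizes[::-1]                  # the decoder mirrors the encoder
```

With these values the widths go 1025, 775, 525, 275, 25, so a larger 'dim-decrease' or 'layer-count' quickly runs out of dimensions.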

After configuration, execute run.sh

