Research papers

- CNN Architectures for Large-Scale Audio Classification
- Utterance-level Aggregation for Speaker Recognition in the Wild
- Speaker Diarization with LSTM
- SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
- Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
- Deep Speaker: an End-to-End Neural Speaker Embedding System
- X-Vectors: Robust DNN Embeddings for Speaker Recognition