
[Bug] tts_to_file gives TypeError: Invalid file: None #3067

Closed
perrylets opened this issue Oct 13, 2023 · 9 comments
Labels: bug Something isn't working

@perrylets

Describe the bug

When using the xtts_v1 model on Windows (Python 3.11.6), every call to the tts_to_file function fails with the error TypeError: Invalid file: None.

To Reproduce

On Windows with Python 3.11.6, with torch, torchaudio (possibly not needed, but installed to be safe), and TTS installed, run this snippet:

import torch
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1").to("cuda" if torch.cuda.is_available() else "cpu")
# Any combination of parameters gives the same error.
tts.tts_to_file("Hello, world!", language="en")  # Expected error: "TypeError: Invalid file: None"

Expected behavior

The audio output should be written to output.wav, or the specified file name.

Logs

No response

Environment

- 🐸TTS Version: 0.17.8
- PyTorch Version: 2.1.0+cpu
- Python Version: 3.11.6
- OS: Windows 11
- CUDA/cuDNN version: null
- GPU models and configuration: AMD Ryzen 7 5700G with Radeon Graphics
- How you installed PyTorch: pip on a virtual environment

Additional context

No response

@perrylets perrylets added the bug Something isn't working label Oct 13, 2023
@erogol
Member

erogol commented Oct 16, 2023

@Aya-AlJafari can you check this?

@taha9881

speaker_wav="cloning/audio.wav"
file_path="output.wav"

Try adding these two arguments. Create the respective directory for speaker_wav and put a sample audio file in .wav format there.

@perrylets
Author

I already did that before making the issue.

@Aya-AlJafari
Contributor

Hi @perrylets, can you please post the full log after executing this command:

tts.tts_to_file("Hello, world!",file_path="output.wav", speaker_wav="path/to/wavefile", language="en")

because the missing speaker_wav in tts.tts_to_file("Hello, world!", language="en") is the source of the None error, and the alternative above should fix it. I'm curious to see the log if it's still not working on your side.
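A lightweight pre-flight check makes this failure mode much easier to diagnose. The helper below is a hypothetical wrapper (not part of the TTS API) that validates speaker_wav before handing off to any tts_to_file-like callable, turning the cryptic "TypeError: Invalid file: None" into an actionable error:

```python
from pathlib import Path

def checked_tts_to_file(tts_fn, text, speaker_wav, file_path="output.wav", language="en"):
    """Validate the reference audio, then delegate to a synthesis function.

    tts_fn is any callable with a tts_to_file-like keyword signature.
    """
    if speaker_wav is None:
        raise ValueError("XTTS voice cloning requires speaker_wav; got None")
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(f"speaker_wav does not exist: {speaker_wav}")
    return tts_fn(text, file_path=file_path, speaker_wav=speaker_wav, language=language)
```

Called as checked_tts_to_file(tts.tts_to_file, "Hello, world!", "path/to/wavefile"), it fails fast with a clear message instead of deep inside soundfile.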

@perrylets
Author

Where are the logs? Is it just the console output?

@Aya-AlJafari
Contributor

@perrylets Yes, the full console output.

@Mikerhinos

Mikerhinos commented Oct 21, 2023

I'm having the same error while using the tts.tts_with_vc_to_file() method, even after adding the speaker_wav path, with xtts_v1 or xtts_v1.1.
Full output:

 > Using model: xtts
 > Text splitted to sentences.
['Experience has shown that it is not because you think this process is critical for you, that it is for your project.']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 32
     14 
   (...)
     27 #tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
     28 #tts.voice_conversion_to_file(source_wav="C:\\Users\\miker\\Downloads\\output_synth_"+now_string+".wav", target_wav="C:\\Users\\miker\\Downloads\\output_audio_"+now_string+".wav", file_path="C:\\Users\\miker\\Downloads\\output_cloned_"+now_string+".wav")
     31 tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True).to(device)
---> 32 tts.tts_with_vc_to_file(
     33     text=translated_text,
     34     speaker_wav="C:\\Users\\miker\\Downloads\\output_audio_"+now_string+".wav",
     35     file_path="C:\\Users\\miker\\Downloads\\output_cloned_"+now_string+".wav",
     36     language='en'
     37 )
     39 # Display audio widget to play the generated audio
     40 audio_widget = Audio(filename="C:\\Users\\miker\\Downloads\\output_cloned_"+now_string+".wav", autoplay=False)

File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:488, in TTS.tts_with_vc_to_file(self, text, language, speaker_wav, file_path)
    469 def tts_with_vc_to_file(
    470     self, text: str, language: str = None, speaker_wav: str = None, file_path: str = "output.wav"
    471 ):
    472     """Convert text to speech with voice conversion and save to file.
    473 
    474     Check `tts_with_vc` for more details.
   (...)
    486             Output file path. Defaults to "output.wav".
    487     """
--> 488     wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
    489     save_wav(wav=wav, path=file_path, sample_rate=self.voice_converter.vc_config.audio.output_sample_rate)

File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:463, in TTS.tts_with_vc(self, text, language, speaker_wav)
    444 """Convert text to speech with voice conversion.
    445 
    446 It combines tts with voice conversion to fake voice cloning.
   (...)
    459         Defaults to None.
    460 """
    461 with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as fp:
    462     # Lazy code... save it to a temp file to resample it while reading it for VC
--> 463     self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)
    464 if self.voice_converter is None:
    465     self.load_vc_model_by_name("voice_conversion_models/multilingual/vctk/freevc24")

File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:403, in TTS.tts_to_file(self, text, speaker, language, speaker_wav, emotion, speed, pipe_out, file_path, **kwargs)
    393 if self.csapi is not None:
    394     return self.tts_coqui_studio(
    395         text=text,
    396         speaker_name=speaker,
   (...)
    401         pipe_out=pipe_out,
    402     )
--> 403 wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
    404 self.synthesizer.save_wav(wav=wav, path=file_path, pipe_out=pipe_out)
    405 return file_path

File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:341, in TTS.tts(self, text, speaker, language, speaker_wav, emotion, speed, **kwargs)
    337 if self.csapi is not None:
    338     return self.tts_coqui_studio(
    339         text=text, speaker_name=speaker, language=language, emotion=emotion, speed=speed
    340     )
--> 341 wav = self.synthesizer.tts(
    342     text=text,
    343     speaker_name=speaker,
    344     language_name=language,
    345     speaker_wav=speaker_wav,
    346     reference_wav=None,
    347     style_wav=None,
    348     style_text=None,
    349     reference_speaker_name=None,
    350     **kwargs,
    351 )
    352 return wav

File ~\anaconda3\envs\colab\lib\site-packages\TTS\utils\synthesizer.py:374, in Synthesizer.tts(self, text, speaker_name, language_name, speaker_wav, style_wav, style_text, reference_wav, reference_speaker_name, **kwargs)
    372 for sen in sens:
    373     if hasattr(self.tts_model, "synthesize"):
--> 374         outputs = self.tts_model.synthesize(
    375             text=sen,
    376             config=self.tts_config,
    377             speaker_id=speaker_name,
    378             voice_dirs=self.voice_dir,
    379             d_vector=speaker_embedding,
    380             speaker_wav=speaker_wav,
    381             language=language_name,
    382             **kwargs,
    383         )
    384     else:
    385         # synthesize voice
    386         outputs = synthesis(
    387             model=self.tts_model,
    388             text=sen,
   (...)
    396             language_id=language_id,
    397         )

File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:462, in Xtts.synthesize(self, text, config, speaker_wav, language, **kwargs)
    459 if isinstance(speaker_wav, list):
    460     speaker_wav = speaker_wav[0]
--> 462 return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)

File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:484, in Xtts.inference_with_config(self, text, config, ref_audio_path, language, **kwargs)
    472 settings = {
    473     "temperature": config.temperature,
    474     "length_penalty": config.length_penalty,
   (...)
    481     "decoder_sampler": config.decoder_sampler,
    482 }
    483 settings.update(kwargs)  # allow overriding of preset settings with kwargs
--> 484 return self.full_inference(text, ref_audio_path, language, **settings)

File ~\anaconda3\envs\colab\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:570, in Xtts.full_inference(self, text, ref_audio_path, language, temperature, length_penalty, repetition_penalty, top_k, top_p, gpt_cond_len, do_sample, decoder_iterations, cond_free, cond_free_k, diffusion_temperature, decoder_sampler, decoder, **hf_generate_kwargs)
    486 @torch.inference_mode()
    487 def full_inference(
    488     self,
   (...)
    507     **hf_generate_kwargs,
    508 ):
    509     """
    510     This function produces an audio clip of the given text being spoken with the given reference voice.
    511 
   (...)
    564         Sample rate is 24kHz.
    565     """
    566     (
    567         gpt_cond_latent,
    568         diffusion_conditioning,
    569         speaker_embedding
--> 570     ) = self.get_conditioning_latents(audio_path=ref_audio_path, gpt_cond_len=gpt_cond_len)
    571     return self.inference(
    572         text,
    573         language,
   (...)
    589         **hf_generate_kwargs,
    590     )

File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:435, in Xtts.get_conditioning_latents(self, audio_path, gpt_cond_len)
    433 diffusion_cond_latents = None
    434 if self.args.use_hifigan:
--> 435     speaker_embedding = self.get_speaker_embedding(audio_path)
    436 else:
    437     diffusion_cond_latents = self.get_diffusion_cond_latents(audio_path)

File ~\anaconda3\envs\colab\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:421, in Xtts.get_speaker_embedding(self, audio_path)
    416 @torch.inference_mode()
    417 def get_speaker_embedding(
    418     self,
    419     audio_path
    420 ):
--> 421     audio = load_audio(audio_path, self.hifigan_decoder.speaker_encoder_audio_config["sample_rate"])
    422     speaker_embedding = self.hifigan_decoder.speaker_encoder.forward(
    423         audio.to(self.device), l2_norm=True
    424     ).unsqueeze(-1).to(self.device)
    425     return speaker_embedding

File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:34, in load_audio(audiopath, sr)
     23 def load_audio(audiopath, sr=22050):
     24     """
     25     Load an audio file from disk and resample it to the specified sampling rate.
     26 
   (...)
     32         Tensor: Audio waveform tensor with shape (1, T), where T is the number of samples.
     33     """
---> 34     audio, sampling_rate = torchaudio.load(audiopath)
     36     if len(audio.shape) > 1:
     37         if audio.shape[0] < 5:

File ~\anaconda3\envs\colab\lib\site-packages\torchaudio\backend\soundfile_backend.py:221, in load(filepath, frame_offset, num_frames, normalize, channels_first, format)
    139 @_requires_soundfile
    140 def load(
    141     filepath: str,
   (...)
    146     format: Optional[str] = None,
    147 ) -> Tuple[torch.Tensor, int]:
    148     """Load audio data from file.
    149 
    150     Note:
   (...)
    219             `[channel, time]` else `[time, channel]`.
    220     """
--> 221     with soundfile.SoundFile(filepath, "r") as file_:
    222         if file_.format != "WAV" or normalize:
    223             dtype = "float32"

File ~\anaconda3\envs\colab\lib\site-packages\soundfile.py:658, in SoundFile.__init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
    655 self._mode = mode
    656 self._info = _create_info_struct(file, mode, samplerate, channels,
    657                                  format, subtype, endian)
--> 658 self._file = self._open(file, mode_int, closefd)
    659 if set(mode).issuperset('r+') and self.seekable():
    660     # Move write position to 0 (like in Python file objects)
    661     self.seek(0)

File ~\anaconda3\envs\colab\lib\site-packages\soundfile.py:1212, in SoundFile._open(self, file, mode_int, closefd)
   1209     file_ptr = _snd.sf_open_virtual(self._init_virtual_io(file),
   1210                                     mode_int, self._info, _ffi.NULL)
   1211 else:
-> 1212     raise TypeError("Invalid file: {0!r}".format(self.name))
   1213 if file_ptr == _ffi.NULL:
   1214     # get the actual error code
   1215     err = _snd.sf_error(file_ptr)

TypeError: Invalid file: None

@gorip1

gorip1 commented Oct 22, 2023

Hi, same error for me using xtts_v1.1 & tts.tts_with_vc_to_file()

main.py:

from TTS.api import TTS
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v1.1", progress_bar=True).to("cpu")
tts.tts_with_vc_to_file(
    text="Hi guys how are you ?",
    speaker_wav="TTS/real_audio_sample/me_speaking.wav",
    file_path="output.wav",
    language="en"
)

Full output:

 > tts_models/multilingual/multi-dataset/xtts_v1.1 is already downloaded.
 > Using model: xtts
/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
  File "/Users/XXX/PycharmProjects/coquiTTS/main.py", line 9, in <module>
    tts.tts_with_vc_to_file(
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 488, in tts_with_vc_to_file
    wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 463, in tts_with_vc
    self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/utils/synthesizer.py", line 374, in tts
    outputs = self.tts_model.synthesize(
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 462, in synthesize
    return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 484, in inference_with_config
    return self.full_inference(text, ref_audio_path, language, **settings)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 570, in full_inference
    ) = self.get_conditioning_latents(audio_path=ref_audio_path, gpt_cond_len=gpt_cond_len)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 435, in get_conditioning_latents
    speaker_embedding = self.get_speaker_embedding(audio_path)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 421, in get_speaker_embedding
    audio = load_audio(audio_path, self.hifigan_decoder.speaker_encoder_audio_config["sample_rate"])
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 34, in load_audio
    audio, sampling_rate = torchaudio.load(audiopath)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torchaudio/_backend/utils.py", line 203, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torchaudio/_backend/soundfile.py", line 26, in load
    return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
    with soundfile.SoundFile(filepath, "r") as file_:
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/soundfile.py", line 1212, in _open
    raise TypeError("Invalid file: {0!r}".format(self.name))
TypeError: Invalid file: None

Hope it'll help 🤞

[EDIT]

It (kind of) worked when I put the file path directly into soundfile.SoundFile() in venv/lib/python3.9/site-packages/torchaudio/_backend/soundfile_backend.py.
So the error must be introduced somewhere in between!

Line 221 :

with soundfile.SoundFile(filepath, "r") as file_: ⤵️
with soundfile.SoundFile('MY_FILE_PATH.wav', "r") as file_:

Output :

 > tts_models/multilingual/multi-dataset/xtts_v1.1 is already downloaded.
 > Using model: xtts
/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
 > Text splitted to sentences.
['Hi guys how are you ?']
 > Processing time: 43.66520690917969
 > Real-time factor: 1.7740076433982859
 > voice_conversion_models/multilingual/vctk/freevc24 is already downloaded.
 > Using model: freevc
 > Loading pretrained speaker encoder model ...
Loaded the voice encoder model on cpu in 0.01 seconds.
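The tracebacks above point at the likely plumbing bug: in the 0.17.x api.py shown earlier, tts_with_vc calls tts_to_file with speaker=None and never forwards speaker_wav, so XTTS receives no reference audio. A simplified stand-in (not the real library code; names only mirror it for illustration) reproduces the failure mode:

```python
def load_audio(audiopath):
    # Mimics soundfile's behavior when handed a non-path.
    if audiopath is None:
        raise TypeError(f"Invalid file: {audiopath!r}")
    return "waveform"

def tts_to_file(text, speaker_wav=None, file_path="output.wav"):
    # XTTS needs the reference audio at this layer.
    load_audio(speaker_wav)
    return file_path

def tts_with_vc_to_file(text, speaker_wav=None, file_path="output.wav"):
    # Bug sketch: speaker_wav is held back for the later VC stage and is
    # not forwarded to the underlying TTS call, so XTTS sees None.
    return tts_to_file(text, file_path=file_path)
```

With this shape, tts_to_file(text, speaker_wav="me.wav") succeeds while tts_with_vc_to_file(text, speaker_wav="me.wav") still raises, matching the reports above.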

@lucasjinreal

Hi, how do I load a model from a local path? I get:

p = os.fspath(p)

TypeError: expected str, bytes or os.PathLike object, not NoneType

eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Nov 20, 2023
This reverts commit 041b4b6.

Fixes coqui-ai#3143. The original issue (coqui-ai#3067) was people trying to use
tts.tts_with_vc_to_file() with XTTS and was "fixed" in coqui-ai#3109. But XTTS has
integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there
is no point in passing it through FreeVC afterwards. So, reverting this commit
because it breaks tts.tts_with_vc_to_file() for any model that doesn't have
integrated VC, i.e. all models this method is meant for.
erogol pushed a commit that referenced this issue Nov 24, 2023
* Revert "fix for issue 3067"

This reverts commit 041b4b6.

Fixes #3143. The original issue (#3067) was people trying to use
tts.tts_with_vc_to_file() with XTTS and was "fixed" in #3109. But XTTS has
integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there
is no point in passing it through FreeVC afterwards. So, reverting this commit
because it breaks tts.tts_with_vc_to_file() for any model that doesn't have
integrated VC, i.e. all models this method is meant for.

* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file

* fix: only compute spk embeddings for models that support it

Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed
because they don't support voice cloning. Now that argument is simply ignored.
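The dispatch logic these two commit messages describe can be sketched as follows (a simplified illustration with made-up flag names, not the actual TTS code):

```python
def synthesize(text, speaker_wav=None, *, has_integrated_vc=False, supports_cloning=False):
    # After the revert and follow-up fixes:
    # - XTTS-style models (integrated VC) consume speaker_wav directly,
    #   so tts_with_vc_to_file is unnecessary for them.
    # - Models without cloning support simply ignore speaker_wav (#1440)
    #   instead of failing.
    if has_integrated_vc and speaker_wav is not None:
        return f"cloned({speaker_wav}): {text}"
    if supports_cloning and speaker_wav is not None:
        return f"vc({speaker_wav}): {text}"
    return f"plain: {text}"
```

A plain Vits model would hit the final branch even when speaker_wav is supplied, while XTTS takes the first.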
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants