[Bug] tts_to_file gives TypeError: Invalid file: None #3067
Comments
@Aya-AlJafari can you check this?
Try adding these arguments, e.g. speaker_wav="cloning/audio.wav". Create the respective directory for speaker_wav and add a sample audio file in .wav format.
I already did that before making the issue.
Hi @perrylets, can you please post the full log after executing this command, because the missing …
Where are the logs? Is it just the console output?
@perrylets yes, the full output.
I'm having the same error while using the tts.tts_with_vc_to_file() method, even after adding the speaker_wav path, using xtts_v1 or xtts_v1.1:
> Using model: xtts
> Text splitted to sentences.
['Experience has shown that it is not because you think this process is critical for you, that it is for your project.']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[12], line 32
14
(...)
27 #tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
28 #tts.voice_conversion_to_file(source_wav="C:\\Users\\miker\\Downloads\\output_synth_"+now_string+".wav", target_wav="C:\\Users\\miker\\Downloads\\output_audio_"+now_string+".wav", file_path="C:\\Users\\miker\\Downloads\\output_cloned_"+now_string+".wav")
31 tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True).to(device)
---> 32 tts.tts_with_vc_to_file(
33 text=translated_text,
34 speaker_wav="C:\\Users\\miker\\Downloads\\output_audio_"+now_string+".wav",
35 file_path="C:\\Users\\miker\\Downloads\\output_cloned_"+now_string+".wav",
36 language='en'
37 )
39 # Display audio widget to play the generated audio
40 audio_widget = Audio(filename="C:\\Users\\miker\\Downloads\\output_cloned_"+now_string+".wav", autoplay=False)
File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:488, in TTS.tts_with_vc_to_file(self, text, language, speaker_wav, file_path)
469 def tts_with_vc_to_file(
470 self, text: str, language: str = None, speaker_wav: str = None, file_path: str = "output.wav"
471 ):
472 """Convert text to speech with voice conversion and save to file.
473
474 Check `tts_with_vc` for more details.
(...)
486 Output file path. Defaults to "output.wav".
487 """
--> 488 wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
489 save_wav(wav=wav, path=file_path, sample_rate=self.voice_converter.vc_config.audio.output_sample_rate)
File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:463, in TTS.tts_with_vc(self, text, language, speaker_wav)
444 """Convert text to speech with voice conversion.
445
446 It combines tts with voice conversion to fake voice cloning.
(...)
459 Defaults to None.
460 """
461 with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as fp:
462 # Lazy code... save it to a temp file to resample it while reading it for VC
--> 463 self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)
464 if self.voice_converter is None:
465 self.load_vc_model_by_name("voice_conversion_models/multilingual/vctk/freevc24")
File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:403, in TTS.tts_to_file(self, text, speaker, language, speaker_wav, emotion, speed, pipe_out, file_path, **kwargs)
393 if self.csapi is not None:
394 return self.tts_coqui_studio(
395 text=text,
396 speaker_name=speaker,
(...)
401 pipe_out=pipe_out,
402 )
--> 403 wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
404 self.synthesizer.save_wav(wav=wav, path=file_path, pipe_out=pipe_out)
405 return file_path
File ~\anaconda3\envs\colab\lib\site-packages\TTS\api.py:341, in TTS.tts(self, text, speaker, language, speaker_wav, emotion, speed, **kwargs)
337 if self.csapi is not None:
338 return self.tts_coqui_studio(
339 text=text, speaker_name=speaker, language=language, emotion=emotion, speed=speed
340 )
--> 341 wav = self.synthesizer.tts(
342 text=text,
343 speaker_name=speaker,
344 language_name=language,
345 speaker_wav=speaker_wav,
346 reference_wav=None,
347 style_wav=None,
348 style_text=None,
349 reference_speaker_name=None,
350 **kwargs,
351 )
352 return wav
File ~\anaconda3\envs\colab\lib\site-packages\TTS\utils\synthesizer.py:374, in Synthesizer.tts(self, text, speaker_name, language_name, speaker_wav, style_wav, style_text, reference_wav, reference_speaker_name, **kwargs)
372 for sen in sens:
373 if hasattr(self.tts_model, "synthesize"):
--> 374 outputs = self.tts_model.synthesize(
375 text=sen,
376 config=self.tts_config,
377 speaker_id=speaker_name,
378 voice_dirs=self.voice_dir,
379 d_vector=speaker_embedding,
380 speaker_wav=speaker_wav,
381 language=language_name,
382 **kwargs,
383 )
384 else:
385 # synthesize voice
386 outputs = synthesis(
387 model=self.tts_model,
388 text=sen,
(...)
396 language_id=language_id,
397 )
File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:462, in Xtts.synthesize(self, text, config, speaker_wav, language, **kwargs)
459 if isinstance(speaker_wav, list):
460 speaker_wav = speaker_wav[0]
--> 462 return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:484, in Xtts.inference_with_config(self, text, config, ref_audio_path, language, **kwargs)
472 settings = {
473 "temperature": config.temperature,
474 "length_penalty": config.length_penalty,
(...)
481 "decoder_sampler": config.decoder_sampler,
482 }
483 settings.update(kwargs) # allow overriding of preset settings with kwargs
--> 484 return self.full_inference(text, ref_audio_path, language, **settings)
File ~\anaconda3\envs\colab\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:570, in Xtts.full_inference(self, text, ref_audio_path, language, temperature, length_penalty, repetition_penalty, top_k, top_p, gpt_cond_len, do_sample, decoder_iterations, cond_free, cond_free_k, diffusion_temperature, decoder_sampler, decoder, **hf_generate_kwargs)
486 @torch.inference_mode()
487 def full_inference(
488 self,
(...)
507 **hf_generate_kwargs,
508 ):
509 """
510 This function produces an audio clip of the given text being spoken with the given reference voice.
511
(...)
564 Sample rate is 24kHz.
565 """
566 (
567 gpt_cond_latent,
568 diffusion_conditioning,
569 speaker_embedding
--> 570 ) = self.get_conditioning_latents(audio_path=ref_audio_path, gpt_cond_len=gpt_cond_len)
571 return self.inference(
572 text,
573 language,
(...)
589 **hf_generate_kwargs,
590 )
File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:435, in Xtts.get_conditioning_latents(self, audio_path, gpt_cond_len)
433 diffusion_cond_latents = None
434 if self.args.use_hifigan:
--> 435 speaker_embedding = self.get_speaker_embedding(audio_path)
436 else:
437 diffusion_cond_latents = self.get_diffusion_cond_latents(audio_path)
File ~\anaconda3\envs\colab\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:421, in Xtts.get_speaker_embedding(self, audio_path)
416 @torch.inference_mode()
417 def get_speaker_embedding(
418 self,
419 audio_path
420 ):
--> 421 audio = load_audio(audio_path, self.hifigan_decoder.speaker_encoder_audio_config["sample_rate"])
422 speaker_embedding = self.hifigan_decoder.speaker_encoder.forward(
423 audio.to(self.device), l2_norm=True
424 ).unsqueeze(-1).to(self.device)
425 return speaker_embedding
File ~\anaconda3\envs\colab\lib\site-packages\TTS\tts\models\xtts.py:34, in load_audio(audiopath, sr)
23 def load_audio(audiopath, sr=22050):
24 """
25 Load an audio file from disk and resample it to the specified sampling rate.
26
(...)
32 Tensor: Audio waveform tensor with shape (1, T), where T is the number of samples.
33 """
---> 34 audio, sampling_rate = torchaudio.load(audiopath)
36 if len(audio.shape) > 1:
37 if audio.shape[0] < 5:
File ~\anaconda3\envs\colab\lib\site-packages\torchaudio\backend\soundfile_backend.py:221, in load(filepath, frame_offset, num_frames, normalize, channels_first, format)
139 @_requires_soundfile
140 def load(
141 filepath: str,
(...)
146 format: Optional[str] = None,
147 ) -> Tuple[torch.Tensor, int]:
148 """Load audio data from file.
149
150 Note:
(...)
219 `[channel, time]` else `[time, channel]`.
220 """
--> 221 with soundfile.SoundFile(filepath, "r") as file_:
222 if file_.format != "WAV" or normalize:
223 dtype = "float32"
File ~\anaconda3\envs\colab\lib\site-packages\soundfile.py:658, in SoundFile.__init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
655 self._mode = mode
656 self._info = _create_info_struct(file, mode, samplerate, channels,
657 format, subtype, endian)
--> 658 self._file = self._open(file, mode_int, closefd)
659 if set(mode).issuperset('r+') and self.seekable():
660 # Move write position to 0 (like in Python file objects)
661 self.seek(0)
File ~\anaconda3\envs\colab\lib\site-packages\soundfile.py:1212, in SoundFile._open(self, file, mode_int, closefd)
1209 file_ptr = _snd.sf_open_virtual(self._init_virtual_io(file),
1210 mode_int, self._info, _ffi.NULL)
1211 else:
-> 1212 raise TypeError("Invalid file: {0!r}".format(self.name))
1213 if file_ptr == _ffi.NULL:
1214 # get the actual error code
1215 err = _snd.sf_error(file_ptr)
TypeError: Invalid file: None
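The traceback above shows how a None reaches soundfile: tts_with_vc() calls tts_to_file(text=..., speaker=None, language=..., file_path=fp.name) without forwarding speaker_wav, so XTTS receives ref_audio_path=None and torchaudio ultimately hands None to soundfile. A minimal, TTS-free sketch of that failure chain (the function names here only mimic the library's helpers, they are not the real API):

```python
# Minimal sketch of the failure chain: a reference-audio path that is
# never forwarded downstream arrives as None at the file loader.

def load_audio(audio_path):
    """Mimics soundfile's check: a non-path input raises TypeError."""
    if not isinstance(audio_path, str):
        raise TypeError("Invalid file: {0!r}".format(audio_path))
    return audio_path

def synthesize(text, speaker_wav=None):
    # The buggy call path dropped speaker_wav, so None is passed down.
    return load_audio(speaker_wav)

try:
    synthesize("Hello world")  # speaker_wav never supplied downstream
except TypeError as e:
    print(e)  # prints: Invalid file: None
```

This is why setting speaker_wav at the top-level call did not help: the argument was lost before it reached the loader.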
Hi, same error for me using main.py:
from TTS.api import TTS
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v1.1", progress_bar=True).to("cpu")
tts.tts_with_vc_to_file(
text="Hi guys how are you ?",
speaker_wav="TTS/real_audio_sample/me_speaking.wav",
file_path="output.wav",
language="en"
)
Full output:
> tts_models/multilingual/multi-dataset/xtts_v1.1 is already downloaded.
> Using model: xtts
/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
File "/Users/XXX/PycharmProjects/coquiTTS/main.py", line 9, in <module>
tts.tts_with_vc_to_file(
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 488, in tts_with_vc_to_file
wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 463, in tts_with_vc
self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 403, in tts_to_file
wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/api.py", line 341, in tts
wav = self.synthesizer.tts(
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/utils/synthesizer.py", line 374, in tts
outputs = self.tts_model.synthesize(
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 462, in synthesize
return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 484, in inference_with_config
return self.full_inference(text, ref_audio_path, language, **settings)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 570, in full_inference
) = self.get_conditioning_latents(audio_path=ref_audio_path, gpt_cond_len=gpt_cond_len)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 435, in get_conditioning_latents
speaker_embedding = self.get_speaker_embedding(audio_path)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 421, in get_speaker_embedding
audio = load_audio(audio_path, self.hifigan_decoder.speaker_encoder_audio_config["sample_rate"])
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 34, in load_audio
audio, sampling_rate = torchaudio.load(audiopath)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torchaudio/_backend/utils.py", line 203, in load
return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torchaudio/_backend/soundfile.py", line 26, in load
return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
with soundfile.SoundFile(filepath, "r") as file_:
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
File "/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/soundfile.py", line 1212, in _open
raise TypeError("Invalid file: {0!r}".format(self.name))
TypeError: Invalid file: None
Hope it'll help 🤞
[EDIT] It (kind of) worked when I put the file path directly into soundfile.SoundFile() in venv/lib/python3.9/site-packages/torchaudio/_backend/soundfile_backend.py, line 221, changing:
with soundfile.SoundFile(filepath, "r") as file_:
to:
with soundfile.SoundFile('MY_FILE_PATH.wav', "r") as file_:
Output:
> tts_models/multilingual/multi-dataset/xtts_v1.1 is already downloaded.
> Using model: xtts
/Users/XXX/PycharmProjects/coquiTTS/venv/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
> Text splitted to sentences.
['Hi guys how are you ?']
> Processing time: 43.66520690917969
> Real-time factor: 1.7740076433982859
> voice_conversion_models/multilingual/vctk/freevc24 is already downloaded.
> Using model: freevc
> Loading pretrained speaker encoder model ...
Loaded the voice encoder model on cpu in 0.01 seconds.
Hi, how can I load the model from a local path? I'm getting:
TypeError: expected str, bytes or os.PathLike object, not NoneType
* Revert "fix for issue 3067"
This reverts commit 041b4b6. Fixes #3143. The original issue (#3067) was people trying to use tts.tts_with_vc_to_file() with XTTS, and it was "fixed" in #3109. But XTTS has integrated VC, so you can just do tts.tts_to_file(..., speaker_wav="..."); there is no point in passing the output through FreeVC afterwards. So, reverting this commit because it breaks tts.tts_with_vc_to_file() for any model that doesn't have integrated VC, i.e. all the models this method is meant for.
* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file
* fix: only compute spk embeddings for models that support it
Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed because they don't support voice cloning. Now that argument is simply ignored.
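As the commit message above notes, XTTS does its own voice cloning, so tts_with_vc_to_file() is unnecessary for it. A minimal sketch of the recommended direct call, assuming a local reference clip at speaker.wav (a hypothetical path, substitute your own):

```python
from TTS.api import TTS

# XTTS clones the voice itself from speaker_wav; no FreeVC pass is needed.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1.1").to("cpu")
tts.tts_to_file(
    text="Hi guys, how are you?",
    speaker_wav="speaker.wav",  # hypothetical path to a reference .wav clip
    language="en",
    file_path="output.wav",
)
```

Reserve tts_with_vc_to_file() for models without integrated voice conversion, where the FreeVC post-processing step actually adds something.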
Describe the bug
When using the xtts-1 model on Windows (Python 3.11.6), every time I run the
tts_to_file
function, it gives the error TypeError: Invalid file: None
To Reproduce
On Windows with Python 3.11.6, with torch, torchaudio (not sure if needed, but just to be sure) and TTS installed, run this snippet:
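The snippet itself did not survive extraction; a minimal reproduction consistent with the report (model name taken from the issue title, text and paths hypothetical) might look like:

```python
from TTS.api import TTS

# Reproduction sketch: calling tts_to_file on XTTS without speaker_wav
# leaves the reference-audio path as None further down the call chain.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1")
tts.tts_to_file(text="Hello world", file_path="output.wav", language="en")
# raises TypeError: Invalid file: None (per the report)
```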
Expected behavior
The audio output should be written to output.wav, or the specified file name.
Logs
No response
Environment
Additional context
No response