Documentation? #306

Open
ignacio82 opened this issue Dec 11, 2023 · 25 comments

@ignacio82

I'm trying to follow https://github.com/rhasspy/piper/blob/master/TRAINING.md and I got to the step of getting into the container you recommend. However, it is not clear from the documentation whether I have to install additional things, or go to some particular directory. This is what I did:

docker run -ti -v /home/ignacio/out-train:/train -v /home/ignacio/piper-checkpoints:/piper-checkpoints cloning bash

=============
== PyTorch ==
=============

NVIDIA Release 22.03 (build 33569136)
PyTorch Version 1.12.0a0+2c916ef

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

root@ab644ef6e36c:/workspace# python3 -m piper_train \
>     --dataset-dir /train/ \
>     --accelerator 'gpu' \
>     --devices 1 \
>     --batch-size 32 \
>     --validation-split 0.0 \
>     --num-test-examples 0 \
>     --max_epochs 10000 \
>     --resume_from_checkpoint /piper-checkpoints/epoch=4641-step=3104302.ckpt \
>     --checkpoint-epochs 1 \
>     --precision 32
/opt/conda/bin/python3: No module named piper_train

I'm probably missing something obvious, but it would be great if that obvious step was documented. Thanks!

@aaronnewsome

Have you installed the NVIDIA container toolkit? Does the sample NVIDIA container run and detect the GPU?

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

@ignacio82
Author

ignacio@xps:~$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
5e8117c0bd28: Pull complete 
Digest: sha256:8eab65df33a6de2844c9aefd19efe8ddb87b7df5e9185a4ab73af936225685bb
Status: Downloaded newer image for ubuntu:latest
Tue Dec 12 03:14:08 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off | 00000000:01:00.0  On |                  N/A |
| 43%   40C    P0              24W / 120W |    587MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

@aaronnewsome

In your original post, I don't see "--runtime=nvidia --gpus all" in your docker command.

@ignacio82
Author

$ docker run --runtime=nvidia --gpus all -ti -v /home/ignacio/out-train:/train -v /home/ignacio/piper-checkpoints:/piper-checkpoints cloning bash

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

root@ac304033e8fc:/workspace# python3 -m piper_train \
>     --dataset-dir /train/ \
>     --accelerator 'gpu' \
>     --devices 1 \
>     --batch-size 32 \
>     --validation-split 0.0 \
>     --num-test-examples 0 \
>     --max_epochs 10000 \
>     --resume_from_checkpoint /piper-checkpoints/epoch=4641-step=3104302.ckpt \
>     --checkpoint-epochs 1 \
>     --precision 32
/opt/conda/bin/python3: No module named piper_train
root@ac304033e8fc:/workspace# 

No difference

@aaronnewsome

Personally, I didn't see much advantage to running training in Docker, so I just ran the training on my bare-metal host OS. But I was curious why you're having so much trouble. Your docker command shows that you're running a container called cloning. My guess is that container doesn't have piper installed, so, as expected, you can't run piper_train in it. You could either install the piper modules into that container interactively or build a new image.

I looked at TRAINING.md and it recommends the following:

It is highly recommended to train with the following Dockerfile:

FROM nvcr.io/nvidia/pytorch:22.03-py3
RUN pip3 install  'pytorch-lightning'
ENV NUMBA_CACHE_DIR=.numba_cache

But this seems incomplete to me too, since a container built from that Dockerfile also doesn't have the tools installed. If you build a container from the above Dockerfile, run it, and do pip list from the command line, you'll see there are no piper-* modules.

My guess is, the 3 lines in the above Dockerfile will not build a container that's ready to do piper training. There are numerous ways to fix this. Here are a few:

  • Add the required lines to the Dockerfile and rebuild the container
  • Provide another -v option in your docker run command to mount the .venv directory from your host and activate it in the container
  • Start the container as you've already done and then follow the instructions at the top of TRAINING.md, that is, install and configure the piper environment (see the sketch below)

Either way, once the piper environment is configured in the container, you should be able to run piper_train in your container.
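
For example, a rough sketch of that third option, run inside the container (the /piper clone path is illustrative, and note that, as it turns out further down this thread, the image's stock Python 3.8 can't build piper-phonemize):

git clone --depth 1 https://github.com/rhasspy/piper.git /piper
cd /piper/src/python
python3 -m venv .venv
. .venv/bin/activate
pip install --upgrade pip wheel setuptools
pip install -r requirements.txt
pip install -e .              # installs piper_train and its dependencies
./build_monotonic_align.sh    # compiles the monotonic_align extension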

@ignacio82
Author

re bare metal: I tried and got errors, so I was hoping doing it inside a container would resolve that.
re Dockerfile: Yes, I think that is one of the issues with the documentation. I tried modifying the Dockerfile as follows, but I'm clearly missing something:

FROM nvcr.io/nvidia/pytorch:22.03-py3

# Install system dependencies
RUN apt-get update && apt-get install -y python3-dev \
    && rm -rf /var/lib/apt/lists/*

# Clone the repository
RUN git clone --depth 1 https://github.com/rhasspy/piper.git /piper

# Set the working directory
WORKDIR /piper/src/python

# Create and activate virtual environment
RUN python3 -m venv .venv \
    && . .venv/bin/activate \
    && pip install --upgrade pip setuptools wheel \
    && pip install -e . 'pytorch-lightning'

# Deactivate the virtual environment
RUN deactivate

# Set environment variable
ENV NUMBA_CACHE_DIR=.numba_cache
$ docker build . -t cloning

...
11.19 ERROR: No matching distribution found for piper-phonemize~=1.1.0
------
Dockerfile:14
--------------------
  13 |     # Create and activate virtual environment
  14 | >>> RUN python3 -m venv .venv \
  15 | >>>     && . .venv/bin/activate \
  16 | >>>     && pip install --upgrade pip setuptools wheel \
  17 | >>>     && pip install -e . 'pytorch-lightning'
  18 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m venv .venv     && . .venv/bin/activate     && pip install --upgrade pip setuptools wheel     && pip install -e . 'pytorch-lightning'" did not complete successfully: exit code: 1

@aaronnewsome

Well, the TRAINING.md Dockerfile example starts with:

FROM nvcr.io/nvidia/pytorch:22.03-py3

And if you run that container and run python --version, you'll see it's Python 3.8, which I know from experience can't build the piper modules. I've settled on Python 3.10, since 3.9, 3.11, and 3.12 all didn't work for me.

This error specifically: 11.19 ERROR: No matching distribution found for piper-phonemize~=1.1.0

seems to happen with Python 3.8.
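
A quick way to confirm which Python the base image ships (the expected output is based on what this thread observes inside that image):

docker run --rm nvcr.io/nvidia/pytorch:22.03-py3 python3 --version
# expect: Python 3.8.x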

I've created a new Dockerfile which installs Python 3.10. It's still building, but once it does, I'll verify that piper_train runs properly with GPU support.

@aaronnewsome

OK, that worked after rebuilding the container with Python 3.10. Specifically, I added the following to the Dockerfile before installing piper or creating the venv:

RUN mkdir -pv /usr/src/python
WORKDIR /usr/src/python
RUN wget https://www.python.org/ftp/python/3.10.13/Python-3.10.13.tgz
RUN tar zxvf Python-3.10.13.tgz
RUN apt-get update && apt install -y libffi-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/python/Python-3.10.13
RUN ./configure --enable-optimizations
RUN make -j8
RUN make altinstall
WORKDIR /usr/src/piper/src/python
RUN /usr/local/bin/python3.10 -m venv .venv
RUN source .venv/bin/activate && pip list && pip install pip wheel setuptools -U && pip list \
    && pip install -r requirements.txt && pip list && pip install -e . && pip list \
    && pip install torchmetrics==0.11.4 && pip install piper-tts \
    && ./build_monotonic_align.sh && pip3 install piper-tts

Earlier in the Dockerfile I install some other apt packages so that Python compiles; those include:

espeak-ng git build-essential zlib1g-dev libbz2-dev liblzma-dev libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev libgdbm-dev liblzma-dev tk-dev lzma lzma-dev libgdbm-dev

Some of those should be removed after Python is compiled, to reduce the size of the container.

After the above modifications, the container sees the GPU and piper_train loads correctly. I didn't do any training since the GPU is already busy.

@ignacio82
Author

Any chance you can push that container to Docker Hub?

I'm stuck here:

ignacio@xps:~/piper-checkpoints$ docker run -ti cloning bash

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

root@fa766fb67ab7:/usr/src/piper/src/python# /usr/local/bin/python3.10 -m venv .venv
root@fa766fb67ab7:/usr/src/piper/src/python# source .venv/bin/activate && pip list && pip install pip wheel setuptools -U && pip list && pip install -r requirements.txt && pip list && pip install -e . && pip list && pip install torchmetrics==0.11.4 && pip install piper-tts && ./build_monotonic_align.sh && pip3 install piper-tts
Package    Version
---------- -------
pip        23.0.1
setuptools 65.5.0

[notice] A new release of pip is available: 23.0.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in ./.venv/lib/python3.10/site-packages (23.0.1)
Collecting pip
  Downloading pip-23.3.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 1.8 MB/s eta 0:00:00
Collecting wheel
  Downloading wheel-0.42.0-py3-none-any.whl (65 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.4/65.4 kB 5.0 MB/s eta 0:00:00
Requirement already satisfied: setuptools in ./.venv/lib/python3.10/site-packages (65.5.0)
Collecting setuptools
  Downloading setuptools-69.0.2-py3-none-any.whl (819 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 819.5/819.5 kB 2.9 MB/s eta 0:00:00
Installing collected packages: wheel, setuptools, pip
  Attempting uninstall: setuptools
    Found existing installation: setuptools 65.5.0
    Uninstalling setuptools-65.5.0:
      Successfully uninstalled setuptools-65.5.0
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-23.3.1 setuptools-69.0.2 wheel-0.42.0
Package    Version
---------- -------
pip        23.3.1
setuptools 69.0.2
wheel      0.42.0
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

And this is what I currently have for the Dockerfile:

FROM nvcr.io/nvidia/pytorch:22.03-py3

# Set environment variables
ENV NUMBA_CACHE_DIR=.numba_cache
ENV TZ=America/Los_Angeles

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Install system dependencies
RUN apt-get update \
  && apt-get install -y \
        espeak-ng \
        git \
        build-essential \
        zlib1g-dev \
        libbz2-dev \
        liblzma-dev \
        libncurses5-dev \
        libreadline6-dev \
        libsqlite3-dev \
        libssl-dev \
        libgdbm-dev \
        liblzma-dev \
        lzma \
        lzma-dev \
        libgdbm-dev 
   
RUN DEBIAN_FRONTEND=noninteractive TZ=$TZ apt-get -y install tzdata \
  && rm -rf /var/lib/apt/lists/*

# Install Python
ARG PYTHON_VERSION=3.10.13
RUN mkdir -pv /usr/src/python \
    && cd /usr/src/python \
    && wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz \
    && tar zxvf Python-${PYTHON_VERSION}.tgz \
    && cd Python-${PYTHON_VERSION} \
    && ./configure --enable-optimizations \
    && make -j8 \
    && make altinstall \
    && cd / \
    && rm -rf /usr/src/python

# Set working directory
WORKDIR /usr/src/piper/src/python

@aaronnewsome

I don't see a git clone of the piper GitHub repo in your Dockerfile.

@ignacio82
Author

I didn't see that in the code you shared, nor can I figure out how to incorporate it into the Dockerfile I'm trying to put together. This is clearly above my head. I will keep my fingers crossed for a public container, Dockerfile, or improved documentation. Thanks for the help!

@aaronnewsome

FROM nvcr.io/nvidia/pytorch:22.03-py3
RUN pip3 install 'pytorch-lightning'
ENV NUMBA_CACHE_DIR=.numba_cache
ENV DEBIAN_FRONTEND=noninteractive
# espeak-ng for piper, plus the build deps needed to compile Python from source
RUN apt-get update && apt install -y python3-dev python3-venv espeak-ng git build-essential zlib1g-dev libbz2-dev liblzma-dev libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev libgdbm-dev liblzma-dev tk-dev lzma lzma-dev libgdbm-dev libffi-dev && rm -rf /var/lib/apt/lists/*
# Clone the piper repo
RUN mkdir -pv /usr/src/
WORKDIR /usr/src/
RUN git clone https://github.com/rhasspy/piper.git
# Build and altinstall Python 3.10 (the image's stock Python 3.8 can't build piper-phonemize)
RUN mkdir -pv /usr/src/python
WORKDIR /usr/src/python
RUN wget https://www.python.org/ftp/python/3.10.13/Python-3.10.13.tgz
RUN tar zxvf Python-3.10.13.tgz
WORKDIR /usr/src/python/Python-3.10.13
RUN ./configure --enable-optimizations
RUN make -j8
RUN make altinstall
# Create the venv with Python 3.10 and install the piper training environment into it
WORKDIR /usr/src/piper/src/python
RUN /usr/local/bin/python3.10 -m venv .venv
RUN source .venv/bin/activate && pip list && pip install pip wheel setuptools -U && pip list \
    && pip install -r requirements.txt && pip list && pip install -e . && pip list \
    && pip install torchmetrics==0.11.4 && pip install piper-tts \
    && ./build_monotonic_align.sh && pip3 install piper-tts
RUN ln -fs /usr/share/zoneinfo/America/Chicago /etc/localtime
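
To build and enter the resulting image (the cloning tag and these flags mirror the commands used elsewhere in this thread):

docker build -t cloning .
docker run --runtime=nvidia --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -ti cloning bash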

@ignacio82
Author

I used to think Docker was amazing for people like me; this experience has changed my mind:

178.3 pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
------
Dockerfile:19
--------------------
  17 |     WORKDIR /usr/src/piper/src/python
  18 |     RUN /usr/local/bin/python3.10 -m venv .venv
  19 | >>> RUN source .venv/bin/activate && pip list && pip install pip wheel setuptools -U && pip list && pip install -r requirements.txt && pip list && pip install -e . && pip list && pip install torchmetrics==0.11.4 && pip install piper-tts && ./build_monotonic_align.sh && pip3 install piper-tts
  20 |     RUN ln -fs /usr/share/zoneinfo/America/Chicago /etc/localtime
  21 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c source .venv/bin/activate && pip list && pip install pip wheel setuptools -U && pip list && pip install -r requirements.txt && pip list && pip install -e . && pip list && pip install torchmetrics==0.11.4 && pip install piper-tts && ./build_monotonic_align.sh && pip3 install piper-tts" did not complete successfully: exit code: 2

@aaronnewsome

This seems like a transient issue. They happen. Did you try running the container build again?

@ignacio82
Author

Tried building it one more time, without changing anything, and it worked. Alas:

root@3f2a8ced13e2:/usr/src/piper/src/python# python3 -m piper_train \
>     --dataset-dir /train/ \
>     --accelerator 'gpu' \
>     --devices 1 \
>     --batch-size 32 \
>     --validation-split 0.0 \
>     --num-test-examples 0 \
>     --max_epochs 10000 \
>     --resume_from_checkpoint /piper-checkpoints/epoch=4641-step=3104302.ckpt \
>     --checkpoint-epochs 1 \
>     --precision 32
/opt/conda/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /opt/conda/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/src/piper/src/python/piper_train/__main__.py", line 10, in <module>
    from .vits.lightning import VitsModel
  File "/usr/src/piper/src/python/piper_train/vits/lightning.py", line 15, in <module>
    from .models import MultiPeriodDiscriminator, SynthesizerTrn
  File "/usr/src/piper/src/python/piper_train/vits/models.py", line 10, in <module>
    from . import attentions, commons, modules, monotonic_align
  File "/usr/src/piper/src/python/piper_train/vits/monotonic_align/__init__.py", line 4, in <module>
    from .monotonic_align.core import maximum_path_c
ModuleNotFoundError: No module named 'piper_train.vits.monotonic_align.monotonic_align.core'
root@3f2a8ced13e2:/usr/src/piper/src/python# 

@aaronnewsome

While in the container, before running "python3 -m piper_train", try running:

source .venv/bin/activate

@ignacio82
Author

Thanks. Getting closer, I think:

(.venv) root@3f2a8ced13e2:/usr/src/piper/src/python# python3 -m piper_train \
>     --dataset-dir /train/ \
>     --accelerator 'gpu' \
>     --devices 1 \
>     --batch-size 32 \
>     --validation-split 0.0 \
>     --num-test-examples 0 \
>     --max_epochs 10000 \
>     --resume_from_checkpoint /piper-checkpoints/epoch=4641-step=3104302.ckpt \
>     --checkpoint-epochs 1 \
>     --precision 32
DEBUG:piper_train:Namespace(dataset_dir='/train/', checkpoint_epochs=1, quality='medium', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=10000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/piper-checkpoints/epoch=4641-step=3104302.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 1 epoch(s)
DEBUG:vits.dataset:Loading dataset: /train/dataset.jsonl
/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  ckpt_path = ckpt_path or self.resume_from_checkpoint
Missing logger folder: /train/lightning_logs
Restoring states from the checkpoint path at /piper-checkpoints/epoch=4641-step=3104302.ckpt
DEBUG:fsspec.local:open file: /piper-checkpoints/epoch=4641-step=3104302.ckpt
/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:345: UserWarning: The dirpath has changed from '/home/hansenm/larynx2/local/en_US/ryan/medium/lightning_logs/version_0/checkpoints' to '/train/lightning_logs/version_0/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
  warnings.warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /train/lightning_logs/version_0/hparams.yaml
Restored all states from the checkpoint file at /piper-checkpoints/epoch=4641-step=3104302.ckpt
/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/usr/src/piper/src/python/piper_train/__main__.py", line 124, in main
    trainer.fit(model)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 174, in advance
    batch = next(data_fetcher)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
    return self.fetching_function()
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 263, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 277, in _fetch_next_batch
    batch = next(iterator)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 557, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 569, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/utils/data/dataset.py", line 295, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/usr/src/piper/src/python/piper_train/vits/dataset.py", line 80, in __getitem__
    audio_norm=torch.load(utt.audio_norm_path),
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/serialization.py", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/torch/serialization.py", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/ignacio/out-train/cache/22050/be0533dcc16fb185a824504edb199d5862d70203c43ec4f0c45e3fa133a9b857.pt'

I don't understand the error, given that I'm running that command inside the container, and this is how I get into the container:

ignacio@xps:~/piper-checkpoints$ docker run --runtime=nvidia --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /home/ignacio/out-train:/train -v /home/ignacio/piper-checkpoints:/piper-checkpoints cloning bash

ignacio@xps:~/piper-checkpoints$ ls /home/ignacio/out-train
cache  config.json  dataset.jsonl  lightning_logs
ignacio@xps:~/piper-checkpoints$ ls /home/ignacio/out-train/cache/
22050

In case it is relevant:

ignacio@xps:~/piper-checkpoints$ tree /home/ignacio/out-train
locales-launch: Data of en_US locale not found, generating, please wait...
/home/ignacio/out-train
├── cache
│   └── 22050
│       ├── 04e01d02ea9d846ce5227caa79557650cd174774711a5a6aa97e0ca367aadcb5.pt
│       ├── 04e01d02ea9d846ce5227caa79557650cd174774711a5a6aa97e0ca367aadcb5.spec.pt
│       ├── 06cd4ef3b72e9632fa844f1ec24a1426f49e8f674b4cce04a08e5b2807bc552b.pt
│       ├── 06cd4ef3b72e9632fa844f1ec24a1426f49e8f674b4cce04a08e5b2807bc552b.spec.pt
│       ├── 0948075669e629c7d9d6fbeea0e78df1f4b51de2787171fdf9d7f151e051a0ba.pt
│       ├── 0948075669e629c7d9d6fbeea0e78df1f4b51de2787171fdf9d7f151e051a0ba.spec.pt
│       ├── 0df1b17ffb19d9ef36be002010eaeae27096c730e7ed2595d91d15b110861354.pt
│       ├── 0df1b17ffb19d9ef36be002010eaeae27096c730e7ed2595d91d15b110861354.spec.pt
│       ├── 11a4355e2d69348d330a8226a7af766b36ecf8f3d68efd7722fa22330405cac3.pt
│       ├── 11a4355e2d69348d330a8226a7af766b36ecf8f3d68efd7722fa22330405cac3.spec.pt
│       ├── 1a9b151680072e8f1c412c7542d67e3579ddb735260a4f38b4418a0636261af1.pt
│       ├── 1a9b151680072e8f1c412c7542d67e3579ddb735260a4f38b4418a0636261af1.spec.pt
│       ├── 2024851fc407962a29a90e876e1bd73e2434c507505c3b40bead0f5c3b5f2dac.pt
│       ├── 2024851fc407962a29a90e876e1bd73e2434c507505c3b40bead0f5c3b5f2dac.spec.pt
│       ├── 2cb99059aef28b0c877203e021a85ae1d2cd55588e2b8bd1a5c98706ac6c17ac.pt
│       ├── 2cb99059aef28b0c877203e021a85ae1d2cd55588e2b8bd1a5c98706ac6c17ac.spec.pt
│       ├── 303708369840cf8172919b6e0586a5538a5a20dc1480fe756e82b56c74ef96d4.pt
│       ├── 303708369840cf8172919b6e0586a5538a5a20dc1480fe756e82b56c74ef96d4.spec.pt
│       ├── 30fd0aa0d27f70e709471161c9cf3cc95f7d4a16d146b8c13c4b4a409fbc13b5.pt
│       ├── 30fd0aa0d27f70e709471161c9cf3cc95f7d4a16d146b8c13c4b4a409fbc13b5.spec.pt
│       ├── 360c548aadf51aad829e007e40615917f51494bf1716218cf5291a3a0da84736.pt
│       ├── 360c548aadf51aad829e007e40615917f51494bf1716218cf5291a3a0da84736.spec.pt
│       ├── 3f57866c2d2e0640411413403a9a9bdbbd15bdfc0ccf6d7ffbcd67c0f4ee6ee7.pt
│       ├── 3f57866c2d2e0640411413403a9a9bdbbd15bdfc0ccf6d7ffbcd67c0f4ee6ee7.spec.pt
│       ├── 49c2832bef4fc31133f4d38888645d24908a3f6a1a58079f803e80b012397dd7.pt
│       ├── 49c2832bef4fc31133f4d38888645d24908a3f6a1a58079f803e80b012397dd7.spec.pt
│       ├── 4e6ba182a31950b9917c19bde3aec13036c7ffcf54f67fbdf5694596ceeede1c.pt
│       ├── 4e6ba182a31950b9917c19bde3aec13036c7ffcf54f67fbdf5694596ceeede1c.spec.pt
│       ├── 5427257a73627162aff7d19e6d112a98c30a39ba2e663e0e975f35b53189e06e.pt
│       ├── 5427257a73627162aff7d19e6d112a98c30a39ba2e663e0e975f35b53189e06e.spec.pt
│       ├── 548272f6eb8d4a860a03b0d10c98de7a19bc0ff806b88efd4f3c256c0444357c.pt
│       ├── 548272f6eb8d4a860a03b0d10c98de7a19bc0ff806b88efd4f3c256c0444357c.spec.pt
│       ├── 57b4b77b8004ce6c31edba7991b3a20b804d6c4e02cccacbc80723c122f3d9d9.pt
│       ├── 57b4b77b8004ce6c31edba7991b3a20b804d6c4e02cccacbc80723c122f3d9d9.spec.pt
│       ├── 598a03603d426674dcf3e68b323c99517ccda4b7240bc064f5610dabf0f7e66c.pt
│       ├── 598a03603d426674dcf3e68b323c99517ccda4b7240bc064f5610dabf0f7e66c.spec.pt
│       ├── 5a3c38fdfcb352a6c1e780983278f8b30bedb9eade064118bb69a7136924c636.pt
│       ├── 5a3c38fdfcb352a6c1e780983278f8b30bedb9eade064118bb69a7136924c636.spec.pt
│       ├── 5c8fc290cd50260d9b1cade5b85256ae7d2897fd672163f4b0dfa4da5210d73f.pt
│       ├── 5c8fc290cd50260d9b1cade5b85256ae7d2897fd672163f4b0dfa4da5210d73f.spec.pt
│       ├── 5d901cbf53cee1434c263d70324654676fbf4bd821f4279592685b7b48634008.pt
│       ├── 5d901cbf53cee1434c263d70324654676fbf4bd821f4279592685b7b48634008.spec.pt
│       ├── 66b04792e1fbde0e55b161e5056bb93d7a148684bf66bf304ec5ff16f5ad25da.pt
│       ├── 66b04792e1fbde0e55b161e5056bb93d7a148684bf66bf304ec5ff16f5ad25da.spec.pt
│       ├── 6ec3592ca3a7e08caef2351aa53f2e796f921d870a1417d1edf73b1b45722b6b.pt
│       ├── 6ec3592ca3a7e08caef2351aa53f2e796f921d870a1417d1edf73b1b45722b6b.spec.pt
│       ├── 7179e033dd036c6957e93fbfe51841b2f8018088ada9185875505cab004a2b85.pt
│       ├── 7179e033dd036c6957e93fbfe51841b2f8018088ada9185875505cab004a2b85.spec.pt
│       ├── 7219bd4dcf0d8b7c481f1dfc38b3396ad0b7e8a7fd73a0019599f04d474d7517.pt
│       ├── 7219bd4dcf0d8b7c481f1dfc38b3396ad0b7e8a7fd73a0019599f04d474d7517.spec.pt
│       ├── 788360cedf32e6b73f98274883e853446315060699fb5a9e38e43cd881137bcd.pt
│       ├── 788360cedf32e6b73f98274883e853446315060699fb5a9e38e43cd881137bcd.spec.pt
│       ├── 8144dc825ab98c874ef587abc42b6b713b4d0974365274da61230df0f73a8fd8.pt
│       ├── 8144dc825ab98c874ef587abc42b6b713b4d0974365274da61230df0f73a8fd8.spec.pt
│       ├── 8e6fcd06ff034ad9a5ab4727db696ae214609f1cdb8f3d70901b02870db87fba.pt
│       ├── 8e6fcd06ff034ad9a5ab4727db696ae214609f1cdb8f3d70901b02870db87fba.spec.pt
│       ├── 9faad2f2db9feda31a0cd610efc8ba71a8c71576fc8036b74682968956967233.pt
│       ├── 9faad2f2db9feda31a0cd610efc8ba71a8c71576fc8036b74682968956967233.spec.pt
│       ├── a3e1c79bdf950cba0c5a8158b1af9f0e738baa7a563290c3f9111cb6acfbf8c6.pt
│       ├── a3e1c79bdf950cba0c5a8158b1af9f0e738baa7a563290c3f9111cb6acfbf8c6.spec.pt
│       ├── a3f27635e5976f2f7040683193089c4ba7811a787f606caf8ea10d696482a4a1.pt
│       ├── a3f27635e5976f2f7040683193089c4ba7811a787f606caf8ea10d696482a4a1.spec.pt
│       ├── b69d4a066baec4f5390e9d621a0245202fc5090adf5c5c549885fd605928e589.pt
│       ├── b69d4a066baec4f5390e9d621a0245202fc5090adf5c5c549885fd605928e589.spec.pt
│       ├── be0533dcc16fb185a824504edb199d5862d70203c43ec4f0c45e3fa133a9b857.pt
│       ├── be0533dcc16fb185a824504edb199d5862d70203c43ec4f0c45e3fa133a9b857.spec.pt
│       ├── d461b6e9cbaf7964ff6bb12f01fa3d95ef32574cca435878f7e8b54b8552fcc5.pt
│       ├── d461b6e9cbaf7964ff6bb12f01fa3d95ef32574cca435878f7e8b54b8552fcc5.spec.pt
│       ├── d4690794bbded50e17fc257a6767d9e9f13ec24e9175f8cb1c94625030618f6a.pt
│       ├── d4690794bbded50e17fc257a6767d9e9f13ec24e9175f8cb1c94625030618f6a.spec.pt
│       ├── d4dcfbfc322144297f7491c1ad8f7a7b8261b0838ec1cd895e86dbd1187c21ed.pt
│       ├── d4dcfbfc322144297f7491c1ad8f7a7b8261b0838ec1cd895e86dbd1187c21ed.spec.pt
│       ├── d88b05d05baa089db447036092abeb80fdd4e1bb404744132deb9cd13c67bcdb.pt
│       ├── d88b05d05baa089db447036092abeb80fdd4e1bb404744132deb9cd13c67bcdb.spec.pt
│       ├── d89e07bfc4ae6ae5fca8ab3a965dad2bedeec11874a6d93c7b7f64cab3821671.pt
│       ├── d89e07bfc4ae6ae5fca8ab3a965dad2bedeec11874a6d93c7b7f64cab3821671.spec.pt
│       ├── e5de78e7e3ae1848f7f9c2a88981b4e0ebb89d33c392d73b1d5adcd87d53e44f.pt
│       ├── e5de78e7e3ae1848f7f9c2a88981b4e0ebb89d33c392d73b1d5adcd87d53e44f.spec.pt
│       ├── eb75989247b6048dda0c22839e1624a154e326c1f05b65fe2c7b46401ee6f2c9.pt
│       ├── eb75989247b6048dda0c22839e1624a154e326c1f05b65fe2c7b46401ee6f2c9.spec.pt
│       ├── ef8b43e347e846da70a9cf0b3d904fbc7075cd79e44adf2ccecf8b7be20b3530.pt
│       ├── ef8b43e347e846da70a9cf0b3d904fbc7075cd79e44adf2ccecf8b7be20b3530.spec.pt
│       ├── f2507ea125f5da92ad38333853f8ee7e0339cce79d9a3062adca8533a6477031.pt
│       ├── f2507ea125f5da92ad38333853f8ee7e0339cce79d9a3062adca8533a6477031.spec.pt
│       ├── f3a9dbeced79da786bf4642e54243a90774afb1df4a75cde7855f445f8466409.pt
│       ├── f3a9dbeced79da786bf4642e54243a90774afb1df4a75cde7855f445f8466409.spec.pt
│       ├── f4c618f3269e352f45480d0efc7a6d1a40fff479b566d5fa7c2de0634379423e.pt
│       ├── f4c618f3269e352f45480d0efc7a6d1a40fff479b566d5fa7c2de0634379423e.spec.pt
│       ├── fdefa487bf76ff204c9a1a3e8ac8137ab56f87c12dca4a8f8b4e570751609470.pt
│       └── fdefa487bf76ff204c9a1a3e8ac8137ab56f87c12dca4a8f8b4e570751609470.spec.pt
├── config.json
├── dataset.jsonl
└── lightning_logs [error opening dir]

@aaronnewsome

I'm guessing your dataset.jsonl has paths from your local machine, like /home/ignacio. Have a look in it and see. Either pre-process your wav files IN the container, use the same paths in the container, or do a search and replace in the dataset.jsonl.

I think it's easiest to just use the exact same paths in the container.

So:

-v /home/ignacio/piper-checkpoints:/piper-checkpoints

becomes:

-v /home/ignacio/piper-checkpoints:/home/ignacio/piper-checkpoints

and the same for other -v options too.

You're almost there. This should work.
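
Concretely, the earlier run command with mirrored paths would look something like this (host paths as in your posts; the volume mappings are the only change):

docker run --runtime=nvidia --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -ti \
  -v /home/ignacio/out-train:/home/ignacio/out-train \
  -v /home/ignacio/piper-checkpoints:/home/ignacio/piper-checkpoints \
  cloning bash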

@ignacio82
Author

Thank you so much. In addition to the Dockerfile above, these are the steps I followed (in case someone else is having the same troubles I did):

docker run --runtime=nvidia --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /home/ignacio/ignacio-test:/ignacio-test  -v /home/ignacio/out-train:/train -v /home/ignacio/piper-checkpoints:/piper-checkpoints cloning bash

source .venv/bin/activate

python3 -m piper_train.preprocess \
  --language en-us \
  --input-dir /ignacio-test/ \
  --output-dir /train/ \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050


python3 -m piper_train \
    --dataset-dir /train/ \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 10 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 10000 \
    --resume_from_checkpoint /piper-checkpoints/epoch=4641-step=3104302.ckpt \
    --checkpoint-epochs 1 \
    --precision 32

@aaronnewsome

Glad you were able to get it sorted. Hopefully anyone trying to train with a Docker container will find this thread. Not sure if you can change the title of the thread, but that would help them find it.

@ignacio82
Author

I think I got to the point of exporting the model:

DEBUG:fsspec.local:open file: /train/lightning_logs/version_3/checkpoints/epoch=9999-step=3157882.ckpt
`Trainer.fit` stopped: `max_epochs=10000` reached.

Where do I find model.ckpt and model.onnx? I tried running this:

(.venv) root@7806dca0c20f:/usr/src/piper/src/python# python3 -m piper_train.export_onnx \
>     /ignacio-test/model.ckpt \
>     /ignacio-test/model.onnx
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/piper/src/python/piper_train/export_onnx.py", line 109, in <module>
    main()
  File "/usr/src/piper/src/python/piper_train/export_onnx.py", line 42, in main
    model = VitsModel.load_from_checkpoint(args.checkpoint, dataset=None)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 137, in load_from_checkpoint
    return _load_from_checkpoint(
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 184, in _load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=map_location)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/cloud_io.py", line 46, in load
    with fs.open(path_or_url, "rb") as f:
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1295, in open
    f = self._open(
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/fsspec/implementations/local.py", line 180, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/fsspec/implementations/local.py", line 302, in __init__
    self._open()
  File "/usr/src/piper/src/python/.venv/lib/python3.10/site-packages/fsspec/implementations/local.py", line 307, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/ignacio-test/model.ckpt'
(.venv) root@7806dca0c20f:/usr/src/piper/src/python# 

@aaronnewsome

aaronnewsome commented Dec 15, 2023

You probably need something like:

python3 -m piper_train.export_onnx "/train/lightning_logs/version_3/checkpoints/epoch=9999-step=3157882.ckpt" /ignacio-test/model.onnx

Don't forget to copy the json file too.

cp /train/config.json  /ignacio-test/model.onnx.json
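
To sanity-check the exported voice before wiring it into anything else, you could pipe text through the piper CLI against the exported files (paths illustrative):

echo 'This is a test of my new voice.' | piper --model /ignacio-test/model.onnx --output_file test.wav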

@ignacio82
Author

Got the files and copied them to piper:

[screenshot]

Alas, even after restarting piper and Home Assistant I cannot see my new voice:

[screenshot]

Any suggestions?

@aaronnewsome

Can't help there. I don't know anything at all about Home Assistant, never used it. Hopefully someone who has will chime in. Maybe a post to the Home Assistant forums will help you get the custom voice added. My guess is the voices are in a config file somewhere or some scan operation needs to be performed.

@ignacio82
Author

ignacio82 commented Dec 16, 2023

Ignoring Home Assistant: to get it to work inside piper, do you only need to copy the files next to the other ones? That is:

[screenshot]

When I look at the log of my piper container, it seems like piper is not aware that the file is there:

INFO:__main__:Ready
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-9' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=VoiceNotFoundError('ignacio')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/handler.py", line 73, in handle_event
    piper_proc = await self.process_manager.get_process(voice_name=voice_name)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/process.py", line 114, in get_process
    ensure_voice_exists(
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/download.py", line 77, in ensure_voice_exists
    find_voice(name, data_dirs)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/download.py", line 183, in find_voice
    raise VoiceNotFoundError(name)
wyoming_piper.download.VoiceNotFoundError: ignacio
INFO:wyoming_piper.download:Downloaded /data/en_US-hfc_male-medium.onnx.json (https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/hfc_male/medium/en_US-hfc_male-medium.onnx.json)
INFO:wyoming_piper.download:Downloaded /data/en_US-hfc_male-medium.onnx (https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/hfc_male/medium/en_US-hfc_male-medium.onnx)
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-23' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=VoiceNotFoundError('ignacio')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/handler.py", line 73, in handle_event
    piper_proc = await self.process_manager.get_process(voice_name=voice_name)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/process.py", line 114, in get_process
    ensure_voice_exists(
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/download.py", line 77, in ensure_voice_exists
    find_voice(name, data_dirs)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/download.py", line 183, in find_voice
    raise VoiceNotFoundError(name)
wyoming_piper.download.VoiceNotFoundError: ignacio
INFO:__main__:Ready
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-6' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=VoiceNotFoundError('ignacio')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/handler.py", line 73, in handle_event
    piper_proc = await self.process_manager.get_process(voice_name=voice_name)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/process.py", line 114, in get_process
    ensure_voice_exists(
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/download.py", line 77, in ensure_voice_exists
    find_voice(name, data_dirs)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_piper/download.py", line 183, in find_voice
    raise VoiceNotFoundError(name)
wyoming_piper.download.VoiceNotFoundError: ignacio

But if I get inside the piper container, the files are clearly there:

$ ls -la
total 1055100
drwxrwxrwx 1 1026 users      1436 Dec 15 17:53 .
drwxr-xr-x 1 root root       4096 Dec 15 17:27 ..
-rwxrwxrwx 1 1026 users  63104526 Nov 24 16:38 en_GB-alan-low.onnx
-rwxrwxrwx 1 1026 users      4170 Nov 24 16:38 en_GB-alan-low.onnx.json
-rwxrwxrwx 1 1026 users  63201294 Nov 24 16:39 en_GB-alan-medium.onnx
-rwxrwxrwx 1 1026 users      4888 Nov 24 16:39 en_GB-alan-medium.onnx.json
-rwxrwxrwx 1 1026 users  76952753 Nov 24 16:38 en_GB-vctk-medium.onnx
-rwxrwxrwx 1 1026 users      6637 Nov 24 16:38 en_GB-vctk-medium.onnx.json
-rwxrwxrwx 1 1026 users  63104526 Nov 20 22:16 en_US-amy-low.onnx
-rwxrwxrwx 1 1026 users      4164 Nov 20 22:16 en_US-amy-low.onnx.json
-rwxrwxrwx 1 1026 users  63201294 Dec 15 17:53 en_US-hfc_male-medium.onnx
-rwxrwxrwx 1 1026 users      5033 Dec 15 17:53 en_US-hfc_male-medium.onnx.json
-rwxrwxrwx 1 1026 users  63511038 Dec 15 15:53 en_US-ignacio-low.onnx
-rwxrwxrwx 1 1026 users      7082 Dec 15 17:49 en_US-ignacio-low.onnx.json
-rwxrwxrwx 1 1026 users  63104526 Nov 22 06:27 en_US-kathleen-low.onnx
-rwxrwxrwx 1 1026 users      4169 Nov 22 06:27 en_US-kathleen-low.onnx.json
-rwxrwxrwx 1 1026 users 113895201 Nov 20 22:15 en_US-lessac-high.onnx
-rwxrwxrwx 1 1026 users      4883 Nov 20 22:15 en_US-lessac-high.onnx.json
-rwxrwxrwx 1 1026 users  63201294 Nov 24 16:40 en_US-lessac-low.onnx
-rwxrwxrwx 1 1026 users      4882 Nov 24 16:40 en_US-lessac-low.onnx.json
-rwxrwxrwx 1 1026 users 136673811 Nov 24 16:40 en_US-libritts-high.onnx
-rwxrwxrwx 1 1026 users     20163 Nov 24 16:40 en_US-libritts-high.onnx.json
-rwxrwxrwx 1 1026 users 120786792 Nov 24 16:40 en_US-ryan-high.onnx
-rwxrwxrwx 1 1026 users      4166 Nov 24 16:40 en_US-ryan-high.onnx.json
-rwxrwxrwx 1 1026 users  63104526 Nov 20 22:15 en_US-ryan-low.onnx
-rwxrwxrwx 1 1026 users      4165 Nov 20 22:15 en_US-ryan-low.onnx.json
-rwxrwxrwx 1 1026 users  63201294 Nov 20 19:34 en_US-ryan-medium.onnx
-rwxrwxrwx 1 1026 users      4883 Nov 20 19:34 en_US-ryan-medium.onnx.json
-rwxrwxrwx 1 1026 users  63201294 Nov 21 16:49 es_MX-ald-medium.onnx
-rwxrwxrwx 1 1026 users      4889 Nov 21 16:51 es_MX-ald-medium.onnx.json
-rwxrwxrwx 1 1026 users      4889 Nov 21 16:49 es_es_MX_ald_medium_es_MX-ald-medium.onnx.json
