Webgl support #5

Yosshi999 · 2022-06-21T23:03:57Z

Split the decode model in two and make only the hifigan part run on onnxruntime-web's webgl backend.

Note

Applying this change to main will require changes to voicevox_core: decode.onnx is gone and two new ONNX models replace it, so these must be embedded in the binary instead.
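A minimal sketch of how the two halves might be chained on the JS side. It assumes that every output of the first model is consumed, under the same name, as an input of the hifigan model; the surgeon script makes that plausible, but it is not confirmed here. In the page, the sessions would come from `ort.InferenceSession.create(path, { executionProviders: [...] })`, e.g. `["wasm"]` for the first half and `["webgl"]` for hifigan. The model paths are not given in this thread, so the sessions are passed in, which lets the chaining be exercised with stand-ins.

```javascript
// Run the two halves of the split decoder back to back.
// Each session only needs an async run(feeds) method, which is what
// onnxruntime-web's InferenceSession provides.
// ASSUMPTION: the first model's output names match the hifigan model's input names.
async function runSplitDecoder(firstSession, hifiganSession, feeds) {
  const intermediate = await firstSession.run(feeds); // e.g. on the wasm backend
  return hifiganSession.run(intermediate);            // e.g. on the webgl backend
}

// Hypothetical stand-in sessions, for exercising the chaining without onnxruntime-web.
const fakeFirst = { run: async ({ x }) => ({ hidden: x.map((v) => v * 2) }) };
const fakeHifigan = { run: async ({ hidden }) => ({ wave: hidden.map((v) => v + 1) }) };

runSplitDecoder(fakeFirst, fakeHifigan, { x: [1, 2, 3] }).then((out) =>
  console.log(out.wave) // → [3, 5, 7]
);
```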

Yosshi999 · 2022-06-21T23:05:30Z

The loading code on the JS side is here:
https://github.com/Yosshi999/vv_check_web/tree/webgl-wip

Hiroshiba · 2022-06-22T01:49:05Z

Ooh!! No conv1d; that really shows its age.
I'd love to try this out a bit later!!

(It looks like the webgl-wip branch doesn't have yukarin_sosoa.onnx yet, so maybe it hasn't been updated?)

Yosshi999 · 2022-06-23T10:13:52Z

Sorry, for some reason everything from 8e0f194 onward is broken, so please bear with me a little longer.

Yosshi999 · 2022-06-23T10:48:35Z

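Actually, it was the earlier commits that were broken; the latest commit seems to fail for a different reason.
When I try it in vv_check_web, after a while the screen (Chrome) goes black and it crashes with an error, while reducing the input length from 407 to 100 makes it succeed after about 30 seconds, so I suspect it is running out of memory...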

Yosshi999 · 2022-06-23T11:00:07Z

Turning off Chrome's hardware acceleration got rid of the black screen, but it still crashes with an error.

Yosshi999 · 2022-06-23T11:05:29Z

Here is the error (I added the line breaks and indentation myself):

failed to inference ONNX model: Error: Failed to compile shader: Shader source: 
# version 300 es

precision highp float;
precision highp int;
precision highp sampler2D;
in vec2 TexCoords;
out vec4 outputColor;
const vec2 halfCR = vec2(0.5, 0.5);
// Custom vector types to handle higher dimenalities. 
struct ivec5 {
    int x;
    int y;
    int z;
    int w;
    int u;
};
struct ivec6 {
    int x;
    int y;
    int z;
    int w;
    int u;
    int v;
};
int imod(int x, int y) {
    return x - y * (x / y);
}
uniform sampler2D X;
int coordsToOffset(vec2 coords, int width, int height) {
    float s = coords.s * float(width);
    float t = coords.t * float(height);
    int offset = int(t) * width + int(s);
    return offset;
}
void toVec(vec2 texCoords, out int c[4]) {
    int offset = coordsToOffset(texCoords, 1582, 1581);
    c[0] = offset / 2500608;
    offset -= c[0] * 2500608;
    c[1] = offset / 96;
    offset -= c[1] * 96;
    c[2] = offset / 96;
    offset -= c[2] * 96;
    c[3] = offset;
}
void toVec(int offset, out int c[4]) {
    c[0] = offset / 2500608;
    offset -= c[0] * 2500608;
    c[1] = offset / 96;
    offset -= c[1] * 96;
    c[2] = offset / 96;
    offset -= c[2] * 96;
    c[3] = offset;
}
int indicesToOffset_X(int indices[4]) {
    int offset = 0;
    offset += indices[3] * 1;
    offset += indices[2] * 1;
    offset += indices[1] * 26048;
    offset += indices[0] * 3334144;
    return offset;
}
vec2 offsetToCoords(int offset, int width, int height) {
    int t = offset / width;
    int s = offset - t*width;
    vec2 coords = (vec2(s,t) + vec2(0.5,0.5)) / vec2(width, height);
    return coords;
}
highp float decode(highp vec4 rgba) {
    return rgba.r;
}
float getColorAsFloat(vec4 color) {
    return decode(color);
}
float _X(int m[4]) {
    int offset = indicesToOffset_X(m);
    vec2 coords = offsetToCoords(offset, 1826, 1826);
    float value = getColorAsFloat(texture(X, coords));
    return value;
}
const int XC = 128;
const int XH = 26048;
const int XW = 1;
const int KH = 3;
const int KW = 1;
const int dilationH = 5;
const int dilationW = 5;
const int strideH = 1;
const int strideW = 1;
const int padH = 5;
const int padW = 0;
const int KHKW = KH*KW;
const int XCKHKW = XC * KHKW;
const int outputChannels = 4;
vec4 process(int indices[4]) {
    int b = indices[0];
    // batch size
    int oh = indices[1] * strideH - padH;
    //output height
    int ow = indices[2] * strideW - padW;
    //output width
    int p = indices[3] * outputChannels;
    //patch
    vec4 value = vec4(0.0);
    for(int i=0; i < outputChannels; ++i) {
        if(p < XCKHKW) {
            int patchC = p / KHKW;
            int patchH = (p - patchC*KHKW) / KW;
            int patchW = (p - patchC*KHKW) - patchH * KW;
            int xh2 = oh + patchH * dilationH;
            int xw2 = ow + patchW * dilationW;
            int x[4];
            x[0] = b;
            x[1] = patchC;
            x[2] = xh2;
            x[3] = xw2;
            if(xh2 >= 0 && xh2 < XH && xw2 >= 0 && xw2 < XW) {
                value[i] = _X(x);
            }
        }
        ++p;
    }
    return value;
}
void main() {
    int indices[4];
    toVec(TexCoords, indices);
    vec4 result = vec4(process(indices));
    outputColor = result;
}
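The constants in this shader make the shape problem concrete; it appears to be onnxruntime-web's im2col pass for a dilated convolution. The input X is [1, 128, 26048, 1] (XC, XH, XW), presumably the 407-frame input after 64× upsampling (407 × 64 = 26048). The kernel is 3 × 1, with dilation 5 and padding 5 along H only. Each RGBA texel packs 4 patch entries, so a 128 × 3 = 384-entry patch occupies 96 texels and the output is [1, 26048, 1, 96] (26048 × 96 = 2500608, the first divisor in toVec). The JavaScript transcription of toVec and process below is only a sketch for checking which input element each texel reads; the names im2colTexel and readX are ours, not onnxruntime's.

```javascript
// Constants copied from the compiled shader above.
const XC = 128, XH = 26048, XW = 1; // input channels, height, width
const KH = 3, KW = 1;               // kernel height, width
const dilationH = 5, dilationW = 5;
const strideH = 1, strideW = 1;
const padH = 5, padW = 0;
const KHKW = KH * KW;
const XCKHKW = XC * KHKW;           // entries in one im2col patch (384)
const outputChannels = 4;           // patch entries packed into one RGBA texel

// Output shape implied by toVec(): [batch, XH, XW, XCKHKW / 4] = [1, 26048, 1, 96].
const OUT_SHAPE = [1, XH, XW, XCKHKW / outputChannels];

// Same arithmetic as the shader's toVec(int offset, out int c[4]).
function toVec(offset) {
  const s2 = OUT_SHAPE[3];           // 96
  const s1 = OUT_SHAPE[2] * s2;      // 96
  const s0 = OUT_SHAPE[1] * s1;      // 2500608
  const c0 = Math.floor(offset / s0); offset -= c0 * s0;
  const c1 = Math.floor(offset / s1); offset -= c1 * s1;
  const c2 = Math.floor(offset / s2); offset -= c2 * s2;
  return [c0, c1, c2, offset];
}

// Same arithmetic as the shader's process(): the 4 patch entries held by one texel.
// readX(b, c, h, w) returns one element of X; out-of-bounds taps stay 0 (zero padding).
function im2colTexel([b, i1, i2, i3], readX) {
  const oh = i1 * strideH - padH;    // top of the receptive field
  const ow = i2 * strideW - padW;
  let p = i3 * outputChannels;       // first patch entry stored in this texel
  const value = [0, 0, 0, 0];
  for (let i = 0; i < outputChannels; ++i, ++p) {
    if (p >= XCKHKW) continue;
    const patchC = Math.floor(p / KHKW);
    const patchH = Math.floor((p - patchC * KHKW) / KW);
    const patchW = p - patchC * KHKW - patchH * KW;
    const xh = oh + patchH * dilationH;
    const xw = ow + patchW * dilationW;
    if (xh >= 0 && xh < XH && xw >= 0 && xw < XW) value[i] = readX(b, patchC, xh, xw);
  }
  return value;
}

// Synthetic input whose value encodes (channel, row), so each tap is identifiable.
const readX = (b, c, h, w) => c * 100000 + h;
console.log(toVec(2500608 + 96 * 10 + 1));      // → [1, 10, 0, 1]
console.log(im2colTexel([0, 10, 0, 1], readX)); // → [100010, 100015, 200005, 200010]
```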

Yosshi999 · 2022-06-23T12:23:36Z

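I compiled the shader above locally and it turned out to be valid code.
When I set a breakpoint where the exception is thrown, I get

WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost

so rather than a genuine compilation failure, it looks like the WebGL context was lost (CONTEXT_LOST_WEBGL) and that was then reported as a compilation failure.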

Yosshi999 · 2022-06-23T13:00:12Z

It's almost certainly running out of memory. It didn't work properly on my laptop (integrated GPU).
Anyway, I've uploaded the models to https://github.com/Yosshi999/vv_check_web/tree/webgl-wip

Hiroshiba · 2022-06-23T18:30:32Z

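I ran it!! It works!!!

I tried it on an RTX 3070 with 8 GB of memory, and likewise, unless I shortened length, the memory overflowed and things started behaving strangely. (Freeing the VRAM required restarting Chrome.)

With length=200, it used about 6 GB of memory and took about 1.4 seconds to generate.
GPU utilization did seem to spike briefly, as expected.

If I remember correctly, length=256 corresponds to 1 second of audio, so the RTF would be around 1.5 or so.
Generation on the CPU with threading enabled is around RTF 0.7, so for some reason this is a bit slower.

...Hmm, it really feels like it's nowhere near the performance it should be getting...!!
Do you think onnxruntime × WebGL is just very inefficient in both memory and compute, or we're using it wrong somewhere, or we're hitting some bug...? 👀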

📝 Code that sets length to 200

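        // `ort` is onnxruntime-web (e.g. `import * as ort from "onnxruntime-web";`).
        // f0_data and phoneme_data are the full-length inputs built earlier in the page.
        const length = 200; // number of frames to keep
        const phoneme_size = 45; // values per frame in phoneme_data
        const speaker_id_data = BigInt64Array.from([1].map((x) => BigInt(x)));

        // Keep only the first `length` frames of each input.
        const _f0_data = f0_data.slice(0, length);
        const _phoneme_data = phoneme_data.slice(0, length * phoneme_size);

        // Shapes: f0 is [length, 1], phoneme is [length, phoneme_size].
        const f0 = new ort.Tensor("float32", _f0_data, [length, 1]);
        const phoneme = new ort.Tensor("float32", _phoneme_data, [
          length,
          phoneme_size,
        ]);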

Yosshi999 · 2022-06-23T23:20:17Z

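Hmm, hard to say...
I do think compute efficiency is poor, but since it runs fine in Python, using this much memory is strange (then again, maybe it would reach that if every intermediate result were kept?). On top of that, onnxruntime-web's WebGL backend behaves questionably to begin with, and we're using it in an unusual way (feeding an extremely elongated pseudo-image to an implementation that only has conv2d), so it may be running into behavior nobody anticipated.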
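To make the "conv2d-only implementation" point concrete: a 1-D convolution over a length-L sequence computes exactly the same values as a 2-D convolution over an L × 1 image with a K × 1 kernel and padding only along H, which is the XW = 1, KW = 1, padW = 0 configuration visible in the shader log above. A naive sketch that checks this equivalence (single batch, stride 1); conv1d, conv2d and conv1dViaConv2d are illustrative helpers, not code from this PR.

```javascript
// Naive 1-D convolution, stride 1. x: [Cin][L], w: [Cout][Cin][K] -> [Cout][Lout].
function conv1d(x, w, { dilation = 1, pad = 0 } = {}) {
  const L = x[0].length, K = w[0][0].length;
  const Lout = L + 2 * pad - dilation * (K - 1);
  return w.map((wo) => {
    const row = new Array(Lout).fill(0);
    for (let t = 0; t < Lout; t++)
      wo.forEach((wc, c) => wc.forEach((wk, k) => {
        const src = t - pad + k * dilation; // taps outside [0, L) are zero padding
        if (src >= 0 && src < L) row[t] += wk * x[c][src];
      }));
    return row;
  });
}

// Naive 2-D convolution, stride 1. x: [Cin][H][W], w: [Cout][Cin][KH][KW] -> [Cout][Hout][Wout].
function conv2d(x, w, { dilation = [1, 1], pad = [0, 0] } = {}) {
  const H = x[0].length, W = x[0][0].length;
  const KH = w[0][0].length, KW = w[0][0][0].length;
  const [dH, dW] = dilation, [pH, pW] = pad;
  const Hout = H + 2 * pH - dH * (KH - 1);
  const Wout = W + 2 * pW - dW * (KW - 1);
  return w.map((wo) => {
    const out = Array.from({ length: Hout }, () => new Array(Wout).fill(0));
    for (let oh = 0; oh < Hout; oh++)
      for (let ow = 0; ow < Wout; ow++)
        wo.forEach((wc, c) => {
          for (let kh = 0; kh < KH; kh++)
            for (let kw = 0; kw < KW; kw++) {
              const ih = oh - pH + kh * dH, iw = ow - pW + kw * dW;
              if (ih >= 0 && ih < H && iw >= 0 && iw < W) out[oh][ow] += wc[kh][kw] * x[c][ih][iw];
            }
        });
    return out;
  });
}

// The workaround: view the sequence as an L x 1 image and the kernel as K x 1.
function conv1dViaConv2d(x, w, { dilation = 1, pad = 0 } = {}) {
  const x2 = x.map((ch) => ch.map((v) => [v]));                 // [Cin][L][1]
  const w2 = w.map((wo) => wo.map((wc) => wc.map((v) => [v]))); // [Cout][Cin][K][1]
  const y2 = conv2d(x2, w2, { dilation: [dilation, 1], pad: [pad, 0] });
  return y2.map((ch) => ch.map((col) => col[0]));               // back to [Cout][Lout]
}

// 2 input channels, 2 output channels, K = 3, dilation 5, pad 5 as in the shader.
const x = [Array.from({ length: 20 }, (_, i) => i), Array.from({ length: 20 }, (_, i) => i * i % 7)];
const w = [[[1, 2, 3], [0, -1, 1]], [[0.5, 0, -0.5], [2, 2, 2]]];
const opts = { dilation: 5, pad: 5 };
console.log(JSON.stringify(conv1d(x, w, opts)) === JSON.stringify(conv1dViaConv2d(x, w, opts))); // → true
```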

Hiroshiba · 2022-07-19T16:52:12Z

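Sorry, I thought I had already replied 🙇
I also had the feeling that the elongated image might be causing something bad.

As I also wrote in #4 (comment), reshaping the tens-of-thousands × 1 matrix into something like a few hundred × a few hundred might make it more stable... maybe...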
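One hedged way to pick such a shape, assuming the op being reshaped only cares about the element count (elementwise ops, for example): choose the divisor pair of N closest to √N. For the XH = 26048 rows in the shader log this gives 148 × 176. It does not carry over directly to the convolutions themselves, since folding the time axis into two dimensions changes which samples are neighbours. `nearSquareShape` is a hypothetical helper, not part of onnxruntime-web.

```javascript
// Hypothetical helper: factor n = rows * cols with rows <= cols and rows as close to sqrt(n) as possible.
// Returns [1, n] when n has no such divisor (e.g. n prime); padding n first would be needed then.
function nearSquareShape(n) {
  for (let rows = Math.floor(Math.sqrt(n)); rows > 1; rows--) {
    if (n % rows === 0) return [rows, n / rows];
  }
  return [1, n];
}

console.log(nearSquareShape(26048)); // → [148, 176]
```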

Yosshi999 · 2024-07-21T08:04:48Z

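This is rather late, but upgrading onnxruntime-web to 1.18.0 and switching to webgpu made it work (moving hifigan from wasm to webgpu gives a 7x speedup on an RTX 3060).
I barely remember the original code and it may contain workarounds that are no longer needed, so it might be better to rewrite it.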

https://github.com/Yosshi999/vv_check_web/tree/webgl-wip

cf. VOICEVOX/voicevox_core#491
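A sketch of choosing the hifigan backend at runtime, so that machines without WebGPU fall back to wasm. `navigator.gpu` and `requestAdapter()` are the standard WebGPU entry points, and `executionProviders` in the options of `ort.InferenceSession.create` is how onnxruntime-web selects a backend. Depending on the release, WebGPU may also require importing the `onnxruntime-web/webgpu` bundle, so check the version in use. The function name is hypothetical.

```javascript
// Pick execution providers for the hifigan session.
// `nav` defaults to the browser's navigator; it is a parameter so the logic can be tested.
async function chooseHifiganProviders(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return ["wasm"];            // this browser does not expose WebGPU
  try {
    const adapter = await nav.gpu.requestAdapter(); // resolves to null if no usable GPU
    return adapter ? ["webgpu"] : ["wasm"];
  } catch {
    return ["wasm"];                                // treat adapter errors as "no WebGPU"
  }
}
```

In the page this would be used roughly as `await ort.InferenceSession.create(hifiganModelUrl, { executionProviders: await chooseHifiganProviders() })`, where `hifiganModelUrl` is a placeholder for wherever the split hifigan model is served.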

Hiroshiba · 2024-07-26T17:24:12Z

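@Yosshi999 Long time no see! Thank you so much for the comment!!!

A 7x speedup... Looking back at the earlier comment, the RTF was around 1.5, so depending on the environment it would come out at around 0.2 to 0.3, right?
In that case even about 4 seconds of audio would start playing after waiting roughly 1 second, so it might be just barely acceptable!!!
(Though it does require a GPU.)

Sorry to trouble you, but if possible, could you measure the speed? 🙇
I think that if you synthesize texts of various lengths and record, for each, the audio duration, the time taken on the CPU, and the time taken on the GPU, a clear picture should emerge...!!!
(The CPU time is just for comparison, so it's probably fine to skip it.)

Also, we currently don't distribute the ONNX models because of encryption concerns, but there could be a way to build a web version with the encryption part embedded in onnxruntime, so if this moves forward I'm hoping some pretty interesting things come within reach!!!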

Yosshi999 · 2024-07-27T18:06:41Z

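Sorry, measuring the whole pipeline is quite a lot of work, so I measured only the decode part (sosoa + hifigan).
sosoa always ran on wasm, while hifigan ran on either wasm or webgpu, generating audio from 0.5 to 4 seconds long.
The GPU seems to be strong: it is so much faster that its curve in the graph is squashed flat.

Summarized as a table: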

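| Output length (s) | Decode time, hifigan on webgpu (s) | Decode time, all wasm (s) |
|---|---|---|
| 0.49066667 | 0.04227 | 0.75811 |
| 0.992 | 0.04573 | 1.45352 |
| 1.49333333 | 0.06242 | 2.21918 |
| 1.99466667 | 0.07915 | 2.95608 |
| 2.496 | 0.094815 | 3.715555 |
| 2.99733333 | 0.11548 | 4.50789 |
| 3.49866667 | 0.126165 | 5.144365 |
| 4 | 0.143135 | 5.9653 |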
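The real-time factor (decode time ÷ audio length) and the speedup can be read straight off these rows. A small script with the numbers copied verbatim, treating the time columns as seconds: for the 4-second clip it gives an RTF of about 0.036 with webgpu versus about 1.49 all-wasm. The speedup grows from about 18× on the shortest clip to about 42× on the longest.

```javascript
// [output length (s), decode time with hifigan on webgpu (s), decode time all wasm (s)]
const rows = [
  [0.49066667, 0.04227, 0.75811],
  [0.992, 0.04573, 1.45352],
  [1.49333333, 0.06242, 2.21918],
  [1.99466667, 0.07915, 2.95608],
  [2.496, 0.094815, 3.715555],
  [2.99733333, 0.11548, 4.50789],
  [3.49866667, 0.126165, 5.144365],
  [4, 0.143135, 5.9653],
];

// RTF = processing time / audio duration; below 1 means faster than real time.
const summary = rows.map(([audioSec, gpuSec, wasmSec]) => ({
  audioSec,
  rtfWebgpu: gpuSec / audioSec,
  rtfWasm: wasmSec / audioSec,
  speedup: wasmSec / gpuSec,
}));

console.table(summary.map((r) => ({
  audioSec: r.audioSec.toFixed(2),
  rtfWebgpu: r.rtfWebgpu.toFixed(3),
  rtfWasm: r.rtfWasm.toFixed(2),
  speedup: r.speedup.toFixed(1),
})));
```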

Hiroshiba · 2024-07-28T15:50:55Z

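Ah, sorry, just the decode part is fine! That's where most of the time should be going anyway!

Wow, that's fast enough to be no problem at all!!!
By the way, are the computation results actually back at the moment you stop the timer...?

I ask because in PyTorch and similar frameworks, CUDA computations run asynchronously, so the moment the CUDA computation finishes and the moment the function returns can differ.
So if you stop timing as soon as the function returns, the numbers can be quite different from how it feels in actual use; I'm just asking to be safe!!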
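In JavaScript, the analogous pitfall is stopping the timer before the Promise returned by onnxruntime-web's `session.run()` has resolved, or before the output has been read back if it is kept on the GPU. A minimal sketch that times up to the point where the data is actually in hand. `timeUntilReady`, `timeCallOnly` and the `materialize` callback are illustrative names. With default session options the outputs should already be on the CPU when `run()` resolves, but that is worth confirming for the configuration used, for example by reading the output tensor's `data` inside `materialize`.

```javascript
// Measure elapsed time until the result is really available, not just until the call returns.
// runFn starts the work and returns a Promise (e.g. () => session.run(feeds)).
// materialize optionally forces the data to exist on the CPU (e.g. reading an output's data).
async function timeUntilReady(runFn, materialize = async (r) => r) {
  const start = performance.now();
  const result = await runFn();           // wait for the asynchronous work to finish
  const data = await materialize(result);
  return { data, ms: performance.now() - start };
}

// What NOT to do: the Promise is ignored, so this only measures starting the work.
function timeCallOnly(runFn) {
  const start = performance.now();
  runFn();
  return performance.now() - start;
}

// Stand-in for an asynchronous inference call that takes about 50 ms.
const fakeRun = () => new Promise((resolve) => setTimeout(() => resolve({ wave: [0, 1] }), 50));

console.log(timeCallOnly(fakeRun).toFixed(1), "ms (misleading)");
timeUntilReady(fakeRun, async (out) => out.wave).then(({ ms }) =>
  console.log(ms.toFixed(1), "ms (until the result is ready)")
);
```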

If that checks out, the speed is totally fine!!!!!
Of course it does require a GPU, but I didn't expect it to run this fast....
It might even run pretty quickly on smartphones...

Just commenting to confirm for now 🙇

Yosshi999 added 3 commits June 22, 2022 07:28

split decoder and make surgeon script

069f249

remove comments, fix filename

51636aa

process surgeon in converting

53f5813

Yosshi999 added 2 commits June 22, 2022 23:55

update requirements

8e0f194

refactoring surgeon script and add elimination algorithm

05e6d51

sevenc-nanashi mentioned this pull request Jul 21, 2024

Support WebGPU VOICEVOX/voicevox_core#491 (Open)
