
@danpovey
Contributor

No description provided.

kangshiyin and others added 10 commits July 18, 2018 00:05
Speed (GFLOPS)                                          size     old    new    speedup
CuVector::AddDiagMat2Shapes<double>[no-trans],  (1048576, 32),   2.13   8.04   3.77x
CuVector::AddDiagMat2Shapes<double>[no-trans],   (524288, 64),   4.12   7.27   1.77x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (262144, 128),   7.66   8.56   1.12x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (131072, 256),  13.50  13.50   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],   (65536, 512),  22.29  22.32   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (32768, 1024),  32.26  32.35   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (16384, 2048),  32.48  32.47   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],   (8192, 4096),  32.54  32.57   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],   (4096, 8192),  32.52  32.55   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (2048, 16384),  32.46  32.49   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (1024, 32768),  32.30  32.34   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],   (512, 65536),  31.77  31.89   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (256, 131072),  31.74  31.71   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (128, 262144),  31.64  31.67   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],   (64, 524288),  32.36  32.37   1.00x
CuVector::AddDiagMat2Shapes<double>[no-trans],  (32, 1048576),  30.94  30.92   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],  (1048576, 32),   1.10   8.61   7.84x
   CuVector::AddDiagMat2Shapes<double>[trans],   (524288, 64),   2.19   8.61   3.94x
   CuVector::AddDiagMat2Shapes<double>[trans],  (262144, 128),   4.41   8.67   1.97x
   CuVector::AddDiagMat2Shapes<double>[trans],  (131072, 256),   8.64   8.56   0.99x
   CuVector::AddDiagMat2Shapes<double>[trans],   (65536, 512),  15.72   8.57   0.55x
   CuVector::AddDiagMat2Shapes<double>[trans],  (32768, 1024),  26.09  26.07   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],  (16384, 2048),  31.51  31.26   0.99x
   CuVector::AddDiagMat2Shapes<double>[trans],   (8192, 4096),  27.93  28.35   1.02x
   CuVector::AddDiagMat2Shapes<double>[trans],   (4096, 8192),  31.56  31.52   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],  (2048, 16384),  31.21  31.20   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],  (1024, 32768),  31.40  31.36   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],   (512, 65536),  31.52  31.55   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],  (256, 131072),  30.96  30.95   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],  (128, 262144),  30.00  29.99   1.00x
   CuVector::AddDiagMat2Shapes<double>[trans],   (64, 524288),  28.43  28.78   1.01x
   CuVector::AddDiagMat2Shapes<double>[trans],  (32, 1048576),  24.95  24.93   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (1048576, 32),   2.92  15.87   5.44x
 CuVector::AddDiagMat2Shapes<float>[no-trans],   (524288, 64),   5.70  14.27   2.51x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (262144, 128),  11.04  16.65   1.51x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (131072, 256),  21.12  21.15   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],   (65536, 512),  38.60  38.67   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (32768, 1024),  57.21  57.29   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (16384, 2048),  63.39  63.50   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],   (8192, 4096),  62.63  62.71   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],   (4096, 8192),  63.60  63.71   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (2048, 16384),  63.07  63.09   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (1024, 32768),  62.47  62.64   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],   (512, 65536),  61.80  61.86   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (256, 131072),  61.03  60.99   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (128, 262144),  60.22  59.81   0.99x
 CuVector::AddDiagMat2Shapes<float>[no-trans],   (64, 524288),  62.09  61.87   1.00x
 CuVector::AddDiagMat2Shapes<float>[no-trans],  (32, 1048576),  52.96  53.01   1.00x
    CuVector::AddDiagMat2Shapes<float>[trans],  (1048576, 32),   1.25  16.44  13.19x
    CuVector::AddDiagMat2Shapes<float>[trans],   (524288, 64),   2.48  17.15   6.91x
    CuVector::AddDiagMat2Shapes<float>[trans],  (262144, 128),   4.92  17.14   3.49x
    CuVector::AddDiagMat2Shapes<float>[trans],  (131072, 256),   9.55  18.27   1.91x
    CuVector::AddDiagMat2Shapes<float>[trans],   (65536, 512),  17.90  18.30   1.02x
    CuVector::AddDiagMat2Shapes<float>[trans],  (32768, 1024),  31.49  31.48   1.00x
    CuVector::AddDiagMat2Shapes<float>[trans],  (16384, 2048),  34.38  34.38   1.00x
    CuVector::AddDiagMat2Shapes<float>[trans],   (8192, 4096),  51.61  51.59   1.00x
    CuVector::AddDiagMat2Shapes<float>[trans],   (4096, 8192),  48.60  48.87   1.01x
    CuVector::AddDiagMat2Shapes<float>[trans],  (2048, 16384),  57.47  57.52   1.00x
    CuVector::AddDiagMat2Shapes<float>[trans],  (1024, 32768),  56.30  56.38   1.00x
    CuVector::AddDiagMat2Shapes<float>[trans],   (512, 65536),  55.83  56.24   1.01x
    CuVector::AddDiagMat2Shapes<float>[trans],  (256, 131072),  55.35  55.81   1.01x
    CuVector::AddDiagMat2Shapes<float>[trans],  (128, 262144),  54.26  54.56   1.01x
    CuVector::AddDiagMat2Shapes<float>[trans],   (64, 524288),  52.88  53.00   1.00x
    CuVector::AddDiagMat2Shapes<float>[trans],  (32, 1048576),  47.55  47.44   1.00x
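For readers unfamiliar with the benchmarked operation: in Kaldi, `AddDiagMat2(alpha, M, trans, beta)` sets the vector to alpha times the diagonal of M M^T (no-trans) or M^T M (trans), plus beta times its old value. Below is a minimal CPU sketch of those semantics. The function name `AddDiagMat2Reference` and the row-major `std::vector` storage are hypothetical stand-ins for Kaldi's CUDA matrix types, and this is not the kernel from this PR. It also shows that the speedup column is simply new GFLOPS divided by old GFLOPS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the benchmarked operation (not Kaldi's API).
// The matrix is row-major: element (r, c) is m[r * cols + c].
//   trans == false: v has `rows` entries, v[r] = alpha * sum_c m(r,c)^2 + beta * v[r]
//                   (the diagonal of M * M^T, i.e. squared row norms).
//   trans == true:  v has `cols` entries, v[c] = alpha * sum_r m(r,c)^2 + beta * v[c]
//                   (the diagonal of M^T * M, i.e. squared column norms).
template <typename Real>
void AddDiagMat2Reference(Real alpha, const std::vector<Real> &m,
                          std::size_t rows, std::size_t cols, bool trans,
                          Real beta, std::vector<Real> *v) {
  const std::size_t dim = trans ? cols : rows;
  assert(m.size() == rows * cols);
  assert(v->size() == dim);
  std::vector<Real> sums(dim, Real(0));
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      const Real x = m[r * cols + c];
      sums[trans ? c : r] += x * x;  // accumulate squares per row or per column
    }
  }
  for (std::size_t i = 0; i < dim; ++i)
    (*v)[i] = alpha * sums[i] + beta * (*v)[i];
}

// The "speedup" column in the table: new throughput over old throughput.
double Speedup(double old_gflops, double new_gflops) {
  return new_gflops / old_gflops;
}
```

For example, with M = [[1, 2, 3], [4, 5, 6]], alpha = 1 and beta = 0, the no-trans call yields {14, 77} and the trans call yields {17, 29, 45}; `Speedup(2.13, 8.04)` reproduces the 3.77x in the first row of the table.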
@danpovey
Contributor Author

@GaofengCheng, can you please investigate making this same kind of change to some larger-scale TDNN-F scripts? Our GPUs are very busy. For now you can just leave the layer sizes for the initial CNN layers the same as they are in the small setups, and tune it later.

@GaofengCheng
Contributor

Nice improvement.
I will try this on Switchboard first.
If I get good results, I can then try using the blocksum layer.
Gaofeng

@kangshiyin
Contributor

It seems this is not the newest version of AddDiagMat2. The implementation in #2560 should be 3x faster for AddDiagMat2 with size (1048576, 32).

@danpovey
Contributor Author

Thanks a lot, @kangshiyin; I merged your PR and merged the latest master into this PR.

@danpovey
Contributor Author

I added a WSJ example. It's a little better, but not by much. I will try to tune it more; right now I'm trying removing two TDNN-F layers, but based on my experience with mini-librispeech I don't have much hope that it will help. @GaofengCheng, if you have a WSJ setup, maybe you could also try it with slightly larger l2-regularize values, since it seems to be overfitting more.
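As a reminder of why a larger l2-regularize value can counter overfitting: adding a penalty of (l2 / 2) * ||w||^2 to the loss adds l2 * w to the gradient, so every update also shrinks the weights toward zero. The sketch below is a generic SGD weight-decay step with a hypothetical function name `SgdStepWithL2`; it is only an illustration of the technique, and it assumes nothing about how Kaldi actually scales its `l2-regularize` option internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic SGD step on loss + (l2 / 2) * ||w||^2 (hypothetical helper,
// not Kaldi's implementation). The penalty contributes l2 * w[i] to the
// gradient, so a larger l2 pulls each weight harder toward zero.
void SgdStepWithL2(std::vector<double> *w, const std::vector<double> &grad,
                   double learning_rate, double l2) {
  assert(w->size() == grad.size());
  for (std::size_t i = 0; i < w->size(); ++i)
    (*w)[i] -= learning_rate * (grad[i] + l2 * (*w)[i]);
}
```

With learning_rate = 0.5 and zero data gradient, a weight of 1.0 shrinks to 0.875 when l2 = 0.25, and to 0.75 when l2 is doubled to 0.5.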


# the first TDNN-F layer has no bypass (since dims don't match), and a larger bottleneck so the
# information bottleneck doesn't become a problem.
tdnnf-layer name=tdnnf7 $tdnnf_first_opts dim=1024 bottleneck-dim=256 time-stride=0
@danpovey
Contributor Author

The 256 here was a mistake; I intended it to be the same as the final-layer bottleneck, which seems to be 192 in the WSJ setup.
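To get a feel for what 256 vs. 192 means, and why the first layer cannot have a bypass, here is a rough parameter-count sketch. It assumes the tdnnf-layer is a bias-free linear projection from the input to `bottleneck-dim` followed by an affine projection to `dim`, with each factor splicing one frame when `time-stride=0`, and it ignores batchnorm and other small per-dimension parameters. The helpers `TdnnfParams` and `CanBypass`, and any input dimension plugged into them, are hypothetical, since the real input dimension depends on the preceding CNN layers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter count for a factored TDNN-F layer, under the
// assumptions stated above:
//   linear: input_dim * frames_per_factor -> bottleneck_dim (no bias)
//   affine: bottleneck_dim * frames_per_factor -> dim (with bias)
std::size_t TdnnfParams(std::size_t input_dim, std::size_t bottleneck_dim,
                        std::size_t dim, std::size_t frames_per_factor) {
  assert(frames_per_factor >= 1);
  const std::size_t linear = input_dim * frames_per_factor * bottleneck_dim;
  const std::size_t affine = bottleneck_dim * frames_per_factor * dim + dim;
  return linear + affine;
}

// A bypass (output += scale * input) is only well-defined when the layer's
// input and output dimensions match.
bool CanBypass(std::size_t input_dim, std::size_t dim) {
  return input_dim == dim;
}
```

Under these assumptions, with a hypothetical 2048-dimensional input and dim=1024, `TdnnfParams(2048, 256, 1024, 1)` gives 787,456 parameters versus 590,848 for a bottleneck of 192, and `CanBypass(2048, 1024)` is false.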

@danpovey
Contributor Author

@xiaohui-zhang, do you have time to help with this PR? I just need the scripts to be renamed to the next consecutive letter (no number), the references to the script/affixes inside the scripts themselves to be corrected, and the comparisons with non-checked-in scripts to be removed from the comments in the headers.
I'm super busy with something right now, so I don't have time to do this.

I was originally hoping @GaofengCheng would be able to get CNN+TDNN-F to help (vs. TDNN-F) on larger setups, as our grid was busy, but I think he hasn't been able to so far. I think we can just check this in now, rather than wait.

@xiaohui-zhang
Contributor

xiaohui-zhang commented Aug 20, 2018 via email

xiaohui-zhang added a commit to xiaohui-zhang/kaldi that referenced this pull request Aug 23, 2018
@xiaohui-zhang
Contributor

Done.

@danpovey
Contributor Author

Closing, since this was merged via #2643.

@danpovey danpovey closed this Aug 24, 2018
xiaohui-zhang added a commit to xiaohui-zhang/kaldi that referenced this pull request Nov 6, 2018
xiaohui-zhang added a commit to xiaohui-zhang/kaldi that referenced this pull request Nov 9, 2018

4 participants