Fixes to CUDA problems introduced by recent commits: normalize code c… by danpovey · Pull Request #1228 · kaldi-asr/kaldi

danpovey · 2016-11-30T00:40:05Z

…rashed on zero; compile problem on nvcc 8.0.

…de crash; compile problem on nvcc 8.0; fix thread-sync errors.

kangshiyin · 2016-11-30T01:49:01Z

src/cudamatrix/cu-kernels.cu

      ssum[tid] += ssum[tid + shift];
-      __syncthreads();
-    }
+    __syncthreads();


My bad… I missed this sync bug in the unit tests.
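For readers unfamiliar with the pattern, the diff above appears to move `__syncthreads()` out of the conditional that guards the addition. CUDA only permits `__syncthreads()` in conditional code when the condition evaluates identically for every thread in the block. Below is a minimal host-side C++ sketch of the same tree reduction, with the barrier position marked in comments. It is a sequential model, not the Kaldi kernel; `TreeReduceSum` is a hypothetical name, and the power-of-two buffer size is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Host-side model of a shared-memory tree reduction.
// Assumes ssum.size() is a power of two (or zero).
// TreeReduceSum is a hypothetical name, not a Kaldi function.
float TreeReduceSum(std::vector<float> ssum) {
  for (std::size_t shift = ssum.size() / 2; shift > 0; shift /= 2) {
    // On the GPU, all threads of the block execute this step in parallel;
    // here the inner loop plays the role of the threads.
    for (std::size_t tid = 0; tid < ssum.size(); ++tid) {
      if (tid < shift) {
        ssum[tid] += ssum[tid + shift];
        // A barrier placed here would be reached only by threads with
        // tid < shift, which is the situation the fix avoids.
      }
    }
    // Barrier position after the fix: outside the conditional, so every
    // thread reaches it before the next, smaller shift reads ssum.
    // In the CUDA kernel this is where __syncthreads() belongs.
  }
  return ssum.empty() ? 0.0f : ssum[0];
}
```

In this sequential model the end of the inner loop acts as the barrier, which is why the result is always correct here even though the GPU version depends on where the barrier sits.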

kangshiyin · 2016-11-30T02:35:44Z

src/cudamatrix/cu-math.cc

 void NormalizePerRow(const CuMatrixBase<Real>& in, const Real target_rms,
                     const bool add_log_stddev, CuMatrixBase<Real>* out) {
  const Real kSquaredNormFloor = 1.35525271560688e-20; // 2^-66
+  KALDI_ASSERT(SameDim(in, *out));


This does not hold if add_log_stddev is true.
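A more accurate check could look roughly like the sketch below. It assumes that `add_log_stddev` appends exactly one extra column to the output, so `out` has one more column than `in` in that case; that assumption and the helper name `NormalizeDimsOk` are illustrative and not taken from this PR.

```cpp
// Hypothetical helper, not part of Kaldi.
// Assumption: add_log_stddev adds exactly one extra output column.
bool NormalizeDimsOk(int in_rows, int in_cols,
                     int out_rows, int out_cols,
                     bool add_log_stddev) {
  const int expected_out_cols = in_cols + (add_log_stddev ? 1 : 0);
  return out_rows == in_rows && out_cols == expected_out_cols;
}
```

With this shape, the plain same-dimension check is just the `add_log_stddev == false` case.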

kangshiyin · 2016-11-30T02:43:20Z

src/cudamatrix/cu-kernels.cu

+
+
+inline __device__ static float max_generic(float a, float b) {
+  return fmaxf(a, b);


Math functions such as fmax() are already overloaded for both double and float, as documented here: http://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH.html#group__CUDA__MATH

It seems that max() has also been overloaded for all possible types in the following header, but this is not documented:
/usr/local/cuda/include/math_functions.h
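As a host-side analogy only, the sketch below shows how overloads of `fmax` remove the need for separate per-type wrappers such as `max_generic(float, float)`. It uses the standard `std::fmax` overloads from `<cmath>`, not the CUDA device functions discussed above; `MaxGeneric` is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical generic wrapper, not Kaldi code. It relies on the
// std::fmax overloads for float and double declared in <cmath>,
// so no explicit fmaxf/fmax dispatch is needed.
template <typename Real>
Real MaxGeneric(Real a, Real b) {
  return std::fmax(a, b);
}
```

Called as `MaxGeneric(1.0f, 2.0f)` the float overload is selected; with double arguments the double overload is used.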

danpovey · 2016-11-30T02:49:06Z

I would have asked for review, but that patch fixed a bug, so I did it quickly. Would you mind fixing that assert problem by replacing it with a more accurate assert? No hurry: we have no scripts that use that feature. [But the tests were OK, so I guess we were also not testing that feature... it would be good to have a test that covers it.]

Regarding 'max'... I was not sure if that was a problem; it was my first guess at fixing the problem and I was in too much of a hurry to test it out. I couldn't find any documentation of the overloaded function, so I assumed it didn't exist. It would probably be good to revert it, since my alternative is a bit ugly. You can make a PR for that.

kangshiyin · 2016-11-30T03:06:59Z

I will make a PR soon. I think renorm in relu+renorm+sreg stands for NormalizeComponent, which is being used by @GaofengCheng in the experiment?

GaofengCheng · 2016-11-30T03:31:47Z

@kangshiyin NormalizeComponent

Fixes to CUDA problems introduced by recent commits: fix normalize co…

e732348

…de crash; compile problem on nvcc 8.0; fix thread-sync errors.

danpovey force-pushed the cuda_fix branch from f224707 to e732348 Compare November 30, 2016 01:22

danpovey merged commit 87465f5 into kaldi-asr:master Nov 30, 2016

kangshiyin reviewed Nov 30, 2016

kangshiyin mentioned this pull request Nov 30, 2016

fix assert; use fmax overloading that has been documented. #1230

Merged

danpovey merged 1 commit into kaldi-asr:master from danpovey:cuda_fix
