Add round-to-nearest-even rounding to float2half(). #368

DickJC123 · 2019-01-15T00:58:12Z

When mshadow creates an instance of its 'half_t' type from a value, it converts that value to a 32-bit float, then calls a 'constructor' routine to convert that float to a 16-bit half. On the CPU under Linux, the f16c library call _cvtss_sh() is used (depending on CPU generation?), but in the absence of this library (e.g. on the Windows systems used by MXNet CI testing) mshadow's own float2half() routine is called. While _cvtss_sh() and the GPU intrinsics available with CUDA_VERSION >= 7.5 perform a round-to-nearest-even conversion, the float2half() routine rounds toward zero (i.e. it truncates the extra significand bits). This PR removes this platform-specific difference in behavior by adding round-to-nearest-even rounding to float2half(). This should improve the robustness of MXNet CI testing and make MXNet behavior more consistent across systems. @piiswrong @eric-haibin-lin @KellenSunderland
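To make the rounding difference concrete, here is a minimal sketch of a float-to-half conversion that rounds to nearest even. It is written under stated assumptions: the name float_to_half_rne is hypothetical, it returns the raw 16-bit pattern as uint16_t, NaN is mapped to the quiet NaN 0x7E00 by choice, and it is not the actual float2half() code from mshadow/half.h. Truncation corresponds to simply dropping the discarded bits; the `++` steps are what round-to-nearest-even adds.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sketch of round-to-nearest-even float -> half conversion,
// not the mshadow float2half() implementation.
inline uint16_t float_to_half_rne(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));          // well-defined bit cast
  const uint32_t sign = (bits >> 16) & 0x8000u;  // move float sign to half sign position
  const uint32_t mag = bits & 0x7FFFFFFFu;       // |f| as a bit pattern

  if (mag >= 0x7F800000u)                        // Inf stays Inf; NaN becomes quiet NaN 0x7E00
    return static_cast<uint16_t>(sign | (mag > 0x7F800000u ? 0x7E00u : 0x7C00u));
  if (mag >= 0x477FF000u)                        // |f| >= 65520 rounds past 65504 to Inf
    return static_cast<uint16_t>(sign | 0x7C00u);

  if (mag < 0x38800000u) {                       // |f| < 2^-14: half subnormal or zero
    if (mag < 0x33000000u)                       // |f| < 2^-25: below half the smallest subnormal
      return static_cast<uint16_t>(sign);
    const uint32_t exp = mag >> 23;              // biased float exponent, 102..112
    const uint32_t mant = (mag & 0x7FFFFFu) | 0x800000u;  // restore the implicit leading 1
    const uint32_t shift = 126u - exp;           // 14..24 low bits are discarded
    uint32_t q = mant >> shift;
    const uint32_t rem = mant & ((1u << shift) - 1u);
    const uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (q & 1u))) ++q;  // ties go to the even pattern
    return static_cast<uint16_t>(sign | q);      // a carry into 0x0400 gives the smallest normal
  }

  // Normal range: rebias the exponent (127 -> 15) and drop 13 mantissa bits.
  uint32_t h = (mag - 0x38000000u) >> 13;
  const uint32_t rem = mag & 0x1FFFu;
  if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) ++h;    // a carry may bump the exponent
  return static_cast<uint16_t>(sign | h);
}
```

For example, 1.0f + 2^-11 lies exactly halfway between the halves 0x3C00 and 0x3C01 and rounds to the even pattern 0x3C00 (truncation also gives 0x3C00), whereas 1.0f + 3 * 2^-11 rounds up to 0x3C02 where truncation gives 0x3C01.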

This change should only affect Windows users whose VC++ compiler cannot offer the f16c library, or perhaps GPU users still on CUDA 7.0 or earlier. Users who wish to compare behaviors can restore the legacy (pre-PR) truncation behavior by building with -DMSHADOW_HALF_ROUND_TO_NEAREST=0. This PR is tested by an MXNet PR via tests/python/unittest/test_operator.py:test_cast_float32_to_float16; see apache/mxnet#13857. While developing this test, a numpy rounding bug was discovered, but a simple work-around was put in place.
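A build-time switch of this kind can be sketched as follows. This is an illustration only: drop_low_bits is a hypothetical helper, and the assumption that MSHADOW_HALF_ROUND_TO_NEAREST defaults to 1 when undefined is inferred from the description above rather than copied from mshadow/half.h.

```cpp
#include <cstdint>

// Assumption: round-to-nearest-even is the default unless the build passes
// -DMSHADOW_HALF_ROUND_TO_NEAREST=0.
#ifndef MSHADOW_HALF_ROUND_TO_NEAREST
#define MSHADOW_HALF_ROUND_TO_NEAREST 1
#endif

// Hypothetical helper: discards the low `shift` bits of `value` (1 <= shift <= 31).
inline uint32_t drop_low_bits(uint32_t value, uint32_t shift) {
  uint32_t kept = value >> shift;                 // legacy behavior: plain truncation
#if MSHADOW_HALF_ROUND_TO_NEAREST
  const uint32_t dropped = value & ((1u << shift) - 1u);
  const uint32_t halfway = 1u << (shift - 1u);
  // Round up above the halfway point; on an exact tie, round to the even result.
  if (dropped > halfway || (dropped == halfway && (kept & 1u))) ++kept;
#endif
  return kept;
}
```

Building with -DMSHADOW_HALF_ROUND_TO_NEAREST=0 compiles the rounding branch out, so the helper simply truncates. For a positive float whose value lies in [2^-14, 65520), applying it with shift = 13 to (bits - 0x38000000) yields the half bit pattern.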

Experiments with the new float2half() routine show a 50% speed-up when measured over the [0,+inf] range of possible 32-bit inputs. The new float2half() routine can also be compiled for the GPU (although this should rarely, if ever, be needed), and it passes the new test.


DickJC123 · 2019-01-15T21:55:06Z

This new functionality is tested in apache/mxnet#13857 and exercised further in apache/mxnet#13749. Both these MXNet PRs are dependent on this PR.

DickJC123 · 2019-01-22T21:27:15Z

Still waiting for a review of this improvement to float2half(). If it helps, I've verified in a separate C++ program that the new routine's behavior matches that of the _cvtss_sh() library routine over the range of 32-bit patterns from [0,+INF]. It's also tested in the MXNet PRs that are stacked up waiting for this PR. @piiswrong @eric-haibin-lin @KellenSunderland
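One way to run a sweep like this without depending on F16C hardware is to compare a candidate converter against a slow but independent reference. The sketch below assumes that approach; all function names are hypothetical, and it is not the separate C++ program mentioned above. The reference decodes half patterns exactly, binary-searches for the largest half not exceeding |x| (positive half bit patterns are ordered like their values), then picks the nearer neighbour, breaking ties toward the even pattern.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

inline float bits_to_float(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Exact value of a non-negative finite half bit pattern (0x0000..0x7BFF).
inline double half_bits_to_double(uint32_t h) {
  const uint32_t exp = (h >> 10) & 0x1Fu;
  const uint32_t mant = h & 0x3FFu;
  if (exp == 0) return std::ldexp(static_cast<double>(mant), -24);  // subnormal
  return std::ldexp(static_cast<double>(mant | 0x400u), static_cast<int>(exp) - 25);
}

// Largest finite half pattern whose value is <= a, for finite a >= 0.
inline uint32_t half_floor_bits(double a) {
  uint32_t lo = 0, hi = 0x7BFFu;
  while (lo < hi) {
    const uint32_t mid = lo + (hi - lo + 1u) / 2u;
    if (half_bits_to_double(mid) <= a) lo = mid; else hi = mid - 1u;
  }
  return lo;
}

// Slow reference: round-to-nearest-even by comparing the two neighbouring halves.
inline uint16_t reference_float_to_half(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  const uint32_t sign = (bits >> 16) & 0x8000u;
  if (std::isnan(f)) return static_cast<uint16_t>(sign | 0x7E00u);  // assumed NaN encoding
  if (std::isinf(f)) return static_cast<uint16_t>(sign | 0x7C00u);
  const double a = std::fabs(static_cast<double>(f));
  const uint32_t below = half_floor_bits(a);
  // Past the largest finite half (65504), Inf acts as the value 65536 for rounding.
  const double above = (below == 0x7BFFu) ? 65536.0 : half_bits_to_double(below + 1u);
  const double d_below = a - half_bits_to_double(below);
  const double d_above = above - a;
  uint32_t result = below;
  if (d_above < d_below || (d_above == d_below && (below & 1u))) result = below + 1u;
  return static_cast<uint16_t>(sign | result);
}

// Round-toward-zero conversion (finite inputs), i.e. the legacy truncation behavior.
inline uint16_t truncate_float_to_half(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  const uint32_t sign = (bits >> 16) & 0x8000u;
  return static_cast<uint16_t>(sign | half_floor_bits(std::fabs(static_cast<double>(f))));
}

// Counts float bit patterns in [begin, end), stepping by stride, where the
// candidate converter disagrees with the reference.
template <typename Convert>
uint64_t count_mismatches(Convert candidate, uint32_t begin, uint32_t end, uint32_t stride) {
  uint64_t mismatches = 0;
  for (uint64_t b = begin; b < end; b += stride) {
    const float x = bits_to_float(static_cast<uint32_t>(b));
    if (candidate(x) != reference_float_to_half(x)) ++mismatches;
  }
  return mismatches;
}
```

On an F16C-capable machine, a candidate such as a lambda returning _cvtss_sh(x, 0) (rounding-control immediate 0 selects round-to-nearest-even) could be passed in, and a full sweep of [0,+INF] would use begin = 0, end = 0x7F800001, stride = 1. Sweeping truncate_float_to_half over the 8192 floats in [1.0, 1.0 + 2^-10) reports 4095 mismatches: every input above the tie at 1.0 + 2^-11.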

mshadow/half.h

DickJC123 added 3 commits January 14, 2019 15:17

Add round-to-nearest-even to float2half(). Disable with -DMSHADOW_HAL…
…F_ROUND_TO_EVEN=0 build.

d65d851

Correct #if guard name.

4522884

Fix lint.

b211cb7

eric-haibin-lin requested a review from piiswrong January 15, 2019 01:33

Minor syntax fix for MXNet CI.

f607798

DickJC123 mentioned this pull request Jan 15, 2019

Add NHWC layout support to Pooling (cpu, gpu cuda, gpu cuDNN) apache/mxnet#13749

Merged


eric-haibin-lin reviewed Jan 24, 2019


mshadow/half.h

eric-haibin-lin merged commit 3dc8081 into dmlc:master Jan 28, 2019
