Skip to content
This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Add round-to-nearest-even rounding to float2half(). #368

Merged
merged 4 commits into from
Jan 28, 2019

Conversation

DickJC123
Copy link
Contributor

When mshadow creates instances of its 'half_t' type based on a value, if converts that value to a 32-bit float, then calls a 'constructor' routine to convert that float to a 16-bit half. On the CPU under linux, the f16c library call cvtss_sh() is used (depending on CPU gen?), but in the absence of this library (e.g. on the Windows systems used by MXNet CI testing) mshadow's own float2half() routine is called. While cvtss_sh() and the GPU intrinsics available on CUDA_VERSION >= 7.5 do a round-to-nearest-even conversion, the float2half() routine rounds to 0 (i.e. it truncates the extra significand bits). This PR corrects this platform-specific difference in behavior by adding round-to-nearest-even rounding to float2half(). This should improve the robustness of the MXNet CI testing and make MXNet behavior more consistent across systems. @piiswrong @eric-haibin-lin @KellenSunderland

This change should only effect Windows users whose VC++ compiler cannot offer the f16c library, or perhaps GPU users still on CUDA 7.0 or earlier. For those users who may wish to compare behaviors, the legacy (pre-PR) truncation behavior is made available by building with -DMSHADOW_HALF_ROUND_TO_NEAREST=0. This PR is tested by an MXNet PR with tests/python/unittest/test_operator.py:test_cast_float32_to_float16. See apache/mxnet#13857. In the process of developing this test, a numpy rounding bug was discovered, but a simple work-around was put in place.

Experiments with the new float2half() routine show a speed-up of 50% when measured over the [0,+inf] range of possible 32-bit inputs. The new float2half() routine can compile for the GPU, although this should be rarely if ever needed, and passes the new test.

@DickJC123
Copy link
Contributor Author

This new functionality is tested in apache/mxnet#13857 and exercised further in apache/mxnet#13749. Both these MXNet PRs are dependent on this PR.

@DickJC123
Copy link
Contributor Author

Still waiting for a review of this improvement to float2half(). If it helps any, I've verified in a separate C++ program that the behavior of the new routine matches that of the _cvtss_sh() library routine over the range of 32-bit patterns from [0,+INF]. It's also tested in the MXNet PR's that are stacked up waiting for this PR. @piiswrong @eric-haibin-lin @KellenSunderland

@eric-haibin-lin eric-haibin-lin merged commit 3dc8081 into dmlc:master Jan 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants