Cast Op performance fix. #6509

edgchen1 · 2021-01-30T03:20:43Z

Description

Update CPU Cast implementation to fix performance regressions.

Update Cast unit tests for more coverage.

Performance Measurements

baseline: 91b19b8 (prior to #6466 which pessimized Cast CPU performance).
updated: changes from this PR

Each case was measured over 1000 iterations. The median is reported because it seemed more stable than the average.

n	src_type	dst_type	baseline median (us)	updated median (us)	relative improvement: (baseline - updated) / baseline
32768	float16	float	50.7000	49.6	0.021696252
32768	float16	int64	84.8000	84.3	0.005896226
32768	float16	string	35995.4000	13615.3	0.62174889
32768	float	float16	139.2000	139.9	-0.005028736
32768	float	int64	62.1000	56.8	0.085346216
32768	float	string	35930.9000	13621.3	0.6209029
32768	int64	float16	152.8000	147.7	0.033376963
32768	int64	float	60.6000	46.9	0.226072607
32768	int64	string	25288.2000	1217.2	0.951866879
32768	string	float16	N/A (unimplemented)	5924	N/A
32768	string	float	5847.4000	5512.2	0.057324623
32768	string	int64	4936.9000	4808.4	0.026028479
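
For anyone wanting to reproduce numbers like these, here is a minimal sketch of a median-based timing harness. It is not the harness used for this PR; `Median`, `MedianMicroseconds`, and `RelativeImprovement` are hypothetical names, and the choice of `std::chrono::steady_clock` is an assumption.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helpers for illustration only (not part of ONNX Runtime).

// Returns the median of `samples`. For an even count, averages the two middle values.
inline double Median(std::vector<double> samples) {
  if (samples.empty()) return 0.0;
  const std::size_t mid = samples.size() / 2;
  // Partially sort so that samples[mid] holds the value it would have if fully sorted.
  std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
  const double upper = samples[mid];
  if (samples.size() % 2 == 1) return upper;
  // After nth_element, everything before `mid` is <= upper, so the lower middle
  // value is the maximum of that range.
  const double lower = *std::max_element(samples.begin(), samples.begin() + mid);
  return (lower + upper) / 2.0;
}

// Runs `fn` `iterations` times and returns the median wall time in microseconds.
template <typename Fn>
double MedianMicroseconds(Fn&& fn, std::size_t iterations = 1000) {
  std::vector<double> samples;
  samples.reserve(iterations);
  for (std::size_t i = 0; i < iterations; ++i) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::micro>(stop - start).count());
  }
  return Median(std::move(samples));
}

// Last column of the table: positive means the updated code is faster.
inline double RelativeImprovement(double baseline_us, double updated_us) {
  return (baseline_us - updated_us) / baseline_us;
}
```

The median is less sensitive than the mean to the occasional slow iteration (for example, one interrupted by the scheduler), which is one plausible reason it looked more stable here.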

Motivation and Context

Performance.

include/onnxruntime/core/framework/data_types.h

skottmckay · 2021-02-04T00:54:42Z

onnxruntime/core/providers/cpu/tensor/cast_op.cc

+    auto snprintf_result = std::snprintf(nullptr, 0, format, value);
+    ORT_ENFORCE(snprintf_result > 0, "Failed to determine required snprintf() buffer length.");
+
+    // buffer for string and trailing '\0'


is there a maximum size that a float can possibly use when converted to a string that we could just use as the fixed buffer size to avoid the double call to snprintf? #Closed

there probably is, but i'm not sure what it is. updated to make one call in the case where it does fit into a fixed buffer

In reply to: 569863895 [](ancestors = 569863895)

can we make the static buffer a lot bigger (I would use at least 256 bytes) so the chance of needing the fallback path is much lower?

In reply to: 569930348 [](ancestors = 569930348,569863895)
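
A rough sketch of the approach discussed in this thread: try a fixed stack buffer first, and only fall back to a second `snprintf()` call when the output does not fit. This is not the code from the PR. `FloatToString` is a hypothetical helper, the `"%.8g"` format is an assumption, and the buffer size is a template parameter only so the fallback path can be exercised easily.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper for illustration only (not the code from this PR).
template <std::size_t StackSize = 256>
std::string FloatToString(float value) {
  char stack_buffer[StackSize];
  // snprintf() returns the length the complete output needs (excluding the
  // trailing '\0'), even when the buffer is too small to hold it.
  const int needed = std::snprintf(stack_buffer, StackSize, "%.8g",
                                   static_cast<double>(value));
  if (needed < 0) throw std::runtime_error("snprintf() failed.");
  const std::size_t length = static_cast<std::size_t>(needed);

  // Common case: the output fit, so a single call was enough.
  if (length < StackSize) return std::string(stack_buffer, length);

  // Fallback: allocate exactly what is needed (string plus trailing '\0')
  // and format again.
  std::vector<char> heap_buffer(length + 1);
  const int written = std::snprintf(heap_buffer.data(), heap_buffer.size(),
                                    "%.8g", static_cast<double>(value));
  if (written < 0 || static_cast<std::size_t>(written) != length) {
    throw std::runtime_error("Failed to write value with snprintf().");
  }
  return std::string(heap_buffer.data(), length);
}
```

With a larger static buffer, as suggested above, the fallback branch becomes very unlikely for numeric formats, so the cost of the second call is paid rarely if ever.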

skottmckay · 2021-02-04T00:56:06Z

onnxruntime/core/providers/cpu/tensor/cast_op.cc

+    ORT_ENFORCE(
+        snprintf_result > 0 && gsl::narrow_cast<size_t>(snprintf_result) == buffer.size() - 1,
+        "Failed to write value with snprintf().");
+


Prefer returning a status in code that runs during model execution. In a minimal build that will gracefully return an error (vs. ORT_ENFORCE which will call abort() if exceptions are disabled).
#Pending

ok, good to know. however i think this is sufficiently bad to warrant termination

In reply to: 569864399 [](ancestors = 569864399)

Either way the request will terminate as we essentially exit as soon as the status indicates failure. If you return a Status that will come back to the user gracefully with an error message in all builds. If you throw, it will do the same unless exceptions are disabled, in which case it's a hard abort which is much harder to debug.

In reply to: 569930622 [](ancestors = 569930622,569864399)

ok. the error message still gets printed though, right?
let me know if you have a strong preference for Status in this particular case. it'll require some plumbing to propagate it. i think these snprintf failures are unexpected - i.e., more like an assertion

In reply to: 569957335 [](ancestors = 569957335,569930622,569864399)

trying out Status

In reply to: 570436510 [](ancestors = 570436510,569957335,569930622,569864399)
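
To illustrate the "plumbing" mentioned in this thread, here is a minimal sketch of propagating a failure as a status value instead of throwing or aborting. The `Status` struct below is a simplified stand-in, not ORT's actual `Status` class, and `DoubleToString` and `CastToStrings` are hypothetical names. The buffer size is a template parameter only so that a failure can be provoked.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a status type (NOT ORT's actual Status class).
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return Status{true, std::string()}; }
  static Status Fail(std::string msg) { return Status{false, std::move(msg)}; }
};

// Formats one value. Reports failure through the return value rather than
// throwing, so the caller decides how to surface the error.
template <std::size_t BufferSize = 32>
Status DoubleToString(double value, std::string& out) {
  char buffer[BufferSize];
  const int written = std::snprintf(buffer, BufferSize, "%.17g", value);
  if (written < 0 || static_cast<std::size_t>(written) >= BufferSize) {
    return Status::Fail("Failed to write value with snprintf().");
  }
  out.assign(buffer, static_cast<std::size_t>(written));
  return Status::OK();
}

// The plumbing: every caller on the path checks the status and returns early,
// so the first failure travels up to whoever invoked the cast.
template <std::size_t BufferSize = 32>
Status CastToStrings(const std::vector<double>& input,
                     std::vector<std::string>& output) {
  output.resize(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    const Status status = DoubleToString<BufferSize>(input[i], output[i]);
    if (!status.ok) return status;
  }
  return Status::OK();
}
```

The trade-off matches the discussion: the status route needs an explicit check at each level, but the error message reaches the caller in every build configuration, including builds where exceptions are disabled.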

onnxruntime/core/providers/cpu/tensor/cast_op.cc

skottmckay · 2021-02-04T01:00:48Z

onnxruntime/core/providers/cpu/tensor/cast_op.cc

 };

 #if defined(_M_AMD64)
+// add some specializations to use optimized MLFloat16 -> float conversion


comment documenting why this is _M_AMD64 only would be good #Resolved

changed to say that this is _M_AMD64 specific. searched the code and from the filenames it looks like it is only implemented for this architecture, but i'm not sure why that is.

In reply to: 569866100 [](ancestors = 569866100)

Searched the ORT code or Eigen code? Would be good for the comment to mention where you looked and only found amd64 specific optimizations so that in the future we can look in the same places to see if anything has changed and whether other platforms can have the specializations enabled.

In reply to: 569931342 [](ancestors = 569931342,569866100)

This is MLAS specific: at one time, the Windows ML team cared about the performance of a FP16 model running on the CPU (for falling back from the DML EP), so I implemented a Windows x64 specific routine to do the conversion. That scenario wasn't important on other platforms, so the kernel was never adapted anywhere else. The scenario may no longer matter either, if you can get support for retiring this from the Windows ML team. #Resolved

ok, thanks for explaining that. the MLAS routine is faster from my measurements, so i think it's good to have this optimized version.

In reply to: 570004323 [](ancestors = 570004323)

i searched the ORT code for MlasConvertHalfToFloatBuffer.

In reply to: 569962495 [](ancestors = 569962495,569931342,569866100)
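
A small sketch of the pattern discussed in this thread: a portable half-to-float conversion that is always available, plus a platform-specific fast path selected with `_M_AMD64`. This is not the PR's code. `HalfBitsToFloat`, `OptimizedHalfToFloatBuffer`, and `ConvertHalfToFloatBuffer` are hypothetical names, and the "optimized" routine here is only a stand-in that forwards to the portable loop; in ORT the real optimized routine is `MlasConvertHalfToFloatBuffer`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable IEEE 754 binary16 -> binary32 conversion from raw bits.
inline float HalfBitsToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exponent = (h >> 10) & 0x1Fu;
  std::uint32_t mantissa = h & 0x3FFu;
  std::uint32_t bits;
  if (exponent == 0) {
    if (mantissa == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal half: shift until the implicit leading bit appears.
      std::uint32_t shift = 0;
      while ((mantissa & 0x400u) == 0) {
        mantissa <<= 1;
        ++shift;
      }
      mantissa &= 0x3FFu;
      bits = sign | ((113u - shift) << 23) | (mantissa << 13);
    }
  } else if (exponent == 0x1Fu) {
    bits = sign | 0x7F800000u | (mantissa << 13);  // infinity or NaN
  } else {
    // Normal number: rebias the exponent from 15 to 127.
    bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  }
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}

inline void PortableHalfToFloatBuffer(const std::uint16_t* src, float* dst,
                                      std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) dst[i] = HalfBitsToFloat(src[i]);
}

#if defined(_M_AMD64)
// Stand-in for a platform-specific routine. It only forwards to the portable
// loop so that this sketch compiles everywhere; a real build would call the
// optimized kernel here.
inline void OptimizedHalfToFloatBuffer(const std::uint16_t* src, float* dst,
                                       std::size_t count) {
  PortableHalfToFloatBuffer(src, dst, count);
}
#endif

// Single entry point: callers do not need to know which path was compiled in.
inline void ConvertHalfToFloatBuffer(const std::uint16_t* src, float* dst,
                                     std::size_t count) {
#if defined(_M_AMD64)
  OptimizedHalfToFloatBuffer(src, dst, count);
#else
  PortableHalfToFloatBuffer(src, dst, count);
#endif
}
```

Keeping the macro check in one place, next to a comment naming the routine that was searched for, makes it easier to revisit later if the optimized kernel is ported to other platforms.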

edgchen1 added 4 commits January 29, 2021 13:05

Update comment.

9778111

Update example comment.

6cc0d4e

Update Cast tests.

9e58038

Address Cast performance issues.

e9d80ca

edgchen1 requested a review from a team as a code owner January 30, 2021 03:20

edgchen1 changed the title from "Edgchen1/cast perf" to "Cast Op performance fix." Jan 30, 2021

edgchen1 added 6 commits February 1, 2021 11:05

Fix build issue.

c4a2776

Disable MLFloat16 to string test on x86.

44437e3

Fix float16 -> string test failure.

e303e85

Avoid using stringstream.

da85adc

Add headers.

5362b54

Fix unsigned/signed comparison.

56b9e7c

edgchen1 requested review from guoyu-wang, skottmckay and tracysh February 2, 2021 01:43

edgchen1 commented Feb 2, 2021

include/onnxruntime/core/framework/data_types.h

Account for CUDA compute capability requirements in test.

084fbc2

skottmckay reviewed Feb 4, 2021

onnxruntime/core/providers/cpu/tensor/cast_op.cc

skottmckay reviewed Feb 4, 2021

edgchen1 added 4 commits February 3, 2021 19:54

Address PR comments.

259ba89

Merge remote-tracking branch 'origin/edgchen1/cast_perf' into edgchen…

0f14fec

…1/cast_perf

Merge remote-tracking branch 'origin/master' into edgchen1/cast_perf

ade73d2

Address PR comments.

44b2838

skottmckay approved these changes Feb 4, 2021

edgchen1 merged commit 318b82c into master Feb 4, 2021

edgchen1 deleted the edgchen1/cast_perf branch February 4, 2021 22:52

snnn mentioned this pull request Feb 9, 2022

Move floatToHalf from onnxruntime_util.lib to onnxruntime_framework.lib #10508

Closed
