cpu: reorder: tentatively turn on ref direct copy code for gcc
Rationale: jitted code is typically faster than reference code
           compiled with old GCC (4.8.3). However, jitted code
           takes significant time to create, so if a reorder is
           always created right before it is executed, the jitted
           code might end up slower than simple reference code.

This commit is tentative. The Intel MKL-DNN team needs to find a way
to make jitting less expensive, especially for auxiliary and
quite popular primitives such as direct copy and other reorders.


(cherry picked from commit 44b09b8)
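
The trade-off in the rationale can be made concrete with a small cost model. This is only a sketch: the `impl_cost_t` struct, the function names, and every number below are hypothetical and are not measurements of MKL-DNN. It assumes the total time is one creation cost plus a fixed cost per execution.

```cpp
#include <cassert>

// Hypothetical cost model; the values used with it are illustrative only.
struct impl_cost_t {
    long create_us; // one-time cost of creating the reorder (includes jitting)
    long exec_us;   // cost of a single execution
};

// Total time when a reorder is created once and then executed `runs` times.
long total_us(impl_cost_t c, long runs) {
    assert(runs >= 0);
    return c.create_us + runs * c.exec_us;
}

// Smallest number of executions per creation (at least 1) for which the
// jitted version is strictly cheaper overall; -1 if it never catches up.
long break_even_runs(impl_cost_t jit, impl_cost_t ref) {
    if (jit.exec_us >= ref.exec_us) return -1;
    long extra_create = jit.create_us - ref.create_us;
    if (extra_create <= 0) return 1;
    long gain_per_run = ref.exec_us - jit.exec_us;
    // Need runs * gain_per_run > extra_create.
    return extra_create / gain_per_run + 1;
}
```

With assumed costs of `{200, 10}` for the jitted version and `{5, 15}` for the reference version, a create-then-execute-once pattern costs 210 vs 20 microseconds, and the jitted version only wins from 40 executions per creation onward.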
Fomenko, Evarist M authored and tprimak committed Nov 28, 2018
1 parent 830a100 commit 567dfb5
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion src/cpu/cpu_reorder.cpp
@@ -50,9 +50,17 @@ static const rpd_create_f cpu_reorder_impl_list[] = {
 wino_reorder_t<f32, f32>::pd_t::create,
 wino_reorder_t<f32, s8>::pd_t::create,
 
+#if defined(__INTEL_COMPILER) || (defined(__GNUC__) && !defined(__clang__))
+/* Direct copy for icc which is faster than jitted code;
+ * Direct copy for gcc which might or might not be faster than jitted
+ * code, but still worth it because doesn't require jitting, i.e. much
+ * faster creation time. This is tentative solution and should be removed
+ * later (when we will cache jitted code?...). */
+REG_SR_DIRECT_COPY(f32, f32),
+#endif
+
 #ifdef __INTEL_COMPILER
 /* direct copy for icc, which is faster than jitted code */
-REG_SR_DIRECT_COPY(f32, f32),
 REG_SR_DIRECT_COPY(f32, s32),
 REG_SR_DIRECT_COPY(f32, s8),
 REG_SR_DIRECT_COPY(f32, u8),
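
Why the position of the new entry matters can be sketched as follows, assuming the implementation list is scanned in order and the first creator that accepts the reorder wins. Every type and function here (`dtype`, `reorder_desc_t`, `create_fn`, `pick_impl`, and the three creators) is a hypothetical stand-in, not the MKL-DNN API; only the preprocessor condition is taken from the diff.

```cpp
#include <string>

// Hypothetical stand-ins for the library's data types and descriptors.
enum class dtype { f32, s8 };

struct reorder_desc_t {
    dtype src;
    dtype dst;
};

// A creator returns its implementation name, or nullptr if it cannot
// handle the requested reorder.
using create_fn = const char *(*)(const reorder_desc_t &);

const char *create_direct_copy_f32(const reorder_desc_t &d) {
    return (d.src == dtype::f32 && d.dst == dtype::f32) ? "ref_direct_copy"
                                                        : nullptr;
}

const char *create_jit(const reorder_desc_t &d) {
    return (d.src == dtype::f32) ? "jit" : nullptr;
}

const char *create_ref_any(const reorder_desc_t &) { return "ref_any"; }

// Ordered list: earlier entries take priority over later ones.
static const create_fn impl_list[] = {
#if defined(__INTEL_COMPILER) || (defined(__GNUC__) && !defined(__clang__))
    create_direct_copy_f32, // placed ahead of the jitted creator on icc and gcc
#endif
    create_jit,
    create_ref_any,
};

// Returns the name of the first implementation that accepts the descriptor,
// or an empty string if none does.
std::string pick_impl(const reorder_desc_t &d) {
    for (create_fn f : impl_list)
        if (const char *name = f(d)) return std::string(name);
    return std::string();
}
```

In this sketch, an f32 to f32 reorder resolves to `ref_direct_copy` when built with icc or gcc and to `jit` when built with clang, while f32 to s8 resolves to `jit` on every compiler because the guarded entry only covers f32 to f32.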