Questions about status of dgNewtonSse #215

JayFoxRox · 2020-05-01T21:31:09Z

To run Newton on the original Xbox (Pentium 3, 733MHz, MMX and SSE; specifically no SSE2 or higher), I was wondering about the status of dgNewtonSse. It is still mentioned here:

newton-dynamics/CMakeLists.txt

Line 18 in fd2c31d

#option("NEWTON_WITH_SSE_PLUGIN" "adding sse parallel solver" OFF)

and here:

newton-dynamics/sdk/CMakeLists.txt

Lines 41 to 43 in fd2c31d

    
    if (NEWTON_WITH_SSE_PLUGIN)
        add_subdirectory(dgNewtonSse)
    endif()

(and potentially elsewhere)

However, the actual folder newton-dynamics/sdk/dgNewtonSse is nowhere to be found.
To find out when / why it was deleted, I checked the git history.

The first hint I found was in 4290f25, which describes them as "unfinished", implying that they are still planned / wanted.

I then checked whether they had ever been finished in the past (before becoming unfinished through bitrot), and found the last revision containing dgNewtonSse: https://github.com/MADEAPPS/newton-dynamics/tree/ce423a44e3d0e6b075d84195aec29081ce9acb66/sdk/dgNewtonSse
After that, it was renamed / moved to dgNewtonGL: https://github.com/MADEAPPS/newton-dynamics/tree/159b8b469b3f8cea55c69a8dfd93ceb1e76a68af/sdk/dgNewtonGL

Tracking the changes became tedious, so I started looking at the state of the current implementations for AVX and SSE4.2.

So my questions are:

When and why was dgNewtonSse removed / turned into dgNewtonGL? What was dgNewtonGL? Did SSE continue to exist elsewhere (where I didn't see it)?
Does auto-vectorization of modern compilers like Clang do a good enough job to use the reference solvers with good performance?
Would there be any expected performance benefit when adding explicit legacy SSE support (specifically SSE without SSE2+)?
What makes each vectorization plugin code (Sse4.2, Avx, Avx2) incompatible with other vectorization instruction sets, other than the little code in dgSolver.h?
The code in dgNewtonAvx2/dgSolver.cpp and dgNewtonSse4.2/dgSolver.cpp is identical with the exception of a single line. Couldn't the code be reorganized to make it easier to add support for other instruction sets without so much redundancy?


JulioJerez · 2020-05-02T21:52:55Z

There is no dgNewtonSse plugin.
The parallel solver is the default and uses the most basic version of SSE and SSE2.
This solver is part of the engine, so it works whether the engine is linked statically or as a DLL.
All other plugins use more advanced versions of SSE, with instructions like gather/scatter, muladd, and some others, but they can only be loaded as DLLs.

The code template in the DLL solvers may look identical, but the driver functions are very different, and that is what makes them incompatible. For example, the AVX2 plugin will not load on a CPU that does not support AVX2, but that CPU may still load the AVX plugin.
The AVX2 plugin, in theory, executes twice as many flops because of the muladd instruction; it also supports gathering, which simplifies some operations. In practice, AVX2 is only about 5 to 10% faster.
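The loading rule described above can be sketched roughly as follows. This is only an illustration, not the engine's actual loader: `CpuFeatures`, `chooseSolver`, and the returned labels are hypothetical names, and in a real program the flags would be filled in from CPUID or a compiler intrinsic.

```cpp
#include <string>

// Hypothetical summary of what the CPU supports (an assumption for this
// sketch; the engine's real detection code is not shown in this thread).
struct CpuFeatures {
    bool sse42 = false;
    bool avx = false;
    bool avx2 = false;
};

// Picks the most capable solver the CPU can run, falling back to the
// built-in default solver when no plugin's instruction set is available.
inline std::string chooseSolver(const CpuFeatures& cpu) {
    if (cpu.avx2) return "avx2";
    if (cpu.avx) return "avx";
    if (cpu.sse42) return "sse4.2";
    return "default";  // no plugin is loaded
}
```

With this rule, a CPU that reports AVX but not AVX2 ends up with the AVX solver, and a CPU that reports none of the three falls back to the default solver.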

In general, the plugins are faster because they use the SIMD vectors as if they were GPU compute units. For example, the AVX2 solver solves 16 joints per iteration, whereas the default solver solves one per iteration. This requires some overhead to transpose the data from array of structures to structure of arrays, so the true benefit comes when thousands of joints are resolved and the cost of transposing is amortized by the gain of the solver.
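To make the transposition step concrete, here is a minimal sketch of converting an array of structures into a structure-of-arrays block and then operating on all lanes at once. The types `Vec3` and `Vec3Block`, the functions, and the lane count of 8 are assumptions chosen for illustration; they are not taken from the engine's source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };    // one array-of-structures element

constexpr std::size_t kLanes = 8;  // hypothetical SIMD width for this sketch

// One structure-of-arrays block: lane i holds the components of element i.
struct Vec3Block {
    std::array<float, kLanes> x{}, y{}, z{};
};

// Transposes up to kLanes elements, starting at `first`, into one block.
// Lanes past the end of the input stay zero.
inline Vec3Block transpose(const std::vector<Vec3>& in, std::size_t first) {
    assert(first <= in.size());
    Vec3Block out;
    for (std::size_t lane = 0; lane < kLanes && first + lane < in.size(); ++lane) {
        out.x[lane] = in[first + lane].x;
        out.y[lane] = in[first + lane].y;
        out.z[lane] = in[first + lane].z;
    }
    return out;
}

// Lane-wise dot product. Each statement touches the same component of every
// lane, which is the access pattern a SIMD unit handles in one instruction.
inline std::array<float, kLanes> dot(const Vec3Block& a, const Vec3Block& b) {
    std::array<float, kLanes> r{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        r[lane] = a.x[lane] * b.x[lane] + a.y[lane] * b.y[lane] + a.z[lane] * b.z[lane];
    }
    return r;
}
```

The `transpose` call is pure overhead compared with a scalar solver, which is why the gain only shows up once enough work is done per block to pay for it.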

Auto-vectorization does not translate into big gains because the engine predates these compiler optimizations, but since the engine supports scalar math, you can just define the preprocessor USE SCALAR operation and let the compiler do all the optimization.
Whenever I have tried it, I have never seen it do a better job than the hand-made optimizations.
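For reference, this is the kind of plain scalar loop that an optimizing compiler may turn into SIMD code on its own. It is a generic sketch, not code from the engine; `scaleAdd` is a hypothetical name, and whether the loop actually gets vectorized depends on the compiler, the optimization flags, and the target instruction set.

```cpp
#include <cstddef>

// Scalar multiply-add over two arrays: y[i] += a * x[i].
// There is no explicit SIMD here; any vectorization is left to the compiler.
inline void scaleAdd(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```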

JulioJerez · 2020-05-02T22:01:22Z

If you are using an older Intel Pentium 3 (733MHz, MMX and SSE),
the SSE mode is not really very good because internally it is still a 64-bit bus.
So SSE is kind of a placeholder for that CPU; it was not until the Intel Core Duo that SSE became a real factor in float throughput. In your case, the default solver is your best option, because you avoid the extra work of the plugin. Remember, the plugins are faster because they capitalize on one or more features of the instruction set, be that 8-way float vector AVX, muladd, a 256-bit-wide internal bus, and so on.
Your CPU does not support any of them, so the overhead of the plugin would be a waste that can't be recovered by special instructions.
