
LSTM: ARM SIMD support #519

Closed
amitdo opened this issue Dec 1, 2016 · 9 comments

@amitdo
Collaborator

amitdo commented Dec 1, 2016

https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00#for-open-source-contributors

There is a C++ implementation that is used if the hardware does not have SSE and/or AVX, but the code could benefit from SIMD implementations for other hardware, such as ARM. See the new arch directory for where to insert the code.

@zamazan4ik
Contributor

Should we write this code manually nowadays? Modern compilers can optimize SIMD instructions very well without any manual work with intrinsics. Users just need to compile with -O2/-O3 and -march=<required_arch>.

I think writing a lot of manual assembly/intrinsics isn't a good idea.

@stweil
Contributor

stweil commented Jun 3, 2018

Yes, that's correct. It is already possible to do that by providing additional compiler flags when running configure (CXXFLAGS=...). But of course that should happen automatically, and we must take care that the resulting binary can still be used on different hardware. This still has to be implemented.
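For example, a manual build along those lines might look like this (the flags are target-specific illustrations, not project defaults):

./configure CXXFLAGS="-O3 -mfpu=neon-vfpv4 -mfloat-abi=hard"  # 32-bit ARM with NEON
./configure CXXFLAGS="-O3 -march=armv8-a"                     # 64-bit ARM (NEON is mandatory there)
make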

PS: I have recently ordered a small ARM-based cluster for Tesseract OCR, so I'm highly motivated to work on this issue. :-)

@drothlis

Enabling NEON optimisations does result in vectorised NEON instructions for WeightMatrix::DotProduct: https://godbolt.org/z/YCUgcb
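For context, the scalar loop that gets vectorised there has roughly this shape (simplified, with illustrative names and types; the real code lives under the arch directory):

double DotProduct(const double* u, const double* v, int n) {
  double total = 0.0;
  for (int k = 0; k < n; ++k) {
    total += u[k] * v[k];  // the loop the compiler turns into vector instructions
  }
  return total;
}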

I'm not sure about IntSimdMatrix::MatrixDotVector -- the code (and assembly) is much harder to follow.

On my ARM device (NVidia Tegra K1) compiling tesseract with NEON optimisations (-mfpu=neon-vfpv4 -mfloat-abi=hard -mcpu=cortex-a15) gave a 10-15% speedup, but the LSTM engine is still 3-10 times slower than the legacy engine: 3-30 seconds (depending on the image size) compared to 1-4 seconds for the legacy engine.

These compiler flags had no measurable effect on the legacy engine.

Adding -O3 (versus the default -O2) resulted in a further 0-20% speedup (depending on image size). In other words, a total speedup of 10-30% over -O2 without NEON. (Still many times slower than the legacy engine.)

For the legacy engine, -O3 gave me a 1-8% speedup.

I used Ubuntu's tesseract package version 4.00~git2288-10f4998a-2 plus the English data files from https://github.com/tesseract-ocr/tessdata/tree/590567f2

How I built it, in case it helps anyone:

sudo apt install build-essential devscripts
sudo apt build-dep tesseract-ocr
mkdir /tmp/tesseract
cd /tmp/tesseract
apt source tesseract-ocr
cd tesseract-4.00~git2288-10f4998a
debchange -R "Rebuild with NEON optimisations"
export DEB_CFLAGS_APPEND="-mfpu=neon-vfpv4 -mfloat-abi=hard -mcpu=cortex-a15"
debuild -i -us -uc -b  # creates ../*.deb

@stweil
Contributor

stweil commented Feb 12, 2019

I suggest using data files from tessdata_fast instead of those from tessdata. In addition, you could try -c dotproduct=native which should use Neon if you compiled on a Neon machine.
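For example (the file names are placeholders):

tesseract image.png output -c dotproduct=native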

@s6ch13

s6ch13 commented Jan 9, 2020

Below you can find code which adds ARM NEON integer support: a native implementation of intsimdmatrixneon.cpp, along with changes in other files to support it. Once I get my hands on a 64-bit ARM platform, I will work on ARM NEON float support (for dotproductneon.cpp). It gives about a 20% improvement in performance. Please review the code and let me know your comments.

https://github.com/s6ch13/tesseract/tree/arm_neon_support

Cheers, Sriram
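For readers who want a feel for what such an implementation involves, here is a minimal sketch of a NEON int8 dot product written with intrinsics. It is illustrative only, not the code from the branch above, and the function name is made up; Tesseract's real integer matrix code also handles tiling, scaling and bias terms.

#include <arm_neon.h>
#include <cstdint>

// Multiply-accumulate int8 weights against int8 inputs, 8 lanes at a time.
int32_t DotProductNeonInt8(const int8_t* w, const int8_t* u, int n) {
  int32x4_t acc = vdupq_n_s32(0);
  int k = 0;
  for (; k + 8 <= n; k += 8) {
    int16x8_t prod = vmull_s8(vld1_s8(w + k), vld1_s8(u + k));  // widening multiply to int16
    acc = vpadalq_s16(acc, prod);  // pairwise add-accumulate into the int32 lanes
  }
  // Horizontal sum of the four int32 lanes.
  int32x2_t sum2 = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
  sum2 = vpadd_s32(sum2, sum2);
  int32_t total = vget_lane_s32(sum2, 0);
  for (; k < n; ++k) total += w[k] * u[k];  // scalar tail
  return total;
}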

@amitdo amitdo added the SIMD label May 14, 2020
@amitdo
Collaborator Author

amitdo commented May 27, 2020

Dot product acceleration using Neon was implemented in f79e52a.
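For illustration, a handwritten NEON float dot product typically looks like the minimal sketch below. This is not the code from f79e52a, and the name is made up; unlike the int8 version above, the float path can use a multiply-accumulate directly instead of a widening multiply.

#include <arm_neon.h>

// Multiply-accumulate over four float lanes per iteration.
float DotProductNeonFloat(const float* u, const float* v, int n) {
  float32x4_t acc = vdupq_n_f32(0.0f);
  int k = 0;
  for (; k + 4 <= n; k += 4) {
    acc = vmlaq_f32(acc, vld1q_f32(u + k), vld1q_f32(v + k));  // acc += u * v
  }
  // Horizontal sum of the four lanes.
  float32x2_t sum2 = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
  sum2 = vpadd_f32(sum2, sum2);
  float total = vget_lane_f32(sum2, 0);
  for (; k < n; ++k) total += u[k] * v[k];  // scalar tail
  return total;
}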

@stweil
Contributor

stweil commented May 27, 2020

I'll try to compare the performance of both implementations later. This is an interesting example because the one here simply relies on the compiler while the other one uses handwritten NEON code.

@Shreeshrii
Collaborator

@stweil Do you have a result for the comparison?
What are the recommended settings to use for Neon?

@stweil
Contributor

stweil commented Nov 21, 2020

Neon is automatically detected and used with the latest code, so no special settings should be required.

And no, sorry, I don't have a comparison result.
