LSTM: ARM SIMD support #519
Comments
Should we still write this code manually nowadays? Modern compilers can generate SIMD instructions very well without any manual work with intrinsics. Users should just compile with -O2 or -O3 and -march=<required_arch>. I think writing a lot of manual assembler or intrinsics code isn't a good idea.
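To make the auto-vectorization argument concrete, here is a minimal sketch of the kind of loop a compiler can turn into SIMD code on its own. The function name `DotInt8Scalar` is hypothetical and is not taken from Tesseract, and the flags mentioned in the comment are an assumption about a GCC toolchain on 32-bit ARM.

```cpp
#include <cstddef>
#include <cstdint>

// Plain scalar dot product of two int8 vectors, accumulated in 32 bits.
// No intrinsics are used: whether this becomes SIMD code is left to the
// optimizer. Assumption: something like `g++ -O3 -mfpu=neon` on 32-bit ARM
// (NEON is part of the baseline on AArch64) lets the compiler vectorize it.
std::int32_t DotInt8Scalar(const std::int8_t* a, const std::int8_t* b,
                           std::size_t n) {
  std::int32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  }
  return sum;
}
```

An integer loop is used here on purpose: compilers generally will not reorder a floating-point reduction unless relaxed floating-point options are enabled, so a float version may stay scalar at plain -O3.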
Yes, that's correct. It is already possible to do it by providing additional compiler flags when running configure ( PS. I have recently ordered a small ARM-based cluster for Tesseract OCR, so I'm highly motivated to work on that issue. :-)
Enabling NEON optimisations does result in vectorised NEON instructions for I'm not sure about On my ARM device (NVIDIA Tegra K1) compiling tesseract with NEON optimisations ( These compiler flags had no measurable effect on the legacy engine. Adding For the legacy engine, I used Ubuntu's tesseract package version 4.00~git2288-10f4998a-2 + the English data files from https://github.com/tesseract-ocr/tessdata/tree/590567f2 How I built it, in case it helps anyone:
I suggest using data files from
Below you can find code that adds ARM NEON integer support. It is a native implementation in intsimdmatrixneon.cpp, along with changes in other files to support it. Once I get my hands on a 64-bit ARM platform, I will work on ARM NEON float support (for dotproductneon.cpp). There is about a 20% improvement in performance. Please review the code and let me know your comments. https://github.com/s6ch13/tesseract/tree/arm_neon_support Cheers, Sriram
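For readers unfamiliar with what handwritten NEON integer code looks like, the following is a minimal sketch of an int8 dot product with NEON intrinsics. It is not the code from the linked branch; the function name `DotInt8` is hypothetical, and a scalar path is kept so the same file also builds on non-ARM hosts.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: dot product of two int8 vectors with a 32-bit result.
std::int32_t DotInt8(const std::int8_t* a, const std::int8_t* b,
                     std::size_t n) {
  std::int32_t sum = 0;
  std::size_t i = 0;
#if defined(__ARM_NEON)
  int32x4_t acc = vdupq_n_s32(0);
  for (; i + 8 <= n; i += 8) {
    // Widening multiply: 8 x int8 * 8 x int8 -> 8 x int16.
    const int16x8_t prod = vmull_s8(vld1_s8(a + i), vld1_s8(b + i));
    // Add adjacent int16 pairs and accumulate them into 4 x int32.
    acc = vpadalq_s16(acc, prod);
  }
  // Horizontal sum of the four 32-bit lanes.
  int32x2_t half = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
  half = vpadd_s32(half, half);
  sum = vget_lane_s32(half, 0);
#endif
  // Scalar tail; on builds without NEON this loop handles all elements.
  for (; i < n; ++i) {
    sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  }
  return sum;
}
```

The widening multiply followed by a pairwise accumulate keeps every intermediate value in range: the largest int8 product, 16384, fits in int16, and the sums are carried in int32.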
Dot product acceleration using Neon was implemented in f79e52a.
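As an illustration of the technique only, a float dot product with NEON intrinsics could look roughly like the sketch below. This is not the code from commit f79e52a; the name `DotFloat` and the choice of single precision are assumptions made for the example.

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: dot product of two float vectors of length n.
float DotFloat(const float* u, const float* v, std::size_t n) {
  float sum = 0.0f;
  std::size_t i = 0;
#if defined(__ARM_NEON)
  float32x4_t acc = vdupq_n_f32(0.0f);
  for (; i + 4 <= n; i += 4) {
    // acc += u[i..i+3] * v[i..i+3], four lanes at a time.
    acc = vmlaq_f32(acc, vld1q_f32(u + i), vld1q_f32(v + i));
  }
  // Horizontal sum of the four lanes (works on ARMv7 and AArch64).
  float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
  half = vpadd_f32(half, half);
  sum = vget_lane_f32(half, 0);
#endif
  // Scalar tail; on builds without NEON this loop handles all elements.
  for (; i < n; ++i) {
    sum += u[i] * v[i];
  }
  return sum;
}
```

Because the vector path adds the products in a different order than a scalar loop, results can differ in the last bits for general inputs.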
I'll try to compare the performance of both implementations later. This is an interesting example because the one here simply relies on the compiler while the other one uses handwritten NEON code.
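One simple way to run such a comparison is a small timing helper that calls each implementation many times and reports the average time per call. The sketch below is only an assumption about how this could be measured; `AverageNanos` is a hypothetical name, and the callable is expected to return a value convertible to float.

```cpp
#include <chrono>
#include <cstddef>

// Returns the average wall-clock time in nanoseconds per call of `fn`,
// measured over `reps` calls. Returns 0.0 when reps is zero.
template <typename Fn>
double AverageNanos(Fn&& fn, std::size_t reps) {
  if (reps == 0) return 0.0;
  using Clock = std::chrono::steady_clock;
  volatile float sink = 0.0f;  // Stops the optimizer from dropping the calls.
  const auto start = Clock::now();
  for (std::size_t r = 0; r < reps; ++r) {
    sink = fn();
  }
  const auto stop = Clock::now();
  (void)sink;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(reps);
}
```

Each implementation would be wrapped in a lambda that runs it on the same input buffers, and the two averages compared on the target ARM device, since numbers from another machine say little about NEON performance.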
@stweil Do you have a result for the comparison?
Neon is automatically detected and used with the latest code, so no special settings should be required. And no, sorry, I don't have a comparison result.
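For context, runtime detection of NEON on Linux can be done by reading the hardware capability bits from the auxiliary vector. The sketch below shows that general approach and is not claimed to be what Tesseract does; `HasNeon` is a hypothetical name, and treating every AArch64 build as NEON-capable is an assumption.

```cpp
#if defined(__linux__) && defined(__arm__)
#include <asm/hwcap.h>  // HWCAP_NEON
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#endif

// Hypothetical helper: reports whether NEON can be used at run time.
bool HasNeon() {
#if defined(__aarch64__)
  // Assumption: Advanced SIMD is treated as always present on AArch64.
  return true;
#elif defined(__linux__) && defined(__arm__)
  // 32-bit ARM Linux: the kernel exposes a NEON bit in AT_HWCAP.
  return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
  // Other platforms: report no NEON so callers use the generic code path.
  return false;
#endif
}
```

A caller would check this once at startup and then pick either the NEON routine or the generic one through a function pointer.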
https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00#for-open-source-contributors