Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce clock syscalls #4303

Merged
merged 2 commits into from
Aug 23, 2024
Merged

Reduce clock syscalls #4303

merged 2 commits into from
Aug 23, 2024

Conversation

heshpdx
Copy link
Contributor

@heshpdx heshpdx commented Aug 22, 2024

I noticed significant calls to clock_gettime so I investigated and found that we could eliminate them when they are not in use. This was found from the continuing scrutiny of SPEC CPUv8 benchmark development. @zdenop

Gate the sampling of the clock by the tessedit_timing_debug flag, which is the only time it gets used anyway. This eliminates unnecessary clock_gettime() system calls.

Gate the sample of the clock by the tessedit_timing_debug flag, which is the only time it gets used anyway. This eliminates unnecessary clock_gettime() system calls.
Copy link
Member

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether it would be better to apply only the 2nd of the two modifications.

Comment on lines 1315 to 1318
clock_t start_t;
if (tessedit_timing_debug) {
start_t = clock();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that the if statement is less expensive than the clock() call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to clock() results in a vdso syscall to the operating system. If you are running tesseract in batch mode with multiple copies running on the same machine, this seemingly innocuous call could result in a storm of useless syscalls to the OS. It is definitely more expensive than a single conditional that all branch predictors would get correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of whether one believes this is slower or faster, the start_t variable has no consumers unless tessedit_timing_debug is enabled.

if (tessedit_timing_debug) {
clock_t ocr_t = clock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the commit is a good change.

@zdenop
Copy link
Contributor

zdenop commented Aug 22, 2024

@stweil: what about switching to c++11 std::chrono::high_resolution_clock::now() instead of std::clock() (see e.g. https://stackoverflow.com/questions/28396014/why-is-clock-considered-bad)?
It should provide higher precision and the "same" result on different platforms comparing to std::clock()

if (tessedit_timing_debug) {
clock_t ocr_t = clock();
tprintf("%s (ocr took %.2f sec)\n", word_data->word->best_choice->unichar_string().c_str(),
static_cast<double>(ocr_t - start_t) / CLOCKS_PER_SEC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bad news is that g++ is not clever enough and produces a warning with this PR:

src/ccmain/control.cpp:1373:39: warning: 'start_t' may be used uninitialized [-Wmaybe-uninitialized]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What g++ version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I will initialize it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But init is exactly wasted cycles again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use [[maybe_unused]] attribute here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What g++ version?

g++ (Debian 12.2.0-14) 12.2.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting to check it on gcc-14

Copy link
Member

@stweil stweil Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apple clang version 15.0.0 (clang-1500.3.9.4) and g++-14 (Homebrew GCC 14.1.0_2) 14.1.0 show the same warning. This is not surprising because the compiler must assume that tessedit_timing_debug might be changed between the two conditional statements. Therefore a local assignment const bool timing_debug = tessedit_timing_debug; helps. It fixes the warning for g++-14, but not for clang 15.

@stweil
Copy link
Member

stweil commented Aug 22, 2024

I wonder whether we should simply remove this timing code (or compile it conditionally). The debug messages are printed per word with a resolution of 10 ms. In a short test I got the results 0.00 sec, 0.01 sec and 0.02 sec. That's not really helpful.

@egorpugin
Copy link
Contributor

Apply for now and postpone to timing/debugging related refactoring.

src/ccmain/control.cpp Outdated Show resolved Hide resolved
@stweil stweil merged commit 3b9d119 into tesseract-ocr:main Aug 23, 2024
7 checks passed
@stweil
Copy link
Member

stweil commented Aug 23, 2024

I have run a short test to see the effect of less calls of clock(). 10 million calls take 2.8 s on my MacBook or 4.5 s on a Linux server. So the effect of this PR is very small. It saves two calls per word or less than a second per one million words.

@heshpdx
Copy link
Contributor Author

heshpdx commented Aug 23, 2024

The effects will be more egregious when running in batch mode, e.g. with 192 instances executing simultaneously on a modern many-core server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants