
Algorithmic Tweaks, parallel stream, memory mapped file #3

Merged
merged 21 commits into gunnarmorling:main
Jan 5, 2024

Conversation

twobiers
Contributor

@twobiers twobiers commented Dec 30, 2023

Thank you for the interesting challenge.
I set myself a limit of finishing this evening so as not to invest too much time, and this is what I came up with: basically just algorithmic improvements on hot code paths, plus a parallel stream.

Current results on my machine (AMD Ryzen 7 PRO 4750G 16 core, 48GB RAM) - Latest Temurin JDK:

# Result (m:s.ms) Implementation
1. 00:26.06 CalculateAverage_twobiers.java
2. 02:57.77 CalculateAverage.java (baseline)

I'm curious what others find.

I thought about caching some parts as it is most likely static data. However, I think this would not be in the spirit of the challenge.
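For illustration, here is a minimal sketch of the parallel-stream shape this kind of submission uses (class and method names are hypothetical, not the PR's actual code): lines of `station;value` are split and aggregated concurrently into per-station statistics.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelStreamSketch {
    // Aggregates "station;value" lines into per-station statistics on a parallel stream.
    static Map<String, DoubleSummaryStatistics> aggregate(Stream<String> lines) {
        return lines.parallel()
                .map(line -> line.split(";", 2))
                .collect(Collectors.groupingByConcurrent(
                        parts -> parts[0],
                        Collectors.summarizingDouble(parts -> Double.parseDouble(parts[1]))));
    }

    public static void main(String[] args) {
        var stats = aggregate(Stream.of("Hamburg;12.0", "Hamburg;8.0", "Bulawayo;8.9"));
        DoubleSummaryStatistics h = stats.get("Hamburg");
        // Prints min/mean/max for one station.
        System.out.printf(Locale.ROOT, "Hamburg=%.1f/%.1f/%.1f%n",
                h.getMin(), h.getAverage(), h.getMax());
    }
}
```

`groupingByConcurrent` lets the parallel stream collect into one shared concurrent map instead of merging per-thread maps, which is usually the cheaper choice for this workload.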

@gunnarmorling
Owner

Hey, wow, that's awesome, thanks a lot for this submission! I'll merge and evaluate it once I've officially launched and announced this challenge (planned for tomorrow).

@twobiers
Contributor Author

twobiers commented Dec 31, 2023

Oh, I saw it in my GitHub feed and thought it was already open for submissions. I actually misread the deadline and assumed the challenge would end today. I'm sorry for the inconvenience.

In that case I will convert the PR to a draft and take a look in January again to find further optimizations.

@twobiers twobiers marked this pull request as draft December 31, 2023 11:22
@gunnarmorling
Owner

LOL, no worries, it's no inconvenience whatsoever. On the contrary, it's very encouraging :)

In that case I will convert the PR to a draft and take a look in January again to find further optimizations.

+1. You'll have time until Jan 31. Note that I'll do one more tweak: asking implementations to also emit the min and max value per station. This is to prevent somebody from cheating by only processing part of the dataset (which should be obvious from looking at the code, but might also be easy to miss).
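The min/max requirement can be met with a small per-station accumulator along these lines (a hypothetical sketch, not the challenge's reference code):

```java
import java.util.Locale;

// Hypothetical per-station accumulator for the tweaked output format:
// every station reports min/mean/max, so skipping part of the input shows up.
public class StationStats {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum;
    long count;

    void add(double v) {
        if (v < min) min = v;
        if (v > max) max = v;
        sum += v;
        count++;
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
    }

    public static void main(String[] args) {
        StationStats s = new StationStats();
        s.add(12.0);
        s.add(8.0);
        s.add(10.0);
        System.out.println(s); // 8.0/10.0/12.0
    }
}
```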

@twobiers twobiers changed the title Add implementation using simple tweaks Algorithmic Tweaks, parallel stream Jan 1, 2024
@twobiers twobiers marked this pull request as ready for review January 2, 2024 20:30
@twobiers
Contributor Author

twobiers commented Jan 2, 2024

I think I'm done for now, lacking more ideas. Might take a look again in 1-2 weeks

@twobiers twobiers changed the title Algorithmic Tweaks, parallel stream Algorithmic Tweaks, parallel stream, memory mapped file Jan 2, 2024
@lobaorn

lobaorn commented Jan 3, 2024

Shamelessly sharing this idea for JVM/GC tuning in another PR/discussion? #15 (comment)

@gunnarmorling
Owner

gunnarmorling commented Jan 5, 2024

Could you please run test.sh twobiers and make sure all the tests pass? Thanks!

@gunnarmorling
Owner

Issue seems to be that you configure Shenandoah GC. Which JDK distro should this be run on?

@twobiers
Contributor Author

twobiers commented Jan 5, 2024

I used the latest Temurin distribution that is available in sdkman.

@gunnarmorling
Owner

Still seeing test failures. Can you also please adjust your launch script to set the right JDK. See @royvanrijn's one as an example. Thanks.

@twobiers
Contributor Author

twobiers commented Jan 5, 2024

@gunnarmorling never change a running system... Tests should pass now.

@gunnarmorling
Owner

51.678sec. Thx for being the first participant to this one!

@gunnarmorling gunnarmorling merged commit d617039 into gunnarmorling:main Jan 5, 2024
vemana added a commit to vemana/1brc that referenced this pull request Jan 17, 2024
…me further by

10%. As the JVM exits with the exit(0) syscall, the kernel reclaims the
memory mappings via munmap() calls. Prior to this change, all the munmap()
calls were happening right at the end as the JVM exited. This led to
serial execution of about 350 ms out of 2500 ms right at the end after
each shard completed its work. We can parallelize it by exposing the
Cleaner from MappedByteBuffer and then ensuring truly parallel
execution of munmap() by using a non-blocking lock (SeqLock). The
optimal strategy for when each thread must call munmap() is an interesting
math problem with an exact solution, and this code roughly reflects it.
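As background for the "exposing the Cleaner" point above: eagerly unmapping a MappedByteBuffer before JVM exit is commonly done through the JDK-internal sun.misc.Unsafe.invokeCleaner (JDK 9+). The sketch below shows that mechanism only; it is an assumption-laden illustration, not the commit's actual code, which also coordinates threads with a SeqLock.

```java
import java.lang.reflect.Field;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class EagerUnmap {
    // Reflectively reach sun.misc.Unsafe.invokeCleaner (JDK 9+) to release a
    // mapping immediately, rather than waiting for munmap() at JVM exit.
    static void unmap(MappedByteBuffer buffer) throws ReflectiveOperationException {
        Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
        Field f = unsafeClass.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Object unsafe = f.get(null);
        unsafeClass.getMethod("invokeCleaner", java.nio.ByteBuffer.class)
                .invoke(unsafe, buffer);
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("unmap-demo", ".bin");
        Files.write(tmp, new byte[]{1, 2, 3});
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, 3);
            System.out.println(mapped.get(0)); // 1
            unmap(mapped); // mapping released here; 'mapped' must not be touched afterwards
        }
        Files.delete(tmp);
    }
}
```

Touching the buffer after invokeCleaner is undefined behavior (the pages are gone), which is why each thread may only unmap a buffer it has fully finished with.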

Commit gunnarmorling#3: Tried out reading a long at a time from the ByteBuffer and
checking for the presence of ';'; it was slower compared to just reading int().
Removed the code for reading longs; just retaining the
hasSemicolonByte(..) check code.

Commit gunnarmorling#2: Introduce processLineSlow() and processRangeSlow() for the
tail part.

Commit gunnarmorling#1: Create a separate tail piece of work for the last few lines to be
processed separately from the main loop. This allows the main loop to
read past its allocated range (by a 'long') if we reserve at least 8 bytes
for the tail piece of work.
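A hasSemicolonByte(..) check like the one mentioned above is commonly implemented with the classic SWAR zero-byte test: XOR the word with ';' replicated into every byte, then detect whether any byte became zero. This sketch mirrors the well-known technique, not the commit's exact code.

```java
public class SemicolonScan {
    private static final long SEMIS = 0x3B3B3B3B3B3B3B3BL; // ';' (0x3B) in every byte

    // Classic SWAR "has zero byte" test after XOR-ing with the pattern:
    // returns true iff some byte of 'word' equals ';'.
    static boolean hasSemicolonByte(long word) {
        long x = word ^ SEMIS; // bytes equal to ';' become 0x00
        return ((x - 0x0101010101010101L) & ~x & 0x8080808080808080L) != 0;
    }

    public static void main(String[] args) {
        long noSemi = 0x6162636465666768L;   // bytes of "abcdefgh"
        long withSemi = 0x6162633B65666768L; // bytes of "abc;efgh"
        System.out.println(hasSemicolonByte(noSemi) + " " + hasSemicolonByte(withSemi));
    }
}
```

This tests 8 input bytes per iteration with a handful of ALU operations, which is why reading a long at a time is attractive even when, as the commit found, the surrounding code can still end up slower overall.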
gunnarmorling pushed a commit that referenced this pull request Jan 17, 2024
…m 16th based on local testing; no Unsafe; no bitwise tricks yet (#465)

* Squashing a bunch of commits together.

Commit#2; Uplift of 7% using native byteorder from ByteBuffer.
Commit#1: Minor changes to formatting.

* Commit #4: Parallelize munmap() and reduce completion time further by
10%. As the JVM exits with the exit(0) syscall, the kernel reclaims the
memory mappings via munmap() calls. Prior to this change, all the munmap()
calls were happening right at the end as the JVM exited. This led to
serial execution of about 350 ms out of 2500 ms right at the end after
each shard completed its work. We can parallelize it by exposing the
Cleaner from MappedByteBuffer and then ensuring truly parallel
execution of munmap() by using a non-blocking lock (SeqLock). The
optimal strategy for when each thread must call munmap() is an interesting
math problem with an exact solution, and this code roughly reflects it.

Commit #3: Tried out reading a long at a time from the ByteBuffer and
checking for the presence of ';'; it was slower compared to just reading int().
Removed the code for reading longs; just retaining the
hasSemicolonByte(..) check code.

Commit #2: Introduce processLineSlow() and processRangeSlow() for the
tail part.

Commit #1: Create a separate tail piece of work for the last few lines to be
processed separately from the main loop. This allows the main loop to
read past its allocated range (by a 'long') if we reserve at least 8 bytes
for the tail piece of work.
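The "native byteorder from ByteBuffer" uplift mentioned in this squash is a one-line configuration: ByteBuffer defaults to big-endian, so on little-endian hardware every multi-byte read pays for a byte swap unless the buffer's order is set to the CPU's. A minimal illustration (not the commit's code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NativeOrderDemo {
    public static void main(String[] args) {
        // ByteBuffer defaults to BIG_ENDIAN; matching the CPU's native order
        // lets getLong()/getInt() skip byte swapping on little-endian hardware.
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder());
        buf.putLong(0, 0x0102030405060708L);
        System.out.println(buf.getLong(0) == 0x0102030405060708L); // true
    }
}
```

Values written and read through the same buffer round-trip identically either way; the byte order only matters for raw-byte tricks (like the semicolon scan) and for performance.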
jincongho added a commit to jincongho/1brc-jho that referenced this pull request Jan 18, 2024
gunnarmorling pushed a commit that referenced this pull request Jan 19, 2024
gunnarmorling pushed a commit that referenced this pull request Jan 28, 2024
* Latest snapshot (#1)

preparing initial version

* Improved performance to 20 seconds (-9 seconds from the previous version) (#2)

improved performance a bit

* Improved performance to 14 seconds (-6 seconds) (#3)

improved performance to 14 seconds

* sync branches (#4)

* initial commit

* some refactoring of methods

* some fixes for partitioning

* some fixes for partitioning

* fixed hacky getcode for utf8 bytes

* simplified getcode for partitioning

* temp solution with syncing

* temp solution with syncing

* new stream processing

* new stream processing

* some improvements

* cleaned stuff

* run configuration

* round buffer for the stream to pages

* not using compute since it's slower than straightforward get/put. using own byte array equals.

* using parallel gc

* avoid copying bytes when creating a station object

* formatting

* Copy less arrays. Improved performance to 12.7 seconds (-2 seconds) (#5)

* initial commit

* some refactoring of methods

* some fixes for partitioning

* some fixes for partitioning

* fixed hacky getcode for utf8 bytes

* simplified getcode for partitioning

* temp solution with syncing

* temp solution with syncing

* new stream processing

* new stream processing

* some improvements

* cleaned stuff

* run configuration

* round buffer for the stream to pages

* not using compute since it's slower than straightforward get/put. using own byte array equals.

* using parallel gc

* avoid copying bytes when creating a station object

* formatting

* some tuning to increase performance

* some tuning to increase performance

* avoid copying data; fast hashCode with slightly more collisions

* avoid copying data; fast hashCode with slightly more collisions

* cleanup (#6)

* tidy up
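The "own byte array equals" bullets above refer to keying a hash table by raw station bytes instead of Strings, with a hand-rolled hash and comparison. A sketch of that shape with linear probing (all names hypothetical; the commit's table differs in detail):

```java
import java.util.Arrays;

public class ByteKeyTable {
    // Minimal open-addressing table keyed by byte[] station names, using a
    // hand-rolled hash and Arrays.equals instead of String wrapping (sketch only;
    // no resizing, so it must stay well below 1024 distinct keys).
    private final byte[][] keys = new byte[1024][];
    private final int[] counts = new int[1024];

    static int hash(byte[] b) {
        int h = 1;
        for (byte v : b) h = 31 * h + v;
        return h;
    }

    private int slot(byte[] key) {
        int idx = (hash(key) & 0x7fffffff) % keys.length;
        while (keys[idx] != null && !Arrays.equals(keys[idx], key)) {
            idx = (idx + 1) % keys.length; // linear probing on collision
        }
        return idx;
    }

    void increment(byte[] key) {
        int idx = slot(key);
        if (keys[idx] == null) keys[idx] = key;
        counts[idx]++;
    }

    int count(byte[] key) {
        int idx = slot(key);
        return keys[idx] == null ? 0 : counts[idx];
    }

    public static void main(String[] args) {
        var t = new ByteKeyTable();
        t.increment("Hamburg".getBytes());
        t.increment("Hamburg".getBytes());
        System.out.println(t.count("Hamburg".getBytes())); // 2
    }
}
```

This also reflects the "get/put over compute" bullet: the probe does a plain lookup and a plain store, avoiding the per-call lambda that Map.compute would allocate on the hot path.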