
Sort weather stations #2

Merged (1 commit, Dec 29, 2023)

@nipafx (Contributor) commented Dec 29, 2023

I sorted the weather stations but LibreOffice Calc didn't quite hit the Java String sort order, so some small fixes remain (e.g. "Abéché" shouldn't be 1st but 3rd) - I'll leave those to others. 😁
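For context, `String.compareTo` (Java's natural `String` order) compares UTF-16 code units one by one instead of applying locale-aware collation the way a spreadsheet does. That means 'é' (U+00E9) sorts after every unaccented ASCII letter. Here is a minimal sketch that uses an assumed four-name sample, not the actual station file; `StationSortDemo` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: natural String ordering compares UTF-16 code units, not locale-aware
// collation, so e-acute (U+00E9 = 233) sorts after 'h' (104) and 'i' (105).
final class StationSortDemo {

    /** Returns a copy of the names sorted by String's natural ordering. */
    static List<String> sortNatural(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted); // uses String.compareTo
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the accented letters are written as escapes
        // so the source file stays ASCII-only.
        List<String> names = List.of("Ab\u00e9ch\u00e9", "Abha", "Abidjan", "Accra");
        List<String> sorted = sortNatural(names);
        // The accented name lands at index 2, i.e. third, after Abha and Abidjan.
        System.out.println(sorted.indexOf("Ab\u00e9ch\u00e9")); // prints 2
    }
}
```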

@gunnarmorling merged commit b3d23eb into gunnarmorling:main on Dec 29, 2023 (1 check passed)
@gunnarmorling (Owner)

Excellent, thanks!

gunnarmorling pushed a commit that referenced this pull request Jan 13, 2024
* Use GraalVM

* Use dynamic cores computer

* Use stream API to clean up code

* Use max processors

* Use a hash to avoid initializing a String

* Optimize ConcurrentMap initialization

* Smaller hash size

* Avoid checking ConcurrentMap

* Optimize data type

* String deduplication

* Faster write

* Change base

* Remove time

* Use multiplication instead of division
dmitry-midokura pushed a commit to dmitry-midokura/1brc that referenced this pull request Jan 13, 2024
gunnarmorling#356)
jincongho added a commit to jincongho/1brc-jho that referenced this pull request Jan 15, 2024
gunnarmorling pushed a commit that referenced this pull request Jan 15, 2024
vemana added a commit to vemana/1brc that referenced this pull request Jan 17, 2024
gunnarmorling pushed a commit that referenced this pull request Jan 17, 2024
…m 16th based on local testing; no Unsafe; no bitwise tricks yet (#465)

* Squashing a bunch of commits together.

Commit #2: 7% uplift from using the native byte order in ByteBuffer.
Commit #1: Minor formatting changes.

* Commit #4: Parallelize munmap() and reduce completion time by a further
10%. When the JVM exits with the exit(0) syscall, the kernel reclaims the
memory mappings via munmap(). Prior to this change, all the unmap()
calls happened right at the end as the JVM exited. This led to about
350 ms of serial execution (out of 2500 ms) at the very end, after
every shard had completed its work. We can parallelize this by exposing the
Cleaner from MappedByteBuffer and then ensuring truly parallel
execution of munmap() by using a non-blocking lock (SeqLock). The
optimal strategy for when each thread should call unmap() is an interesting math problem with an exact solution, and this code roughly reflects it.
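The non-blocking part can be sketched with the JDK's `java.util.concurrent.locks.StampedLock`, a sequence-lock style lock. `tryWriteLock()` returns 0 immediately instead of waiting, and `tryOptimisticRead()` with `validate()` provides seqlock-style reads. This is an assumption about the general technique, not the PR's SeqLock class. `ReleaseGate` and its release action are hypothetical. Actually invoking a `MappedByteBuffer`'s cleaner is left out because that requires JDK-internal APIs.

```java
import java.util.concurrent.locks.StampedLock;

// Sketch of a non-blocking "release gate" on top of StampedLock.
// ReleaseGate and the release action are hypothetical stand-ins.
final class ReleaseGate {
    private final StampedLock lock = new StampedLock();
    private int releasedCount; // written only while holding the write lock

    /**
     * Runs releaseAction only if no other thread is currently releasing.
     * tryWriteLock() returns 0 instead of blocking, so a thread that loses
     * the race can go back to its own work and try again later.
     */
    boolean tryRelease(Runnable releaseAction) {
        long stamp = lock.tryWriteLock();
        if (stamp == 0L) {
            return false; // someone else holds the lock; do not wait
        }
        try {
            releaseAction.run(); // e.g. unmapping one region (not shown here)
            releasedCount++;
            return true;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    /** Optimistic, seqlock-style read; falls back to a read lock if a write overlapped. */
    int releasedCount() {
        long stamp = lock.tryOptimisticRead();
        int count = releasedCount;
        if (!lock.validate(stamp)) {
            stamp = lock.readLock();
            try {
                count = releasedCount;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return count;
    }
}
```

StampedLock is not reentrant. A second `tryRelease` attempt made while the lock is held, even from the same thread, returns `false` rather than blocking.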

Commit #3: Tried reading a long at a time from the ByteBuffer and
checking for the presence of ';', but it was slower than just reading an int.
Removed the code for reading longs; retained only the
hasSemicolonByte(..) check code.
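As an illustration, a `hasSemicolonByte(..)` check over a 4-byte `int` can be written with the well-known SWAR "has zero byte" test. The commit does not show its implementation, and the PR title says "no bitwise tricks yet", so treat this as a swapped-in technique rather than the author's code. `SemicolonScan` and `hasSemicolonAt` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// One common way to write a hasSemicolonByte(..) check: the SWAR "has zero byte"
// test. Swapped in for illustration; not the code from the PR.
final class SemicolonScan {
    private static final int SEMICOLONS = 0x3B3B3B3B; // ';' (0x3B) in every byte
    private static final int ONES = 0x01010101;
    private static final int HIGHS = 0x80808080;

    /** True iff at least one of the 4 bytes packed into word equals ';'. */
    static boolean hasSemicolonByte(int word) {
        int x = word ^ SEMICOLONS; // bytes equal to ';' become 0x00
        // Subtracting 1 from each byte borrows only out of a 0x00 byte, and ~x masks
        // off bytes whose high bit was already set, so this is non-zero iff x has a zero byte.
        return ((x - ONES) & ~x & HIGHS) != 0;
    }

    /** Reads 4 bytes at pos in native order; byte order does not affect a yes/no answer. */
    static boolean hasSemicolonAt(ByteBuffer buf, int pos) {
        return hasSemicolonByte(buf.order(ByteOrder.nativeOrder()).getInt(pos));
    }
}
```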

Commit #2: Introduce processLineSlow() and processRangeSlow() for the
tail part.

Commit #1: Create a separate tail piece of work so that the last few lines are
processed separately from the main loop. This allows the main loop to
read past its allocated range (by a 'long', if we reserve at least 8 bytes
for the tail piece of work).
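Here is one way the tail boundary could be chosen. The sketch assumes that lines end in `'\n'`, that the range starts at a line start, and that `TailSplit.tailStart` is a hypothetical helper. The tail starts at or before `end - 8`, so every position inside a main-loop line has at least 8 bytes after it within the range. A `long` read therefore never runs off the end. The remaining lines would go to slow paths such as the processLineSlow()/processRangeSlow() from Commit #2.

```java
// Sketch with a hypothetical helper: pick the line boundary where the tail work
// begins, leaving at least 8 bytes (one long) after every line the fast loop handles.
final class TailSplit {
    static final int RESERVE = Long.BYTES; // 8 bytes

    /**
     * Returns the start of the tail: a line start in [start, end - RESERVE].
     * Lines starting before it go to the fast loop, which may read a full long
     * at any position inside those lines; the rest go to a bounds-checked loop.
     * Assumes every line ends with '\n' and that start is itself a line start.
     */
    static int tailStart(byte[] buf, int start, int end) {
        int limit = end - RESERVE;
        if (limit <= start) {
            return start; // range too small for the fast loop: all of it is tail
        }
        int i = limit;
        while (i > start && buf[i - 1] != '\n') {
            i--; // walk back to the start of the line containing limit
        }
        return i;
    }
}
```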
gunnarmorling pushed a commit that referenced this pull request Jan 28, 2024
* Latest snapshot (#1)

preparing initial version

* Improved performance to 20 seconds (-9 seconds from the previous version) (#2)

improved performance a bit

* Improved performance to 14 seconds (-6 seconds) (#3)

improved performance to 14 seconds

* sync branches (#4)

* initial commit

* some refactoring of methods

* some fixes for partitioning

* some fixes for partitioning

* fixed hacky getcode for utf8 bytes

* simplified getcode for partitioning

* temp solution with syncing

* temp solution with syncing

* new stream processing

* new stream processing

* some improvements

* cleaned stuff

* run configuration

* round buffer for the stream to pages

* not using compute since it's slower than straightforward get/put. using own byte array equals.

* using parallel gc

* avoid copying bytes when creating a station object

* formatting

* Copy fewer arrays. Improved performance to 12.7 seconds (-2 seconds) (#5)

* initial commit

* some refactoring of methods

* some fixes for partitioning

* some fixes for partitioning

* fixed hacky getcode for utf8 bytes

* simplified getcode for partitioning

* temp solution with syncing

* temp solution with syncing

* new stream processing

* new stream processing

* some improvements

* cleaned stuff

* run configuration

* round buffer for the stream to pages

* not using compute since it's slower than straightforward get/put. using own byte array equals.

* using parallel gc

* avoid copying bytes when creating a station object

* formatting

* some tuning to increase performance

* some tuning to increase performance

* avoid copying data; fast hashCode with slightly more collisions

* avoid copying data; fast hashCode with slightly more collisions

* cleanup (#6)

* tidy up
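The "using own byte array equals", "avoid copying data" and "fast hashCode with slightly more collisions" items can be illustrated with a minimal sketch, assuming a `HashMap` keyed by a byte-slice view. `ByteSliceKey` and `StationStats` are hypothetical choices, not the commits' actual code. So are the 4-byte hash and the `{min, max, sum, count}` layout. The sketch needs Java 16+ for `instanceof` pattern matching.

```java
import java.util.Arrays;
import java.util.Map;

// Sketch with hypothetical names: a key that views a slice of a reused byte[]
// without copying, with a cheap hashCode and a byte-wise equals.
final class ByteSliceKey {
    final byte[] data;
    final int offset;
    final int length;
    private final int hash;

    ByteSliceKey(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
        this.hash = fastHash(data, offset, length);
    }

    // Looks only at the length and at most the first 4 bytes: less work per line,
    // at the cost of more collisions for names sharing a prefix and a length.
    static int fastHash(byte[] d, int off, int len) {
        int h = len;
        for (int i = 0; i < Math.min(len, 4); i++) {
            h = 31 * h + d[off + i];
        }
        return h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteSliceKey k
                && k.length == length
                && k.hash == hash
                && Arrays.equals(data, offset, offset + length, k.data, k.offset, k.offset + k.length);
    }
}

final class StationStats {
    /** Value layout (an assumption): {min, max, sum, count}, temperatures in tenths. */
    static void record(Map<ByteSliceKey, long[]> stats, byte[] buf, int off, int len, int tenths) {
        long[] s = stats.get(new ByteSliceKey(buf, off, len)); // probe key: no copy
        if (s == null) {
            // First sighting only: copy the name so the stored key does not alias buf.
            ByteSliceKey owned = new ByteSliceKey(Arrays.copyOfRange(buf, off, off + len), 0, len);
            s = new long[] {Long.MAX_VALUE, Long.MIN_VALUE, 0, 0};
            stats.put(owned, s); // plain get/put instead of Map.compute
        }
        s[0] = Math.min(s[0], tenths);
        s[1] = Math.max(s[1], tenths);
        s[2] += tenths;
        s[3]++;
    }
}
```

Lookups build only a lightweight probe key over the existing buffer. Bytes are copied once per distinct station, when it is first inserted.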