Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added to stable comparison #1

Open
buybackoff opened this issue Jan 8, 2024 · 15 comments
Open

Added to stable comparison #1

buybackoff opened this issue Jan 8, 2024 · 15 comments

Comments

@buybackoff
Copy link

So far the top result overall. Great job!

https://github.com/buybackoff/1brc?tab=readme-ov-file#native

image

@lehuyduc
Copy link
Owner

lehuyduc commented Jan 8, 2024

Thanks! Can you run again but with N_THREADS set to number of threads (cores * 2) on the test machine? Running 128 threads on a 12 thread PC is slower than running just 12

@lehuyduc
Copy link
Owner

lehuyduc commented Jan 9, 2024

I've updated the repo so that run_cpp.sh automatically finds number of threads to use, and compile with that number as constexpr. Could you pull and run it again? Thanks!

@AlexanderYastrebov
Copy link

AlexanderYastrebov commented Jan 9, 2024

On my machine I get

$ chmod +x run_cpp.sh
$ ./run_cpp.sh 
Using 8 threads
init mmap file cost = 0.041405ms
Parallel process file cost = 18528.8ms
Aggregate stats cost = 1.22172ms
Output stats cost = 1.90048ms
Runtime inside main = 18532ms
Time to munmap = 263.052ms

real    0m18,809s
user    0m30,831s
sys     0m7,561s
867e3fb8c93a52eddb94bd99b8b87c1f355bd225a58249eba97ce4eb87eb58cc  result.txt

and for my go version https://github.com/AlexanderYastrebov/1brc/tree/go-implementation/src/main/go

$ (cd src/main/go/ && go build -o /tmp/1brc-go && time /tmp/1brc-go ../../../measurements-109.txt) | sha256sum 

real    0m20,917s
user    0m45,232s
sys     0m7,821s
867e3fb8c93a52eddb94bd99b8b87c1f355bd225a58249eba97ce4eb87eb58cc  -

@buybackoff would be possible to add a record for go implementation in your chart?

I proposed in gunnarmorling/1brc#253 to support Docker builds but this is out of scope for now as per gunnarmorling/1brc#182 (comment)

@lehuyduc
Copy link
Owner

lehuyduc commented Jan 9, 2024

That looks unusually slow o.O I get 13.5-14.5 with 1 thread on 2 separate test PC.

I think PC specs should be included any time a benchmark number is posted. Also any related inputs

@AlexanderYastrebov
Copy link

AlexanderYastrebov commented Jan 9, 2024

@lehuyduc Sure, that's why I would like to know how it stands compared to your version, C# and best java.
PS: I posted my specs in the gunnarmorling/1brc#67

@buybackoff
Copy link
Author

@buybackoff would be possible to add a record for go implementation in your chart?

@AlexanderYastrebov Will do eventually, not today/tomorrow.

@lehuyduc I'm reading your 1brc_final_valid.cpp and this is cool stuff. I would say [Samuel L. Jackson from Pulp Fiction]-level cool. And it's quite simple nevertheless, but the some tricks/hacks are beautiful. Very educating!

@lehuyduc
Copy link
Owner

@lehuyduc I'm reading your 1brc_final_valid.cpp and this is cool stuff. I would say [Samuel L. Jackson from Pulp Fiction]-level cool. And it's quite simple nevertheless, but the some tricks/hacks are beautiful. Very educating!

@buybackoff I'm not done yet, still has some tiny tricks left :D Also the latest version is ~28% faster compared to the last one you tested

@buybackoff
Copy link
Author

@AlexanderYastrebov sorry but it's so inconvenient. Not only it's in Go and I need to download tools, it's inside this huge Java fork. If you think it's a real contender for top spots then please create a separate repo and instructions to build. Also the file path should be the first argument to an executable.

@AlexanderYastrebov
Copy link

AlexanderYastrebov commented Jan 12, 2024

@buybackoff Hi, no worries. I hope maybe I can get it into the main repo gunnarmorling/1brc#298 to compare against java

@tivrfoa
Copy link

tivrfoa commented Jan 13, 2024

@AlexanderYastrebov sorry but it's so inconvenient. Not only it's in Go and I need to download tools, it's inside this huge Java fork. If you think it's a real contender for top spots then please create a separate repo and instructions to build. Also the file path should be the first argument to an executable.

You just need to download this file:
https://github.com/gunnarmorling/1brc/blob/2e2699216bf21f0bf3639595724aeac540a9b555/src/main/go/AlexanderYastrebov/calc.go

go build calc.go (ridiculous fast! go compile time is amazing =))

./calc measurements.txt

I think this go version would be performing pretty well in this rank:
https://hotforknowledge.com/2024/01/13/7-1brc-in-dotnet-even-faster-than-java-cpp/

It would be really nice if @buybackoff could run your solution.

@buybackoff
Copy link
Author

@lehuyduc Hi, it looks like your yesterday version is much faster on 10K, but much slower on the default data. Is that expected?

run_cpp.sh expects main.cpp, but I used 1brc_final_valid.cpp as before.

@lehuyduc
Copy link
Owner

Hmm, I think this might be due to CPU difference. I tested the latest code on AMD 7502P directly instead of AMD 2950X like before. Both default + 10K results are faster on 7502P.

What results do you get when running the latest version?

@buybackoff
Copy link
Author

The default goes from 1.649 to 1.792. Tried both 6/12 compilation.

10K goes from 3.081 to 2.863

Running only on P-cores of i5-13500.

@lehuyduc
Copy link
Owner

Hmm, I really don't know what cause this. It might even be due to Intel vs AMD difference. I tested on 2950X again, and it didn't cause any slowdown (but no speedup because the CPU is quite old).

I'll try looking into it later. But lower 10K time means we can still keep it :D

@lehuyduc
Copy link
Owner

Oh, I found the reason. It's a typo:

  size_t end_idx0 = idx1 - 1;
  size_t end_idx1 = to_byte;
  size_t end_idx0_pre = end_idx0 - 2 * MAX_KEY_LENGTH;
  size_t end_idx1_pre = end_idx0 - 2 * MAX_KEY_LENGTH; => should be end_idx1

  handle_line_packed(data + idx0, data + end_idx0_pre, hmaps[tid], idx0);
  handle_line_packed(data + idx1, data + end_idx1_pre, hmaps[tid], idx1);  

I'll test a few more ideas and push again later this weekend. Thanks for helping me notice the problem!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants