Debug musl build #263

Draft
wants to merge 10 commits into
base: main

iamazeem
Collaborator

@iamazeem iamazeem commented Nov 6, 2024

Signed-off-by: Azeem Sajid [email protected]

@iamazeem iamazeem self-assigned this Nov 6, 2024
@iamazeem
Collaborator Author

iamazeem commented Nov 6, 2024

@iamazeem iamazeem marked this pull request as draft November 6, 2024 11:20
@liquidaty
Owner

Thank you. I wasn't able to find any significant differences in the config.mk, though there were a few that probably don't matter. I'll have to look more closely later. In the meantime, not sure if this is helpful, but here's the final compile+link command from my run, which produces a binary about 2.5 MB in size. Maybe it's related to the gcc version? Mine is x86_64-linux-musl-gcc (GCC) 9.4.0.

In any case, if we can't figure this out it's not a big deal for now. The musl build is useful for some cases, but will be very significantly limited by its inability to use dlopen which therefore will make extensions unavailable.

x86_64-linux-musl-gcc -pipe -ffunction-sections -fdata-sections -fsigned-char  -fpic -std=gnu11 -Wunused -O3 -Wshadow -Wall -Wextra -DNDEBUG -Wno-gnu-statement-expression -Wno-missing-braces -pedantic -DSTDC_HEADERS -D_GNU_SOURCE   -I/opt/musl-linux/include   -DZSV_EXTRAS -DZSVSHEET_BUILD -DHAVE_NCURSESW -DNCURSES_STATIC -DUSE_JQ -DHAVE_MEMMEM -DHAVE___BUILTIN_EXPECT -DHAVE___BUILTIN_EXPECT_WITH_PROBABILITY -I/src/zsv/app/external/yajl_helper -I/src/zsv/app/external/yajl/build/yajl-2.1.1/include -I/src/zsv/app/external/utf8proc-2.6.1 -DUTF8PROC -DUTF8PROC_STATIC  -I/src/zsv/include -o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/bin/cli /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_cli.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_echo.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_select.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_desc.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_count.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_paste.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_2tsv.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_pretty.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_sql.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_flatten.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_2json.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_serialize.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_stack.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_2db.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_compare.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_prop.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_rm.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_mv.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_jq.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_overwrite.o 
/src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/cli_sheet.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/writer.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/file.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/err.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/signal.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/mem.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/clock.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/arg.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/dl.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/string.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/dirs.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/prop.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/cache.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/jq.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/os.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/overwrite.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_alloc.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_buf.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_encode.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_gen.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_lex.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_parser.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_tree.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl/yajl_version.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/yajl_helper/yajl_helper.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/json.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/sqlite3/sqlite3_and_csv_vtab.o 
/src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/utf8proc-2.6.1/utf8proc.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/inih/inih.o -L/opt/musl-linux/lib -lzsv  -lpthread  -lncursesw -fwhole-program -ldl  /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc-external/json_writer-1.0/jsonwriter.o /src/zsv/build/Darwin/rel/x86_64-linux-musl-gcc/objs/utils/db.o -I/src/zsv/app/external/yajl/build/yajl-2.1.1/include -I/src/zsv/app/external/yajl_helper -I/opt/musl-linux/include -I/src/zsv/app/external/sqlite3 -I/src/zsv/app/external/sglib  -ljq -lm -pthread  -static

@iamazeem
Collaborator Author

iamazeem commented Nov 7, 2024

@liquidaty:
I'm verifying the differences between compiler versions, i.e. GCC 13.2.1 vs 9.4.0.
Please share the zsv binary you built with GCC 9.4.0 here.
I need to run benchmarks on it. Thanks!

@iamazeem
Collaborator Author

iamazeem commented Nov 7, 2024

@liquidaty

> In any case, if we can't figure this out it's not a big deal for now. The musl build is useful for some cases, but will be very significantly limited by its inability to use dlopen which therefore will make extensions unavailable.

Regarding future directions and road map, do you think we should keep the musl build?
How do you see it?

@iamazeem
Collaborator Author

iamazeem commented Nov 7, 2024

@liquidaty: Ran benchmarks with GCC 9.3.0 (locally via Docker and CI) and 9.4.0 (locally via musl toolchain).

Here are the select command results only (ran multiple times, pasting only one instance here for brevity):

GCC 9.3.0

1 | zsv : real 0.21 user 0.20 sys 0.00
2 | zsv : real 0.21 user 0.20 sys 0.00
3 | zsv : real 0.21 user 0.21 sys 0.00
4 | zsv : real 0.21 user 0.21 sys 0.00
5 | zsv : real 0.21 user 0.21 sys 0.00
1 | xsv : real 0.31 user 0.31 sys 0.00
2 | xsv : real 0.31 user 0.31 sys 0.00
3 | xsv : real 0.31 user 0.31 sys 0.00
4 | xsv : real 0.32 user 0.31 sys 0.00
5 | xsv : real 0.31 user 0.31 sys 0.00
1 | tsv : real 0.15 user 0.13 sys 0.01
2 | tsv : real 0.15 user 0.14 sys 0.00
3 | tsv : real 0.15 user 0.14 sys 0.01
4 | tsv : real 0.15 user 0.14 sys 0.01
5 | tsv : real 0.15 user 0.14 sys 0.00

GCC 9.4.0

1 | zsv : real 0.31 user 0.30 sys 0.00
2 | zsv : real 0.30 user 0.29 sys 0.01
3 | zsv : real 0.30 user 0.30 sys 0.00
4 | zsv : real 0.30 user 0.29 sys 0.00
5 | zsv : real 0.29 user 0.29 sys 0.00
1 | xsv : real 0.43 user 0.42 sys 0.00
2 | xsv : real 0.42 user 0.40 sys 0.01
3 | xsv : real 0.41 user 0.41 sys 0.00
4 | xsv : real 0.41 user 0.40 sys 0.00
5 | xsv : real 0.40 user 0.40 sys 0.00
1 | tsv : real 0.12 user 0.12 sys 0.00
2 | tsv : real 0.12 user 0.11 sys 0.00
3 | tsv : real 0.12 user 0.11 sys 0.00
4 | tsv : real 0.12 user 0.11 sys 0.00
5 | tsv : real 0.12 user 0.11 sys 0.00

The musl build is clearly slower than the gcc and clang builds, both locally and in the CI.

As for the size, libncursesw.a in the alpine container is smaller than the one built with the musl toolchain (sizes in bytes):

## alpine:latest container
760782 /usr/lib/libncursesw.a

## built with musl toolchain
808316 ncurses-6.5/lib/libncursesw.a

The compiler must be adding the rest of the size.
Stripping zsv will reduce that.


Apart from that, you might want to run benchmarks on your side on a Linux box.
Just need to create zsv-0.3.9-alpha-amd64-linux-musl.tar.gz from your locally built zsv's amd64-linux-musl directory and copy it to your Linux box.

The sequence of commands will look like this (need to update env vars and paths in case of musl toolchain):

PREFIX=amd64-linux-musl \
CC=musl-gcc \
MAKE=make \
ARTIFACT_DIR=artifacts \
STATIC_BUILD=1 \
CFLAGS="-I/path/to/ncurses-6.5/include" \
LDFLAGS="-L/path/to/ncurses-6.5/lib" \
RUN_TESTS=false \
SKIP_ZIP_ARCHIVE=true \
SKIP_TAR_ARCHIVE=true \
./scripts/ci-build.sh

tar -czvf "zsv-0.3.9-alpha-amd64-linux-musl.tar.gz" "amd64-linux-musl"

git clone https://github.com/liquidaty/zsv.git
cd zsv
mkdir -p .benchmarks
cp /path/to/zsv-0.3.9-alpha-amd64-linux-musl.tar.gz .benchmarks
ZSV_LINUX_BUILD_COMPILER=musl ./scripts/ci-run-benchmarks.sh

@iamazeem
Collaborator Author

iamazeem commented Nov 9, 2024

@liquidaty: UPDATE
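Did some digging into the gcc, clang and musl Linux binaries with perf for the zsv select benchmark.
The initial analysis shows that the gcc and clang builds copy through glibc's optimized __memmove_evex_unaligned_erms (4.57-6.13% of cycles), whereas the statically linked musl build spends 56.56% of cycles in musl's own memcpy.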

Command:

zsv select -W -n -- 2 1 3-7 < worldcitiespop_mil.csv >/dev/null

Here are the perf reports:

gcc

# Samples: 345  of event 'cycles'
# Event count (approx.): 416029728
#
# Overhead  Command  Shared Object     Symbol                           
# ........  .......  ................  .................................
#
    35.86%  zsv      zsv               [.] zsv_scan
    25.48%  zsv      zsv               [.] zsv_writer_cell
    14.77%  zsv      zsv               [.] zsv_select_data_row
     6.95%  zsv      zsv               [.] zsv_get_cell
     6.13%  zsv      libc.so.6         [.] __memmove_evex_unaligned_erms
     4.20%  zsv      [unknown]         [k] 0xffffffffa1cb852e
     4.06%  zsv      zsv               [.] zsv_get_cell_1
     0.82%  zsv      [unknown]         [k] 0xffffffffa197a79d
     0.28%  zsv      zsv               [.] zsv_cell_count
     0.28%  zsv      [unknown]         [k] 0xffffffffa18c02f9
     0.28%  zsv      [unknown]         [k] 0xffffffffa23d4c15
     0.28%  zsv      libc.so.6         [.] _IO_file_xsgetn
     0.28%  zsv      zsv               [.] 0x00000000000118a4
     0.28%  zsv      [unknown]         [k] 0xffffffffa1980c60
     0.07%  zsv      [unknown]         [k] 0xffffffffa16d952b
     0.00%  perf-ex  [unknown]         [k] 0xffffffffa164fa46
     0.00%  perf-ex  [unknown]         [k] 0xffffffffa169cea4

clang

# Samples: 349  of event 'cycles'
# Event count (approx.): 420920770
#
# Overhead  Command  Shared Object     Symbol                           
# ........  .......  ................  .................................
#
    40.02%  zsv      zsv               [.] zsv_scan
    26.12%  zsv      zsv               [.] zsv_writer_cell
    13.29%  zsv      zsv               [.] zsv_select_data_row
     4.57%  zsv      libc.so.6         [.] __memmove_evex_unaligned_erms
     4.20%  zsv      zsv               [.] zsv_get_cell
     3.71%  zsv      [unknown]         [k] 0xffffffffa1cb852e
     3.01%  zsv      zsv               [.] zsv_get_cell_1
     1.92%  zsv      zsv               [.] memcpy@plt
     1.10%  zsv      [unknown]         [k] 0xffffffffa18bf960
     0.84%  zsv      zsv               [.] zsv_cell_count
     0.31%  zsv      zsv               [.] zsv_csv_quote
     0.28%  zsv      [unknown]         [k] 0xffffffffa18c0309
     0.27%  zsv      [unknown]         [k] 0xffffffffa190766b
     0.24%  zsv      zsv               [.] zsv_select_main
     0.11%  zsv      [unknown]         [k] 0xffffffffa1e53a95
     0.00%  perf-ex  [unknown]         [k] 0xffffffffa1650554
     0.00%  perf-ex  [unknown]         [k] 0xffffffffa169cea4

musl

# Samples: 1K of event 'cycles'
# Event count (approx.): 1190217195
#
# Overhead  Command  Shared Object     Symbol                 
# ........  .......  ................  .......................
#
    56.56%  zsv      zsv               [.] memcpy
    13.48%  zsv      zsv               [.] zsv_scan
    13.14%  zsv      zsv               [.] zsv_writer_cell
     7.08%  zsv      zsv               [.] zsv_select_data_row
     3.72%  zsv      zsv               [.] zsv_get_cell_1
     2.64%  zsv      zsv               [.] zsv_get_cell
     2.19%  zsv      [unknown]         [k] 0xffffffffa1cb852e
     0.30%  zsv      [unknown]         [k] 0xffffffffa198532f
     0.20%  zsv      zsv               [.] zsv_cell_count
     0.20%  zsv      [unknown]         [k] 0xffffffffa18c02f9
     0.10%  zsv      [unknown]         [k] 0xffffffffa190766b
     0.10%  zsv      [unknown]         [k] 0xffffffffa1c51bc6
     0.10%  zsv      [unknown]         [k] 0xffffffffa18c3ce0
     0.10%  zsv      [unknown]         [k] 0xffffffffa1cb52e6
     0.09%  zsv      [unknown]         [k] 0xffffffffa1962e06
     0.03%  zsv      [unknown]         [k] 0xffffffffa16f7cbf
     0.00%  perf-ex  [unknown]         [k] 0xffffffffa164facc
     0.00%  perf-ex  [unknown]         [k] 0xffffffffa169cea4

Disassembly of the above memcpy (source):

Percent│
       │
       │
       │    Disassembly of section .text:
       │
       │    0000000000595251 <.text+0x194241>:
       │      mov   %rdi,%rax
       │      cmp   $0x8,%rdx
  0.70 │    ↓ jb    1d
  0.17 │      test  $0x7,%edi
       │    ↓ je    1d
  2.08 │11:   movsb %ds:(%rsi),%es:(%rdi)
  0.17 │      dec   %rdx
       │      test  $0x7,%edi
  1.32 │    ↑ jne   11
  0.87 │1d:   mov   %rdx,%rcx
  0.17 │      shr   $0x3,%rcx
 57.57 │      rep   movsq %ds:(%rsi),%es:(%rdi)
 13.83 │      and   $0x7,%edx
  0.17 │    ↓ je    31
 13.28 │2c:   movsb %ds:(%rsi),%es:(%rdi)
  1.56 │      dec   %edx
  2.78 │    ↑ jne   2c
  5.33 │31: ← ret
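
Read as C, the listing above has three phases: byte copies until the destination is 8-byte aligned (label 11), a bulk `rep movsq` of n/8 eight-byte words (where 57.57% of the samples land), and a byte-wise tail (label 2c). A rough paraphrase, based only on what the disassembly shows; the name `copy_words_then_bytes` is made up for illustration and this is not musl's source:

```c
#include <stddef.h>
#include <stdint.h>

/* C paraphrase of the disassembled memcpy: align, copy words, copy tail. */
void *copy_words_then_bytes(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* cmp $0x8,%rdx / jb 1d: only bother aligning when n >= 8 */
  if (n >= 8) {
    /* label 11: movsb until the destination is 8-byte aligned */
    while (((uintptr_t)d & 7) != 0) {
      *d++ = *s++;
      n--;
    }
  }

  /* label 1d: rep movsq copies n >> 3 words of 8 bytes each */
  for (size_t words = n >> 3; words > 0; words--) {
    for (int i = 0; i < 8; i++)
      d[i] = s[i];
    d += 8;
    s += 8;
  }

  /* label 2c: movsb for the remaining n & 7 bytes */
  for (size_t tail = n & 7; tail > 0; tail--)
    *d++ = *s++;

  return dst; /* mov %rdi,%rax at entry: memcpy returns dst */
}
```

The routine uses no SSE/AVX registers at all, which may account for part of the gap against the vectorized glibc routine seen in the gcc and clang reports.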

Apart from the above, I went through the tsv-utils source for its buffered output handling.
It uses an internal buffer with a maximum size of 4194304 bytes (4 MB).
However, setting ZSV_OUTPUT_BUFF_SIZE to 4 MB doesn't have any significant effect for count or select on my machine.
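
For context on why memcpy sits on the hot path for select: a buffered writer usually copies every cell into a staging buffer and only calls fwrite once the buffer fills. A minimal sketch of that pattern, assuming a fixed-capacity buffer; `struct out_buff`, `out_buff_write` and `out_buff_flush` are hypothetical names and this is not zsv's actual zsv_output_buff_write:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical staging buffer in front of a FILE stream. */
struct out_buff {
  char *buf;     /* caller-provided storage */
  size_t cap;    /* capacity of buf in bytes */
  size_t used;   /* bytes currently staged */
  FILE *stream;  /* destination stream */
};

/* Write out whatever is staged. Returns 0 on success, -1 on a short write. */
int out_buff_flush(struct out_buff *b) {
  if (b->used > 0) {
    if (fwrite(b->buf, 1, b->used, b->stream) != b->used)
      return -1;
    b->used = 0;
  }
  return 0;
}

/* Stage n bytes; flush first if they do not fit, bypass the buffer if they
 * could never fit. Returns 0 on success, -1 on a write error. */
int out_buff_write(struct out_buff *b, const void *data, size_t n) {
  if (n > b->cap - b->used) {
    if (out_buff_flush(b) != 0)
      return -1;
    if (n > b->cap)
      return fwrite(data, 1, n, b->stream) == n ? 0 : -1;
  }
  memcpy(b->buf + b->used, data, n); /* one copy per cell: the hot call */
  b->used += n;
  return 0;
}
```

With this shape, a larger buffer reduces the number of fwrite calls but not the number of bytes that pass through memcpy, which would be consistent with the 4 MB setting making no measurable difference.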

@iamazeem
Collaborator Author

iamazeem commented Nov 9, 2024

@liquidaty:
Replaced the memcpy call in zsv_output_buff_write with GCC 14.2.0's implementation in d64ec44.
Ran benchmarks locally and via CI. Both are now comparable.

The musl build was taken from this workflow run:
https://github.com/liquidaty/zsv/actions/runs/11758172816

Results from CI (benchmarks workflow used from #274):
https://github.com/liquidaty/zsv/actions/runs/11758225823/attempts/1#summary-32756351774

$ ZSV_LINUX_BUILD_COMPILER=musl ./scripts/ci-run-benchmarks.sh
[INF] Running ./scripts/ci-run-benchmarks.sh
[INF] OS: linux
[INF] RUNS: 6
[INF] SKIP_FIRST_RUN: true
[INF] BENCHMARKS_DIR: .benchmarks
[INF] ZSV_TAG: 0.3.9-alpha
[INF] WORKFLOW_RUN_ID: 11758172816
[INF] ZSV_LINUX_BUILD_COMPILER: musl
[INF] Downloading CSV file... [worldcitiespop_mil.csv] [DOWNLOADED]
[INF] Downloading... [zsv-0.3.9-alpha-amd64-linux-musl.tar.gz] [SKIPPED]
[INF] Extracting... [zsv-0.3.9-alpha-amd64-linux-musl.tar.gz] [DONE]
[INF] Downloading... [tsv-utils-v2.2.0_linux-x86_64_ldc2.tar.gz] [DONE]
[INF] Extracting... [tsv-utils-v2.2.0_linux-x86_64_ldc2.tar.gz] [DONE]
[INF] Downloading... [xsv-0.13.0-x86_64-unknown-linux-musl.tar.gz] [DONE]
[INF] Extracting... [xsv-0.13.0-x86_64-unknown-linux-musl.tar.gz] [DONE]
[INF] Running count benchmarks...
1 | zsv : real 0.04 user 0.04 sys 0.00
2 | zsv : real 0.04 user 0.04 sys 0.00
3 | zsv : real 0.04 user 0.04 sys 0.00
4 | zsv : real 0.04 user 0.04 sys 0.00
5 | zsv : real 0.04 user 0.04 sys 0.00
1 | xsv : real 0.10 user 0.09 sys 0.00
2 | xsv : real 0.10 user 0.10 sys 0.00
3 | xsv : real 0.10 user 0.09 sys 0.00
4 | xsv : real 0.10 user 0.10 sys 0.00
5 | xsv : real 0.10 user 0.09 sys 0.00
1 | tsv : real 0.02 user 0.02 sys 0.00
2 | tsv : real 0.03 user 0.01 sys 0.01
3 | tsv : real 0.02 user 0.02 sys 0.00
4 | tsv : real 0.03 user 0.01 sys 0.01
5 | tsv : real 0.03 user 0.02 sys 0.00
[INF] Running select benchmarks...
1 | zsv : real 0.15 user 0.14 sys 0.00
2 | zsv : real 0.15 user 0.14 sys 0.00
3 | zsv : real 0.17 user 0.17 sys 0.00
4 | zsv : real 0.15 user 0.15 sys 0.00
5 | zsv : real 0.15 user 0.14 sys 0.00
1 | xsv : real 0.32 user 0.31 sys 0.00
2 | xsv : real 0.31 user 0.31 sys 0.00
3 | xsv : real 0.33 user 0.31 sys 0.01
4 | xsv : real 0.32 user 0.31 sys 0.00
5 | xsv : real 0.31 user 0.30 sys 0.01
1 | tsv : real 0.15 user 0.14 sys 0.01
2 | tsv : real 0.15 user 0.14 sys 0.00
3 | tsv : real 0.15 user 0.15 sys 0.00
4 | tsv : real 0.15 user 0.14 sys 0.00
5 | tsv : real 0.15 user 0.14 sys 0.00
[INF] Generating Markdown output... [benchmarks.md]
[INF] Generated Markdown output successfully!
[INF] Generating step summary...
[INF] Generated step summary successfully!
[INF] --- [DONE] ---

Results from local runs:

$ ZSV_LINUX_BUILD_COMPILER=musl ./scripts/ci-run-benchmarks.sh
[INF] Running ./scripts/ci-run-benchmarks.sh
[INF] OS: linux
[INF] RUNS: 6
[INF] SKIP_FIRST_RUN: true
[INF] BENCHMARKS_DIR: .benchmarks
[INF] ZSV_TAG: 0.3.9-alpha
[INF] ZSV_LINUX_BUILD_COMPILER: musl
[INF] Downloading CSV file... [worldcitiespop_mil.csv] [SKIPPED]
[INF] Downloading... [zsv-0.3.9-alpha-amd64-linux-musl.tar.gz] [SKIPPED]
[INF] Extracting... [zsv-0.3.9-alpha-amd64-linux-musl.tar.gz] [DONE]
[INF] Downloading... [tsv-utils-v2.2.0_linux-x86_64_ldc2.tar.gz] [SKIPPED]
[INF] Extracting... [tsv-utils-v2.2.0_linux-x86_64_ldc2.tar.gz] [DONE]
[INF] Downloading... [xsv-0.13.0-x86_64-unknown-linux-musl.tar.gz] [SKIPPED]
[INF] Extracting... [xsv-0.13.0-x86_64-unknown-linux-musl.tar.gz] [DONE]
[INF] Running count benchmarks...
1 | zsv : real 0.04 user 0.03 sys 0.00
2 | zsv : real 0.05 user 0.05 sys 0.00
3 | zsv : real 0.04 user 0.04 sys 0.00
4 | zsv : real 0.05 user 0.04 sys 0.00
5 | zsv : real 0.05 user 0.04 sys 0.00
1 | xsv : real 0.08 user 0.07 sys 0.01
2 | xsv : real 0.08 user 0.08 sys 0.00
3 | xsv : real 0.08 user 0.06 sys 0.01
4 | xsv : real 0.08 user 0.08 sys 0.00
5 | xsv : real 0.08 user 0.08 sys 0.00
1 | tsv : real 0.08 user 0.09 sys 0.00
2 | tsv : real 0.09 user 0.08 sys 0.00
3 | tsv : real 0.08 user 0.07 sys 0.00
4 | tsv : real 0.08 user 0.08 sys 0.00
5 | tsv : real 0.08 user 0.07 sys 0.01
[INF] Running select benchmarks...
1 | zsv : real 0.11 user 0.11 sys 0.00
2 | zsv : real 0.13 user 0.13 sys 0.00
3 | zsv : real 0.11 user 0.09 sys 0.01
4 | zsv : real 0.11 user 0.10 sys 0.00
5 | zsv : real 0.11 user 0.10 sys 0.00
1 | xsv : real 0.37 user 0.37 sys 0.00
2 | xsv : real 0.37 user 0.36 sys 0.00
3 | xsv : real 0.36 user 0.35 sys 0.00
4 | xsv : real 0.36 user 0.35 sys 0.00
5 | xsv : real 0.36 user 0.34 sys 0.01
1 | tsv : real 0.11 user 0.10 sys 0.00
2 | tsv : real 0.11 user 0.11 sys 0.00
3 | tsv : real 0.11 user 0.11 sys 0.00
4 | tsv : real 0.11 user 0.10 sys 0.00
5 | tsv : real 0.10 user 0.09 sys 0.01
[INF] Generating Markdown output... [benchmarks.md]
[INF] Generated Markdown output successfully!
[INF] --- [DONE] ---

So, as observed earlier, the musl libc implementation of memcpy adds latency.
Apparently, there's room for improvement and this may further be optimized by using vector instructions if need be.

@iamazeem iamazeem mentioned this pull request Nov 9, 2024
@liquidaty
Owner

@iamazeem that is very interesting thank you. How does the memcpy change impact the benchmarks on other OSes? Should this change be isolated to musl builds? If not, would it be advantageous to replace all instances of memcpy in the entire codebase?

@iamazeem
Collaborator Author

> How does the memcpy change impact the benchmarks on other OSes?

Ran some benchmarks with the same change for Linux and macOS builds.

Apparently, this change only affects the musl build significantly.
Other builds seem to perform better with their own versions.

> Should this change be isolated to musl builds?

Yes, apparently, looks like it.
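
One way to keep the replacement confined to musl builds is a compile-time switch set by the build script. musl intentionally does not define a macro identifying itself, so the flag has to come from the build. A sketch, assuming a hypothetical `-DUSE_LOCAL_MEMCPY` flag and the hypothetical names `local_memcpy` and `buff_copy` (none of these come from d64ec44):

```c
#include <stddef.h>
#include <string.h>

#ifdef USE_LOCAL_MEMCPY
/* Plain forward byte copy; stands in for whichever replacement is chosen. */
static void *local_memcpy(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}
#endif

/* Single entry point for the output path: resolves to the local routine
 * only when the build passes -DUSE_LOCAL_MEMCPY, otherwise to libc. */
static inline void *buff_copy(void *dst, const void *src, size_t n) {
#ifdef USE_LOCAL_MEMCPY
  return local_memcpy(dst, src, n);
#else
  return memcpy(dst, src, n);
#endif
}
```

Depending on optimization flags, GCC may recognize a simple copy loop and emit a call to memcpy again, so it is worth confirming with perf or the disassembly that the replacement is what actually runs.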

> If not, would it be advantageous to replace all instances of memcpy in the entire codebase?

Without measuring, it's hard to say.
I'll perform some tests to find out whether it shows any significant improvements or not.
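
A rough way to measure this outside the full benchmark script is a small timing loop that calls a candidate copy routine repeatedly with cell-sized lengths. A sketch, assuming CPU time from clock() is precise enough for a first comparison; `copy_fn` and `time_copies` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with memcpy's shape can be plugged in. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Returns the CPU seconds spent on `iterations` copies of `len` bytes.
 * `len` is clamped to the size of the internal 4096-byte buffers. */
double time_copies(copy_fn fn, size_t len, size_t iterations) {
  static unsigned char src[4096], dst[4096];
  volatile unsigned char sink = 0; /* discourages eliding the copies */

  if (len > sizeof src)
    len = sizeof src;
  memset(src, 'x', sizeof src);

  clock_t start = clock();
  for (size_t i = 0; i < iterations; i++) {
    fn(dst, src, len);
    sink ^= dst[0];
  }
  clock_t end = clock();

  (void)sink;
  return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_copies(memcpy, 16, 100000000)` and the same with the replacement routine, in binaries built with each toolchain, gives a first comparison; perf on the real select command remains the reference.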

2 participants