tee: increase buf size for large input#11441
Conversation
|
The existing code is documented to use same |
|
GNU coreutils uses |
|
Optimal value might different at each PC. 64*1024 was best for me. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This measures throughput in the most ideal scenario for a large buffer, we should ensure performance does not regress in other cases where output is small. |
|
Time for small input is bounded. But I could use smaller buf only for 1st cycle. |
|
Please test something like: |
This comment was marked as outdated.
This comment was marked as outdated.
6897cbc to
0a8e582
Compare
|
I added a code path for smaller input. There is variance at Both of win and lose happen at |
7c43c1f to
dac8b97
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
needs to be rebased |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
@oech3 Could you quantify the performance improvement from the additional code path to justify the added complexity? |
|
This PR was opened for larger files. Mostly 0 perf change for small files. |
|
Please add a benchmark in a different PR |
|
GNU testsuite comparison: |
|
Is it equivalint with ? |
|
How to use stdin at benchmark? |
|
Would you suggest how to use stdin at CodSpeed? |
|
We don't have bench for stream yet. Was this really OK to merge? |
|
you did some testing, i trust you :) |
|
This broke on my Gentoo system for some reason: Reproducergentoo-reproducer.bats#!/usr/bin/env bats
good_commit="60d448be4825e5e1a4ebbb2cbb74f562a9c899ea"
bad_commit="13fb3bede8185bfe12d0579027dfd6ab13793185"
setup() {
[ -d /var/db/repos/gentoo/dev-lang/rust-bin ] || emerge-webrsync
command -v rustc || emerge -q1 dev-lang/rust-bin
command -v git || emerge -q1 dev-vcs/git
keyword_file="/etc/portage/package.accept_keywords/uutils"
[ -e "$keyword_file" ] || printf "%s **\n" sys-apps/uutils-coreutils > "$keyword_file"
[ -e /usr/bin/gnu-tee ] || cp -afv /usr/bin/tee /usr/bin/gnu-tee
}
teardown() {
echo "Restoring GNU tee"
cp -afv /usr/bin/gnu-tee /usr/bin/tee
}
@test "uucore: add missing fs feature to rustix $good_commit" {
env EGIT_OVERRIDE_COMMIT_UUTILS_COREUTILS="$good_commit" emerge -q1 uutils-coreutils
cp -afv /usr/bin/uu-tee /usr/bin/tee
emerge -v1 sys-apps/portage
}
@test "tee: increase buf size for large input $bad_commit" {
env EGIT_OVERRIDE_COMMIT_UUTILS_COREUTILS="$bad_commit" emerge -q1 uutils-coreutils
cp -afv /usr/bin/uu-tee /usr/bin/tee
emerge -v1 sys-apps/portage
}Run it like: docker run -it --rm --pull always -v $PWD/gentoo-reproducer.bats:/gentoo-reproducer.bats gentoo/stage3 bash -x -c 'cat /gentoo-reproducer.bats && emerge-webrsync && emerge -q1 dev-util/bats && bats --verbose-run --trace --formatter tap /gentoo-reproducer.bats'Output |
|
Would you provide a minimized step for breakage?
I don't have Gentoo and container.
|
|
I don't know why yet. I'll try to find out. I made a wrapper like this at /usr/bin/tee: The last thing it ran was: I don't know what was on the other end of the pipe to cause it to break though. I'll try to find it. |
|
Claude was able to spot the issue: The sanity check it came up with: (for i in $(seq 5); do echo "line $i"; sleep 0.1; done) \
| ./target/release/coreutils tee /tmp/portage_log_test.log
line 1With Busybox tee: (for i in $(seq 5); do echo "line $i"; sleep 0.1; done) \
| busybox tee /tmp/portage_log_test.log
line 1
line 2
line 3
line 4
line 5 |
| // flush the buffer to comply with POSIX requirement that | ||
| // `tee` does not buffer the input. | ||
| output.flush()?; | ||
| return Ok(len); |
There was a problem hiding this comment.
maybe, this return caused it
taskset -c 1 yes|taskset -c 2 target/release/tee|taskset -c 3 pv>/dev/nullThis PR: [3.83GiB/s]
gnu58d88d243/tee: [2.83GiB/s]
closes #11432