Copy operations are (relatively) slow #1685

Closed
valoq opened this issue Apr 9, 2024 · 6 comments · Fixed by #1749

Comments

@valoq
Contributor

valoq commented Apr 9, 2024

It takes my setup about 1 second to copy a 1GB file using `cp`, but the same operation takes about 5 seconds in lf.

After digging through the code, I am still unsure what causes this issue.

@MahouShoujoMivutilde
Contributor

Lf copies a file in relatively small chunks of a fixed size (4KB):

lf/copy.go

Lines 51 to 86 in ce12116

buf := make([]byte, 4096)

r, err := os.Open(src)
if err != nil {
	return err
}
defer r.Close()

w, err := os.OpenFile(dst, os.O_RDWR|os.O_CREATE|os.O_TRUNC, dst_mode)
if err != nil {
	return err
}

for {
	n, err := r.Read(buf)
	if err != nil && err != io.EOF {
		w.Close()
		os.Remove(dst)
		return err
	}

	if n == 0 {
		break
	}

	if _, err := w.Write(buf[:n]); err != nil {
		return err
	}

	nums <- int64(n)
}

if err := w.Close(); err != nil {
	os.Remove(dst)
	return err
}

which could be suboptimal.

In an ideal world it would use io.Copy() (which could benefit from copy_file_range() to speed things up, allowing e.g. CoW filesystem reflinks, Samba server-side file copy, etc.) instead of a custom implementation (which exists for a reason ⬇️).

However, it's also nice to have a % progress indicator for an ongoing copy, which io.Copy() doesn't provide an easy way* to implement while keeping the full scope of its benefits.

Increasing the buffer to 4MB decreases the copy duration by about 30%.

* - at least I haven't found any in 2021.
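
For illustration, here is a minimal sketch (not lf's code) of what a plain io.Copy()-based copy could look like; when both ends are regular *os.File handles, the Go standard library can use accelerated paths such as copy_file_range on Linux:

package main

import (
	"io"
	"log"
	"os"
)

// copyPlain is a minimal io.Copy-based copy with no progress reporting.
// With two regular files, (*os.File).ReadFrom lets the kernel do the work
// (e.g. copy_file_range on Linux) instead of shuffling a userspace buffer.
func copyPlain(src, dst string) error {
	r, err := os.Open(src)
	if err != nil {
		return err
	}
	defer r.Close()

	w, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o666)
	if err != nil {
		return err
	}

	if _, err := io.Copy(w, r); err != nil {
		w.Close()
		os.Remove(dst)
		return err
	}
	return w.Close()
}

func main() {
	// src.bin / dst.bin are placeholder paths for the example.
	if err := copyPlain("src.bin", "dst.bin"); err != nil {
		log.Fatal(err)
	}
}

The catch, as noted above, is that this gives no progress information while the copy is running.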

@valoq
Contributor Author

valoq commented Jun 11, 2024

Would this approach provide a possible solution:
https://hjr265.me/blog/tracking-io-copy-progress-in-go/

@MahouShoujoMivutilde
Contributor

I don't think so - staticcheck gives me a bunch of type errors and an infinite recursive call.

But let's assume it's a skill issue on my part and/or the author forgot to copy the working version to his blog, and it actually does what it's supposed to.

Most likely it would just use the default 32KB buffer of io.Copy, because you're no longer passing an os.File in.

So different implementation, but no big difference, no accelerated copy methods used, which is where the actual speed comes from.
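
For context, a rough sketch of that wrapper approach (progressReader, the channel handling, and the file names here are illustrative, not taken from the blog or from lf):

package main

import (
	"io"
	"log"
	"os"
)

// progressReader counts bytes as they are read and forwards the counts
// to a channel, so a consumer can render a progress bar.
type progressReader struct {
	r    io.Reader
	nums chan<- int64
}

func (pr *progressReader) Read(p []byte) (int, error) {
	n, err := pr.r.Read(p)
	if n > 0 {
		pr.nums <- int64(n)
	}
	return n, err
}

func main() {
	r, err := os.Open("src.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	w, err := os.Create("dst.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	nums := make(chan int64, 1024)
	done := make(chan struct{})
	go func() {
		var copied int64
		for n := range nums {
			copied += n // progress could be rendered here
		}
		log.Printf("copied %d bytes", copied)
		close(done)
	}()

	// io.Copy only sees a *progressReader on the read side, not an *os.File,
	// so the accelerated kernel paths are skipped and the generic
	// 32KB-buffer loop is used instead.
	if _, err := io.Copy(w, &progressReader{r: r, nums: nums}); err != nil {
		log.Fatal(err)
	}
	close(nums)
	<-done
}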

Look, the buffer can be increased even right now; I personally think it's quite conservative - aside from a small text file, what will even fit in 4KB these days? But I don't think it will help a lot - as I've said above, even with 4MB it's only about 30% less copy time, not a five-times difference.

@valoq
Contributor Author

valoq commented Jun 12, 2024

Basically, what we would need to get the same speed as with the cp command is a change in io.Copy() itself - is that correct?

Alternatively, there could be a copy mode that does not show progress but benefits from io.Copy() as it is?

@MahouShoujoMivutilde
Contributor

TL;DR: make buf = 32*1024, it'll get you to cp speed (proof below).

(but it will not make really cool shit like reflinks or server side copy work, hence my rant about io.Copy()).

If you don't care about the progress at all, you could just implement your own copy function in lf config.

Here and below are a few examples of how to do that for various use cases.

io.Copy() is fine and already fast as it is; it just makes it difficult to expose the progress without losing its benefits.

However, I've just checked:

  • It takes 12s to copy a 4.6GB file via cp --reflink=never (it would take 0s with always, thanks to CoW),
  • and 20s with a simplified (no progress) buffered copy like the one used in lf, on my machine,
  • but only 13s if I increase the buffer size to 32*1024 bytes (the io.Copy() default). So again, 20s - 30% = 14s, close enough.
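
A minimal sketch of such a simplified buffered copy with a configurable buffer size, in case anyone wants to reproduce the numbers (bufferedCopy and the file names are made up for the example; it's just the loop above without progress reporting):

package main

import (
	"io"
	"log"
	"os"
	"time"
)

// bufferedCopy copies src to dst through a fixed-size userspace buffer,
// roughly like lf's copyFile() but without the progress channel.
func bufferedCopy(src, dst string, bufSize int) error {
	r, err := os.Open(src)
	if err != nil {
		return err
	}
	defer r.Close()

	w, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o666)
	if err != nil {
		return err
	}

	buf := make([]byte, bufSize)
	for {
		n, err := r.Read(buf)
		if err != nil && err != io.EOF {
			w.Close()
			os.Remove(dst)
			return err
		}
		if n == 0 {
			break
		}
		if _, err := w.Write(buf[:n]); err != nil {
			w.Close()
			os.Remove(dst)
			return err
		}
	}
	return w.Close()
}

func main() {
	// Time the same file with lf's current 4KB buffer and with 32KB.
	for _, size := range []int{4096, 32 * 1024} {
		start := time.Now()
		if err := bufferedCopy("big.bin", "big.copy", size); err != nil {
			log.Fatal(err)
		}
		log.Printf("buf=%d took %s", size, time.Since(start))
	}
}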

Does it hold within lf?

But one thing I'd originally dismissed as "probably fine and asynchronous anyway" is the UI.

During a file copy, the number of bytes read (on EACH read) is sent to the nums channel to update the UI; it's buffered for up to 1024 elements and later forwarded to nav.copyBytesChan, which is also 1024 elements long. Nothing complicated happens there - the values are just summed into nav.copyBytes (bytes copied so far of the file currently being copied). But for every 1024 elements it receives, it calls a UI update.

lf/app.go

Lines 333 to 339 in 0c72034

case n := <-app.nav.copyBytesChan:
	app.nav.copyBytes += n
	// n is usually 4096B so update roughly per 4096B x 1024 = 4MB copied
	if app.nav.copyUpdate++; app.nav.copyUpdate >= 1024 {
		app.nav.copyUpdate = 0
		app.ui.draw(app.nav)
	}

Findings

  • A 32KB buffer size, even within lf, yields the same copy speed as cp --reflink=never, taking ~30% less time to copy than lf's current 4KB.
  • Channel size is not a significant bottleneck here; 1024 is fine. Longer channels fill up to ~28k items max, but don't seem to impact performance much, if at all.
  • The UI is async enough not to influence a copy significantly. Good.

Read the process below, if you care.


I've added a simple timer to measure how long those draws take:

case n := <-app.nav.copyBytesChan:
	app.nav.copyBytes += n
	// n is usually 4096B so update roughly per 4096B x 1024 = 4MB copied
	if app.nav.copyUpdate++; app.nav.copyUpdate >= 1024 {
		app.nav.copyUpdate = 0
		start := time.Now()
		app.ui.draw(app.nav)
		log.Printf("`app.ui.draw(app.nav)` took %s", time.Since(start))
	}

For the same file, the median time to draw was 2.7ms, the mean 4.5ms, with 5s total spent drawing (though that's not accounting for the channels).

So how much does buffering in channels help?

Adding len(app.nav.copyBytesChan) to the above log reveals that it mostly stays around 100, unless you start actively using lf - e.g. select some image and let the previewer script do its job - then it instantly fills to 1024 and each draw starts to take 20ms+ (meaning nums is full too at this point).

What I think might help here is that we either make stupidly long channels (it's int64, who cares - even at 1000 times the size it'd be 8MB per channel max, so 16MB total) or do progressively less frequent updates (e.g. every 4MB, then every 8MB, then every 12MB, etc.).
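
A rough sketch of what the "progressively less frequent updates" idea could look like (a hypothetical progressiveUpdater helper, not something lf implements):

package main

import "fmt"

// progressiveUpdater triggers a redraw at intervals that grow by `step`
// each time: after 4MB, then after another 8MB, then another 12MB, etc.
type progressiveUpdater struct {
	step       int64 // base interval, e.g. 4MB
	interval   int64 // current interval
	nextUpdate int64 // absolute byte count at which to redraw next
	copied     int64
}

func newProgressiveUpdater(step int64) *progressiveUpdater {
	return &progressiveUpdater{step: step, interval: step, nextUpdate: step}
}

// add records n copied bytes and reports whether the UI should redraw now.
func (p *progressiveUpdater) add(n int64) bool {
	p.copied += n
	if p.copied < p.nextUpdate {
		return false
	}
	p.interval += p.step
	p.nextUpdate += p.interval
	return true
}

func main() {
	u := newProgressiveUpdater(4 << 20) // start at 4MB
	for i := 0; i < 1024; i++ {
		if u.add(1 << 20) { // simulate 1MB reads
			fmt.Printf("redraw at %d MB copied\n", u.copied>>20)
		}
	}
}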

Making both channels 1024*1024 long, I get a maximum saturation per draw of about 28k items, so nowhere near full capacity. (But time to draw still goes up during use.)

So does it actually help with copy speed?

copyFile() took 20.872228571s; nums and app.nav.copyBytesChan as 1048576 - actively using lf, preview.
copyFile() took 21.044714459s; nums and app.nav.copyBytesChan as 1048576 - no preview at all, not touching it.

copyFile() took 21.211875279s; nums and app.nav.copyBytesChan as 1024 - no preview, not using
copyFile() took 21.866803991s; nums and app.nav.copyBytesChan as 1024 - preview....

FUCK.

Okay, so longer channels and preview redraw updates don't matter as much as I thought; maybe buf at 32KB will help.

copyFile() took 20.887190009s; nums set as copyBytesChan as 1024; buf = 4096; update every 1024 item - baseline, no preview
copyFile() took 13.615526607s; nums set as copyBytesChan as 1024; buf = 32768; update every 1024 item - bigger buffer, no preview.

copyFile() took 13.110737526s; nums set as copyBytesChan as 1024; buf = 32768; update every 1024 item - bigger buffer, preview.

Okay, so a bigger buffer gets you about the same time as cp --reflink=never.

But here is the question:

Does the bigger buffer itself help, or does it help because a draw now happens every 32KB*1024 = 32MB copied instead of every 4KB*1024 = 4MB, so this file goes from ~1099 draw calls down to just ~138?

copyFile() took 20.900660492s; nums set as copyBytesChan as 1024; buf = 4096; update every 8192 item - no preview, default buffer, same period (as in how much of a file is read before update) of update as it would be for 32KB buffer.
copyFile() took 13.327280577s; nums set as copyBytesChan as 1024; buf = 32768; update every 8192 item - no preview, bigger buffer, larger delay

Okay, buffer is helpful, the rest isn't. But can we do better?

copyFile() took 12.157263234s; nums set as copyBytesChan as 1024; buf = 131072; update every 1024 item - probably within a margin of error.
copyFile() took 12.22640922s; nums set as copyBytesChan as 1024; buf = 524288; update every 1024 item - again.
copyFile() took 14.207558531s; nums set as copyBytesChan as 1024; buf = 4194304; update every 1024 item - not better.

So 32KB is a sweet spot.

Here is the patch I used to get those results:
diff --git a/app.go b/app.go
index a95b180..3e5a561 100644
--- a/app.go
+++ b/app.go
@@ -333,9 +333,15 @@ func (app *app) loop() {
 		case n := <-app.nav.copyBytesChan:
 			app.nav.copyBytes += n
 			// n is usually 4096B so update roughly per 4096B x 1024 = 4MB copied
-			if app.nav.copyUpdate++; app.nav.copyUpdate >= 1024 {
+			if app.nav.copyUpdate++; app.nav.copyUpdate >= updatePeriodItems {
 				app.nav.copyUpdate = 0
+				// start := time.Now()
 				app.ui.draw(app.nav)
+				// log.Printf(
+				// 	"`len(app.nav.copyBytesChan) = %d; app.ui.draw(app.nav)` took %s",
+				// 	len(app.nav.copyBytesChan),
+				// 	time.Since(start),
+				// )
 			}
 		case n := <-app.nav.copyTotalChan:
 			app.nav.copyTotal += n
diff --git a/copy.go b/copy.go
index cec83c2..01e1ce6 100644
--- a/copy.go
+++ b/copy.go
@@ -3,14 +3,22 @@ package main
 import (
 	"fmt"
 	"io"
+	"log"
 	"os"
 	"path/filepath"
 	"strconv"
 	"strings"
+	"time"
 
 	"github.com/djherbis/times"
 )
 
+const (
+	numsChanSize      = 1024
+	copyBufSize       = 32 * 1024
+	updatePeriodItems = 1024
+)
+
 func copySize(srcs []string) (int64, error) {
 	var total int64
 
@@ -36,6 +44,7 @@ func copySize(srcs []string) (int64, error) {
 }
 
 func copyFile(src, dst string, preserve []string, info os.FileInfo, nums chan int64) error {
+	start := time.Now()
 	var dst_mode os.FileMode = 0o666
 	preserve_timestamps := false
 	for _, s := range preserve {
@@ -47,7 +56,7 @@ func copyFile(src, dst string, preserve []string, info os.FileInfo, nums chan in
 		}
 	}
 
-	buf := make([]byte, 4096)
+	buf := make([]byte, copyBufSize)
 
 	r, err := os.Open(src)
 	if err != nil {
@@ -94,11 +103,19 @@ func copyFile(src, dst string, preserve []string, info os.FileInfo, nums chan in
 		}
 	}
 
+	log.Printf(
+		"copyFile() took %s; nums set as copyBytesChan as %d; buf = %d; update every %d item",
+		time.Since(start),
+		numsChanSize,
+		copyBufSize,
+		updatePeriodItems,
+	)
+
 	return nil
 }
 
 func copyAll(srcs []string, dstDir string, preserve []string) (nums chan int64, errs chan error) {
-	nums = make(chan int64, 1024)
+	nums = make(chan int64, numsChanSize)
 	errs = make(chan error, 1024)
 
 	go func() {
diff --git a/nav.go b/nav.go
index ba9b2c3..b9257ea 100644
--- a/nav.go
+++ b/nav.go
@@ -577,7 +577,7 @@ func (nav *nav) getDirs(wd string) {
 
 func newNav(height int) *nav {
 	nav := &nav{
-		copyBytesChan:   make(chan int64, 1024),
+		copyBytesChan:   make(chan int64, numsChanSize),
 		copyTotalChan:   make(chan int64, 1024),
 		moveCountChan:   make(chan int, 1024),
 		moveTotalChan:   make(chan int, 1024),

@MahouShoujoMivutilde
Contributor

MahouShoujoMivutilde commented Jun 12, 2024

But how does it affect copying a directory with a bunch of smaller files (e.g. 4.4GB of pictures)?

copyAll() took 28.392770906s; nums set as copyBytesChan as 1024; buf = 4096; update every 1024 item - default

copyAll() took 20.155220027s; nums set as copyBytesChan as 1024; buf = 32768; update every 1024 item - 32kb

Again, about 30% less time.

For reference, cp --reflink=never of the same directory takes 16.9s.

MahouShoujoMivutilde added a commit to MahouShoujoMivutilde/lf that referenced this issue Jun 12, 2024
This makes `:paste` comparable in performance to `cp --reflink=never`
for large files.

For large number of small files improvements are less substantial
(compared to `cp -r --reflink=never`), though still noticeable.

In both cases the copy takes about 30% less time than with `buf` size at 4096.

32KB is the same number `io.Copy()` uses internally (when it can), and is about where improvements stop.

Context:
gokcehan#1685 (comment)
joelim-work pushed a commit that referenced this issue Jun 14, 2024
* Use 32KB buffer for `copyFile()` (reduces copy time by 30%)

This makes `:paste` comparable in performance to `cp --reflink=never`
for large files.

For large number of small files improvements are less substantial
(compared to `cp -r --reflink=never`), though still noticeable.

In both cases the copy takes about 30% less time than with `buf` size at 4096.

32KB is the same number `io.Copy()` uses internally (when it can), and is about where improvements stop.

Context:
#1685 (comment)

* Apply suggested changes

* Just delete an incomplete file on all errors

* Mention new progress UI update frequency

* Return the old (each 4MB copied) progress update frequency

(progress UI updates don't seem to impact performance much)