[pcloud] OpenWriterAt: Default values for multi threaded upload #8225

Open
gfelbing opened this issue Dec 3, 2024 · 5 comments

@gfelbing
Contributor

gfelbing commented Dec 3, 2024

The associated forum post URL from https://forum.rclone.org

Follow up on #8147

What is your current rclone version (output from rclone version)?

Latest master

What problem are you trying to solve?

As @ncw described it:

We don't currently have a way for a backend to give a hint as to what chunk size it prefers for OpenWriterAt, only for OpenChunkWriter. It would be easy enough to make one though and I think it is a good idea.

How do you think rclone should be changed to solve that?

As far as I see there are two approaches:

  1. Add a way for backends to give hints for buffer size etc. in OpenWriterAt. (Similar to ChunkWriterInfo for OpenChunkWriter)
  2. Implement OpenChunkWriter in pcloud.

IMHO option 1 is more invasive to the code base, but cleaner and consistent with the ChunkWriter. Your thoughts, @ncw?

@ncw
Member

ncw commented Dec 5, 2024

I think option 1 is probably the best, but how to do it neatly...

There are some related problems with multi thread transfers. For example the S3 backend has --s3-upload-cutoff which is used to switch between single part and multi part, but under certain conditions the --multi-thread-cutoff flag is used instead. This is super confusing to users.

The same situation happens with concurrency, and now here with chunk size.

So maybe what we need is an info struct with these 3 things in. This could maybe be part of the Features() struct, not sure.

We need to know whether these are set or not (to use the global defaults) so they need to default to an invalid value. Zero is probably ok.
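A minimal sketch of such an info struct, assuming zero means "not set". The type `MultiThreadHints`, its fields and the `withDefaults` helper are hypothetical names used for illustration only; they are not part of rclone, and the numbers in `main` are just example values.

```go
package main

import "fmt"

// MultiThreadHints is a hypothetical struct, not an rclone type.
// A zero field means "the backend has no preference".
type MultiThreadHints struct {
	ChunkSize int64 // preferred chunk size in bytes, 0 = unset
	Cutoff    int64 // size at which multi-thread transfers start, 0 = unset
	Streams   int   // preferred concurrency, 0 = unset
}

// withDefaults returns a copy of h in which every unset (zero) field
// is replaced by the corresponding field of def.
func (h MultiThreadHints) withDefaults(def MultiThreadHints) MultiThreadHints {
	if h.ChunkSize == 0 {
		h.ChunkSize = def.ChunkSize
	}
	if h.Cutoff == 0 {
		h.Cutoff = def.Cutoff
	}
	if h.Streams == 0 {
		h.Streams = def.Streams
	}
	return h
}

func main() {
	// Example values only.
	global := MultiThreadHints{ChunkSize: 64 << 20, Cutoff: 256 << 20, Streams: 4}
	// The backend only has an opinion about the chunk size.
	backend := MultiThreadHints{ChunkSize: 16 << 20}
	fmt.Printf("%+v\n", backend.withDefaults(global))
}
```

Using zero as the invalid value keeps the struct's zero value meaningful: a backend that fills in nothing gets the global defaults for everything.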

I think that would solve your problem and also some other ones which have been annoying the users!

We would need to add a flag to the pcloud backend so it could be independently controlled.

How does that sound?

@gfelbing
Contributor Author

gfelbing commented Dec 7, 2024

Hey @ncw,
I thought about that a bit: the idea of streamlining the arguments sounds really good, especially for cases like the one you described with S3's upload cutoff.
One thing I see differently is the additional options for the backend:

  • that could confuse users again, similar to your example
  • we would need to come up with logic for which setting wins: option default vs global default, option setting vs global setting. That again might be confusing.

What I outlined in PR #8226 is that the backend would only provide its own default for the global option.
That way, the process is simpler: the chosen option is the first valid one of

  • the option chosen by the user
  • the option suggested by the backend
  • the global default
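The precedence above could be sketched roughly as follows, assuming that a value of zero or less means "not provided". The helper `firstValid` is a hypothetical name, not an rclone function.

```go
package main

import "fmt"

// firstValid returns the first candidate greater than zero, treating
// zero or negative values as "not set". It returns 0 if none is set.
func firstValid(candidates ...int64) int64 {
	for _, c := range candidates {
		if c > 0 {
			return c
		}
	}
	return 0
}

func main() {
	const globalDefault = 64 << 20 // example value only

	// User did not choose anything, backend suggests 16 MiB.
	fmt.Println(firstValid(0, 16<<20, globalDefault))

	// User chose 8 MiB, which wins over the backend suggestion.
	fmt.Println(firstValid(8<<20, 16<<20, globalDefault))
}
```

This only works if "the user did not choose anything" can actually be represented, which is the tricky part with options that always carry a default.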

What do you think?

@ncw
Member

ncw commented Dec 8, 2024

That way, the process is simpler: the chosen option is the first valid one of

  • the option chosen by the user
  • the option suggested by the backend
  • the global default

That is a good way of thinking about it - how to explain to the user what is happening. There is some complication in the above though...

  • the option chosen by the user - as set by --multi-thread-chunk-size, --multi-thread-cutoff, --multi-thread-streams
  • the option suggested by the backend - this can also be set by the user as --s3-chunk-size, --s3-upload-cutoff, --s3-upload-concurrency
  • the global default - these are the default values of the --multi-thread* variables

There are two ways we can do multi thread copies

  1. OpenWriterAt
  2. OpenChunkWriter

Currently with OpenChunkWriter the backend specifies a chunk size and that gets used. Due to the way that protocol works, that is probably the only way it can be done. The cutoff and the streams are a bit of a mess though.

Currently with OpenWriterAt the backend has no say at all and the values of the --multi-thread* variables are used.

My concern with the above scheme is that the backend will most likely always specify a value for chunk size, cutoff and concurrency, so I think we need something a bit more complicated:

  • the option chosen by the user - as set by --multi-thread-chunk-size, --multi-thread-cutoff, --multi-thread-streams - if and only if this is changed from the default value
  • the option suggested by the backend - this can also be set by the user as --s3-chunk-size, --s3-upload-cutoff, --s3-upload-concurrency
  • the global default - these are the default values of the --multi-thread* variables

Detecting whether a value has been changed from the default isn't particularly easy with the rclone config system, unfortunately: the config system provides either the value the user set or a default. Detecting whether a value differs from the default will require a bit of extra work, especially considering the defaults can be set with environment variables.

Another alternative would be

  • We use the maximum of the values set by the user and the backend, e.g.
    • max(--multi-thread-chunk-size, --s3-chunk-size)
    • max(--multi-thread-cutoff, --s3-upload-cutoff)
    • max(--multi-thread-streams, --s3-upload-concurrency)

That would be much easier to implement but doesn't let the user lower values easily.
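A short sketch of the max approach and its drawback. The function `effective` is a hypothetical name and the sizes are example values only.

```go
package main

import "fmt"

// effective returns the larger of the global and the backend value,
// which is the "max" rule described above.
func effective(global, backend int64) int64 {
	if global > backend {
		return global
	}
	return backend
}

func main() {
	backendChunk := int64(5 << 20) // example backend chunk size

	// Raising the value through the global flag works as expected.
	fmt.Println(effective(64<<20, backendChunk))

	// Lowering it does not: the backend value still wins.
	fmt.Println(effective(1<<20, backendChunk))
}
```

To lower the effective value, the user would have to lower both settings, which is the surprising behaviour pointed out above.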

@gfelbing
Contributor Author

gfelbing commented Dec 8, 2024

Agreed, the max option would be the easiest to implement, but not being able to lower values can easily be a really bad thing for users. It's also quite surprising; I've never seen a CLI act like that.

My take was actually to remove the option from the backend (e.g. --s3-chunk-size) entirely.
That would leave us with 3 options that can easily be ordered hierarchically:

  • user chosen value
  • suggestion by backend
  • global default

The only catch is that we need to determine whether the option was set or not. As you already said, it's not exactly convenient but possible. We could make the code a bit nicer by abstracting it into a function, though.

On the other hand, we could also turn it around by removing the global option and solely relying on the backends to offer an option. Then it would also be easy implementation wise:

  • add an option with a default suitable for the backend
  • return the values via an info struct on OpenWriterAt

The configuration would then be the first of

  • option set by user
  • default of the option specific for backend

You can hardly go simpler.

@ncw
Member

ncw commented Dec 14, 2024

Agreed, the max option would be the easiest to implement, but not being able to lower values can easily be a really bad thing for users. It's also quite surprising; I've never seen a CLI act like that.

Agreed

My take was actually to remove the option from the backend (e.g. --s3-chunk-size) entirely. That would leave us with 3 options that can easily be ordered hierarchically:

  • user chosen value
  • suggestion by backend
  • global default

The only catch is that we need to determine whether the option was set or not. As you already said, it's not exactly convenient but possible. We could make the code a bit nicer by abstracting it into a function, though.

I don't think we can remove the backend options. They've been around for a long time, and for most of that time they were the only way of controlling multipart copies.

On the other hand, we could also turn it around by removing the global option and solely relying on the backends to offer an option. Then it would also be easy implementation wise:

  • add an option with a default suitable for the backend
  • return the values via an info struct on OpenWriterAt

The configuration would then be the first of

  • option set by user
  • default of the option specific for backend

You can hardly go simpler.

I like this idea better.

This would effectively make the --multi-thread-* options obsolete. We can't remove them (backwards compatibility), but they could emit a warning.

I think it is better to be able to get the info struct separately from OpenWriterAt though because committing to calling it means it is too late to decide on the cutoff.
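One way this could look, as a rough sketch: the hints are exposed through an optional interface that can be queried before any writer is opened, so the cutoff decision happens first. `Hints`, `Hinter`, `fakeBackend` and `useMultiThread` are all hypothetical names for illustration, not rclone APIs, and the comparison against the cutoff is an assumption.

```go
package main

import "fmt"

// Hints is a hypothetical info struct; zero fields mean "unset".
type Hints struct {
	ChunkSize int64
	Cutoff    int64
	Streams   int
}

// Hinter is a hypothetical optional interface a backend could implement
// so that its hints are available without calling OpenWriterAt.
type Hinter interface {
	MultiThreadHints() Hints
}

// fakeBackend stands in for a backend that provides hints.
type fakeBackend struct{}

func (fakeBackend) MultiThreadHints() Hints {
	return Hints{ChunkSize: 16 << 20, Cutoff: 100 << 20, Streams: 4}
}

// useMultiThread decides, before any writer is opened, whether a file of
// the given size should use a multi-thread transfer. The backend cutoff
// is used if provided, otherwise fallbackCutoff applies.
func useMultiThread(backend interface{}, size, fallbackCutoff int64) bool {
	cutoff := fallbackCutoff
	if h, ok := backend.(Hinter); ok {
		if c := h.MultiThreadHints().Cutoff; c > 0 {
			cutoff = c
		}
	}
	return size > cutoff
}

func main() {
	size := int64(200 << 20)
	fmt.Println(useMultiThread(fakeBackend{}, size, 256<<20))
	fmt.Println(useMultiThread(struct{}{}, size, 256<<20))
}
```

Keeping the hints separate from the writer means the single-part path never has to open, and then abandon, a multi-thread writer just to learn the cutoff.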

I'll have a noodle with some code and see if it gives me any ideas!


2 participants