
performance on clamp #179

Closed
johnnychen94 opened this issue Mar 19, 2020 · 9 comments
Comments

@johnnychen94 (Collaborator) commented Mar 19, 2020

This is expected, but I think it can be optimized.

julia> @btime clamp!($(rand(N0f8, 512, 512)), 0.1, 0.9);
  813.628 μs (0 allocations: 0 bytes)

julia> @btime clamp!($(rand(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  628.583 μs (0 allocations: 0 bytes)

julia> @btime clamp.($(rand(N0f8, 512, 512)), 0.1, 0.9);
  691.843 μs (3 allocations: 2.00 MiB)

julia> @btime clamp.($(rand(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  1.146 ms (3 allocations: 256.19 KiB)

Ref: observed in JuliaImages/ImageContrastAdjustment.jl#28 (comment)

@kimikage (Collaborator) commented Mar 19, 2020

I don't understand what the benchmarks above are meant to show. :confused:

Is this what you mean?

julia> @btime clamp!($(zeros(N0f8, 512, 512)), 0.1, 0.9);
  505.900 μs (0 allocations: 0 bytes)

julia> @btime clamp!($(zeros(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  114.299 μs (0 allocations: 0 bytes)

julia> @btime clamp.($(zeros(N0f8, 512, 512)), 0.1, 0.9);
  687.300 μs (2 allocations: 2.00 MiB)

julia> @btime clamp.($(zeros(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  179.200 μs (2 allocations: 256.14 KiB)

Yes, I also think it can be optimized (specialized) as follows:

Base.clamp(x::X, lo::X, hi::X) where X<:FixedPoint = X(clamp(x.i, lo.i, hi.i), 0)

However, the bottleneck should be identified first. I don't think floating-point conversions are "too" slow. They are inherently expensive.
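To illustrate the idea behind the specialization above: since the ordering of `N0f8` values matches the ordering of their raw `UInt8` representations, clamping the raw integers gives the same answer as clamping the fixed-point values, with no floating-point round trip. A minimal sketch, assuming FixedPointNumbers.jl is loaded:

```julia
using FixedPointNumbers

x  = reinterpret(N0f8, 0xff)   # 1.0N0f8, raw byte 0xff
lo = 0.1N0f8                   # raw byte ≈ 0x1a (rounded)
hi = 0.9N0f8                   # raw byte ≈ 0xe6 (rounded)

# Clamping the raw UInt8 values agrees with clamping the fixed-point
# values themselves, because N0f8 ordering matches UInt8 ordering.
raw = clamp(reinterpret(x), reinterpret(lo), reinterpret(hi))
reinterpret(N0f8, raw) == clamp(x, lo, hi)  # expected to hold
```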

@johnnychen94 (Collaborator, Author) commented Mar 19, 2020

I was thinking of a specialization like this:

Base.clamp(x::X, lo, hi) where X<:FixedPoint = clamp(x, X(lo), X(hi))

I opened this as a tracking issue and haven't yet spent time playing with it; feel free to close it if you think it makes no sense.
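For context on what that definition would change: `Base`'s generic `clamp` promotes across all three argument types, so mixing `N0f8` values with `Float64` bounds changes the return type. A minimal sketch, assuming FixedPointNumbers.jl is loaded:

```julia
using FixedPointNumbers

x = 0.5N0f8

# With Float64 bounds, Base's generic clamp promotes to Float64:
typeof(clamp(x, 0.1, 0.9))          # Float64

# With matching N0f8 bounds, the result stays N0f8:
typeof(clamp(x, 0.1N0f8, 0.9N0f8))  # N0f8

# The proposed method would forward to the same-type method, keeping the
# N0f8 return type at the cost of converting lo and hi once per call.
```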

@kimikage (Collaborator) commented Mar 19, 2020

clamp! can be optimized, but I don't want to do it because there are many "in-place" variants.
clamp must not be changed as you suggested. FixedPoint is not a pixel type (e.g. Gray{N0f8}).

feel free to close it if you think it makes no sense.

This issue clearly makes sense, but I don't know what you want.

@kimikage (Collaborator) commented Mar 20, 2020

TBH, I had believed that the optimization for clamp(x::X, lo::X, hi::X) where X<:FixedPoint was done by the compiler (LLVM backend). I think it is worth introducing. :rocket:

@johnnychen94 (Collaborator, Author) commented Mar 20, 2020

Oh, I wrote this down as a quick note to myself and didn't spend enough time playing with it; I'm sorry I didn't make it clear.

This is expected, but I think it can be optimized.

What I meant to say is: I believe this performance gap is expected, but I think there is merit in proactively converting lo and hi to the FixedPoint type. The general philosophy is not to care about the storage type, since it's hard to intuit the types in practice; when we have to tweak performance by converting/promoting types manually, that's a code smell to me.

I'm in a bit of a hurry catching up with my own deadlines; I'll post an update later if I find anything worth optimizing.

clamp! can be optimized, but I don't want to do it because there are many "in-place" variants.

I don't quite understand the reasoning behind this; can you clarify it a bit? Thanks. My understanding is that in-place functions in Base are worth extending.

@kimikage (Collaborator) commented Mar 20, 2020

clamp! can be optimized, but I don't want to do it because there are many "in-place" variants.

I don't quite understand the reasoning behind this, can you clarify it a bit?

The specialization of many "in-place" methods requires much labor and offers little benefit.

I believe the general philosophy is to not care about storage type

I agree with you. Therefore, I think clamp should be used whenever we can use clamp instead of clamp!. When we use clamp!, we must "definitely" pay attention to the storage type! The performance issues above are the result of "carelessness". :confused:

Also, if we follow the philosophy, clamp MUST NOT convert the "return" type.

@timholy (Member) commented Mar 20, 2020

Certainly the specialization in #179 (comment) makes sense. The one in #179 (comment) is much more dangerous, since currently clamp(x, -Inf, Inf) succeeds for all FixedPoint values but that definition would throw an error since Inf does not have a representation as FixedPoint. We could, however, define clamp_fast and make @fastmath do semantic replacement (this would presumably require a PR to Base). It might take a little work to get it to hoist the conversion for broadcasting, though.

Cases like @fastmath clamp.(A * B, lo, hi) would be trouble, though, if A and B don't have the same element type. This could get complex pretty quickly. The most robust approach is to pass lo and hi of the correct type manually.
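The hazard with bound conversion can be seen directly: the generic promoting clamp handles infinite bounds fine, while converting an out-of-range bound to a fixed-point type throws. A sketch, assuming FixedPointNumbers.jl is loaded:

```julia
using FixedPointNumbers

x = 0.5N0f8

# The generic clamp promotes to Float64, so infinite bounds are harmless;
# the result is a Float64:
clamp(x, -Inf, Inf)

# But a definition that converts the bounds first would throw, because
# Inf has no N0f8 representation:
try
    N0f8(Inf)
catch err
    err isa ArgumentError  # conversion failure, not a clamped value
end
```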

@kimikage (Collaborator)
I merged PR #194. If we intend to improve this further I will keep this issue open; otherwise I will close it.

@johnnychen94 (Collaborator, Author)
Thanks for working on this. #194 looks like a good-enough improvement to me.
