WIP/RFC/Scary: allow signed Normed numbers #143

Draft · wants to merge 1 commit into base: master
Conversation

@timholy (Member) commented Nov 27, 2019

This is a partial implementation of a speculative change, allowing signed variants of Normed. For example, S7f8 would represent the range -128.502 to 128.498; like N0f8 it is still scaled by 255, but also has 1 sign bit and 7 bits to store absolute values bigger than 1.
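
For concreteness, here is a minimal numeric sketch of the proposed encoding (the helper name is hypothetical; the real type would wrap the raw Int16 in a FixedPoint subtype):

# S7f8 stores a raw Int16; like N0f8, the represented value is raw/255.
s7f8_value(raw::Int16) = raw / 255

s7f8_value(typemin(Int16))  # ≈ -128.502, the most negative S7f8
s7f8_value(typemax(Int16))  # ≈  128.498, the most positive S7f8
s7f8_value(Int16(255))      # 1.0, matching typemax(N0f8)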

The motivation here is to support differences; it's related to #126 and #41, and to coming up with a good accumulator type in CheckedArithmetic.

Speculatively, we could use S7f8 for subtraction (and maybe addition) with N0f8. Pros: computing differences among pixels in images is a common operation, and it would be nice for this to be less of a gotcha than it is currently. Cons: this is against convention in the Julia ecosystem, and in opposition to #27. Thoughts? I'd especially like to hear from @vchuravy in addition to other recent contributors.

@kimikage (Collaborator) commented Nov 27, 2019

Wow, this is a great challenge!
I agree with the aim and concept, but this is a dramatic change, you know. So, I cannot immediately determine whether the benefits of generalization exceed those of specialization.

From a practical point of view, since N0f8 cannot be converted to 8-bit signed Normed, I don't think it has much merit. For addition / subtraction, N8f8 can be used instead of S7f8, and Float32 is suitable for many other operations.
Of course, I'll do all I can for this challenge.

Edit:
Perhaps adding SNormed{<:Signed} instead of Normed{<:Signed} may alleviate my anxiety.
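
For illustration, a minimal sketch of what an isolated SNormed might look like, following the package's raw-value constructor convention (everything here is hypothetical):

using FixedPointNumbers

# A separate signed type leaves every existing Normed{<:Unsigned} method untouched.
struct SNormed{T <: Signed, f} <: FixedPoint{T, f}
    i::T
    # the second argument 0 tags raw-value construction, as elsewhere in the package
    SNormed{T,f}(i::Integer, _) where {T <: Signed, f} = new{T,f}(i % T)
end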

@timholy (Member, Author) commented Nov 27, 2019

From a practical point of view, since N0f8 cannot be converted to 8-bit signed Normed, I don't think it has much merit... For addition / subtraction, N8f8 can be used instead of S7f8

Not entirely. I don't fully subscribe to what follows, but for the purposes of argument let me make the extreme case. Consider:

julia> 0x01 - 0x02
0xff

If magnitude matters, it's bad when smaller - small = big. This is a much more serious problem than 0xff + 0x01 == 0x00, because it's much more important to be able to accurately measure numbers close to 0 than numbers close to typemax(T). (We often use small integers on a 64-bit machine, but rarely do we use integers close to ±2^63.) A very common operation is to look at the difference between images, e.g., to detect changes in a scene, and the default arithmetic rules here make this sufficiently fraught that newbies are almost guaranteed to write buggy code. Somewhat along the lines of https://stackoverflow.com/questions/10168079/why-is-size-t-unsigned, one could argue for banning unsigned types from arithmetic altogether.

With this line of thinking, there are two options: (1) deliberately lose one bit of precision, and for an 8-bit image demote it to 7-bit and then use the extra bit as a sign bit (so that 8-bit images get loaded as S0f7); (2) for any arithmetic operation involving unsigned types, immediately go to the next wider signed type. In option (2), when doing any mathematics at all, we promote Nmfn (where m+n is 8, 16, or 32) to S(2m+n-1)fn so that we get a sign bit with no loss of precision; as a bonus we can also store larger values, but that's not really the main goal. Note that arithmetic with Spfq (where p + q is 7, 15, or 31) does not promote, consistent with the rest of Julia. This is just an unsigned-types-are-bad-news decision, not a decision to widen all arithmetic operations.
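
To make option (2) concrete with raw integers (the helper below is purely illustrative; the real rule would return the signed Normed type):

# N0f8 holds a UInt8 scaled by 255; widening the raw values to Int16 before
# subtracting gives the exact signed difference, still implicitly scaled by 255.
widened_sub(x::UInt8, y::UInt8) = Int16(x) - Int16(y)

widened_sub(0x01, 0x02)  # -1, i.e. -1/255, rather than 0xff (255/255)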

Again, I'm making the "extreme" case to show why this is worth considering. A more balanced view would point out two important negatives:

  • it's different from the rest of Julia
  • when applied to arrays, it causes the memory requirement to double

The latter is a problem, but the fact that it's limited to a doubling (and not, say, 8x) and that RAM is cheap means I don't think it's fatal. I think the former is the more serious disadvantage of such a change.

If you ask my honest opinion, I am not sure. I support the fact that operations with UInt8 return UInt8. I am less sure that I support operations with N0f8 returning N0f8; to me it seems that no one chooses UInt8 without thinking about it, but for image processing I bet most users don't choose an element type at all, they take whatever arrives when you load the image from disk.

@timholy (Member, Author) commented Nov 27, 2019

We should make sure that major contributors to/watchers of JuliaImages are aware of this discussion: CC @kmsquire, @rsrock, @SimonDanisch, @Evizero, @tejus-gupta, @annimesh2809, @zygmuntszpak, @RalphAS, @Tokazama, @juliohm, @bjarthur, @ianshmean, @jw3126, @tlnagy. Sorry for the spam (and sorry if I omitted someone who might want to know).

@johnnychen94 (Collaborator) commented Nov 27, 2019

I'm skeptical of its practical usage.

I am less sure that I support operations with N0f8 returning N0f8

Unless all operations are done in N0f8 without conversion, which is impossible as far as I can see, this feature has little practical use. A more common situation at present is:

# not returning N0f8 for N0f8 input
img_in = load(imgfile) # N0f8
img_temp = foo1(img_in) # N0f8->Float32
img_out = foo2(img_temp) # Float32
img_out = n0f8(img_out) # Float32->N0f8

Note that there are only two conversions.

But if it became:

# returning N0f8 for N0f8 input
img_in = load(imgfile) # N0f8
img_temp = foo1(img_in) # N0f8->Float32->N0f8
img_out = foo2(img_temp) # N0f8->Float32->N0f8

There are four conversions now, two of which are unnecessary. If I understand it right, what you're trying to do here is purely replacing Float32 with signed normed numbers. If it's about saving memory usage, Float16 might be a better choice (although its current performance is said to be poor).

I once opened an issue related to what I said here (preserving type): JuliaImages/ImageCore.jl#82.

@vchuravy (Collaborator) commented

It's been a while since I thought deeply about FixedPointNumbers and the difference between Fixed and Normed.

Cons: this is against convention in the Julia ecosystem, and in opposition to #27.

I see there are two orthogonal concerns.

  1. Is a number normed or fixed point. This is a subtle concern that tripped me up in the past when trying to unify these two concepts. Both have fractional bits, but the meaning differs slightly.

  2. Is there a sign bit.

From a representational perspective I don't think there is any strong reason not to have signed normed numbers, although looking at my comment history I once remarked that I no longer had a strong reason for having unsigned fixed point numbers.

The normed numbers have always had an odd place in the ecosystem, since I see them primarily as storage numbers and not as computational numbers.
Then there is the question of semantics, which for normed numbers mostly derives from what is needed in the Color ecosystem. As far as I can tell, if one does computation on them, one either wants saturating computation or needs to change representation. But the semantics we have for Normed are overflow/underflow, and that makes the least sense in image calculations.
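
For comparison, a minimal sketch of saturating subtraction on the raw values (illustrative only, not part of the package):

# Clamp at zero instead of wrapping around: 1 - 2 gives 0x00, not 0xff.
saturating_sub(x::UInt8, y::UInt8) = x < y ? 0x00 : x - y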

In conclusion I don't have strong opinions and only existential questions. It might make sense to divorce fixed and normed and make normed an explicit color representational/computational type instead of having to worry about general semantics.

@timholy (Member, Author) commented Nov 27, 2019

From @johnnychen94:

A more common situation at present is:

# not returning N0f8 for N0f8 input
img_in = load(imgfile) # N0f8
img_temp = foo1(img_in) # N0f8->Float32

and @vchuravy:

The normed numbers always had an odd place in the ecosystem, since I see them primarily as storage numbers and not as computational numbers.

Yep. Both of these express the notion that the current values are mostly storage and not computation. If you already think this way, there's little advantage to this proposal.

However, we do support computation with such values. Since I think I may not have been sufficiently clear, let me walk you through an interesting experiment. This is one that someone might do to compare two images, or look for frame-by-frame temporal differences in a movie:

julia> using ImageView, TestImages

julia> mri = testimage("mri");

julia> diff1 = mri[:,:,1:end-1] - mri[:,:,2:end];

julia> diff2 = float.(mri[:,:,1:end-1]) - float.(mri[:,:,2:end]);

julia> imshow(diff1); imshow(diff2)

It's worth doing this on your own, but to make life easy here are the first frames from each:
[images d1 and d2: the first frames of diff1 and diff2]

The second one is "correct," with mid-gray encoding no difference. Or you can use colorsigned from ImageCore:
[images d4 and d3: the same comparison rendered with colorsigned]

(magenta is positive and green is negative)

Now, if you're used to calling float(img) before doing any computations, there is nothing to worry about. But new users are often bothered by this kind of overflow (e.g., @johnnychen94, you got bitten by this in JuliaImages/ImageCore.jl#63).

To the specifics of this proposal: aside from defining & supporting the type (which, as @kimikage indicated, is a burden), the change that would "fix" these issues is a single line, which could be written:

-(x::N0f8, y::N0f8) = S7f8(Int16(x.i) - Int16(y.i), 0)

This is definitely a breaking change, but it's not a huge computational burden:

julia> minus1(x, y) = x - y   # this is essentially what we do now
minus1 (generic function with 1 method)

julia> minus2(x, y) = Int16(x) - Int16(y)    # this is what we could do if we had S7f8
minus2 (generic function with 1 method)

When passed UInt8s, the difference between these two amounts to just "padding" with a byte of all zeros:

julia> @btime minus1(x, y) setup=(x = rand(UInt8); y = rand(UInt8))
  1.240 ns (0 allocations: 0 bytes)
0x22

julia> @btime minus2(x, y) setup=(x = rand(UInt8); y = rand(UInt8))
  1.239 ns (0 allocations: 0 bytes)
187

So in contrast to what @johnnychen94 said above, there isn't much/any of a conversion overhead to worry about. (For comparison, float(x) is more expensive, clocking in at 2ns.)

Now, just about the only operations this would affect are + and -; as soon as you multiply by an AbstractFloat you convert to floats and the rest of the computations would not be affected.

So, to recap, the costs of this are:

  • more breaking changes
  • more types to support
  • a doubling in size of images if you add or subtract them

and the benefits are

  • adding and subtracting will more often work the way users expect them to.

That's pretty much the bottom line, I think.

@johnnychen94 (Collaborator) commented

I'm satisfied with the "breaking change"; it's more like a bug fix to me.

An alternative way to fix this issue is making Normed a pure storage type that promotes to float or Fixed for every relevant operation. That could really save a lot of effort.

@kimikage (Collaborator) commented Nov 27, 2019

So in contrast to what @johnnychen94 said above, there isn't much/any of a conversion overhead to worry about. (For comparison, float(x) is more expensive, clocking in at 2ns.)

As I showed in past benchmarks, benchmarks of a single operation can easily mislead us. 😕

-function _convert(::Type{U}, x::Tf) where {T, f, U <: Normed{T,f}, Tf <: Union{Float32, Float64}}
+function _convert(::Type{N}, x::Tf) where {T <: Integer, f, N <: Normed{T,f}, Tf <: Union{Float32, Float64}}
     if T == UInt128 && f == 53
         0 <= x <= Tf(3.777893186295717e22) || throw_converterror(U, x)
@kimikage (Collaborator) commented on the diff:

I'm sorry, but this method is currently optimized for non-negative numbers and cannot handle negative numbers. (Pay attention to k.) Of course, this is a trivial issue, but I mean, "optimization is specialization".

@timholy (Member, Author) replied:

Yeah, I hadn't checked any of this; this is WIP.

@kimikage (Collaborator) replied:

I don't care about that much in FixedPointNumbers. However, it is generally not good to ask other packages for such verification and modification. If this is essential to the design, we should change it aggressively, but I don't think so.

@timholy (Member, Author) replied:

Not sure I understand. All I meant was that this is a draft I decided to let people know I was working on. I will fix this before we merge, if we decide to do so.

@kimikage (Collaborator) replied:

I think this is a simple problem. The current Normed is always unsigned, but your Normed is not always unsigned.

@@ -241,7 +266,8 @@
 Base.Rational{Ti}(x::Normed) where {Ti <: Integer} = convert(Ti, reinterpret(x))
 Base.Rational(x::Normed) = reinterpret(x)//rawone(x)

 # Traits
-abs(x::Normed) = x
+abs(x::Normed{<:Unsigned}) = x
+abs(x::T) where T<:Normed = T(abs(x.i), 0)
@kimikage (Collaborator) commented on the diff:

Why doesn't abs(::Normed{<:Signed}) return Normed{<:Unsigned}? (Of course, this is rhetorical.)

@timholy (Member, Author) replied:

Try this: abs(Int8(-5)). In general Julia doesn't change the number type, even to the point where abs(typemin(Int8)) returns a negative number!

@kimikage (Collaborator) replied Nov 30, 2019:

That's right. Therefore, it can be concluded that -(x::N0f8, y::N0f8) = S7f8(Int16(x.i) - Int16(y.i), 0) is not good! 😝

@timholy (Member, Author) replied:

I agree, this is in conflict with the rest of Julia. I guess we could return Normed{<:Unsigned}, since the only other option that always agrees with expectations is to widen, which also changes the type.
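
A sketch of that alternative under this PR's Normed{<:Signed}, using the raw-constructor convention from above (the method itself is hypothetical):

# Returning the unsigned counterpart makes abs total: even abs(typemin(Int16))
# gets the correct magnitude once reinterpreted as UInt16 (-32768 -> 0x8000).
Base.abs(x::Normed{Int16,f}) where {f} =
    Normed{UInt16,f}(reinterpret(UInt16, abs(x.i)), 0)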

@kimikage (Collaborator) commented Nov 29, 2019

It might be better to clarify what we need.

  • Extra bits
  • Sign bit (i.e. negative numbers)
  • Newbie-friendly promotion
  • Speed

It is premature to discuss the speed because the current Fixed and Normed are not fully optimized. (Edit: Moreover, CheckedArithmetic will provide a reasonable interface, but it is not always the fastest way. cf. #41 (comment))
If we just need the extra bits for subtractions, we can use N8f8 instead of S7f8, although it is never newbie-friendly.
With regard to image processing, it is important that there are no negative RGB colors. Thus, we should (implicitly) apply the mapping.

Edit:
From @vchuravy:

I see there are two orthogonal concerns.

  1. Is a number normed or fixed point. This is a subtle concern that tripped me up in the past when trying to unify these two concepts. Both have fractional bits, but the meaning differs slightly.
  2. Is there a sign bit.

I also agree with that. So, I think there are two possible type hierarchies:

Fraction-first

  • FixedPoint
    • AbstractFixed
      • Fixed
      • UFixed
    • AbstractNormed
      • Normed
      • SNormed

Sign-first

  • FixedPoint
    • SignedFixedPoint
      • Fixed
      • SNormed
    • UnsignedFixedPoint
      • UFixed
      • Normed
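
In code, the fraction-first hierarchy might be sketched as follows (abstract skeleton only; the names are the ones proposed above):

abstract type FixedPoint{T <: Integer, f} <: Real end  # as in the package today

# Fraction-first: split on fixed-vs-normed before signedness.
abstract type AbstractFixed{T, f}  <: FixedPoint{T, f} end
abstract type AbstractNormed{T, f} <: FixedPoint{T, f} end

# The concrete leaves would then be:
#   Fixed{T<:Signed,f}   <: AbstractFixed{T,f}    UFixed{T<:Unsigned,f} <: AbstractFixed{T,f}
#   SNormed{T<:Signed,f} <: AbstractNormed{T,f}   Normed{T<:Unsigned,f} <: AbstractNormed{T,f}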

I think it is a good idea to add the signed Normed, but Normed{<:Signed} is not good, as mentioned above.

@vchuravy (Collaborator) commented

I also agree about that. So, I think there are two types of the type hierarchy:

Yeah I have a strong preference for the first one since it emphasises the semantic difference between Normed and Fixed.

@kimikage (Collaborator) commented

Since we can only choose one for now, I agree completely. However, I think choosing the former is not denying the latter. If we explicitly choose the former, we should be responsible for the implicit nature of the latter. Normed{<:Signed} breaks the nature of the latter.

@timholy (Member, Author) commented Nov 30, 2019

With regard to image processing, it is important that there are no negative RGB colors. Thus, we should (implicitly) apply the mapping.

Not sure I fully agree here. We allow folks to construct out-of-gamut colors routinely. This is important in, e.g., computing the mean color of an image patch, where the natural implementation of mean is sum(a)/length(a). It's nice to have a type that enforces in-gamut, but that's not essential. Similar considerations apply in https://www.aristeia.com/Papers/C++ReportColumns/sep95.pdf (linked from the C++ link above).

Normed{<:Signed} is not good, as mentioned above.

I wasn't sure which place you meant, can you elaborate? I don't think anything in the concept of "normed" enforces "unsignedness," so I don't see any problem with Normed{<:Signed}. You can always check for Signed in the type parameter if you need to differentiate (which I needed to do in a couple of places in this PR, though I did not check enough of it in!).

@kimikage (Collaborator) commented

I'm sorry. I'm not good with words.

We allow folks to construct out-of-gamut colors routinely.

What I want to say is not about the gamut, but about the visualization. Since our LCDs cannot represent negative RGB colors, we need to project abstract numbers into concrete colors. That usually involves type conversion. If you do type conversion anyway, is there any reason not to convert the numbers to a more convenient type (e.g. Float32, Float16, Fixed) first? And if you do not visualize the data, what is the difference between N8f8 and S7f8 in RAM?

Although signed Normeds are useful as an intermediate storage type, I don't see the need to implement arithmetic for them.

@timholy (Member, Author) commented Nov 30, 2019

Good! This narrows the discussion down. Really, I think we have 3 options:

  1. The current approach: implement math with Normed numbers, with the results also Normed. Advantage: consistent with the rest of Julia. Disadvantage: "wrong" results are returned for very simple operations.
  2. Declare that Normed numbers are storage only. Math always returns floating-point equivalents, e.g., +(x::N0f8, y::N0f8) = float(x) + float(y). Advantages: (1) "correct" results are always returned, although roundoff will become more annoying, and (2) considerable simplification of this package and the type landscape. Disadvantages: (1) less consistent with the rest of Julia. (2) there's a 4x blowup of size for most operations. (Float16 is too slow to be a practical result type.) (3) For some algorithms, like median filtering and histogram equalization, there are particularly efficient implementations that rely on the discretization provided by 8-bit values.
  3. Implement signed variants of Normed numbers. Advantages: "correct" results are usually returned (barring overflow), and roundoff won't happen since the underlying types are integral. Disadvantages: (1) less consistent with the rest of Julia, (2) more types we have to support, (3) like option 2, discretization becomes less of an advantage even by moving to 16-bit types (there are more bins you have to check).

@timholy (Member, Author) commented Nov 30, 2019

In comparing 2 vs 3, perhaps the key question is whether the extra complexity is worth it to optimize a small number of operations (basically, addition and subtraction). For reference,

julia> function mysum(acc, A)
           @inbounds @simd for a in A
               acc += a
           end
           return acc
       end
mysum (generic function with 1 method)

julia> using FixedPointNumbers

julia> A = collect(rand(N0f8, 100));

julia> using BenchmarkTools

julia> Base.:(+)(x::N56f8, y::N0f8) = N56f8(x.i + y.i, 0)

julia> @btime mysum(zero(N56f8), A)
  23.733 ns (1 allocation: 16 bytes)
50.255N56f8

julia> @btime mysum(0.0f0, A)
  23.544 ns (1 allocation: 16 bytes)
50.254906f0

julia> @btime mysum(zero(N56f8), A)
  22.852 ns (1 allocation: 16 bytes)
50.255N56f8

julia> @btime mysum(0.0f0, A)
  23.782 ns (1 allocation: 16 bytes)
50.254906f0

so at least on my machine the performance difference seems to be within the noise. (I am surprised, TBH.)

Can others try this benchmark and report back?

@kimikage (Collaborator) commented Nov 30, 2019

These trends apply not only to x86_64 but also to ARM (with NEON).

julia> versioninfo()
Julia Version 1.0.3
Platform Info:
  OS: Linux (arm-linux-gnueabihf)
  CPU: ARMv7 Processor rev 4 (v7l)
  WORD_SIZE: 32
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, cortex-a53)

julia> @btime mysum(zero(N56f8), A)
  1.219 μs (2 allocations: 32 bytes)
47.776N56f8

julia> @btime mysum(0.0f0, A)
  1.698 μs (1 allocation: 8 bytes)
47.776474f0

Of course, the 32-bit accumulator is faster.

julia> Base.:(+)(x::N24f8, y::N0f8) = N24f8(x.i + y.i, 0)

julia> @btime mysum(zero(N24f8), A)
  757.864 ns (2 allocations: 16 bytes)
47.776N24f8

@timholy (Member, Author) commented Nov 30, 2019

OK, so there are architectures where it makes a 2x difference. Is that (along with its domain of applicability, which is basically summation/differencing) enough to matter?

@kimikage (Collaborator) commented Nov 30, 2019

I think this is a compiler issue rather than a CPU architecture or CPU performance issue. Of course, compiler-friendly types are better. However, is that really what you wanted to know with the benchmark?

Edit:
I mainly support option 1, but I do not object to adding isolated signed Normed numbers.
I am strongly against option 2. If you do that, you should split FixedPointNumbers into FixedPointTypes and FixedPointArithmetic.

Edit2:
My other concern is that FixedPointNumbers is immature. The packages which cannot follow this change can use the current (old) version, but who will support backporting in the future? Generalization is not so simple; we easily (innocently) undo it (see: #139).

@kimikage (Collaborator) commented Dec 1, 2019

There are two orthogonal concepts for accumulation.

  • Using Float for the accumulator
  • Converting each element to Float
julia> function mysum_raw(acc, A)
           @inbounds @simd for a in A
               acc += reinterpret(a)
           end
           return acc / FixedPointNumbers.rawone(eltype(A))
       end

This is just conceptual code, not practical code. This difference might be noticeable for large (more practical) arrays.

It is premature to discuss the speed because the current Fixed and Normed are not fully optimized.

@timholy (Member, Author) commented Dec 1, 2019

I don't think most algorithms should dive into this level of detail about their element types. Your code above should be equivalent to mysum(zero(N56f8), A) (on a 64-bit machine) in all respects except the result type, and of course it can be converted to Float64 depending on what you do with it next. Relying on Julia's abstraction properties means we can write generic algorithms that don't have to be hand-tuned for each element type, without paying a performance penalty. But what we decide here will end up determining what that inner loop looks like.

@kimikage (Collaborator) commented Dec 1, 2019

My conceptual code is never, never, never practically good. Is it related to this discussion? 😕
I don't want to explain the problems of your benchmark code in detail.

@timholy (Member, Author) commented Dec 1, 2019

I mainly support option 1, but I do not object to adding isolated signed Normed numbers.
I am strongly against option 2. If you do that, you should split FixedPointNumbers into FixedPointTypes and FixedPointArithmetic.

Is it related to this discussion?

#143 (comment) is, in my mind (flaws included), a benchmark for whether 3 provides any advantage over 2. Given that you don't like option 2, and that at least on some architectures 3 does perform better than 2, it seems our main options are 1 and 3. Option 3 would allow one to fix #41, #126, JuliaImages/ImageCore.jl#63. Option 1 is the status quo.

Can we agree on 3 as a reasonable consensus solution?

@kimikage (Collaborator) commented Dec 1, 2019

First of all, a wrong benchmark misleads us. Since sum is a special case of addition, it is not fair. Moreover, the optimizer still has room for improvement. Do you want to "redesign" every time the compiler changes?

I think the main problem of #41 (and JuliaImages/ImageCore.jl#63 ?) will be solved by @checked in the near future. I think #126 can be solved without Option 3.

What are the specific problems which Option 1 cannot solve?

@timholy (Member, Author) commented Dec 1, 2019

will be solved by @checked in the near future

I don't doubt you already know everything I'm about to say, so I may not be understanding your true question. But let me try: @checked supports "normal" arithmetic with the extra feature that it throws an error if the results overflow. For any nontrivial image pair, @checked img1 - img2 would result in an error. Is that a solution? I guess it depends on your perspective. Would one always want @checked arithmetic? Doubtful, since image-processing is performance-sensitive. So what is @checked for? Mainly tests and debugging, I think. It only works on surface syntax; to go "deeper" we'd need a separate package, CheckedFixedPointNumbers, that checks all operations by default. I am fairly certain that there isn't a good mechanism that (1) lets you have just one type, (2) works with precompilation, (3) allows you to switch between checked and unchecked arithmetic, and (4) doesn't also incur a performance penalty. And again, the only thing that buys you is an error when overflow is detected.
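
For reference, a minimal sketch of the surface-syntax behavior, assuming @checked rewrites - into a checked subtraction as described (shown here with plain UInt8, which Base's checked arithmetic already covers):

using CheckedArithmetic

UInt8(1) - UInt8(2)           # 0xff: silent wraparound
@checked UInt8(1) - UInt8(2)  # throws OverflowError instead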

Let me try to get others to chime in by making the options here crystal clear:

  1. we can have img1 - img2 work for arbitrary images but use "wraparound" behavior. This is the status quo.
  2. we can have img1 - img2 throw a MethodError, forcing users to write float.(img1) - float.(img2). This will avoid the "wraparound" behavior. Implementing this option would simply delete the code supporting arithmetic from FixedPointNumbers.
  3. we can have img1 - img2 promote eltypes automatically to something that doesn't typically wrap around. This comes in two flavors:
    a. promote to float for all operations
    b. promote to Normed{<:Signed} for at least addition and subtraction

My hope in submitting this PR was that option 3b would "thread the needle" and be an acceptable option to everyone. I think that's pretty clearly not the case. If I understand the voting so far, it looks like this (:+1: means approve, :-1: means disapprove, nothing means relatively neutral):

              | Option 1: status quo | Option 2: force conversion | Option 3a: promotion to float | Option 3b: promotion to Normed{<:Signed}
@kimikage     | 👍                   | 👎                         |                               | 👎
@johnnychen94 | 👎                   |                            |                               |
@timholy      | 👍                   | 👎                         | 👎                            | 👍
@ianshmean    | 👎                   |                            |                               | slight 👍
@tlnagy       | 👎                   | 👎                         |                               | 👍

If I've misunderstood people's votes, please correct me. How do others vote?

@kimikage (Collaborator) commented Dec 2, 2019

Your explanation is very easy to understand. 👍

Option 2:
force conversion: 👍
delete the code supporting arithmetic: 👎 (Edit: I think the MethodError is not user-friendly. We can communicate with the users by means of an error message.)

I don't know much about the costs of education, but generally it's not bad to be able to manage your own workspace yourself.

Whether the answer is correct or not is determined by the specifications, not by the absence of "overflow". FixedPointNumbers has no way of knowing the user's specifications. Wait for someone to create FormalMethods.jl. However, the higher the level, the closer to the users.

In the first place, where do Normed numbers come from in image processing? Do the users really create them? I think breaking or narrowing the entry pathway is a common tactic. I can't agree with this much, but it is better than Option 3.

@timholy (Member, Author) commented Dec 2, 2019

To clarify, option 2 = "force conversion" is equivalent to "delete the code supporting arithmetic." The only way to force users to convert to some other type before doing arithmetic is for arithmetic on Normed types to throw an error. Option 2 is the "Normed types are for storage only" option. So your votes on Option 2 are in conflict with one another. If you don't want to delete the code supporting arithmetic, you have to decide what type arithmetic returns, leaving options 1, 3a, and 3b.

where do Normed numbers come from in image processing

img = load("mypicture.png"). They are what we return for any integer-encoded image (e.g., 8- and 16-bit grayscale and color images, where for color 8/16 means per channel). The majority of users never make a conscious decision about the element type they receive; nor have many read the documentation on FixedPointNumbers.

I've checked every package in JuliaRegistries/General that depends on FixedPointNumbers. The only one that isn't related to image processing or colors is https://github.com/JuliaAudio/SampledSignals.jl, which from their README would apparently benefit from Normed{<:Signed}.

@johnnychen94 (Collaborator) commented Dec 2, 2019

Option 2: neutral 😑
I'm only unsatisfied with the status quo. Option 2 is what users of FixedPointNumbers.jl can do to solve this issue at present without touching this package. I can see the importance of Option 3 (especially Option 3b), but since I'm not familiar with the implementation details and don't care much about the sizes at present, I really don't have much to comment on.

What are the specific problems which Option 1 cannot solve?

At the test stage, ReferenceTests.jl already brings us a tool to examine the numerical results on different types. The problem with Option 1 is that we can't guarantee numerical accuracy without tests.

Plug-and-play is a very common style when we're processing images. As a playground, we generally don't write tests. When wraparound behavior happens undetected, we tend to conclude that "my processing pipeline should be modified", and that's actually quite a severe misdirection in many research scenarios.

Where do Normed numbers come from in image processing? Do the users really create them?

I only manually convert Float32 to Normed when I need to benchmark different algorithms -- taking advantage of the inaccuracy of the representation -- just to have a more stable benchmark result, e.g.,

julia> float_x = 0.1
0.1

julia> float_y = 0.103
0.103

julia> N0f8(float_x) - N0f8(float_y)
0.0N0f8

@timholy (Member, Author) commented Dec 2, 2019

@johnnychen94, thanks for clarifying; I've updated your votes in the table accordingly.

One thing I should also make sure is understood: Python's scikit-image uses Option 1; Matlab uses Option 1 but uses saturating arithmetic. If we choose something different, we're blazing a different path. But we're already doing that with FixedPointNumbers anyway; other frameworks that support both "fixed point" (integer) images and floating-point images are content to live with the equation 255 == 1.0 (sometimes), whereas we are not. Moreover, both Matlab & Python have to make this decision in the context of general operations on arrays with a very limited palette of element types, whereas we're in the situation of having an image-specific element type and hence have greater freedom to set our own rules. Consequently, I don't think it's at all crazy to do something "better" than what they do, if indeed it is possible to do anything that is better.

Consequently I understand @johnnychen94's dissatisfaction with the current status quo, even if I also acknowledge that it's one of the most self-consistent options we have.

@kimikage (Collaborator) commented Dec 2, 2019

@timholy:

To clarify, option 2 = "force conversion" is equivalent to "delete the code supporting arithmetic."

I understand and clearly disagree with Option 2.

The only way to force users to convert to some other type before doing arithmetic is for arithmetic on Normed types to throw an error.

I cannot completely agree with that. I just proposed another way. (The way is not very good, though.)

I've checked every package in JuliaRegistries/General that depends on FixedPointNumbers.

Great survey! However, isolated signed Normed and Normed{<:Signed} should be clearly distinguished. The former should not be detrimental to the users (except for the unnatural naming), but the latter is a breaking change, even though it may be a minor change for you. Again, "optimization is specialization".

@johnnychen94:

The problem with Option 1 is that we can't guarantee numerical accuracy without tests.

That's right! Therefore, I think testing is very important. A program with no specifications or tests is a poem. Although I by no means hate poetry, it is not the basis for software design.
If the errors mislead us, we should consider throwing correct and useful errors first. I don't think this is an obligation of low-level packages like FixedPointNumbers. Of course, it is good to improve FixedPointNumbers for the better errors.


This is just a thought experiment. What about NormedFloat instead of Normed{<:Signed}? I can assure you that the fully-optimized NormedFloat is not inferior to Normed{<:Signed} in terms of speed and size. However, I don't think it is good. What is the advantage of Normed{<:Signed} over NormedFloat?

Edit:
BTW, I may be somewhat harsh or rude, because it is a little stressful for me to express my (abstract) thoughts in English and I'm afraid that the discussion will go on while I am translating. Forgive me for being so rude. How much time do we have for this discussion?

@timholy (Member, Author) commented Dec 2, 2019

This is just a thought experiment. What about NormedFloat instead of Normed{<:Signed}? I can assure you that the fully-optimized NormedFloat is not inferior to Normed{<:Signed} in terms of speed and size. However, I don't think it is good. What is the advantage of Normed{<:Signed} over NormedFloat?

What is NormedFloat?

How much time do we have for this discussion?

It can go on paralyzing development forever, probably 😉. Seriously, we need to reach some kind of consensus here. Why don't you propose what you think we should do, rather than just argue against what I am proposing? AFAICT the only solution currently is the status quo. I can live with that but I'm not happy with it. Perhaps you feel you have proposed a solution, but if so I didn't understand it.

EDIT: To be concrete, I was hoping to put in a grant on Dec. 17th to try to fund work on JuliaImages. It would be great to reach consensus well before that.

@timholy (Member, Author) commented Dec 2, 2019

I understand and clearly disagree with Option 2.

I see you edited your post to clarify. Yes, it's fine to throw an informative message. I would have done it that way; I was being a bit casual by saying we'd just delete them. Delete the content, I should have specified.

IIUC, given a friendly error message you are favorable on Option 2, no? I am confused by the reply above, however. I'll amend your votes once I understand your views correctly.

@kimikage (Collaborator) commented Dec 2, 2019

What is NormedFloat?

I'm sorry for the lack of explanation, but it is so simple that no explanation is needed. NormedFloat means Normed{T<:AbstractFloat, f}. (Of course, it is no longer a part of FixedPointNumbers. So, I did not call it Normed.) As a matter of fact, I have already proposed this concept.
We can use Normed{Float16, 8} instead of Normed{Int16, 8}.
Regarding the addition and subtraction of N0f8 numbers (i.e. "safe" operations), Normed{Float16,8} is empirically estimated to be ~2x slower than (unoptimized) S7f8. If the difference of speed is like that, I empirically assume that the bottlenecks are memory allocations and memory accesses, and that the practical speeds will be approximately the same. Of course, the sizes are the same.
Well, this is just a thought experiment. I don't think the promotion to Normed{<:AbstractFloat} is good, either.
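
To make the thought experiment concrete, a minimal sketch (NF16f8 is a hypothetical stand-in for Normed{Float16,8}):

# The raw value is a Float16 holding a scaled integer; integer arithmetic in
# Float16 is exact up to 2^11, which covers add/sub of 8-bit raw values.
struct NF16f8
    i::Float16  # represented value is i/255
end
Base.:+(x::NF16f8, y::NF16f8) = NF16f8(x.i + y.i)
Base.:-(x::NF16f8, y::NF16f8) = NF16f8(x.i - y.i)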

@kimikage (Collaborator) commented Dec 2, 2019

Perhaps you feel you have proposed a solution, but if so I didn't understand it.

I don't think it is nice, but narrowing the entry pathway is one solution to "force conversion", isn't it?

julia> img = load("mypicture.png")
┌ Warning: Please specify the pixel type, e.g., `load(Float32, "mypicture.png")`. You are using `N0f8` now.
└ @ Foo
2×2 Array{RGB{N0f8},2} with eltype RGB{Normed{UInt8,8}}:
 RGB{N0f8}(0.867,0.157,0.169)  RGB{N0f8}(0.984,0.133,0.851)
 RGB{N0f8}(0.855,0.984,0.945)  RGB{N0f8}(0.553,0.549,0.133)

Again, I don't think this is nice. However, I'm not against this because this is the approach which is closer to the users than the low-level FixedPointNumbers.

@kimikage (Collaborator) commented Dec 2, 2019

My biggest concern is that there are very few people participating in this discussion, rather than what the final decision will be.

If you are convinced that signed Normed is great, why not complete the implementation of signed Normed without breaking changes? If it is really great, everyone will extend the current Normed. And then, there should be a movement to promote to signed Normed.
Of course, it might be too late for the funding.

I do not intend to inhibit it maliciously. However, I think we should do #139 and support @checked first. Unfortunately, developing them in parallel would cause conflicts.

@IanButterworth commented
Just FYI, I have been following this, but I’ve not felt like my thoughts are yet robust enough to contribute.

Perhaps I’d just say a few things:

  • silent wraparound keeps me up at night, so I do like the idea of dealing with this
  • I love the idea of finding a type promotion system that maintains minimal lossless precision throughout operations (3b), but figuring out the types of each possible combination sounds like a big undertaking, and it would be interesting to see what the benefits in performance and data integrity were over float.

I guess it seems like some more fleshed-out working case studies could help frame the discussion?

HTH!

@timholy (Member, Author) commented Dec 2, 2019

NormedFloat

I can't see any advantage in NormedFloat: why have an implicit /255 in a type that can represent the result directly? The main purpose of these types is to harmonize the values of "integer-valued" and "floating-point valued" images, which are treated differently in most image-processing frameworks (e.g., https://scikit-image.org/docs/stable/user_guide/data_types.html). This package harmonizes them by, for example, providing a number type with values ranging from 0 to 1 that's representable in 8 bits. I don't see how NormedFloat contributes to that, since you can just do the division directly.

but narrowing the entry pathway is one solution to "force conversion"

We can't force conversion of a 1TB image. My lab uses those all the time, via memory-mapping.

However, I suppose we could change all the image-loading routines to return a MappedArray using of_eltype. That would lazily convert to Float32 representation upon usage. That would make the rules of arithmetic in this package essentially irrelevant. Thoughts?
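
For concreteness, a sketch of the lazy approach with MappedArrays (a random array stands in for a loaded or memory-mapped image):

using FixedPointNumbers, MappedArrays

img  = rand(N0f8, 256, 256)     # stand-in for a loaded/memory-mapped image
imgf = of_eltype(Float32, img)  # lazy view: elements convert on access
imgf[1,1] - imgf[1,2]           # arithmetic happens in Float32, no wraparound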

If you are convinced that signed Normed is great, why not complete the implementation of signed Normed without breaking changes?

Yes, we could create the type and then decide how to use it later. I just don't feel like creating types for the fun of it, I'd rather be sure there's a path towards them being useful, hence the proposal to change the rules of arithmetic.

Breaking changes aren't a disaster if they are managed carefully; we can bound all packages that use FixedPointNumbers to 0.x versions, and then the packages that want to keep up can install CompatHelper to learn about the new approach. Breaking changes are acceptable especially for a 1.0 release; we broke just about everything in the Julia 0.6->1.0 transition. We already have some breaking changes queued, though any change in arithmetic rules would be a much larger change.

So for me the point of this discussion was to determine whether we can find a better way. I am thinking we can't, so perhaps I should abandon this direction.

@timholy (Member, Author) commented Dec 2, 2019

@kimikage, I may have been remiss in not directing you to this earlier, but if you haven't read it yet I highly recommend reading https://juliaimages.org/latest/arrays_colors/, particularly starting at https://juliaimages.org/latest/arrays_colors/#fixedpoint-1 and reading to the end of the page. It's worth comparing to that scikit-image page I linked above, particularly the section titled "Input types" which, you'll note, forces one to change the value when you want to change the representation. I apologize for the lack of a Japanese translation, I am aware that it makes it much more difficult. I would not fare well if the only documentation were available in Japanese 🤷‍♂️ .

@kimikage (Collaborator) commented Dec 2, 2019

In the first place, please don't put the blame on my lack of knowledge. It is my personal problem that I am stubborn, so you can blame just me any amount. However, my (or our) claim is that we should keep it consistent with the rest of Julia. No matter how much I learn, the Julia language will never change unless I take action.

@tlnagy commented Dec 2, 2019

However, I suppose we could change all the image-loading routines to return a MappedArray using of_eltype. That would lazily convert to Float32 representation upon usage. That would make the rules of arithmetic in this package essentially irrelevant. Thoughts?

This change would fix the problem, but we would lose lossless precision in doing so, and it would increase the cost of accessing a view of the image, since you would need to do this conversion every time. Do we have benchmarks for how much lazy conversions like this would affect common operations?

After reading through this thread, here are my thoughts:

  • Option 1: 👎 I know first-hand that this isn't very user-friendly for new users, as is well discussed earlier in this thread. That said, it's gotten us this far, but it's problematic.
  • Option 2: 👎 This is very user-unfriendly even if we provide an explanation for what to do. Things should just work, even 3a is preferable to this.
  • Option 3a: 😐 Eagerly: not realistic for large images*. Lazily: see my comments above.
  • Option 3b: 👍 I quite like this solution. I think that if the benchmarks for realistic image sets are good, then I would prefer this solution since it would balance the tradeoffs between the different options.

* I think I've had to resort to Option 3a (recasting image eltype to float) in my own code, which I'm not happy with (ref: https://github.com/tlnagy/SegmentationTools.jl/blob/36d6b98236cf62880e171059547fd55f9f11e9b4/src/utils.jl#L39).

Consequently, I don't think it's at all crazy to do something "better" than what they do, if indeed it is possible to do anything that is better.

I would go further and say that we would be doing the greater imaging community a disservice if we didn't try to develop something better, given all the problems with the status quo and that we are in a unique position to build it.

So for me the point of this discussion was to determine whether we can find a better way. I am thinking we can't, so perhaps I should abandon this direction.

I hope not! I like the bones here; the status quo could be better, and changes like these are why I love the JuliaImages ecosystem :)

@johnnychen94 (Collaborator) commented Dec 2, 2019

FYI, I used a lot of MappedArrays.of_eltype to do the lazy conversion in ImageDistances.jl, and the routine I take there is:

  1. channelview
  2. of_eltype: N0f8 -> Float32
  3. real computation: e.g., Euclidean
  4. of_eltype: Float32 -> N0f8
  5. colorview
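
A minimal sketch of that routine for grayscale images (lazy_op is a hypothetical helper; the actual code in ImageDistances.jl differs in detail):

using ImageCore, MappedArrays  # channelview/colorview come from ImageCore

function lazy_op(f, img)
    cv   = channelview(img)        # 1. strip the color wrapper
    cvf  = of_eltype(Float32, cv)  # 2. lazy N0f8 -> Float32
    outf = f(cvf)                  # 3. real computation in Float32
    out  = of_eltype(N0f8, outf)   # 4. lazy Float32 -> N0f8
    return colorview(Gray, out)    # 5. rewrap as colors
end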

And in the test stage, I use ReferenceTests.jl to compare the results of all possible combinations of Bool, N0f8, Float32, Gray, RGB, Lab (6 types for Gray, 2 for RGB and 1 for Lab), just to make sure all input types get the "same" results. With regard to @checked, I think ReferenceTests is more powerful and applicable (e.g., it keeps a record on disk, and it supports custom distances such as the L0 norm, which is useful for checking the results of binarization algorithms).

This approach is definitely not good in performance, but at that moment (having been bitten by the wraparound behavior) I just wanted to guarantee accuracy first.

This discussion is becoming too long to catch up with, so I hope consensus at some level can be reached soon, and I'd like to contribute to where I'm capable of.

@kimikage (Collaborator) commented Dec 2, 2019

I can't see any advantage in NormedFloat

NormedFloat doesn't have to be good, because it is a spoiler candidate, a touchstone. Please tell us the advantages of Normed{<:Signed}, not the disadvantages of NormedFloat. At least, NormedFloat is the same as Normed{<:Signed} in the following ways:

Implement signed variants of Normed numbers. Advantages: "correct" results are usually returned (barring overflow), and roundoff won't happen since the underlying types are integral. Disadvantages: (1) less consistent with the rest of Julia, (2) more types we have to support, (3) like option 2, discretization becomes less of an advantage even by moving to 16-bit types (there are more bins you have to check).

(Edit: Of course, Normed{Float16,8}.i is integral (in add/sub operations). It can handle 11-bit-depth pixels losslessly.)


Edit2:

We can't force conversion of a 1TB image. My lab uses those all the time, via memory-mapping.

So, you would have a narrower gate, right? An important point in this approach is to prevent the (new) users from using N0f8 carelessly. Since you are actually going to use S7f8, there is no difference after passing the gate.


Breaking changes aren't a disaster if they are managed carefully;

Ideally. Please face the facts (#139). We are only interested in what we need. I hate potential minor bugs. They are more vicious than obvious malfunctions.

I don't care about that much in FixedPointNumbers. However, it is generally not good to ask other packages for such verification and modification. If this is essential to the design, we should change it aggressively, but I don't think so.


Please give me some time to study. My claims have already been written, so it's okay to conclude this discussion during my absence.

@kimikage (Collaborator) commented Dec 6, 2019

I do not mean to be rude, but the supporters of "3b" had better clarify the promotion rules. (cf. #147 (comment))
Normed has many types other than N0f8 and N0f16, even though the other types are not often used in image processing.

@Tokazama commented Dec 6, 2019

Been following this but don't have the bandwidth right now to help think upon solutions too much. The only thing I really care about on this is that:

  1. Type conversion doesn't become a nightmare. I'm sure a motivated individual could figure out a complex set of promotion rules given a bunch of types, but that's always obnoxious.
  2. It doesn't affect speed in downstream plotting functions.

I think both of these have been addressed to some degree, so the only other thing I'd add is that I only use fixed point numbers as a storage type passed to graphics libraries. Most of the time it doesn't make sense for an MRI to be directly represented as a fixed point number.
