Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FloatRange: new type for accurate floating-point ranges [fix #2333] #5636

Merged
merged 4 commits into from
Feb 24, 2014

Conversation

StefanKarpinski
Copy link
Member

This algorithm for floating-point ranges magically reads your mind.

  1. It defaults to the plain [ r.start + k*r.step for k = 0:r.len ] behavior by setting len = s and divisor = 1.
  2. It attempts to "lift" the iteration to a level where the step is integral by rationalizing s, using a customized rationalize function.
  3. If the lifted form matches the original (start,step,stop) tuple, then we use lifted iteration, stepping by an integer amount and projecting back down to get a perfect floating point range. Otherwise, it falls through to the default behavior.
  4. Once the FloatRange parameters are determined, iteration and indexing are quite simple: the values of the range are [ (r.start + k*r.step)/r.divisor for k = 0:r.len ].

@JeffBezanson
Copy link
Member

Very cool! Does this generally give the same values as linspace? I don't know if that's necessary, just wondering.

@timholy
Copy link
Member

timholy commented Feb 1, 2014

Can we have a few more such algorithms? I'm thinking of functions like analyzemydataforme and plotmydatasoitlookscool. Thanks in advance!

@jiahao
Copy link
Member

jiahao commented Feb 1, 2014

+1 for writemypaper

@kmsquire
Copy link
Member

kmsquire commented Feb 1, 2014

+1 for writemypaper

I think that's in the forthcoming VirtualPostDoc package that Stefan is writing.

@StefanKarpinski
Copy link
Member Author

@JeffBezanson, re matching linspace – no. For example:

julia> Base.showcompact(io::IO, x::Float64) = show(io,x)
showcompact (generic function with 8 methods)

julia> [0.3:0.1:1.1 linspace(0.3,1.1,9)]
9x2 Array{Float64,2}:
 0.3  0.3
 0.4  0.4
 0.5  0.5
 0.6  0.6000000000000001
 0.7  0.7000000000000001
 0.8  0.8
 0.9  0.9
 1.0  1.0000000000000002
 1.1  1.1

As you can see, even though linspace is better than the naïve approach, the cancellation doesn't always work out the way you'd like. This was a bit of an epiphany for me when working on this: the sliding average between the beginning and the end is a bit of a red herring – if you get the lifting right, you're working entirely with integral values and it becomes unnecessary.

@JeffBezanson
Copy link
Member

Maybe linspace should be implemented using ranges after this.
On Feb 1, 2014 10:42 AM, "Stefan Karpinski" [email protected]
wrote:

@JeffBezanson https://github.com/JeffBezanson, re matching linspace- no. For example:

julia> Base.showcompact(io::IO, x::Float64) = show(io,x)showcompact (generic function with 8 methods)
julia> [0.3:0.1:1.1 linspace(0.3,1.1,9)]9x2 Array{Float64,2}:
0.3 0.3
0.4 0.4
0.5 0.5
0.6 0.6000000000000001
0.7 0.7000000000000001
0.8 0.8
0.9 0.9
1.0 1.0000000000000002
1.1 1.1

As you can see, even though linspace is better than the naïve approach,
the cancellation doesn't always work out the way you'd like. This was a bit
of an epiphany for me when working on this: the sliding average between the
beginning and the end is a bit of a red herring - if you get the lifting
right, you're working entirely with integral values and it becomes
unnecessary.

Reply to this email directly or view it on GitHubhttps://github.com//pull/5636#issuecomment-33874713
.

@StefanKarpinski
Copy link
Member Author

No, I think linspace has a pretty clear and well-defined behavior that we should respect. We may, however want to take a look at how the linear interpolation is done. For example, I wonder if doing the division by n after adding the two halves wouldn't fix some cancellation issues.

@JeffBezanson
Copy link
Member

@timholy analyzemydataforme already exists; it's called svd :-P

@timholy
Copy link
Member

timholy commented Feb 1, 2014

Man, I'd looked everywhere for that function. Thanks!

@stevengj
Copy link
Member

stevengj commented Feb 1, 2014

Since the next{T}(r::FloatRange{T}, i) uses a floating-point division, is the performance difference compared to start+i*step noticeable for simple loops?

@StefanKarpinski
Copy link
Member Author

I haven't looked at performance at all, so I'm not sure. I would imagine it's a bit slower.

@stevengj
Copy link
Member

stevengj commented Feb 3, 2014

function loop1(start,step,n)
    sum = 0.0
    for i = 0:n-1
        sum += start + i*step
    end
    sum
end
function loop2(start,step,divisor,n)
    sum = 0.0
    for i = 0:n-1
        sum += (start + i*step)/divisor
    end
    sum
end
@time loop1(0.0,0.001,10^7)
@time loop2(0.0,1.0,1000.0,10^7);

gives

elapsed time: 0.009202138 seconds (80 bytes allocated)
elapsed time: 0.067068429 seconds (80 bytes allocated)

on my computer: about a factor of 7 penalty for the divisions in a simple loop.

@StefanKarpinski
Copy link
Member Author

Yeah, that seems about right. Is it worth it for more accurate/intuitive results? Keep in mind that floating-point ranges with a unit step don't have to pay the price.

@JeffBezanson
Copy link
Member

A unit step is probably not the common case for float ranges.
On Feb 3, 2014 5:44 PM, "Stefan Karpinski" [email protected] wrote:

Yeah, that seems about right. Is it worth it for more accurate/intuitive
results? Keep in mind that floating-point ranges with a unit step don't
have to pay the price.

Reply to this email directly or view it on GitHubhttps://github.com//pull/5636#issuecomment-34009907
.

@StefanKarpinski
Copy link
Member Author

That's quite true. So the question is whether we care more about indexing performance for float ranges or accuracy/intuitiveness. Note that using linspace has similar performance issues.

@ivarne
Copy link
Member

ivarne commented Feb 4, 2014

I really like the more accurate ranges, and think the performance penalty is well worth it. Getting the wrong number of iterations in a loop because of roundoff errors, is surprising to most beginners and also for lots of more seasoned programmers. I have not figured out is why we want to use step_length instead of num_steps for a floating point range, but integer range similarity and Matlab compatibility is probably a good enough reason.

There is a fine balance for where we should try to do what the user intended, and where we should give single instruction performance. The overflow behaviour for integer is an example where we (for now) have opted for the performant "non obvious" choice.

@kmsquire
Copy link
Member

kmsquire commented Feb 4, 2014

I agree with Ivar in that I like the more accurate ranges. If this became
a bottleneck for someone, a FastRange (or FastButInaccurateRange) type
could always be introduced (perhaps in a package).

Kevin

On Mon, Feb 3, 2014 at 4:33 PM, Ivar Nesje [email protected] wrote:

I really like the more accurate ranges, and think the performance penalty
is well worth it. Getting the wrong number of iterations in a loop because
of roundoff errors, is surprising to most beginners and also for lots of
more seasoned programmers. I have not figured out is why we want to use
step_length instead of num_steps for a floating point range, but integer
range similarity and Matlab compatibility is probably a good enough reason.

There is a fine balance for where we should try to do what the user
intended, and where we should give single instruction performance. The
overflow behaviour for integer is an example where we (for now) have opted
for the performant "non obvious" choice.


Reply to this email directly or view it on GitHubhttps://github.com//pull/5636#issuecomment-34018364
.

@ivarne
Copy link
Member

ivarne commented Feb 4, 2014

FastButInaccurateRange is more characters to write than to write the naive algorithm

(a, b, c) = (0.0, 0.1, 1.0)
for i in FastButInaccurateRange(a, b, c)
    #use i
end
# or
for ind in 1:int(c / b)
    i = a + b*ind
    #use i
end

@StefanKarpinski
Copy link
Member Author

You don't even need any of that – you can just explicitly construct a regular Range object whose fields are floats. You just don't get to use the colon syntax to construct it.

@kmsquire
Copy link
Member

kmsquire commented Feb 4, 2014

I hadn't noticed that you created a new type...

@StefanKarpinski
Copy link
Member Author

Yes. Ultimately floating-point ranges and "ordinal ranges" just shouldn't be handled in the same way. This is work towards solving the rangepocalypse umbrella issue.

@nalimilan
Copy link
Member

This really looks beautiful!

@StefanKarpinski, why do you think linspace() shouldn't give the same results? Are there cases where the algorithm you wrote for ranges would give annoying results for linspace()? (On R mailing lists, every month somebody comes asking why e.g. 0.3 %in% seq(0.1, 1.0, length.out=10) is FALSE. Even people with basic knowledge about the fact that floating point computations are not exact get trapped.)

@StefanKarpinski
Copy link
Member Author

I suppose that linspace could be made to work like this as well, but it's a bit stranger because linspace has such a clearly defined behavior: give me n linearly interpolated points between this start value and this end value.

@JeffBezanson
Copy link
Member

This range implementation is great, but a 7x slowdown does give me pause. It shouldn't be possible to beat the standard library by that much with such a simple-but-annoying rewrite. We don't want to have a large catalog of faster-version-of-X types of things.

@StefanKarpinski
Copy link
Member Author

I wonder how many situations there really are where floating-point range indexing is a bottleneck. Seems like a very strange thing to have as a bottleneck to me.

@JeffBezanson
Copy link
Member

Yes, I doubt it would be a bottleneck by itself, but what can happen is you might use 10 pieces of the language/library, collectively using 60% of run time, and each piece is 4-5x slower than it could be. It starts to make a big dent if we do this in several key components. Some more realistic benchmarks would be helpful though.

@StefanKarpinski
Copy link
Member Author

As much as we don't want a large catalog of "faster-version-of-X types of things", we also don't want a large catalog of "correct-version-of-X types of things". In the case of floating-point ranges, getting unexpected values is a perennial problem, whereas I've just never encountered a situation where I really cared that much about the performance of indexing into or iterating over them. That doesn't mean it's not an issue, but it's just not clear to me that this is a place where we should prioritize performance over ease-of-use.

@stevengj
Copy link
Member

stevengj commented Feb 5, 2014

For what it's worth, I tend to agree with @StefanKarpinski that the tradeoff is worth it here. Otherwise roundoff errors make floating-point ranges so annoying that I end up just not using them.

@GunnarFarneback
Copy link
Contributor

I agree too. This is just the place where you want magic, even if it's somewhat slow magic.

@simonbyrne
Copy link
Contributor

While this is a great idea, I think we should be very careful when using the terms "correct" and "accurate": these ranges are really "the best rounded, rational approximation to the range that you specified".

Arguably, a linspace-type interpretation (n equally spaced points, where n is chosen such that the actual step is closest to the step that you specified) would be an equally valid interpretation of correctness in this case.

On a related note, our linspace isn't strictly correct either:

julia> Base.showcompact(io::IO, x::Float64) = show(io,x)
showcompact (generic function with 8 methods)

julia> [linspace(0.3,1.1,9) float64(linspace(big(0.3),big(1.1),9))]
9x2 Array{Float64,2}:
 0.3                 0.3               
 0.4                 0.4               
 0.5                 0.5               
 0.6000000000000001  0.6               
 0.7000000000000001  0.7000000000000001
 0.8                 0.8               
 0.9                 0.9               
 1.0000000000000002  1.0               
 1.1                 1.1               

and yes, it is true that the exact midpoint of 0.3 and 1.1 does not round to 0.7...

@simonbyrne
Copy link
Contributor

Also, it seems that using any step <= 1e-19 causes it to get stuck in a loop.

@StefanKarpinski
Copy link
Member Author

Yeah, that's the kind of thing that's holding me back from merging it. It still has some issue. I also think that for this to be merged, it needs to have a pretty well-defined behavior, so there's that too. But it's getting there and it already behaves better than any other float range implementation that I've seen.

@StefanKarpinski
Copy link
Member Author

Also, it seems that using any step <= 1e-19 causes it to get stuck in a loop.

Just pushed a patch that ought to fix this. I had it in my workspace but forgot to update rat.

@JeffBezanson JeffBezanson added this to the 0.3 milestone Feb 17, 2014
@StefanKarpinski
Copy link
Member Author

This failure [1] is completely mystifying to me. If anyone has any bright ideas as to why this patch would make convert(Type{Float16}, Float32) not get found even though it exists, I'm all ears.

[1] https://travis-ci.org/JuliaLang/julia/jobs/19165376

@kmsquire
Copy link
Member

Looks like the actual error is being masked:

julia> convert(Float16, 1.0f0)
float16(Evaluation succeeded, but an error occurred while showing value of type Float16:
ERROR: no method display(Float16)
 in display at multimedia.jl:158

@JeffBezanson
Copy link
Member

This bug is surely unrelated to this change.

@StefanKarpinski
Copy link
Member Author

The display error isn't actually the issue. I filed an issue for the actual problem: #5885.

StefanKarpinski added a commit that referenced this pull request Feb 24, 2014
These additional operations are needed to get the random and linalg
tests closer to passing for the FloatRange change [#5636]. We really
need to make a decision about how first-class we want Float16 to be.
I'm starting to suspect that we should just bite the bullet and make
all the things that work for other floating-point types work for the
Float16 type also. It's just easier than having them be half-broken
and try to explain that they're "just for storage".
This addresses the core behvaioral problems of #2333 but doesn't yet
hook up the colon syntax to constructing FloatRange objects [#5885].
There's a slight chance that computing `stop` with integers and then
converting to floating point would work but doing to computation in
floating point would not give the correct answer. This tests actual
values – with the correct type – that will be used.
Obviously we don't want to leave things like this, but with this
work around, we can merge the FloatRange branch and figure out the
root cause of #5885 later.
@StefanKarpinski
Copy link
Member Author

Ok, this now finally works. I'm gonna merge it.

StefanKarpinski added a commit that referenced this pull request Feb 24, 2014
FloatRange: new type for accurate floating-point ranges [fix #2333]
@StefanKarpinski StefanKarpinski merged commit 9bf8d96 into master Feb 24, 2014
@StefanKarpinski StefanKarpinski deleted the sk/floatrange branch February 24, 2014 03:05
@JeffBezanson
Copy link
Member

Do we want the "snapping" behavior here?

julia> 1.0:1.0:((0.3 - 0.1) / 0.1)
1.0:1.0:1.0

The old Range and Range1 give 1.0:2.0 here.

@StefanKarpinski
Copy link
Member Author

It's hard to argue that 1.9999999999999998 should be considered equal to 2. I don't think we should try to infer what arbitrary floating-point computations were meant to compute. Taken to its logical conclusion, that would mean that any value could have been meant to be any other value. What's the rule that allows this? Always snap to the nearest integer? The current behavior guesses start, step and stop values that are ratios of 24-bit integers and give the exact floating-point values given. That seems like a reasonable and well-defined place to stop guessing.

@JeffBezanson
Copy link
Member

I can live with that. But the rule is very simple: range lengths are normally rounded down, but if the computed length is within ~3 ulps of the next larger integer it is rounded up instead.

@ivarne
Copy link
Member

ivarne commented Mar 19, 2014

Why 3 and not 4 ulps?

What about very short step lengths 1.:eps():1+10eps() or large values above maxintfloat() 9007199254740990.:1.:9007199254740994?

Should the rounding be relative to the step (eg. round up if (round_error < step/10000))?

@JeffBezanson
Copy link
Member

The rounding is done against (stop-start)/step.

@StefanKarpinski
Copy link
Member Author

I think the point is that all of those things are arbitrary, whereas there is nothing about this FloatRange behavior that is arbitrary (other than limiting the rations to 24-bit integers).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants