
FloatRange: new type for accurate floating-point ranges [fix #2333] #5636

Merged
merged 4 commits into master from sk/floatrange
Feb 24, 2014

Conversation

StefanKarpinski
Sponsor Member

This algorithm for floating-point ranges magically reads your mind.

  1. It defaults to the plain [ r.start + k*r.step for k = 0:r.len ] behavior by setting step = s and divisor = 1.
  2. It attempts to "lift" the iteration to a level where the step is integral by rationalizing s, using a customized rationalize function.
  3. If the lifted form matches the original (start,step,stop) tuple, then we use lifted iteration, stepping by an integer amount and projecting back down to get a perfect floating point range. Otherwise, it falls through to the default behavior.
  4. Once the FloatRange parameters are determined, iteration and indexing are quite simple: the values of the range are [ (r.start + k*r.step)/r.divisor for k = 0:r.len ].
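The iteration in step 4 can be sketched as follows (Python for illustration only; the real implementation is the Julia code in this PR, and the function name is invented). The key property is that after lifting, start, step, and divisor are all integral, so the numerator is exact and the only rounding happens in the single final division:

```python
def float_range_values(start, step, length, divisor):
    # After lifting, start/step/divisor are integers, so start + k*step
    # is exact; each element incurs exactly one correctly rounded division.
    return [(start + k * step) / divisor for k in range(length + 1)]

# 0.3:0.1:1.1 lifts to start = 3, step = 1, divisor = 10, len = 8
# (k runs over 0:len, giving 9 values):
vals = float_range_values(3, 1, 8, 10)
```

Because (3 + k)/10 is correctly rounded, each element is exactly the double you would get by writing the literal 0.3, 0.4, ..., 1.1.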

@JeffBezanson
Sponsor Member

Very cool! Does this generally give the same values as linspace? I don't know if that's necessary, just wondering.

@timholy
Sponsor Member

timholy commented Feb 1, 2014

Can we have a few more such algorithms? I'm thinking of functions like analyzemydataforme and plotmydatasoitlookscool. Thanks in advance!

@jiahao
Member

jiahao commented Feb 1, 2014

+1 for writemypaper

@kmsquire
Member

kmsquire commented Feb 1, 2014

+1 for writemypaper

I think that's in the forthcoming VirtualPostDoc package that Stefan is writing.

@StefanKarpinski
Sponsor Member Author

@JeffBezanson, re matching linspace – no. For example:

julia> Base.showcompact(io::IO, x::Float64) = show(io,x)
showcompact (generic function with 8 methods)

julia> [0.3:0.1:1.1 linspace(0.3,1.1,9)]
9x2 Array{Float64,2}:
 0.3  0.3
 0.4  0.4
 0.5  0.5
 0.6  0.6000000000000001
 0.7  0.7000000000000001
 0.8  0.8
 0.9  0.9
 1.0  1.0000000000000002
 1.1  1.1

As you can see, even though linspace is better than the naïve approach, the cancellation doesn't always work out the way you'd like. This was a bit of an epiphany for me when working on this: the sliding average between the beginning and the end is a bit of a red herring – if you get the lifting right, you're working entirely with integral values and it becomes unnecessary.
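The lifting Stefan describes can be sketched with a stock rational-approximation routine. Here Python's `Fraction.limit_denominator` stands in for the PR's customized rationalize function, and the 2^24 bound and exactness check are assumptions drawn from the discussion, not the Base code:

```python
from fractions import Fraction
from math import lcm

def lift(start, step, max_int=2**24):
    # Find small integers a, s, d with start == a/d and step == s/d.
    # limit_denominator is a stand-in for the PR's customized rationalize;
    # this is a sketch, not the actual Julia implementation.
    fs = Fraction(start).limit_denominator(max_int)
    ft = Fraction(step).limit_denominator(max_int)
    if float(fs) != start or float(ft) != step:
        return None  # lifting failed: fall back to naive iteration
    d = lcm(fs.denominator, ft.denominator)
    return (fs.numerator * (d // fs.denominator),
            ft.numerator * (d // ft.denominator),
            d)

# 0.3 with step 0.1 lifts to (3, 1, 10): iterate (3 + k*1)/10 exactly.
```

The exactness check is what makes this "mind reading" safe: lifted iteration is used only when projecting the rational form back to floating point reproduces the user's inputs bit-for-bit.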

@JeffBezanson
Sponsor Member

Maybe linspace should be implemented using ranges after this.

@StefanKarpinski
Sponsor Member Author

No, I think linspace has a pretty clear and well-defined behavior that we should respect. We may, however, want to take a look at how the linear interpolation is done. For example, I wonder if doing the division by n after adding the two halves wouldn't fix some cancellation issues.
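The kind of interpolation tweak being floated here might look like the following sketch (this is not Base's linspace of the time; the weighting scheme and the choice to divide after combining the two halves are the point under discussion):

```python
def lerp_points(start, stop, n):
    # Each point is a weighted average of the endpoints, with the
    # division performed last, after the two halves are combined.
    # A sketch of the idea being discussed, not the Base implementation.
    return [((n - 1 - i) * start + i * stop) / (n - 1) for i in range(n)]

pts = lerp_points(0.3, 1.1, 9)
```

With this form the endpoints are reproduced exactly, since the zero-weighted term vanishes and the remaining multiply and divide by n-1 cancel under correct rounding when n-1 is a power of two.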

@JeffBezanson
Sponsor Member

@timholy analyzemydataforme already exists; it's called svd :-P

@timholy
Sponsor Member

timholy commented Feb 1, 2014

Man, I'd looked everywhere for that function. Thanks!

@stevengj
Member

stevengj commented Feb 1, 2014

Since the next{T}(r::FloatRange{T}, i) uses a floating-point division, is the performance difference compared to start+i*step noticeable for simple loops?

@StefanKarpinski
Sponsor Member Author

I haven't looked at performance at all, so I'm not sure. I would imagine it's a bit slower.

@stevengj
Member

stevengj commented Feb 3, 2014

function loop1(start,step,n)
    sum = 0.0
    for i = 0:n-1
        sum += start + i*step
    end
    sum
end
function loop2(start,step,divisor,n)
    sum = 0.0
    for i = 0:n-1
        sum += (start + i*step)/divisor
    end
    sum
end
@time loop1(0.0,0.001,10^7)
@time loop2(0.0,1.0,1000.0,10^7);

gives

elapsed time: 0.009202138 seconds (80 bytes allocated)
elapsed time: 0.067068429 seconds (80 bytes allocated)

on my computer: about a factor of 7 penalty for the divisions in a simple loop.

@StefanKarpinski
Sponsor Member Author

Yeah, that seems about right. Is it worth it for more accurate/intuitive results? Keep in mind that floating-point ranges with a unit step don't have to pay the price.

@JeffBezanson
Sponsor Member

A unit step is probably not the common case for float ranges.

@StefanKarpinski
Sponsor Member Author

That's quite true. So the question is whether we care more about indexing performance for float ranges or accuracy/intuitiveness. Note that using linspace has similar performance issues.

@ivarne
Sponsor Member

ivarne commented Feb 4, 2014

I really like the more accurate ranges, and I think the performance penalty is well worth it. Getting the wrong number of iterations in a loop because of roundoff errors is surprising to most beginners, and to plenty of more seasoned programmers too. What I have not figured out is why we want to specify a step length instead of a number of steps for a floating-point range, but similarity to integer ranges and Matlab compatibility is probably a good enough reason.

There is a fine balance between trying to do what the user intended and giving single-instruction performance. The overflow behaviour for integers is an example where we have (for now) opted for the performant, non-obvious choice.
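The iteration-count surprise Ivar mentions is easy to reproduce (Python for illustration; `naive_len` is a hypothetical name for the textbook length formula):

```python
import math

def naive_len(start, step, stop):
    # Textbook length formula; breaks under roundoff (hypothetical helper).
    return int(math.floor((stop - start) / step)) + 1

# (0.3 - 0.1) / 0.1 evaluates to 1.9999999999999998 in binary floating
# point, so the naive range 0.1:0.1:0.3 gets 2 elements instead of the
# expected 3 -- exactly the beginner trap described above.
count = naive_len(0.1, 0.1, 0.3)
```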

@kmsquire
Member

kmsquire commented Feb 4, 2014

I agree with Ivar in that I like the more accurate ranges. If this became a bottleneck for someone, a FastRange (or FastButInaccurateRange) type could always be introduced (perhaps in a package).


@ivarne
Sponsor Member

ivarne commented Feb 4, 2014

FastButInaccurateRange takes more characters to write than the naive algorithm itself:

(a, b, c) = (0.0, 0.1, 1.0)
for i in FastButInaccurateRange(a, b, c)
    #use i
end
# or
for ind in 1:int(c / b)
    i = a + b*ind
    #use i
end

@StefanKarpinski
Sponsor Member Author

You don't even need any of that – you can just explicitly construct a regular Range object whose fields are floats. You just don't get to use the colon syntax to construct it.

@kmsquire
Member

kmsquire commented Feb 4, 2014

I hadn't noticed that you created a new type...

@StefanKarpinski
Sponsor Member Author

Yes. Ultimately floating-point ranges and "ordinal ranges" just shouldn't be handled in the same way. This is work towards solving the rangepocalypse umbrella issue.

@nalimilan
Member

This really looks beautiful!

@StefanKarpinski, why do you think linspace() shouldn't give the same results? Are there cases where the algorithm you wrote for ranges would give annoying results for linspace()? (On R mailing lists, every month somebody comes asking why e.g. 0.3 %in% seq(0.1, 1.0, length.out=10) is FALSE. Even people with basic knowledge about the fact that floating point computations are not exact get trapped.)

@StefanKarpinski
Sponsor Member Author

I suppose that linspace could be made to work like this as well, but it's a bit stranger because linspace has such a clearly defined behavior: give me n linearly interpolated points between this start value and this end value.

@JeffBezanson
Sponsor Member

This range implementation is great, but a 7x slowdown does give me pause. It shouldn't be possible to beat the standard library by that much with such a simple-but-annoying rewrite. We don't want to have a large catalog of faster-version-of-X types of things.

@StefanKarpinski
Sponsor Member Author

I wonder how many situations there really are where floating-point range indexing is a bottleneck. Seems like a very strange thing to have as a bottleneck to me.

@JeffBezanson
Sponsor Member

Yes, I doubt it would be a bottleneck by itself, but what can happen is you might use 10 pieces of the language/library, collectively using 60% of run time, and each piece is 4-5x slower than it could be. It starts to make a big dent if we do this in several key components. Some more realistic benchmarks would be helpful though.

@StefanKarpinski
Sponsor Member Author

As much as we don't want a large catalog of "faster-version-of-X types of things", we also don't want a large catalog of "correct-version-of-X types of things". In the case of floating-point ranges, getting unexpected values is a perennial problem, whereas I've just never encountered a situation where I really cared that much about the performance of indexing into or iterating over them. That doesn't mean it's not an issue, but it's just not clear to me that this is a place where we should prioritize performance over ease-of-use.

@stevengj
Member

stevengj commented Feb 5, 2014

For what it's worth, I tend to agree with @StefanKarpinski that the tradeoff is worth it here. Otherwise roundoff errors make floating-point ranges so annoying that I end up just not using them.

@GunnarFarneback
Contributor

I agree too. This is just the place where you want magic, even if it's somewhat slow magic.

@simonbyrne
Contributor

While this is a great idea, I think we should be very careful when using the terms "correct" and "accurate": these ranges are really "the best rounded, rational approximation to the range that you specified".

Arguably, a linspace-type interpretation (n equally spaced points, where n is chosen such that the actual step is closest to the step that you specified) would be an equally valid interpretation of correctness in this case.

On a related note, our linspace isn't strictly correct either:

julia> Base.showcompact(io::IO, x::Float64) = show(io,x)
showcompact (generic function with 8 methods)

julia> [linspace(0.3,1.1,9) float64(linspace(big(0.3),big(1.1),9))]
9x2 Array{Float64,2}:
 0.3                 0.3               
 0.4                 0.4               
 0.5                 0.5               
 0.6000000000000001  0.6               
 0.7000000000000001  0.7000000000000001
 0.8                 0.8               
 0.9                 0.9               
 1.0000000000000002  1.0               
 1.1                 1.1               

and yes, it is true that the exact midpoint of 0.3 and 1.1 does not round to 0.7...

@simonbyrne
Contributor

Also, it seems that using any step <= 1e-19 causes it to get stuck in a loop.

@StefanKarpinski
Sponsor Member Author

Yeah, that's the kind of thing that's holding me back from merging it. It still has some issues. I also think that for this to be merged it needs to have pretty well-defined behavior, so there's that too. But it's getting there, and it already behaves better than any other float range implementation I've seen.

@StefanKarpinski
Sponsor Member Author

Also, it seems that using any step <= 1e-19 causes it to get stuck in a loop.

Just pushed a patch that ought to fix this. I had it in my workspace but forgot to update rat.

@JeffBezanson JeffBezanson added this to the 0.3 milestone Feb 17, 2014
@StefanKarpinski
Sponsor Member Author

This failure [1] is completely mystifying to me. If anyone has any bright ideas as to why this patch would make convert(Type{Float16}, Float32) not get found even though it exists, I'm all ears.

[1] https://travis-ci.org/JuliaLang/julia/jobs/19165376

@kmsquire
Member

Looks like the actual error is being masked:

julia> convert(Float16, 1.0f0)
float16(Evaluation succeeded, but an error occurred while showing value of type Float16:
ERROR: no method display(Float16)
 in display at multimedia.jl:158

@JeffBezanson
Sponsor Member

This bug is surely unrelated to this change.

@StefanKarpinski
Sponsor Member Author

The display error isn't actually the issue. I filed an issue for the actual problem: #5885.

StefanKarpinski added a commit that referenced this pull request Feb 24, 2014
These additional operations are needed to get the random and linalg
tests closer to passing for the FloatRange change [#5636]. We really
need to make a decision about how first-class we want Float16 to be.
I'm starting to suspect that we should just bite the bullet and make
all the things that work for other floating-point types work for the
Float16 type also. It's just easier than having them be half-broken
and try to explain that they're "just for storage".
This addresses the core behavioral problems of #2333 but doesn't yet
hook up the colon syntax to constructing FloatRange objects [#5885].
There's a slight chance that computing `stop` with integers and then
converting to floating point would work but doing the computation in
floating point would not give the correct answer. This tests actual
values – with the correct type – that will be used.
Obviously we don't want to leave things like this, but with this
workaround, we can merge the FloatRange branch and figure out the
root cause of #5885 later.
@StefanKarpinski
Sponsor Member Author

Ok, this now finally works. I'm gonna merge it.

StefanKarpinski added a commit that referenced this pull request Feb 24, 2014
FloatRange: new type for accurate floating-point ranges [fix #2333]
@StefanKarpinski StefanKarpinski merged commit 9bf8d96 into master Feb 24, 2014
@StefanKarpinski StefanKarpinski deleted the sk/floatrange branch February 24, 2014 03:05
@JeffBezanson
Sponsor Member

Do we want the "snapping" behavior here?

julia> 1.0:1.0:((0.3 - 0.1) / 0.1)
1.0:1.0:1.0

The old Range and Range1 give 1.0:2.0 here.

@StefanKarpinski
Sponsor Member Author

It's hard to argue that 1.9999999999999998 should be considered equal to 2. I don't think we should try to infer what arbitrary floating-point computations were meant to compute. Taken to its logical conclusion, that would mean that any value could have been meant to be any other value. What's the rule that allows this? Always snap to the nearest integer? The current behavior guesses start, step and stop values that are ratios of 24-bit integers and give the exact floating-point values given. That seems like a reasonable and well-defined place to stop guessing.

@JeffBezanson
Sponsor Member

I can live with that. But the rule is very simple: range lengths are normally rounded down, but if the computed length is within ~3 ulps of the next larger integer it is rounded up instead.
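The old rule, as Jeff describes it, might be sketched like so. The helper name and the exact ulp tolerance are assumptions, not the old Base code:

```python
import math

def old_range_length(start, step, stop, ulps=3):
    # Sketch of the old Range length rule described above: round the
    # computed length down, unless it is within ~3 ulps of the next
    # larger integer, in which case round up ("snap") instead.
    n = (stop - start) / step
    up = math.ceil(n)
    if up - n <= ulps * math.ulp(n):
        return int(up) + 1
    return int(math.floor(n)) + 1

# (0.3 - 0.1)/0.1 == 1.9999999999999998 snaps up: length 3, not 2.
```

Under this rule 0.1:0.1:0.3 gets its three expected elements, but the snapping also explains why the old Range turned Jeff's 1.0:1.0:1.9999999999999998 into 1.0:2.0.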

@ivarne
Sponsor Member

ivarne commented Mar 19, 2014

Why 3 and not 4 ulps?

What about very short step lengths 1.:eps():1+10eps() or large values above maxintfloat() 9007199254740990.:1.:9007199254740994?

Should the rounding be relative to the step (e.g. round up if round_error < step/10000)?

@JeffBezanson
Sponsor Member

The rounding is done against (stop-start)/step.

@StefanKarpinski
Sponsor Member Author

I think the point is that all of those things are arbitrary, whereas there is nothing about this FloatRange behavior that is arbitrary (other than limiting the ratios to 24-bit integers).
