Make covariance and correlation work for iterators, skipmissing in particular. #34

Open
wants to merge 20 commits into
base: master
from

@pdeffebach pdeffebach commented Apr 27, 2020

Currently cov and cor fail with iterators that are not vectors, e.g. skipmissing iterators or vectors with Iterators.Filter applied to them. This is part of the plan I have commented on at JuliaLang/julia#35050 (comment) to improve quality of life issues with missings.

Thanks to Missings.skipmissings (JuliaData/Missings.jl#111), this allows computing the correlation without missing values via cor(skipmissings(x, y)...).

Supersedes #30 because it is a more minimal implementation.

Project.toml Outdated
@@ -1,5 +1,4 @@
name = "Statistics"
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Member

Intentional?

Author

Yes. It's a hack to make sure julia knows to load this folder, it's described here for Pkg.

Member

Normally the Travis script does that automatically, so you can revert this: https://github.com/JuliaLang/Statistics.jl/blob/master/.travis.yml#L24

Though you need it to run tests locally.

@pdeffebach
Author

pdeffebach commented Apr 27, 2020

I have added the functionality we want and added tests. What's left, assuming what I've written is okay, is to disallow some things that only kind of work at the moment.

X = rand(10, 2);
y = skipmissing(rand(10));

cov(X, y)

The above works, but we can't add any of the vardim arguments that one can for cov(X::Matrix, y::Vector). I can either add the full combinations of all these methods (cov(X::AbstractMatrix, y::Any; vardim) etc.) or we can disallow them for the present.

@pdeffebach pdeffebach changed the title Initial commit, collects everywhere Make covariance and correlation work for iterators, second attempt. Apr 27, 2020

Return the number one.
"""
cor(itr::Any) = one(real(eltype(collect(itr))))
Member

Better first check whether Base.IteratorEltype(itr) isa Base.HasEltype && isconcretetype(eltype(itr)), and in that case avoid calling collect.

Also remove the docstring for AbstractVector below, which is just a special case of this one.
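
A minimal sketch of the check suggested above, assuming a hypothetical helper name `one_of_eltype` (it is not taken from the PR): the declared element type is used directly when it is known and concrete, and `collect` is only the fallback.

```julia
# Hypothetical helper, for illustration only.
function one_of_eltype(itr)
    if Base.IteratorEltype(itr) isa Base.HasEltype && isconcretetype(eltype(itr))
        # The element type is known up front, so nothing needs to be materialized.
        return one(real(eltype(itr)))
    else
        # Fallback: materialize the iterator to discover its element type.
        return one(real(eltype(collect(itr))))
    end
end

one_of_eltype(skipmissing([1.0, missing, 3.0]))  # eltype is Float64, no collect
one_of_eltype(x for x in [1, 2])                 # generators have unknown eltype, so this collects
```

The fallback branch still allocates, so this only helps iterators such as `skipmissing` wrappers that report a concrete `eltype`.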

@@ -630,7 +663,7 @@ function cov2cor!(C::AbstractMatrix, xsd::AbstractArray, ysd::AbstractArray)
end

# corzm (non-exported, with centered data)

corzm(x::Any) = corzm(collect(x))
Member

Same remark here and for corm as for cor about using the eltype when it's known.


Compute the covariance between the iterators `x` and `y`. If `corrected` is `true` (the
default), computes ``\\frac{1}{n-1}\\sum_{i=1}^n (x_i-\\bar x) (y_i-\\bar y)^*`` where
``*`` denotes the complex conjugate and `n = length(collect(x)) = length(collect(y))`. If `corrected` is
Member

Suggested change
``*`` denotes the complex conjugate and `n = length(collect(x)) = length(collect(y))`. If `corrected` is
``*`` denotes the complex conjugate and ``n`` the number of elements. If `corrected` is


Compute the variance of the iterator `itr`. If `corrected` is `true` (the default) then the sum
is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false` where
`n = length(collect(itr))`.
Member

Suggested change
`n = length(collect(itr))`.
``n`` is the number of elements.

"""
function cov(itr::Any; corrected::Bool=true)
x = collect(itr)
covm(x, mean(x); corrected=corrected)
Member

Better call covzm directly to avoid an additional copy:

Suggested change
covm(x, mean(x); corrected=corrected)
covzm(map!(t -> t - xmean, x, x); corrected=corrected)

Same for the two-argument method.
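
To illustrate the point about the extra copy, here is a rough sketch under stated assumptions: `cov_centered_inplace` is a hypothetical name, the final scaling stands in for the internal `covzm`, and the data are assumed to be floating point so the centered values fit back into the collected vector.

```julia
using Statistics

# Hypothetical stand-in for the suggested pattern: collect once, then center in place.
function cov_centered_inplace(itr; corrected::Bool=true)
    x = collect(itr)              # the only allocation
    xmean = mean(x)
    map!(t -> t - xmean, x, x)    # overwrite x with centered values, no second copy
    n = length(x)
    # Scaling of the zero-mean sum of squares, standing in for covzm.
    return sum(abs2, x) / (n - Int(corrected))
end

cov_centered_inplace(skipmissing([1.0, missing, 2.0, 4.0]))
```

With an integer-valued iterator the in-place `map!` would fail when the mean is not an integer, which is one reason this is only a sketch.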

@@ -518,16 +519,32 @@ end
# covm (with provided mean)
## Use map(t -> t - xmean, x) instead of x .- xmean to allow for Vector{Vector}
## which can't be handled by broadcast
covm(itr::Any, itrmean; corrected::Bool=true) =
@show covm(collect(itr), itrmean; corrected=corrected)
Member

Suggested change
@show covm(collect(itr), itrmean; corrected=corrected)
covm(collect(itr), itrmean; corrected=corrected)

Member

@nalimilan nalimilan left a comment

Thanks!

If these methods give correct results, then it's OK to allow it and only add keyword arguments later if needed. But they should be tested.

Can you also add tests for cov and cor?

@@ -644,9 +671,10 @@ corzm(x::AbstractMatrix, y::AbstractMatrix, vardim::Int=1) =
cov2cor!(unscaled_covzm(x, y, vardim), sqrt!(sum(abs2, x, dims=vardim)), sqrt!(sum(abs2, y, dims=vardim)))

# corm

corm(x::Any, xmean) = corm(collect(x), xmean)
Member

Also apply the eltype check here.

@pdeffebach
Author

I have added many more tests. Everything is covered.

The rule for

cov(X::Matrix, y::itr)

is that the rows of X must be observations. You can't use dims = 1 in this scenario. I experimented a bit with methods to make it work, but I ended up with never-ending method ambiguities because of the implementation.

I'm not sure how best to put that into a docstring.

Future PRs should

  1. Allow for columns to be observations with cov(X::Matrix, y::itr)
  2. Allow for iterators of iterators -- collecting them into matrices

@@ -630,7 +653,13 @@ function cov2cor!(C::AbstractMatrix, xsd::AbstractArray, ysd::AbstractArray)
end

# corzm (non-exported, with centered data)

function corzm(itr::Any)
Member

Can you put this code in an internal method which will be called by all functions that need it? It's repeated three times.

Also:

Suggested change
function corzm(itr::Any)
function corzm(itr::Any)

@bkamins
Contributor

bkamins commented Apr 30, 2020

  1. If you have two ::Any arguments you always seem to collect both. You should collect them only if they are not AbstractVector I think (the case is if you pass an AbstractVector and an iterator).
  2. Maybe we should think about special casing of AbstractArray here. They are iterable, and for example I think that in many cases allowing the operation on them is problematic (it can lead to unintuitive results) - I guess you wanted to allow iterators that are not AbstractArray right?

@nalimilan
Member

If you have two ::Any arguments you always seem to collect both. You should collect them only if they are not AbstractVector I think (the case is if you pass an AbstractVector and an iterator).

The methods mutate the result to subtract the mean, so calling collect is correct here I think unless something is missing.

Maybe we should think about special casing of AbstractArray here. They are iterable, and for example I think that in many cases allowing the operation on them is problematic (it can lead to unintuitive results) - I guess you wanted to allow iterators that are not AbstractArray right?

Yes that's something that bothered me too. Probably only vectors should be allowed to be mixed with other iterators. For other cases it's not clear what should happen so better throw an error for now.

@bkamins
Contributor

bkamins commented Apr 30, 2020

The methods mutate the result to subtract the mean

I do not think they always do. Have a look at corm implementation as an example.

@nalimilan
Member

Ah right. So we need corm(x::AbstractVector, mx, y::Any, my) and corm(x::Any, mx, y::AbstractVector, my) to avoid a copy.

@pdeffebach
Author

I was under the impression that collect was a no-op for vectors. I will add those methods.

@pdeffebach
Author

@bkamins you are right about non-allocations. But adding methods results in tons of method ambiguity errors.

To resolve this without re-thinking the whole dispatch scheme, I implemented

_lazycollect(x::Any) = collect(x)
_lazycollect(x::AbstractVector) = x

just in places where we don't modify x. If we do modify it I use collect.

This result feels hacky, but it's better than method ambiguities. This is ready for review, by Milan and hopefully by Triage.
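
A short sketch of why the helper matters, using the two `_lazycollect` definitions quoted in this comment: `collect` returns a fresh copy even for a `Vector`, whereas `_lazycollect` passes vectors through untouched and only materializes other iterators.

```julia
_lazycollect(x::Any) = collect(x)
_lazycollect(x::AbstractVector) = x

v = [1.0, 2.0, 3.0]

copied = collect(v)       # a new array with equal contents
same = _lazycollect(v)    # the very same object, no allocation

# Non-vector iterators are still materialized.
materialized = _lazycollect(skipmissing([1.0, missing, 3.0]))
```

Because `_lazycollect` can return its argument itself, it is only safe where the result is not mutated, which matches the restriction described above.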


corm(cx, mean(cx), cy, mean(cy))
end

"""
cor(x::AbstractVector, y::AbstractVector)
Member

Remove this docstring which is a special case of the previous one.

_lazycollect(x::Any) = collect(x)
_lazycollect(x::AbstractVector) = x

function _matrix_error(x, y, fun)
Member

Why not just throw an error from _lazycollect if passed a matrix? The error message will be less precise but that's not a big deal. And then you can also throw an error for any AbstractArray that isn't an AbstractVector, which is a case which isn't allowed currently and should probably remain an error.

Contributor

Yes I would just add a special method to _lazycollect for other types. Actually I would do it for AbstractArray not only AbstractMatrix. The only thing to think about is whether we want to allow 0-dimensional AbstractArrays (they would produce NaN anyway).

I think that collecting 2 or more dimensional arrays in places where we expect vectors is not useful (but we can discuss this).

Author

It doesn't work because we don't use it all the time, for instance when we collect and use map!.

I can add a _collect_if_itr_or_vec method for that scenario.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
@pdeffebach pdeffebach changed the title Make covariance and correlation work for iterators, second attempt. Make covariance and correlation work for iterators, skipmissing in particular. Apr 30, 2020
@bkamins
Contributor

bkamins commented Apr 30, 2020

Also many functions in Statistics allow passing any iterator, so it sounds consistent to allow them here too.

Yes, but these "many functions" have also a defined meaningful behaviour for AbstractArray, not only vectors and matrices - and I think this is an important distinction (so I feel it is safer to exclude AbstractArray for now other than vectors and matrices).

Actually I would have preferred a completely different design for cov/cor interface since we have eachcol/eachrow and eachslice functions to tell cov/cor along which dimensions the calculation should be made, but it is too disruptive so probably it would not be accepted anyway (but if you are interested I could write down a proposal).

Now regarding SkipMissings - I think it is needed anyway, as otherwise there is no way to tell cov/cor if we want to do e.g. pairwise or complete observations approach.

@pdeffebach
Author

(so I feel it is safer to exclude AbstractArray for now other than vectors and matrices).

Current implementation excludes cov(Any, Matrix) as well as other higher dimensional arrays.

Actually I would have preferred a completely different design for cov/cor interface since we have eachcol/eachrow and eachslice functions to tell cov/cor along which dimensions the calculation should be made,

Yes. The original attempt at #30 was essentially this, working only with iterators. I would prefer an implementation which doesn't know about matrix inputs and only cares about iterators. Taking advantage of BLAS etc. for X'X could be an implementation detail. But that is very breaking. Perhaps eachrow based workflows will dominate dims = 1 workflows in 2.0.

Now regarding SkipMissings - I think it is needed anyway, as otherwise there is no way to tell cov/cor if we want to do e.g. pairwise or complete observations approach.

This PR is motivated by the new skipmissings (note the s) in Missings.jl, which I think solves this problem.

@bkamins
Contributor

bkamins commented Apr 30, 2020

Ah - then I would also prefer iterators as you do 😄. Actually @nalimilan proposed pairwise(fun, iterator) that would do exactly this and fun can be anything (cor/cov in this case).

And I fully agree that having eachrow etc. based design rather than dims is my preference (then we could easily feed any tabular type to such functions, eg. eachcol(data_frame)).

Regarding skipmissings - yes, I have just noticed it was added but not released yet. But how do you plan to handle "pairwise complete observations" with this design cleanly?

@pdeffebach
Author

Regarding skipmissings - yes, I have just noticed it was added but not released yet. But how do you plan to handle "pairwise complete observations" with this design cleanly?

Say x contains missing values but y does not. Then

sx, sy = skipmissings(x, y)
cov(sx, sy)

@bkamins
Contributor

bkamins commented Apr 30, 2020

But the typical use case is the following setting:

m = [ missing  2        3
      5        missing  10
      11       12       missing
      91       22       15 ]

and you want to calculate correlation matrix of the columns using "pairwise complete observations".

@pdeffebach
Author

There is currently no skipmissing implementation that preserves the dimensionality of an array. So that is an open problem that will have to be solved by additions to skipmissing or an iteration focused cov implementation. (Or turning that matrix into a data frame).

@Keno
Contributor

Keno commented Apr 30, 2020

We discussed this a bit on triage and while we don't feel super qualified to comment, we felt that having independent iterators for the two arguments was likely problematic, because iterators don't in general have a strong guarantee over their ordering. E.g. cov(skipmissing(x), skipmissing(y)) would obviously be wrong. We felt that a more sensible API would take a single iterator that iterates over pairs. A related issue here is that the skipmissings API is a bit odd since it returns coupled iterators, but returns them as a tuple. It seemed like a more general design would be to have it return one iterator of pairs and then, if you only need one of them, project it down to the appropriate pair.
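
The ordering concern can be made concrete with a small sketch (the data are invented for illustration, and the pair filtering is written by hand with Base only rather than with `Missings.skipmissings`). Skipping missing values independently can leave two vectors of equal length whose elements no longer correspond, while filtering the zipped pairs keeps observations aligned.

```julia
using Statistics

# Illustrative data: y is exactly 2x wherever both values are present.
x = [1.0, missing, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, missing, 10.0]

# Independent skipping: same length, but the pairing is silently wrong.
sx = collect(skipmissing(x))   # 1, 3, 4, 5
sy = collect(skipmissing(y))   # 2, 4, 6, 10
misaligned = cor(sx, sy)       # below 1 although the complete pairs are perfectly linear

# A single iterator over pairs, filtered jointly, keeps observations aligned.
complete = [(a, b) for (a, b) in zip(x, y) if !ismissing(a) && !ismissing(b)]
cx, cy = first.(complete), last.(complete)
aligned = cor(cx, cy)
```

Here no error is raised in the misaligned case because both filtered vectors happen to have four elements, which is what makes the mistake easy to miss.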

@bkamins
Contributor

bkamins commented Apr 30, 2020

@Keno - this is exactly what needs to be done in an ideal world in my opinion (and then you can have several rules what you "pass down" to the cor function in this case).

But doing it right would require a significant API change (i.e. be breaking) - so what should be a practical approach to this? Should a breaking PR be put on the table so that it can be judged for inclusion in a 2.0 release?

An alternative is to leave cor/cov "as is" (i.e. in particular non-aware of missing) and add a more general pairwise design as proposed by @nalimilan that would be recommended for use in more complex cases.

What would be the preference here?

@Keno
Contributor

Keno commented Apr 30, 2020

I'd say design the interface you want and then figure out whether it's possible to do most of it backwards compatible. If not, stdlibs can also go to 2.0 before base Julia (that's the whole reason why we introduced them).

@bkamins
Contributor

bkamins commented Apr 30, 2020

So let me put my proposal on a table (it is essentially what I think @pdeffebach had in mind in #30). I write it for cov as for cor it is the same:

cov(itr; corrected::Bool=true, skipmissing::Symbol=:all)

Here itr is understood to be iterator of iterators to calculate the cov for. The result will be a square Matrix with length(itr) rows and columns where at index position [i,j] we keep covariance of i-th and j-th element of itr. skipmissing kwarg decides how missings should be handled (no handling at all is the default, other options are "complete cases" and "pairwise complete observations", other can be also added if we find it useful).

The second form is:

cov(itr1, itr2; corrected::Bool=true, skipmissing::Symbol=:all)

That does the same but between itr1 and itr2. The result is length(itr1) x length(itr2) matrix.

In this design we can treat everything inside itr as an iterable.

To be less breaking we can keep methods:

cov(X::SparseArrays.SparseMatrixCSC; dims, corrected)
cov(x::AbstractArray{T,1} where T<:Number; corrected)
cov(X::AbstractArray{T,2} where T<:Number; dims, corrected)
cov(x::AbstractArray{T,1} where T<:Number, y::AbstractArray{T,1} where T<:Number; corrected) 
cov(X::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T, Y::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T<:Number; dims, corrected)

(note that I have added <:Number restriction to T which is not present in Statistics.jl now). In this way we do the "sensible thing" in old cases (if a collection contains Number it does not make much sense to treat it as an iterable) and at the same time provide a general interface.

Alternatively we could allow <:Union{Missing, Number} instead of <:Number if we felt we want to allow missings in the old methods (though - as discussed here it is not super useful, as we will just produce missing in the output).

@pdeffebach
Author

Thank you for your comments. Here are my thoughts:

I think it's important to understand the purpose skipmissing serves. A researcher gets a data-set and writes functions on a subset of their data -- one without missing values.

function analyze(x, y)
    x .+ y .-  mean(y)
end

Now they move onto the rest of their data, which now has missing values. They have to go back and change their analyze function to make it work. With current behavior, they don't have to change their function analyze at all. They can call analyze(skipmissings(x, y)...) and be fine. With proposed functionality, they would have to call analyze(unzip(skipmissings(x, y))...), which is not that bad.

So skipmissings should emulate as closely as possible a workflow based off of Vectors without missing data.

That said, if the researcher wrote their analyze function with zips in mind -- iterators of tuples -- then the proposed behavior for skipmissing, which creates an iterator of Tuples, would be intuitive.

Therefore, I would want cov(itr, itr) to work unless we deprecate cov(Vector, Vector). Similarly, I would want skipmissings to return a tuple of iterators unless we feel that the dominant way of working with Vectors is by zipping them.

@nalimilan
Member

See previous discussion at JuliaStats/StatsBase.jl#343.

I think there are legitimate use cases for the current design of skipmissings as @pdeffebach noted above. However the question of how cor and cov should allow skipping missing values (this PR) is semi-independent from that of skipmissings: we can provide a more convenient API for cor and cov but still need skipmissings for other functions which do not offer a convenient way to skip missing values.

Also, allowing to pass any iterator to cov and cor could be useful for other cases than skipping missings. For example, cor((log(x) for x in X), (log(y) for y in Y)) could be used to compute the log correlation without allocating a temporary copy. Though maybe that's not a big need, and if we introduced an AbstractIndexable supertype or a trait for AbstractArray and Broadcasted (JuliaLang/julia#31020 (comment)) we could restrict the signature to it, and one would write instead e.g. @lazy cor(log.(X), log.(Y)).

Now, regarding the cor and cov API, @bkamins's proposal to add a skipmissing keyword argument is one solution. But it doesn't address the very basic case that this PR is about, which is to compute the correlation between just two vectors (as opposed to pairwise correlation between multiple variables). More generally, I don't think we can drop the current behavior of cor(::Vector, ::Vector), and having both this method and cor(itr1, itr2) would be confusing (if we keep it in the long term).

So an alternative I had in mind is to introduce pairwise(fun, itr1[, itr2]; skipmissing) to compute pairwise correlation between multiple variables, which would be called as e.g. pairwise(cor, eachcol(X), skipmissing=:obs) or pairwise(cor, eachcol(X), eachcol(Y), skipmissing=:complete). The advantage is that it would also work for Distances.jl, which has a compatible API. The drawback is that the former is relatively verbose for a very common operation, if you compare with R's cor(X, use="complete") or Stata's pwcorr, casewise. So we could allow cor(X, skipmissing=:complete) for convenience.
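
As a rough sketch of the pairwise idea, assuming a hypothetical helper `pairwise_complete` (the name, signature and behaviour are illustrative only and are not the proposed `pairwise` API): for every pair of columns, only the rows where both values are present are kept before the function is applied. The matrix is the one from the "pairwise complete observations" example earlier in the thread.

```julia
using Statistics

# Hypothetical helper: apply f to every pair of columns, using only the
# observations that are non-missing in both columns of the pair.
function pairwise_complete(f, cols)
    cols = collect(cols)
    n = length(cols)
    out = Matrix{Float64}(undef, n, n)
    for i in 1:n, j in 1:n
        keep = [!ismissing(a) && !ismissing(b) for (a, b) in zip(cols[i], cols[j])]
        xi = Float64[a for (a, k) in zip(cols[i], keep) if k]
        xj = Float64[b for (b, k) in zip(cols[j], keep) if k]
        out[i, j] = f(xi, xj)
    end
    return out
end

m = [missing 2 3; 5 missing 10; 11 12 missing; 91 22 15]
C = pairwise_complete(cor, eachcol(m))
```

Each off-diagonal entry here is computed from a different subset of rows, which is exactly what distinguishes pairwise handling from dropping every row that contains a missing value.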

Probably this shouldn't be discussed here... :-) BTW, another design challenge is to allow combining this with weights to allow computing weighted pairwise correlation while skipping missing values. Composability would be great to have in that case.

@@ -504,6 +516,10 @@ function covzm(x::AbstractMatrix, vardim::Int=1; corrected::Bool=true)
A .= A .* b
return A
end
function covzm(x::Any, y::Any; corrected::Bool = true)
Contributor

In what case can covzm get Any? It is an internal method and I thought it can only get already processed data.

Contributor

The same question applies to covm below.

"""
function cov(itr::Any; corrected::Bool=true)
Contributor

Do we want to allow 0 or more than 2 dimensional arrays here?


Return the number one.
"""
cor(itr::Any) = _return_one(itr)
Contributor

If we touch this part of code then I do not understand the following (this is in general a separate PR, but we implement this method here, so I would like to clarify what is intended):

julia> x = rand(10, 2)
10×2 Array{Float64,2}:
 0.281236  0.0338547
 0.691944  0.830649
 0.627939  0.62187
 0.251539  0.162161
 0.649065  0.627302
 0.67754   0.227709
 0.904292  0.481443
 0.768511  0.439196
 0.56268   0.885131
 0.520348  0.0185026

julia> y = collect(eachrow(x))
10-element Array{SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true},1}:
 [0.28123641854321346, 0.03385469973171129]
 [0.6919437304763723, 0.8306494675149141]
 [0.6279387501997036, 0.6218700109315964]
 [0.2515386883173856, 0.16216070470557398]
 [0.6490648176650291, 0.6273015737594985]
 [0.6775400549397448, 0.22770867432380815]
 [0.9042917317032804, 0.4814426050177454]
 [0.7685107010600403, 0.4391959677108144]
 [0.562680390776801, 0.8851305746997942]
 [0.5203479821836228, 0.018502575073925165]

julia> cov(x)
2×2 Array{Float64,2}:
 0.0409994  0.0321085
 0.0321085  0.0983311

julia> cov(y)
2×2 Array{Float64,2}:
 0.0409994  0.0321085
 0.0321085  0.0983311

julia> cor(x)
2×2 Array{Float64,2}:
 1.0       0.505691
 0.505691  1.0

julia> cor(y)
ERROR: MethodError: no method matching zero(::Type{SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true}})

and I do not understand why cov and cor behave differently in how they handle x and y.

@bkamins
Contributor

bkamins commented May 2, 2020

OK - if we want to go forward with the API proposal in this PR I have left some comments related only to it.

@bkamins
Contributor

bkamins commented May 2, 2020

Ah - and a general question, since cov accepts Vector{Vector} as a single argument and with new design "iterable of Vector" is accepted then do we want to accept "iterable of iterables"?

@CameronBieganek

We discussed this a bit on triage [...] We felt that a more sensible API would take a single iterator that iterates over pairs.

I just became aware of this method of cor:

"""
cor(x::AbstractVector)

Return the number one.
"""

This basically torpedoes any idea of having a cor(itr) method where itr iterates individual observations. If we added that, then the following two calculations would return different numbers:

itr = ((1, 4), (3, 2), (5, 8), (7, 6))
cor(itr)
itr = [(1, 4), (3, 2), (5, 8), (7, 6)]
cor(itr)

Given this unfortunate situation, perhaps triage would reconsider allowing cor(itr1, itr2)? (I mean where itr1 and itr2 just iterate numbers.)

@aplavin
Contributor

aplavin commented Dec 2, 2023

stdlibs can also go to 2.0 before base Julia (that's the whole reason why we introduced them)

So, maybe removing single-argument cor(x) could be done in Statistics 2.0 in a reasonable timeframe? Pretty sure there are other inconsistencies that can be fixed, or improvements can be made, that are breaking and would fit 2.0 nicely.

For functions where arguments are fundamentally coupled (like cor(x, y)), it does make most sense to accept an iterable/collection of pairs instead of a pair of collections. And luckily, Julia has both convenient and zero-cost ways to create one iterator/collection/array from two.
