Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing values support #30

Open
pdeffebach opened this issue Aug 15, 2018 · 10 comments
Open

Missing values support #30

pdeffebach opened this issue Aug 15, 2018 · 10 comments

Comments

@pdeffebach
Copy link

I tested out this package on 0.7 with missings and got the following error:

I think this is because Statistics is not imported by NaNMath, so the function calls to NaNMath.mean can't fall base on Statistics.mean.

I think the solution here is to add import Statistics and add fall backs for operations of type Any so those are called when there are missing values. Does that seem feasible?

nm.mean([1.,2., missing])
> ERROR: WARNING: Base.mean is deprecated: it has been moved to the standard library package `Statistics`.
Add `using Statistics` to your imports.
 in module Base
MethodError: no method matching mean(::Array{Union{Missing, Float64},1})
You may have intended to import Base.mean
Closest candidates are:
  mean(::AbstractArray{T<:AbstractFloat,N} where N) where T<:AbstractFloat at /Users/peterdeffebach/.julia/packages/NaNMath/pEda/src/NaNMath.jl:167
Stacktrace:
 [1] top-level scope at none:0
@schnirz
Copy link

schnirz commented Aug 15, 2018

I encountered a similar issue when using this package with missing values in 0.6.4:

julia> versioninfo()
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=16)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> using NaNMath

julia> NaNMath.log(-100)
NaN

julia> NaNMath.log(missing)
ERROR: MethodError: no method matching log(::Missings.Missing)
You may have intended to import Base.log
Closest candidates are:
  log(::Float32) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:12
  log(::Float64) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:11
  log(::Real) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:13
  ...

julia> Base.log(missing)
missing

Would it make sense to define functions in this packages to return missing for missing input values? This would make sense to me, as missing and NaN here represent two different scenarios: NaN stands for invalid input (e.g., negative values for log), whereas missing stands for entirely absent input.

log here could be defined for Union{Missing, Float64} input types and have a simple ismissing check within its definition. I'm happy to work on this if this is a desired feature of the package.

@mlubin
Copy link
Collaborator

mlubin commented Aug 18, 2018

I see how this is useful, but handling missing extends beyond the problem that this package was designed to address and that I have interest in maintaining. Maybe MissingNaNMath is in order? :)

@pdeffebach
Copy link
Author

Fair enough!

@mkborregaard
Copy link
Contributor

mkborregaard commented Aug 18, 2018

But - given that missing is part of Base Julia, is there any use for a numerical calculation package that does not support it? That sounds quite dangerous.

@mlubin
Copy link
Collaborator

mlubin commented Aug 18, 2018

is there any use for a numerical calculation package that does not support it? That sounds quite dangerous.

This comment is a bit hyperbolic. The package is already being used by upstream AD packages like ForwardDiff and JuMP (https://github.com/JuliaOpt/JuMP.jl/blob/82e998f72e849f7a5f8c264accb41d9eafbcc845/src/Derivatives/Derivatives.jl#L15) for well-defined purposes that currently don't require any support for missing.

@mkborregaard
Copy link
Contributor

mkborregaard commented Aug 18, 2018

I see you're right this package is still equally useful in that context. I guess I'm just disappointed as I thought this could be a more broadly useful package, and most packages that would depend on this will need to be able to support all the numerical functionality defined in Base for these functions.

My suggestion that it might be "dangerous" came from the notion that users would expect Base.mean and NaNMath.mean to behave similarly except in their treatment of NaN.

@pdeffebach
Copy link
Author

One solution that is missing-agnostic is to have fallbacks for non-float types that revert to the functions defined in Base. But you might want to throw errors when non-floats are used given the purpose of NaNMath.

@mlubin
Copy link
Collaborator

mlubin commented Aug 18, 2018

I view a MethodError as pretty low on the dangerousness scale. Silently not handling NaNs as promised is more dangerous, e.g., #21 (comment).

Anyway I'm willing to share maintainership if there's interest in making the package more useful for additional applications.

@mkborregaard
Copy link
Contributor

You're right. After looking into it I'm also less certain that the right approach is to change this to treat missing like NaN. Sorry for the noise.

@nalimilan
Copy link

nalimilan commented Oct 4, 2018

It would make sense to accept arrays iterables of any element type though, and only throw an error when actually encountering a value of an unsupported type. That's what Statistics.mean does, and it's useful if you want to pass skipmissing(x) to NaNMath.mean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants