Folded Distributions #1631
Added all the functions necessary to make Folded Distributions work for univariate continuous distributions. Still need to figure out how to make this work for discrete distributions.
This is a brilliant incredibly useful idea!!!
Thank you so much! Hoping this can get merged soon 🤞
Hi! Just wanted to get an update on this if this is good enough to merge - I think it's ready for review from my end.
Hey @Potatoasad, I had just hacked together a
``` | ||
as a quick check we can see if the 1000 samples generated from this are all greater than 0: | ||
```julia | ||
all(rand(folded(Laplace(μ,b),0),1000) .≥ 0) # -> true |
all(rand(folded(Laplace(μ,b),0),1000) .≥ 0) # -> true | |
all(≥(0), rand(folded_laplace, 1000)) # -> true |
I think this is easier to read; it's also more efficient, since it avoids allocating an intermediate array.
src/fold.jl
Outdated
@@ -0,0 +1,320 @@ | |||
""" | |||
folded(d::ContinuousUnivariateDistribution, crease::Real) |
folded(d::ContinuousUnivariateDistribution, crease::Real) |
Not necessary to document the no-keyword method, since the keyword method already covers it (by showing the default value).
##### Defining the function and the initialization functions ##### | ||
""" |
Should follow the Blue style guide for docstrings--see here.
No. JuliaStats does not use Blue style. One should just be consistent with the existing style until the org decides on a specific style. It does not matter anyway as all these docstrings should be removed (see my review).
""" | ||
struct Folded{D<:ContinuousUnivariateDistribution, K<:Truncated, S<:Continuous, T <: Real} <: ContinuousUnivariateDistribution | ||
original::D # the original distribution (unfolded) | ||
included::K # The part of the old distribution left unchanged |
Not sure about the names here. I feel like I'd have a hard time remembering `included`/`excluded` 😅
folded(d::ContinuousUnivariateDistribution, crease::Real) | ||
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false) | ||
Creates a _folded distribution_ from the original distribution `d` about the value `crease`. |
Creates a _folded distribution_ from the original distribution `d` about the value `crease`. | |
Fold a distribution `d` at the value `crease`. |
# >> MethodError: no method matching iterate(::Truncated{Beta{...}...}) | ||
``` | ||
""" | ||
mean(d::Folded) = mean(d.included)*d.included.tp - mean(d.excluded)*d.excluded.tp + 2*d.crease |
Can you add comments or expand this out? It's not immediately obvious to me why this is correct.
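One way to expand it, as a sketch that assumes `keep_right=true`, that `included` and `excluded` are the original distribution truncated above and below the crease, and that `tp` is the probability mass of the truncated region. Writing $p_+ = P(X \ge c)$, $p_- = P(X < c)$, $m_+ = E[X \mid X \ge c]$ and $m_- = E[X \mid X < c]$ (symbols introduced only for this note), the folded variable is $Y = X$ for $X \ge c$ and $Y = 2c - X$ otherwise, so

$$E[Y] = p_+ m_+ + p_- (2c - m_-) = p_+ m_+ - p_- m_- + 2c\,p_-$$

If that reading of the fields is right, the constant term carries a factor $p_-$ (that is, `2*d.crease*d.excluded.tp`), which matches the line above only when the crease is 0 or all of the mass lies below it.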
end | ||
# Testing if the insupport function works | ||
@testset "Folded insupport, minimum and maximum functions" begin |
Can you add more tests? We should be covering at least `mean`, `var`, `median`, and at least two different quantiles.
@test all(((x->Distributions.maximum(folded(Normal(2,1),x,keep_right=false))).(-10:0.1:10)) .≈ collect(-10:0.1:10)) | ||
end | ||
@testset "Folded Rand functions" begin |
Same as for the above--we need tests that check if `rand` is working correctly, e.g. testing if a big sample from a folded normal distribution has the right distribution.
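A rough sketch of what such a check could look like. It does not call the PR's `folded`; `fold_samples` and `folded_cdf` are hypothetical stand-ins written only for this note, and the crease, seed and tolerance are arbitrary choices. The idea is to compare the empirical cdf of a large folded sample against the closed form $F_D(x) - F_D(2c - x)$:

```julia
using Distributions, Random, Statistics

# Hypothetical stand-in for the PR's sampler: fold draws from `d` at `crease`,
# keeping the right-hand side (anything below the crease is reflected upward).
fold_samples(d, crease, n; rng=Random.default_rng()) =
    [x ≥ crease ? x : 2 * crease - x for x in rand(rng, d, n)]

# Closed-form cdf of the folded variable: P(2c - x ≤ X ≤ x) for x ≥ crease.
folded_cdf(d, crease, x) = x < crease ? 0.0 : cdf(d, x) - cdf(d, 2 * crease - x)

rng = MersenneTwister(1234)
d, crease = Normal(2.0, 1.0), 0.5   # deliberately not folded at 0
samples = fold_samples(d, crease, 100_000; rng=rng)

# Compare the empirical cdf with the closed form at a few points.
for x in (0.75, 1.5, 2.5, 4.0)
    empirical = mean(samples .≤ x)
    @assert isapprox(empirical, folded_cdf(d, crease, x); atol=0.01)
end
```

In the real test suite the sample would come from `rand(folded(...), n)` and the reference values from `cdf` on the folded distribution; a loose absolute tolerance keeps the check from being flaky.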
@testset "Folded pdf, logpdf, cdf and logcdf functions" begin | ||
@test pdf(truncated(Normal(3,4),-1,1),-0.1) + pdf(truncated(Normal(3,4),-1,1),0.1) ≈ pdf(folded(truncated(Normal(3,4),-1,1),0.0),0.1) | ||
@test pdf(folded(truncated(Normal(3,4),-1,1),0.0),-0.1) == 0 | ||
end |
@testset "Folded pdf, logpdf, cdf and logcdf functions" begin | |
@test pdf(truncated(Normal(3,4),-1,1),-0.1) + pdf(truncated(Normal(3,4),-1,1),0.1) ≈ pdf(folded(truncated(Normal(3,4),-1,1),0.0),0.1) | |
@test pdf(folded(truncated(Normal(3,4),-1,1),0.0),-0.1) == 0 | |
end | |
@testset "Folded pdf, logpdf, cdf and logcdf functions" begin | |
trunc_dist = truncated(Normal(3, 4), -1, 1) | |
fold_dist = folded(trunc_dist, 0.0) | |
@test pdf(trunc_dist, -0.1) + pdf(trunc_dist, 0.1) ≈ pdf(fold_dist, 0.1) | |
@test pdf(fold_dist, -0.1) ≈ 0 | |
end |
@test all(rand(folded(Normal(1.0,1.0),0.0),100) .> 0.0) | ||
end | ||
@testset "Folded pdf, logpdf, cdf and logcdf functions" begin |
Can you add tests that deal with cases other than folding at 0?
Thank you for the PR! I added some initial comments.
# This is the user facing initialization function that will be used in most cases | ||
""" | ||
folded(d::ContinuousUnivariateDistribution, crease::T; keep_right=true) |
folded(d::ContinuousUnivariateDistribution, crease::T; keep_right=true) | |
folded(d::ContinuousUnivariateDistribution, crease::Real; keep_right::Bool=true) |
@@ -0,0 +1,320 @@ | |||
""" | |||
folded(d::ContinuousUnivariateDistribution, crease::Real) | |||
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false) |
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false) | |
folded(d::ContinuousUnivariateDistribution, crease::Real; keep_right::Bool=true) |
Maybe a (better?) name for the keyword argument would be
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false) | |
folded(d::ContinuousUnivariateDistribution, crease::Real; above::Bool=true) |
or
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false) | |
folded(d::ContinuousUnivariateDistribution, crease::Real; support_above::Bool=true) |
Holds the original distribution, and two truncated copies of the | ||
distribution: one above the crease and one below. | ||
""" | ||
struct Folded{D<:ContinuousUnivariateDistribution, K<:Truncated, S<:Continuous, T <: Real} <: ContinuousUnivariateDistribution |
Why do you want to restrict `K` to subtypes of `Truncated`? `S` is not needed?
struct Folded{D<:ContinuousUnivariateDistribution, K<:Truncated, S<:Continuous, T <: Real} <: ContinuousUnivariateDistribution | |
struct Folded{D<:ContinuousUnivariateDistribution, K, T <: Real} <: ContinuousUnivariateDistribution |
""" | ||
function cdf(d::Folded, x::Real) | ||
if !insupport(d,x) | ||
(x ≤ minimum(d)) && return zero(x) # Is below the support |
Again type unstable. Again best to always compute the result (possibly with adjusted `x`) below. Maybe it's already sufficient to just perform the evaluation for `clamp(x, extrema(d)...)`.
(x ≤ minimum(d)) && return zero(x) # Is below the support | ||
(x ≥ maximum(d)) && return one(x) # Is above the support | ||
end | ||
d.included.tp*cdf(d.included, x) + d.excluded.tp*ccdf(d.excluded, unfold_value(x,d)) |
d.included.tp does not exist for general truncated distributions. Could we use something like
d.included.tp*cdf(d.included, x) + d.excluded.tp*ccdf(d.excluded, unfold_value(x,d)) | |
cdf(d::Folded, x::Real) = exp(logcdf(d, x)) # not sure, maybe the default definition already? | |
function logcdf(d::Folded, x::Real) | |
_x = clamp(x, extrema(d.included)...) | |
y = unfolded_value(d, _x) | |
a, b = d.keep_right ? (y, _x) : (_x, y) | |
return logdiffcdf(d.original, b, a) | |
end |
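For reference, a sketch of why a difference of cdfs of the original distribution is enough, assuming `keep_right=true`, $x \ge c$ and $x' = 2c - x$ (here $Y$ denotes the folded variable and $F_D$ the cdf of `d.original`; both symbols are introduced only for this note):

$$P(Y \le x) = P(x' \le X \le x) = F_D(x) - F_D(x')$$

This is what `logdiffcdf(d.original, b, a)` evaluates on the log scale when `b = _x` and `a = y`. With `keep_right=false` and $x \le c$, the same difference $F_D(x') - F_D(x)$ is the mass above $x$, so under this reading that branch would give the ccdf rather than the cdf.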
My understanding, from looking at the source code, was that the `Truncated` object has a field `tp` in its definition, and having this would prevent recalculation each time.
Is there a way to infer from the types whether a truncated distribution has the `tp` field? So if it doesn't exist for a particular type, we could dispatch to the code you provided?
Distributions.jl/src/truncate.jl
Line 93 in fa8c30d
struct Truncated{D<:UnivariateDistribution, S<:ValueSupport, T <: Real} <: UnivariateDistribution{S} |
d.included.tp*cdf(d.included, x) + d.excluded.tp*ccdf(d.excluded, unfold_value(x,d)) | ||
end | ||
`ccdf` and `logccdf` are not defined, i.e. only fallback definitions will be used?
###################################################### | ||
""" | ||
mean(d::Folded) |
Please remove.
# >> MethodError: no method matching iterate(::Truncated{Beta{...}...}) | ||
``` | ||
""" | ||
mean(d::Folded) = mean(d.included)*d.included.tp - mean(d.excluded)*d.excluded.tp + 2*d.crease |
Again tp does not exist in general.
Thank you so much for the detailed review! I'll pore over these and make the changes as needed. Thanks for the feedback and explanations.
@Potatoasad have you had any luck with this?
Hey, sorry. Yes, I did make some slight progress, but got sidetracked by a paper I was working on. I think I can finish this up in the next two weeks. Apologies for the very long delay on this.
Hi, just wanted to know if you're still interested in working on this :)
Co-authored-by: David Widmann <[email protected]>
Co-authored-by: David Widmann <[email protected]>
Co-authored-by: Carlos Parada <[email protected]>
Co-authored-by: Carlos Parada <[email protected]>
Folded Distributions
Here's a generic implementation of `Folded` continuous univariate distributions. I worked on this a while ago and only recently cleaned it up. This has been useful to me in my research and will hopefully be helpful to others as well.
Definition
A folded distribution $F_{c}(D)$ of a distribution $D$ at crease $c \in \mathbb{R}$ is the reflection of the part of the distribution below $c$ onto the part above $c$.
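The pdf of such a distribution is given by:
$$f_{F_c(D)}(x) = \begin{cases} f_D(x) + f_D(x') & x \ge c \\ 0 & x < c \end{cases}$$
where $x' = 2c - x$ and $f_D$ is the pdf of $D$.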
Interface
The interface is very similar to the `truncated` interface.
You can take any continuous univariate distribution and make it into a folded distribution like so:
This gives the result:
![folded](https://user-images.githubusercontent.com/18480971/197929554-58d82913-5862-4ded-bcd1-5a2d0ed889d1.png)
The `folded` function creates a folded distribution from the original distribution `d` about the value `crease`. By default it folds the left side onto the right; with `keep_right=false` it reflects the right side onto the left.
This can be applied to any univariate continuous distribution by using the following method:
If one wants to reflect points above the crease $c$ onto points below $c$ (so that the resulting distribution is supported on points below $c$), one can do that using the `keep_right=false` keyword:
A very useful and frequently occurring example of this is the action of the absolute value function ($|\cdot|$) on random variables.
For example, if we have a random variable obeying the normal distribution:
$$\hat{X} \sim \mathcal{N}(\alpha,\beta)$$
then its absolute value $|\hat{X}|$ follows a folded normal distribution with a crease at $0$.
In Julia that looks like:
We can then ask for things like the pdf, or sample the distribution. The samples will of course be non-negative:
In general, one can take any univariate continuous distribution and fold it using this function.
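As a minimal sketch of the absolute-value case that runs without this PR: `folded_pdf` below is a hypothetical helper written only for this illustration (it is not the PR's API), and the normal parameters and integration grid are arbitrary. It evaluates the folded density $f_D(x) + f_D(2c - x)$ for $x \ge c$ and checks numerically that it integrates to about 1:

```julia
using Distributions

# Hypothetical helper (not the PR's API): density of `d` folded at `crease`,
# keeping the right-hand side, i.e. f(x) + f(2c - x) for x ≥ c and 0 otherwise.
folded_pdf(d, crease, x) = x < crease ? 0.0 : pdf(d, x) + pdf(d, 2 * crease - x)

d = Normal(1.0, 2.0)

# Density of |X| at 0.7: both x = 0.7 and x = -0.7 contribute.
p = folded_pdf(d, 0.0, 0.7)

# Nothing is left below the crease.
@assert folded_pdf(d, 0.0, -0.7) == 0.0

# Crude Riemann sum over [0, 30]; the folded density should still integrate to ~1.
h = 0.001
total = sum(folded_pdf(d, 0.0, x) for x in 0:h:30) * h
@assert isapprox(total, 1.0; atol=1e-3)
```

With the PR applied, the same quantities would presumably come from `pdf(folded(Normal(1.0, 2.0), 0.0), 0.7)`, based on the signature discussed in the review.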
Example
We can create the absolute value of the Laplace distribution with `folded(Laplace(μ, b), 0)`.
As a quick check, we can see whether 1000 samples generated from it are all greater than or equal to 0: `all(rand(folded(Laplace(μ, b), 0), 1000) .≥ 0)` returns `true`.
For more, one can check the docs in the PR.
With this addition one should be able to define things like the distribution of the absolute value of any univariate random variable.
Distributions this could lead to
In issue #124, there was a desire to implement the following distributions:
and all of these can be implemented very easily using the `folded` function. For example, an implementation of the `HalfNormal` distribution would be as simple as
The others can be implemented similarly. The `Folded` generic subsumes all these cases.
Conclusion & Why would this be useful?
I've added tests and docs, and wanted to see if this is an appropriate inclusion to the package.
Let me know if this would be welcomed by this package, and if there's anything I should add to the tests or the docs, or change in the code style.
Thank you for all the work you all do on this amazing package!