Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Folded Distributions #1631

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

Potatoasad
Copy link

@Potatoasad Potatoasad commented Oct 26, 2022

Folded Distributions

Here's a Generic implementation for Folded continuous univariate distributions. I worked on this a while ago and only recently cleaned it up. This has been useful to me in my research and hopefully is helpful to others as well.

Definition

A folded distribution $F_{c}(D)$ of a distribution $D$ at crease $c \in \mathbb{R}$ is the reflection of the distribution below $c$ onto the the distribution above $c$.

The pdf of such a distribution is given by:

$$ p(x | F_{c}(D)) = p(x | D) + p(x' | D) $$

where $x' = 2c - x$

Interface

The interface is very very similar to the truncated interface.
You can take any continuous univariate distribution and make it into a folded distribution like so:

julia> normal_absolute = folded(Normal(2.0,1.0), 0.0)
# Folded(Normal{Float64}(μ=2.0, σ=1.0); crease = 0.0 | fold left onto right)
julia> plot(x -> pdf(normal_absolute, x))

This gives the result:
folded

The folded function creates a folded distribution from the original distribution d about the value crease.
This function defaults to folding the left side onto the right, but by using keep_right=false one can reflect the right side onto the left.

This can be implemented for any univariate continuous distribution by using the following method:

folded(d::ContinuousUnivariateDistribution, crease::Real)

If one wants to reflect points above the crease $c$ onto points below $c$ (such that the resultant distribution lives on points below $c$) one can do that using the keep_right=false tag:

folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false)

A very useful and oft-occuring example of this is the action of the absolute value function ($|\cdot|$) on random variables.
For example if we have a random variable obeying the normal distribution:
$$\hat{X} \sim \mathcal{N}(\alpha,\beta)$$

Taking the absolute value of this random variable always follows a folded normal distribution with a crease at $0$

$$|\hat{X}| \sim F_{c=0}(\mathcal{N}(\alpha,\beta))$$
In julia that looks like:

folded_normal = folded(Normal(μ,σ), 0.0)

and then we can ask for many things like the pdf, or sample the distribution. The samples of course will then be positive:

all(rand(folded_normal, 1000) .≥ 0) # True

In general, one can take any univariate continous distribution and fold it using this function.

Example

We can create the absolute value of the laplace distribution by:

folded_laplace = folded(Laplace(μ,b), 0.0)

as a quick check we can see if the 1000 samples generated from this are all greater than 0:

all(rand(folded(Lapace(μ,b),0),1000) .≥ 0) # -> true

For more one can check the docs in the PR.

With this addition one should be able to define stuff like the distribution of the absolute value of any univariate random variable.

Distributions this could lead to

In issue #124 ,there was a desire to implement the following distributions:

  • foldcauchy - A folded Cauchy continuous random variable
  • foldnorm - A folded normal continuous random variable
  • halfcauchy - A Half-Cauchy continuous random variable
  • halflogistic - A half-logistic continuous random variable
  • halfnorm - A half-normal continuous random variable
    and all of these can be implemented very easily using the folded function.

For example, an implementation of the HalfNormal distribution, would be as simple as

halfnorm(σ) = folded(Normal(zero(σ), σ), zero(σ))

The others can be implemented similarly very easily. The Folded generic subsumes all these cases.

Conclusion & Why would this be useful?

I've added tests and docs, and wanted to see if this is an appropriate inclusion to the package.
Let me know if this is would be welcomed by this package, and if there's anything I should add to the tests or the docs, or change the code style.

Thank you for all the work you all do on this amazing package!

@azev77
Copy link
Contributor

azev77 commented Oct 28, 2022

This is a brilliant incredibly useful idea!!!

@Potatoasad
Copy link
Author

This is a brilliant incredibly useful idea!!!

Thank you so much! Hoping this can get merged soon 🤞

@Potatoasad
Copy link
Author

Hi! Just wanted to get an update on this if this is good enough to merge - I think it's ready for review from my end.
I understand the maintainers have a limited time so no worries at all, just wanted to bump this.

@sethaxen
Copy link
Contributor

Hey @Potatoasad, I had just hacked together a Folded distribution for my own research and just now saw your PR. I'd be happy to review it and will try to do so this weekend.

```
as a quick check we can see if the 1000 samples generated from this are all greater than 0:
```julia
all(rand(folded(Lapace(μ,b),0),1000) .≥ 0) # -> true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
all(rand(folded(Lapace(μ,b),0),1000) .≥ 0) # -> true
all((0), rand(folded_laplace, 1000)) # -> true

I think this is easier to read; it's also more efficient, since it avoids allocating an intermediate array.

src/fold.jl Outdated
@@ -0,0 +1,320 @@
"""
folded(d::ContinuousUnivariateDistribution, crease::Real)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
folded(d::ContinuousUnivariateDistribution, crease::Real)

Not necessary to document the no-keyword method, since the keyword method already covers it (by showing the default value).



##### Defining the function and the initialization functions #####
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should follow the Blue style guide for docstrings--see here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. JuliaStats does not use Blue style. One should just be consistent with the existing style until the org decides on a specific style. It does not matter anyway as all these docstrings should be removed (see my review).

"""
struct Folded{D<:ContinuousUnivariateDistribution, K<:Truncated, S<:Continuous, T <: Real} <: ContinuousUnivariateDistribution
original::D # the original distribution (unfolded)
included::K # The part of the old distribution left unchanged
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the names here. I feel like I'd have a hard time remembering included/excluded 😅

folded(d::ContinuousUnivariateDistribution, crease::Real)
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false)

Creates a _folded distribution_ from the original distribution `d` about the value `crease`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Creates a _folded distribution_ from the original distribution `d` about the value `crease`.
Fold a distribution `d` at the value `crease`.

# >> MethodError: no method matching iterate(::Truncated{Beta{...}...})
```
"""
mean(d::Folded) = mean(d.included)*d.included.tp - mean(d.excluded)*d.excluded.tp + 2*d.crease
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments or expand this out? It's not immediately obvious to me why this is correct.

end

# Testing if the insupport function works
@testset "Folded insupport, minimum and maximum functions" begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more tests? We should be covering at least mean, var, median, and at least two different quantiles.

@test all(((x->Distributions.maximum(folded(Normal(2,1),x,keep_right=false))).(-10:0.1:10)) .≈ collect(-10:0.1:10))
end

@testset "Folded Rand functions" begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the above--we need tests that check if rand is working correctly, e.g. testing if a big sample from a folded normal distribution has the right distribution.

Comment on lines +37 to +40
@testset "Folded pdf, logpdf, cdf and logcdf functions" begin
@test pdf(truncated(Normal(3,4),-1,1),-0.1) + pdf(truncated(Normal(3,4),-1,1),0.1) ≈ pdf(folded(truncated(Normal(3,4),-1,1),0.0),0.1)
@test pdf(folded(truncated(Normal(3,4),-1,1),0.0),-0.1) == 0
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@testset "Folded pdf, logpdf, cdf and logcdf functions" begin
@test pdf(truncated(Normal(3,4),-1,1),-0.1) + pdf(truncated(Normal(3,4),-1,1),0.1) pdf(folded(truncated(Normal(3,4),-1,1),0.0),0.1)
@test pdf(folded(truncated(Normal(3,4),-1,1),0.0),-0.1) == 0
end
@testset "Folded pdf, logpdf, cdf and logcdf functions" begin
trunc_dist = truncated(Normal(3, 4), -1, 1)
fold_dist = folded(trunc_dist, 0.0)
@test pdf(truncated(trunc_dist, -0.1) + pdf(trunc_dist, 0.1) pdf(fold_dist, 0.1)
@test pdf(fold_dist, -0.1) 0
end

@test all(rand(folded(Normal(1.0,1.0),0.0),100) .> 0.0)
end

@testset "Folded pdf, logpdf, cdf and logcdf functions" begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests that deal with cases other than folding at 0?

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I added some initial comments.

src/Distributions.jl Outdated Show resolved Hide resolved

# This is the user facing initialization function that will be used in most cases
"""
folded(d::ContinuousUnivariateDistribution, crease::T; keep_right=true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
folded(d::ContinuousUnivariateDistribution, crease::T; keep_right=true)
folded(d::ContinuousUnivariateDistribution, crease::Real; keep_right::Bool=true)

src/fold.jl Outdated Show resolved Hide resolved
@@ -0,0 +1,320 @@
"""
folded(d::ContinuousUnivariateDistribution, crease::Real)
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false)
folded(d::ContinuousUnivariateDistribution, crease::Real; keep_right::Bool=true)

Maybe a (better?) name for the keyword argument would be

Suggested change
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false)
folded(d::ContinuousUnivariateDistribution, crease::Real; above::Bool=true)

or

Suggested change
folded(d::ContinuousUnivariateDistribution, crease::Real, keep_right=false)
folded(d::ContinuousUnivariateDistribution, crease::Real; support_above::Bool=true)

Holds the original distribution, and two truncated copies of the
distribution: one above the crease and one below.
"""
struct Folded{D<:ContinuousUnivariateDistribution, K<:Truncated, S<:Continuous, T <: Real} <: ContinuousUnivariateDistribution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to restrict K to subtypes of Truncated? S is not needed?

Suggested change
struct Folded{D<:ContinuousUnivariateDistribution, K<:Truncated, S<:Continuous, T <: Real} <: ContinuousUnivariateDistribution
struct Folded{D<:ContinuousUnivariateDistribution, K, T <: Real} <: ContinuousUnivariateDistribution

"""
function cdf(d::Folded, x::Real)
if !insupport(d,x)
(x ≤ minimum(d)) && return zero(x) # Is below the support
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again type unstable. Again best to always compute the result (possibly with adjusted x) below. Maybe it's already sufficient to just perform the evaluation for clamp(x, extrema(d)...).

(x ≤ minimum(d)) && return zero(x) # Is below the support
(x ≥ maximum(d)) && return one(x) # Is above the support
end
d.included.tp*cdf(d.included, x) + d.excluded.tp*ccdf(d.excluded, unfold_value(x,d))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d.included.tp does not exist for general truncated distributions. Could we use something like

Suggested change
d.included.tp*cdf(d.included, x) + d.excluded.tp*ccdf(d.excluded, unfold_value(x,d))
cdf(d::Folded, x::Real) = exp(logcdf(d, x)) # not sure, maybe the default definition already?
function logcdf(d::Folded, x::Real)
_x = clamp(x, extrema(d.included)...)
y = unfolded_value(d, _x)
a, b = d.keep_right ? (y, _x) : (_x, y)
return logdiffcdf(d.original, b, a)
end

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding, by looking at the source code was that the Truncated object has a field tp in its definition, and having this would prevent recalculation each time.

Is there a way to infer from the types whether a truncated distribution has the tp field? So if it doesn't exist for a particular type, we could dispatch to the code you provided?

struct Truncated{D<:UnivariateDistribution, S<:ValueSupport, T <: Real} <: UnivariateDistribution{S}

d.included.tp*cdf(d.included, x) + d.excluded.tp*ccdf(d.excluded, unfold_value(x,d))
end


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ccdf and logccdf are not defined, ie onky fallback definitions will be used?

######################################################

"""
mean(d::Folded)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove.

# >> MethodError: no method matching iterate(::Truncated{Beta{...}...})
```
"""
mean(d::Folded) = mean(d.included)*d.included.tp - mean(d.excluded)*d.excluded.tp + 2*d.crease
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again tp does not exist in general.

@Potatoasad
Copy link
Author

Thank you so much for the detailed review! I'll pour over these and make the changes as needed thank you so much for the feedback and explanations.

@azev77
Copy link
Contributor

azev77 commented May 10, 2023

Thank you so much for the detailed review! I'll pour over these and make the changes as needed thank you so much for the feedback and explanations.

@Potatoasad have you had any luck with this?

@Potatoasad
Copy link
Author

Thank you so much for the detailed review! I'll pour over these and make the changes as needed thank you so much for the feedback and explanations.

@Potatoasad have you had any luck with this?

Hey, sorry. Yes I did make some slight progress, but got sidetracked into a paper I was working on. I think I can finish this up in the next two weeks. Apologies for the very long delay on this.

@ParadaCarleton
Copy link
Contributor

Thank you so much for the detailed review! I'll pour over these and make the changes as needed thank you so much for the feedback and explanations.

@Potatoasad have you had any luck with this?

Hey, sorry. Yes I did make some slight progress, but got sidetracked into a paper I was working on. I think I can finish this up in the next two weeks. Apologies for the very long delay on this.

Hi, just wanted to know if you're still interested in working on this :)

Potatoasad and others added 2 commits September 4, 2023 17:27
Co-authored-by: David Widmann <[email protected]>
Co-authored-by: David Widmann <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2023

Codecov Report

Patch coverage is 59.25% of modified lines.

❗ Current head 009f83d differs from pull request most recent head f79504f. Consider uploading reports for the commit f79504f to get more accurate results

Files Changed Coverage
src/Distributions.jl ø
src/fold.jl 59.25%

📢 Thoughts on this report? Let us know!.

Potatoasad and others added 2 commits September 4, 2023 17:29
Co-authored-by: Carlos Parada <[email protected]>
Co-authored-by: Carlos Parada <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants