Merge branch 'StatsBase2021' into nl/weightedstats
nalimilan committed Sep 25, 2021
2 parents 850d3e6 + 1e5d2a8 commit b3e9325
Showing 46 changed files with 3,217 additions and 1,097 deletions.
26 changes: 26 additions & 0 deletions .github/workflows/CompatHelper.yml
@@ -0,0 +1,26 @@
name: CompatHelper
on:
  schedule:
    - cron: 0 0 * * *
  workflow_dispatch:
jobs:
  CompatHelper:
    runs-on: ubuntu-latest
    steps:
      - name: "Install CompatHelper"
        run: |
          import Pkg
          name = "CompatHelper"
          uuid = "aa819f21-2bde-4658-8897-bab36330d9b7"
          version = "2"
          Pkg.add(; name, uuid, version)
        shell: julia --color=yes {0}
      - name: "Run CompatHelper"
        run: |
          import CompatHelper
          CompatHelper.main()
        shell: julia --color=yes {0}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          COMPATHELPER_PRIV: ${{ secrets.DOCUMENTER_KEY }}
          # COMPATHELPER_PRIV: ${{ secrets.COMPATHELPER_PRIV }}
11 changes: 11 additions & 0 deletions .github/workflows/TagBot.yml
@@ -0,0 +1,11 @@
name: TagBot
on:
  schedule:
    - cron: 0 * * * *
jobs:
  TagBot:
    runs-on: ubuntu-latest
    steps:
      - uses: JuliaRegistries/TagBot@v1
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
8 changes: 8 additions & 0 deletions .github/workflows/ci.yml
@@ -40,7 +40,10 @@ jobs:
${{ runner.os }}-test-${{ env.cache-name }}-
${{ runner.os }}-test-
${{ runner.os }}-
<<<<<<< HEAD
=======
- run: julia --color=yes .ci/test_and_change_uuid.jl
>>>>>>> master
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
- uses: julia-actions/julia-processcoverage@v1
@@ -52,6 +55,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
<<<<<<< HEAD
- uses: julia-actions/julia-buildpkg@latest
- uses: julia-actions/julia-docdeploy@latest
=======
- uses: julia-actions/setup-julia@v1
with:
version: '1'
@@ -61,6 +68,7 @@ jobs:
Pkg.develop(PackageSpec(path=pwd()))
Pkg.instantiate()'
- run: julia --project=docs docs/make.jl
>>>>>>> master
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
5 changes: 3 additions & 2 deletions Project.toml
@@ -1,14 +1,15 @@
name = "Statistics"
uuid = "20745b16-79ce-11e8-11f9-7d13ad32a3b2"
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[deps]
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[extras]
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Random", "Test"]
test = ["Dates", "Random", "Test"]
26 changes: 11 additions & 15 deletions README.md
@@ -1,20 +1,16 @@
## StatsBase.jl
# Statistics.jl

*StatsBase.jl* is a Julia package that provides basic support for statistics. In particular, it implements a variety of statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.
[![Build status](https://github.com/JuliaLang/Statistics.jl/workflows/CI/badge.svg)](https://github.com/JuliaLang/Statistics.jl/actions?query=workflow%3ACI+branch%3Amaster)

- **Current Release**:
[![StatsBase](http://pkg.julialang.org/badges/StatsBase_0.5.svg)](http://pkg.julialang.org/?pkg=StatsBase)
[![StatsBase](http://pkg.julialang.org/badges/StatsBase_0.6.svg)](http://pkg.julialang.org/?pkg=StatsBase)
- **Build & Testing Status:**
[![Build Status](https://travis-ci.org/JuliaStats/StatsBase.jl.svg?branch=master)](https://travis-ci.org/JuliaStats/StatsBase.jl)
[![Build status](https://ci.appveyor.com/api/projects/status/fsut3j3onulvws1w?svg=true)](https://ci.appveyor.com/project/nalimilan/statsbase-jl)
[![Coverage Status](https://coveralls.io/repos/JuliaStats/StatsBase.jl/badge.svg?branch=master)](https://coveralls.io/r/JuliaStats/StatsBase.jl?branch=master)
[![Coverage Status](http://codecov.io/github/JuliaStats/StatsBase.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaStats/StatsBase.jl?branch=master)
Development repository for the Statistics standard library (stdlib) that ships with Julia.

- **Documentation**: [![][docs-stable-img]][docs-stable-url] [![][docs-latest-img]][docs-latest-url]
#### Using the development version of Statistics.jl

[docs-latest-img]: https://img.shields.io/badge/docs-latest-blue.svg
[docs-latest-url]: http://JuliaStats.github.io/StatsBase.jl/latest/
To develop this package, follow these steps:
- Clone the repo anywhere.
- In line 2 of the `Project.toml` file (the line that begins with `uuid = ...`), modify the UUID, e.g. change the leading `107` to `207`.
- Change the current directory to the Statistics repo you just cloned and start Julia with `julia --project`.
- `import Statistics` will now load the files in the cloned repo instead of the Statistics stdlib.
- To test your changes, run `include("test/runtests.jl")`.
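The UUID edit in the steps above can also be scripted. The following is a minimal sketch; `bump_statistics_uuid!` is a hypothetical helper that is not part of this repository or of Julia. It assumes the `Project.toml` uses the plain `uuid = "..."` formatting shown in this commit's `Project.toml` diff, and it only rewrites the top-level `uuid` line whose value starts with `107`.

```julia
# Hypothetical helper (not part of this repository): rewrite the top-level
# `uuid = "107..."` line of a Project.toml so the UUID starts with "207".
function bump_statistics_uuid!(path::AbstractString)
    text = read(path, String)
    # `^` with the `m` flag anchors at each line start; [deps] entries use
    # package names as keys, so only the `uuid = ...` line can match.
    newtext = replace(text, r"^uuid = \"107"m => "uuid = \"207"; count = 1)
    newtext == text && error("no `uuid` line starting with 107 found in $path")
    write(path, newtext)
    return nothing
end
```

Run it once from the root of the clone, e.g. `bump_statistics_uuid!("Project.toml")`. A second call raises an error instead of silently doing nothing, because the `107` prefix is already gone.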

[docs-stable-img]: https://img.shields.io/badge/docs-stable-blue.svg
[docs-stable-url]: http://JuliaStats.github.io/StatsBase.jl/stable/
If you need to build Julia from source with a git checkout of Statistics, use `make DEPS_GIT=Statistics` instead when building Julia. The `Statistics` repo is then placed in `stdlib/Statistics`, initially with a detached `HEAD`. If you are doing this in a pre-existing Julia repository, you may need to run `make clean` beforehand.
6 changes: 3 additions & 3 deletions docs/src/empirical.md
@@ -2,9 +2,9 @@

## Histograms

The `Histogram` type represents data that has been tabulated into intervals
(known as *bins*) along the real line, or in higher dimensions, over the real
plane.
```@docs
Histogram
```

Histograms can be fitted to data using the `fit` method.
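What a histogram tabulates can be illustrated without StatsBase. The minimal sketch below counts one-dimensional data into bins delimited by sorted edges, assuming left-closed bins `[edges[k], edges[k+1])`. `bin_counts` is a hypothetical helper, not the `fit` method itself, and the `Histogram` object that `fit` returns carries more information than these raw counts.

```julia
# Hypothetical helper: count values of `data` falling into the bins defined by
# `edges`, treating each bin as left-closed: [edges[k], edges[k+1]).
function bin_counts(data::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    length(edges) >= 2 || throw(ArgumentError("need at least two edges"))
    issorted(edges) || throw(ArgumentError("edges must be sorted"))
    counts = zeros(Int, length(edges) - 1)
    for x in data
        # index of the last edge that is <= x
        k = searchsortedlast(edges, x)
        # values below edges[1] or at/after edges[end] are ignored
        if 1 <= k < length(edges)
            counts[k] += 1
        end
    end
    return counts
end
```

With left-closed bins, a value equal to an interior edge is counted in the bin that starts there, and a value equal to the last edge falls outside every bin.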

2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -13,4 +13,4 @@ corrections where necessary.
Pages = ["weights.md", "scalarstats.md", "cov.md", "robust.md", "ranking.jl",
"empirical.md"]
Depth = 2
```
```
2 changes: 1 addition & 1 deletion docs/src/scalarstats.md
@@ -71,4 +71,4 @@ modes

```@docs
describe
```
```
83 changes: 81 additions & 2 deletions docs/src/weights.md
@@ -64,15 +64,91 @@ w = ProbabilityWeights([0.2, 0.1, 0.3])
w = pweights([0.2, 0.1, 0.3])
```

### `UnitWeights`

Unit weights are a special case in which all observations are given a weight equal to `1`. Using such weights is equivalent to computing unweighted statistics.

This type is notably useful when implementing an algorithm, since only a weighted variant has to be written: the unweighted variant is then obtained by passing a `UnitWeights` object. This is very efficient since no weights vector is actually allocated.

```julia
w = uweights(3)
w = uweights(Float64, 3)
```
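The "write only the weighted variant" idea can be sketched in plain Julia. The names `LazyOnes` and `weighted_mean` below are hypothetical. `LazyOnes` only mimics the idea behind `UnitWeights`, a vector of ones that stores just its length, and is not how StatsBase implements it.

```julia
# Hypothetical lazy all-ones vector: it stores only its length, so no
# weights array is allocated (mimics the idea of `UnitWeights`).
struct LazyOnes <: AbstractVector{Int}
    n::Int
end
Base.size(o::LazyOnes) = (o.n,)
Base.getindex(o::LazyOnes, i::Int) = (checkbounds(o, i); 1)

# Hypothetical weighted mean, written once for arbitrary weights.
function weighted_mean(x::AbstractVector{<:Real}, w::AbstractVector{<:Real})
    length(x) == length(w) || throw(DimensionMismatch("x and w must have the same length"))
    return sum(xi * wi for (xi, wi) in zip(x, w)) / sum(w)
end

x = [2.0, 4.0, 9.0]
weighted_mean(x, [0.2, 0.3, 0.5])  # weighted variant
weighted_mean(x, LazyOnes(3))      # unweighted variant, same as sum(x) / length(x)
```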

### `Weights`

The `Weights` type describes a generic weights vector which does not support all operations possible for `FrequencyWeights`, `AnalyticWeights` and `ProbabilityWeights`.
The `Weights` type describes a generic weights vector which does not support all operations possible for `FrequencyWeights`, `AnalyticWeights`, `ProbabilityWeights` and `UnitWeights`.

```julia
w = Weights([1., 2., 3.])
w = weights([1., 2., 3.])
```

### Exponential weights: `eweights`

Exponential weights are a common form of temporal weights which assign exponentially decreasing
weights to past observations.

If `t` is a vector of temporal indices then for each index `i` we compute the weight as:

``λ (1 - λ)^{1 - i}``

``λ`` is a smoothing factor or rate parameter such that ``0 < λ ≤ 1``.
As this value approaches 0, the resulting weights will be almost equal,
while values closer to 1 will put greater weight on the tail elements of the vector.
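To make the formula concrete, here is a minimal sketch that evaluates it directly, without StatsBase. `exp_weights` is a hypothetical name. It is only meant to reproduce the `eweights` outputs shown in this section, up to floating-point rounding, and says nothing about how `eweights` is implemented. The argument check mirrors the stated range ``0 < λ ≤ 1``.

```julia
# Hypothetical sketch of w_i = λ (1 - λ)^(1 - i); not part of StatsBase.
function exp_weights(idx::AbstractVector{<:Integer}, λ::Real)
    0 < λ <= 1 || throw(ArgumentError("λ must satisfy 0 < λ ≤ 1"))
    λf = float(λ)  # avoids integer powers with negative exponents when λ == 1
    return [λf * (1 - λf)^(1 - Int(i)) for i in idx]
end

# Passing a count n is shorthand for the indices 1:n.
exp_weights(n::Integer, λ::Real) = exp_weights(1:n, λ)

w = exp_weights(1:10, 0.3)
w[2] / w[1]  # consecutive weights grow by a factor of 1 / (1 - λ)
```

Because each weight is the previous one divided by ``1 - λ``, the most recent observation (largest `i`) always receives the largest weight.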

For example, the following call generates exponential weights for ten observations with ``λ = 0.3``.
```julia-repl
julia> eweights(1:10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.42857142857142855
0.6122448979591837
0.8746355685131197
1.249479383590171
1.7849705479859588
2.549957925694227
3.642797036706039
5.203995766722913
7.434279666747019
```

Simply passing the number of observations `n` is equivalent to passing in `1:n`.

```julia-repl
julia> eweights(10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.42857142857142855
0.6122448979591837
0.8746355685131197
1.249479383590171
1.7849705479859588
2.549957925694227
3.642797036706039
5.203995766722913
7.434279666747019
```

Finally, you can construct exponential weights from an arbitrary subset of timestamps within a larger range.

```julia-repl
julia> t
2019-01-01T01:00:00:2 hours:2019-01-01T05:00:00
julia> r
2019-01-01T01:00:00:1 hour:2019-01-02T01:00:00
julia> eweights(t, r, 0.3)
3-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.6122448979591837
1.249479383590171
```

NOTE: This is equivalent to `eweights(something.(indexin(t, r)), 0.3)`, i.e. each value in `t` is replaced by its index in `r`.
Since `indexin` returns `nothing` for any value of `t` that does not appear in `r`, `something` is used to unwrap the result into integer indices; it throws an error if such a value is present.
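The REPL session above does not show how `t` and `r` were built. The sketch below uses one construction consistent with their displayed values, based on the Dates standard library; that construction is an assumption. It reproduces the index mapping from the note and then evaluates the weight formula directly rather than calling `eweights`, since StatsBase is not loaded here.

```julia
using Dates

# Ranges consistent with the displayed values (assumed construction).
t = DateTime(2019, 1, 1, 1):Hour(2):DateTime(2019, 1, 1, 5)  # 01:00, 03:00, 05:00
r = DateTime(2019, 1, 1, 1):Hour(1):DateTime(2019, 1, 2, 1)  # hourly, 25 timestamps

# indexin gives a Vector{Union{Nothing,Int}}; `something` unwraps each entry to
# an Int and throws an ArgumentError if a timestamp of `t` is missing from `r`.
idx = something.(indexin(t, r))  # positions of t within r

# Same weight formula as for integer indices: λ (1 - λ)^(1 - i).
λ = 0.3
w = [λ * (1 - λ)^(1 - i) for i in idx]
```

The resulting weights match the three values printed by `eweights(t, r, 0.3)` up to floating-point rounding.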

## Methods

`AbstractWeights` implements the following methods:
@@ -90,9 +166,12 @@ AbstractWeights
AnalyticWeights
FrequencyWeights
ProbabilityWeights
UnitWeights
Weights
aweights
fweights
pweights
eweights
uweights
weights
```
```
7 changes: 5 additions & 2 deletions perf/sampling.jl
@@ -6,7 +6,7 @@ using StatsBase

import StatsBase: direct_sample!, xmultinom_sample!
import StatsBase: knuths_sample!, fisher_yates_sample!, self_avoid_sample!
import StatsBase: seqsample_a!, seqsample_c!
import StatsBase: seqsample_a!, seqsample_c!, seqsample_d!

### generic sampling benchmarking

@@ -42,6 +42,9 @@ tsample!(s::Seq_A, a, x) = seqsample_a!(a, x)
mutable struct Seq_C <: NoRep end
tsample!(s::Seq_C, a, x) = seqsample_c!(a, x)

mutable struct Seq_D <: NoRep end
tsample!(s::Seq_D, a, x) = seqsample_d!(a, x)

mutable struct Sample_NoRep <: NoRep end
tsample!(s::Sample_NoRep, a, x) = sample!(a, x; replace=false, ordered=false)

@@ -87,6 +90,7 @@ const procs2 = Proc[ SampleProc{Knuths}(),
SampleProc{Sample_NoRep}(),
SampleProc{Seq_A}(),
SampleProc{Seq_C}(),
SampleProc{Seq_D}(),
SampleProc{Sample_NoRep_Ord}() ]

const cfgs2 = (Int, Int)[]
@@ -110,4 +114,3 @@ println("Sampling Without Replacement")
println("===================================")
show(rtable2; unit=:mps, cfghead="(n, k)")
println()
