Doc/add glossary of terms #415

Merged · 5 commits · Sep 1, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ the board, with a focus on documentation and usability.
[PR #365](https://github.com/aai-institute/pyDVL/pull/365)
- Enabled parallel computation for Leave-One-Out values
[PR #406](https://github.com/aai-institute/pyDVL/pull/406)
- Added more abbreviations to documentation
[PR #415](https://github.com/aai-institute/pyDVL/pull/415)

### Changed
- Replaced sphinx with mkdocs for documentation. Major overhaul of documentation
10 changes: 10 additions & 0 deletions CONTRIBUTING.md
@@ -250,6 +250,16 @@ def f(x: float) -> float:
return 1/(x*x)
```

### Abbreviations

We keep the abbreviations used in the documentation inside the
[docs_includes/abbreviations.md](docs_includes/abbreviations.md) file.

The syntax for abbreviations is:

```markdown
*[ABBR]: Abbreviation
```
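
For example, two of the entries added in this PR follow that pattern:

```markdown
*[IF]: Influence Function
*[DUL]: Data Utility Learning
```

Occurrences of `IF` or `DUL` in the rendered documentation are then wrapped in
`<abbr>` elements, so the expansion shows up as a tooltip on hover.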

## CI

5 changes: 2 additions & 3 deletions docs/value/index.md
@@ -76,7 +76,7 @@ there are additional desiderata, like having a value function that does not
increase with repeated samples. Game-theoretic methods are all rooted in axioms
that by construction ensure different desiderata, but despite their practical
usefulness, none of them are either necessary or sufficient for all
applications. For instance, *[SV]s try to equitably distribute all value
applications. For instance, SV methods try to equitably distribute all value
among all samples, failing to identify repeated ones as unnecessary, with e.g. a
zero value.
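
For concreteness, this is the standard form the Shapley value takes in this
setting (a sketch in common data-valuation notation, with $u$ the utility, $D$
the training set of $n$ samples and $v_u(x_i)$ the value of sample $x_i$):

$$
v_u(x_i) = \frac{1}{n} \sum_{S \subseteq D \setminus \{x_i\}}
\binom{n-1}{|S|}^{-1} \big[ u(S \cup \{x_i\}) - u(S) \big],
\qquad
\sum_{i=1}^{n} v_u(x_i) = u(D) - u(\emptyset).
$$

The efficiency identity on the right is what "distribute all value" means: the
total utility is split exactly among the samples, so by symmetry two identical
copies of a point receive equal, generally non-zero, values rather than one of
them being flagged as redundant.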

@@ -332,8 +332,7 @@ nature of every (non-trivial) ML problem can have an effect:
[@wang_data_2022] prove that by relaxing one of the Shapley axioms
and considering the general class of semi-values, of which Shapley is an
instance, one can prove that a choice of constant weights is the best one can
do in a utility-agnostic setting. So-called *Data Banzhaf* is on our to-do
list!
do in a utility-agnostic setting: the so-called *Data Banzhaf*.

* **Data set size**: Computing exact Shapley values is NP-hard, and Monte Carlo
approximations can converge slowly. Massive datasets are thus impractical, at
2 changes: 1 addition & 1 deletion docs/value/semi-values.md
@@ -117,7 +117,7 @@ values = compute_generic_semivalues(
u=utility,
coefficient=beta_coefficient(alpha=1, beta=16),
done=AbsoluteStandardError(threshold=1e-4),
)
)
```

Allowing any coefficient can help when experimenting with models which are more
4 changes: 4 additions & 0 deletions docs_includes/abbreviations.md
@@ -9,3 +9,7 @@
*[MSE]: Mean Squared Error
*[SV]: Shapley Value
*[TMCS]: Truncated Monte Carlo Shapley
*[IF]: Influence Function
*[iHVP]: inverse Hessian-vector product
*[LiSSA]: Linear-time Stochastic Second-order Algorithm
*[DUL]: Data Utility Learning