Augmentation benchmark #150

robsdavis · 2023-03-13T11:40:08Z

Description

Added an augmentation benchmark pipeline.

closes #136

Affected Dependencies

None

How has this been tested?

Tests added to test_benchmarks.py and metrics/test_api.py

Checklist

I have followed the Contribution Guidelines and Code of Conduct
I have commented my code following the van der Schaar Lab Styleguide
I have labeled this PR with the relevant Type labels
My changes are covered by tests


bcebere · 2023-03-13T12:02:00Z

tests/plugins/survival_analysis/test_survival_ctgan.py

@@ -65,7 +65,7 @@ def test_plugin_hyperparams(test_plugin: Plugin) -> None:
    ],
 )
 def test_plugin_fit(tte_strategy: str) -> None:
-    test_plugin = plugin(tte_strategy=tte_strategy, device="cpu", **plugins_args)
+    test_plugin = plugin(tte_strategy=tte_strategy, device="zz", **plugins_args)


What does zz mean?

Typo. Now fixed

bcebere · 2023-03-13T13:00:30Z

src/synthcity/metrics/eval.py

+                        use_cache=use_cache,
+                    ),
+                    X_gt.sample(eval_cnt),
+                    X_augmented.sample(eval_cnt),


Is sample(eval_cnt) relevant for the augmented dataset? The augmented dataset will be larger than X_gt every time, right?

Yes, good point. I'll remove the sample call in the next push
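To illustrate the concern, here is a minimal sketch that uses plain Python lists as stand-ins for the data frames. The row contents, the dataset sizes, and the way `eval_cnt` is derived are assumptions made for illustration; they are not taken from the PR's code.

```python
import random

# Hypothetical stand-ins: each "row" is a (source, index) tuple.
real_rows = [("real", i) for i in range(100)]
synthetic_rows = [("synthetic", i) for i in range(40)]

# Augmentation appends synthetic rows to the real ones, so the result
# is never smaller than the ground-truth dataset.
augmented_rows = real_rows + synthetic_rows

# Assumption: eval_cnt is capped by the smaller of the two datasets.
eval_cnt = min(len(real_rows), len(augmented_rows))

rng = random.Random(0)
sampled_augmented = rng.sample(augmented_rows, eval_cnt)

# Sampling shrinks the augmented set back to the ground-truth size,
# so some rows are always left out of the evaluation.
dropped = len(augmented_rows) - len(sampled_augmented)
```

Under these assumptions the sample discards exactly as many rows as the augmentation added, which is why dropping the call on the augmented side looks reasonable.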

bcebere

Great work!

Some changes are still needed, after that this can be merged

bcebere · 2023-03-14T15:33:19Z

src/synthcity/benchmark/utils.py

+            if not set(ad_hoc_augment_vals.keys()).issubset(
+                set(X_train[fairness_column].values)
+            ):
+                print(set(X_train[fairness_column].values))


Don't leave prints in the code. Use log if the logs are needed.
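A rough sketch of what the suggestion could look like, using the standard-library `logging` module. The function name `check_ad_hoc_keys`, the logger name, and the list-based column are hypothetical; the project may expose its own logging wrapper, which would be preferable where available.

```python
import logging

# Hypothetical logger name; a project-level logger would normally be used.
log = logging.getLogger("augmentation_benchmark")


def check_ad_hoc_keys(ad_hoc_augment_vals: dict, column_values: list) -> bool:
    """Return True if every ad hoc key occurs among the column's values."""
    observed = set(column_values)
    missing = set(ad_hoc_augment_vals) - observed
    if missing:
        # Emitted at debug level instead of printed, so callers can
        # silence or redirect it through the logging configuration.
        log.debug(
            "Ad hoc keys not found in column: %s (observed values: %s)",
            sorted(missing, key=str),
            sorted(observed, key=str),
        )
        return False
    return True
```

With the default logging configuration nothing is written to stdout, and the diagnostic only appears when debug logging is enabled.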

bcebere · 2023-03-14T15:34:26Z

src/synthcity/plugins/core/dataloader.py

@@ -290,8 +295,6 @@ class GenericDataLoader(DataLoader):
        >>> from synthcity.plugins.core.dataloader import GenericDataLoader
        >>> X, y = load_diabetes(return_X_y=True, as_frame=True)
        >>> X["target"] = y
-        >>> # Important note: preprocessing data with OneHotEncoder or StandardScaler is not needed or recommended.
-        >>> # Synthcity handles feature encoding and standardization internally.


Why did you remove these lines?

Accident, re-instated.

bcebere · 2023-03-14T15:35:53Z

tests/metrics/test_api.py

+
+
+@pytest.mark.parametrize(
+    "fairness_column, rule, strict, add_hoc_vals",


Do you mean ad_hoc?

robsdavis added 8 commits February 24, 2023 09:42

Add new benchmark code

56c97c6

Merge branch 'main' into augmentation-benchmark

e5ffa12

Merge main into branch

ce76de1

Merge branch 'main' into augmentation-benchmark

32e049b

Augmentation benchmark added

6c9a0d7

Clean up

de042de

Cleaning up

c2736ba

Merge branch 'main' of https://github.com/vanderschaarlab/synthcity i…

af63f20

…nto augmentation-benchmark

robsdavis added the enhancement New feature or request label Mar 13, 2023

robsdavis requested a review from bcebere March 13, 2023 11:40

robsdavis linked an issue Mar 13, 2023 that may be closed by this pull request

Benchmark pipeline for data augmentation tasks #136

Closed

robsdavis added 2 commits March 13, 2023 11:58

Remove unnecessary tutorial file

3459334

Clean up

21657a3

bcebere reviewed Mar 13, 2023


robsdavis added 6 commits March 13, 2023 14:09

clean up

fdca5a7

Debug test and clean up

b79fe84

Added new tests for augmentation benchmark

06d90f2

Added new metric api tests for augmentation

eec447a

clean up

2eb7f06

clean up

26aa26c

robsdavis changed the title ~~[WIP] Augmentation benchmark~~ Augmentation benchmark Mar 14, 2023

bcebere suggested changes Mar 14, 2023


version bumped and clean up

d34e995

bcebere approved these changes Mar 14, 2023


clean up docstrings

7d6a667

robsdavis merged commit cf6ea56 into main Mar 15, 2023

robsdavis deleted the augmentation-benchmark branch March 15, 2023 09:30
