From d64e39f8bb23a5d86fceb5f22249b276cd2fb45e Mon Sep 17 00:00:00 2001
From: Hanna Imshenetska
Date: Mon, 9 Dec 2024 12:39:58 +0000
Subject: [PATCH] update 'README.md'

---
 README.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index d25ff675..2f6e5add 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ pip install syngen[ui]
 *Note:* see details of the UI usage in the [corresponding section](#ui-web-interface)
-The training and inference processes are separated with two cli entry points. The training one receives paths to the original table, metadata json file or table name and used hyperparameters.
+The training and inference processes are handled by two separate CLI entry points. The training one receives paths to the original table, metadata JSON file or table name and the hyperparameters to use.
 To start training with defaults parameters run:
@@ -88,7 +88,7 @@ train --source PATH_TO_ORIGINAL_CSV \
 ```
 The accepted values for the parameter "reports":
   - "none" (default) - no reports will be generated
- - "accuracy" - generates an accuracy report (after the training process, synthetic data of the same size as the original data is generated, and then the accuracy report of synthetic data is provided to verify the quality of the model)
+ - "accuracy" - generates an accuracy report to measure the quality of synthetic data relative to the original dataset. This report is produced after the completion of the training process, during which a model learns to generate new data. The synthetic data generated for this report is the same size as the original dataset, which allows a more accurate comparison.
   - "sample" - generates a sample report (if original data is sampled, the comparison of distributions of original data and sampled data is provided in the report)
   - "metrics_only" - outputs the metrics information only to standard output without generation of an accuracy report
   - "all" - generates both accuracy and sample reports
@@ -100,7 +100,7 @@ To train one or more tables using a metadata file, you can use the following com
 train --metadata_path PATH_TO_METADATA_YAML
 ```
-Parameters which you can set up for training process:
+Parameters that you can set up for the training process:
 - source – required parameter for training of single table, a path to the file that you want to use as a reference
 - table_name – required parameter for training of single table, an arbitrary string to name the directories
@@ -119,7 +119,7 @@ Requirements for parameters of training process:
 * row_limit - data type - integer
 * drop_null - data type - boolean, default value - False
 * batch_size - data type - integer, must be equal to or more than 1, default value - 32
-* reports - data type - if the value is passed through CLI - string, if the value is passed in the metadata file - string or list, accepted values: "none" (default) - no reports will be generated, "all" - generates both accuracy and sample reports, "accuracy" - generates an accuracy report, "sample" - generates a sample report, "metrics_only" - outputs the metrics information only to standard output without generation of a report. Default value is none. In the metadata file multiple values can be specified as a list to generate multiple types of reports simultaneously, e.g. ["metrics_only", "sample"]
+* reports - data type - if the value is passed through CLI - string, if the value is passed in the metadata file - string or list, accepted values: "none" (default) - no reports will be generated, "all" - generates both accuracy and sample reports, "accuracy" - generates an accuracy report, "sample" - generates a sample report, "metrics_only" - outputs the metrics information only to standard output without generation of a report. Default value is "none". In the metadata file multiple values can be specified as a list of available options ("accuracy", "sample", "metrics_only") to generate multiple types of reports simultaneously, e.g. ["metrics_only", "sample"]
 * metadata_path - data type - string
 * column_types - data type - dictionary with the key categorical - the list of columns (data type - string)
@@ -145,7 +145,7 @@ infer --table_name TABLE_NAME \
 ```
 The accepted values for the parameter "reports":
   - "none" (default) - no reports will be generated
- - "accuracy" - generates an accuracy report verify the quality of the generated data
+ - "accuracy" - generates an accuracy report that compares original and synthetic data patterns to verify the quality of the generated data
   - "metrics_only" - outputs the metrics information only to standard output without generation of an accuracy report
   - "all" - generates an accuracy report
 Default value is none.
@@ -172,7 +172,7 @@ Requirements for parameters of generation process:
 * run_parallel - data type - boolean, default value is False
 * batch_size - data type - integer, must be equal to or more than 1
 * random_seed - data type - integer, must be equal to or more than 0
-* reports - data type - if the value is passed through CLI - string, if the value is passed in the metadata file - string or list, accepted values: "none" (default) - no reports will be generated, "all" - generates an accuracy report, "accuracy" - generates an accuracy report, "metrics_only" - outputs the metrics information only to standard output without generation of a report. Default value is none. In the metadata file multiple values can be specified as a list to generate multiple types of reports simultaneously
+* reports - data type - if the value is passed through CLI - string, if the value is passed in the metadata file - string or list, accepted values: "none" (default) - no reports will be generated, "all" - generates an accuracy report, "accuracy" - generates an accuracy report, "metrics_only" - outputs the metrics information only to standard output without generation of a report. Default value is "none". In the metadata file multiple values can be specified as a list of available options ("accuracy", "metrics_only") to generate multiple types of reports simultaneously
 * metadata_path - data type - string
 The metadata can contain any of the arguments above for each table. If so, the duplicated arguments from the CLI
@@ -352,15 +352,15 @@ infer --metadata_path="./examples/example-metadata/housing_metadata.yaml"
 If `--metadata_path` is present and the metadata contains the necessary parameters, other CLI parameters will be ignored.
-### Ways to set the value(s) in the section "reports" in the metadata file
+### Ways to set the value(s) in the section "reports" of the metadata file
 The accepted values in the section "reports" in "train_settings":
   - "none" (default) - no reports will be generated
- - "accuracy" - generates an accuracy report (after the training process, synthetic data of the same size as the original data is generated, and then the accuracy report of synthetic data is provided to verify the quality of the model)
+ - "accuracy" - generates an accuracy report to measure the quality of synthetic data relative to the original dataset. This report is produced after the completion of the training process, during which a model learns to generate new data. The synthetic data generated for this report is the same size as the original dataset, which allows a more accurate comparison.
   - "sample" - generates a sample report (if original data is sampled, the comparison of distributions of original data and sampled data is provided in the report)
   - "metrics_only" - outputs the metrics information only to standard output without generation of an accuracy report
   - "all" - generates both accuracy and sample reports
-Default value is none.
+Default value is "none".
 Examples how to set the value(s) in the section "reports" in "train_settings":
 ```yaml
@@ -382,10 +382,10 @@ reports:
 ```
 The accepted values for the parameter "reports" in "infer_settings":
   - "none" (default) - no reports will be generated
- - "accuracy" - generates an accuracy report verify the quality of the generated data
+ - "accuracy" - generates an accuracy report to verify the quality of the generated data
   - "metrics_only" - outputs the metrics information only to standard output without generation of an accuracy report
   - "all" - generates an accuracy report
-Default value is none.
+Default value is "none".
 Examples how to set the value(s) in the section "reports" in "infer_settings":
 ```yaml