
Commit 6d7367c

VibhuJawa, aschilling-nv, ayushdg, jgerh, avem-nv
authored and committed
Enable Sem-dedup (NVIDIA#130)
* Applying SEO Best Pratices (NVIDIA#104) * Rename CPUvsGPU.rst to cpuvsgpu.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename DataCuration.rsts to datacuration.rsts Signed-off-by: Andrew Schilling <[email protected]> * Rename DistributedDataClassification.rst to distributeddataclassification.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename DocumentDataset.rst to documentdataset.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename Download.rst to download.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename GpuDeduplication.rst to gpudeduplication.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename KubernetesCurator.rst to kubernetescurator.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to personalidentifiableinformationidentificationandremoval.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename QualityFiltering.rst to qualityfiltering.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename TaskDecontamination.rst to taskdecontamination.rst Signed-off-by: Andrew Schilling <[email protected]> * Update index.rst Setting all RST files to lowercase names. 
Signed-off-by: Andrew Schilling <[email protected]> * Ignore docs for EOF fixer hook Signed-off-by: Ayush Dattagupta <[email protected]> --------- Signed-off-by: Andrew Schilling <[email protected]> Signed-off-by: Ayush Dattagupta <[email protected]> Co-authored-by: Ayush Dattagupta <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> * Shuffle CC result on group before writing out (NVIDIA#110) Signed-off-by: Ayush Dattagupta <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst (NVIDIA#113) Added links to tutorials Signed-off-by: jgerh <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> * first commit Signed-off-by: avinashvem <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> * mv under modules dir Signed-off-by: avinashvem <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> * first commit Signed-off-by: avinashvem <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> * mv under modules dir Signed-off-by: avinashvem <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> * first commit Signed-off-by: Vibhu Jawa <[email protected]> * mv under modules dir Signed-off-by: Vibhu Jawa <[email protected]> * embed by cluster saved Signed-off-by: Vibhu Jawa <[email protected]> * id map script Signed-off-by: Vibhu Jawa <[email protected]> * test commit Signed-off-by: Vibhu Jawa <[email protected]> * add id map script Signed-off-by: Vibhu Jawa <[email protected]> * Cleanup compute_embeddings_crossfit.py Signed-off-by: Vibhu Jawa <[email protected]> * Cleanup compute_embeddings_crossfit.py Signed-off-by: Vibhu Jawa <[email protected]> * Pre-commit style fixes Signed-off-by: Vibhu Jawa <[email protected]> * clustering_dask_crossfit.py Signed-off-by: Vibhu Jawa <[email protected]> * Minor clean up to sort_clusters_crossfit.py Signed-off-by: Vibhu Jawa <[email protected]> * cleanup semdedup_crossfit Signed-off-by: Vibhu Jawa <[email protected]> * Remove undo changes 
Signed-off-by: Vibhu Jawa <[email protected]> * Remove rename changes Signed-off-by: Vibhu Jawa <[email protected]> * Fix rename Signed-off-by: Vibhu Jawa <[email protected]> * Readme formatting Signed-off-by: Vibhu Jawa <[email protected]> * add dask to semdedup_crossfit.py Signed-off-by: Vibhu Jawa <[email protected]> * README.md updates Signed-off-by: Vibhu Jawa <[email protected]> * README.md updates Signed-off-by: Vibhu Jawa <[email protected]> * README.md updates Signed-off-by: Vibhu Jawa <[email protected]> * README.md updates Signed-off-by: Vibhu Jawa <[email protected]> * README.md updates Signed-off-by: Vibhu Jawa <[email protected]> * configure max memory using a cli Signed-off-by: Vibhu Jawa <[email protected]> * Dumb id results to parquet Signed-off-by: Vibhu Jawa <[email protected]> * Embedding fixes Signed-off-by: Vibhu Jawa <[email protected]> * README.md updates Signed-off-by: Vibhu Jawa <[email protected]> * Working end to end Signed-off-by: Vibhu Jawa <[email protected]> * Minor yaml fixes Signed-off-by: Vibhu Jawa <[email protected]> * Undo changes to index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Update .pre-commit-config.yaml Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst Signed-off-by: Vibhu Jawa <[email protected]> * Update fuzzy_dedup.py Signed-off-by: Vibhu Jawa <[email protected]> * Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Add end to end script in readme.md Signed-off-by: Vibhu Jawa <[email protected]> * Add type hints Signed-off-by: Vibhu Jawa <[email protected]> * Use dask for sort_clusters Signed-off-by: Vibhu Jawa <[email protected]> * Make sort_clusters work on 
MNMG scales Signed-off-by: Vibhu Jawa <[email protected]> * Cleaned up dask shutdown Signed-off-by: Vibhu Jawa <[email protected]> * Decrease noise in E2E scripts Signed-off-by: Vibhu Jawa <[email protected]> * Clean up scripts Signed-off-by: Vibhu Jawa <[email protected]> * Fix scripts/end_to_end_script.sh Signed-off-by: Vibhu Jawa <[email protected]> * Some more cleanup Signed-off-by: Vibhu Jawa <[email protected]> * Add copyright Signed-off-by: Vibhu Jawa <[email protected]> * Fix README.md Signed-off-by: Vibhu Jawa <[email protected]> * Address reviews Signed-off-by: Vibhu Jawa <[email protected]> * Make work with a SemDedupConfig Signed-off-by: Vibhu Jawa <[email protected]> * Make work with SemDedupConfig Signed-off-by: Vibhu Jawa <[email protected]> * Move to nemo-curator's logger Signed-off-by: Vibhu Jawa <[email protected]> * Semdedup-extract_dedup_data.py Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Applying SEO Best Pratices (NVIDIA#104) * Rename CPUvsGPU.rst to cpuvsgpu.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename DataCuration.rsts to datacuration.rsts Signed-off-by: Andrew Schilling <[email protected]> * Rename DistributedDataClassification.rst to distributeddataclassification.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename DocumentDataset.rst to documentdataset.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename Download.rst to download.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename GpuDeduplication.rst to gpudeduplication.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename KubernetesCurator.rst to kubernetescurator.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to 
personalidentifiableinformationidentificationandremoval.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename QualityFiltering.rst to qualityfiltering.rst Signed-off-by: Andrew Schilling <[email protected]> * Rename TaskDecontamination.rst to taskdecontamination.rst Signed-off-by: Andrew Schilling <[email protected]> * Update index.rst Setting all RST files to lowercase names. Signed-off-by: Andrew Schilling <[email protected]> * Ignore docs for EOF fixer hook Signed-off-by: Ayush Dattagupta <[email protected]> --------- Signed-off-by: Andrew Schilling <[email protected]> Signed-off-by: Ayush Dattagupta <[email protected]> Co-authored-by: Ayush Dattagupta <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Fix bad merge Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Update index.rst Signed-off-by: Vibhu Jawa <[email protected]> * Add Module for embedding+clustering Signed-off-by: Vibhu Jawa <[email protected]> * Add sorting to clustering Signed-off-by: Vibhu Jawa <[email protected]> * Refactor Semdup modules Signed-off-by: Vibhu Jawa <[email protected]> * Refactor Semdup modules Signed-off-by: Vibhu Jawa <[email protected]> * Refactor Semdup modules Signed-off-by: Vibhu Jawa <[email protected]> * Fix Readme.md Signed-off-by: Vibhu Jawa <[email protected]> * Add a environment variable to silence HF warnings Signed-off-by: Vibhu Jawa <[email protected]> * dask-cudf fix Signed-off-by: Vibhu Jawa <[email protected]> * dask-cudf fix Signed-off-by: Vibhu Jawa <[email protected]> * dask-cudf fix Signed-off-by: Vibhu Jawa <[email protected]> * Make config a flat file based on reviews Signed-off-by: Vibhu Jawa <[email protected]> * Add docstrings Signed-off-by: Vibhu Jawa <[email protected]> * Fix argparse and seed function 
Signed-off-by: Vibhu Jawa <[email protected]> * Use argparse to read config Signed-off-by: Vibhu Jawa <[email protected]> * Move around config files Signed-off-by: Vibhu Jawa <[email protected]> * Move around config files Signed-off-by: Vibhu Jawa <[email protected]> * Move around config files Signed-off-by: Vibhu Jawa <[email protected]> * Remove end_to_end_script.sh Signed-off-by: Vibhu Jawa <[email protected]> * Append Readme Signed-off-by: Vibhu Jawa <[email protected]> * Address Reviews Signed-off-by: Vibhu Jawa <[email protected]> * Change config Signed-off-by: Vibhu Jawa <[email protected]> * Make embedding creation optionally lazy Signed-off-by: Vibhu Jawa <[email protected]> * fix docstring Signed-off-by: Vibhu Jawa <[email protected]> * Address Reviews and docstrings Signed-off-by: Vibhu Jawa <[email protected]> * Address Reviews and make eps_thresholds a list of values Signed-off-by: Vibhu Jawa <[email protected]> * Minor import fix Signed-off-by: Vibhu Jawa <[email protected]> * Empty Commit Signed-off-by: Vibhu Jawa <[email protected]> * Add modules to __init__ and README.md Signed-off-by: Vibhu Jawa <[email protected]> * Fix init Signed-off-by: Vibhu Jawa <[email protected]> * Move comment Signed-off-by: Vibhu Jawa <[email protected]> * Empty commit to restart CI (which failed due to a download issue) Signed-off-by: Vibhu Jawa <[email protected]> * Empty commit to restart CI (which failed due to a download issue) Signed-off-by: Vibhu Jawa <[email protected]> --------- Signed-off-by: Andrew Schilling <[email protected]> Signed-off-by: Ayush Dattagupta <[email protected]> Signed-off-by: Vibhu Jawa <[email protected]> Signed-off-by: jgerh <[email protected]> Signed-off-by: avinashvem <[email protected]> Co-authored-by: Andrew Schilling <[email protected]> Co-authored-by: Ayush Dattagupta <[email protected]> Co-authored-by: jgerh <[email protected]> Co-authored-by: avinashvem <[email protected]>
1 parent 4fec0f6 commit 6d7367c

18 files changed, +1683 -12 lines changed

README.md

+2 -1
@@ -39,8 +39,9 @@ NeMo Curator provides a collection of scalable data-mining modules. Some of the
 
 - [Document-level deduplication](docs/user-guide/gpudeduplication.rst)
 
-  - Both exact and fuzzy (near-identical) deduplication are accelerated using cuDF and Dask
+  - Exact and fuzzy (near-identical) deduplication are accelerated using cuDF and Dask
   - For fuzzy deduplication, our implementation follows the method described in [Microsoft Turing NLG 530B](https://arxiv.org/abs/2201.11990)
+  - For semantic deduplication, our implementation follows the method described in [SemDeDup](https://arxiv.org/pdf/2303.09540) by Meta AI (FAIR) (https://github.com/facebookresearch/SemDeDup)
 
 - [Multilingual downstream-task decontamination](docs/user-guide/taskdecontamination.rst) following the approach of [OpenAI GPT3](https://arxiv.org/pdf/2005.14165.pdf) and [Microsoft Turing NLG 530B](https://arxiv.org/abs/2201.11990)

config/sem_dedup_config.yaml

+32
@@ -0,0 +1,32 @@
+# Configuration file for semantic dedup
+cache_dir: "semdedup_cache"
+num_files: 16
+id_col_name: "id"
+id_col_type: "int"
+input_column: "text"
+
+# Embeddings configuration
+embeddings_save_loc: "embeddings"
+embedding_model_name_or_path: "sentence-transformers/all-MiniLM-L6-v2"
+embedding_batch_size: 128
+embedding_max_mem_gb: 25
+
+# Clustering configuration
+clustering_save_loc: "clustering_results"
+n_clusters: 1000
+seed: 1234
+max_iter: 100
+kmeans_with_cos_dist: false
+
+# Semdedup configuration
+which_to_keep: "hard"
+largest_cluster_size_to_process: 100000
+sim_metric: "cosine"
+
+# Extract dedup configuration
+eps_thresholds:
+  - 0.01
+  - 0.001
+
+# Which threshold to use for extracting deduped data
+eps_to_extract: 0.01
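
The `eps_thresholds` values above control how aggressively near-duplicates are dropped. In the SemDeDup paper, two items in the same cluster count as semantic duplicates when their cosine similarity exceeds 1 - eps. The following is a minimal pure-Python sketch of that rule for a single cluster. It is a simplified greedy variant, not NeMo Curator's GPU implementation, and `cosine` and `keep_indices` are hypothetical helper names introduced only for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keep_indices(embeddings: List[Sequence[float]], eps: float) -> List[int]:
    """Greedily keep items whose similarity to every kept item is <= 1 - eps."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) <= 1.0 - eps for j in kept):
            kept.append(i)
    return kept


# Toy cluster: item 1 has cosine similarity of about 0.995 with item 0.
cluster = [
    [1.0, 0.0],
    [1.0, 0.1],
    [0.0, 1.0],
]
for eps in (0.01, 0.001):
    print(eps, keep_indices(cluster, eps))
```

With eps = 0.01 the second item is treated as a duplicate of the first (0.995 > 0.99); with eps = 0.001 it survives (0.995 <= 0.999). A smaller epsilon therefore removes fewer documents.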

docs/user-guide/index.rst

-1
@@ -46,4 +46,3 @@
    personalidentifiableinformationidentificationandremoval.rst
    distributeddataclassification.rst
    kubernetescurator.rst
-

examples/semdedup_example.py

+84
@@ -0,0 +1,84 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import os
+import time
+
+from nemo_curator.datasets import DocumentDataset
+from nemo_curator.log import create_logger
+from nemo_curator.modules.config import SemDedupConfig
+from nemo_curator.modules.semantic_dedup import SemDedup
+from nemo_curator.utils.distributed_utils import get_client, read_data
+from nemo_curator.utils.file_utils import (
+    expand_outdir_and_mkdir,
+    get_all_files_paths_under,
+)
+from nemo_curator.utils.script_utils import ArgumentHelper
+
+
+def silence_hf_warnings():
+    from transformers.utils import logging
+
+    logging.set_verbosity_error()
+
+
+def main(args):
+    semdedup_config = SemDedupConfig.from_yaml(args.config_file)
+    client = get_client(**ArgumentHelper.parse_client_args(args))
+
+    silence_hf_warnings()
+    client.run(silence_hf_warnings)
+
+    expand_outdir_and_mkdir(semdedup_config.cache_dir)
+    logger = create_logger(
+        rank=0,
+        name="logger-end-to_end-semdup",
+        log_file=os.path.join(semdedup_config.cache_dir, "compute_embeddings.log"),
+        log_level=logging.INFO,
+        stdout=True,
+    )
+    st = time.time()
+    input_files = get_all_files_paths_under(
+        root=args.input_data_dir,
+    )
+    if semdedup_config.num_files > 0:
+        input_files = input_files[: semdedup_config.num_files]
+    logger.info(f"Processing {len(input_files)} files")
+    ddf = read_data(
+        input_files=input_files,
+        file_type=args.input_file_type,
+        add_filename=False,
+        backend="cudf",
+    )
+    dataset = DocumentDataset(ddf)
+    semdup = SemDedup(semdedup_config, logger=logger)
+    dedup_ids = semdup(dataset)
+    print(dedup_ids.df.head())
+    logger.info(f"Time taken: {time.time() - st}")
+    client.cancel(client.futures, force=True)
+    client.close()
+
+
+def attach_args():
+    parser = ArgumentHelper.parse_semdedup_args(add_input_args=True)
+    return parser
+
+
+def console_script():
+    main(attach_args().parse_args())
+
+
+if __name__ == "__main__":
+    main(attach_args().parse_args())
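
The example script truncates the input file list only when `num_files` is positive, so the default of -1 means "use every file". A small standalone sketch of that selection rule follows; `select_input_files` is a hypothetical helper name, not part of NeMo Curator.

```python
from typing import List


def select_input_files(files: List[str], num_files: int = -1) -> List[str]:
    """Return the first `num_files` paths, or all of them if `num_files` <= 0."""
    if num_files > 0:
        return files[:num_files]
    return files


all_files = [f"part_{i:02d}.jsonl" for i in range(20)]
print(len(select_input_files(all_files, 16)))  # 16
print(len(select_input_files(all_files, -1)))  # 20
```

The slice takes the first entries in whatever order the directory listing returns them, so sorting the list first may be worthwhile if a reproducible subset matters.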

nemo_curator/log.py

+6 -2
@@ -19,7 +19,7 @@
 from nemo_curator.utils.file_utils import expand_outdir_and_mkdir
 
 
-def create_logger(rank, log_file, name="logger", log_level=logging.INFO):
+def create_logger(rank, log_file, name="logger", log_level=logging.INFO, stdout=False):
     # Create the logger
     logger = logging.getLogger(name)
     logger.setLevel(log_level)
@@ -36,8 +36,12 @@ def create_logger(rank, log_file, name="logger", log_level=logging.INFO):
     file_handler.setFormatter(formatter)
     logger.addHandler(file_handler)
 
-    logger = logging.LoggerAdapter(logger, extra)
+    if stdout:
+        stdout_handler = logging.StreamHandler()
+        stdout_handler.setFormatter(formatter)
+        logger.addHandler(stdout_handler)
 
+    logger = logging.LoggerAdapter(logger, extra)
     return logger
 
 
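
The change above adds an optional console handler next to the file handler. Below is a self-contained sketch of the same pattern using only the standard library; `make_logger` is a hypothetical name and the format string is an assumption. One detail worth knowing: `logging.StreamHandler()` with no argument writes to `sys.stderr`, so the sketch passes `sys.stdout` explicitly.

```python
import logging
import os
import sys
import tempfile


def make_logger(log_file: str, name: str = "logger", stdout: bool = False) -> logging.Logger:
    """Create a logger that writes to `log_file` and, optionally, to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")

    file_handler = logging.FileHandler(log_file, mode="a")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    if stdout:
        # The default stream for StreamHandler is sys.stderr, so name
        # sys.stdout explicitly when console output should go to stdout.
        stream_handler = logging.StreamHandler(sys.stdout)
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)

    return logger


log_path = os.path.join(tempfile.mkdtemp(), "example.log")
logger = make_logger(log_path, name="semdedup-sketch", stdout=True)
logger.info("Processing %d files", 16)
```

Because `logging.getLogger(name)` returns the same object for the same name, calling such a factory twice with one name attaches the handlers twice and duplicates every message.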

nemo_curator/modules/__init__.py

+16 -2
@@ -22,7 +22,7 @@
 from nemo_curator.utils.import_utils import gpu_only_import_from
 
 from .add_id import AddId
-from .config import FuzzyDuplicatesConfig
+from .config import FuzzyDuplicatesConfig, SemDedupConfig
 from .dataset_ops import blend_datasets, Shuffle
 from .exact_dedup import ExactDuplicates
 from .filter import Filter, Score, ScoreFilter
@@ -36,10 +36,19 @@
 FuzzyDuplicates = gpu_only_import_from(
     "nemo_curator.modules.fuzzy_dedup", "FuzzyDuplicates"
 )
-
 # Pytorch related imports must come after all imports that require cugraph,
 # because of context cleanup issues b/w pytorch and cugraph
 # See this issue: https://github.com/rapidsai/cugraph/issues/2718
+SemDedup = gpu_only_import_from("nemo_curator.modules.semantic_dedup", "SemDedup")
+EmbeddingCreator = gpu_only_import_from(
+    "nemo_curator.modules.semantic_dedup", "EmbeddingCreator"
+)
+ClusteringModel = gpu_only_import_from(
+    "nemo_curator.modules.semantic_dedup", "ClusteringModel"
+)
+SemanticClusterLevelDedup = gpu_only_import_from(
+    "nemo_curator.modules.semantic_dedup", "SemanticClusterLevelDedup"
+)
 from .distributed_data_classifier import DomainClassifier, QualityClassifier
 
 __all__ = [
@@ -59,4 +68,9 @@
     "AddId",
     "blend_datasets",
     "Shuffle",
+    "SemDedup",
+    "SemDedupConfig",
+    "EmbeddingCreator",
+    "ClusteringModel",
+    "SemanticClusterLevelDedup",
 ]
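
The new classes are exported through `gpu_only_import_from`, so the package can still be imported on machines without the GPU stack. The diff does not show how that helper works. The sketch below shows one way an optional-import helper of this kind might behave, under the assumption that it returns a placeholder that fails only when used. `optional_import_from` is a hypothetical name and this is not NeMo Curator's implementation.

```python
import importlib
from typing import Any


def optional_import_from(module: str, symbol: str) -> Any:
    """Return `module.symbol`, or a placeholder class that raises when instantiated."""
    try:
        return getattr(importlib.import_module(module), symbol)
    except (ImportError, AttributeError) as exc:
        # Capture the message now; `exc` is cleared when the except block ends.
        message = f"{module}.{symbol} is unavailable: {exc}"

        class _Unavailable:
            def __init__(self, *args: Any, **kwargs: Any) -> None:
                raise RuntimeError(message)

        _Unavailable.__name__ = symbol
        return _Unavailable


OrderedDict = optional_import_from("collections", "OrderedDict")  # real class
Missing = optional_import_from("no_such_gpu_library", "Thing")    # placeholder

try:
    Missing()
except RuntimeError as err:
    print(f"deferred failure: {err}")
```

Deferring the error to first use keeps `import`-time behavior identical on CPU-only and GPU machines, at the cost of a later and possibly more surprising failure.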

nemo_curator/modules/config.py

+69 -1
@@ -13,7 +13,8 @@
 # limitations under the License.
 
 import warnings
-from dataclasses import dataclass
+from dataclasses import dataclass, field
+from typing import List
 
 import yaml
 
@@ -98,3 +99,70 @@ def __post_init__(self):
             raise ValueError("Jaccard Threshold must be between [0,1]")
         if self.buckets_per_shuffle <= 0:
             raise ValueError("Buckets per shuffle must be greater than 0")
+
+
+@dataclass
+class SemDedupConfig(BaseConfig):
+    """
+    Configuration for Semantic Deduplication.
+
+    Attributes:
+        cache_dir (str): Directory to store cache.
+        num_files (int): Number of files. Default is -1, meaning all files.
+        id_col_name (str): Column name for ID.
+        id_col_type (str): Column type for ID.
+        input_column (str): Input column for embeddings.
+        embeddings_save_loc (str): Location to save embeddings.
+        embedding_model_name_or_path (str): Model name or path for embeddings.
+        embedding_batch_size (int): Initial batch size for processing embeddings.
+        embedding_max_mem_gb (int): Maximum memory in GB for embeddings.
+        clustering_save_loc (str): Location to save clustering results.
+        n_clusters (int): Number of clusters.
+        seed (int): Seed for clustering.
+        max_iter (int): Maximum iterations for clustering.
+        kmeans_with_cos_dist (bool): Use KMeans with cosine distance.
+        which_to_keep (str): Which duplicates to keep.
+        largest_cluster_size_to_process (int): Largest cluster size to process.
+        sim_metric (str): Similarity metric for deduplication.
+        eps_thresholds (List[float]): Epsilon thresholds to calculate if semantically similar or not.
+        eps_to_extract (float): Epsilon value to extract deduplicated data.
+    """
+
+    cache_dir: str
+    num_files: int = -1
+    id_col_name: str = "id"
+    id_col_type: str = "str"
+    input_column: str = "text"
+
+    # Embeddings
+    embeddings_save_loc: str = "embeddings"
+    embedding_model_name_or_path: str = "sentence-transformers/all-MiniLM-L6-v2"
+    embedding_batch_size: int = 128
+    embedding_max_mem_gb: int = 25
+
+    # Clustering config
+    clustering_save_loc: str = "clustering_results"
+    n_clusters: int = 1000
+    seed: int = 1234
+    max_iter: int = 100
+    kmeans_with_cos_dist: bool = False
+
+    # Semdedup config
+    which_to_keep: str = "hard"
+    largest_cluster_size_to_process: int = 100000
+    sim_metric: str = "cosine"
+
+    # Extract dedup config
+    eps_thresholds: List[float] = field(default_factory=lambda: [0.01, 0.001])
+    eps_to_extract: float = 0.01
+
+    def __post_init__(self):
+        if self.cache_dir is None:
+            raise ValueError(
+                "Finding sem-dedup requires a cache directory accessible via all workers to store intermediates"
+            )
+
+        if self.eps_to_extract not in self.eps_thresholds:
+            raise ValueError(
+                f"Epsilon to extract {self.eps_to_extract} must be in eps_thresholds {self.eps_thresholds}"
+            )