
Rename data-leakage checker and Update randomness-control checkers #84

Merged: 9 commits, May 27, 2022
8 changes: 4 additions & 4 deletions README.md
@@ -40,15 +40,15 @@ hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,\
deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,\
randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,\
missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,\
forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,\
forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,\
dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch \
--output-format=json:report.json,text:report.txt,colorized \
--output-format=text:report.txt,colorized \
--reports=y \
<path_to_sources>
```
[For Windows Users]:
```
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=json:report.json,text:report.txt,colorized --reports=y <path_to_sources>
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=text:report.txt,colorized --reports=y <path_to_sources>
```
Or place a [`.pylintrc` configuration file](https://github.com/Hynn01/dslinter/blob/main/docs/pylint-configuration-examples/pylintrc-with-only-dslinter-settings/.pylintrc) containing the above settings in the folder where you run the command, and run:
```
@@ -141,7 +141,7 @@ poetry run pytest .

- **W5517 | gradient-clear-pytorch | Gradient Clear Checker(PyTorch)**: `loss_fn.backward()` and `optimizer.step()` should be used together with `optimizer.zero_grad()`. If `.zero_grad()` is missing from the code, the rule is violated.

- **W5518 | data-leakage-scikitlearn | Data Leakage Checker(ScikitLearn)**: All scikit-learn estimators should be used inside Pipelines, to prevent data leakage between training and test data.
- **W5518 | pipeline-not-used-scikitlearn | Pipeline Checker(ScikitLearn)**: All scikit-learn estimators should be used inside Pipelines, to prevent data leakage between training and test data.

- **W5519 | dependent-threshold-scikitlearn | Dependent Threshold Checker(ScikitLearn)**: If a threshold-dependent evaluation metric (e.g., F-score) is used in the code, check whether a threshold-independent metric (e.g., AUC) is also used in the code.
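A short sketch of the W5519 idea (the data and threshold below are illustrative, not taken from the linter): F1 depends on the chosen classification cutoff, while AUC summarizes ranking quality across all cutoffs, so reporting both gives a fuller picture.

```python
# Illustrative sketch for W5519: report a threshold-independent metric (AUC)
# alongside a threshold-dependent one (F1), since F1 changes with the cutoff.
from sklearn.metrics import f1_score, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.8, 0.9, 0.3, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold chosen here

f1 = f1_score(y_true, y_pred)        # depends on the 0.5 threshold
auc = roc_auc_score(y_true, y_prob)  # threshold-independent
print(f1, auc)
```

Moving the threshold from 0.5 to, say, 0.25 changes `f1` but leaves `auc` untouched, which is exactly why the checker asks for a threshold-independent metric.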

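The W5518 pipeline rule above can be illustrated with a minimal sketch (dataset and model names are illustrative): fitting a scaler on the full dataset leaks test-set statistics into training, while a `Pipeline` keeps preprocessing fitted on training data only.

```python
# Illustrative sketch of the pattern W5518 encourages: keep the preprocessor
# and estimator inside a Pipeline so scaling statistics come from X_train only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Preferred: the scaler is fit on X_train only, then reused for X_test.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)
print(score)
```

The discouraged pattern is calling `StandardScaler().fit_transform(X)` on the whole dataset before splitting, which is what the checker flags when preprocessing and estimation appear outside a pipeline.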
12 changes: 6 additions & 6 deletions STEPS_TO_FOLLOW.md
@@ -27,15 +27,15 @@ hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,\
deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,\
randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,\
missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,\
forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,\
forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,\
dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch \
--output-format=json:report.json,text:report.txt,colorized \
--output-format=text:report.txt,colorized \
--reports=y \
<path_to_the_project>
```
[For Windows Users]:
```
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=json:report.json,text:report.txt,colorized --reports=y <path_to_sources>
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=text:report.txt,colorized --reports=y <path_to_sources>
```

## For Notebook:
@@ -67,13 +67,13 @@ hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,\
deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,\
randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,\
missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,\
forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,\
forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,\
dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch \
--output-format=json:report.json,text:report.txt,colorized \
--output-format=text:report.txt,colorized \
--reports=y \
<path_to_the_python_file>
```
[For Windows Users]:
```
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=json:report.json,text:report.txt,colorized --reports=y <path_to_the_python_file>
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=text:report.txt,colorized --reports=y <path_to_the_python_file>
```
19 changes: 15 additions & 4 deletions dslinter/checkers/deterministic_pytorch.py
@@ -54,10 +54,15 @@ def visit_module(self, module: astroid.Module):
if _import_pytorch is False:
_import_pytorch = has_import(node, "torch")

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_deterministic_algorithm_option is False:
_has_deterministic_algorithm_option = self._check_deterministic_algorithm_option(call_node)
_has_deterministic_algorithm_option = self._check_deterministic_algorithm_option_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_deterministic_algorithm_option is False:
_has_deterministic_algorithm_option = self._check_deterministic_algorithm_option_in_expr_node(nod)

# check if the rules are violated
if(
@@ -70,7 +75,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_deterministic_algorithm_option(call_node: astroid.Call):
def _check_deterministic_algorithm_option_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return DeterministicAlgorithmChecker._check_deterministic_algorithm_option_in_call_node(call_node)

@staticmethod
def _check_deterministic_algorithm_option_in_call_node(call_node: astroid.Call):
# if torch.use_deterministic_algorithms() is called with the argument True,
# set _has_deterministic_algorithm_option to True
if(
@@ -10,17 +10,17 @@
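The split into `_check_…_in_expr_node` and `_check_…_in_call_node` lets the checker scan both module-level statements and statements inside function bodies. A stdlib-`ast` sketch of that traversal (this mirrors the idea only; the real checker uses astroid, and the helper name here is made up):

```python
# Stdlib-ast sketch of the traversal this diff introduces: look for
# torch.use_deterministic_algorithms(True) at module level AND inside
# function bodies. Not the astroid-based checker itself.
import ast

def has_deterministic_option(source: str) -> bool:
    tree = ast.parse(source)
    exprs = []
    for node in tree.body:
        if isinstance(node, ast.Expr):
            exprs.append(node)
        elif isinstance(node, ast.FunctionDef):  # also scan function bodies
            exprs.extend(n for n in node.body if isinstance(n, ast.Expr))
    for expr in exprs:
        call = expr.value
        if (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and call.func.attr == "use_deterministic_algorithms"
            and call.args
            and isinstance(call.args[0], ast.Constant)
            and call.args[0].value is True
        ):
            return True
    return False
```

With the function-body pass, code such as `def main(): torch.use_deterministic_algorithms(True)` is now recognized, which the old module-level-only scan missed.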
from dslinter.utils.resources import Resources


class DataLeakageScikitLearnChecker(BaseChecker):
class PipelineScikitLearnChecker(BaseChecker):
"""Checker which checks rules for preventing data leakage between training and test data."""

__implements__ = IAstroidChecker

name = "data-leakage-scikitlearn"
name = "pipeline-not-used-scikitlearn"
priority = -1
msgs = {
"W5518": (
"There are both preprocessing and estimation operations in the code, but they are not used in a pipeline.",
"data-leakage-scikitlearn",
"pipeline-not-used-scikitlearn",
"Scikit-learn preprocessors and estimators should be used inside pipelines, to prevent data leakage between training and test data.",
),
}
@@ -84,7 +84,7 @@ def visit_call(self, call_node: astroid.Call):
if self._expr_is_preprocessor(value.func.expr):
has_preprocessing_function = True
if has_learning_function is True and has_preprocessing_function is True:
self.add_message("data-leakage-scikitlearn", node=call_node)
self.add_message("pipeline-not-used-scikitlearn", node=call_node)

except: # pylint: disable=bare-except
ExceptionHandler.handle(self, call_node)
@@ -98,14 +98,14 @@ def _expr_is_estimator(expr: astroid.node_classes.NodeNG) -> bool:
:return: True when the expression is an estimator.
"""
if isinstance(expr, astroid.Call) \
and DataLeakageScikitLearnChecker._call_initiates_estimator(expr):
and PipelineScikitLearnChecker._call_initiates_estimator(expr):
return True

# If expr is a Name, check whether that name is assigned to an estimator.
if isinstance(expr, astroid.Name):
values = AssignUtil.assignment_values(expr)
for value in values:
if DataLeakageScikitLearnChecker._expr_is_estimator(value):
if PipelineScikitLearnChecker._expr_is_estimator(value):
return True
return False

@@ -120,7 +120,7 @@ def _call_initiates_estimator(call: astroid.Call) -> bool:
return (
call.func is not None
and hasattr(call.func, "name")
and call.func.name in DataLeakageScikitLearnChecker._get_estimator_classes()
and call.func.name in PipelineScikitLearnChecker._get_estimator_classes()
)

@staticmethod
19 changes: 15 additions & 4 deletions dslinter/checkers/randomness_control_numpy.py
@@ -60,10 +60,15 @@ def visit_module(self, module: astroid.Module):
if _import_ml_libraries is False:
_import_ml_libraries = has_importfrom_sklearn(node)

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_numpy_manual_seed is False:
_has_numpy_manual_seed = self._check_numpy_manual_seed(call_node)
_has_numpy_manual_seed = self._check_numpy_manual_seed_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_numpy_manual_seed is False:
_has_numpy_manual_seed = self._check_numpy_manual_seed_in_expr_node(nod)

# check if the rules are violated
if(
@@ -76,7 +81,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_numpy_manual_seed(call_node: astroid.Call):
def _check_numpy_manual_seed_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return RandomnessControlNumpyChecker._check_numpy_manual_seed_in_call_node(call_node)

@staticmethod
def _check_numpy_manual_seed_in_call_node(call_node: astroid.Call):
if(
hasattr(call_node, "func")
and hasattr(call_node.func, "attrname")
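What the randomness-control-numpy checker is looking for is a call such as `np.random.seed(...)`, which makes random draws reproducible across runs. A minimal illustration:

```python
# Why randomness-control-numpy asks for np.random.seed: seeding makes runs
# reproducible, so reported results can be regenerated exactly.
import numpy as np

np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)  # re-seed: the same sequence is produced again
second = np.random.rand(3)

print(np.array_equal(first, second))
```

Without the seed call, `first` and `second` would almost certainly differ, and any reported experiment numbers would not be exactly reproducible.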
19 changes: 15 additions & 4 deletions dslinter/checkers/randomness_control_pytorch.py
@@ -53,10 +53,15 @@ def visit_module(self, module: astroid.Module):
if _import_pytorch is False:
_import_pytorch = has_import(node, "torch")

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_pytorch_manual_seed is False:
_has_pytorch_manual_seed = self._check_pytorch_manual_seed(call_node)
_has_pytorch_manual_seed = self._check_pytorch_manual_seed_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_pytorch_manual_seed is False:
_has_pytorch_manual_seed = self._check_pytorch_manual_seed_in_expr_node(nod)

# check if the rules are violated
if(
@@ -68,7 +73,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_pytorch_manual_seed(call_node: astroid.Call):
def _check_pytorch_manual_seed_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return RandomnessControlPytorchChecker._check_pytorch_manual_seed_in_call_node(call_node)

@staticmethod
def _check_pytorch_manual_seed_in_call_node(call_node: astroid.Call):
if(
hasattr(call_node, "func")
and hasattr(call_node.func, "attrname")
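The principle behind randomness-control-pytorch is the same: seed the generator once so training runs are repeatable. The sketch below uses the stdlib `random` module to avoid a torch dependency; `torch.manual_seed` plays the role that `random.seed` plays here.

```python
# The idea behind randomness-control-pytorch, shown with the stdlib random
# module: a fixed seed makes the sequence of draws repeatable, which is what
# torch.manual_seed provides for PyTorch tensors.
import random

random.seed(1234)
first = [random.random() for _ in range(3)]

random.seed(1234)  # same seed, same draws
second = [random.random() for _ in range(3)]

print(first == second)
```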
19 changes: 15 additions & 4 deletions dslinter/checkers/randomness_control_tensorflow.py
@@ -53,10 +53,15 @@ def visit_module(self, module: astroid.Module):
if _import_tensorflow is False:
_import_tensorflow = has_import(node, "tensorflow")

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_tensorflow_manual_seed is False:
_has_tensorflow_manual_seed = self._check_tensorflow_manual_seed(call_node)
_has_tensorflow_manual_seed = self._check_tensorflow_manual_seed_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_tensorflow_manual_seed is False:
_has_tensorflow_manual_seed = self._check_tensorflow_manual_seed_in_expr_node(nod)

# check if the rules are violated
if(
@@ -68,7 +73,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_tensorflow_manual_seed(call_node: astroid.Call):
def _check_tensorflow_manual_seed_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return RandomnessControlTensorflowChecker._check_tensorflow_manual_seed_in_call_node(call_node)

@staticmethod
def _check_tensorflow_manual_seed_in_call_node(call_node: astroid.Call):
if(
hasattr(call_node, "func")
and hasattr(call_node.func, "attrname")
4 changes: 2 additions & 2 deletions dslinter/plugin.py
@@ -27,7 +27,7 @@
from dslinter.checkers.unnecessary_iteration_pandas import UnnecessaryIterationPandasChecker
from dslinter.checkers.unnecessary_iteration_tensorflow import UnnecessaryIterationTensorflowChecker
from dslinter.checkers.deterministic_pytorch import DeterministicAlgorithmChecker
from dslinter.checkers.data_leakage_scikitlearn import DataLeakageScikitLearnChecker
from dslinter.checkers.pipeline_scikitlearn import PipelineScikitLearnChecker
from dslinter.checkers.hyperparameters_pytorch import HyperparameterPyTorchChecker
from dslinter.checkers.hyperparameters_tensorflow import HyperparameterTensorflowChecker
# pylint: disable = line-too-long
@@ -58,7 +58,7 @@ def register(linter):
linter.register_checker(RandomnessControlDataloaderPytorchChecker(linter))
linter.register_checker(RandomnessControlTensorflowChecker(linter))
linter.register_checker(RandomnessControlNumpyChecker(linter))
linter.register_checker(DataLeakageScikitLearnChecker(linter))
linter.register_checker(PipelineScikitLearnChecker(linter))
linter.register_checker(DependentThresholdPytorchChecker(linter))
linter.register_checker(DependentThresholdTensorflowChecker(linter))
linter.register_checker(DependentThresholdScikitLearnChecker(linter))