-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref: Add ruff rules for bandit (S) #4111
Conversation
@@ -143,7 +143,7 @@ def parse_text_file_to_data(file_path: str, silent_errors: bool) -> Data | None: | |||
elif file_path.endswith((".yaml", ".yml")): | |||
text = yaml.safe_load(text) | |||
elif file_path.endswith(".xml"): | |||
xml_element = ET.fromstring(text) | |||
xml_element = ET.fromstring(text) # noqa: S314 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this data trusted ? Rule S314 suggests to use the safer defusedxml lib if it's not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data comes from the user, which means if they load untrusted data, it will damage their system. If defusedxml works just as well, then I say we use it instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added defusedxml.
@@ -25,7 +25,7 @@ def build_agent(self) -> AgentExecutor: | |||
path = Path(self.path) | |||
if self.path.endswith("yaml") or self.path.endswith("yml"): | |||
with path.open() as file: | |||
yaml_dict = yaml.load(file, Loader=yaml.FullLoader) | |||
yaml_dict = yaml.full_load(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this data trusted ? If not, we should use yaml.safe_load()
here. Is there a requirement to use yaml.FullLoader
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user is the one providing the data which means they should know if the data is trusted or not.
If there's no loss in using safer ways we should go for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FullLoader allows to use advanced (and unsafe) features of yaml such as:
>>> someval = "123"
>>> yaml.load("!!python/name:__main__.someval", Loader=yaml.FullLoader)
'123'
I think it's quite unusual to use these features that don't exist in JSON, so I'll replace with safe_load
if it's OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added use of safe_load
@@ -81,7 +81,9 @@ class Config: | |||
msg = f"Error loading documents: {e}" | |||
raise ValueError(msg) from e | |||
|
|||
assert len(docs) == 1, "Expected a single document to be loaded." | |||
if len(docs) != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
"S102", # Use of exec | ||
] | ||
"langflow/services/cache/*" = [ | ||
"S301", # Use of pickle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pickle is not safe if data is untrusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to use pickle anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, pickle is used in caches to serialize objects to bytes. Which means someone that has write access to the cache (disk, Redis, ...) can use it to execute arbitrary code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put an exemption for now. We can check how to replace pickle in another PR later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add ruff rules for bandit (S)