Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce pydantic schema #5531

Closed
wants to merge 7 commits into from

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Mar 1, 2021

From the dvcyamlschema/gen.py, added some 2.0 stuff and improved schema in the hopes that this will replace the Voluptuous schema we have for dvc.yaml at least.

I'm still hesitant to go full-blown with Pydantic yet, as making both of these schemas compatible is harder than I thought previously, which means there might be some bugs lurking around. The tests do pass with this new schema, but they don't tests the negative cases well. And, we also need to consider performance (voluptuous is faster, but is it worth to continue using it as it's DX is very bad?).

This pydantic schema can be used to provide jsonschema as well, you can just use it as:

from dvc.schema.dvc_yaml import Schema

print(Schema.schema_json(indent=4))

The dvcyaml_schema repo will simply use this API if/when this gets merged, which will make it closer to where we work.
See also #5371 (comment), pydantic has a bit better error messages than Voluptuous.

Related: iterative/dvcyaml-schema#7

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@skshetry skshetry added the enhancement Enhances DVC label Mar 1, 2021
@skshetry skshetry added this to the DVC 2.0 milestone Mar 1, 2021
@skshetry skshetry self-assigned this Mar 1, 2021
@@ -122,7 +123,7 @@ def _check_gitignored(self):
if self._is_git_ignored():
raise FileIsGitIgnored(self.path)

def _load(self):
def _load(self, use_pydantic: bool = False):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test new schema, you can toggle this to True, and it should start using the new schema for validation.

dvc/dvcfile.py Outdated Show resolved Hide resolved
@skshetry
Copy link
Member Author

skshetry commented Mar 30, 2021

Closing it for now, will come back to it after UI is done. Unfortunately, I haven't been able to work more on this right now, so this needs more research.

As this PR stands, it is only meant to change the validation schema, so I can't provide any compelling argument other than to say that we are maintaining this schema already on dvcyaml-schema for generating schema.json and with this PR, it provides benefits of both generating schema.json and validating dvc.yaml file with a single schema definition.

It feels like cheating if I make other arguments just to make a point on its potential benefits that we might get in the future, which I am quite not sure of myself. I do believe we'll come to this again soon. 🙂

@skshetry skshetry closed this Mar 30, 2021
@skshetry skshetry deleted the pydantic-schema-2 branch March 30, 2021 12:03
@daavoo
Copy link
Contributor

daavoo commented Jul 26, 2021

Closing it for now, will come back to it after UI is done. Unfortunately, I haven't been able to work more on this right now, so this needs more research.

As this PR stands, it is only meant to change the validation schema, so I can't provide any compelling argument other than to say that we are maintaining this schema already on dvcyaml-schema for generating schema.json and with this PR, it provides benefits of both generating schema.json and validating dvc.yaml file with a single schema definition.

It feels like cheating if I make other arguments just to make a point on its potential benefits that we might get in the future, which I am quite not sure of myself. I do believe we'll come to this again soon. slightly_smiling_face

I'm not aware of the internal implications of this change but I think that having the schema inside the core repo makes a lot of sense. Issues related with having dvcyaml-schema outdated have been raised at least a few times. (And some discord messages, i.e. recently https://discord.com/channels/485586884165107732/485596304961962003/869072338778468373)

Probably not enough to be considered an argument at this point but I think that adding this feature (and introducing internal usage of pydantic) could open new possibilities regarding params validation (#5477) and enhancements to the overall params UI. I was working on a proposal about having internal params validation and potentially allowing users to provide it's own params schemas. Having this P.R. as foundation would make it way easier to implement.

@skshetry skshetry restored the pydantic-schema-2 branch April 27, 2022 03:56
@skshetry skshetry deleted the pydantic-schema-2 branch June 15, 2023 13:35
@SoyGema
Copy link

SoyGema commented Oct 20, 2023

Sorry. Better in #9606 (comment)

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants