Investigate yaml schema validation #7954

Closed
1 task
wochinge opened this issue Feb 15, 2021 · 4 comments
Assignees
Labels

  • area:rasa-oss 🎡: Anything related to the open source Rasa framework
  • area:rasa-oss/training-data: Issues focused around Rasa training data (stories, NLU, domain, etc.)
  • effort:atom-squad/2: Label which is used by the Rasa Atom squad to do internal estimation of task sizes.
  • feature:ux-cli+training-data: Feature: Improve user experience with Rasa CLI and training data for developers
  • type:maintenance 🔧: Improvements to tooling, testing, deployments, infrastructure, code style.

Comments

@wochinge
Contributor

wochinge commented Feb 15, 2021

Description of Problem:
When we read YAML training data files, we validate them against a schema. This validation currently takes a lot of time. For example, skipping it brings the training data reading time down from 18 minutes to 8 minutes.

Overview of the Solution:

  • Is there a faster library?
  • Can we accelerate our current library?
  • Should we add a flag to the CI to disable the schema validation?
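
To make the third question concrete, here is a minimal sketch of what an opt-out flag could look like, together with a timing measurement around the validation step. Everything in it is an assumption for illustration: the environment variable `SKIP_YAML_SCHEMA_VALIDATION`, the helper `should_validate`, and the function `read_training_data` are hypothetical names and not part of Rasa, and the validator is passed in as a plain callable rather than tied to any particular library.

```python
import os
import time
from typing import Any, Callable, Dict

# Hypothetical environment variable; not an existing Rasa setting.
SKIP_VALIDATION_ENV = "SKIP_YAML_SCHEMA_VALIDATION"


def should_validate() -> bool:
    """Return False only when the opt-out variable is set to a truthy value."""
    value = os.environ.get(SKIP_VALIDATION_ENV, "")
    return value.lower() not in {"1", "true", "yes"}


def read_training_data(
    content: Dict[str, Any],
    validate: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    """Return already-parsed content, validating it unless validation is disabled.

    `validate` stands in for whatever schema check is in use; it is expected
    to raise an exception on invalid content.
    """
    if should_validate():
        start = time.perf_counter()
        validate(content)
        elapsed = time.perf_counter() - start
        # Report how long validation took so its cost stays visible.
        print(f"schema validation took {elapsed:.3f}s")
    return content
```

Keeping validation on by default and requiring an explicit opt-out means a CI job could skip the check for speed while local runs still catch malformed training data.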

Definition of Done:

  • Propose next steps to speed up the YAML schema validation
@wochinge wochinge added type:maintenance 🔧 Improvements to tooling, testing, deployments, infrastructure, code style. area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/training-data Issues focused around Rasa training data (stories, NLU, domain, etc.) labels Feb 15, 2021
@wochinge
Contributor Author

related #7977

We should also double-check whether the schema extension here is needed or can be accelerated.

@TyDunn TyDunn added the feature:ux-cli+training-data Feature: Improve user experience with Rasa CLI and training data for developers label Feb 17, 2021
@gausie
Contributor

gausie commented Mar 5, 2021

We should consider using jsonschema (and making the schemas public).

@wochinge wochinge added the effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label Mar 5, 2021
@alwx
Contributor

alwx commented Mar 22, 2021

We decided to drop this one because the performance is good already and because we have more important stuff to work on.

@alwx alwx closed this as completed Mar 22, 2021
5 participants