Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve domain loading: Tech debt and associated bugs #10807

Closed
6 tasks done
joejuzl opened this issue Feb 3, 2022 · 4 comments · Fixed by #10841
Closed
6 tasks done

Improve domain loading: Tech debt and associated bugs #10807

joejuzl opened this issue Feb 3, 2022 · 4 comments · Fixed by #10841
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework effort:atom-squad/4 Label which is used by the Rasa Atom squad to do internal estimation of task sizes.

Comments

@joejuzl
Copy link
Contributor

joejuzl commented Feb 3, 2022

Background

A few bugs were identified around the merging of multiple domain files.
While fixing these bugs relatively quickly we accrued some technical debt.
After further investigation into how domain loading and merging works some inconsistencies and bugs were found. This ticket encompasses the tech debt and bug fixes, but not the behaviour changes. As the tech debt and bug are very related, it makes sense to do them in one go.
More details: https://www.notion.so/rasa/Improve-domain-loading-b3998eafc5f7406dab0be88613dfd15d

Overview

Bugs

  • Domain merging fails when there is a dictionary entity.
    • This can be fixed by using the same logic currently used for intents
  • If a directory is passed to Domain.load it will use Domain.merge_domain_dicts, if a list of paths is passed it will use Domain.merge(which behaves differently). (And it can end up being a mix of both i.e. if one of the paths is a directory).
    • This should be addressed by the tech debt below.

Tech debt

  • We should combine Domain.merge_domain_dicts and Doman.merge into one method. (Or at least share the core functionality)
  • We call instance method merge_domain_dicts on a class. This should be fixed.
  • Internal methods such as merge_dicts should not be public. We are restricted by and have to support the public interfaces of classes like Domain.
    • Can some of these utility methods be moved into a shared utility package?
  • We store self.duplicates on the Domain instance. Can we just warn as we merge/load and then raise InvalidDomain without storing this data as a public instance variable?
    • Currently domain.duplicates is checked later by Validator.verify_domain_duplicates . Maybe we can remove this and just do it in the Domain class
  • Domain._check_domain_sanity appears to duplicate some of the duplication checks - can we remove/combine it?
  • Lots of the difficulty of merging domains comes from the fact that we change the internal representation e.g. _transform_intent_properties_for_internal_use. This is difficult to revert. Instead can we always keep the original domain dict on the instance, and use that for both domain persistence and merging?
    • We need to confirm that the domain is not mutated in anyway that would render the original dict out-of-date or invalid.

Definition of Done

  • sync with @usc-m to make sure everyone is aware of related ongoing work
  • Dictionary entities work with multiple domains
  • There is only one domain merging method
  • Merging utility methods are not public
  • Domain validation happens in one place and time
  • Domains dump using their initial dictionary representation (i.e. it is not transformed then transformed back again).
@joejuzl joejuzl added the area:rasa-oss 🎡 Anything related to the open source Rasa framework label Feb 3, 2022
@joejuzl joejuzl changed the title Improve domain loading Improve domain loading: Tech debt and associated bugs Feb 3, 2022
@TyDunn TyDunn added the effort:atom-squad/4 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label Feb 4, 2022
@ancalita
Copy link
Member

ancalita commented Feb 8, 2022

Domain merging fails when there is a dictionary entity.

This can be fixed by using the same logic currently used for intents

@joejuzl I would need more details on this bug, do you have an example or open ticket that I can look at? Not sure I understand what dictionary entity refers to and what the logic used for intents is.

@joejuzl
Copy link
Contributor Author

joejuzl commented Feb 9, 2022

@ancalita

Take this example:

entities:
  - GPE:
      roles:
        - destination
        - origin
  - name

The entity name is a string, whereas GPE is a mapping/dictionary.
If you are merging domain files and there is an entity like GPE it will fail.

@ancalita
Copy link
Member

ancalita commented Feb 10, 2022

@joejuzl I wanted to ask for your opinion on two approaches I thought for unifying the 2 merge methods (merge and merge_domain_dicts):

  • The first one requires quite a large change in the codebase:
  1. I explained in this PR description why it's currently hard to unify these methods and why in the current PR they're only sharing the core functionality.

  2. A different way is to modify the from_path, from_file ... from_yaml to return a domain dict, rather than a Domain instance. Then I could modify load method this way:

@classmethod
    def load(cls, paths: Union[List[Union[Path, Text]], Text, Path]) -> "Domain":
        if not paths:
            raise InvalidDomain(
                "No domain file was specified. Please specify a path "
                "to a valid domain file."
            )
        elif not isinstance(paths, list) and not isinstance(paths, set):
            paths = [paths]

        domain = Domain.empty().as_dict()
        for path in paths:
            other_domain_dict = cls.from_path(path)
            domain = cls.merge_domain_dicts(domain, other_domain_dict)

        return cls.from_dict(domain)

This however would require a large amount of tests that load domain paths to be changed.

  • I also explored keeping a self.data attribute storing the original domain dict in the constructor, then modifying the load method as such:
@classmethod
    def load(cls, paths: Union[List[Union[Path, Text]], Text, Path]) -> "Domain":
        ...

        domain_dict = Domain.empty().as_dict()
        for path in paths:
            other = cls.from_path(path)
            other_dict = other.as_dict()
            other_dict.update(other.data)
            domain_dict = cls.merge_domain_dicts(domain_dict, other_dict)

        return cls.from_dict(domain_dict)

This might require amendments in tests/codebase where Domain constructor is used directly to add the new positional argument data.
Which do you think it's a better option?

@joejuzl
Copy link
Contributor Author

joejuzl commented Feb 10, 2022

I also explored keeping a self.data attribute storing the original domain dict in the constructor, then modifying the load method as such

I think this is a good approach. I also think this could help make other things simpler. A lot of the bugs and complexities from merging come from the fact that we have to get back to the original representation to merge e.g. all the annoyance with use_entities -> used_entities etc. We also have to deal with this when we persist the domain, i.e. take it back to a dict/yaml form (see stuff like Domain.cleaned_domain() and .as_dict()).
If instead we keep a self.data, like you say, then we can always access this as the source of truth. And the rest of the domain is really just a window onto this original representation.

However I haven't looked deeply into the implications of this - so it may be harder than it sounds. Definitely worth exploring in my opinion though!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework effort:atom-squad/4 Label which is used by the Rasa Atom squad to do internal estimation of task sizes.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants