@boy2000-007man
Contributor

What does this PR do?

  • implement an Aho-Corasick (AC) automaton to supersede the Trie and fix a DisjunctiveConstraint edge case
  • add ConjunctiveDisjunctiveConstraint to handle complex combinations of multiple conjunctive and disjunctive constraints
  • strengthen the unit tests
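
As a rough illustration of the first bullet, the sketch below contrasts a plain trie walk that drops all progress on a mismatch with an Aho-Corasick walk that follows failure links. It is not the code from this PR: `build_automaton`, `is_fulfilled`, and `use_failure_links` are hypothetical names, and the reset-on-mismatch behaviour of the trie walk is an assumption about the edge case being fixed.

```python
from collections import deque


def build_automaton(sequences):
    """Build goto/fail/done tables for an Aho-Corasick automaton over token IDs."""
    goto = [{}]      # goto[state][token] -> next state
    fail = [0]       # fail[state] -> state for the longest proper suffix that is a prefix
    done = [False]   # done[state] -> a full sequence ends at this state (or at a suffix of it)

    # Phase 1: insert every sequence into a trie.
    for seq in sequences:
        state = 0
        for tok in seq:
            if tok not in goto[state]:
                goto.append({})
                fail.append(0)
                done.append(False)
                goto[state][tok] = len(goto) - 1
            state = goto[state][tok]
        done[state] = True

    # Phase 2: breadth-first pass, so a parent's failure link exists before its children's.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for tok, child in goto[state].items():
            fallback = fail[state]
            while fallback and tok not in goto[fallback]:
                fallback = fail[fallback]
            fail[child] = goto[fallback].get(tok, 0)
            done[child] = done[child] or done[fail[child]]
            queue.append(child)
    return goto, fail, done


def is_fulfilled(sequences, generated, use_failure_links=True):
    """Return True if any of `sequences` occurs contiguously in `generated`."""
    goto, fail, done = build_automaton(sequences)
    state = 0
    for tok in generated:
        if use_failure_links:
            # Aho-Corasick: fall back to the longest suffix that is still a valid prefix.
            while state and tok not in goto[state]:
                state = fail[state]
        # Without failure links, a mismatch sends the walk straight back to the root.
        state = goto[state].get(tok, 0)
        if done[state]:
            return True
    return False


if __name__ == "__main__":
    constraint = [[1, 2, 5], [2, 3]]
    print(is_fulfilled(constraint, [1, 2, 3]))                           # True
    print(is_fulfilled(constraint, [1, 2, 3], use_failure_links=False))  # False
```

With the generated tokens `[1, 2, 3]`, the plain walk commits to the `[1, 2, 5]` branch, fails on `3`, and forgets that `2` already started `[2, 3]`; the failure link keeps that partial match alive.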

Fixes # (issue)

#17831

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@patrickvonplaten, @cwkeam

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@patrickvonplaten
Contributor

Hey @boy2000-007man,

Thanks for the fix proposal! @cwkeam could you take a look here as well? :-)

@boy2000-007man - it'd be really nice if you could add a test that would have failed without your fix, but will now pass.

Thanks a lot for working on this!

@boy2000-007man boy2000-007man marked this pull request as ready for review July 3, 2022 06:36
self.assertTrue(dc.current_seq == [1, 2, 5])
self.assertTrue(dc.advance() is None)

def test_dc_example_progression_mid_overlap_two(self):
Contributor Author

@patrickvonplaten , here is the updated test case for the reported edge case

@boy2000-007man boy2000-007man changed the title from "[WIP] fix DisjunctiveConstraint edge case and add ConjunctiveDisjunctiveConstraint" to "Fix DisjunctiveConstraint edge case and add ConjunctiveDisjunctiveConstraint" Jul 4, 2022
@patrickvonplaten
Contributor

Hey @boy2000-007man,

Thanks a lot for the PR - I'm a bit worried about adding so much new code to main transformers to catch an edge case and wonder if it's really worth it. The problem is that this function will quickly become unmaintainable (it already sadly is to some extent) - in your opinion is it absolutely necessary to add this edge case? Also could you maybe provide a "real" generation example that shows how the current implementation fails?

@boy2000-007man
Contributor Author

Hi @patrickvonplaten, the current implementation is complex because it supports both the existing DisjunctiveConstraint and the newly added ConjunctiveDisjunctiveConstraint at the same time. I can add a much simpler version dedicated to backing DisjunctiveConstraint only. The new ConjunctiveDisjunctiveConstraint is not used by the library by default and requires a manual import, so it won't accidentally break any existing work.
Finding a failure case is not that straightforward, especially without a deep understanding of a specific model's preferences, but I can imagine that constraints like the small cat/small cats or the united states/united kingdom may be affected.
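
One possible reading of these examples, sketched at the word level: if the allowed phrases are "the small cat" and "small cats", a tracker that forgets its progress on a mismatch can miss "small cats" inside "the small cats". The whitespace tokenization, the phrase pairing, and both function names are assumptions for illustration; real tokenizers split text differently, and this is not the library's code.

```python
def contains_phrase(words, phrase):
    """Reference check: does `phrase` occur as a contiguous run inside `words`?"""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def reset_on_mismatch(words, phrases):
    """Hypothetical tracker: extend the current prefix, forget everything on a mismatch."""
    progress = []
    for word in words:
        candidate = progress + [word]
        if any(p[:len(candidate)] == candidate for p in phrases):
            progress = candidate   # still a prefix of some allowed phrase
        else:
            progress = []          # mismatch: all progress is dropped
        if progress in phrases:
            return True
    return False


if __name__ == "__main__":
    phrases = [["the", "small", "cat"], ["small", "cats"]]
    words = "the small cats".split()
    # The text really does contain one of the phrases ...
    print(any(contains_phrase(words, p) for p in phrases))  # True
    # ... but the resetting tracker never notices it.
    print(reset_on_mismatch(words, phrases))                # False
```

The same pattern applies to "the united states" and "united kingdom" with the text "the united kingdom".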

@patrickvonplaten
Contributor

Hey @boy2000-007man,

Sorry to reply so late here. Will gently ask @cwkeam in private if he could take a quick look because he's the most familiar with the current code. If there is no answer, I'll come back to it and dive deeper into the code to be able to better answer here.

Also cc @gante if you're feeling curious about complex code ;-)

@gante
Contributor

gante commented Aug 2, 2022

Hi @boy2000-007man 👋 I was having a look into this PR, and one thing I noticed was that the objective of the PR was not immediately clear -- it says at the top that it fixes an edge case but... what edge case? We can find the answer to that in the code, especially in the docstring of ConjunctiveDisjunctiveConstraint.

I do think we should fix the edge case, as the documented behavior does not match the actual behavior. However, adding clear examples (as in #15761) will be extremely useful for our future selves 🙏 It will also help the reviewers see the value of the PR :D

@boy2000-007man
Contributor Author

Hi @gante, sorry for the late reply. The edge case is described in the associated bug report, #17831. Do you mean I should describe it again in the docstring?

@gante
Contributor

gante commented Aug 8, 2022

Do you mean to mention it again in the docstring

Yes please, but with input strings (as opposed to tokens).

It's hard to justify adding so many new lines of code without a clear example of why it matters :) We have very limited maintenance capacity, so sometimes it's preferable to have an incomplete short solution that we can maintain than a complete long solution that will accumulate bugs as we introduce new features.

@huggingface huggingface deleted a comment from github-actions bot Sep 2, 2022
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.


4 participants