Skip to content

Conversation

@Narsil
Copy link
Contributor

@Narsil Narsil commented Jun 1, 2022

  • Changing Decoder trait to be more composable.

* Changing `Decoder` trait to be more composable.

Fix #872

* Fixing Python side.

* Fixing test.

* Updating cleanup signature, removing turbofish.
@Narsil Narsil requested a review from mishig25 June 1, 2022 12:22
@Narsil Narsil marked this pull request as draft June 1, 2022 12:22
@mishig25
Copy link
Contributor

mishig25 commented Jun 1, 2022

Got 2 questions:

  1. What is the difference between Changing Decoder trait to be more composable. #938 & Changing Decoder trait to be more composable. (#938) #1008 and why was Changing Decoder trait to be more composable. #938 reverted in the first place?
  2. Do you need to create tokenizers::decoders::Sequence?

@Narsil
Copy link
Contributor Author

Narsil commented Jun 1, 2022

Got 2 questions:

1. What is the difference between [Changing `Decoder` trait to be more composable. #938](https://github.com/huggingface/tokenizers/pull/938) & [Changing `Decoder` trait to be more composable. (#938) #1008](https://github.com/huggingface/tokenizers/pull/1008) and why was [Changing `Decoder` trait to be more composable. #938](https://github.com/huggingface/tokenizers/pull/938) reverted in the first place?

This one is not breaking

2. Do you need to create `tokenizers::decoders::Sequence`?

Yes.

@Narsil Narsil marked this pull request as ready for review June 1, 2022 15:50
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 1, 2022

The documentation is not available anymore as the PR was closed or merged.

fn decode(&self, tokens: Vec<String>) -> Result<String>;
fn decode(&self, tokens: Vec<String>) -> Result<String> {
let results = self.decode_chain(tokens)?;
Ok(results.join(""))
Copy link
Contributor

@mishig25 mishig25 Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this line cause any problem for the SequenceDecoder?
Specifically, one unlikekly edge-case I'm thinking is: one of the sequence decoders is another sequence decoder, and Ok(results.join("")) will cause an informaition loss (vec of str now just becomes a str)?

as a solution, we can implement fn decode for Sequence Decoder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK I think.

The parent Sequence would call the child Sequence.decode_chain NOT the decode function, so we're good, no ?

I could add a test to make sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh I see, so it is no problem in that case

Copy link
Contributor

@mishig25 mishig25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@Narsil Narsil merged commit 943b542 into main Jun 2, 2022
@Narsil Narsil deleted the decoder_trait branch June 2, 2022 12:43
Narsil added a commit that referenced this pull request Aug 23, 2022
* Changing `Decoder` trait to be more composable. (#938)

* Changing `Decoder` trait to be more composable.

Fix #872

* Fixing Python side.

* Fixing test.

* Updating cleanup signature, removing turbofish.

* Adding `Sequence` Decoder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants