Please include the license for averaged_perceptron_tagger #229

Open
Hiroshiba opened this issue Feb 11, 2025 · 2 comments · May be fixed by #233
Comments

@Hiroshiba

The list of resources available for download from NLTK does not appear to mention the license for averaged_perceptron_tagger.

Could you clarify what license averaged_perceptron_tagger is distributed under?


I did some research on my own and hope this information might be helpful:

  • It seems likely that the license is MIT, coming from the sloria/textblob-aptagger repository.
  • Here are some relevant issues and pull requests:
    • Issue #1110 where POS improvements were discussed and a solution was proposed.
    • Issue #1122 proposing the use of TextBlob PerceptronTagger.
    • Pull Request #1143, which seems related to implementing the above proposal, though it may not have been merged directly or was replaced by a different commit.

From what I can tell, sloria/textblob-aptagger is MIT licensed, and NLTK might be distributing only the necessary pickle files from that repository as part of nltk_data. If that’s correct, it follows that averaged_perceptron_tagger would also be MIT licensed.

If so, it would be helpful to clearly state this in taggers/averaged_perceptron_tagger.xml. For reference on how to include such information in the XML, the cmudict.xml file might be a good example.

Thanks in advance for looking into this!

@ekaf
Member

ekaf commented Feb 11, 2025

Thanks @Hiroshiba. Decompiling the TextBlob pickle shows that its weights differ from those in the NLTK package.

Unfortunately, the conversation in PR #1143, and particularly this comment, seems to clarify the license only for the Python code, and may not cover the data package.

@Hiroshiba
Author

I didn't notice that the comment had been edited and ended up creating a pull request:
#233

[textblob-aptagger](https://github.com/sloria/textblob-aptagger) is under the MIT license, and typically, if nothing else is specified, the binary files included in it are also considered to be under the MIT license. Therefore, I interpreted the pickle files as being covered by the MIT license as well. (Of course, I'm not a legal expert, so I might be mistaken, but that's my general understanding.)
