Component
Transforms/Other
Feature
The goal of this transform is to provide token-specific metrics annotation that could serve as an indicator of document quality.
Consideration:
Annotate each document with the following metrics:
- doc_num_tokens: number of tokens in the document
- doc_size_kbs: document size in KB
- doc_num_chars: number of characters in the document
- tokens_per_doc_size: ratio of the number of tokens to the document size
- tokens_per_doc_num_chars: ratio of the number of tokens to the number of characters in the document
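To make the proposal concrete, here is a minimal Python sketch of how the five metrics could be computed for a single document. It is only an illustration, not the transform's actual implementation: the function names `annotate_token_metrics` and `whitespace_tokenize` are hypothetical, the whitespace tokenizer is a stand-in for whatever tokenizer the transform would be configured with, and measuring document size as UTF-8 bytes divided by 1024 is an assumption.

```python
from typing import Callable, Dict, List


def whitespace_tokenize(text: str) -> List[str]:
    # Placeholder tokenizer (assumption): splits on whitespace.
    # A real transform would plug in a model tokenizer here.
    return text.split()


def annotate_token_metrics(
    text: str,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> Dict[str, float]:
    """Compute the token metrics proposed in this issue for one document."""
    num_tokens = len(tokenize(text))
    num_chars = len(text)
    # Assumption: document size is the UTF-8 encoded length in KB.
    size_kbs = len(text.encode("utf-8")) / 1024

    return {
        "doc_num_tokens": num_tokens,
        "doc_size_kbs": size_kbs,
        "doc_num_chars": num_chars,
        # Guard against empty documents to avoid division by zero.
        "tokens_per_doc_size": num_tokens / size_kbs if size_kbs else 0.0,
        "tokens_per_doc_num_chars": num_tokens / num_chars if num_chars else 0.0,
    }


if __name__ == "__main__":
    print(annotate_token_metrics("hello world foo"))
```

Passing the tokenizer as a callable keeps the metric logic independent of any particular tokenizer library; empty documents get ratios of 0.0 rather than raising an error.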