[FEATURE] Make the limit in the number of labels for multi-label classification tasks configurable (possibly higher than 250) #4615
🤖 It seems like you've got this under control, if you want help or have specific questions, let me know what I can do for you!
For what it's worth, I tried a workaround: editing the relevant variable inside the quickstart Docker container. To do so, from a root shell in the container, I increased the limit on the number of labels tenfold:

```sh
sed -i 's/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 250/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 2500/' ./opt/venv/lib/python3.10/site-packages/argilla_server/schemas/v1/questions.py
sed -i 's/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 250/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 2500/' ./opt/venv/lib/python3.10/site-packages/argilla/server/schemas/v1/questions.py
```

I then restarted the container and executed the code in #4615 (comment) with the increased number of labels. It's not a proper test (I haven't started annotating), but the frontend seems to work fine (it stays responsive) with that number of labels.
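The same edit can be sketched in Python. Unlike the `sed` expression above, which only matches the literal `250`, the regular expression below matches whatever integer is currently assigned. This is a minimal sketch assuming the constant is assigned on a single line as `LABEL_SELECTION_OPTIONS_MAX_ITEMS = <int>` (as the `sed` command implies) and that the `./opt/...` paths above were relative to `/`; `patch_max_items` is a hypothetical helper, not part of Argilla, and the container still has to be restarted afterwards.

```python
import re
from pathlib import Path

# Matches e.g. "LABEL_SELECTION_OPTIONS_MAX_ITEMS = 250" at the start of a line,
# assuming the constant is assigned on a single line (as the sed command implies).
_CONSTANT_RE = re.compile(r"^(LABEL_SELECTION_OPTIONS_MAX_ITEMS\s*=\s*)\d+", re.MULTILINE)


def patch_max_items(path: Path, new_limit: int) -> bool:
    """Hypothetical helper: rewrite the label limit in `path`.

    Returns True if the file was changed, False if the constant was not found
    or already had the requested value.
    """
    source = path.read_text()
    # \g<1> keeps the "NAME = " prefix; the old integer is replaced by new_limit.
    patched, count = _CONSTANT_RE.subn(rf"\g<1>{new_limit}", source)
    if count == 0 or patched == source:
        return False
    path.write_text(patched)
    return True


if __name__ == "__main__":
    # The two files edited with sed above (assumed absolute paths in the container).
    for candidate in (
        Path("/opt/venv/lib/python3.10/site-packages/argilla_server/schemas/v1/questions.py"),
        Path("/opt/venv/lib/python3.10/site-packages/argilla/server/schemas/v1/questions.py"),
    ):
        if candidate.exists():
            print(candidate, "patched" if patch_max_items(candidate, 2500) else "unchanged")
```

Returning a boolean makes the script safe to re-run: a second run reports "unchanged" instead of silently doing nothing.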
@davidefiocco, sad to hear that it is not plug-and-play, but thanks a lot for the additional work on the topic :)
…tion, multi_label_selection and span as configurable (#85)

# Description

This PR changes how questions of type `label_selection`, `multi_label_selection` and `span` configure their maximum number of options (labels): the limits are now set through environment variables instead of constants. The following environment variables have been added:

* `ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS` (default: `500`): sets the maximum number of options for questions of type `label_selection` and `multi_label_selection`.
* `ARGILLA_SPAN_OPTIONS_MAX_ITEMS` (default: `500`): sets the maximum number of options for questions of type `span`.

@damianpumar you can use the following one-liner to run the server with large values for these environment variables, so you can run benchmarks and make improvements (tell me if you need help with this):

```sh
$ ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS=5000 ARGILLA_SPAN_OPTIONS_MAX_ITEMS=5000 pdm server
```

> [!NOTE]
> Related documentation changes here: argilla-io/argilla#4713

Closes argilla-io/argilla#4615

**Type of change**

- [x] New feature (non-breaking change which adds functionality)

**How Has This Been Tested**

- [x] Tests are passing, and the values were tested manually by changing the new environment variables.
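As an illustration only (this is not Argilla's code), an instance-level limit read from the environment with the default of 500 described above could look like the sketch below. `max_items_from_env` and `validate_options` are hypothetical names; the real server performs its check inside its own schema validation, and whether it reads the variables once at startup or on every request is an assumption not confirmed here.

```python
import os

DEFAULT_MAX_ITEMS = 500  # default stated in the PR description


def max_items_from_env(var_name: str, default: int = DEFAULT_MAX_ITEMS) -> int:
    """Hypothetical helper: read a positive integer limit from an env var."""
    raw = os.environ.get(var_name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{var_name} must be a positive integer, got {value}")
    return value


def validate_options(options: list, var_name: str) -> list:
    """Hypothetical check: reject option lists longer than the configured limit."""
    limit = max_items_from_env(var_name)
    if len(options) > limit:
        raise ValueError(f"{len(options)} options exceed the limit of {limit} ({var_name})")
    return options


if __name__ == "__main__":
    labels = [f"label_{i}" for i in range(2500)]
    os.environ["ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS"] = "5000"
    validate_options(labels, "ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS")  # passes: 2500 <= 5000
```

If the server reads these values once at startup, changing them means restarting it, which is what the `pdm server` one-liner above does.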
…r questions (#4713)

# Description

This PR is associated with the changes in argilla-io/argilla-server#85, which add two new environment variables that can be used to increase (or reduce) the maximum number of items that can be defined for label, multi-label and span questions.

Refs #4615

**Type of change**

- [x] Documentation update

**How Has This Been Tested**

- [x] Checking that the markdown renders as expected.

**Checklist**

- [x] I added relevant documentation
@davidefiocco with the upcoming release v1.27.0 we added new environment variables, so it is now possible to change the limits (at instance level) for label/multi-label and span questions. These changes will be added to the documentation once the release is published. If you want to take a look at the docs changes before publication, you can see them here: https://github.com/argilla-io/argilla/pull/4713/files#diff-7985416b05c65eb8bdc26219fec4d817b1e9bb725a599d9e88841c64b1539a40R88
At the moment, when dealing with (multi-label) classification tasks with very many labels (`num_labels` > 250), dataset creation fails with an error.
I believe that this many-label scenario can appear e.g. in retail/product classification annotation workloads.
Ideally, I would like the `limit_value` to be configurable when running the Argilla server, e.g. as an environment variable.

A current workaround is to split datasets into chunks whose examples are likely to involve only a limited subset of fewer than 250 labels. This is possible, e.g., by exploiting the taxonomy of labels, but it can be clunky and relies on the assumption that none of the excluded labels appears in the various subsets.
Additional context
I am using `argilla==1.24.0` and the quickstart server pulled via `docker pull argilla/argilla-quickstart:latest`.

This was discussed with @jfcalvo and @davidberenstein1957 over Slack in the 02-support-and-questions channel. Thank you so much!