[FEATURE] Make the limit in the number of labels for multi-label classification tasks configurable (possibly higher than 250) #4615

Closed
davidefiocco opened this issue Feb 29, 2024 · 5 comments · Fixed by argilla-io/argilla-server#85
Labels
area: server Indicates that an issue or pull request is related to the server language: python Pull requests or issues that update Python code severity: major Indicates that the issue is blocking for users or needs to be addressed soon team: backend Indicates that the issue or pull request is owned by the backend team type: community request Indicates a feature requested by someone outside of the Argilla organization type: enhancement Indicates new feature requests

Comments

@davidefiocco
Contributor

davidefiocco commented Feb 29, 2024

At the moment, when creating a (multi-label) classification task with more than 250 labels (num_labels > 250), as in

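import argilla as rg

# Connection settings for a locally running Argilla server.
api_url = "http://localhost:6900"
api_key = "<api-key>"

rg.init(api_url=api_url, api_key=api_key)

# Uncomment to create the workspace if it does not exist yet.
# rg.Workspace.create('argilla')

rg.set_workspace("argilla")

# One label more than the server currently accepts (250).
num_labels = 251

labels = ["label " + str(i) for i in range(num_labels)]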

dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="text", required=True, use_markdown=True),
    ],
    questions=[
        rg.MultiLabelQuestion(name="question-multi", title="Which labels are correct?", labels=labels, required=True),
    ]
)

records = []

record = rg.FeedbackRecord(
    fields={
        "text": "tierra finamente dividida, constituida por agregados de silicatos de aluminio hidratados"
    },
    suggestions=[{"question_name": "question-multi",
                  "value": ["label 28", "label 8", "label 53", "label 14", "label 5"],
                  "agent": "my-first-model"}]
)
records.append(record)

dataset.add_records(records)

dataset.push_to_argilla(name="way-too-many-labels-dataset", workspace="argilla")

dataset creation fails with the following error:

Exception: Failed while adding the question 'question-multi' to the `FeedbackDataset` in Argilla with exception: Argilla server returned an error with http status: 422. Error details: 
{'code': 'argilla.api.errors::ValidationError', 'params': {'errors': [{'loc': ['body', 'settings', 'MultiLabelSelectionQuestionSettingsCreate', 'options'], 'msg': 'ensure this value has at most 250 items', 'type': 'value_error.list.max_items', 'ctx': {'limit_value': 250}, 'value': {}}]}}
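
Until the limit is configurable, the label count can be checked on the client before pushing, so the failure shows up before any request is sent. A minimal sketch, assuming the limit is the 250 reported as `limit_value` in the error; `check_label_count` and `ASSUMED_MAX_LABELS` are hypothetical names, not part of the Argilla client:

```python
# Assumption: 250 mirrors the `limit_value` reported in the 422 error;
# the authoritative limit lives on the server and may differ between versions.
ASSUMED_MAX_LABELS = 250


def check_label_count(labels, max_labels=ASSUMED_MAX_LABELS):
    """Return de-duplicated labels, or raise ValueError if there are too many."""
    unique_labels = list(dict.fromkeys(labels))  # drop duplicates, keep order
    if len(unique_labels) > max_labels:
        raise ValueError(
            f"{len(unique_labels)} labels exceed the assumed server limit of {max_labels}"
        )
    return unique_labels


labels = ["label " + str(i) for i in range(251)]
try:
    check_label_count(labels)
except ValueError as err:
    print(err)
```

The returned list could then be passed as `labels=` to the question definition; the check only mirrors the server-side validation and does not replace it.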

I believe that this many-label scenario can appear e.g. in retail/product classification annotation workloads.

Ideally, I would like the limit_value to be configurable when running the argilla server, e.g. as an env var.

A current workaround is to split the dataset into chunks whose examples are each likely to need only a subset of fewer than 250 labels. This is possible, for example, by exploiting the taxonomy of the labels, but it can be clunky and relies on the assumption that none of the labels excluded from a chunk applies to the examples in that chunk.
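
The chunking workaround could be sketched as follows. This is only an illustration under stated assumptions: labels are hypothetically written as "node/leaf", records are plain dicts with a `suggested` list, and `chunk_by_taxonomy` is a made-up helper rather than an Argilla function.

```python
from collections import defaultdict

MAX_LABELS = 250  # assumed limit, taken from the error message


def chunk_by_taxonomy(labels, records, max_labels=MAX_LABELS):
    """Split labels and records by the top-level taxonomy node.

    Assumes labels look like "node/leaf" and that each record only carries
    suggestions from a single top-level node.
    """
    label_groups = defaultdict(list)
    for label in labels:
        label_groups[label.split("/", 1)[0]].append(label)
    for node, group in label_groups.items():
        if len(group) > max_labels:
            raise ValueError(f"node '{node}' still has more than {max_labels} labels")

    record_groups = defaultdict(list)
    for record in records:
        nodes = {label.split("/", 1)[0] for label in record["suggested"]}
        if len(nodes) != 1:
            raise ValueError(f"record spans several taxonomy nodes: {sorted(nodes)}")
        node = nodes.pop()
        if node not in label_groups:
            raise ValueError(f"unknown taxonomy node: {node}")
        record_groups[node].append(record)

    # One chunk per node: its label subset plus the records assigned to it.
    return {
        node: {"labels": group, "records": record_groups.get(node, [])}
        for node, group in label_groups.items()
    }


labels = ["food/apple", "food/bread", "tools/hammer"]
records = [
    {"text": "crusty loaf", "suggested": ["food/bread"]},
    {"text": "claw hammer", "suggested": ["tools/hammer"]},
]
chunks = chunk_by_taxonomy(labels, records)
print(sorted(chunks))
```

Each chunk would then become its own dataset with its own label subset. The explicit errors make the weak point of the workaround visible: a record that needs labels from two nodes cannot be placed in any chunk.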

Additional context
I am using argilla==1.24.0 and the quickstart server pulled via docker pull argilla/argilla-quickstart:latest.
This was discussed with @jfcalvo and @davidberenstein1957 over Slack on the 02-support-and-questions channel. Thank you so much!


dosubot bot commented Feb 29, 2024

🤖

It seems like you've got this under control, if you want help or have specific questions, let me know what I can do for you!


@davidefiocco davidefiocco changed the title [FEATURE] Make the limit for multi-label classification tasks configurable (possibly higher than 250) [FEATURE] Make the limit in the number of labels for multi-label classification tasks configurable (possibly higher than 250) Feb 29, 2024
@dosubot dosubot bot added area: server Indicates that an issue or pull request is related to the server language: python Pull requests or issues that update Python code severity: major Indicates that the issue is blocking for users or needs to be addressed soon team: backend Indicates that the issue or pull request is owned by the backend team type: enhancement Indicates new feature requests labels Feb 29, 2024
@jfcalvo jfcalvo self-assigned this Feb 29, 2024
@nataliaElv nataliaElv added the type: community request Indicates a feature requested by someone outside of the Argilla organization label Feb 29, 2024
@nataliaElv nataliaElv added this to the v.1.26.0 milestone Mar 5, 2024
@davidefiocco
Contributor Author

davidefiocco commented Mar 6, 2024

For what it's worth, I tested a workaround by editing the relevant variable inside the quickstart Docker container:

https://github.com/argilla-io/argilla-server/blob/efe3bceaeeeff8d8738ed7afc271750994c94767/src/argilla_server/schemas/v1/questions.py#L49

To do so, from a root shell inside the container, I increased the limit on the number of labels tenfold:

sed -i 's/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 250/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 2500/' ./opt/venv/lib/python3.10/site-packages/argilla_server/schemas/v1/questions.py
sed -i 's/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 250/LABEL_SELECTION_OPTIONS_MAX_ITEMS = 2500/' ./opt/venv/lib/python3.10/site-packages/argilla/server/schemas/v1/questions.py

I then restarted the container and ran the code from #4615 (comment) with num_labels = 1500. The error I mentioned goes away.

It's not a proper test (I haven't started annotations), but the frontend seems to work fine (it's responsive) with that number of labels:

image

@davidefiocco
Contributor Author

davidefiocco commented Mar 13, 2024

It's anecdotal, but when I experiment on a remote Argilla server, clicking on "Less" to fold or unfold the long list of labels makes the UI stop responding, and the page may need to be refreshed.

image

Things work well when running locally on Docker instead.

So this issue may need more work than a variable change to work well in the frontend :(

@davidberenstein1957
Member

@davidefiocco , sad to hear that it is not plug-and-play but thanks a lot for the additional work on the topic :)

@jfcalvo jfcalvo modified the milestones: v1.26.0, v1.27.0 Mar 18, 2024
jfcalvo added a commit to argilla-io/argilla-server that referenced this issue Apr 15, 2024
…tion, multi_label_selection and span as configurable (#85)

# Description

This PR changes how questions of type `label_selection`,
`multi_label_selection` and `span` configure their maximum number of
options (labels), using environment variables instead of constants.

The following environment variables have been added:
* `ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS` (default: `500`): sets
the maximum number of options for questions of type
`label_selection` and `multi_label_selection`.
* `ARGILLA_SPAN_OPTIONS_MAX_ITEMS` (default: `500`): sets the
maximum number of options for questions of type `span`.

@damianpumar you can use the following one-liner to run the server with
large values for these environment variables so you can do benchmarks and
improvements (tell me if you need help with this):

```sh
$ ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS=5000 ARGILLA_SPAN_OPTIONS_MAX_ITEMS=5000 pdm server
```
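
For readers unfamiliar with the pattern, reading such a limit from the environment with a fallback default might look like the sketch below. It is not the code from the PR: `max_items_from_env` and `validate_options` are hypothetical helpers, and only the variable names and the default of `500` come from the description above.

```python
import os

DEFAULT_MAX_ITEMS = 500  # default stated in the PR description


def max_items_from_env(name, default=DEFAULT_MAX_ITEMS, environ=None):
    """Read a positive integer limit from the environment, or use the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 1:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


def validate_options(options, limit):
    """Reject option lists longer than the configured limit."""
    if len(options) > limit:
        raise ValueError(f"ensure this value has at most {limit} items")
    return options


# Simulate the one-liner above by passing an explicit environment mapping.
fake_env = {"ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS": "5000"}
limit = max_items_from_env("ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS", environ=fake_env)
validate_options(["label " + str(i) for i in range(1500)], limit)
print(limit)
```

Passing the environment as an argument keeps the helper easy to test; a real server would presumably read `os.environ` once at startup.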

> [!NOTE]
> Related documentation changes here:
argilla-io/argilla#4713

Closes argilla-io/argilla#4615

**Type of change**

(Please delete options that are not relevant. Remember to title the PR
according to the type of change)

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Refactor (change restructuring the codebase without changing
functionality)
- [ ] Improvement (change adding some improvement to an existing
functionality)
- [ ] Documentation update

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And
ideally, reference `tests`)

- [x] Tests are passing, and I manually tested changing values using the new
environment variables.

**Checklist**

- [ ] I added relevant documentation
- [ ] follows the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
jfcalvo added a commit that referenced this issue Apr 15, 2024
…r questions (#4713)

# Description

This PR is associated with the changes in
argilla-io/argilla-server#85, which adds two new
environment variables that can be used to increase (or reduce) the
maximum number of items that can be defined for label, multi-label and
span questions.

Refs #4615 

**Type of change**

(Please delete options that are not relevant. Remember to title the PR
according to the type of change)

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Refactor (change restructuring the codebase without changing
functionality)
- [ ] Improvement (change adding some improvement to an existing
functionality)
- [x] Documentation update

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And
ideally, reference `tests`)

- [x] Checking that the markdown renders as expected.

**Checklist**

- [x] I added relevant documentation
- [ ] follows the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
@jfcalvo
Member

jfcalvo commented Apr 18, 2024

@davidefiocco in the upcoming v1.27.0 release we added new environment variables, so it is now possible to change the limits (at instance level) for label/multi-label and span questions.

These changes will be added to the documentation once the release is published. If you want to look at the docs changes before then, you can see them here: https://github.com/argilla-io/argilla/pull/4713/files#diff-7985416b05c65eb8bdc26219fec4d817b1e9bb725a599d9e88841c64b1539a40R88

5 participants