Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT] Argilla SDK: Updating record fields and vectors #5026

Merged
merged 14 commits into from
Jun 18, 2024

Conversation

frascuchon
Copy link
Member

@frascuchon frascuchon commented Jun 14, 2024

Description

This PR reviews the record attributes and normalizes how to work with fields, vectors, and metadata. Now, all are treated as dictionaries and users can update in the same way that working with dictionaries or creating new attributes:

record = Record(fields={"name": "John"})

record.fields.update({"name": "Jane", "age": "30"})
record.fields.new_field = "value"

record.vectors["new-vector"] = [1.0, 2.0, 3.0]
record.vectors.vector = [1.0, 2.0, 3.0]

record.metadata["new-key"] = "new_value"
record.metadata.key = "new_value"

Once this approach is approved, I will create a new PR changing the docs.

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • New feature (non-breaking change which adds functionality)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

(Please describe the tests that you ran to verify your changes. And ideally, reference tests)

  • Test A
  • Test B

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@frascuchon frascuchon changed the title Feat/updating all fields [ENHANCEMENT] Argilla SDK: Updating all fields Jun 14, 2024
@frascuchon frascuchon added this to the v2.0.0 milestone Jun 14, 2024
@burtenshaw
Copy link
Contributor

burtenshaw commented Jun 17, 2024

@frascuchon @davidberenstein1957

Looks nice, just some in line comments and queries.

argilla/src/argilla/records/_dataset_records.py Outdated Show resolved Hide resolved
def __init__(self, metadata: Optional[Dict[str, MetadataValue]] = None) -> None:
super().__init__(metadata or {})

def __getattr__(self, item: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we definitely going to support both of these strategies?

record.vectors["new-vector"] = [1.0, 2.0, 3.0]
record.vectors.vector = [1.0, 2.0, 3.0]

record.metadata["new-key"] = "new_value"
record.metadata.key = "new_value"

Personally, I'm not 100% sure about the attribute based approach because I don't like how the IDE behaves. Is there a way to improve the experience there?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is clearer to only support one, because otherwise, we might end up with these too fragmented approaches for doing the same thing again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I agree with this. I've aligned fields and vectors with metadata, supporting both approaches. Working with dict is fine for me, and is clearest to users since in the doc they are described as dict. Same feeling with IDE warnings when using attributes. I would say to keep the dict-like access and raise an error when accessing using attributes (I think the default behavior is do-nothing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question here, should we do the same for responses and suggestions? Now, we support:

record.responses.label 
record.suggestions.text

which have the same IDE problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So we'll open a new PR for suggestions and responses, where we'll use a dictionary style access as well? Nice.

return [Vector(name=name, values=value).api_model() for name, value in self.items()]

def to_dict(self) -> Dict[str, List[float]]:
return {key: value for key, value in self.items()}


class RecordResponses(Iterable[Response]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this class be used when users want to:

  • add a first response after retrieving records?
  • add subsequent responses after retrieving records?

Ideally, these should just be the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, both cases should be something like this:

record.responses.add(rg.Response(...)..)

# or 
record.responses.add[rg.Response(...), rg.Response(...)])

Copy link
Member

@davidberenstein1957 davidberenstein1957 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

argilla/src/argilla/records/_dataset_records.py Outdated Show resolved Hide resolved
def to_dict(self) -> Dict[str, Union[str, None]]:
return self.__fields
def to_dict(self) -> dict:
return {key: value for key, value in self.items()}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return {key: value for key, value in self.items()}
return dict(self.items())

def __init__(self, metadata: Optional[Dict[str, MetadataValue]] = None) -> None:
super().__init__(metadata or {})

def __getattr__(self, item: str):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is clearer to only support one, because otherwise, we might end up with these too fragmented approaches for doing the same thing again.

self[key] = value

def to_dict(self) -> dict:
return {key: value for key, value in self.items()}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return {key: value for key, value in self.items()}
return dict(self.items())

return [Vector(name=name, values=value).api_model() for name, value in self.items()]

def to_dict(self) -> Dict[str, List[float]]:
return {key: value for key, value in self.items()}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return {key: value for key, value in self.items()}
return dict(self.items())

vectors=[rg.Vector("text", [0, 0, 0])],
id=str(uuid.uuid4()),
vectors={"text": [0, 0, 0]},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@frascuchon frascuchon self-assigned this Jun 18, 2024
@frascuchon frascuchon changed the title [ENHANCEMENT] Argilla SDK: Updating all fields [ENHANCEMENT] Argilla SDK: Updating record fields, vectors and metadata Jun 18, 2024
@frascuchon frascuchon changed the title [ENHANCEMENT] Argilla SDK: Updating record fields, vectors and metadata [ENHANCEMENT] Argilla SDK: Updating record fields and vectors Jun 18, 2024
@frascuchon frascuchon merged commit 6452993 into feat/updating-record Jun 18, 2024
13 checks passed
@frascuchon frascuchon deleted the feat/updating-all-fields branch June 18, 2024 10:17
frascuchon added a commit that referenced this pull request Jun 18, 2024
<!-- Thanks for your contribution! As part of our Community Growers
initiative 🌱, we're donating Justdiggit bunds in your name to reforest
sub-Saharan Africa. To claim your Community Growers certificate, please
contact David Berenstein in our Slack community or fill in this form
https://tally.so/r/n9XrxK once your PR has been merged. -->

# Description

This PR addresses
[discussion](#5026 (comment))
from PR #5026 and removes
attribute-like access for record fields, vectors and metadata.



**Type of change**

(Please delete options that are not relevant. Remember to title the PR
according to the type of change)

- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor (change restructuring the codebase without changing
functionality)
- [ ] Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And
ideally, reference `tests`)

- [ ] Test A
- [ ] Test B

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See
https://keepachangelog.com/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants