@tomaarsen
Member

@tomaarsen tomaarsen commented Apr 6, 2023

Hello!

Pull Request overview

  • Integrate with the SpanMarker library for Named Entity Recognition.

Details

SpanMarker is a model architecture for Named Entity Recognition that is tightly implemented on top of transformers. Consequently, loading a SpanMarker model, e.g. tomaarsen/span-marker-bert-base-fewnerd-fine-super, is as simple as:

from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")
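
As a rough sketch of what comes after loading, the snippet below runs a prediction and prints the entities it finds. It assumes the `predict` method and the result keys (`span`, `label`, `score`) as described in the SpanMarker README; `format_entities` and `run_demo` are hypothetical helper names used only for illustration, and the example sentence is arbitrary.

```python
def format_entities(entities):
    """Render entity dicts as 'span -> label (score)' strings.

    Assumes each dict has 'span', 'label' and 'score' keys, which is the
    shape the SpanMarker README shows for `model.predict` output.
    """
    return [f"{e['span']} -> {e['label']} ({e['score']:.2f})" for e in entities]


def run_demo():
    """Download the model from the Hub and print entities for one sentence.

    Requires `pip install span_marker` and network access, so the import
    is kept local to this function.
    """
    from span_marker import SpanMarkerModel

    model = SpanMarkerModel.from_pretrained(
        "tomaarsen/span-marker-bert-base-fewnerd-fine-super"
    )
    entities = model.predict(
        "Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris."
    )
    for line in format_entities(entities):
        print(line)
```

Calling `run_demo()` performs the actual download and inference; `format_entities` can be tried on hand-written dicts without installing anything.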

Feel free to try out this Space to get a feel for the power of the architecture.

I've followed the documentation here, but it wasn't entirely clear which sections should be added for a successful integration, so please let me know if I forgot any parts.

This PR is related to huggingface/api-inference-community#225, which adds support for SpanMarker models to API Inference endpoints.

  • Tom Aarsen

@tomaarsen
Member Author

On a slightly related note - once I know that this is indeed all I have to update to integrate a library, I can certainly integrate 🤗 SetFit fully.

Contributor

@osanseviero osanseviero left a comment

Thanks for the contribution! In general, we only merge new library PRs once there is a good number of models on the Hub; otherwise, we would have a filter button that leads to only 4 models at the moment. That said, we discussed this with the team and would be OK moving this PR forward 🔥 Let's aim to grow this significantly and get lots of models! 🥳

For completeness, we would also need to update some other sections

@tomaarsen
Member Author

@osanseviero
That is wonderful news! I'm very excited to hear it. SpanMarker is still young, and I'm still experimenting with some ideas that could result in non-backwards-compatible changes, so I haven't written about the library elsewhere yet. That is why there are only 4 models at the moment - I'll be sure to push for more (i.e. ones that aren't mine 😉) once I'm ready to start advertising the library.

On that topic, I was wondering how you handle versioning? I intend to release v1.0.0 this month, which may or may not be backwards compatible with v0.X.X models. Currently, most v0.X.X models are my own, so I'm not completely sure whether I should mess up my codebase for backwards compatibility. May I assume that the version used for e.g. Inference Endpoints is always the most recent version?

I'll add the other changes that you mention to this PR. Then, perhaps it is best if I convert this PR to a draft until v1.0.0 later this month, especially if the implementation of backwards compatibility for v0.X.X to v1.0.0 adds too much bloat.

  • Tom Aarsen

@osanseviero
Contributor

On that topic, I was wondering how you handle versioning? I intend to release v1.0.0 this month, which may or may not be backwards compatible with v0.X.X models. Currently, most v0.X.X models are my own, so I'm not completely sure whether I should mess up my codebase for backwards compatibility. May I assume that the version used for e.g. Inference Endpoints is always the most recent version?

Ideally the library should maintain backwards compatibility, but it might be OK to break things if the library is at a very early stage and usage is not yet large. When we make backwards-incompatible changes, we go through a deprecation period so people get warnings and have enough time to update their code. As for the Inference API, it will use the version pinned in the requirements.txt file, so it depends on that.
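
To illustrate the deprecation-period idea, here is a minimal sketch using Python's standard `warnings` module. The function names `load` and `load_model` and the version number in the message are hypothetical; this is not how SpanMarker or any Hugging Face library actually implements it, only one common pattern for keeping an old entry point alive while warning callers.

```python
import warnings


def load_model(name):
    """New entry point (hypothetical) that callers should migrate to."""
    return f"model:{name}"


def load(name):
    """Old entry point (hypothetical), kept working during the deprecation period."""
    warnings.warn(
        "load() is deprecated and will be removed in a future release; "
        "use load_model() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this wrapper
    )
    return load_model(name)
```

Existing code that calls `load(...)` keeps working and sees a `DeprecationWarning`; the old function can then be removed in a later release once users have had time to switch.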

Contributor

@osanseviero osanseviero left a comment

Nice! Thanks a lot! I left a question and then we can merge

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Apr 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@tomaarsen
Member Author

I'll do my best to maintain backwards compatibility where possible, but I may go for some breaking changes before v1.0.0, i.e. before people find out about the project. On that topic, it may be best to keep this as a draft until the v1.0.0 release. That way, we'll limit the harm, as people won't know about SpanMarker until then.
What are your thoughts on that?

@osanseviero
Contributor

Up to you! I think it makes sense to wait until the official release before merging this.

@tomaarsen
Member Author

Agreed. I'll draft this and I'll report back when it's ready!
Same for huggingface/api-inference-community#225.

Thanks for the help so far ❤️

@tomaarsen tomaarsen marked this pull request as draft April 12, 2023 14:47
tomaarsen added 3 commits May 1, 2023 15:40
1. This is more in line with the other libraries, and
2. my codebase has been using 'span-marker' from the start.
If I stick with span_marker, then all models will get the 'span_marker' AND 'span-marker' tags.
@tomaarsen tomaarsen marked this pull request as ready for review May 1, 2023 13:57
@tomaarsen
Member Author

SpanMarker v1.0.0 has just been released, so I'm ready to get this integration merged!

Contributor

@osanseviero osanseviero left a comment

Very cool! 🔥

@osanseviero osanseviero merged commit a958cee into huggingface:main May 2, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@tomaarsen tomaarsen deleted the integration/span_marker branch May 2, 2023 10:33
3 participants