Add LLM token classification example #4541
Conversation
Works great, looks great.
Please don't spawn debugging shells without my consent though 😛
Also: at least on my machine, there's a bunch of wait time before the first logging calls arrive and again while computing the embeddings: [video: 23-12-15_09.24.23.patched.mp4] It'd be nice if the script mentioned what it was doing in its standard output during those. (Man, I really wish we could log a spinner thing...)
Added a print. Regarding runtime, the embeddings are currently computed twice: once for logging and once as part of the whole pipeline. Not sure if it's worth changing this. In the pipeline there is a bit of extra stuff going on beyond just another function call passing the embeddings, so it'd add some complexity to the example. I added a note for now.
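For illustration, computing the embeddings once and threading them through to both consumers could look roughly like this. This is a hedged sketch only: `embed_tokens`, `log_embeddings`, and `classify_tokens` are hypothetical stand-ins, not the example's actual functions.

```python
# Sketch: compute embeddings once, reuse them for logging and the pipeline.
# All names here are illustrative stand-ins for the example's real code.

def embed_tokens(tokens):
    # Stand-in for the real embedding model: one fixed-size vector per token.
    return [[float(len(t)), float(i)] for i, t in enumerate(tokens)]

def log_embeddings(embeddings):
    # Stand-in for the logging call (e.g. to Rerun); returns count for demo.
    return len(embeddings)

def classify_tokens(tokens, embeddings):
    # Stand-in classifier that consumes the precomputed embeddings.
    return ["MISC" for _ in tokens]

tokens = ["Ada", "Lovelace", "lived", "in", "London"]
embeddings = embed_tokens(tokens)             # computed once
log_embeddings(embeddings)                    # reused for logging
labels = classify_tokens(tokens, embeddings)  # reused in the pipeline
```

The trade-off the comment describes is real: in the actual example, the pipeline does more than a single function call, so passing precomputed embeddings through would complicate it.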
Any reason we're not merging this @roym899 ?
Making some small adjustments after talking to @nikolausWest |
What
Adds an example that tokenizes a text, visualizes the embedding of each token (as a 3D UMAP projection), logs the text tokens linked to their corresponding embeddings, and classifies each token into named entities (person, location, organization, and misc). The unique named entities found are also logged.
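The flow described above can be sketched end-to-end with toy stand-ins. This is not the example's code: a real run would use a transformer tokenizer/model, UMAP for the 3D projection, and Rerun for logging; every function and the `ENTITY_LOOKUP` table below are hypothetical placeholders.

```python
# Hedged sketch of the flow: tokenize -> embed -> 3D projection ->
# per-token NER labels -> unique named entities. All stand-ins.

def tokenize(text):
    # Toy whitespace tokenizer (stand-in for a real subword tokenizer).
    return text.split()

def embed(tokens):
    # Toy deterministic per-token vectors (stand-in for model embeddings).
    return [[float(len(t)), float(sum(map(ord, t)) % 7), float(i)]
            for i, t in enumerate(tokens)]

def project_3d(embeddings):
    # Stand-in for a 3D UMAP projection; these toy vectors are already 3D.
    return embeddings

# Hypothetical lookup standing in for a token-classification model.
ENTITY_LOOKUP = {"Ada": "PER", "Lovelace": "PER", "London": "LOC"}

def classify(tokens):
    # Label each token; "O" marks tokens that are not named entities.
    return [ENTITY_LOOKUP.get(t, "O") for t in tokens]

text = "Ada Lovelace lived in London"
tokens = tokenize(text)
points = project_3d(embed(tokens))  # one 3D point per token, for logging
labels = classify(tokens)
named = sorted({t for t, l in zip(tokens, labels) if l != "O"})
# `named` holds the unique named entities found in the text.
```

In the actual example, each logged token links back to its corresponding point in the 3D embedding view, which this sketch does not attempt to reproduce.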
Also removed some newlines in manifest.yml to make it more consistent.
Checklist