Demo / lightning talk for plankton image data flow #8

metazool · 2024-07-09T10:43:12Z

Updated the issue title to reflect that this has grown some extra dimensions! The plan is to come back here after some shared discussion and outline what we'd like to show.

The work in #5 and #6 serves as a proof of concept of minimal-effort approaches to learning from image collections without undertaking model training or costly labelling, but it sits at the edge of what's meant to be a deeper investigation of pipelines and workflows that can apply to related projects, most immediately AMI-system. This Discussion on DataLabs computer vision needs for a combination physical sample / imaging field site shows likely demand.

Putting together a short show-and-tell / demo that can be presented to the Environmental Data Science group and the research group is a nice motivator to draw a line under the low-hanging ML parts and shift focus to architecture choices and cross-project common ground. The demo would cover:

1. model choice and overview
2. image similarity search by vector embeddings
3. unsupervised clustering approaches to the above

Of these, 2. needs expanding a bit to become more visually interesting and to probe for areas where the approach is weak. We haven't tried 3. at all; it got lost in the wash between pipeline/workflow #9 on the one hand and experimental model choice #10 on the other, but it should be quick to try (DBSCAN etc.).
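
As a minimal sketch of what item 2 amounts to (this is not the code from #5): each image is represented by an embedding vector, and a query returns the stored images whose vectors have the highest cosine similarity to it. The S3-style URLs, the three-dimensional vectors and the function names below are all made up for illustration; real embeddings typically have far more dimensions, and the actual search is driven from chromadb rather than a plain dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, k=3):
    """Return the k (image_id, score) pairs closest to the query vector."""
    scored = [
        (image_id, cosine_similarity(query, vector))
        for image_id, vector in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: identifiers mimic S3 object URLs, vectors are invented.
embeddings = {
    "s3://example-bucket/copepod_1.tif": [0.9, 0.1, 0.0],
    "s3://example-bucket/copepod_2.tif": [0.8, 0.2, 0.1],
    "s3://example-bucket/diatom_1.tif": [0.0, 0.1, 0.9],
    "s3://example-bucket/detritus_1.tif": [0.3, 0.9, 0.2],
}

query = [1.0, 0.0, 0.0]
results = most_similar(query, embeddings, k=2)
for image_id, score in results:
    print(f"{score:.3f}  {image_id}")
```

Probing where the approach is weak would then mean looking for queries where the top-ranked neighbours are visually unrelated to the query image.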

See also the section on transfer learning / feature extraction in this workshop paper:
https://aslopubs.onlinelibrary.wiley.com/doi/full/10.1002/lno.12101#lno12101-sec-0025-title

metazool · 2024-08-05T09:26:39Z

Did a small rendering of k-means clustering of the plankton embeddings, which had visually similar outcomes to the similarity search. This is on the clustering_visualisation branch.
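
For anyone unfamiliar with k-means, here is a rough standard-library sketch of the idea, not the code on the clustering_visualisation branch. It alternates between assigning each vector to its nearest centroid and moving each centroid to the mean of its members. To keep the example deterministic it starts from a greedy farthest-point initialisation, which is an assumption of this sketch rather than anything the branch is known to do; the two-dimensional points are invented toy data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def farthest_point_init(vectors, k):
    """Deterministic start: first vector, then repeatedly the vector
    farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        next_point = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(next_point))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm: returns (labels, centroids)."""
    centroids = farthest_point_init(vectors, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, new_labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Invented toy "embeddings": two well-separated groups of three points.
vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [4.9, 5.0],
]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

Unlike DBSCAN, k-means assigns every point to some cluster and needs the number of clusters chosen up front, which is worth keeping in mind when comparing the two on the same embeddings.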

It's outgrowing a notebook. Wondering if Streamlit is the right fit for this rather than shifting to JavaScript. @matthewcoole's demo of retrieval-augmented generation document search has similar components (including chromadb): https://github.com/NERC-CEH/embeddings_app/ (either repurpose this or borrow from it).

The focus of this is to show naively minimal output to plankton researchers and enlist their help, either in finding flaws or in working out which path would actually be useful to them. Should be tightly timeboxed: ideally no more than a day, two at most...

metazool · 2024-08-05T09:36:44Z

Note to self that embeddings_app assumes some data that's generated by methods in discoverability

This shows use of UMAP to do dimensionality reduction on embeddings, which is probably worth trying in the notebook to see whether it helps DBSCAN not to see everything as noise.
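
To illustrate the "everything as noise" symptom, here is a rough standard-library sketch of DBSCAN run on invented two-dimensional points. It does not include UMAP and is not the notebook code. The parameter names `eps` and `min_samples` follow scikit-learn's `DBSCAN`, which would be the usual choice in practice. A point is a core point if at least `min_samples` points (itself included) lie within `eps` of it; anything not reachable from a core point is labelled noise (`-1`). If `eps` is too small relative to the typical distance between embeddings, no point qualifies as core and every point comes back as noise.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (itself included)."""
    return [
        j for j, other in enumerate(points)
        if math.dist(points[index], other) <= eps
    ]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point, keep growing
    return labels


# Invented toy data: two tight groups and one isolated point.
points = [
    (0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
    (5.0, 5.1), (5.1, 5.0), (4.9, 5.0),
    (10.0, 0.0),
]

clustered = dbscan(points, eps=0.5, min_samples=3)
all_noise = dbscan(points, eps=0.01, min_samples=3)
print(clustered)
print(all_noise)
```

In high-dimensional embedding spaces, pairwise distances tend to be large and similar to one another, so a workable `eps` is hard to find. That is one commonly cited reason for reducing dimensions before clustering.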

metazool · 2024-08-08T08:13:49Z

Another note to self that while it's not necessary now, the next visit to this should involve:

- ease of pointing to a different image collection (it's already all driven from chromadb, which uses URLs of objects in s3 as identifiers)
- ease of pointing to a collection of different embeddings for the same image sources (whether that's BioCLIP or the more recent model the Turing Inst folks are releasing with the paper from @noushineftekhari ... )
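
One possible shape for this, sketched with the standard library only: treat "which images" and "which embedding model" as two independent settings and derive a collection name from the pair, so that switching either one is a configuration change rather than a code change. The `EmbeddingSource` class, its field names and the naming convention are all hypothetical and do not reflect how the current code names its chromadb collections.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSource:
    """Hypothetical pairing of an image collection with an embedding model."""

    image_collection: str  # illustrative label for a set of images
    model: str  # illustrative label for the model that produced the embeddings

    def collection_name(self) -> str:
        """Lower-case slug combining both settings, e.g. for a vector store."""
        raw = f"{self.image_collection}-{self.model}".lower()
        return re.sub(r"[^a-z0-9]+", "-", raw).strip("-")


# Same images, two different (illustrative) embedding models.
plankton_scivision = EmbeddingSource("plankton", "scivision")
plankton_bioclip = EmbeddingSource("plankton", "BioCLIP")

print(plankton_scivision.collection_name())
print(plankton_bioclip.collection_name())
```

Because the S3 URLs stay the same across models, results from two such collections could be compared image by image.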

metazool mentioned this issue Jul 18, 2024

Proof of concept of similarity search with the scivision model #5

Merged

metazool added the documentation Improvements or additions to documentation label Sep 11, 2024

metazool changed the title ~~Demo / lightning talk for plankton feature search~~ Demo / lightning talk for plankton image data flow Sep 11, 2024

metazool mentioned this issue Sep 12, 2024

Deploy and improve the streamlit demo app #33

Open
