Allow defining the point reference #62

flekschas · 2023-03-01T04:47:28Z

Currently, selected points are referenced by their row index. E.g., scatter.selection([0, 1, 2]) selects the first three points of a DataFrame. While this approach works fine, it'd be nice to also reference points by some other column of the DataFrame.

Use Case

Imagine you want to synchronously explore two embeddings with shared point references but non-matching indices. E.g.:

# DataFrame A
| x | y | id  |
| - | - | --- |
| 1 | 0 | 'a' |
| 1 | 1 | 'b' |
| 9 | 9 | 'c' |


# DataFrame B
| x | y | id  |
| - | - | --- |
| 2 | 1 | 'c' |
| 5 | 0 | 'b' |
| 0 | 7 | 'a' |

To synchronously explore the two datasets, we'd have to tell jscatter to reference points by the id column.

Proposal

Add a new property (called point_id) and method (called id()) that can be either a string referencing a column in the data or an array-like list of point IDs.
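A minimal sketch of how such a point_id value could be resolved into one ID per point, assuming the data is a pandas DataFrame. resolve_point_ids is a hypothetical helper for illustration only, not part of jscatter's API.

```python
import numpy as np
import pandas as pd


def resolve_point_ids(data: pd.DataFrame, point_id) -> np.ndarray:
    """Hypothetical helper: turn a proposed point_id value into one ID per row.

    point_id may be a column name (str) or an array-like of IDs.
    """
    if isinstance(point_id, str):
        # Column name: use that column's values as the point IDs
        ids = data[point_id].to_numpy()
    else:
        # Array-like: must provide exactly one ID per row
        ids = np.asarray(point_id)
        if len(ids) != len(data):
            raise ValueError("point_id must have one entry per row")
    # IDs must be unique so that each one refers to exactly one point
    if len(pd.unique(ids)) != len(ids):
        raise ValueError("point IDs must be unique")
    return ids
```

Requiring unique IDs is a design assumption here; without it, selecting an ID in one plot could not be mapped to a single point in another.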

Example

Assuming we have the two data frames from above, with the new property/method we could synchronously explore the two datasets as follows:

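```python
import pandas as pd
from jscatter import Scatter, compose

df_a = pd.DataFrame({'x': [1, 1, 9], 'y': [0, 1, 9], 'id': ['a', 'b', 'c']})
df_b = pd.DataFrame({'x': [2, 5, 0], 'y': [1, 0, 7], 'id': ['c', 'b', 'a']})

# `id` is the proposed new argument naming the column used to reference points
config = dict(x='x', y='y', id='id')

jsc_a = Scatter(data=df_a, **config)
jsc_b = Scatter(data=df_b, **config)

compose([jsc_a, jsc_b], sync_selection=True, sync_hover=True)
```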

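Assuming we select the first point in the first scatter plot instance (ID 'a'), the synchronized selection in the second instance would be the point with ID 'a' (the third row of data frame B), so calling jsc_b.selection() would return 'a' rather than the row index 2.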

@manzt What do you think of this idea?


manzt · 2023-03-02T22:56:12Z

I like the idea. However, I wonder if we could benefit from integrating more deeply with pandas indices rather than introducing a new index-like API. In the example above, for instance, I believe you could reindex each DataFrame by id and then translate selections via the shared index?
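A minimal pandas sketch of this suggestion, using the two data frames from the use case. It assumes selections are lists of row positions (as they are today) and that the id values are unique; translate_selection is a hypothetical helper, not jscatter API.

```python
import pandas as pd

# Data frames from the use case above
df_a = pd.DataFrame({'x': [1, 1, 9], 'y': [0, 1, 9], 'id': ['a', 'b', 'c']})
df_b = pd.DataFrame({'x': [2, 5, 0], 'y': [1, 0, 7], 'id': ['c', 'b', 'a']})

# Reindex both frames by the shared id column
df_a = df_a.set_index('id')
df_b = df_b.set_index('id')


def translate_selection(positions, source: pd.DataFrame, target: pd.DataFrame) -> list:
    """Hypothetical helper: map row positions in `source` to row positions in `target`."""
    labels = source.index[positions]  # row positions -> shared IDs
    # shared IDs -> row positions in target (-1 marks IDs missing from target)
    target_positions = target.index.get_indexer(labels)
    return [int(p) for p in target_positions if p != -1]


# Selecting the first point of A (ID 'a') corresponds to the third point of B
print(translate_selection([0], df_a, df_b))  # → [2]
```

Index.get_indexer requires the target index to be unique, which matches the assumption that each id identifies exactly one point.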

flekschas · 2023-03-03T00:54:41Z

💯% agreed. That makes a lot more sense! To rephrase your idea: make scatter.selection() return the DataFrame's index instead of always using the row index.

manzt · 2023-03-03T01:09:31Z

Yeah exactly, or maybe we could have a flag in the selection getter/setter? cev relies on the selection being indices.

flekschas · 2023-03-03T01:48:42Z

A flag is a good idea. Maybe we can add it to the proposed data() function from #61. I'd rather not add it to selection() alone, because the upcoming filter() function should adhere to the same index.
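A minimal sketch of this design, assuming a hypothetical use_index flag on data(); the actual signature proposed in #61 is not given here, and PointReference and use_index are illustrative names, not jscatter API. Points are stored internally as row positions and converted only at the API boundary, so selection() and filter() always use the same kind of reference.

```python
import pandas as pd


class PointReference:
    """Hypothetical sketch: one flag decides how selection() and filter() refer to points."""

    def __init__(self, df: pd.DataFrame):
        self._df = df
        self._use_index = False
        self._selection = []  # always stored as row positions internally
        self._filter = []

    def data(self, df=None, use_index=None):
        # Hypothetical flag: True -> DataFrame index labels, False -> row positions
        if df is not None:
            self._df = df
        if use_index is not None:
            self._use_index = use_index
        return self._df

    def _to_positions(self, refs):
        if self._use_index:
            # Index labels -> row positions (requires a unique index)
            positions = self._df.index.get_indexer(refs)
            if (positions == -1).any():
                raise KeyError("unknown index label")
            return positions.tolist()
        return list(refs)

    def _from_positions(self, positions):
        if self._use_index:
            # Row positions -> index labels
            return [self._df.index[p] for p in positions]
        return list(positions)

    def selection(self, refs=None):
        if refs is not None:
            self._selection = self._to_positions(refs)
        return self._from_positions(self._selection)

    def filter(self, refs=None):
        # Same conversion as selection(), so both adhere to the same index
        if refs is not None:
            self._filter = self._to_positions(refs)
        return self._from_positions(self._filter)
```

For example, with data frame B indexed by id, selecting ['a'] while use_index=True stores row position 2, and switching the flag off makes selection() return [2].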

flekschas · 2023-04-13T03:54:15Z

Implemented via #64

flekschas added the enhancement label Mar 1, 2023

flekschas mentioned this issue Apr 13, 2023

Use pandas index #64

Merged

flekschas closed this as completed Apr 13, 2023
