Allow defining the point reference #62

Closed
flekschas opened this issue Mar 1, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@flekschas
Owner

flekschas commented Mar 1, 2023

Currently, selected points are referenced by their index. E.g., scatter.selection([0, 1, 2]) selects the first three points of a dataframe. While this approach works fine, it'd be nice to reference points by some other column of the dataframe as well.

Use Case

Imagine you want to synchronously explore two embeddings with shared point references but non-matching indices. E.g.:

# DataFrame A
| x | y | id  |
| - | - | --- |
| 1 | 0 | 'a' |
| 1 | 1 | 'b' |
| 9 | 9 | 'c' |


# DataFrame B
| x | y | id  |
| - | - | --- |
| 2 | 1 | 'c' |
| 5 | 0 | 'b' |
| 0 | 7 | 'a' |

To synchronously explore the two datasets, we'd have to tell jscatter to reference points by the id column.
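
To make the mismatch concrete, here is a minimal pandas sketch. It only rebuilds the two tables above (the names df_a and df_b are taken from the example further down) and does not use jscatter at all. It shows that the same positional index refers to different IDs in the two data frames:

```python
import pandas as pd

# The two data frames from the tables above
df_a = pd.DataFrame({"x": [1, 1, 9], "y": [0, 1, 9], "id": ["a", "b", "c"]})
df_b = pd.DataFrame({"x": [2, 5, 0], "y": [1, 0, 7], "id": ["c", "b", "a"]})

# A positional selection, i.e., what scatter.selection([0]) refers to today
selected = [0]

# Look up which point ID sits at that position in each data frame
ids_in_a = df_a["id"].iloc[selected].tolist()
ids_in_b = df_b["id"].iloc[selected].tolist()

print(ids_in_a, ids_in_b)
```

Position 0 is point 'a' in data frame A but point 'c' in data frame B, so syncing selections by position would link unrelated points.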

Proposal

Add a new property (called point_id) and method (called id()) that can either be a string referencing a column in the data or an array_like list of point IDs.

Example

Assuming we have the two data frames from above, with the new property/method we could synchronously explore the two datasets as follows:

from jscatter import Scatter, compose

config = dict(x='x', y='y', id='id')

jsc_a = Scatter(data=df_a, **config)
jsc_b = Scatter(data=df_b, **config)

compose([jsc_a, jsc_b], sync_selection=True, sync_hover=True)

Assuming we select the first point in the first scatter plot instance (ID 'a'), calling jsc_b.selection() would return 'a', the ID of the corresponding point in data frame B (its third row).

@manzt What do you think of this idea?

@manzt
Collaborator

manzt commented Mar 2, 2023

I like the idea. However, I wonder if we could benefit from more deeply integrating with pandas indices rather than introducing a new index-like API. For example, I believe in the example above, you could reindex each dataframe by id and then translate selections using the shared index?
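
The reindexing idea can be sketched with plain pandas as follows. This is only an illustration under stated assumptions: translate_selection is a hypothetical helper, not part of jscatter, and it assumes the id values are unique within each data frame.

```python
import pandas as pd

# Same data as in the issue, but indexed by the shared "id" column
df_a = pd.DataFrame(
    {"x": [1, 1, 9], "y": [0, 1, 9], "id": ["a", "b", "c"]}
).set_index("id")
df_b = pd.DataFrame(
    {"x": [2, 5, 0], "y": [1, 0, 7], "id": ["c", "b", "a"]}
).set_index("id")


def translate_selection(source, target, positions):
    """Map row positions in `source` to row positions in `target`
    by going through the shared index labels (hypothetical helper)."""
    labels = source.index[positions]
    # get_indexer returns -1 for labels that are missing from `target`
    target_positions = target.index.get_indexer(labels)
    return target_positions[target_positions >= 0].tolist()


# Select the first two points of A and find the matching rows in B
positions_in_b = translate_selection(df_a, df_b, [0, 1])
print(positions_in_b)
```

Points 'a' and 'b' sit at positions 0 and 1 in A but at positions 2 and 1 in B, so the translated selection is [2, 1].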

@flekschas
Owner Author

💯% agreed. That makes a lot more sense! To rephrase your idea: make scatter.selection() return the index labels of the dataframe instead of always using the positional row index.

@manzt
Collaborator

manzt commented Mar 3, 2023

Yeah exactly, or maybe we could have a flag in the selection getter/setter? cev relies on the selection being indices.
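
For illustration, such a flag might behave roughly like the sketch below. Both selection_of and return_index are hypothetical names made up for this example. They are not jscatter's API, and the thread does not settle where the flag would live.

```python
import pandas as pd


def selection_of(df, positions, return_index=False):
    """Return a selection either as row positions (the default, which
    position-based consumers rely on) or as dataframe index labels.
    Hypothetical helper, not part of jscatter."""
    if return_index:
        return df.index[positions].tolist()
    return list(positions)


# Data frame B from the issue, indexed by its "id" column
df_b = pd.DataFrame(
    {"x": [2, 5, 0], "y": [1, 0, 7]},
    index=pd.Index(["c", "b", "a"], name="id"),
)

as_positions = selection_of(df_b, [0, 2])
as_labels = selection_of(df_b, [0, 2], return_index=True)
print(as_positions, as_labels)
```

Keeping positions as the default would leave existing position-based callers untouched, while callers that need stable identifiers could opt in to index labels.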

@flekschas
Owner Author

A flag is a good idea. Maybe we can add it to the proposed data() function from #61. I wouldn't want to add it to selection(), simply because the upcoming filter() function should adhere to the same index.

@flekschas
Owner Author

Implemented via #64
