Enable jitter in paired slopegraph plots to aid discrete data visualisations. #181

Open
mlotinga opened this issue Jul 11, 2024 · 10 comments


@mlotinga

mlotinga commented Jul 11, 2024

For discrete data, the data plots generally overlay points or lines over each other, obscuring information.

A facility to add jitter to the data visualisation, without affecting the effect size calculations, would be a standard way to address this, as is implemented (for example) in seaborn.

Applying jitter to the input data is possible but would distort the effect size calculation, which is highly undesirable.
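To illustrate the point above, here is a minimal sketch (not DABEST code) comparing the two options. The data are made up, and `mean_diff` is a hypothetical stand-in for the package's effect size calculation; the only claim is that jittering the inputs changes the statistic, while jittering a plot-only copy does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete paired data (e.g. ratings on a 1-5 scale).
before = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5], dtype=float)
after = np.array([2, 2, 3, 3, 4, 4, 4, 5, 5, 5], dtype=float)


def mean_diff(a, b):
    """Paired mean difference; a stand-in for the real effect size."""
    return float(np.mean(b - a))


true_effect = mean_diff(before, after)

# Option A: jitter the input data, then compute the effect size.
# The statistic now depends on the random noise.
noisy_before = before + rng.uniform(-0.2, 0.2, size=before.size)
noisy_after = after + rng.uniform(-0.2, 0.2, size=after.size)
distorted_effect = mean_diff(noisy_before, noisy_after)

# Option B: jitter only copies used for drawing.
# The effect size is still computed from the untouched data.
plot_before = before + rng.uniform(-0.2, 0.2, size=before.size)
plot_after = after + rng.uniform(-0.2, 0.2, size=after.size)
display_effect = mean_diff(before, after)
```

Only `plot_before` and `plot_after` would be passed to the drawing code in option B, which is the behaviour requested here.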

If you could give some advice on where this could be incorporated into the relevant objects, I could have a go at doing it.

Example of the motivating problem:

image

@JAnns98
Collaborator

JAnns98 commented Jul 16, 2024

Hi @mlotinga !

For our unpaired plots, our design approach is that each data point should be clear and non-overlapping. For this reason, we use the same approach as seaborn's 'swarmplot', which aims not to overlap data points. This is why neither our plot nor swarmplot includes jitter as a parameter. These plots do, however, end up overlapping data points once they run out of space and hit the 'gutter'. We are currently working on tuning this gutter length. Users can also adjust the dot size, which helps a lot for larger sample sizes.

For paired lines it is more challenging (design-wise). Perhaps you could elaborate a little further and/or show a seaborn example of jitter used in paired line plots?

@mlotinga
Author

Ok, thanks for responding. Yes, the issue here is related to paired slopegraph plots only — the illustration in the original post shows how it can be rather difficult to discern meaning from this if the data are discrete.

With regard to seaborn, the objects API includes the seaborn.objects.Jitter class, which I presume could be used to alter the placement of datapoints in a plot...

https://seaborn.pydata.org/generated/seaborn.objects.Jitter.html

mlotinga changed the title from "Enable jitter to aid discrete data visualisations." to "Enable jitter in paired slopegraph plots to aid discrete data visualisations." on Jul 16, 2024

@Jacobluke-
Collaborator

Hi @mlotinga , the paired slopegraph plotting is located at this file.

You will have to install nbdev for an easier development process.

@JAnns98
Collaborator

JAnns98 commented Jul 18, 2024

I think it can easily be achieved with a simple line or two of code (even without seaborn; e.g., using np.random.uniform):

Screenshot 2024-07-18 at 3 23 26 PM
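Since the screenshot does not survive as text, here is a rough sketch of what a "line or two" of uniform x-jitter for paired lines could look like. It is an assumption-laden illustration rather than the code in the screenshot: `jittered_slope_coords` and its parameters are hypothetical names, and it uses `Generator.uniform` from `np.random.default_rng` instead of the legacy `np.random.uniform` so that a seed can be passed.

```python
import numpy as np


def jittered_slope_coords(y_pairs, x_positions=(0, 1), x_jitter=0.05, seed=None):
    """Return (x, y) plotting coordinates for paired lines.

    y_pairs has one row per subject and one column per condition.
    Only x is jittered; y is returned unchanged.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y_pairs, dtype=float)
    x = np.asarray(x_positions, dtype=float)
    # One independent uniform offset per plotted point.
    offsets = rng.uniform(-x_jitter, x_jitter, size=y.shape)
    return x + offsets, y


# Hypothetical discrete paired observations (three subjects, two conditions).
pairs = [[1, 2], [1, 2], [2, 2]]
xs, ys = jittered_slope_coords(pairs, x_jitter=0.05, seed=1)
```

Each row of `xs` and `ys` can then be passed to a line-drawing call (for example matplotlib's `ax.plot(xs[i], ys[i])`) to draw one subject's slope.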

@mlotinga
Author

mlotinga commented Jul 18, 2024

Thanks @Jacobluke- and @JAnns98 for the information. I now have it working.

I added the following plot_kwargs to _effsize_objects.py:

(lines 1009-1011)

slopegraph_xjitter=0,
slopegraph_yjitter=0,
jitter_seed=9876543210, 

I modified plotter.py with

(line 484)

rng = np.random.default_rng(plot_kwargs["jitter_seed"])

and

(lines 493-494)

x_points = [t + plot_kwargs["slopegraph_xjitter"]*rng.standard_t(df=6, size=None) for t in range(x_start, x_start + grp_count)]
y_points = np.array(observation[yvar].tolist()) + plot_kwargs["slopegraph_yjitter"]*rng.standard_t(df=6, size=len(observation[yvar].tolist()))

and I get output like (using slopegraph_xjitter=0, slopegraph_yjitter=0.07, jitter_seed=303):

image
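For anyone who wants to try the idea outside the package, the two modified lines can be sketched as a standalone helper. This is a hedged illustration, not the code in plotter.py: `jitter_slopegraph_points` is a hypothetical name, the keyword names mirror the plot_kwargs above, and because the draws are vectorised it is not guaranteed to reproduce the exact numbers of the list-comprehension version for the same seed.

```python
import numpy as np


def jitter_slopegraph_points(y_values, x_start=0, xjitter=0.0, yjitter=0.0,
                             seed=9876543210):
    """Return jittered (x_points, y_points) for one observation's line.

    Offsets are draws from a Student's t distribution (df=6) scaled by
    xjitter / yjitter, as in the snippet above. The input is not modified.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y_values, dtype=float)
    x = np.arange(x_start, x_start + y.size, dtype=float)
    x_points = x + xjitter * rng.standard_t(df=6, size=y.size)
    y_points = y + yjitter * rng.standard_t(df=6, size=y.size)
    return x_points, y_points


# Defaults (zero jitter) leave the coordinates unchanged.
x0, y0 = jitter_slopegraph_points([1, 2, 3], x_start=2)

# The settings quoted above: y-jitter only, fixed seed.
x1, y1 = jitter_slopegraph_points([1, 2, 3], x_start=2, yjitter=0.07, seed=303)
```

Note that the t distribution is unbounded, so an occasional point can land further from its true value than the scale parameter suggests.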

@mlotinga
Author

mlotinga commented Jul 18, 2024

Would this be a useful feature to add to the package?

My edits can be viewed here: https://github.com/mlotinga/DABEST-python_devMJBL

@JAnns98
Collaborator

JAnns98 commented Jul 18, 2024

That's great, glad you could get it done! We will discuss internally whether it could be useful to include in the next release :)

P.S. I would think only x-axis jitter would be appropriate?

@mlotinga
Author

I guess it depends on the application and data - I think it's good to have the flexibility.

@JAnns98
Collaborator

JAnns98 commented Jul 23, 2024

@mlotinga Thanks for this, we will aim to add it into the main package (at least the x-jitter) for the next major release!

@mlotinga
Author

That's great. After a bit of experimenting, I actually found that for my application the best visualisation was achieved using a little jitter on both axes. I guess the hesitation about y-axis jittering expressed above might be a concern about data misrepresentation? For the main use case of discrete data, it should be easy for users to select a yjitter value small enough that the discrete groups remain visibly distinct and the data are not confused. Allowing the flexibility would give users a better set of options for producing the clearest visualisation. Seaborn (e.g., regplot) provides this kind of flexibility, leaving the parameter choice to the user.
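One way to make "a suitable parameter value" concrete is to bound the y-jitter by the spacing of the discrete levels. The sketch below is only an illustration of that idea: `max_safe_yjitter` and `nearest_level` are hypothetical helpers, and it swaps in bounded uniform jitter in place of the t-distributed jitter used earlier in the thread, since a bounded distribution is what guarantees that every jittered point stays closest to its own level.

```python
import numpy as np


def max_safe_yjitter(values, fraction=0.25):
    """Half-width for uniform y-jitter, as a fraction of the smallest
    gap between distinct levels. Any fraction below 0.5 keeps each
    jittered point nearer to its own level than to a neighbour."""
    levels = np.unique(np.asarray(values, dtype=float))
    if levels.size < 2:
        return 0.0
    return fraction * float(np.min(np.diff(levels)))


def nearest_level(points, levels):
    """Map each point back to the closest discrete level."""
    idx = np.argmin(np.abs(points[:, None] - levels[None, :]), axis=1)
    return levels[idx]


# Hypothetical discrete observations with unevenly spaced levels.
values = np.array([1, 2, 2, 3, 5, 5, 5], dtype=float)
half_width = max_safe_yjitter(values)

rng = np.random.default_rng(303)
plotted = values + rng.uniform(-half_width, half_width, size=values.size)

# Every jittered point still identifies its original level.
recovered = nearest_level(plotted, np.unique(values))
```

With this bound, the plotted groups cannot overlap, which addresses the misrepresentation concern while still separating coincident lines.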
