Sina plot instead of beeswarm plot? #58

lrq3000 · 2019-08-26T22:05:46Z

Hello,

I appreciate the approach taken by this framework a lot, and I would like to implement it in my publications. However, I would prefer to use a sina plot instead of a beeswarm, as it has 2 advantages:

1- apart from the kernel density estimation, it does not impose an artificial structure on the data (i.e., the "branch-like" lines in the beeswarm),

2- the width of each class's sina plot is normalized across all classes, so that differences in sample size are visible at a glance.

I think the last point in particular can very well complement the ideas put forward by the DABEST framework. There is a Python implementation of Sina plots in the plotnine package (geom_sina).
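For readers unfamiliar with the technique, here is a minimal standard-library sketch of the idea behind a sina plot: each point gets a random horizontal offset whose maximum is proportional to the estimated density at that point's value. This is not the plotnine or ggforce implementation. The function names, the normal-reference bandwidth rule, and the choice to scale each group's width by its share of the largest sample size are all assumptions made for illustration.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples
        )

    return density


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth; falls back to 1.0 for constant data."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / max(n - 1, 1)
    std = math.sqrt(var)
    return 1.06 * std * n ** (-0.2) if std > 0 else 1.0


def sina_offsets(groups, max_width=0.4, seed=0):
    """Compute a horizontal jitter offset for every point in every group.

    groups: dict mapping group name -> non-empty list of values.
    Returns a dict mapping group name -> list of x offsets.
    """
    rng = random.Random(seed)
    n_max = max(len(values) for values in groups.values())
    offsets = {}
    for name, values in groups.items():
        density = gaussian_kde(values, rule_of_thumb_bandwidth(values))
        dens = [density(y) for y in values]  # O(n^2); fine for a sketch
        peak = max(dens)
        # Assumption: smaller groups get proportionally narrower clouds,
        # so relative sample size is visible across groups.
        scale = max_width * len(values) / n_max
        offsets[name] = [rng.uniform(-1.0, 1.0) * scale * d / peak for d in dens]
    return offsets
```

To draw the plot, each point would be placed at x = (group position + offset) and y = value. Unlike a beeswarm, no collision avoidance is needed, so the layout cost does not depend on marker size.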

Also maybe it would be interesting, if possible at all, to generalize the possibility of using other kinds of plots, as I guess different users might have different preferences?

josesho · 2019-08-27T02:47:21Z

Hi @lrq3000,

I agree about the superiority of sinaplots, especially when the Ns get very large. Our R package features sina plots; you could consider using that in lieu of this Python package.

I'm not familiar with the underlying implementation of plotnine; we use matplotlib and seaborn under the hood. There is an implementation of sinaplots in seaborn which should work. Feel free to submit a pull request if you come up with a working prototype!

> Also maybe it would be interesting, if possible at all, to generalize the possibility of using other kinds of plots, as I guess different users might have different preferences?

There are plenty of excellent general-purpose plotting packages for Python already. While we have plans to develop designs for other kinds of differences (e.g. differences in proportions), the DABEST suite will remain focussed on estimation plots: Gardner-Altman plots, and Cumming plots.

lrq3000 · 2019-08-30T03:26:44Z

(working on it, thank you very much, didn't know about seaborn's implementation)

josesho · 2019-08-30T04:12:04Z

Closing it for now; feel free to reference this issue when doing your pull request!

lrq3000 · 2019-09-05T12:21:58Z

About using other kinds of KDE visualizations: if one has a LOT of samples per group, like 10000, then a scatter-like plot becomes impractical, whereas a density plot such as a gradient plot or a violin plot would solve the issue. Furthermore, future visualizations may allow a better representation of the data, violin plots and sina plots are certainly not the end of it all. That's why I suggest to make the scatter-like plot generic, so that any kind of scatter-like plot can be plugged in. I'll see if this is possible, at worst I'll implement only the sina plot and violin plot.

Meanwhile, is it possible to reopen this issue please, to track the effort until it's done? Thank you :D

josesho · 2019-09-06T03:33:35Z

> Furthermore, future visualizations may allow a better representation of the data, violin plots and sina plots are certainly not the end of it all. That's why I suggest to make the scatter-like plot generic, so that any kind of scatter-like plot can be plugged in. I'll see if this is possible, at worst I'll implement only the sina plot and violin plot.

Ah, thank you for clarifying.

Both the original Gardner-Altman and Cumming designs implemented a swarmplot, and I'm strongly inclined to stick to the "display all data" paradigm.

So I think a sina plot (or some sort of force-directed layout, with dot size scaled to N) is definitely a worthy effort. In terms of visual grammar, we should keep the half-violins for the bootstrap distributions of the differences, rather than conflating them with the raw data.

In summary, happy to accept a PR for sinaplots in DABEST-Python!

mje-nz · 2019-09-29T22:43:52Z

The Gardner-Altman figures use dot histograms, not swarm plots. In Cumming I can see violin plots (e.g. fig 6.4), strip plots (e.g. fig 6.6), and a whole lot of bare confidence intervals with no data (but I didn't look very hard).

I agree in principle that the data plots should show all the data, but swarm plots badly mis-represent the overall distribution for large samples. For example, compare these plots of two datasets:

For the small dataset I would argue only the box plot is really bad; most of the others give a reasonable impression of the distribution. For the large dataset I would argue the violin plot is the best, followed by the box plot and dot histogram; most of the others give a skewed impression of how big the tail is. The swarm plot is arguably the worst. You probably have different opinions, and that's fine! I expect a sina plot would do well on both datasets.

I think it would be reasonable to keep the default as a swarm plot (since sina plots seem hard) but let users pick a different data plot type. Would you accept a PR for that?

@lrq3000 are you still working on this?
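The suggestion above (keep a default, but let users pick or plug in a different raw-data plot) could be sketched roughly as follows. This is not DABEST's API: `RAW_LAYOUTS`, `layout_raw_data`, and both layout functions are hypothetical names invented for this illustration, and the layouts only compute horizontal offsets rather than drawing anything. The dot-histogram layout is simplified in that it does not snap values to bin centres.

```python
import random


def strip_layout(values, width=0.2, seed=0):
    """Uniform random jitter, as in a strip plot."""
    rng = random.Random(seed)
    return [rng.uniform(-width, width) for _ in values]


def dot_histogram_layout(values, n_bins=10, step=0.05):
    """Stack points side by side within equal-width bins."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant data
    counts = {}
    offsets = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it.
        b = min(int((v - lo) / span * n_bins), n_bins - 1)
        k = counts.get(b, 0)
        counts[b] = k + 1
        offsets.append(k * step)
    return offsets


# Hypothetical registry of built-in layouts, keyed by name.
RAW_LAYOUTS = {
    "strip": strip_layout,
    "dot_histogram": dot_histogram_layout,
}


def layout_raw_data(values, kind="strip", **kwargs):
    """Dispatch to a named layout, or to any user-supplied callable."""
    if callable(kind):
        return kind(values, **kwargs)
    if kind not in RAW_LAYOUTS:
        raise ValueError(f"unknown raw-data layout: {kind!r}")
    return RAW_LAYOUTS[kind](values, **kwargs)
```

Accepting a callable as well as a name is what would let a user plug in a sina or other layout without the package having to ship every option itself.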

josesho · 2019-09-30T09:11:52Z

Hi @mje-nz,

My own inclination is to somehow drag-drop in @mparker2's seaborn-based implementation. I suppose we could import the relevant .py file from that repo, and give the user an option between swarmplots and sinaplots.

Personally, I'm disinclined to use violin, box, and boxen plots to display raw data. (Especially the violin plot, which we already use to display the bootstrap effect size.)

I should find some time to work on this in the next few weeks .... 🤞

If you or @lrq3000 can get @mparker2's code working within dabest, I'd be very happy to accept your PR!

lrq3000 · 2019-10-01T02:33:55Z

Yes, I am still working on it; first I need to solve #67 :-) Sorry I'm taking some time. It's not complicated, it's just that I spent more time reading statistical literature to get up to date with good practices. I have several other features I'd like to implement, but this one first :-) Help is of course welcome anyway!

ValdarT · 2020-02-28T09:58:48Z

Any developments here? The beeswarm is a problem even with only thousands of data points which is not that many: it is not that informative in this case and takes A LOT of time to plot currently.

lrq3000 · 2020-02-28T13:29:15Z

@ValdarT Sorry, I had unexpected events lately and had to stop working on it, but I still plan to. I'll try to finish it next week hopefully (in any case I need it myself for my current main work).

JAnns98 · 2024-07-17T09:14:36Z

Hi @lrq3000, it's been a long time since this thread was active! We agree that swarmplot has its own set of issues when dealing with larger sample sizes, and we are hoping to work on making changes. Do you still have any work-in-progress code for this?

lrq3000 · 2024-07-17T10:18:51Z

Unfortunately I don't have any progress to report on this feature and I don't have time to work on this in the foreseeable future.

rhuszar · 2024-08-09T16:00:14Z

I ran into the same obstacle to using this in practice.
Plotting every datapoint is really impractical in the age of big data (I have 30k datapoints per group...)

It appears the package is still under development / improvement. Based on this thread, there are many who find this to be a major obstacle to their use case.

Thank you for the paper and this package!

@josesho

josesho added the enhancement label Aug 27, 2019

josesho closed this as completed Aug 30, 2019

josesho reopened this Sep 6, 2019
