A statistics visualization library built on top of Chart and inspired by Seaborn. Amby provides a high-level interface to quickly display attractive visualizations. Amby also provides tools to display charts from both Amby and the Chart package within GHCi.
The simplest plotting function is plot'. Here's how you might plot the standard normal distribution.
λ> import Amby
λ> import qualified Statistics.Distribution.Normal as Stats
λ> let x = contDistrDomain Stats.standard 10000
λ> let y = contDistrRange Stats.standard x
λ> plot' x y
Notice the tick mark ' after plot'. This indicates a function that accepts no optional arguments.
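One common way to get such a pair of functions is to let the general form take an option modifier and define the primed form as the general form with the defaults left untouched. The sketch below illustrates that idea in plain Haskell. It is not Amby's implementation, and PlotOpts, defaultOpts, describePlot and describePlot' are hypothetical names made up for this example.

```haskell
-- Hypothetical options record; the field names are illustrative only.
data PlotOpts = PlotOpts
  { optColor     :: String
  , optLinewidth :: Double
  } deriving (Show, Eq)

-- Defaults used when the caller supplies no options.
defaultOpts :: PlotOpts
defaultOpts = PlotOpts { optColor = "blue", optLinewidth = 1 }

-- General form: takes a modifier and applies it to the defaults.
describePlot :: [Double] -> (PlotOpts -> PlotOpts) -> String
describePlot xs modify =
  let opts = modify defaultOpts
  in "plot of " ++ show (length xs) ++ " points, color=" ++ optColor opts
       ++ ", linewidth=" ++ show (optLinewidth opts)

-- Primed form: the general form with the options left untouched.
describePlot' :: [Double] -> String
describePlot' xs = describePlot xs id
```

With this shape, describePlot' xs and describePlot xs id are the same call, which is why the primed name can be read as "use the defaults".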
This tutorial mirrors the first section of Seaborn's Python tutorial.
Use distPlot to view univariate distributions. By default this will create a histogram and fit a kernel density estimate.
λ> z <- random Stats.standard 100
λ> distPlot' z
The distPlot histogram automatically chooses a reasonable number of bins and counts the data points in each bin. To view the position of each data point you can add a rugplot.
λ> distPlot z $ kde .= False >> rug .= True
Choosing a different number of bins for the histogram can reveal different patterns in the data.
λ> distPlot z $ bins .= 20 >> kde .= False >> rug .= True
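To make the binning step concrete, here is a minimal sketch of counting data points in equal-width bins using only lists. It is an illustration, not Amby's code: binCounts is a hypothetical helper, and it assumes the bins span the minimum to the maximum of the data with the maximum placed in the last bin.

```haskell
-- Count how many samples fall into each of n equal-width bins spanning
-- [minimum xs, maximum xs]. The maximum is placed in the last bin.
binCounts :: Int -> [Double] -> [Int]
binCounts n xs
  | n <= 0 || null xs = []
  | otherwise = [ length (filter (== i) indices) | i <- [0 .. n - 1] ]
  where
    lo = minimum xs
    hi = maximum xs
    width = (hi - lo) / fromIntegral n
    -- Clamp so that x == hi lands in bin n - 1 rather than a new bin n.
    index x
      | width == 0 = 0
      | otherwise  = min (n - 1) (floor ((x - lo) / width))
    indices = map index xs
```

Calling binCounts with different values of n on the same data shows how the bin count changes the shape of the histogram while the total count stays the same.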
Kernel density estimation can be a useful tool for plotting the shape of the distribution.
λ> distPlot z $ hist .= False >> rug .= True
A kernel density estimate is the sum of several normal distributions, one centered on each of the data points.
λ> import qualified Statistics.Sample as Stats
λ> import qualified Data.Vector.Unboxed as U
λ> let bandwidth = 1.059 * Stats.stdDev z * fromIntegral (U.length z) ** ((-1) / 5)
λ> let xs = linspace (-6) 6 200
λ> let a = U.take 30 z
λ> let foldFn _ b = plot xs (contDistrRange (Stats.normalDistr b bandwidth) xs)
λ> U.foldM foldFn () a >> rugPlot a (color .= K >> linewidth .= 3) >> xlim (-4, 4)
The resulting curve is normalized so the area under it is equal to 1. This is what is provided with the kdePlot function.
λ> kdePlot z $ shade .= True
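The construction described above (one normal curve per data point, summed and normalized) can be written out in a few lines of plain Haskell. This is a sketch under stated assumptions rather than what kdePlot does internally: it uses lists instead of vectors, the helper names stdDev, ruleOfThumb, normalPdf and kde are chosen for this example, and the bandwidth uses the same rule-of-thumb formula as the GHCi session above.

```haskell
-- Sample standard deviation (n - 1 in the denominator).
stdDev :: [Double] -> Double
stdDev xs = sqrt (sum [ (x - mean) ^ (2 :: Int) | x <- xs ] / (n - 1))
  where
    n = fromIntegral (length xs)
    mean = sum xs / n

-- Rule-of-thumb bandwidth: 1.059 * sd * n^(-1/5).
ruleOfThumb :: [Double] -> Double
ruleOfThumb xs = 1.059 * stdDev xs * fromIntegral (length xs) ** (-1 / 5)

-- Density of a normal distribution with mean mu and standard deviation sigma.
normalPdf :: Double -> Double -> Double -> Double
normalPdf mu sigma x =
  exp (negate ((x - mu) ^ (2 :: Int)) / (2 * sigma * sigma)) / (sigma * sqrt (2 * pi))

-- KDE at x: the average of one normal density per data point, each with
-- standard deviation bw. Dividing by the number of points keeps the total
-- area under the curve equal to 1.
kde :: Double -> [Double] -> Double -> Double
kde bw xs x = sum [ normalPdf xi bw x | xi <- xs ] / fromIntegral (length xs)
```

Evaluating kde (ruleOfThumb zs) zs over a grid of x values gives the points of the curve. A smaller bandwidth makes each bump narrower and taller, so the curve follows the data more closely.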
The bandwidth (bw) parameter of the KDE controls how tightly the estimation is fit to the data, much like the bin size in a histogram. The default behaviour tries to guess a good value, but it may be helpful to try larger or smaller values.
λ> kdePlot' z >> kdePlot z (bw .= BwScalar 0.2) >> kdePlot z (bw .= BwScalar 2)
You can also control how far past the range of your dataset the curve is drawn. However, this only influences how the curve is drawn, not how it is fit.
λ> kdePlot z (cut .= 0 >> shade .= True) >> rugPlot' z
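One way to picture this is as a choice of evaluation grid: the fit depends only on the data and the bandwidth, while cut decides where the grid starts and stops. The sketch below assumes the Seaborn convention that the curve extends cut times the bandwidth past the smallest and largest data points. Whether Amby uses exactly this rule is an assumption, and kdeGrid is a hypothetical helper.

```haskell
-- Evaluation grid for a KDE curve: n evenly spaced points from
-- (minimum xs - cut * bw) to (maximum xs + cut * bw).
-- Assumes cut is measured in multiples of the bandwidth.
kdeGrid :: Double -> Double -> Int -> [Double] -> [Double]
kdeGrid cut bw n xs
  | n < 2 || null xs = []
  | otherwise = [ lo + fromIntegral i * step | i <- [0 .. n - 1] ]
  where
    lo = minimum xs - cut * bw
    hi = maximum xs + cut * bw
    step = (hi - lo) / fromIntegral (n - 1)
```

With cut set to 0 the grid stops exactly at the extreme data points, so the drawn curve is truncated there even though the underlying estimate is unchanged.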
In this section we'll see how to visualize the relationship between a numeric variable and one or more categorical variables.
Boxplots can facilitate easy comparisons across category levels. This kind of plot shows the three quartile values of the distribution along with extreme values. The "whiskers" extend to points that lie within 1.5 IQRs (interquartile range) of the lower and upper quartile, and then observations that fall outside this range are displayed independently. Importantly, this means that each value in the boxplot corresponds to an actual observation in the data.
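The numbers behind a boxplot can be computed by hand, which may help when reading one. The following is a rough sketch, not Amby's implementation: quantile, BoxStats and boxStats are hypothetical names, and quartiles are computed by linear interpolation between order statistics, which is only one of several common conventions.

```haskell
import Data.List (sort)

-- Quantile by linear interpolation between order statistics.
-- Assumes a non-empty list and 0 <= q <= 1.
quantile :: Double -> [Double] -> Double
quantile q xs = sorted !! lo + frac * (sorted !! hi - sorted !! lo)
  where
    sorted = sort xs
    pos = q * fromIntegral (length xs - 1)
    lo = floor pos
    hi = min (length xs - 1) (lo + 1)
    frac = pos - fromIntegral lo

data BoxStats = BoxStats
  { q1, median, q3 :: Double
  , whiskerLow, whiskerHigh :: Double
  , outliers :: [Double]
  } deriving Show

-- Quartiles, whisker ends and outliers for a non-empty sample.
boxStats :: [Double] -> BoxStats
boxStats xs = BoxStats a b c (minimum inside) (maximum inside) outside
  where
    a = quantile 0.25 xs
    b = quantile 0.50 xs
    c = quantile 0.75 xs
    iqr = c - a
    -- Points within 1.5 IQRs of the lower and upper quartile.
    isInside x = x >= a - 1.5 * iqr && x <= c + 1.5 * iqr
    inside = filter isInside xs
    -- Everything else is displayed independently.
    outside = filter (not . isInside) xs
```

The whisker ends are the smallest and largest observations inside the 1.5 IQR fences, so they are always actual data points rather than the fence values themselves.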
For convenience we'll use the loadDataset function from Amby.Utils to load datasets.
The simplest way to draw a boxplot is to use the boxPlot function.
λ> ds <- loadDataset tips
λ> head ds
Tip
{ totalBill = 16.99
, tip = 1.01
, sex = "Female"
, smoker = "No"
, day = "Sun"
, time = "Dinner"
, tipSize = 2
}
λ> (b, p, s, k, d, t, _) <- getTipColumns ds
Draw a single horizontal boxplot.
λ> boxPlot' b
Draw a vertical boxplot grouped by a categorical variable.
λ> boxPlot b $ fac .= d >> axis .= YAxis
Draw a vertical boxplot with nested grouping by two categorical variables.
λ> boxPlot b $ fac .= s >> hue .= d >> axis .= YAxis >> color .= G
Draw a boxplot when some bins are empty.
λ> theme springTheme >> boxPlot b (fac .= d >> hue .= t)
Control box order.
λ> boxPlot p $ fac .= changeOrder t ["Dinner", "Lunch"]
If you want to compare more than two categorical variables you can use factorPlot.
λ> gridTheme cleanTheme >> factorPlot b (fac .= s >> hue .= d >> col .= k)
We can add labels.
λ> factorPlot b $ fac .= s >> hue .= d >> col .= k >> colLabel .= "smoker"
You can compare up to four categorical variables using factorPlot.
λ> factorPlot b $ fac .= s >> hue .= d >> col .= k >> row .= t
There are several ways to render plots.
First, Amby provides the helper functions save and saveSvg that will save a graph to the file .__amby.png and .__amby.svg respectively. save uses the Cairo backend, while saveSvg uses the Diagrams backend. The Diagrams backend produces better looking charts, but is slower.
λ> save $ distPlot' z
λ> saveSvg $ distPlot' z
Second, you can use any rendering methods that the underlying Chart library provides by converting an AmbyChart () or AmbyGrid () to a Renderable (LayoutPick Double Double Double) with the getRenderable function.
λ> import Graphics.Rendering.Chart.Easy (def)
λ> import Graphics.Rendering.Chart.Backend.Cairo as Cairo
λ> import Graphics.Rendering.Chart.Backend.Diagrams as Diagrams
λ> Cairo.renderableToFile def "myFile.png" $ getRenderable $ distPlot' z
λ> Diagrams.renderableToFile def "myFile.svg" $ getRenderable $ distPlot' z
Third, if you have a terminal that supports images, such as iTerm2, you can display charts directly inside the GHCi REPL. Just install the imgcat executable and the pretty-display library. See here for further installation instructions.
λ> distPlot' z
You can also specify graphs using a domain and an equation.
λ> plotEq' [0,0.001..4] sqrt
Plotting functions work on both lists and generic vectors of doubles.
λ> plotEq' [0,0.001..4] sqrt
λ> plotEq' (linspace 0 4 4000) sqrt
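For readers who have not met linspace before, here is a minimal list-based sketch of the idea. It assumes NumPy-style semantics where both endpoints are included; Amby's own linspace may differ in detail, and linspaceList is a hypothetical name used only here.

```haskell
-- n evenly spaced points from a to b, including both endpoints.
linspaceList :: Double -> Double -> Int -> [Double]
linspaceList a b n
  | n <= 0 = []
  | n == 1 = [a]
  | otherwise = [ a + fromIntegral i * step | i <- [0 .. n - 1] ]
  where
    step = (b - a) / fromIntegral (n - 1)
```

Compared with a range such as [0,0.001..4], this form fixes the number of points up front instead of the step size.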
λ> import Statistics.Distribution.Beta as Stats
λ> :set +m
λ> let plotBeta a b =
λ|       let d = Stats.betaDistr a b
λ|           x = contDistrDomain d 10000
λ|           y = contDistrRange d x
λ|       in plot' x y
λ> do
λ|   theme cleanTheme
λ|   plotBeta 0.5 0.5
λ|   plotBeta 5 1
λ|   plotBeta 1 3
λ|   plotBeta 2 2
λ|   plotBeta 2 5
λ|   ylim (0.0, 2.5)
To use amby you'll first need to install Chart and gtk2hs if you don't already have them.
Mac OS X
Here are the instructions I used to install Chart and gtk2hs on OS X El Capitan with stack.
stack install Chart-diagrams
brew cask install xquartz
brew install glib cairo gtk gettext fontconfig freetype
Add the following environment variable export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig to .bashrc or similar file.
stack install alex happy
stack install gtk2hs-buildtools
stack install glib
stack install -- gtk --flag gtk:have-quartz-gtk
stack install Chart-cairo
Linux and Windows
Instructions for installing gtk2hs on Linux and Windows can be found here.
Likewise, run
stack install Chart-diagrams
stack install Chart-cairo
To be able to display charts in ghci with a terminal such as iTerm2 you'll need imgcat and pretty-display.
Mac OS X
brew tap eddieantonio/eddieantonio
brew install imgcat
Linux and Windows
For more information visit imgcat's repository
- Add pretty-display to your cabal file and run stack build.
- Place the following in your .ghci file. If you're using stack you can put this file at the root of your project.
import Text.Display
:set -interactive-print=Text.Display.dPrint
:def pp (\_ -> return ":set -interactive-print=Text.Display.dPrint")
:def npp (\_ -> return ":set -interactive-print=print")
- Restart ghci.
If you use the save or saveSvg functions because your terminal is unable to display images within GHCi, you can use a tool such as entr to run a command like open whenever the file is saved.
ls -d .__amby.png | entr -r open /_