diff --git a/.Rbuildignore b/.Rbuildignore index 16ac7717f1..dbde8f70f3 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -26,3 +26,4 @@ visual_test ^vignettes/profiling.Rmd$ ^cran-comments\.md$ ^LICENSE\.md$ +^vignettes/articles$ diff --git a/DESCRIPTION b/DESCRIPTION index f6e8d727fe..28aac70572 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -269,3 +269,7 @@ VignetteBuilder: knitr RoxygenNote: 7.1.1 Roxygen: list(markdown = TRUE) Encoding: UTF-8 +Config/Needs/Website: + ggtext, + tidyr, + forcats diff --git a/_pkgdown.yml b/_pkgdown.yml index b1533480d0..5506b48ae0 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -243,6 +243,31 @@ reference: - fortify - map_data + +articles: +- title: Building plots + navbar: ~ + contents: + - ggplot2-specs + +- title: Developer + navbar: Developer + contents: + - extending-ggplot2 + - ggplot2-in-packages + - profiling + +- title: FAQ + navbar: FAQ + contents: + - articles/faq-axes + - articles/faq-faceting + - articles/faq-customising + - articles/faq-annotation + - articles/faq-reordering + - articles/faq-bars + + news: releases: - text: "Version 3.3.0" diff --git a/vignettes/articles/faq-annotation.Rmd b/vignettes/articles/faq-annotation.Rmd new file mode 100644 index 0000000000..8a3560a309 --- /dev/null +++ b/vignettes/articles/faq-annotation.Rmd @@ -0,0 +1,214 @@ +--- +title: "FAQ: Annotation" +--- + +```{=html} + +``` +```{r, include = FALSE} +library(ggplot2) +library(dplyr) +knitr::opts_chunk$set( + fig.dpi = 300, + collapse = TRUE, + comment = "#>", + fig.asp = 0.618, + fig.width = 6, + out.width = "80%" + ) +``` + + + +### Why is annotation created with `geom_text()` pixellated? How can I make it more crisp? + +You should use `annotate(geom = "text")` instead of `geom_text()` for annotation. + +
+ +See example + +In the following visualisation we have annotated a histogram with a red line and red text to mark the mean. Note that both the line and the text appear pixellated/fuzzy. + +```{r} +mean_hwy <- round(mean(mpg$hwy), 2) + +ggplot(mpg, aes(x = hwy)) + + geom_histogram(binwidth = 2) + + geom_segment( + x = mean_hwy, xend = mean_hwy, + y = 0, yend = 35, + color = "red" + ) + + geom_text( + x = mean_hwy, y = 40, + label = paste("mean\n", mean_hwy), + color = "red" + ) +``` + +This is because `geom_text()` draws the geom once for each row of the data frame, plotting these on top of each other. For annotation (as opposed to plotting the data using text as geometric objects to represent each observation) use `annotate()` instead. + + +```{r} +ggplot(mpg, aes(x = hwy)) + + geom_histogram(binwidth = 2) + + annotate("segment", + x = mean_hwy, xend = mean_hwy, y = 0, yend = 35, + color = "red" + ) + + annotate("text", + x = mean_hwy, y = 40, + label = paste("mean =", mean_hwy), + color = "red" + ) +``` + +
+ +### How can I make sure all annotation created with `geom_text()` fits in the bounds of the plot? + +Set `vjust = "inward"` and `hjust = "inward"` in `geom_text()`. + +
+ +See example + +Suppose you have the following data frame and visualization. The labels at the edges of the plot are cut off slightly. + +```{r} +df <- tibble::tribble( + ~x, ~y, ~name, + 2, 2, "two", + 3, 3, "three", + 4, 4, "four" +) + +ggplot(df, aes(x = x, y = y, label = name)) + + geom_text(size = 10) +``` + +You could manually extend axis limits to avoid this, but a more straightforward approach is to set `vjust = "inward"` and `hjust = "inward"` in `geom_text()`. + +```{r} +ggplot(df, aes(x = x, y = y, label = name)) + + geom_text(size = 10, vjust = "inward", hjust = "inward") +``` + +
+ +### How can I annotate my bar plot to display counts for each bar? + +Either calculate the counts ahead of time and place them on bars using `geom_text()` or let `ggplot()` calculate them for you and then add them to the plot using `stat_count()` with `geom = "text"`. + +
+ +See example + +Suppose you have the following bar plot and you want to add the number of cars that fall into each `drv` level on their respective bars. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() +``` + +One option is to calculate the counts with `dplyr::count()` and then pass them to the `label` mapping in `geom_text()`. +Note that we expanded the y axis limit to get the numbers to fit on the plot. + +```{r} +mpg %>% + dplyr::count(drv) %>% + ggplot(aes(x = drv, y = n)) + + geom_col() + + geom_text(aes(label = n), vjust = -0.5) + + coord_cartesian(ylim = c(0, 110)) +``` + +Another option is to let `ggplot()` do the counting for you, and access these counts with `..count..` that is mapped to the labels to be placed on the plot with `stat_count()`. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() + + stat_count(geom = "text", aes(label = ..count..), vjust = -0.5) + + coord_cartesian(ylim = c(0, 110)) +``` + +
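The `..count..` notation used above can also be written with `after_stat()`. The following is a minimal sketch of the same plot, assuming ggplot2 3.3.0 or later, where `after_stat()` is available; the object name `p` is only illustrative.

```{r}
library(ggplot2)

# Same bar plot as above; `after_stat(count)` refers to the count
# computed by stat_count(), just like `..count..`.
p <- ggplot(mpg, aes(x = drv)) +
  geom_bar() +
  stat_count(geom = "text", aes(label = after_stat(count)), vjust = -0.5) +
  coord_cartesian(ylim = c(0, 110))

p
```

Both spellings map the computed count to the `label` aesthetic, so the resulting plot should look the same.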
+ +### How can I annotate my stacked bar plot to display counts for each segment? + +First calculate the counts for each segment (e.g. with `dplyr::count()`) and then place them on the bars with `geom_text()` using `position_stack(vjust = 0.5)` in the `position` argument to place the values in the middle of the segments. + +
+ +See example + +Suppose you have the following stacked bar plot. + +```{r} +ggplot(mpg, aes(x = class, fill = drv)) + + geom_bar() +``` + +You can first calculate the counts for each segment with `dplyr::count()`, which will place these values in a column called `n`. + +```{r} +mpg %>% + count(class, drv) +``` + +You can then pass this result directly to `ggplot()`, draw the segments with appropriate heights with `y = n` in the `aes`thetic mapping and `geom_col()` to draw the bars, and finally place the counts on the plot with `geom_text()`. + +```{r} +mpg %>% + count(class, drv) %>% + ggplot(aes(x = class, fill = drv, y = n)) + + geom_col() + + geom_text(aes(label = n), size = 3, position = position_stack(vjust = 0.5)) +``` + +
+ +### How can I display proportions (relative frequencies) instead of counts on a bar plot? + +Either calculate the proportions ahead of time and place them on bars using `geom_text()` or let `ggplot()` calculate them for you and then add them to the plot using `stat_count()` with `geom = "text"`. + +
+ +See example + +Suppose you have the following bar plot but you want to display the proportion of cars that fall into each `drv` level, instead of the count. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() +``` + +One option is to calculate the proportions with `dplyr::count()` and then use `geom_col()` to draw the bars. + +```{r} +mpg %>% + dplyr::count(drv) %>% + mutate(prop = n / sum(n)) %>% + ggplot(aes(x = drv, y = prop)) + + geom_col() +``` + +Another option is to let `ggplot()` do the calculation of proportions for you, and access these proportions with `..prop..`. +Note that we also need the `group = 1` mapping for this option. + +```{r} +ggplot(mpg, aes(x = drv, y = ..prop.., group = 1)) + + geom_bar() +``` + +
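The short answer mentions placing the proportions on the bars with `geom_text()`. One possible sketch, assuming the proportions are calculated ahead of time (the name `drv_props` and the y-axis limits are illustrative choices, not part of the original example):

```{r}
library(ggplot2)
library(dplyr)

# Compute the proportion of cars in each `drv` level ahead of time.
drv_props <- mpg %>%
  count(drv) %>%
  mutate(prop = n / sum(n))

# Draw the bars and print each proportion as a percentage above its bar.
ggplot(drv_props, aes(x = drv, y = prop)) +
  geom_col() +
  geom_text(
    aes(label = scales::label_percent(accuracy = 1)(prop)),
    vjust = -0.5
  ) +
  coord_cartesian(ylim = c(0, 0.5))
```

As in the count example, the y-axis limit is extended a little so the labels fit above the tallest bar.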
+ diff --git a/vignettes/articles/faq-axes.Rmd b/vignettes/articles/faq-axes.Rmd new file mode 100644 index 0000000000..ba718d7d79 --- /dev/null +++ b/vignettes/articles/faq-axes.Rmd @@ -0,0 +1,487 @@ +--- +title: "FAQ: Axes" +--- + +```{=html} + +``` +```{r, include = FALSE} +library(ggplot2) +knitr::opts_chunk$set( + fig.dpi = 300, + collapse = TRUE, + comment = "#>", + fig.asp = 0.618, + fig.width = 6, + out.width = "80%") +``` + +## Label placement + +### How can I rotate the axis tick labels in ggplot2 so that tick labels that are long character strings don't overlap? + +Set the angle of the text in the `axis.text.x` or `axis.text.y` components of the `theme()`, e.g. `theme(axis.text.x = element_text(angle = 90))`. + +
+ +See example + +In the following plot the labels on the x-axis are overlapping. + +```{r msleep-order-sleep-total} +ggplot(msleep, aes(x = order, y = sleep_total)) + + geom_boxplot() +``` + +- Rotate axis labels: We can do this by modifying components of the `theme()`, specifically the `axis.text.x` component. Applying some vertical and horizontal justification to the labels centers them at the axis ticks. The `angle` can be set as desired within the 0 to 360 degree range; here we set it to 90 degrees. + +```{r msleep-order-sleep-total-rotate} +ggplot(msleep, aes(x = order, y = sleep_total)) + + geom_boxplot() + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +``` + +- Flip the axes: Use the y-axis for long labels. + +```{r msleep-order-sleep-total-flip} +ggplot(msleep, aes(y = order, x = sleep_total)) + + geom_boxplot() +``` + +- Dodge axis labels: Add a `scale_*()` layer, e.g. `scale_x_continuous()`, `scale_y_discrete()`, etc., and customise the `guide` argument with the `guide_axis()` function. In this case we want to customise the x-axis, and the variable on the x-axis is discrete, so we'll use `scale_x_discrete()`. In the `guide` argument we use `guide_axis()` and specify how many rows to dodge the labels into with `n.dodge`. This is likely a trial-and-error exercise, depending on the lengths of your labels and the width of your plot. In this case we've settled on 3 rows to render the labels. + +```{r msleep-order-sleep-total-dodge} +ggplot(msleep, aes(x = order, y = sleep_total)) + + geom_boxplot() + + scale_x_discrete(guide = guide_axis(n.dodge = 3)) +``` + +- Omit overlapping labels: Alternatively, you can set `guide_axis(check.overlap = TRUE)` to omit axis labels that overlap. ggplot2 will prioritize the first, last, and middle labels. 
Note that this option might be preferable for axes representing variables that have an inherent ordering that is obvious to the audience of the plot, so that it's trivial to guess what the missing labels are. (This is not the case for the following plot.) + +```{r msleep-order-sleep-total-check-overlap} +ggplot(msleep, aes(x = order, y = sleep_total)) + + geom_boxplot() + + scale_x_discrete(guide = guide_axis(check.overlap = TRUE)) +``` + +
+ +### How can I remove axis labels in ggplot2? + +Add a `theme()` layer and set relevant arguments, e.g. `axis.title.x`, `axis.text.x`, etc. to `element_blank()`. + +
+ +See example + +Suppose we want to remove the axis labels entirely. + +```{r ref.label="msleep-order-sleep-total"} +``` + +- Remove x or y axis labels: If you want to modify just one of the axes, you can do so by modifying the components of the `theme()`, setting the elements you want to remove to `element_blank()`. You would replace `x` with `y` for applying the same update to the y-axis. Note the distinction between `axis.title` and `axis.text` -- `axis.title` is the name of the variable and `axis.text` is the text accompanying each of the ticks. + +```{r} +ggplot(msleep, aes(x = order, y = sleep_total)) + + geom_boxplot() + + theme( + axis.title.x = element_blank(), + axis.text.x = element_blank(), + axis.ticks.x = element_blank() + ) +``` + +- Remove all axis labels: You can use `theme_void()` to remove all theming elements. Note that this might remove more features than you like. For finer control over the theme, use the previous option. + +```{r} +ggplot(msleep, aes(x = order, y = sleep_total)) + + geom_boxplot() + + theme_void() +``` + +
+ +### How can I add multi-row axis labels with a grouping variable? + +You can do this by using `interaction()` to map the interaction of the variable you're plotting and the grouping variable to the `x` or `y` aesthetic. + +
+ +See example + +Suppose you have the following data on sales for each quarter across two years. + +```{r} +library(tibble) + +sales <- tribble( + ~value, ~quarter, ~year, + 10, "Q1", 2020, + 15, "Q2", 2020, + 15, "Q3", 2020, + 20, "Q4", 2020, + 10, "Q1", 2021, + 25, "Q2", 2021, + 30, "Q3", 2021, + 30, "Q4", 2021 +) +``` + +You can create a line plot of these data and facet by `year` to group the quarters for each year together. + +```{r} +ggplot(sales, aes(x = quarter, y = value, group = 1)) + + geom_line() + + facet_wrap(~year) +``` + +However it might be preferable to plot all points in a single plot and indicate on the x-axis that the first Q1 to Q4 are in 2020 and the second are in 2021. + +To achieve this, map the `interaction()` of `quarter` and `year` to the `x` aesthetic. + +```{r} +ggplot(sales, aes(x = interaction(quarter, year), y = value, group = 1)) + + geom_line() +``` + +This achieves the desired result for the line, however the labeling in the x-axis is very busy and difficult to read. +To clean this up (1) clip the plotting area with `coord_cartesian()`, (2) remove the axis labels and add a wider margin at the bottom of the plot with `theme()`, (3) place axis labels indicating quarters underneath the plot, and (4) underneath those labels, place annotation indicating years. +Note that the x-coordinates of the year labels are manually assigned here, but if you had many more years, you might write some logic to calculate their placement. 
+ +```{r} +ggplot(sales, aes(x = interaction(quarter, year), y = value, group = 1)) + + geom_line() + + coord_cartesian(ylim = c(9, 32), expand = FALSE, clip = "off") + + theme( + plot.margin = unit(c(1, 1, 3, 1), "lines"), + axis.title.x = element_blank(), + axis.text.x = element_blank() + ) + + annotate(geom = "text", x = seq_len(nrow(sales)), y = 8, label = sales$quarter, size = 3) + + annotate(geom = "text", x = c(2.5, 6.5), y = 6, label = unique(sales$year), size = 4) +``` + +This approach works with other geoms as well. +For example, you can create a bar plot representing the same data using the following. + +```{r} +ggplot(sales, aes(x = interaction(quarter, year), y = value)) + + geom_col() + + coord_cartesian(ylim = c(0, 32), expand = FALSE, clip = "off") + + annotate(geom = "text", x = seq_len(nrow(sales)), y = -1, label = sales$quarter, size = 3) + + annotate(geom = "text", x = c(2.5, 6.5), y = -3, label = unique(sales$year), size = 4) + + theme( + plot.margin = unit(c(1, 1, 3, 1), "lines"), + axis.title.x = element_blank(), + axis.text.x = element_blank() + ) +``` + +If it's undesirable to have the bars flush against the edges of the plot, a similar result can be achieved by leveraging faceting and removing the space between facets to create the appearance of a single plot. +However, note that the space between the bars for 2020 Q4 and 2021 Q1 is greater than the space between the other bars. + +```{r} +ggplot(sales, aes(x = quarter, y = value)) + + geom_col() + + facet_wrap(~year, strip.position = "bottom") + + theme( + panel.spacing = unit(0, "lines"), + strip.background = element_blank(), + strip.placement = "outside" + ) + + labs(x = NULL) +``` + +
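The year labels in the `interaction()` examples above are placed at the hand-picked x-coordinates `c(2.5, 6.5)`. With many more years, one way to calculate them might look like the following sketch. It assumes the rows of `sales` are sorted by year and then quarter, so that row `i` sits at position `i` on the discrete x-axis; the name `year_pos` is only illustrative.

```{r}
library(tibble)

sales <- tribble(
  ~value, ~quarter, ~year,
  10, "Q1", 2020,
  15, "Q2", 2020,
  15, "Q3", 2020,
  20, "Q4", 2020,
  10, "Q1", 2021,
  25, "Q2", 2021,
  30, "Q3", 2021,
  30, "Q4", 2021
)

# Average the x-axis positions (1, 2, ..., 8) within each year to get
# the midpoint of that year's quarters.
year_pos <- as.vector(tapply(seq_len(nrow(sales)), sales$year, mean))
year_pos
```

The result can then be supplied as `x = year_pos` in the `annotate()` call that draws the year labels, in place of the manually typed vector.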
+ +## Label formatting and customization + +### How can I customize the text shown on the axis labels? + +Add a `scale_*()` layer, e.g. `scale_x_continuous()`, `scale_y_discrete()`, etc., and add custom labels to the `labels` argument. + +
+ +See example + +Suppose you want to give more informative labels for the type of drive train. + +```{r} +ggplot(mpg, aes(y = drv)) + + geom_bar() +``` + +- Use the `labels` argument in the appropriate `scale_*()` function. You can find a list of these functions [here](https://ggplot2.tidyverse.org/reference/index.html#section-scales). Type of drive train (`drv`) is a discrete variable on the y-axis, so we'll adjust the labels in `scale_y_discrete()`. One option is to list the labels in the same order as the levels (`4`, `f`, `r`). Note that we start from the bottom and go up, just like we would if the variable were numeric/continuous. + +```{r} +ggplot(mpg, aes(y = drv)) + + geom_bar() + + scale_y_discrete( + labels = c("Four wheel drive", "Front wheel drive", "Rear wheel drive") + ) +``` + +- Another approach is to use a named vector. This approach not only makes the relabelling more explicit, but it also means you don't need to worry about the order of the levels. + +```{r} +ggplot(mpg, aes(y = drv)) + + geom_bar() + + scale_y_discrete( + labels = c( + "f" = "Front wheel drive", + "r" = "Rear wheel drive", + "4" = "Four wheel drive" + ) + ) +``` + +
+ +### How can I stop R from using scientific notation on axis labels? + +Use `scales::label_number()` to force decimal display of numbers. +You will first need to add a `scale_*()` layer (e.g. `scale_x_continuous()`, `scale_y_discrete()`, etc.) and customise the `labels` argument within this layer with this function. + +
+ +See example + +By default, large numbers on the axis labels in the following plot are shown in scientific notation. + +```{r} +ggplot(txhousing, aes(x = median, y = volume)) + + geom_point() +``` + +The [**scales**](https://scales.r-lib.org/) package offers a large number of functions to control the formatting of axis labels and legend keys. +Use `scales::label_number()` to force decimal display of numbers rather than using scientific notation or use `scales::label_comma()` to insert a comma every three digits. + +```{r} +library(scales) +ggplot(txhousing, aes(x = median, y = volume)) + + geom_point() + + scale_x_continuous(labels = label_number()) + + scale_y_continuous(labels = label_comma()) +``` + +
+ +### How can I change the number of decimal places on axis labels? + +Set the `accuracy` in `scales::label_number()` to the desired level of decimal places, e.g. 0.1 to show 1 decimal place, 0.0001 to show 4 decimal places, etc. +You will first need to add a `scale_*()` layer (e.g. `scale_x_continuous()`, `scale_y_discrete()`, etc.) and customise the `labels` argument within this layer with this function. + +
+ +See example + +Suppose you want to increase/decrease the number of decimal places shown in the axis text in the following plot. + +```{r} +ggplot(seals, aes(x = delta_long, y = delta_lat)) + + geom_point() +``` + +The [**scales**](https://scales.r-lib.org/) package offers a large number of functions to control the formatting of axis labels and legend keys. +Use `scales::label_number()` where the `accuracy` argument indicates the number to round to, e.g. 0.1 to show 1 decimal place, 0.0001 to show 4 decimal places, etc. + +```{r} +library(scales) +ggplot(seals, aes(x = delta_long, y = delta_lat)) + + geom_point() + + scale_x_continuous(labels = label_number(accuracy = 0.1)) + + scale_y_continuous(labels = label_number(accuracy = 0.0001)) +``` + +
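To see what a given `accuracy` does before adding it to a plot, it can help to call the labelling function directly. The following is a small sketch; the names `one_decimal` and `four_decimals` and the input values are made up for illustration.

```{r}
library(scales)

# label_number() returns a function; calling that function on a numeric
# vector returns the text that would be shown on the axis.
one_decimal <- label_number(accuracy = 0.1)
one_decimal(c(1, 2.34))

four_decimals <- label_number(accuracy = 0.0001)
four_decimals(c(1, 2.34))
```

The same labelling functions are what the `labels` argument of `scale_x_continuous()` and `scale_y_continuous()` receives in the plot above.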
+ +### How can I add percentage symbols (%) to axis labels? + +Use `scales::label_percent()`, which will place a `%` *after* the number, by default. +You can customise where `%` is placed using the `prefix` and `suffix` arguments, and also `scale` the numbers if needed. +You will first need to add a `scale_*()` layer (e.g. `scale_x_continuous()`, `scale_y_discrete()`, etc.) and customise the `labels` argument within this layer with this function. + +
+ +See example + +The variable on the y-axis of the following line plot (`psavert`) indicates the personal savings rate, which is in percentages. + +```{r} +ggplot(economics, aes(x = date, y = psavert, group = 1)) + + geom_line() +``` + +With `scales::label_percent()` you can add `%`s after the numbers shown on the axis to make the units clearer. + +```{r} +ggplot(economics, aes(x = date, y = psavert, group = 1)) + + geom_line() + + scale_y_continuous(labels = scales::label_percent(scale = 1, accuracy = 1)) +``` + +Here `scale = 1` leaves the values as they are, since `psavert` is already in percentages, and the `accuracy` argument indicates the number to round to, e.g. 1 to show whole numbers, 0.1 to show 1 decimal place, etc. + +
+ +### How can I add superscripts and subscripts to axis labels? + +You can either use `bquote()` to parse mathematical expressions or use the [**ggtext**](https://wilkelab.org/ggtext/) package to write the expression using Markdown or HTML syntax. + +
+ +See example + +In the following plot `cty` is squared and `hwy` is log transformed. + +```{r} +ggplot(mpg, aes(x = cty^2, y = log(hwy))) + + geom_point() +``` + +- Use the `bquote()` function to parse mathematical expressions. + +```{r} +ggplot(mpg, aes(x = cty^2, y = log(hwy, base = 10))) + + geom_point() + + labs( + x = bquote(cty^2), + y = bquote(paste(log[10], "(hwy)")) + ) +``` + +- If you're already familiar with Markdown and HTML, you might prefer using the [ggtext](https://wilkelab.org/ggtext/) package instead. In Markdown we can write the axis labels as `cty<sup>2</sup>` and `log<sub>10</sub>(hwy)` for x and y axes, respectively. Then, we tell ggplot2 to interpret the axis labels as Markdown and not as plain text by setting `axis.title.x` and `axis.title.y` to `ggtext::element_markdown()`. + +```{r} +ggplot(mpg, aes(x = cty^2, y = log(hwy, base = 10))) + + geom_point() + + labs( + x = "cty<sup>2</sup>", + y = "log<sub>10</sub>(hwy)" + ) + + theme( + axis.title.x = ggtext::element_markdown(), + axis.title.y = ggtext::element_markdown() + ) +``` + +
+ +## Custom breaks + +### How can I increase / decrease the number of axis ticks? + +Customise the `breaks` and `minor_breaks` in `scale_x_continuous()`, `scale_y_continuous()`, etc. + +
+ +See example + +Suppose you want to customise the major and minor grid lines on both the x and the y axes of the following plot. + +```{r} +ggplot(mpg, aes(x = cty, y = hwy)) + + geom_point() +``` + +You can set `breaks` and `minor_breaks` in `scale_x_continuous()` and `scale_y_continuous()` as desired. +For example, on the x-axis the major and minor breaks are each defined as a sequence. On the y-axis the major breaks are stated explicitly as a vector (the values are arbitrary and chosen for illustration only; they don't follow a best practice), and minor grid lines are turned off entirely by setting `minor_breaks` to `NULL`. + +```{r} +ggplot(mpg, aes(x = cty, y = hwy)) + + geom_point() + + scale_x_continuous( + breaks = seq(9, 35, 3), + minor_breaks = seq(8.5, 35.5, 1) + ) + + scale_y_continuous( + breaks = c(12, 23, 36, 41), + minor_breaks = NULL + ) +``` + +
+ +### How can I control the number of major and minor grid lines shown on the plot? + +Customise the `breaks` and `minor_breaks` in `scale_x_continuous()`, `scale_y_continuous()`, etc. +See [How can I increase / decrease the number of axis ticks?](#how-can-i-increase-decrease-the-number-of-axis-ticks-) +for more detail. + +
+ +See example + +Note that the question was about grid lines but we answered it using breaks. +This is because ggplot2 will place major grid lines at each break supplied to `breaks` and minor grid lines at each break supplied to `minor_breaks`. + +
+ +### How can I remove the space between the plot and the axis? + +Remove the padding around the data entirely by setting `expand = c(0, 0)` within the `scale_x_continuous()`, `scale_y_discrete()`, etc. layers. + +
+ +See example + +- Remove all padding: Suppose you want to remove the padding around the heat map so it's flush against the axes. + +```{r} +ggplot(faithfuld, aes(waiting, eruptions)) + + geom_raster(aes(fill = density)) +``` + +Since both x and y variables are continuous, we set `expand = c(0, 0)` in both `scale_x_continuous()` and `scale_y_continuous()`. + +```{r} +ggplot(faithfuld, aes(waiting, eruptions)) + + geom_raster(aes(fill = density)) + + scale_x_continuous(expand = c(0, 0)) + + scale_y_continuous(expand = c(0, 0)) +``` + +- Remove some of the padding: Suppose you want to remove the padding between the bars and the x-axis only. + +```{r} +ggplot(mpg, aes(drv)) + + geom_bar() +``` + +You would make this adjustment on `scale_y_continuous()` since that padding is in the vertical direction. + +```{r} +ggplot(mpg, aes(drv)) + + geom_bar() + + scale_y_continuous(expand = c(0, 0)) +``` + +However, note that this removes the padding at the bottom of the bars as well as on top. +By default, ggplot2 expands the scale by 5% on each side for continuous variables and by 0.6 units on each side for discrete variables. +To keep the default expansion on top while removing it at the bottom, you can use the following. +The `mult` argument in `expansion()` takes multiplicative range expansion factors. +Given a vector of length 2, the lower limit is expanded by `mult[1]` (in this case 0) and the upper limit is expanded by `mult[2]` (in this case 0.05). + +```{r} +ggplot(mpg, aes(drv)) + + geom_bar() + + scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +``` + +
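The arithmetic behind a multiplicative expansion can be sketched by hand. The helper `expanded_range()` below is hypothetical and not part of ggplot2; it only mirrors the description above, assuming each factor is applied to the width of the original limits.

```{r}
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): expand continuous limits
# by mult[1] times their width below and mult[2] times their width above.
expanded_range <- function(limits, mult) {
  width <- diff(limits)
  c(limits[1] - mult[1] * width, limits[2] + mult[2] * width)
}

# The bars start at 0 and the tallest bar is the largest `drv` count.
bar_limits <- c(0, max(table(mpg$drv)))

# No padding at the bottom, 5% of the range on top.
expanded_range(bar_limits, mult = c(0, 0.05))

# 5% padding on both sides, i.e. the default for continuous scales.
expanded_range(bar_limits, mult = c(0.05, 0.05))
```

With `mult = c(0, 0.05)` the lower limit stays at 0, which is why the bars sit directly on the x-axis in the last plot above.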
diff --git a/vignettes/articles/faq-bars.Rmd b/vignettes/articles/faq-bars.Rmd new file mode 100644 index 0000000000..3fbd501ce9 --- /dev/null +++ b/vignettes/articles/faq-bars.Rmd @@ -0,0 +1,374 @@ +--- +title: "FAQ: Barplots" +--- + +```{=html} + +``` +```{r, include = FALSE} +library(ggplot2) +library(dplyr) +library(tidyr) + +knitr::opts_chunk$set( + fig.dpi = 300, + collapse = TRUE, + comment = "#>", + fig.asp = 0.618, + fig.width = 6, + out.width = "80%" + ) +``` + +## Colors + +### How can I change the color of the bars in my bar plot? + +If using the same color for all bars, define the `fill` argument in `geom_bar()` (or `geom_col()`). +If assigning color based on another variable, map the variable to the `fill` `aes`thetic, and if needed, use one of the `scale_fill_*()` functions to set colors. + +
+ +See example + +You can set all bars to be a given color with the `fill` argument of `geom_bar()`. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar(fill = "blue") +``` + +Alternatively, if the colors should be based on a variable, this should happen in the `aes()` mapping. + +```{r} +ggplot(mpg, aes(x = drv, fill = drv)) + + geom_bar() +``` + +And if you want to then customize the colors, one option is `scale_fill_manual()`, which allows you to manually assign colors to each bar. +See other `scale_fill_*()` functions for more options for color choices. + +```{r} +ggplot(mpg, aes(x = drv, fill = drv)) + + geom_bar() + + scale_fill_manual(values = c("purple", "orange", "darkblue")) +``` + +
+ +## Spacing and widths + +### How can I increase the space between the bars in my bar plot? + +Set the `width` of `geom_bar()` to a small value to obtain narrower bars with more space between them. + +
+ +See example + +By default, the `width` of bars is `0.9` (90% of the resolution of the data). +You can set this argument to a lower value to get bars that are narrower with more space between them. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar(width = 0.5) + +ggplot(mpg, aes(x = drv)) + + geom_bar(width = 0.1) +``` + +
+ +### How can I remove the space between the bars and the x-axis? + +Adjust the `expand` argument in `scale_y_continuous()`, e.g. add `scale_y_continuous(expand = expansion(mult = c(0, 0.05)))` to remove the expansion on the lower end of the y-axis but keep the expansion on the upper end of the y-axis at 0.05 (the default expansion for continuous scales). + +
+ +See example + +By default ggplot2 expands the axes so the geoms aren't flush against the edges of the plot. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() +``` + +To remove the spacing between the bars and the x-axis, but keep the spacing between the bars and the top of the plot, use the following. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() + + scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +``` + +To achieve the opposite, switch the values in `mult`. +Note that the tallest bar is now flush against top of the plot. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() + + scale_y_continuous(expand = expansion(mult = c(0.05, 0))) +``` + +To adjust spacing around the x-axis, adjust the `expand` argument in `scale_x_discrete()`. +Note that this places the bars flush against the left side and leaves some space on the right side. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() + + scale_x_discrete(expand = expansion(add = c(0, 0.6))) +``` + +The default look of a bar plot can be achieved with the following. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() + + scale_x_discrete(expand = expansion(add = 0.6)) + + scale_y_continuous(expand = expansion(mult = 0.05)) +``` + +
+ +### How do I ensure that bars on a dodged bar plot have the same width? + +Set `position = position_dodge2(preserve = "single")` in `geom_bar()`. + +
+ +See example + +In the following plot the bars have differing widths within each level of `drv` as there are differing levels of `class` represented. + +```{r} +ggplot(mpg, aes(x = drv, fill = class)) + + geom_bar(position = "dodge") +``` + +You can use `position_dodge2()` with `preserve = "single"` to address this. + +```{r} +ggplot(mpg, aes(x = drv, fill = class)) + + geom_bar(position = position_dodge2(preserve = "single")) +``` + +
+ +## Stacked bar plots + +### How can I create a stacked bar plot displaying a conditional distribution where each stack is scaled to sum to 100%? + +Use `position = "fill"` in `geom_bar()` or `geom_col()`. +If you also want to show percentages on the axis, use `scales::label_percent()`. + +
+ +See example + +The following plot is useful for comparing counts but not as useful for comparing proportions, which is what you need if you want to be able to make statements like "in this sample, it's more likely to have a two-seater car that has rear-wheel drive than an SUV that has rear-wheel drive". + +```{r} +ggplot(mpg, aes(y = class, fill = drv)) + + geom_bar() +``` + +`position = "fill"` will generate a bar plot with bars of equal length and the stacks in each bar will show the proportion of `drv` for that particular `class`. + +```{r} +ggplot(mpg, aes(y = class, fill = drv)) + + geom_bar(position = "fill") +``` + +If you want to show percentages instead of proportions on the x-axis, you can define this in `scale_x_continuous()` with `scales::label_percent()`. + +```{r} +ggplot(mpg, aes(y = class, fill = drv)) + + geom_bar(position = "fill") + + scale_x_continuous(name = "percentage", labels = scales::label_percent(accuracy = 1)) +``` + +
+ +### How can I create a stacked bar plot based on data from a contingency table of two categorical variables? + +First reshape the data (e.g. with `tidyr::pivot_longer()`) so that there is one row per combination of the levels of the categorical variables, then use `geom_col()` to draw the bars. + +
+ +See example + +Suppose you have the following data from an opinion poll, where the numbers in the cells represent the number of responses for each party/opinion combination. + +```{r} +poll <- tribble( + ~party, ~agree, ~disagree, ~no_opinion, + "Democrat", 20, 30, 20, + "Republican", 15, 20, 10, + "Independent", 10, 5, 0 +) +``` + +You can first pivot the data longer to obtain a data frame with one row per party/opinion combination and a new column, `n`, for the number of responses that fall into that category. + +```{r} +poll_longer <- poll %>% + pivot_longer( + cols = -party, + names_to = "opinion", + values_to = "n" + ) + +poll_longer +``` + +Then, you can pass this result to `ggplot()` and create a bar for each `party` on the `y` (or `x`, if you prefer vertical bars) axis and fill the bars in with number of responses for each `opinion`. + +```{r} +ggplot(poll_longer, aes(y = party, fill = opinion, x = n)) + + geom_col() +``` + +To plot proportions (relative frequencies) instead of counts, use `position = "fill"` in `geom_col()`. + +```{r} +ggplot(poll_longer, aes(y = party, fill = opinion, x = n)) + + geom_col(position = "fill") + + xlab("proportion") +``` + +
+ +### How can I make a grouped bar plot? + +Map the variable you want to group by to the `x` or `y` `aes`thetic, map the variable you want to color the bars by to the `fill` aesthetic, and set `position = "dodge"` in `geom_bar()`. + +
+ +See example + +Suppose you have data from a survey with three questions, where respondents select "Agree" or "Disagree" for each question. + +```{r} +survey <- tibble::tribble( + ~respondent, ~q1, ~q2, ~q3, + 1, "Agree", "Agree", "Disagree", + 2, "Disagree", "Agree", "Disagree", + 3, "Agree", "Agree", "Disagree", + 4, "Disagree", "Disagree", "Agree" +) +``` + +You'll first want to reshape these data so that each row represents a respondent / question pair. +You can do this with `tidyr::pivot_longer()`. +Then, pass the resulting longer data frame to `ggplot()` and group responses for each question together. + +```{r} +survey %>% + tidyr::pivot_longer( + cols = -respondent, + names_to = "question", + values_to = "response" + ) %>% + ggplot(aes(x = question, fill = response)) + + geom_bar(position = "dodge") +``` + +
+ +### How can I make a bar plot of group means? + +Either calculate the group means first and use `geom_col()` to draw the bars or let ggplot2 calculate the means with `stat_summary()` with `fun = "mean"` and `geom = "bar"`. + +
+ +See example + +One option for calculating group means is using `dplyr::group_by()` followed by `dplyr::summarise()`. +Then, you can pass the resulting data frame to `ggplot()` and plot bars using `geom_col()`. + +```{r} +mpg %>% + group_by(drv) %>% + summarise(mean_hwy = mean(hwy)) %>% + ggplot(aes(x = drv, y = mean_hwy)) + + geom_col() +``` + +Alternatively, you can use `stat_summary()` to let ggplot2 calculate and plot the means. + +```{r} +ggplot(mpg, aes(x = drv, y = hwy)) + + stat_summary(fun = "mean", geom = "bar") +``` + +
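If you want to convince yourself that the two approaches draw the same bars, one option is to compare the hand-computed means with the values ggplot2 computes internally. The sketch below assumes that `layer_data()` returns the summarised `y` values in the order of the levels of `drv`, and it uses base R's `tapply()` in place of the dplyr pipeline shown above.

```r
library(ggplot2)

# Means computed by hand with base R, one per level of drv
manual_means <- tapply(mpg$hwy, mpg$drv, mean)

# Means computed by ggplot2 inside stat_summary()
p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  stat_summary(fun = "mean", geom = "bar")
built_means <- layer_data(p)$y

# Compare the two sets of means
all.equal(unname(as.numeric(manual_means)), built_means)
```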
+ +## Axes and axis limits + +### Why do the bars on my plot disappear when I specify an axis range with `ylim()`? How can I get the bars to show up within a given axis range? + +`ylim()` is a shortcut for supplying the `limits` argument to individual scales. +When either of these is set, any values outside the limits specified are replaced with `NA`. +Since the bars naturally start at `y = 0`, replacing part of the bars with `NA`s results in the bars entirely disappearing from the plot. +For changing axis limits without dropping data observations, set limits in `coord_cartesian()` instead. +Also note that this will result in a deceiving bar plot, which should be avoided in general. + +
+ +See example + +In the following plot the y-axis is limited to 20 to 120, and hence the bars are not showing up. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() + + ylim(c(20, 120)) +``` + +In order to obtain a bar plot with limited y-axis, you need to instead set the limits in `coord_cartesian()`. + +```{r} +ggplot(mpg, aes(x = drv)) + + geom_bar() + + coord_cartesian(ylim = c(20,110)) +``` + +This is, indeed, a deceiving plot. +If you're using a bar plot to display values that could not take the value of 0, you might choose a different geom instead. +For example, suppose you have the following data and plot. + +```{r} +df <- tibble::tribble( + ~x, ~y, + "A", 1050, + "B", 1100, + "C", 1150 +) + +ggplot(df, aes(x = x, y = y)) + + geom_col() +``` + +If you want to cut off the bars at `y = 1000` because you know that the variable you're plotting cannot take a value less than 1000, you might use `geom_point()` instead. + +```{r} +# don't do this +ggplot(df, aes(x = x, y = y)) + + geom_col() + + coord_cartesian(ylim = c(1000, 1150)) + +# do this +ggplot(df, aes(x = x, y = y)) + + geom_point(size = 3) +``` + +
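To see why the bars vanish, it can help to look at the out-of-bounds handling on its own. The sketch below assumes the default behaviour of continuous scales, which is to censor values outside the limits, and applies `scales::oob_censor()` to the two edges of a single bar. The bar height of 106 is only an illustrative value.

```r
library(scales)

# Lower and upper edge of one bar; 106 is an illustrative count
bar_edges <- c(ymin = 0, ymax = 106)

# Values outside the limits become NA, values inside are kept as they are
censored <- oob_censor(bar_edges, range = c(20, 120))
censored
```

The top of the bar survives, but the bottom edge at 0 is replaced with `NA`, and a rectangle with a missing edge cannot be drawn. `coord_cartesian()` does not go through this step, since it only zooms the view.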
diff --git a/vignettes/articles/faq-customising.Rmd b/vignettes/articles/faq-customising.Rmd new file mode 100644 index 0000000000..a8780e3ad7 --- /dev/null +++ b/vignettes/articles/faq-customising.Rmd @@ -0,0 +1,450 @@ +--- +title: "FAQ: Customising" +--- + +```{=html} + +``` +```{r, include = FALSE} +library(ggplot2) +library(tibble) +knitr::opts_chunk$set( + fig.dpi = 300, + collapse = TRUE, + comment = "#>", + fig.asp = 0.618, + fig.width = 6, + out.width = "80%" + ) +``` + +## Legends + +### How can I change the legend title? + +Change the label for the aesthetic the legend is drawn for in `labs()`. + +
+ +See example + +By default your legend label will be the name of the variable that is mapped to the aesthetic the legend is drawn for. +You can change the title of your legend using `labs()`. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = drv)) + + geom_point() + + labs(color = "Drive train") +``` + +If a legend is drawn for multiple aesthetics, you'll want to update the title for all of them. + +```{r} +# not this +ggplot(mpg, aes(x = hwy, y = cty, color = drv, shape = drv)) + + geom_point() + + labs(color = "Drive train") + +# but this +ggplot(mpg, aes(x = hwy, y = cty, color = drv, shape = drv)) + + geom_point() + + labs(color = "Drive train", shape = "Drive train") +``` + +
+ +### How can I increase the spacing between legend keys? + +Increase the horizontal space between legend keys with `legend.spacing.x` in `theme()`. +This argument takes a unit object created with `grid::unit()`. + +
+ +See example + +If you have a horizontal legend, generally placed on top or bottom of the plot with `legend.position = "top"` or `"bottom"`, you can change the spacing between legend keys with `legend.spacing.x`. +You can supply a unit object to this argument, e.g. `unit(1.0, "cm")` for 1 cm space between legend keys. +See the documentation for `grid::unit()` for more options for units. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = drv)) + + geom_point() + + theme( + legend.position = "bottom", + legend.spacing.x = unit(1.0, "cm") + ) +``` + +For vertical legends changing `legend.spacing.y` changes the space between the legend title and the keys, but not between the keys, e.g. see the large space between the legend title and keys. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = drv)) + + geom_point() + + theme(legend.spacing.y = unit(3.0, "cm")) +``` + +In order to change the space between the legend keys, you can first make the key size bigger with `legend.key.size` and then remove the grey background color with `legend.key`. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = drv)) + + geom_point() + + theme( + legend.key.size = unit(1.5, "cm"), + legend.key = element_rect(color = NA, fill = NA) + ) +``` + +Note that the legend title is no longer aligned with the keys with this approach. +You can also shift it over with `legend.title.align`. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = drv)) + + geom_point() + + theme( + legend.key.size = unit(1.5, "cm"), + legend.key = element_rect(color = NA, fill = NA), + legend.title.align = 0.5 + ) +``` + +
+ +### How can I change the key labels in the legend? + +If you don't want to change the levels of the variable the legend is being drawn for, you can change the key labels at the time of drawing the plot using the `labels` argument in the appropriate `scale_*()` function, e.g. `scale_colour_discrete()` if the legend is for a discrete variable mapped to the colour aesthetic. + +
+ +See example + +The `labels` argument of `scale_*` functions takes named vectors, which is what we would recommend using for relabeling keys in a legend. +Using named vectors allows you to declare explicitly which label is assigned to which level, without having to keep track of level order. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = drv)) + + geom_point() + + scale_color_discrete( + labels = c("4" = "4-wheel drive", + "f" = "Front-wheel drive", + "r" = "Rear-wheel drive") + ) +``` + +
+ +### How can I change the font sizes in the legend? + +Set your preference in `legend.text` for key labels and `legend.title` in `theme()`. +In both cases, set font size in the `size` argument of `element_text()`, e.g. `legend.text = element_text(size = 14)`. + +
+ +See example + +Font characteristics of a legend can be controlled with the `legend.text` and `legend.title` elements of `theme()`. +You can use the following for 14 pts text for legend key labels and 10 pts text for legend title. +(Note that this doesn't result in a visually pleasing legend; by default, ggplot2 uses a larger font size for the legend title than the legend text.) + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = class)) + + geom_point() + + theme( + legend.text = element_text(size = 14), + legend.title = element_text(size = 10) + ) +``` + +For further customization of legend text, see the documentation for `element_text()`, e.g. you can change font colors or font face as well. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = class)) + + geom_point() + + theme( + legend.text = element_text(size = 14, color = "red"), + legend.title = element_text(size = 10, face = "bold.italic") + ) +``` + +
+ +## Colours + +### How can I change the background colour of a plot? + +Set the color in the `panel.background` element of `theme()` with `element_rect()`, which takes arguments like `fill` (for background fill color) and `colour` (for background border color). + +
+ +See example + +You can set the background colour of the plot with `panel.background` in `theme()`. +In the following example the border is made thicker with `size = 3`. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + theme(panel.background = element_rect(fill = "lightblue", colour = "red", size = 3)) +``` + +If you want to change the colour of the plot area around the panel, but not the panel itself, you can do the same thing with `plot.background`. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + theme(plot.background = element_rect(fill = "lightblue", colour = "red", size = 3)) +``` + +Note that ggplot2 has a variety of [complete themes](https://ggplot2.tidyverse.org/reference/ggtheme.html) that might already do what you're hoping to accomplish. +For example, if you prefer a more minimal look to your plots, without the grey background, you might try `theme_minimal()`. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + theme_minimal() +``` + +And you can continue customization based on one of these themes. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + theme_minimal() + + theme(plot.background = element_rect(colour = "red", size = 3)) +``` + +You might also find the [**thematic**](https://rstudio.github.io/thematic/) package useful for simplified theming of your plots. + +
+ +### How can I change the colour NAs are represented with in a plot? + +You can set the color of `NA` with the `na.value` argument in the appropriate `scale_*()` function, e.g. `scale_fill_discrete(na.value = "purple")` to make `NA`s purple. + +
+ +See example + +Suppose you have the following data frame with two discrete variables, one of which has an `NA`. + +```{r} +df <- tibble::tribble( + ~group, ~outcome, + 1, "yes", + 1, "no", + 2, "yes", + 2, "no", + 2, "no", + 2, NA +) +``` + +By default, ggplot2 uses grey to represent `NA`s. + +```{r} +ggplot(df, aes(x = group, fill = outcome)) + + geom_bar() +``` + +You can change the color of `NA` with `scale_fill_discrete()` in this case, e.g. make it purple. + +```{r} +ggplot(df, aes(x = group, fill = outcome)) + + geom_bar() + + scale_fill_discrete(na.value = "purple") +``` + +You can also set the color to `"transparent"`. +In the plot below this is shown with `theme_minimal()` to demonstrate how that looks on a plot with a transparent background. +Note that while this is possible, setting the colour to transparent as such wouldn't be recommended in this particular case as it gives the appearance of a floating bar. + +```{r} +ggplot(df, aes(x = group, fill = outcome)) + + geom_bar() + + scale_fill_discrete(na.value = "transparent") + + theme_minimal() +``` + +
+ +## Fonts + +### How can I change the default font size in ggplot2? + +Set `base_size` in the theme you're using, which is `theme_gray()` by default. + +
+ +See example + +The base font size is 11 pts by default. +You can change it with the `base_size` argument in the theme you're using. +See the [complete theme documentation](https://ggplot2.tidyverse.org/reference/ggtheme.html) for more high level options you can set. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty, color = class)) + + geom_point() + + theme_gray(base_size = 18) +``` + +If you would like all plots within a session/document to use a particular base size, you can set it with `theme_set()`. +Run the following at the beginning of your session or include on top of your R Markdown document. + +```{r eval = FALSE} +theme_set(theme_gray(base_size = 18)) +``` + +
+ +### How can I change the font size of the plot title and subtitle? + +Set your preference in `plot.title` and `plot.subtitle` in `theme()`. +In both cases, set font size in the `size` argument of `element_text()`, e.g. `plot.title = element_text(size = 20)`. + +
+ +See example + +Font characteristics of plot titles and subtitles can be controlled with the `plot.title` and `plot.subtitle` elements of `theme()`. +You can use the following for 20 pts text for the plot title and 15 pts text for the plot subtitle. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + labs( + title = "This is the plot title", + subtitle = "And this is the subtitle" + ) + + theme( + plot.title = element_text(size = 20), + plot.subtitle = element_text(size = 15) + ) +``` + +For further customization of plot title and subtitle, see the documentation for `element_text()`, e.g. you can change font colors or font face as well. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + labs( + title = "This is the plot title", + subtitle = "And this is the subtitle" + ) + + theme( + plot.title = element_text(size = 20, color = "red"), + plot.subtitle = element_text(size = 15, face = "bold.italic") + ) +``` + +
+ +### How can I change the font size of axis labels? + +Set your preference in `axis.title`, `axis.title.x`, or `axis.title.y` in `theme()`. +In all cases, set font size in the `size` argument of `element_text()`, e.g. `axis.title = element_text(size = 14)`. + +
+ +See example + +Font characteristics of axis labels can be controlled with `axis.title.x` or `axis.title.y` (or `axis.title` if you want the same settings for both axes). + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + labs( + x = "This is HUGE", + y = "This is small" + ) + + theme( + axis.title.x = element_text(size = 20), + axis.title.y = element_text(size = 10) + ) +``` + +For further customization of axis labels, see the documentation for `element_text()`, e.g. you can change font colors or font face as well. + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + labs( + x = "This is HUGE", + y = "This is tiny" + ) + + theme( + axis.title.x = element_text(size = 20, color = "red"), + axis.title.y = element_text(size = 10, face = "bold.italic") + ) +``` + +You can also change the size of the axis text (e.g. numbers at the axis ticks) using `axis.text` (or `axis.text.x` and `axis.text.y` if you want to set different sizes). + +```{r} +ggplot(mpg, aes(x = hwy, y = cty)) + + geom_point() + + labs( + x = "The axis labels are the same size", + y = "The axis labels are the same size" + ) + + theme( + axis.title = element_text(size = 16), + axis.text = element_text(size = 20, color = "blue") + ) +``` + +
+ +### What is the default size of `geom_text()` and how can I change the font size of `geom_text()`? + +The default font size of `geom_text()` is 3.88. + +```{r} +GeomText$default_aes$size +``` + +You can change the size using the `size` argument in `geom_text()` for a single plot. If you want to use the same updated size across plots, you can set this with `update_geom_defaults()`, e.g. `update_geom_defaults("text", list(size = 6))`. + +
+ +See example + +Suppose you have the following data frame and visualization. + +```{r} +df <- tibble::tribble( + ~x, ~y, ~name, + 2, 2, "two", + 3, 3, "three", + 4, 4, "four" +) + +ggplot(df, aes(x = x, y = y, label = name)) + + geom_text() +``` + +You can set the size of the text with the following. + +```{r} +ggplot(df, aes(x = x, y = y, label = name)) + + geom_text(size = 6) +``` + +Or you can map it to the `size` `aes`thetic. In the following, size is determined by the `x` value with `scale_size_identity()`. + +```{r} +ggplot(df, aes(x = x, y = y, label = name)) + + geom_text(aes(size = x)) + + scale_size_identity() +``` + +If you want to use the same updated size for `geom_text()` in a series of plots in a session/R Markdown document, you can use `update_geom_defaults()` to update the default size, e.g. if you want the size for all `geom_text()` to be 6, use `update_geom_defaults("text", list(size = 6))`. + +
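The value 3.88 may look odd next to theme font sizes such as 11 or 14. That is because the `size` of text geoms is measured in millimetres, while theme elements use points. A small sketch of the conversion, using the `.pt` constant exported by ggplot2 (the 14 pt target below is just an example value):

```r
library(ggplot2)

# Default geom_text() size in mm, as quoted above
default_mm <- 3.88

# Convert mm to points: roughly the 11 pt default base font size
default_pt <- default_mm * .pt
default_pt

# Convert the other way: the size to pass to geom_text() for 14 pt text
size_for_14pt <- 14 / .pt
size_for_14pt
```

You could then write `geom_text(size = 14 / .pt)` to roughly match text that a theme draws at 14 pt.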
diff --git a/vignettes/articles/faq-faceting.Rmd b/vignettes/articles/faq-faceting.Rmd new file mode 100644 index 0000000000..d84d1da19a --- /dev/null +++ b/vignettes/articles/faq-faceting.Rmd @@ -0,0 +1,268 @@ +--- +title: "FAQ: Faceting" +--- + +```{=html} + +``` +```{r, include = FALSE} +library(ggplot2) +knitr::opts_chunk$set( + fig.dpi = 300, + collapse = TRUE, + comment = "#>", + fig.asp = 0.618, + fig.width = 6, + out.width = "80%") +``` + +## Panes + +### What is the difference between `facet_wrap()` and `facet_grid()`? + +The simplest answer is that you should use `facet_wrap()` when faceting by a single variable and `facet_grid()` when faceting by two variables to create a grid of panes. + +
+ +See example + +`facet_wrap()` is most commonly used to facet a plot by a single categorical variable. + +```{r} +ggplot(mpg, aes(x = cty)) + + geom_histogram() + + facet_wrap(~ drv) +``` + +And `facet_grid()` is commonly used to facet a plot by two categorical variables. + +```{r} +ggplot(mpg, aes(x = cty)) + + geom_histogram() + + facet_grid(cyl ~ drv) +``` + +Notice that this results in some empty panes (e.g. 4-wheel drive and 5 cylinders) as there are no cars in the `mpg` dataset that fall into such categories. + +You can also use `facet_wrap()` to facet by two categorical variables. +This will only create facets for combinations of the levels of variables for which data exist. + +```{r} +ggplot(mpg, aes(x = cty)) + + geom_histogram() + + facet_wrap(cyl ~ drv) +``` + +In `facet_wrap()` you can control the number of rows and/or columns of the resulting plot layout using the `nrow` and `ncol` arguments, respectively. +In `facet_grid()` these values are determined by the number of levels of the variables you're faceting by. + +Similarly, you can also use `facet_grid()` to facet by a single categorical variable. +In the formula notation, you use a `.` to indicate that no faceting should be done along that axis, i.e. `cyl ~ .` facets across the y-axis (within a column) while `. ~ cyl` facets across the x-axis (within a row). + +```{r out.width = "50%"} +ggplot(mpg, aes(x = cty)) + + geom_histogram() + + facet_grid(cyl ~ .) + +ggplot(mpg, aes(x = cty)) + + geom_histogram() + + facet_grid(. ~ cyl) +``` + +
+ +### How can I place vertical lines (`geom_vline()`) in each pane of a faceted plot? + +First, calculate where the lines should be placed and save this information in a separate data frame. +Then, add a `geom_vline()` layer to your plot that uses the summarized data. + +
+ +See example + +Suppose you have the following plot, and you want to add a vertical line at the mean value of `hwy` (highway mileage) for each pane. + +```{r} +ggplot(mpg, aes(x = hwy)) + + geom_histogram(binwidth = 5) + + facet_wrap(~ drv) +``` + +First, calculate these means and save them in a new data frame. + +```{r} +library(dplyr) + +mpg_summary <- mpg %>% + group_by(drv) %>% + summarise(hwy_mean = mean(hwy)) + +mpg_summary +``` + +Then, add a `geom_vline()` layer to your plot that uses the summary data. + +```{r} +ggplot(mpg, aes(x = hwy)) + + geom_histogram(binwidth = 5) + + facet_wrap(~ drv) + + geom_vline(data = mpg_summary, aes(xintercept = hwy_mean)) +``` + +
+ +## Axes + +### How can I set individual axis limits for facets? + +Either let ggplot2 determine custom axis limits for the facets based on the range of the data you're plotting using the `scales` argument in `facet_wrap()` or `facet_grid()` or, if that is not sufficient, use `expand_limits()` to ensure limits include a single value or a range of values. + +
+ +See example + +Suppose you have the following faceted plot. +By default, both x and y scales are shared across the facets. + +```{r} +ggplot(mpg, aes(x = cty, y = hwy)) + + geom_point() + + facet_grid(cyl ~ drv) +``` + +You can control this behaviour with the `scales` argument of faceting functions: varying scales across rows (`"free_x"`), columns (`"free_y"`), or both rows and columns (`"free"`), e.g. + +```{r} +ggplot(mpg, aes(x = cty, y = hwy)) + + geom_point() + + facet_grid(cyl ~ drv, scales = "free") +``` + +If you also want to make sure that a particular value or range is included in each of the facets, you can set this with `expand_limits()`, e.g. ensure that 10 is included in the x-axis and values between 20 to 25 are included in the y-axis: + +```{r} +ggplot(mpg, aes(x = cty, y = hwy)) + + geom_point() + + facet_grid(cyl ~ drv, scales = "free") + + expand_limits(x = 10, y = c(20, 25)) +``` + +
+ +## Facet labels + +### How can I remove the facet labels entirely? + +Set the `strip.text` element in `theme()` to `element_blank()`. + +
+ +See example + +Setting `strip.text` to `element_blank()` will remove all facet labels. + +```{r} +ggplot(mpg, aes(x = cty, y = hwy)) + + geom_point() + + facet_grid(cyl ~ drv) + + theme(strip.text = element_blank()) +``` + +You can also remove the labels across rows only with `strip.text.x` or across columns only with `strip.text.y`. + +```{r} +ggplot(mpg, aes(x = cty, y = hwy)) + + geom_point() + + facet_grid(cyl ~ drv) + + theme(strip.text.x = element_blank()) +``` + +
+ +### The facet labels in my plot are too long so they get cut off. How can I wrap facet label text so that long labels are spread across two rows? + +Use `label_wrap_gen()` in the `labeller` argument of your faceting function and set a `width` (number of characters) for the maximum number of characters before wrapping the strip. + +
+ +See example + +In the data frame below we have 100 observations, 50 of them come from one group and 50 from another. +These groups have very long names, and so when you facet the plot by group, the facet labels (strips) get cut off. + +```{r} +df <- data.frame( + x = rnorm(100), + group = c(rep("A long group name for the first group", 50), + rep("A muuuuuuuuuuuuuch longer group name for the second group", 50)) +) + +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = 0.5) + + facet_wrap(~ group) +``` + +You can control the maximum width of the facet label by setting the `width` in the `label_wrap_gen()` function, which is then passed to the `labeller` argument of your faceting function. + +```{r} +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = 0.5) + + facet_wrap(~ group, labeller = labeller(group = label_wrap_gen(width = 25))) +``` + +
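To get a feel for where a given `width` will break a label before plotting, you can try the wrapping on a single string. `label_wrap_gen()` is documented as using `base::strwrap()` for line wrapping, so the following sketch applies `strwrap()` directly to one of the group names above. The variable names are illustrative only.

```r
# One of the long facet labels from the example above
long_label <- "A muuuuuuuuuuuuuch longer group name for the second group"

# Break the label into pieces shorter than 25 characters, at word boundaries
wrapped <- strwrap(long_label, width = 25)
wrapped

# Joined with newlines, this approximates the text shown in the strip
cat(paste(wrapped, collapse = "\n"))
```

Trying a few widths this way is a quick way to pick a value that splits the label over two or three lines instead of many short ones.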
+ +### How can I set different axis labels for facets? + +Use `as_labeller()` in the `labeller` argument of your faceting function and then set `strip.background` and `strip.placement` elements in the `theme()` to place the facet labels where axis labels would go. +This is a particularly useful solution for plotting data on different scales without the use of double y-axes. + +
+ +See example + +Suppose you have price data on a given item over a few years from two countries with very different currency scales. + +```{r} +df <- data.frame( + year = rep(2016:2021, 2), + price = c(10, 10, 13, 12, 14, 15, 1000, 1010, 1200, 1050, 1105, 1300), + country = c(rep("US", 6), rep("Japan", 6)) +) + +df +``` + +You can plot `price` versus `year` and facet by `country`, but the resulting plot can be a bit difficult to read due to the shared y-axis label. + +```{r warning = FALSE} +ggplot(df, aes(x = year, y = price)) + + geom_smooth() + + facet_wrap(~ country, ncol = 1, scales = "free_y") + + scale_x_continuous(breaks = 2016:2021) +``` + +With the following you can customize the facet labels first with `as_labeller()`, turn off the default y-axis label, and then place the facet labels where the y-axis label goes (`"outside"` and on the `"left"`). + +```{r} +ggplot(df, aes(x = year, y = price)) + + geom_smooth() + + facet_wrap(~ country, ncol = 1, scales = "free_y", + labeller = as_labeller( + c(US = "US Dollars (USD)", Japan = "Japanese Yen (JPY)")), + strip.position = "left" + ) + + scale_x_continuous(breaks = 2016:2021) + + labs(y = NULL) + + theme(strip.background = element_blank(), strip.placement = "outside") +``` + +
diff --git a/vignettes/articles/faq-reordering.Rmd b/vignettes/articles/faq-reordering.Rmd new file mode 100644 index 0000000000..fc2a9a4a38 --- /dev/null +++ b/vignettes/articles/faq-reordering.Rmd @@ -0,0 +1,220 @@ +--- +title: "FAQ: Reordering" +--- + +```{=html} + +``` +```{r, include = FALSE} +library(ggplot2) +library(dplyr) +library(tibble) + +knitr::opts_chunk$set( + fig.dpi = 300, + collapse = TRUE, + comment = "#>", + fig.asp = 0.618, + fig.width = 6, + out.width = "80%" + ) +``` + +## Bar plots + +### How can I reorder the bars in a bar plot by their value? + +Change the order of the levels of the factor variable you're creating the bar plot for in the `aes`thetic `mapping`. +The forcats package offers a variety of options for doing this, such as `forcats::fct_infreq()` for ordering by the number of observations within each level. + +
+ +See example + +The following bar plot shows the number of cars that fall into each `class` category. +Classes are ordered alphabetically. +You might prefer them to be ordered by the number of cars in each class. + +```{r} +ggplot(mpg, aes(y = class)) + + geom_bar() +``` + +To do this, you can use `forcats::fct_infreq()`. + +```{r} +ggplot(mpg, aes(y = forcats::fct_infreq(class))) + + geom_bar() +``` + +If you'd like to plot the highest value first, you can also reverse the order with `forcats::fct_rev()`. You might also want to simplify the axis label. + +```{r} +ggplot(mpg, aes(y = forcats::fct_rev(forcats::fct_infreq(class)))) + + geom_bar() + + labs(y = "class") +``` + +
+ +### How can I reorder the stacks in a stacked bar plot? + +Change the order of the levels of the factor variable you're creating the stacks with in the `aes`thetic `mapping`. +The forcats package offers a variety of options for doing this, such as `forcats::fct_reorder()` to reorder the levels or `forcats::fct_rev()` to reverse their order. + +
+ +See example + +Suppose you have the following stacked bar plot of `clarity` of `diamonds` by their `cut`. + +```{r} +ggplot(diamonds, aes(x = cut, fill = clarity)) + + geom_bar() +``` + +You can reverse the order in which `clarity` levels are displayed in the bars with `forcats::fct_rev()`. +This will also change the order they're presented in the legend so the two orders match. + +```{r} +ggplot(diamonds, aes(x = cut, fill = forcats::fct_rev(clarity))) + + geom_bar() + + labs(fill = "clarity") +``` + +
+ +## Box plots + +### How can I control the order of boxes in a side-by-side box plot? + +Change the order of the levels of the factor variable you're grouping by. +The forcats package offers a variety of options for doing this, such as `forcats::fct_relevel()` for manual reordering or `forcats::fct_reorder()` for ordering by a particular value, e.g. group median. + +
+ +See example + +The order of the boxes is determined by the order of the levels of the variable you're grouping by. +If the grouping variable is character, this order is alphabetical by default. + +```{r} +ggplot(mpg, aes(x = class, y = hwy)) + + geom_boxplot() +``` + +Suppose you'd like the boxes to be ordered in ascending order of their medians. +You can do this in a data transformation step prior to plotting (e.g. with `dplyr::mutate()`) or you can do it directly in the plotting code as shown below. +You might then want to customize the x-axis label as well. + +```{r} +ggplot(mpg, aes(x = forcats::fct_reorder(class, hwy, .fun = median), y = hwy)) + + geom_boxplot() + + labs(x = "class") +``` + +
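If you would like to check the resulting level order before plotting, a quick sketch is to build the reordered factor separately and inspect it. The example below swaps in base R's `reorder()` for `forcats::fct_reorder()` so that it runs without forcats installed. For this purpose the two are expected to give the same ascending-by-median order, but treat that equivalence as an assumption.

```r
library(ggplot2)  # only needed here for the mpg data set

# Reorder the levels of class by the median hwy within each class
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)

# Levels now run from the lowest to the highest median
levels(class_by_median)

# Medians listed in level order, to confirm they are non-decreasing
medians_in_order <- tapply(mpg$hwy, class_by_median, median)
medians_in_order
```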
+ +## Facets + +### How can I control the order of panes created with `facet_wrap()` or `facet_grid()`? + +Change the order of the levels of the factor variable you're faceting by. +The forcats package offers a variety of options for doing this, such as `forcats::fct_relevel()`. + +
+ +See example + +The order of the panes is determined by the order of the levels of the variable you're faceting by. +If the faceting variable is character, this order is alphabetical by default. + +```{r} +ggplot(mpg, aes(x = displ, y = hwy)) + + geom_point() + + facet_wrap(~drv) +``` + +Suppose you'd like the panes to be in the order `"r"`, `"f"` , `"4"`. +You can use `forcats::fct_relevel()` to reorder the levels of `drv`. +You can do this in a data transformation step prior to plotting (e.g. with `dplyr::mutate()`) or you can do it directly in the plotting code as shown below. + +```{r} +ggplot(mpg, aes(x = displ, y = hwy)) + + geom_point() + + facet_wrap(~forcats::fct_relevel(drv, "r", "f", "4")) +``` + +
+ +## Overplotting + +### How can I control the order of the points plotted? + +If there is a specific point (or group of points) you want to make sure is plotted on top of others, subset the data for those observations and add as a new layer to your plot. + +
+ +See example + +Suppose you have the following data frame. + +```{r} +df <- tibble::tribble( + ~id, ~x, ~y, ~shape, ~fill, + 1, 0.01, 0, "circle filled", "blue", + 2, 1, 0, "square filled", "red", + 3, 0.99, 0, "asterisk", "black", + 4, 0, 0, "triangle filled", "yellow" +) +``` + +By default, this is how a scatterplot of these looks. +Note that the blue circle is partially covered by the yellow triangle since that observation comes later in the dataset. +Similarly the black asterisk appears on top of the red square. + +```{r} +ggplot(df, aes(x = x, y = y, fill = fill, shape = shape)) + + geom_point(size = 8) + + scale_shape_identity() + + scale_fill_identity() +``` + +Suppose you arranged your data in ascending order of the x-coordinates and plotted again. +Now the blue circle is over the yellow triangle since 0.01 comes after 0 and similarly the red square is over the black asterisk since 1 comes after 0.99. + +```{r} +df_arranged <- df %>% dplyr::arrange(x) + +df_arranged %>% + ggplot(aes(x = x, y = y, fill = fill, shape = shape)) + + geom_point(size = 8) + + scale_shape_identity() + + scale_fill_identity() +``` + +If you wanted to make sure that the observation identified with an asterisk is always plotted on top, regardless of how the data are arranged in the data frame, you can create an additional layer for that observation. + +```{r} +ggplot(mapping = aes(x = x, y = y, fill = fill, shape = shape)) + + geom_point(data = df %>% filter(shape != "asterisk"), size = 8) + + geom_point(data = df %>% filter(shape == "asterisk"), size = 8) + + scale_shape_identity() + + scale_fill_identity() + +ggplot(mapping = aes(x = x, y = y, fill = fill, shape = shape)) + + geom_point(data = df_arranged %>% filter(shape != "asterisk"), size = 8) + + geom_point(data = df_arranged %>% filter(shape == "asterisk"), size = 8) + + scale_shape_identity() + + scale_fill_identity() +``` + +
diff --git a/vignettes/extending-ggplot2.Rmd b/vignettes/extending-ggplot2.Rmd index 90583f2451..6bb38183c0 100644 --- a/vignettes/extending-ggplot2.Rmd +++ b/vignettes/extending-ggplot2.Rmd @@ -1,6 +1,8 @@ --- title: "Extending ggplot2" output: rmarkdown::html_vignette +description: | + Official extension mechanism provided in ggplot2. vignette: > %\VignetteIndexEntry{Extending ggplot2} %\VignetteEngine{knitr::rmarkdown} diff --git a/vignettes/ggplot2-in-packages.Rmd b/vignettes/ggplot2-in-packages.Rmd index 20a17adde9..5d10efb7cd 100644 --- a/vignettes/ggplot2-in-packages.Rmd +++ b/vignettes/ggplot2-in-packages.Rmd @@ -1,6 +1,8 @@ --- title: "Using ggplot2 in packages" output: rmarkdown::html_vignette +description: | + Customising how aesthetic specifications are represented on your plot. vignette: > %\VignetteIndexEntry{Using ggplot2 in packages} %\VignetteEngine{knitr::rmarkdown} diff --git a/vignettes/ggplot2-specs.Rmd b/vignettes/ggplot2-specs.Rmd index e65a6d61c2..fa6d235d01 100644 --- a/vignettes/ggplot2-specs.Rmd +++ b/vignettes/ggplot2-specs.Rmd @@ -1,6 +1,8 @@ --- title: "Aesthetic specifications" output: rmarkdown::html_vignette +description: | + Customising how aesthetic specifications are represented on your plot. vignette: > %\VignetteIndexEntry{Aesthetic specifications} %\VignetteEngine{knitr::rmarkdown} diff --git a/vignettes/profiling.Rmd b/vignettes/profiling.Rmd index 57e085e8a5..a0a77340df 100644 --- a/vignettes/profiling.Rmd +++ b/vignettes/profiling.Rmd @@ -2,6 +2,8 @@ title: "Profiling Performance" author: "Thomas Lin Pedersen" output: rmarkdown::html_vignette +description: | + Monitoring the performance of your plots. vignette: > %\VignetteIndexEntry{Profiling Performance} %\VignetteEngine{knitr::rmarkdown}