Feature Request: pillar.round = FALSE #97

Closed
charliejhadley opened this issue Jan 31, 2018 · 22 comments · Fixed by #359

@charliejhadley

charliejhadley commented Jan 31, 2018

It would be beneficial to have an option pillar.round to control the rounding of numbers that straddle 0 - 1.

In the following example the straddle column has been subject to rounding as pillar.sigfig = 3 by default:

library("tidyverse")
#> ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.4
#> ✔ tidyr   0.8.0     ✔ stringr 1.2.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
my_numbers <-  c(233, 486, 565, 785)
straddle_data <- tibble(
  big = 1000 * my_numbers + 23,
  straddle = my_numbers / 1000 + 100,
  small = my_numbers / 100000
)
straddle_data
#> # A tibble: 4 x 3
#>      big straddle   small
#>    <dbl>    <dbl>   <dbl>
#> 1 233023     100 0.00233
#> 2 486023     100 0.00486
#> 3 565023     101 0.00565
#> 4 785023     101 0.00785

To display the fractional component of straddle we need to set pillar.sigfig = 7, which massively pads the small column:

options(pillar.sigfig = 7)
straddle_data
#> # A tibble: 4 x 3
#>      big straddle       small
#>    <dbl>    <dbl>       <dbl>
#> 1 233023  100.233 0.002330000
#> 2 486023  100.486 0.004860000
#> 3 565023  100.565 0.005650000
#> 4 785023  100.785 0.007850000

It would be useful for the following to work (note the use of *233* to indicate the portion of the output that is highlighted thanks to pillar.subtle = TRUE)

options(pillar.round = FALSE)
straddle_data
#> # A tibble: 4 x 3
#>        big  straddle     small
#>      <dbl>     <dbl>     <dbl>
#> 1 *233*023 *100*.233 0.00*233*
#> 2 *486*023 *100*.486 0.00*486*
#> 3 *565*023 *100*.565 0.00*565*
#> 4 *785*023 *100*.785 0.00*785*

I appreciate the comment from @hadley here #40 (comment) about using a trailing . but I think this is a different problem to what was discussed in that issue.

@krlmlr
Member

krlmlr commented Feb 7, 2018

Thanks. I agree we need to think more about how to represent numbers of the same magnitude but with subtle differences.

@kabuhr

kabuhr commented Apr 25, 2018

+1 on this feature. I understand the motivation for showing only significant digits, and I'd even be prepared to agree that this makes sense when the emphasis is on analysis of measured, physical data. However, when we're working with certain types of data (currency values) or when we're working on a pipeline of functions to perform some data manipulation (rather than just an "analysis") and trying to verify that the logic is working correctly at each step, this default of aggressive, three-sigdig rounding is both surprising and confusing.

This is a case where vanilla R basically got it right. The logic in print.default() makes sense, rarely produces surprising results, and only occasionally produces ugly output. From the help page:

The same number of decimal places is used throughout a vector. This means that [the global option] ‘digits’ specifies the minimum number of significant digits to be used, and that at least one entry will be encoded with that minimum number. However, if all the encoded elements then have trailing zeroes, the number of decimal places is reduced until at least one element has a non-zero final digit. Decimal points are only included if at least one decimal place is selected.

I would suggest that the proposed pillar.round=FALSE should switch to the vanilla R method, using the global options("digits") option (with its relatively high default of 7) together with the same logic quoted above. Combined with pillar.subtle=TRUE to de-emphasize trailing digits beyond pillar.sigfig, this would be just about perfect, combining the "no surprises" approach from print.default() with a gentle "hey that 5th significant digit probably shouldn't be included in your final results" reminder.

In fact, if I had my way, this would be the default behavior, though I understand why some people would prefer the current default with its pre-rounded answers.

@StephieLaPugh

Curious as to the status of this. I've been struggling all morning trying to make a summary tibble display a consistent number of decimal places (i.e., simply match the data as read in from Excel with two decimals). I am working with a group of consultants who are ride-or-die Excel folks, and if I can't prove to them that R (+tidyverse) will make their life better rather than more confusing and complicated, I will struggle to get traction. This is an example of something that is effortless in Excel ("I want to see these numbers with three decimals" click, click done), but that has now taken me an hour to learn that other than changing pillar.sigfig=x -- which is not really what I want -- I can't even do it.

My only option as I see it is to produce a gt (or similar) table -- a whole other step and chunk of code -- just to control something that is a click or two in Excel. Now when a colleague asks me, "why is it rounding this column but not those?" "can you make it match the spreadsheet?" and I have to say, "no, we don't get to do that, because... something." I will get a snort and a pat on the head and they will go back to their rectangles. Which makes me sad. Rounding, please!
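For readers landing here with the same need, a minimal base R sketch of one workaround follows. It is an assumption about what might suit this use case, not something proposed in the thread: formatC() renders numbers with a fixed number of decimals, at the cost of turning them into text. The vector x and the names fixed and summary_tbl are made up for illustration.

```r
x <- c(1.5, 10.256, 100)

# formatC() with format = "f" gives a fixed number of decimals, as text
fixed <- formatC(x, format = "f", digits = 2)
fixed

# The values become character strings, so this is for display only;
# keep the numeric column around for any further computation
summary_tbl <- data.frame(raw = x, shown = fixed)
summary_tbl
```

Because the shown column is character, it sorts and aligns as text, which is usually acceptable for a final summary but not for intermediate steps.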

@yimingli

yimingli commented Jul 3, 2019

Spent an hour trying to understand where all the decimal places had gone, before the trailing . caught my eye and I googled what it means. Maybe we can at least show some decimal places in RStudio's table view, where column widths are adjustable by dragging left and right?

@krlmlr krlmlr added this to the 1.5.0 milestone Jul 8, 2020
@krlmlr krlmlr modified the milestones: 1.5.0, 1.5.1 Feb 19, 2021
@krlmlr krlmlr modified the milestones: 1.5.1, 1.5.2 Mar 17, 2021
@krlmlr krlmlr modified the milestones: 1.5.2, 1.6.1 Apr 11, 2021
@krlmlr
Member

krlmlr commented Apr 13, 2021

Thanks for the clarification. It looks like many comments in this thread are discussing the difference between vector printing in pillar and in base R: we focus on significant digits, base R always shows

The same number of decimal places ... throughout a vector.

I opened #312 to discuss the issue of fixed decimal digits, and hid the most recent comments.

The original post here is about a different problem: show the details in the straddle column without increasing the significant figures.

my_numbers <- c(233, 486, 565, 785)
df <- data.frame(
  big = 1000 * my_numbers + 23,
  straddle = my_numbers / 1000 + 100,
  small = my_numbers / 100000
)
tbl <- tibble::as_tibble(df)

options(pillar.sigfig = 7)
df
#>      big straddle   small
#> 1 233023  100.233 0.00233
#> 2 486023  100.486 0.00486
#> 3 565023  100.565 0.00565
#> 4 785023  100.785 0.00785
tbl
#> # A tibble: 4 x 3
#>      big straddle   small
#>    <dbl>    <dbl>   <dbl>
#> 1 233023  100.233 0.00233
#> 2 486023  100.486 0.00486
#> 3 565023  100.565 0.00565
#> 4 785023  100.785 0.00785

options(pillar.sigfig = NULL)
options(digits = 3)
df
#>      big straddle   small
#> 1 233023      100 0.00233
#> 2 486023      100 0.00486
#> 3 565023      101 0.00565
#> 4 785023      101 0.00785
tbl
#> # A tibble: 4 x 3
#>      big straddle   small
#>    <dbl>    <dbl>   <dbl>
#> 1 233023     100. 0.00233
#> 2 486023     100. 0.00486
#> 3 565023     101. 0.00565
#> 4 785023     101. 0.00785

Created on 2021-04-12 by the reprex package (v1.0.0)
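
The two rules contrasted in this comment can also be seen on plain vectors, without any data frame. The following is a small sketch using only base R; signif() stands in for the "significant digits" view (it is not pillar's actual formatting code), and the vector names are made up for illustration.

```r
x <- c(100.233, 100.486, 100.565, 100.785)

# Significant-digit view: each value is rounded to 3 significant figures
signif(x, 3)

# Base R view: format() picks one number of decimal places for the whole
# vector; here no element needs decimals to reach 3 significant digits
base_fmt <- format(x, digits = 3)
base_fmt

# With mixed magnitudes the shared decimal places are driven by the element
# that needs the most, so other elements may show more than 3 digits
mixed_fmt <- format(c(1.5, 100.233), digits = 3)
mixed_fmt
```

In both views the fractional part of the straddle values is lost at three digits, which is the problem the original post describes.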

@krlmlr
Member

krlmlr commented Jul 26, 2021

It looks like the following approach might work:

  • We look at the data and compute the number of extra significant digits needed to show the relevant details
  • We add that number to our sigfig value
  • This could be a new argument to num(), and perhaps become the default at some point if it's useful

The following reprex proposes an implementation. We sort the data, look at adjacent differences, and determine the magnitude of each difference relative to the value. The decimal logarithm translates this to a number of extra significant digits.

Does that make sense? Can you think of cases where extra_digits() doesn't work as intended?

library(pillar)

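extra_digits <- function(x) {
  # Sort by magnitude and take differences between adjacent values
  x <- sort(abs(x))
  delta <- diff(x)
  # Drop the smallest value so that x[i] pairs with delta[i]
  x <- x[-1]

  # Ignore ties and non-finite differences
  keep <- which((delta != 0) & is.finite(delta))
  if (length(keep) == 0) {
    return(0)
  }

  x <- x[keep]
  delta <- delta[keep]

  # Digits needed to resolve the smallest difference relative to its value
  ceiling(log10(max(x / delta)))
}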

num_with_extra_digits <- function(x) {
  num(x, sigfig = 3 + extra_digits(x))
}

my_numbers <- c(233, 486, 565, 785)

tibble::tibble(
  big = num_with_extra_digits(1000 * my_numbers + 23),
  straddle = num_with_extra_digits(my_numbers / 1000 + 100),
  straddle2 = num_with_extra_digits(my_numbers / 1000 + 1000),
  straddle3 = num_with_extra_digits(my_numbers / 10000 + 10000),
  small = num_with_extra_digits(my_numbers / 100000)
)
#> # A tibble: 4 × 5
#>       big straddle straddle2  straddle3   small
#>   <num:4>  <num:7>   <num:8>   <num:10> <num:4>
#> 1  233023  100.233  1000.233 10000.0233 0.00233
#> 2  486023  100.486  1000.486 10000.0486 0.00486
#> 3  565023  100.565  1000.565 10000.0565 0.00565
#> 4  785023  100.785  1000.785 10000.0785 0.00785

Created on 2021-07-26 by the reprex package (v2.0.0.9000)
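
To make the arithmetic concrete, here is the same computation spelled out step by step for the straddle column. This is only a worked illustration in base R that mirrors the logic of extra_digits() from the reprex; the intermediate names sorted, ratio, and extra are made up for the example.

```r
# Worked example (base R only): the straddle column from the reprex
straddle <- c(233, 486, 565, 785) / 1000 + 100

sorted <- sort(abs(straddle))
delta <- diff(sorted)        # gaps between neighbours: about 0.253, 0.079, 0.220
ratio <- sorted[-1] / delta  # how large each value is relative to its gap

# The tightest pair (100.486 vs 100.565) gives a ratio of roughly 1273,
# so ceiling(log10(ratio)) comes out as 4 extra digits
extra <- ceiling(log10(max(ratio)))
extra

# Added to the base of 3, this is the sigfig value passed to num()
3 + extra
```

The result of 7 agrees with the <num:7> header shown for the straddle column in the reprex output.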

@charliejhadley
Author

Hey @krlmlr as I originally opened this issue I wanted to answer your question - other folks who've commented may have additional comments.

This solves my needs. I'll think about cases where it doesn't work as expected.

I see that num() currently calls lifecycle::signal_experimental(), but there is a vignette for it. I wonder how users would find this functionality. Would it be too expensive or intrusive to generate a message the first time (per session) that the 100. formatting is used by {pillar}?

@krlmlr
Member

krlmlr commented Jul 27, 2021

Thanks. I checked the documentation, there is a path that leads from ?tibble via ?tbl_df-class, ?formatting, and ?num to ?pillar:num. I agree it's a bit long, happy to revisit in the tibble repository. Would you like to open an issue there?

In the meantime I'll extend num() here.

krlmlr added a commit that referenced this issue Jul 28, 2021
- `num()` gains `extra_sigfig` argument to automatically show more significant figures for numbers of the same magnitude with subtle differences (#97).
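
A minimal usage sketch of the new argument, applied to the data from the original post. It assumes an installed pillar version that includes this commit and that extra_sigfig accepts a logical value; the name straddle_tbl is made up for illustration, and the exact printed output may differ between versions, so none is shown here.

```r
library(pillar)

my_numbers <- c(233, 486, 565, 785)

# extra_sigfig = TRUE asks num() to add significant figures when values
# share a magnitude and differ only in later digits
straddle_tbl <- tibble::tibble(
  big = num(1000 * my_numbers + 23, extra_sigfig = TRUE),
  straddle = num(my_numbers / 1000 + 100, extra_sigfig = TRUE),
  small = num(my_numbers / 100000, extra_sigfig = TRUE)
)
straddle_tbl
```

Unlike raising pillar.sigfig globally, this is set per column, so the small column does not have to be padded to show detail in straddle.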
@github-actions
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 29, 2022
8 participants