Trailing insignificant digits not printed? #40

Closed
krlmlr opened this issue Sep 4, 2017 · 23 comments

@krlmlr
Member

krlmlr commented Sep 4, 2017

colformat::colformat(c(1000.34, 0.34567))
#>    <dbl>
#> 1000    
#>    0.346

@hadley: Is this intended?

@hadley
Member

hadley commented Sep 5, 2017

Yes

@krlmlr
Member Author

krlmlr commented Sep 5, 2017

Maybe we could print them if there's enough space?

@hadley
Member

hadley commented Sep 5, 2017

It was a deliberate choice. Maybe it's worth rethinking, as it does seem a bit arbitrary to not display digits when space is available, and sigfigs are highlighted using colour so it's still scannable.

@dpeterson71

I believe the numbers should definitely be displayed in full if there's space, or at least otherwise notify the user that they have been modified. Wasn't one of the founding principles of plyr (and thus the genesis of the tidyverse in general) to not surprise the user (i.e. provide output consistent with input)? If I enter 1000.34 in my data entry, I certainly don't expect to see "1000".

@hadley
Member

hadley commented Jan 23, 2018

@dpeterson71 what do you expect sqrt(2) ^ 2 to print?

@dpeterson71

In this case, sqrt(2)^2 should be just 2, as in base R. I would expect sqrt(2) to be shown with the precision I've requested through base R's digits option. That's the crux of the problem, though, isn't it? The computer doesn't know a priori whether I have entered specific digits (or read them from a manually generated file) or computed something that could potentially be an irrational number.

If the computer is going to modify or change the data I have given it, it should at least have the courtesy to notify me that it has done so rather than blindly dropping information.

@hadley
Member

hadley commented Jan 23, 2018

My point is that no floating point number is exact - I don't think it's unreasonable for tibble to not print .34 when it's only a small part of the value.

(BTW I don't like the principle of avoiding surprise; because different things surprise different people based on what they know)

@huftis

huftis commented Jan 24, 2018

I don’t know if my opinion is worth much, but FWIW, I too find the current behaviour very misleading. It makes it look like there are no non-zero decimals (up to the precision/width used). I’m OK with hiding trailing zeros (up to the precision used), but hiding trailing non-zeros is confusing.

The current behaviour is:

pillar::pillar(c(1000.34, 1000, 0.34567))
#>    <dbl>
#> 1000    
#> 1000    
#>    0.346

I would be happy with this being rendered as either

#>    <dbl>
#> 1000.34    
#> 1000    
#>    0.346

or

#>    <dbl>
#> 1000.340    
#> 1000.000    
#>    0.346

But perhaps dropping the decimals could be restricted to integers (defined as numbers x where x == round(x)), e.g.:

pillar::pillar(c(1000.34, 1000.000078, 1000, 0.34567))
#>    <dbl>
#> 1000.340    
#> 1000.000    
#> 1000    
#>    0.346

or (preferably?)

pillar::pillar(c(1000.34, 1000.000078, 1000, 0.34567))
#>    <dbl>
#> 1000.34    
#> 1000.
#> 1000    
#>    0.346

That is, omitting the . indicates real integers. Or, in other words, having a decimal point is the formatting function telling the user ‘there is something after the decimal point – even though I might not display it (due to lack of space/precision)’.

@hadley
Member

hadley commented Jan 24, 2018

I like the idea of using a trailing . to indicate that there's more there

@dpeterson71

@huftis has two very good suggestions, in my opinion. My personal preference would be the first example, where the entries for doubles are justified:

pillar::pillar(c(1000.34, 1000.000078, 1000, 0.34567))
#>    <dbl>
#> 1000.340    
#> 1000.000    
#> 1000    
#>    0.346

The last option at least solves part of the issue where the careful observer might notice that the data has been modified by the subtle visual cue of a decimal point with missing digits. However, even though I could eventually learn to deal with that format, it is still harder to read and interpret with the uneven formatting and ragged edges. Our research group would never be allowed to present data that way in a public forum where readability and policy decisions matter.

@dpeterson71

One last thought. Cleveland's seminal work on visualizing data led to many improvements in graphing parameters and paradigms. The excellent lattice and ggplot2 packages make use of many of his concepts. Similarly, Brewer's extensive work in cartography and color theory guides optimal use of color in visualizations. I wonder if there exists some cognitive research on effective presentation of tabular data? If not, perhaps there's something for data that's analogous to the Chicago or Oxford Manuals of Style that could guide default format choices?

@randomgambit

In my opinion this is extremely dangerous. I mean, I could honestly lose my job if I think I have 100 in my dataframe when I actually have 100.2.

Formatting and color are fun, but this is way beyond that.

@krlmlr
Member Author

krlmlr commented Mar 1, 2018

How do you like the current output with the decimal dot always printed?

@randomgambit

This is what I get when I don't specify any options:

> pillar::pillar(c(1000.34, 1000.000078, 1000, 0.34567))
   <dbl>
1000.   
1000.   
1000.   
   0.346

and now if I run

> options(pillar.sigfig=10)
> pillar::pillar(c(1000.34, 1000.000078, 1000, 0.34567))
         <dbl>
1.000340000e+3
1.000000078e+3
1.000000000e+3
3.456700000e-1

Damn... I just want to see my full number 1000.000078.
Let's try again:

> options(pillar.sigfig=5)
> pillar::pillar(c(1000.34, 1000.000078, 1000, 0.34567))
     <dbl>
1000.3    
1000.0    
1000.0    
   0.34567

which is still rounding my numbers :(

How can I disable this rounding and the forced scientific formatting altogether? Again, rounding numbers like this is misleading and dangerous (if enabled by default). Perhaps some users may like that, but I am pretty sure most people won't.

Please let me know
Thanks!

@randomgambit

actually setting pillar.sigfig = 7 seems to be a good compromise here. 👍

@krlmlr
Member Author

krlmlr commented Mar 1, 2018

I'm glad that pillar.sigfig = 7 works for you:

data.frame(x = 1000.000078)
#>      x
#> 1 1000
sprintf("%.23f", 1000.000078)
#> [1] "1000.00007800000003044260666"

Created on 2018-03-01 by the reprex package (v0.2.0).

@randomgambit

interesting. I think it would be worthwhile to educate the user about floating-point approximations here. Like you could share a link to http://floating-point-gui.de/basic/ on the main tibble page as a reminder/warning.

@charliejhadley

@randomgambit I think it's wholly unfair to have folks need an understanding of floating point approximations in the beautification of tibble output. There's only one mention of floating point in the entirety of http://r4ds.had.co.nz/ and that's as wooly as possible.

@randomgambit

@martinjhnhadley come on, seriously? Anybody can understand that. The point would be to say: look, you can control the sigfig parameter and do all sorts of funny stuff with color/shading. However, keep in mind that there is a physical limit on how accurate a number can be in the computer's memory. The reprex from @krlmlr is a nice example/reminder.

@huftis

huftis commented Mar 19, 2018

I was the one who proposed the ‘trailing decimal point’ feature, but FWIW, I’m not happy with the way it has been implemented. The idea was to use the dot to indicate that ‘there is more here, but we’re not displaying all of it (because of lack of space)’. But the way it’s implemented is to add a trailing decimal dot for all double numbers, regardless of whether they are integers (i.e. x %% 1 == 0, or x == round(x)).

So now only values of type integer are shown without a dot. I don’t think that’s useful, and it clutters the display of tibbles. To see if a column is of type integer or double, it’s enough to look at the column header, so the extra dot doesn’t add any information. And, at least in my experience, it’s very common that integer values are stored in numeric (double) columns.

I still think the original idea made sense. It’s useful to see if a number is a whole number (not necessarily of type integer) or if it has been truncated for display purposes. Having a trailing . shown only for truncated numbers (x %% 1 != 0) would give this information, and would make it easy to spot hard-to-find floating-point related issues (e.g. code that assumes that (.1 + .2) * 10 produces the number 3, which it doesn’t: it produces a number slightly larger than 3, but R by default hides this from you).
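
A minimal sketch of the floating-point case mentioned above, using only base R. The variable name `x` is chosen here purely for illustration:

```r
x <- (.1 + .2) * 10

print(x)                  # default printing uses getOption("digits") significant digits
x == 3                    # exact comparison against 3
x %% 1 != 0               # the test proposed above for showing a trailing dot
isTRUE(all.equal(x, 3))   # comparison with a numerical tolerance
sprintf("%.17g", x)       # request more digits than the default print shows
```

The exact comparison and the `%%` test see the tiny excess over 3, while the tolerance-based `all.equal()` treats the two values as equal.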

@randomgambit

I really like the idea of the dot meaning there is more that we don't see. However, in practice, I will likely set enough significant digits so that I would always see a few digits after the decimal point. So that option would not impact me as much as the other ones.

@krlmlr
Member Author

krlmlr commented Apr 9, 2018

Closing in favor of #105. The dot will be shown only if x %% 1 != 0.
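
As a rough sketch of that condition only (this is not pillar's actual code; `needs_trailing_dot()` is a hypothetical helper named here for illustration), the test could be applied to the example vector from this thread like so:

```r
# Hypothetical helper, NOT part of pillar: flags values that have a
# non-zero fractional part, i.e. where x %% 1 != 0.
needs_trailing_dot <- function(x) {
  # is.finite() keeps NA, NaN and Inf from propagating NA into the result
  is.finite(x) & (x %% 1 != 0)
}

x <- c(1000.34, 1000.000078, 1000, 0.34567)
flags <- needs_trailing_dot(x)
flags
```

Under this assumption, only the third value (exactly 1000) has no fractional part and so would be shown without a dot.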

@github-actions
Contributor

github-actions bot commented Dec 8, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2020