
Add max string length option #104

Closed
jankatins wants to merge 1 commit


jankatins

If set to a number, strings longer than this number will be shortened with `...`.

For example, `options(tibble.print_string_max = 10)` will only print up to 10 characters of each string.

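Below is a minimal sketch of the rule described above. It is written in Python purely for illustration, because the PR's actual R implementation isn't shown in this thread. It assumes the behavior visible in the example output further down: a string longer than the limit keeps its first `max_chars` characters and gets `...` appended. `shorten` is a hypothetical name, and it counts code points, not display columns.

```python
from typing import Optional


def shorten(s: str, max_chars: Optional[int]) -> str:
    """Hypothetical helper: keep the first max_chars characters and append '...'.

    max_chars=None means "no limit", mirroring an unset option.
    Counts code points, not display width.
    """
    if max_chars is None or len(s) <= max_chars:
        return s
    return s[:max_chars] + "..."
```

For instance, `shorten("uncomfortably long string", 10)` gives `"uncomforta..."`. An unset limit (`None`) leaves every string untouched.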
jankatins mentioned this pull request Jun 20, 2016
@codecov-io

Current coverage is 99.83%

Merging #104 into master will increase coverage by <.01%

@@             master       #104   diff @@
==========================================
  Files            14         14          
  Lines           590        595     +5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            589        594     +5   
  Misses            1          1          
  Partials          0          0          

Powered by Codecov. Last updated by 7c00abf...bf131a6

@jankatins (Author)

Unfortunately, <data.frame [4 x 2]> is already 20 characters long... so shouldn't the string formatting be done before the obj_sum function is applied?
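The ordering problem can be sketched in Python. All names here are hypothetical, and `summarize` stands in for `obj_sum()`, passed in as a function. If the limit is applied after every cell has been turned into text, a list-column summary such as `<data.frame [4 x 2]>` (exactly 20 characters) gets clipped as well. Shortening only genuine strings, before any summaries are produced, leaves the summaries whole.

```python
from typing import Any, Callable, Optional


def shorten(s: str, max_chars: Optional[int]) -> str:
    """Keep the first max_chars characters and append '...' (None = no limit)."""
    if max_chars is None or len(s) <= max_chars:
        return s
    return s[:max_chars] + "..."


def format_after_summary(value: Any, summarize: Callable[[Any], str],
                         max_chars: Optional[int]) -> str:
    # Problematic order: every cell becomes text first and is then shortened,
    # so summaries of list-column entries are clipped too.
    text = value if isinstance(value, str) else summarize(value)
    return shorten(text, max_chars)


def format_before_summary(value: Any, summarize: Callable[[Any], str],
                          max_chars: Optional[int]) -> str:
    # Proposed order: only genuine strings are shortened; summaries stay whole.
    if isinstance(value, str):
        return shorten(value, max_chars)
    return summarize(value)
```

Take a limit of 10 and a summarizer that returns `<data.frame [4 x 2]>`. The first order prints `<data.fram...`, while the second keeps the summary intact.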

@krlmlr (Member) commented Jun 20, 2016

I'm not generally opposed to limiting string length, but I think #89 was about adaptive shortening that is applied as needed if the output is too wide.

@jankatins (Author)

@krlmlr So the algorithm would be: if printed_width > available_width, set string columns to max(nchar(names), min_string_length) and apply the "shortener" (i.e., don't redistribute the leftover space). This could leave a bit of unused space, but that already happens now...
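The rule in this comment can be sketched in Python under a few assumptions. Each column is described by its header, its natural printed width and whether it holds strings, using a hypothetical `Column` type. Columns are separated by one space. A string column is never widened beyond its natural width. Only string columns shrink, and the freed space is not redistributed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str        # header text
    width: int       # width needed to print the column unshortened
    is_string: bool  # only string columns may be shortened


def shrink_if_too_wide(cols: List[Column], available_width: int,
                       min_string_length: int, sep: int = 1) -> List[int]:
    """Return the width to use for each column (hypothetical helper).

    If the table fits, nothing changes. Otherwise each string column is set to
    max(len(name), min_string_length), capped at its natural width; leftover
    space is not handed back.
    """
    printed_width = sum(c.width for c in cols) + sep * (len(cols) - 1)
    if printed_width <= available_width:
        return [c.width for c in cols]
    return [
        min(c.width, max(len(c.name), min_string_length)) if c.is_string else c.width
        for c in cols
    ]
```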

I'll make the changes if that's what you want :-)

@krlmlr (Member) commented Jun 20, 2016

It would help to see an example; see the expect_output() calls in the tests.

krlmlr added the ready label Jun 20, 2016
@jankatins (Author)

Here is an example with the current implementation and options(tibble.print_string_max = 20):

Source: local data frame [216 x 23]
Groups: tags [7]

# A tibble: 216 x 23
   longitude.x latitude.x                    date  company eyeColor   age                  guid.x index product
         <chr>      <chr>                   <chr>    <chr>    <chr> <int>                   <chr> <int>   <int>
1    27.339158 -85.314797 2000-03-17T23:45:01....  FLYBOYZ    brown    20 d3982b08-2a5e-4abc-a...     0       2
2    27.339158 -85.314797 2000-03-17T23:45:01....  FLYBOYZ    brown    20 d3982b08-2a5e-4abc-a...     0       2
3    27.339158 -85.314797 2000-03-17T23:45:01....  FLYBOYZ    brown    20 d3982b08-2a5e-4abc-a...     0       2
4   -72.514214  46.225443 2002-03-11T04:09:28....  BIOSPAN    brown    38 9afd44ba-5b0c-4302-8...     1       2
5   -72.514214  46.225443 2002-03-11T04:09:28....  BIOSPAN    brown    38 9afd44ba-5b0c-4302-8...     1       2
6   -72.514214  46.225443 2002-03-11T04:09:28....  BIOSPAN    brown    38 9afd44ba-5b0c-4302-8...     1       2
7   -31.964069  38.553859 1980-02-19T05:23:31....     ZOID    green    23 f1f4aa36-15e8-4208-a...     2       0
8   -31.964069  38.553859 1980-02-19T05:23:31....     ZOID    green    23 f1f4aa36-15e8-4208-a...     2       0
9   -31.964069  38.553859 1980-02-19T05:23:31....     ZOID    green    23 f1f4aa36-15e8-4208-a...     2       0
10  150.363342 -85.232234 1996-05-18T20:27:20.... PHOTOBIN     blue    37 b23bbf56-f6ae-4bb7-b...     3       1

@krlmlr (Member) commented Jun 22, 2016

Nice. Could you extend the output tests so that the "known output" becomes part of this PR? Just add a test to test-trunc-mat.r; the test file will be generated if it's missing.

See #100 for problems with "wide" characters in certain scripts. For instance, I'm getting:

> sprintf("%.*s...", 4, "合同录入日期")
[1] "\xe5..."

We need a better method to shorten a string to a given "visible" width.
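The Python sketch below shows why a byte-count or character-count cut is not enough. It mirrors the problem rather than R's sprintf() internals. Slicing the UTF-8 bytes can split a multibyte character. Even a clean code-point slice doesn't give the visible width, because each of these CJK characters occupies two terminal columns. The width rule is an assumption and only a common approximation, not R's nchar(type = "width"): East Asian Wide/Fullwidth counts as 2 columns, combining marks as 0, everything else as 1.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate terminal width: wide/fullwidth = 2, combining = 0, else 1."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


s = "合同录入日期"
raw = s.encode("utf-8")
print(len(raw))  # 18: each of these characters takes 3 bytes in UTF-8
# 4 bytes = all of the first character plus the first byte of the second,
# so decoding leaves a U+FFFD replacement character at the end.
print(raw[:4].decode("utf-8", errors="replace"))
print(len(s), display_width(s))  # 6 code points, but 12 columns
```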

@hadley: Should string columns be limited to width 20 by default?

krlmlr added in progress and removed ready labels Jun 22, 2016
@hadley (Member) commented Jun 22, 2016

Shortening by default feels a bit too aggressive to me; I'd prefer to do it only if there's not enough space. Ideally, a one-column df with a long string would be truncated to the screen width.

@krlmlr (Member) commented Jul 29, 2016

@JanSchulz: Would you like to contribute more to this PR? See #104 (comment).

@jankatins (Author)

@krlmlr I should have time to work on this from Wednesday next week... Sorry if that's too late :-/

Regarding the Unicode issue: is nchar(x, type = "width") the solution? #100 mentions that this fails on Windows?

So, to recap the requirements:

  • don't shorten if everything fits the space
  • one column should be shortened to exactly the screen width
  • min width of a column is max(nchar(names),min_string_length)
  • min_string_length is a setting

Algorithm:

  • if the columns do not fit, set string columns to their min width
  • calculate the available space and the maximum number of columns that fit (the visible columns)
  • calculate the leftover space and redistribute it to the visible string columns
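Here is a Python sketch of the requirements and algorithm above. It rests on assumptions the thread doesn't settle. Columns are laid out left to right, separated by one space. Non-string columns are never shortened. The first column that doesn't fit is dropped, along with everything to its right. Leftover space is shared one column at a time among the visible string columns, never beyond their natural width. `Column` and `layout` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str        # header text
    width: int       # natural (unshortened) width
    is_string: bool  # only string columns may be shortened


def layout(cols: List[Column], screen_width: int, min_string_length: int) -> List[int]:
    """Return widths for the columns that are shown (a prefix of cols)."""
    sep = 1
    natural = [c.width for c in cols]
    # Requirement: don't shorten if everything fits.
    if sum(natural) + sep * (len(cols) - 1) <= screen_width:
        return natural
    # Step 1: string columns go to max(len(name), min_string_length),
    # but never wider than they naturally are.
    widths = [
        min(c.width, max(len(c.name), min_string_length)) if c.is_string else c.width
        for c in cols
    ]
    # Step 2: keep as many leading columns as fit on the screen.
    shown: List[int] = []
    used = 0
    for w in widths:
        extra = w + (sep if shown else 0)
        if used + extra > screen_width:
            break
        shown.append(w)
        used += extra
    # Step 3: hand the leftover space back to visible string columns,
    # one column of width per pass, until nothing is left or none can grow.
    leftover = screen_width - used
    grew = True
    while leftover > 0 and grew:
        grew = False
        for i in range(len(shown)):
            if cols[i].is_string and shown[i] < cols[i].width and leftover > 0:
                shown[i] += 1
                leftover -= 1
                grew = True
    return shown
```

With a single long string column, this returns exactly the screen width, which matches the second requirement. Spreading the leftover one column at a time keeps several string columns at similar widths rather than letting the first one absorb everything.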

@krlmlr (Member) commented Jul 30, 2016

Thanks. No hurry.

nchar(type = "width") works in RStudio and RGui, at least for the examples shown in #100. It works the same in the R terminal on Windows, but there the output is printed unhelpfully as "<U+....>", which renders the width calculation useless. At some point we may need to write our own nchar() that takes this into account, so it's probably a good idea to encapsulate the width calculation.

The "shortening to screen width" part looks interesting to me in the context of #100. If everything else fails, you could split the strings into code points and calculate the width of each; I'd generally exclude zero-width code points if the next code point doesn't fit:

> c("成交日期", "合同录入日期") %>% strsplit("") %>% lapply(nchar, type = "width")
[[1]]
[1] 2 2 2 2

[[2]]
[1] 2 2 2 2 2 2

Otherwise, the requirements look good to me.
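The per-code-point approach can be sketched in Python, with the width logic kept in one place as suggested above. The width rule is an assumption, not R's: East Asian Wide/Fullwidth code points count as 2 columns, combining marks and format characters (Unicode categories Mn, Me, Cf) as 0, and everything else as 1. Following the rule in the comment, zero-width code points left dangling at the cut are dropped. A literal reading of that rule would also drop combining marks that belong to the last kept character.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate column width of one code point (an assumption, not R's rule)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as U+200B
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(s: str) -> int:
    # Single place for the width logic, so a platform-specific fix touches only this.
    return sum(char_width(ch) for ch in s)


def truncate_to_width(s: str, max_width: int) -> str:
    """Longest code-point prefix of s whose display width is <= max_width."""
    kept = []
    used = 0
    for ch in s:
        w = char_width(ch)
        if used + w > max_width:
            # The next code point doesn't fit: also drop trailing zero-width ones.
            while kept and char_width(kept[-1]) == 0:
                kept.pop()
            break
        kept.append(ch)
        used += w
    return "".join(kept)
```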

@krlmlr (Member) commented Jul 30, 2016

Two more points:

  • You could use a Unicode ellipsis, "\u2026": it renders as a dot in the R terminal on Windows and as a single-character-wide ellipsis elsewhere.
  • Are you going to look into shortening of column names, too?
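A one-column ellipsis leaves more room for content under the same limit. The Python sketch below uses a hypothetical `shorten_to_width` that caps the total printed width, ellipsis included, in the spirit of truncating a single long column to the screen width. For simplicity it assumes every character is one column wide, so it only applies to ASCII-like text.

```python
ELLIPSIS = "\u2026"  # single-column ellipsis suggested above


def shorten_to_width(s: str, max_width: int, ellipsis: str = ELLIPSIS) -> str:
    """Cap the result at max_width characters, ellipsis included.

    Assumes 1 column per character and max_width >= len(ellipsis).
    """
    if len(s) <= max_width:
        return s
    keep = max(max_width - len(ellipsis), 0)
    return s[:keep] + ellipsis
```

With a limit of 10, `"uncomfortably long string"` becomes `"uncomfort\u2026"` (nine letters kept). With `ellipsis="..."` it becomes `"uncomfo..."` (seven letters kept).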

@jankatins (Author)

Regarding the Windows <U+....> printing: see r-lib/evaluate#66. This problem seems to happen quite deep in R. We had quite a lot of fun with it in IRkernel (more precisely, in repr) because we use sink (or rather, evaluate does) to capture the output of a computation: https://github.com/IRkernel/repr/blob/master/R/repr_matrix_df.r#L16-L27

@krlmlr (Member) commented Aug 19, 2016

Shelving this for now, I think strings with limited width will be easier with #144.

@randomgambit

Hello everyone, thanks for your great work!

Just wondering whether there are any plans to implement this? I think it's pretty useful. For instance, in pandas one can simply do:

In [43]: df = pd.DataFrame(np.array([['foo', 'bar', 'bim', 'uncomfortably long string'],
   ....:                             ['horse', 'cow', 'banana', 'apple']]))
   ....: 

In [44]: pd.set_option('max_colwidth',40)

In [45]: df
Out[45]: 
       0    1       2                          3
0    foo  bar     bim  uncomfortably long string
1  horse  cow  banana                      apple

In [46]: pd.set_option('max_colwidth', 6)

In [47]: df
Out[47]: 
       0    1      2      3
0    foo  bar    bim  un...
1  horse  cow  ba...  apple

In [48]: pd.reset_option('max_colwidth')

which is really helpful when one prints a tibble that contains both text and numeric values.

@randomgambit

Hello there! Any updates on this? I tried options(tibble.print_string_max = 20), but it does not seem to work. Thanks!


@krlmlr (Member) commented May 31, 2017

Development of better column formatting has moved to https://github.com/hadley/colformat. Would you like to contribute there?

Comments on a closed issue are not very effective because they are easy to ignore.

@randomgambit

Thanks @krlmlr! I'll have a look at it.

github-actions bot locked as resolved and limited conversation to collaborators Dec 9, 2020