max_num_of_rows not taking effect in html backend #217

juliohm · 2023-09-05T22:53:46Z

Hi @ronisbr , hope you are doing well,

We are facing an issue with the html backend because it is materializing entire columns even when we pass the option max_num_of_rows. Is there a quickfix or workaround to avoid building the entire columns in this backend?

The text backend is working just fine and we can create very large tables and display just a few rows quickly. The problem is in the creation of the objects, not the display itself.

ronisbr · 2023-09-05T23:01:04Z

Hi @juliohm !

Everything is fine here, how about you?

Can you provide a MWE? Because this option is working as expected here and also in DataFrames.jl. Take a look at this example:

julia> a = rand(10_000, 10);

julia> @time pretty_table(IOBuffer(), a; tf = tf_html_default, max_num_of_rows = 10)
  0.078409 seconds (350.23 k allocations: 24.639 MiB, 99.56% compilation time)

julia> @time pretty_table(IOBuffer(), a; tf = tf_html_default, max_num_of_rows = -1)
  0.178180 seconds (4.59 M allocations: 228.235 MiB)

PrettyTables.jl uses the same internal handling (a type I call ProcessedTable) in all backends to limit the amount of data that will be rendered.

Could it be something specific to your type that somehow triggers rendering of the entire table when a cell is shown with MIME("text/html")?

ronisbr · 2023-09-05T23:05:07Z

The last time I saw something like this, it was caused by the type's Tables.jl implementation, which led to entire columns being rendered. Please take a look here:

#162

juliohm · 2023-09-06T00:26:12Z

I am doing fine, thanks! :)

Here is a MWE that you can try on Jupyter (I tried it on VSCode directly because it supports notebooks):

using GeoTables

georef((a=rand(1000,1000),))

It takes 43s for me.

juliohm · 2023-09-06T00:40:17Z

Here is the precise location of the call:

https://github.com/JuliaEarth/GeoTables.jl/blob/7bc56e5135e48900861d99f4b1a5599bed1eeeb6/src/abstractgeotable.jl#L361-L363

juliohm · 2023-09-06T00:51:14Z

I think it is our fault! Splatting on the _common_kwargs is causing the problem 🤦🏽‍♂️

juliohm · 2023-09-06T00:56:40Z

But I don't know why. The _common_kwargs function returns very small objects, it shouldn't be a bottleneck.

juliohm · 2023-09-06T09:32:18Z

@ronisbr can you please confirm the semantics of the option max_num_of_rows? We understood that this option can be used to omit rows in very tall tables to save screen space and avoid materialization of all related objects.

Apparently, if we set the maximum number to 10 and the table has 1000000 rows, no information is shown to indicate the omitted rows:

ronisbr · 2023-09-06T12:09:57Z

Hi @juliohm !

Yes, this is the expected behavior. There is some room for improvement when using max_num_of_* in the text backend. The biggest issue is that this backend has two ways of limiting the amount of data to be printed: max_num_of_* and the display size. The internal algorithm would have to change significantly to resolve those inconsistencies.

The advice is to use max_num_of_* only in the LaTeX and HTML backends, and let the text backend limit the data using the display size.

I will check why it is taking too much time in that function.

ronisbr · 2023-09-06T12:24:09Z

Hi @juliohm !

I understood the problem!

When you use vcrop_mode = :middle, we render the first and the last rows. To obtain the last rows using the Tables.jl API, we need to iterate over the rows in each column:

PrettyTables.jl/src/tables.jl

Line 106 in b72c085

it, state = iterate(rtable.table, state)

This process is taking too long for GeoTables. If you change to vcrop_mode = :bottom, everything is very fast.

The algorithm spends the vast majority of its time in this function:

https://github.com/JuliaEarth/GeoTables.jl/blob/7bc56e5135e48900861d99f4b1a5599bed1eeeb6/src/abstractgeotable.jl#L129
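To illustrate why middle cropping can be expensive, here is a minimal sketch in plain Julia. It is not the PrettyTables.jl or GeoTables.jl code: `CountingRows` and `nth_row_by_iteration` are hypothetical names, and the type only counts how often `iterate` is called. A generic consumer that knows nothing about the meaning of the iteration state has to walk every row before the one it wants.

```julia
# Hypothetical row iterator that counts how many times `iterate` is called.
mutable struct CountingRows
    data::Vector{Int}
    calls::Int
end

CountingRows(data::Vector{Int}) = CountingRows(data, 0)

# Iteration protocol: in this sketch the state is the index of the next row.
function Base.iterate(r::CountingRows, state::Int = 1)
    r.calls += 1
    state > length(r.data) && return nothing
    return (r.data[state], state + 1)
end

# Reach row `n` using only the iteration protocol, treating the state as opaque.
function nth_row_by_iteration(r::CountingRows, n::Int)
    next = iterate(r)
    for _ in 2:n
        next === nothing && error("row $n does not exist")
        next = iterate(r, next[2])
    end
    next === nothing && error("row $n does not exist")
    return next[1]
end

rows = CountingRows(collect(1:1_000_000))
last_row = nth_row_by_iteration(rows, 1_000_000)
println("row value = ", last_row, ", iterate calls = ", rows.calls)
```

In this sketch, reaching row n costs n calls to `iterate`, so if each call is slow for a given table type, fetching the last rows dominates the total time.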

juliohm · 2023-09-06T12:31:28Z

Thank you @ronisbr ! Implementing the suggestions now. Will report back if the issue is still present.

Feel free to change the title of the issue to a more appropriate name :)

ronisbr · 2023-09-06T12:35:35Z

Thanks for the report! I think I can greatly improve everything if I keep a cache of the state. I will try to make some modifications here and hope they will not introduce a lot of type instabilities.

My idea is:

Every time we access a row in Tables.jl (row access), I will store the states.
If we have a state close to the row we want, we just use the cache instead of iterating everything.

This algorithm will greatly improve things for this case. However, my concern is that it will allocate a lot more for the other scenarios in which the iteration is much faster.
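The caching idea described above could look roughly like the following sketch. It is only an illustration under stated assumptions, not the code that was written for PrettyTables.jl: `StateCache`, `get_row`, and the `stride` parameter are hypothetical, and the states are stored as `Any`, which is exactly the kind of type instability and extra allocation the concern above refers to.

```julia
# Hypothetical cache of iterator states, keyed by row number.
struct StateCache{T}
    itr::T
    states::Dict{Int, Any}  # row number => state that makes `iterate` return that row
    stride::Int             # store one state every `stride` rows to bound memory use
end

StateCache(itr; stride::Int = 1000) = StateCache(itr, Dict{Int, Any}(), stride)

# Return `(row, steps)`, where `steps` is the number of `iterate` calls performed.
function get_row(c::StateCache, n::Int)
    # Find the closest cached row at or before `n` (0 means nothing usable is cached).
    start = 0
    for k in keys(c.states)
        if start < k <= n
            start = k
        end
    end
    next = start == 0 ? iterate(c.itr) : iterate(c.itr, c.states[start])
    row = start == 0 ? 1 : start
    steps = 1
    while next !== nothing
        row == n && return (next[1], steps)
        row += 1
        # `next[2]` is the state that produces row number `row`.
        row % c.stride == 0 && (c.states[row] = next[2])
        next = iterate(c.itr, next[2])
        steps += 1
    end
    error("row $n does not exist")
end

cache = StateCache((x^2 for x in 1:10_000); stride = 1000)
v1, s1 = get_row(cache, 9_990)  # cold cache: walks from the first row
v2, s2 = get_row(cache, 9_995)  # warm cache: resumes from the state stored for row 9000
println("first lookup: ", s1, " steps, second lookup: ", s2, " steps")
```

The second lookup resumes from a stored state instead of the first row, at the price of keeping a dictionary of states alive, which is wasted work for tables whose iteration is already cheap.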

ronisbr · 2023-09-06T13:21:49Z

Hi @juliohm!

I could reduce the time from several seconds to 0.001 s by assuming that the GeoTables iteration state is the row number. I think we should open an issue in Tables.jl to add a hint indicating that a RowTable state is the row number, which would drastically reduce the number of iterations, leading to a substantial gain.
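As a rough illustration of that shortcut, the sketch below uses a hypothetical `IndexedRows` type whose iteration state is, by construction, the number of the next row. Neither the type nor `row_by_state` comes from Tables.jl, GeoTables.jl, or PrettyTables.jl; the point is only that, once the meaning of the state is known, a row can be fetched with a single `iterate` call.

```julia
# Hypothetical row iterator whose state is the number of the next row to return.
mutable struct IndexedRows
    data::Vector{Float64}
    calls::Int  # number of `iterate` calls, tracked only for this demonstration
end

function Base.iterate(r::IndexedRows, state::Int = 1)
    r.calls += 1
    state > length(r.data) && return nothing
    return (r.data[state], state + 1)
end

# Fetch row `n` directly by constructing the state ourselves.
# This is valid only because we know that, for this type, the state is the row number.
function row_by_state(r::IndexedRows, n::Int)
    next = iterate(r, n)
    next === nothing && error("row $n does not exist")
    return next[1]
end

rows = IndexedRows(collect(1.0:1_000_000.0), 0)
value = row_by_state(rows, 999_999)
println("value = ", value, ", iterate calls = ", rows.calls)
```

Without a guarantee about the state, this shortcut is unsafe for arbitrary tables, which is why an explicit hint or keyword is needed.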

ronisbr · 2023-09-06T15:43:53Z

By the way, if Tables.jl does not add this hint, I will add a keyword, probably called row_tables_jl_use_row_id_as_state, which will solve the problem for GeoTables!

juliohm · 2023-09-06T15:45:40Z

Thank you Ronan! That is very helpful! This package is truly amazing and only gets better!

ronisbr · 2023-11-03T00:29:16Z

I think this issue is solved by the new code that uses Tables.subset, which leads to a huge performance gain when printing with middle cropping.

ronisbr mentioned this issue Sep 6, 2023

Add a hint to tell that the row iterator state is the row number in tables with row access JuliaData/Tables.jl#342

Open

ronisbr added a commit that referenced this issue Nov 3, 2023

🔧 Improve how we handle Tables.jl objects

6ac8366

Closes #217, Closes #220

ronisbr closed this as completed Nov 3, 2023

ronisbr reopened this Nov 3, 2023

ronisbr closed this as completed in bed73b6 Nov 3, 2023
