Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XGrid] Make Infinite loading support rowCount #1715

Closed

Conversation

DanailH
Copy link
Member

@DanailH DanailH commented May 20, 2021

Fixes #1247

Preview: https://deploy-preview-1715--material-ui-x.netlify.app/components/data-grid/rows/#server-side-infinite-loading

This is a proposed solution and an initial draft version of the feature. There were some problems around the implementation and an additional clean-up is required.

Regarding the potential solution - the way I did it now is by allocating space in advance for the rows (if rowCount is provided) and fill that array with null values (can be discussed). Once we have that we have a total of 2 scenarios:

Scenario 1 - user starts scrolling from top to bottom:

  • there is a new option on the grid that will be called when the virtual pages change - loadRows (I need a better name for it). It will be called with 2 params - startIndex (that is the index from which rows need to be loaded) and the viewportPageSize. Once those rows are received they will be converted to grid rows state and merged with the existing rows. This will continue until all rows are loaded and there are no more null values in the rows array.

Scenario 2 - user goes to the bottom very fast and starts scrolling from bottom to top:

  • Similar to the first option but the difference is that the user is likely to see empty rows (skeleton rows) when scrolling up. The same thing applies if the user scrolls to any virtual page of the grid - only those rows will be requested to be loaded and they will be surrounded by skeleton rows or null values in the array.

Problems - there are a couple of problems with the feature in general:

  • In order to render selection rows they need to be valid rows, meaning they need to have auto-generated unique IDs that are easily identifiable and fiterable. my current solution, a temporary one is to have a static prefix and random number but I will just replace it with the ID helper we already have.
  • Sorting - there were some problems with the sorting but the main one is what happens if you have rows that have skeleton rows between them - currently the skeleton rows are ignored and sorting just removes the null values from the rows array and filters it as usual. The problem is when the sorting is removed that the initial state is lost, meaning that the rows will no longer be surrounded by skeleton rows.
  • Filtering - same as sorting
  • CSV export - that one is not really a problem, just something that we need to be aware of.
  • Keyboard navigation - currently the keyboard navigation is broken because of the skeleton rows. I don't really have a solution to that problem because essentially if there are 10 rows loaded but the rowCount is 50 the other 40 are actual rows that are rendered. I'm open to suggestions.

All and all that's it. Cleanup, dead code removal, commented-out code removal, and renaming are to come. Tests and docs need to be added once the implementation is finalized.

TODO:

  • Cleanup, dead code removal, commented-out code removal, and renaming
  • Tests
  • Docs

Benchamark

@DanailH DanailH added new feature New feature or request components: XGrid labels May 20, 2021
@DanailH DanailH self-assigned this May 20, 2021
@DanailH DanailH changed the title [xGrid] Make Infinite loading support rowCount [XGrid] Make Infinite loading support rowCount May 20, 2021
Copy link
Member

@oliviertassinari oliviertassinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a benchmark section. Doing so, the first thing that I could see that is not working is the signature of loadRows. It should contain the sorting and filtering state as params. For instance, once sorting is applied, the previous loaded rows are no longer relevant. We need to discard them and restart from the new viewport and sort model (from what I understand).

Comment on lines 123 to 138
setGridState((state) => {
const allRowsUpdated = state.rows.allRows.map((row, index) => {
if (index >= startIndex && index < startIndex + pageSize) {
return newRowsToState.allRows[index - startIndex];
}
return row;
});

internalRowsState.current = {
allRows: allRowsUpdated,
idRowsLookup: { ...state.rows.idRowsLookup, ...newRowsToState.idRowsLookup },
totalRowCount: state.rows.totalRowCount,
};

return { ...state, rows: internalRowsState.current };
});
Copy link
Member

@oliviertassinari oliviertassinari May 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seems weird. Why not use the apiRef.current.updateRows() API? It supports additions.

Copy link
Member Author

@DanailH DanailH May 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I didn't understand the updateRows but from what I can see it supports update, delete, and adding new rows but it can't replace existing rows, which is what I need here. but now thinking about it maybe it would be better to extend updateRows 🤔. The only thing missing is the start and end indexes from which the update needs to happen.

@DanailH
Copy link
Member Author

DanailH commented May 26, 2021

Ok, quick update - I fixed the scrolling, filtering, and sorting. The API is also more clear now. I have some unit tests failing which I will fix (probably they are failing because of the new array structure) but all and all the logic will be ready for review soon.

@dtassone
Copy link
Member

We should not add the empty rows in state, can they just be visual elements. We have the row count, so I actually don't see why we actually need to fill the allRows array...
If we do, we could have a lot of memory allocated to empty object, and we would always have to filter the empty rows all over the places, I don't think it will end up being a great design

@DanailH
Copy link
Member Author

DanailH commented May 27, 2021

We should not add the empty rows in state, can they just be visual elements. We have the row count, so I actually don't see why we actually need to fill the allRows array...
If we do, we could have a lot of memory allocated to empty object, and we would always have to filter the empty rows all over the places, I don't think it will end up being a great design

The problem with not keeping them in the state is how would you know which indexes are loaded and which aren't? How does the rowCount help us know what is there and what is not? Also, having allocated the space in advance isn't so bad actually because finding and replacing elements by indexes is a linear operation now.

Also as far as I understood the rows are virtualized anyway no matter if they are empty or now. It's the same as loading normal rows.

@dtassone
Copy link
Member

dtassone commented May 27, 2021

We should not add the empty rows in state, can they just be visual elements. We have the row count, so I actually don't see why we actually need to fill the allRows array...
If we do, we could have a lot of memory allocated to empty object, and we would always have to filter the empty rows all over the places, I don't think it will end up being a great design

The problem with not keeping them in the state is how would you know which indexes are loaded and which aren't? How does the rowCount help us know what is there and what is not? Also, having allocated the space in advance isn't so bad actually because finding and replacing elements by indexes is a linear operation now.

Also as far as I understood the rows are virtualized anyway no matter if they are empty or now. It's the same as loading normal rows.

Ok so I'm not in favour of adding skeleton rows in the state for the list of problems you already identified. There will be more and more when we continue adding features. This will stay a pain as we keep building on top of it.

Therefore we have different ways of proceedings?
First half of the issue is to display skeleton rows on scroll when we have totalRowCount > rows.length. Looks done.
We could split the issue/PR if you want...

Second half is fetching rows when we scroll.

const firstRowIdx = virtualPage * apiRef.current.state.containerSizes.viewportPageSize

how would you know which indexes are loaded and which aren't
Scenario
Given:
virtualPageSize = 10
TotalRowCount = 60
virtualPage = 3

  • At this stage my array of allRows contains 10 items, same for the lookup

  • We could publish every time there is a change, or we could check that there are some rows and if not we publish the event
    fetchRows({startIndex: 30, virtualPageSize: 10}) // Add columns in there as well, useful for graphQL users

    • if we publish every time there is a change, then users can stream row updates and make sure you always see the latest data. They would also need to cache the response in case it is not changing over time.
    • if we don't and we only publish when we there is no rows, then no need to cache, and users can use the streaming API for updates...
      We can start by publishing all the time as it's easier to manage.
  • User should call a method on apiRef to insert rows at a given index.

onFetchRows(params=> ajax.getJson(resp=> apiRef.current.insertRows(startIndex, arraysOfNewRows)));
// insertRows.ts
(index, rows)=> {
    setGridState(state=> {
      const allRows =  [...state.rows.allRows]
      allRows.splice(startIndex, 0, arraysOfNewRows.map(row=> row.id));
      const idRowsLookup = {...state.rows.idRowsLookup}
      arraysOfNewRows.reduce(lookup, row =>  {
        //Something like that, check useRows to do it properly with getRowId...
        lookup[row.id] = row
      }, idRowsLookup);
      
      return {  ...state, rows: {
          ...state.rows,
          allRows, idRowsLookup, 
       }
    }));
}

Then in viewport

...
 return renderedRows.map(([id, row], idx) => !row ? <SkeletonRows /> : ...) 
...

@DanailH DanailH marked this pull request as ready for review June 2, 2021 09:25
@dtassone
Copy link
Member

dtassone commented Jun 2, 2021

Great progress 👍

Could you add a story in the storybook for this as well please. We could use large number of rows.

I think something is not quite right yet in terms of UX. IMO, if you hook to the virtual page change, it might be too late as the skeleton are already visible and you start the fetch. We need to fetch before that, when we reach a certain threshold.

@dtassone
Copy link
Member

dtassone commented Jun 2, 2021

For complex feature like this, we could use a feature branch on upstream so it is easier for everyone to get involve and create PRs on top of it.

Copy link
Member

@oliviertassinari oliviertassinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This demo doesn't seem to work https://deploy-preview-1715--material-ui-x.netlify.app/components/data-grid/rows/#server-side-infinite-loading. I mean, when filtering and sorting, the data shown are incoherent.

Comment on lines +37 to +41
params.api.current.insertRows({
startIndex: params.startIndex,
pageSize: params.viewportPageSize,
newRows: newRowsBatch,
});
Copy link
Member

@oliviertassinari oliviertassinari Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. The data grid is the best one to know if the resolved value is still relevant or not, not the developer at this level. I think that we should simply yield:

Suggested change
params.api.current.insertRows({
startIndex: params.startIndex,
pageSize: params.viewportPageSize,
newRows: newRowsBatch,
});
return {
startIndex: params.startIndex,
pageSize: params.viewportPageSize,
newRows: newRowsBatch,
};

Or at least, if we really want to keep this API, we need to update this demo to force the requests to resolve in the right order (ignore outdated requests). Which we don't do here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial version was like the one you are suggesting but after discussing it with @dtassone I reworked it to match the rest of the grid's APIs. In that case the responsibility of loading the rows falls on to the developer.

Copy link
Member

@oliviertassinari oliviertassinari Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I think that it's important we update the demo to showcase a sound implementation of data fetching. It will be a good forcing function to see the complexity we are pushing to developers, it will help us evaluate if we are happy with this or prefer to internalize it. If we decide to internalize the complexity in the future, the current API is not flying, we would need to add a "token" if we want to keep an api call, or to switch to a promise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is what you mean #1715 (comment). The current demo does show how to use the feature in action, it just doesn't do the filtering and sorting part.

docs/src/pages/components/data-grid/rows/rows.md Outdated Show resolved Hide resolved
docs/src/pages/components/data-grid/rows/rows.md Outdated Show resolved Hide resolved
Comment on lines +68 to +79
{id.toString().indexOf('null-') === 0 ? (
<GridSkeletonRowCells
columns={visibleColumns}
firstColIdx={renderState.renderContext!.firstColIdx!}
lastColIdx={renderState.renderContext!.lastColIdx!}
hasScrollX={scrollBarState.hasScrollX}
hasScrollY={scrollBarState.hasScrollY}
showCellRightBorder={!!options.showCellRightBorder}
extendRowFullWidth={!options.disableExtendRowFullWidth}
rowIndex={renderState.renderContext!.firstRowIdx! + idx}
/>
) : (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding different types of rows. How about we add an API to turn the loading state programmatically on some cells?

Suggested change
{id.toString().indexOf('null-') === 0 ? (
<GridSkeletonRowCells
columns={visibleColumns}
firstColIdx={renderState.renderContext!.firstColIdx!}
lastColIdx={renderState.renderContext!.lastColIdx!}
hasScrollX={scrollBarState.hasScrollX}
hasScrollY={scrollBarState.hasScrollY}
showCellRightBorder={!!options.showCellRightBorder}
extendRowFullWidth={!options.disableExtendRowFullWidth}
rowIndex={renderState.renderContext!.firstRowIdx! + idx}
/>
) : (

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even argue that we could breakdown this effort once we get a good enough POC of the integration to have the skeleton standalone, and maybe use it for the infinite loading use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that that is not the best solution but there is already a ticket for changing the loading visuals once this initial version is merged #1685

Copy link
Member

@oliviertassinari oliviertassinari Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten about #1685. In this case, I propose we do #1685 first or we start here without any skeleton, no preferences. But whatever is necessary to focus on one problem at the time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that without skeleton rows this feature doesn't work at all because it will just display empty rows. The UX wouldn't be great.

Comment on lines +158 to +163

apiRef.current.publishEvent(GRID_VIRTUAL_PAGE_CHANGE, {
currentPage: page,
nextPage,
api: apiRef,
});
Copy link
Member

@oliviertassinari oliviertassinari Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be an implementation detail of the virtualization. I don't think that we should expose it.

Suggested change
apiRef.current.publishEvent(GRID_VIRTUAL_PAGE_CHANGE, {
currentPage: page,
nextPage,
api: apiRef,
});

Instead, prefer listening to the viewport or anything that is resilient to a more standard virtualization implementation like in react-window, react-virtuoso, react-virtual.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can call the API here directly but that would deviate from the established pattern. I added that event because of it. We can not document the event and keep it private?

Copy link
Member

@oliviertassinari oliviertassinari Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we listen to the scroll event instead? Or anything close to https://ag-grid.com/react-grid/viewport/. But avoid at ALL cost to depend on any virtualization implementation detail? Based on #1911 and #1903, I think that there is a nonnegligible chance that this "page" notion won't stick around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can listen to any of the scroll events we currently have but I would still need to calculate where the last currently virtualized row is and when the new one comes in.
I know about the current limitations of the virtualization, we discussed them in the last meeting with @dtassone and @m4theushw (the scrollToIndex jump problem). The way I see it even if the logic of the virtualization changes would still be a notion of what is currently not virtualized so a potential refactoring would not be much. That is the only part that potentially wound need to change.

@DanailH
Copy link
Member Author

DanailH commented Jun 15, 2021

This demo doesn't seem to work https://deploy-preview-1715--material-ui-x.netlify.app/components/data-grid/rows/#server-side-infinite-loading. I mean, when filtering and sorting, the data shown are incoherent.

I see, yes that is because the x-grid-data-generator doesn't support sorting or filtering. The data is just being prefetch. To handle that demo I would need to build the filter and sorting in the data generator.

@oliviertassinari
Copy link
Member

oliviertassinari commented Jun 16, 2021

I see, yes that is because the x-grid-data-generator doesn't support sorting or filtering. The data is just being prefetch. To handle that demo I would need to build the filter and sorting in the data generator.

@DanailH Ah right. I wonder if it's not time to introduced a new notion, a server faker. AG Grid has that, they have a light SQL client to create WHERE SORT queries, it seems quite smart considering it's meant to talk to a database, and developers will like need to rebuild this on the server-side language of their choice.

@DanailH
Copy link
Member Author

DanailH commented Jun 22, 2021

Ok, I spent some time working on the issues we discussed yesterday. I did fix the problem with the pagination and the skeleton rows. I added a new grid option infniteLoadingMode (the same as the ones we have for sorting, filtering, and pagination). This helps to clearly draw a line between allocating the rows array in advance and keeping it as it was.

Regarding the filter and sorting - I looked again but I wasn't able to find a way to make it work on the client. The problem is that the sorting for example uses getAllRowIds and then basically sorts the IDs, which in my case contains the null rows IDs. Now sorting is not the problem, I can filter out the empty rows, the problem comes when you want to return to the original state - you don't know the original position of the rows (because there may be empty rows between them). The filtering has the same issue.

I think for now it is ok to have the constraint that the developer should mark the sortingMode and filterMode to be server.

In addition, I added Unstable in the feature's title.

Copy link
Member

@oliviertassinari oliviertassinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a close look at the changes. I have spent +2 hours reviewing this as the feature is important. On the surface, this is a simple problem, and there are not a lot of changes in this PR (less than 1k LOCs for a 4 weeks old effort), but looking close, it surfaces a lot of structural questions.

What I could find:

  1. The initial data set. When it's less than the viewport height, the rows stay in the loading state. We miss a call to onFetchRows to load missing rows in the initial viewport. It's easier to see with an empty initial rows array.
  2. infniteLoadingMode (not spelled correctly). I can't make the use case of this prop. Please detail.
  3. The demo needs a lot more than 50 rows to be meaningful. The canonical use case is for when developers can't load all the rows. A request that asks for more than 100 rows starts to be challenging for the backend to process. I think that at 1,000 rows, the example starts to make sense.
  4. The demo doesn't allow us to demonstrate that the filtering and sorting still work correctly. IMHO it's important. We have talked about how the problem is also present in https://material-ui.com/components/data-grid/pagination/#server-side-pagination. It's true and false. In the case of the server-side pagination, we can argue that the sorting only works for the page. Here, we can't argue, it's broken. So I reiterate this point [XGrid] Make Infinite loading support rowCount #1715 (comment).
    What we could do is have multiple demos: one simple (not sorting/filtering), to be easier to grasp for developers, and then more real-life ones.
  5. The demo doesn't handle the concurrency conflicts. This becomes important once we solve 4. There is nothing that suggests we will be able to easily nor hard to solve this problem userland. I think that we should try in the demo. My guess is that controlling all the states that change the order will be enough: filtering & sorting, so far. If we can't, we should explore a built-in API.
  6. I think that the GRID_VIRTUAL_PAGE_CHANGE event is not cutting it. First, I don't think that we should EVER fire an event specific to the virtualization, it should stay an implementation detail, e.g. if disabled. Second, The startIndex is computed with params.nextPage * containerSizes.viewportPageSize which is very specific to the current implementation. In [RFC] Row virtualization overscanning tradeoff #270 I defend that it's wrong. If we change it, we will have to update two different places.
    Instead, we could have the virtualization fire a GRID_VIEWPORT_ROW_CHANGE, with the startIndex and stopIndex, as in https://ag-grid.com/javascript-grid/viewport/.
  7. We say in the documentation that this feature is XGrid only. How do we enforce it in the implementation? It seems available in DataGrid too.
  8. The new onFetchRows prop is not documented on the API page. Why not?
  9. In the API onFetchRows, I don't think that there is a place for a viewportPageSize argument. This seems to be an implementation detail. Who knows how many rows the grid needs to load? Maybe we only need to fetch 1 row because we quickly scroll between two sections. Maybe we need to load 2 * viewportPageSize because we have configured the grid to over fetch.
  10. Speaking of loading configuration. AG Grid has many interesting options, which gives a taste of the complexity of the problem.
  11. insertRows() is calling apiRef.current.updateViewport(); and apiRef.current.applySorting(); directly. This seems far from OK. Why aren't we firing the GRID_ROWS_SET event instead?
  12. insertRows is doing an upsert would renaming it accordingly be more accurate?
  13. The current page approach fails to load the whole area of the page. I can systematically find areas that are loading forever when scrolling and jumping 200 rows.

Regarding the next step. At first, this feature might not look like much, but it's a BIG deal. See how much content AG Grid has for it.

I strongly disagree with the proposal to push it to feature branch. To me, it says: let's give up on the feature. Somebody will resume the effort later in the future, likely from scratch.
I also don't like the idea that we document this feature as experimental, it will lead developers to open bugs and report issues for problems we already know.
It seems to be a better course of action to get it right, to save them time, for more quality feedback in the future.

So my proposal would be to close this PR and use it as a POC. Now, we can spin off the effort into smaller, less ambitious problems, that we solve right.

  1. We can fix the viewport problem. How can developers know what are the visible rows on the viewport? Effectively, we can fire a new event with the start row and the stop rows of the given viewport. It would fire each time it changes. One possible name GRID_VIEWPORT_ROW_CHANGE.
  2. We implement the row loading state. We can figure out an API that makes it easy and built-in. For instance, should it only display if the row takes more than x ms to resolve?
  3. At this point, we can try to rebuild the same demo, using the existing API, and see if it work. It should work, we have all we need, AFAIK. This will also be a great opportunity to fix this filter & sorting problem. We can even explore a SQL-like client to simulate the server filtering/sorting, but maybe overkill, to benchmark.
  4. We leave a 🚧 section in the docs, after the demo of 3., for a more built-in solution. We create an issue with the different solutions that we are considering. We link it and gather feedback from the community. Effectively, we have multiple possible approaches. For instance, we can have a user-facing hook that does the interface between, say react-query and the data grid to manage the fetching logic. Bringing the following options: https://ag-grid.com/javascript-grid/server-side-model-configuration/. At this point, the complexity is about: how much do we cache? how much in chunk do we load? how many requests in parallel do we run? how do we ignore outdated responses? how do we flush the data when sorting/filtering/grouping/pivoting/tree-data changes?

@DanailH Thanks for the exploration

return newData.rows;
};

export default function InfiniteLoadingGrid() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export default function InfiniteLoadingGrid() {
export default function ServerInfiniteLoadingGrid() {

@DanailH
Copy link
Member Author

DanailH commented Jun 23, 2021

@oliviertassinari thanks for the detailed review.

Because I would still like to deliver value for this ticket I will close that one and work to open a PR with the first point:

  1. We can fix the viewport problem. How can developers know what are the visible rows on the viewport? Effectively, we can fire a new event with the start row and the stop rows of the given viewport. It would fire each time it changes. One possible name GRID_VIEWPORT_ROW_CHANGE.

My question about it is will it work on a per-row basis - what I mean is that if the current viewport has rows between 5-10 and I scroll one row should it emit a change 6-11 or should it emit when all the rows in the current viewport are changed - 10-15?

@DanailH DanailH closed this Jun 23, 2021
@oliviertassinari
Copy link
Member

oliviertassinari commented Jun 26, 2021

e and work to open a PR with the first point:

@DanailH Thanks for following me on this. I'm convinced that it will lead to a great outcome down this path 👌. If it doesn't, then we can reconsider the approach the next time we face a similar challenge.

will it work on a per-row basis

I think that it should publish the event each time the range changes. So 6-11 in your example. This would allow developers to implement throttling. For instance, if after 500ms, nothing changes, then they can fetch row 11. Does it make sense?

InfiniteLoader is almost implemented on the same primitive. I think that the composition pattern that has used is really interesting. It's built on top of the existing API, clearly isolated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[data grid] Lazy load data with virtualization
4 participants