Skip to content

Conversation

@charlesbluca
Copy link
Collaborator

@charlesbluca charlesbluca commented May 10, 2022

Following up on dask/dask#8911, this PR switches over to from_map for the computation of limit/offsets instead of from_delayed.

Additionally, I opted to consolidate the logic of map_on_partition_index directly into limit.py, as it doesn't seem to be used anywhere else within the codebase - if this stops being the case later on, we can explore breaking the function off into a separate utility again.

cc @rjzamora

def select_from_to(df, partition_index, partition_borders):
partition_borders = partition_borders.cumsum().to_dict()
def select_from_to(index, partition, partition_borders):
partition_borders = partition_borders.compute().cumsum().to_dict()
Copy link
Collaborator Author

@charlesbluca charlesbluca May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One difference between from_map and from_delayed is that Dask objects would already be computed by the time the function is called - using from_map, we would need to manually compute Dask objects either before calling from_map or within the function being called by from_map (which I opted for here).

Is this an intended difference for from_map?

@ayushdg
Copy link
Collaborator

ayushdg commented May 10, 2022

Given that the function is performed on all partitions of an existing dataframe instead of creating a new one, this might be a place where map_partitions might be a good candidate

@rjzamora
Copy link
Contributor

@charlesbluca and I discussed this PR briefly offline, but just to make a note here: The ideal appraoch is probably to use map_partitions (like Ayush suggested) with a BlockwiseDep argument for any arguments that vary between partitions.

@charlesbluca
Copy link
Collaborator Author

Closing in favor of #517

@charlesbluca charlesbluca deleted the limit-from-map branch July 19, 2022 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants