This repository was archived by the owner on May 9, 2024. It is now read-only.

Support UNNEST in projections. #595

Merged
merged 1 commit into from
Jul 31, 2023


ienkovich
Contributor

This PR adds UNNEST support for projection queries (currently, we support it only in GROUP BY). The implementation is quite simple and relies on target_exprs holding UNNEST expressions. It is not generic enough: it effectively forces UNNEST projections to execute as a separate step, which prevents them from being merged with other nodes. I think we could get a better solution by introducing a new set of expressions in the execution unit specifically for unnesting, but that could become a much bigger project than I'm willing to take on for my current goals.

For now, I need this simple solution to combine with my ongoing work on the TopK aggregate, which will produce an array and require a subsequent UNNEST projection to cover H2O GroupBy Q8. This will let us avoid window functions for this query and make its execution scalable (currently, we run it in a single thread and need an additional projection to get a single fragment before the window function executes).
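To illustrate the intended pipeline, here is a minimal sketch of the semantics only: a TopK-style aggregate that yields one array per group, followed by an UNNEST projection that emits one output row per array element. The `Row` type and the functions `topKPerGroup` and `unnestProjection` are hypothetical names made up for this example; none of this is the code in this PR or the engine's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical row type for illustration: (group key, value).
using Row = std::pair<int, double>;

// Step 1: a TopK-style aggregate. Keeps the k largest values per key and
// returns them as one array per group, sorted in descending order.
std::map<int, std::vector<double>> topKPerGroup(const std::vector<Row>& rows,
                                                std::size_t k) {
  assert(k > 0 && "k must be positive");
  std::map<int, std::vector<double>> groups;
  for (const Row& row : rows) {
    groups[row.first].push_back(row.second);
  }
  for (auto& group : groups) {
    std::vector<double>& values = group.second;
    const std::size_t keep = std::min(k, values.size());
    // Move the `keep` largest values to the front, in descending order.
    std::partial_sort(values.begin(),
                      values.begin() + static_cast<std::ptrdiff_t>(keep),
                      values.end(),
                      std::greater<double>());
    values.resize(keep);
  }
  return groups;
}

// Step 2: an UNNEST projection. Each array element becomes its own output
// row, paired with the key of the group it came from.
std::vector<Row> unnestProjection(
    const std::map<int, std::vector<double>>& groups) {
  std::vector<Row> out;
  for (const auto& group : groups) {
    for (double value : group.second) {
      out.emplace_back(group.first, value);
    }
  }
  return out;
}
```

Because each group is aggregated independently, a pipeline of this shape can in principle be split across fragments and threads, which a single-fragment window function cannot.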

Contributor

@alexbaden alexbaden left a comment


Looks good, though you may have a problem w/ L0 tests.

I am curious what happens on GPU with the output slots - IIRC we use an atomic to get the next output slot. So, theoretically UNNEST would give you a different output ordering every time (whereas CPU should be more or less consistent, particularly if you only have one fragment).

@ienkovich
Contributor Author

> I am curious what happens on GPU with the output slots - IIRC we use an atomic to get the next output slot.

I increment this atomic counter by the array size and then use an additional non-atomic counter to write all rows into the reserved space. That should give me an adequate order.

Enabling new features on CPU only at first would also be acceptable to me, with ports to other targets following later.
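The slot reservation described above can be sketched roughly as follows. This is a CPU-side illustration using `std::atomic` and `std::thread`, not the actual GPU code; `OutputBuffer`, `unnestRow`, and `unnestAll` are hypothetical names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical shared output buffer; not an actual engine structure.
struct OutputBuffer {
  std::vector<int> slots;
  std::atomic<std::size_t> next_slot{0};
  explicit OutputBuffer(std::size_t capacity) : slots(capacity, -1) {}
};

// Writes all elements of one array into a contiguous run of output slots
// and returns the index of the first slot in that run.
std::size_t unnestRow(OutputBuffer& out, const std::vector<int>& array) {
  // A single atomic add reserves the whole range [base, base + size).
  const std::size_t base =
      out.next_slot.fetch_add(array.size(), std::memory_order_relaxed);
  assert(base + array.size() <= out.slots.size());
  // A plain (non-atomic) local counter walks the reserved range, so the
  // elements of this array stay adjacent and in their original order.
  for (std::size_t i = 0; i < array.size(); ++i) {
    out.slots[base + i] = array[i];
  }
  return base;
}

// Runs one worker per input row. Workers never write to the same slot,
// because the ranges they reserve are disjoint.
void unnestAll(OutputBuffer& out, const std::vector<std::vector<int>>& rows) {
  std::vector<std::thread> workers;
  for (const auto& row : rows) {
    workers.emplace_back([&out, &row] { unnestRow(out, row); });
  }
  for (auto& worker : workers) {
    worker.join();
  }
}
```

With this scheme the elements of each array land contiguously and in order. The relative order of different input rows still depends on which worker reserves its range first, so it can vary between runs.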

@ienkovich ienkovich merged commit 590bce6 into main Jul 31, 2023
@ienkovich ienkovich deleted the ienkovich/unnest branch July 31, 2023 22:14