[WIP] Add per group row number functionality in AlaSQL #2208
base: develop
Co-authored-by: mathiasrw <[email protected]>
```js
alasql('INSERT INTO test_data VALUES ("X", 1), ("X", 2), ("X", 3), ("Y", 10), ("Y", 20)');

var res = alasql(
  'SELECT * FROM (SELECT category, amount, GROUP_ROW_NUMBER() AS rn FROM test_data ORDER BY category, amount) WHERE rn <= 2'
);
```
Don't we need a GROUP BY to be able to test this? What does the SQL-99 spec say?
The current implementation doesn't require GROUP BY. It numbers rows by detecting when the first selected column's value changes (requiring data to be pre-sorted with ORDER BY).
SQL-99 window functions use `ROW_NUMBER() OVER (PARTITION BY col)` syntax, but AlaSQL doesn't fully support OVER/PARTITION BY yet (test198.js keeps those cases disabled inside `if (false)` blocks).
Would you prefer:
- Keep current approach (simple, works with ORDER BY)
- Require GROUP BY clause in the query
- Wait for full OVER/PARTITION BY support
The current approach enables the main use case (get top N per group) without complex window function implementation.
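The change-detection approach described above can be sketched in plain JavaScript, outside AlaSQL. This is a minimal illustration under stated assumptions, not the PR's code: `groupRowNumber` is a hypothetical helper, rows are plain objects already sorted by the grouping column, and the final `filter` mimics the outer `WHERE rn <= 2` of the test query to get the top N rows per group.

```javascript
// Hypothetical helper (not AlaSQL's implementation): numbers rows within
// groups by watching for a change in one column's value between consecutive
// rows. Rows must already be sorted by that column, the role ORDER BY plays.
function groupRowNumber(rows, col) {
  let prev;
  let rn = 0;
  return rows.map((row, i) => {
    // Restart the counter on the first row and whenever the group value changes.
    if (i === 0 || row[col] !== prev) {
      rn = 0;
      prev = row[col];
    }
    rn += 1;
    return { ...row, rn };
  });
}

// Same values as the test data inserted above, already sorted by category, amount.
const testData = [
  { category: 'X', amount: 1 },
  { category: 'X', amount: 2 },
  { category: 'X', amount: 3 },
  { category: 'Y', amount: 10 },
  { category: 'Y', amount: 20 },
];

const numbered = groupRowNumber(testData, 'category');
// Equivalent of the outer "WHERE rn <= 2": keep the first two rows per group.
const top2 = numbered.filter((row) => row.rn <= 2);
console.log(top2);
```

Because groups are detected only between neighbouring rows, unsorted input would split one category into several groups, which is why the query needs ORDER BY.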
But how do you know when it starts a new group if you order by two things, like `ORDER BY x, y`? Does it then group across both, so that a different combination of x and y starts another group?
What is the proper SQL-99 syntax?
The current implementation only groups by the first column, so `ORDER BY x, y` groups by x alone. Test case 5 shows this: with `ORDER BY dept, team`, numbering restarts only when dept changes, not when the dept + team combination changes.
The SQL-99 syntax is `ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY sort_col)`, which partitions by the combination of col1 and col2.
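For comparison, here is a rough plain-JavaScript model of what `ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY sort_col)` computes. It is a sketch under assumptions, not AlaSQL code: `rowNumberOver` and the sample `staff` rows are hypothetical, and the ORDER BY is passed as a comparator function. Unlike the change-detection approach, it groups rows by the full partition key, so the input does not need to be pre-sorted.

```javascript
// Hypothetical model of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).
// partitionBy: list of column names; orderBy: comparator used inside each
// partition. The order of the input rows does not matter.
function rowNumberOver(rows, partitionBy, orderBy) {
  // Group rows by the combination of all partition columns.
  const partitions = new Map();
  for (const row of rows) {
    // JSON-encode the tuple so ['a', 'b'] and ['a,b'] get distinct keys.
    const key = JSON.stringify(partitionBy.map((col) => row[col]));
    if (!partitions.has(key)) partitions.set(key, []);
    partitions.get(key).push(row);
  }
  // Sort each partition independently and number its rows from 1.
  const result = [];
  for (const part of partitions.values()) {
    part.slice().sort(orderBy).forEach((row, i) => {
      result.push({ ...row, rn: i + 1 });
    });
  }
  return result;
}

// Hypothetical rows, deliberately unsorted.
const staff = [
  { dept: 'IT', team: 'web', salary: 50 },
  { dept: 'HR', team: 'ops', salary: 40 },
  { dept: 'IT', team: 'db', salary: 70 },
  { dept: 'IT', team: 'web', salary: 60 },
];

// ROW_NUMBER() OVER (PARTITION BY dept, team ORDER BY salary DESC)
const ranked = rowNumberOver(staff, ['dept', 'team'], (a, b) => b.salary - a.salary);
console.log(ranked);
```

The result lists rows partition by partition; as in SQL, rely on a specific output order only if you sort it afterwards.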
The current simple approach works for single-column grouping. For multi-column grouping, the options are:
- Parse the PARTITION BY clause (requires grammar changes)
- Add a parameter, like `GROUP_ROW_NUMBER(col1, col2)`
- Keep it single-column only (simplest)
Which approach would you prefer?
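The `GROUP_ROW_NUMBER(col1, col2)` option could keep the current change-detection design and simply compare several columns instead of one. A minimal sketch, assuming pre-sorted input; `groupRowNumberBy` and the sample rows are hypothetical. Passing only `['dept']` reproduces the first-column-only behaviour described for test case 5, while `['dept', 'team']` restarts numbering whenever the pair changes.

```javascript
// Hypothetical variant for a GROUP_ROW_NUMBER(col1, col2) signature: the
// counter restarts whenever ANY listed column differs from the previous row.
// Like the current implementation, it relies on rows sorted by those columns.
function groupRowNumberBy(rows, cols) {
  let prevKey;
  let rn = 0;
  return rows.map((row, i) => {
    // Build a composite key from all grouping columns.
    const key = JSON.stringify(cols.map((col) => row[col]));
    if (i === 0 || key !== prevKey) {
      rn = 0;
      prevKey = key;
    }
    rn += 1;
    return { ...row, rn };
  });
}

// Hypothetical rows, already sorted by dept, team.
const sorted = [
  { dept: 'A', team: 'x' },
  { dept: 'A', team: 'x' },
  { dept: 'A', team: 'y' },
  { dept: 'B', team: 'x' },
];

// Single column: numbering restarts only when dept changes.
console.log(groupRowNumberBy(sorted, ['dept']).map((r) => r.rn)); // → [1, 2, 3, 1]
// Two columns: numbering restarts when the dept + team pair changes.
console.log(groupRowNumberBy(sorted, ['dept', 'team']).map((r) => r.rn)); // → [1, 2, 1, 1]
```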
@copilot
Let's see if we can find a solid way to support the proper SQL-99 notation.
Just as a heads up, I was blocked by some firewall rules while working on your feedback. I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
Implementation Plan for GROUP_ROW_NUMBER()
Based on the issue and comments, this PR implements per-group row numbering:
Implementation Summary:
- `GROUP_ROW_NUMBER()` function that numbers rows within groups
Current Limitation:
- With `ORDER BY col1, col2`, only `col1` is used for grouping detection
- `ROW_NUMBER() OVER (PARTITION BY col1, col2)` syntax for multi-column partitioning is not yet supported
Usage Example:
<issue_title>Implement per group row numbers rather than per result set row numbers</issue_title>
><issue_description>It can be very useful to be able to generate row numbers for each group, which lets you do things like return the first 1 or X rows per group, but the ROW_NUMBER() function in AlaSQL appears to return the row numbers not of the group but of the result set overall.</issue_description>
>
><agent_instructions>Please start by making a test file named test1126.js and demonstrate how we would like it to work, and then solve the issue.
>
> When updating the grammar in alasqlgrammar.jison you need to run `yarn jison && yarn test`
>
> Always do a `yarn format` before committing. </agent_instructions>
>
> ## Comments on the Issue (you are @copilot in this section)
>
>
><comment_new>@mathiasrw
> This would change existing behaviour. We might need to give it a new name, like GROUP_ROW_NUMBER(). How does that sound?</comment_new>
><comment_new>@mathiasrw
> Not that I know of. But it's a neat feature. Not too hard to implement for a new person :-D</comment_new>
><comment_new>@mathiasrw
> @jaengelberg Any chance you could come up with an example that could work as the foundation for a test case?
>
> @biddwab09 Yes, I am pretty sure it's only 424select.js, but I might be mistaken. It's a good place to start in any case :)</comment_new>
>
>
Original prompt