Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Nov 20, 2025

Implementation Plan for GROUP_ROW_NUMBER()

Based on the issue and comments, implementing per-group row numbering functionality:

  • Understand the current ROW_NUMBER() implementation
  • Explore the codebase structure (424select.js, 40select.js, 55functions.js, 423groupby.js)
  • Create test file test1126.js demonstrating desired GROUP_ROW_NUMBER() functionality
  • Add GROUP_ROW_NUMBER() function to stdlib in 55functions.js
  • Modify query compilation to track GROUP_ROW_NUMBER() usage
  • Implement per-group row numbering logic in post-processing
  • Run yarn format before committing
  • Test the implementation - all tests passing!
  • Request code review
  • Run security scan

Implementation Summary:

  • Added GROUP_ROW_NUMBER() function that numbers rows within groups
  • Groups are determined by changes in the first selected column's value only
  • Rows must be pre-sorted by the grouping column(s) using ORDER BY
  • Implementation uses post-processing similar to ROW_NUMBER()
  • ROW_NUMBER() continues to work for entire result sets
  • All tests passing (505 passing, 1 unrelated failure in test2112)

Current Limitation:

  • When using ORDER BY col1, col2, only col1 is used for grouping detection
  • Multi-column partitioning is not yet supported (would require PARTITION BY syntax or function parameters)
  • SQL-99 standard uses ROW_NUMBER() OVER (PARTITION BY col1, col2) syntax for multi-column partitioning

Usage Example:

-- Numbers restart when 'category' changes (first column)
SELECT category, amount, GROUP_ROW_NUMBER() AS rn 
FROM data 
ORDER BY category, amount;

-- Get top 2 per category
SELECT * FROM (
  SELECT category, amount, GROUP_ROW_NUMBER() AS rn 
  FROM data 
  ORDER BY category, amount DESC
) WHERE rn <= 2;

<issue_title>Implement per group row numbers rather than per result set row numbers</issue_title>
><issue_description>It can be very useful to be able to generate row numbers for each group, which lets you do things like return the first 1 or X rows per group, but the ROW_NUMBER() function in AlaSQL appears to return the row numbers not of the group but of the result set overall.</issue_description>
>
><agent_instructions>Please start by making a test file named test1126.js and demonstrate how we would like it to work, and then solve the issue.
>
> when updating the grammar in alasqlgrammar.jison you need to run yarn jison && yarn test
>
> always do a yarn format before committing. </agent_instructions>
>
> ## Comments on the Issue (you are @copilot in this section)
>
>
><comment_new>@mathiasrw
> This would change existing behavour. We might need to make a new name for it - like GROUP_ROW_NUMBER() - how does that sound?</comment_new>
><comment_new>@mathiasrw
> Not that I know of. But its a neat feature. Not too hard to implement for a new person :-D</comment_new>
><comment_new>@mathiasrw
> @jaengelberg Any chance you could come with an example that could work as the foundation for a test case?
>
> @biddwab09 Yes, I am pretty sure its only 424select.js - but I might be mistaken. Its a good place to start in any case :)</comment_new>
>
>

Original prompt

This section details on the original issue you should resolve

<issue_title>Implement per group row numbers rather than per result set row numbers</issue_title>
<issue_description>It can be very useful to be able to generate row numbers for each group, which lets you do things like return the first 1 or X rows per group, but the ROW_NUMBER() function in AlaSQL appears to return the row numbers not of the group but of the result set overall.</issue_description>

<agent_instructions>Please start by making a test file named test1126.js and demonstrate how we would like it to work, and then solve the issue.

when updating the grammar in alasqlgrammar.jison you need to run yarn jison && yarn test

always do a yarn format before committing. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

@mathiasrw This would change existing behavour. We might need to make a new name for it - like GROUP_ROW_NUMBER() - how does that sound? @mathiasrw Not that I know of. But its a neat feature. Not too hard to implement for a new person :-D @mathiasrw @jaengelberg Any chance you could come with an example that could work as the foundation for a test case?

@biddwab09 Yes, I am pretty sure its only 424select.js - but I might be mistaken. Its a good place to start in any case :)</comment_new>


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI self-assigned this Nov 20, 2025
alasql('INSERT INTO test_data VALUES ("X", 1), ("X", 2), ("X", 3), ("Y", 10), ("Y", 20)');

var res = alasql(
'SELECT * FROM (SELECT category, amount, GROUP_ROW_NUMBER() AS rn FROM test_data ORDER BY category, amount) WHERE rn <= 2'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont we need to have a 'group by' to be able to test this? What is the SQL-99 spec saying?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation doesn't require GROUP BY. It numbers rows by detecting when the first selected column's value changes (requiring data to be pre-sorted with ORDER BY).

SQL-99 window functions use ROW_NUMBER() OVER (PARTITION BY col) syntax, but AlaSQL doesn't fully support OVER/PARTITION BY yet (test198.js shows it's in if(false) blocks).

Would you prefer:

  1. Keep current approach (simple, works with ORDER BY)
  2. Require GROUP BY clause in the query
  3. Wait for full OVER/PARTITION BY support

The current approach enables the main use case (get top N per group) without complex window function implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot

But how do you know when it will add an extra if you "order by x, y" like two things? is it then grouping across the two, so if x + y is different then it is another group?

What is the propper SQL-99 syntax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation only groups by the first column - so ORDER BY x, y groups by x alone. Test case 5 shows this: with ORDER BY dept, team, rows restart numbering only when dept changes, not when dept+team changes.

SQL-99 syntax is: ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY sort_col) - this partitions by the combination of col1+col2.

The current simple approach works for single-column grouping. For multi-column support, we'd need either:

  1. Parse PARTITION BY clause (requires grammar changes)
  2. Add a parameter like GROUP_ROW_NUMBER(col1, col2)
  3. Keep as single-column only (simplest)

Which approach would you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot
Lets see if we can get a solid way to support the proper SQL-99 notation.

Copy link
Contributor Author

Copilot AI commented Nov 20, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • upload.wikimedia.org
    • Triggering command: /usr/local/bin/node node_modules/mocha/bin/mocha.js ./test --reporter dot --bail test/test1126.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from mathiasrw November 20, 2025 05:30
Copilot finished work on behalf of mathiasrw November 20, 2025 05:30
Copilot finished work on behalf of mathiasrw November 20, 2025 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement per group row numbers rather than per result set row numbers

2 participants