Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to return an iterator on rows #3631

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

viewsharp
Copy link

@viewsharp viewsharp commented Oct 3, 2024

Example of generated query:

const iterCities = `-- name: IterCities :iter
SELECT slug, name
FROM city
ORDER BY name
`

func (q *Queries) IterCities(ctx context.Context) IterCitiesRows {
	rows, err := q.query(ctx, q.iterCitiesStmt, iterCities)
	if err != nil {
		return IterCitiesRows{err: err}
	}
	return IterCitiesRows{rows: rows}
}

type IterCitiesRows struct {
	rows *sql.Rows
	err  error
}

func (r *IterCitiesRows) Iterate() iter.Seq[City] {
	if r.rows == nil {
		return func(yield func(City) bool) {}
	}

	return func(yield func(City) bool) {
		defer r.rows.Close()

		for r.rows.Next() {
			var i City
			err := r.rows.Scan(&i.Slug, &i.Name)
			if err != nil {
				r.err = err
				return
			}

			if !yield(i) {
				r.err = r.rows.Close()
				return
			}
		}
	}
}

func (r *IterCitiesRows) Close() error {
	return r.rows.Close()
}

func (r *IterCitiesRows) Err() error {
	if r.err != nil {
		return r.err
	}
	return r.rows.Err()
}

Example of use:

func example(ctx context.Context, q *Queries) {
	rows := q.IterCities(ctx)
	for city := range rows.Iterate() {
		// do something with city
	}
	if err := rows.Err(); err != nil {
		// handle error
	}
}

UPD: updated the code according to the latest changes

@kyleconroy
Copy link
Collaborator

Thank you so much for the first pass on iterator support. Could you update your PR description with an example of how to use the Iterate() method? I looked over the changes and didn't see an end-to-end test of the new method.

@kyleconroy
Copy link
Collaborator

Fixes #720

@viewsharp
Copy link
Author

@kyleconroy
I've placed end-to-end tests here https://github.com/sqlc-dev/sqlc/pull/3631/files#diff-ef71fe84cdfc32f51a1cf70a80a17506a422b5ccc42e714a56a6a2334e6a4bc5
Do you need some other kind of tests? Please give an example

@viewsharp
Copy link
Author

@kyleconroy
What do you think about the Close method? I still haven't decided if it's necessary.

@viewsharp
Copy link
Author

I found some example tests here https://github.com/sqlc-dev/sqlc/tree/main/examples and added iterator tests there

@viewsharp
Copy link
Author

@kyleconroy, hi
I updated my pull request, check it please

@PatrLind
Copy link

PatrLind commented Nov 5, 2024

I see that the stdlib sometimes uses the suffix Seq() to indicate methods that returns iterators. I suggest to think a bit more if Iterate() is the appropriate name for this functionality.
I'm not saying it should be changed to Seq(), but I just want to make sure that some more thought gets put into the naming selection.

@gbarr
Copy link

gbarr commented Nov 8, 2024

I see that the stdlib sometimes uses the suffix Seq() to indicate methods that returns iterators. I suggest to think a bit more if Iterate() is the appropriate name for this functionality. I'm not saying it should be changed to Seq(), but I just want to make sure that some more thought gets put into the naming selection.

iter.Seq and iter.Seq2 are the names of the generic types for iterators, but methods or functions in the standard library that return iterators have many different names. So I think Iterate would be a good method name.

@gbarr
Copy link

gbarr commented Nov 8, 2024

@kyleconroy What do you think about the Close method? I still haven't decided if it's necessary.

I do not think the Close method is needed, but the func returned by Iterate needs to have defer r.rows.Close() added. Without it a recovered panic would leave the connection in the middle of a fetch

@viewsharp
Copy link
Author

I do not think the Close method is needed, but the func returned by Iterate needs to have defer r.rows.Close() added. Without it a recovered panic would leave the connection in the middle of a fetch

@gbarr, thanks for your comment

  1. I left Close() because it might be needed if you call a method like IterCities and don't call Iterate(). I can't imagine why it might be needed, but I don't see the point in disallowing this behavior
  2. I added defer r.rows.Close() as per your recommendation

@diamondburned
Copy link

diamondburned commented Nov 19, 2024

Instead of having an Err() method, should IterCities just be something like this?

IterCities(ctx context.Context) (seq iter.Seq2[City, error], stop func())

Usage would be:

func example(ctx context.Context, q *Queries) error {
	rows, stop := q.IterCities(ctx)
	defer stop()
	
	for city, err := range rows {
		if err != nil {
			return err
		}

		// do something with city
	}
	
	return nil
}

I think this is a bit cleaner than the above generated code, saving a whole new type for the iterator.

@viewsharp
Copy link
Author

Instead of having an Err() method, should IterCities just be something like this?

IterCities(ctx context.Context) (seq iter.Seq2[City, error], stop func())

I thought about such an interface, but I'm afraid that if the user uses range with one variable, the behavior will not match the expected one

@viewsharp viewsharp closed this Nov 19, 2024
@viewsharp viewsharp reopened this Nov 19, 2024
@kyleconroy
Copy link
Collaborator

kyleconroy commented Nov 26, 2024

I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators:

  • Lines returns an iterator over the newline-terminated lines in the byte slice s.
  • SplitSeq returns an iterator over all substrings of s separated by sep.
  • SplitAfterSeq returns an iterator over substrings of s split after each instance of sep.
  • FieldsSeq returns an iterator over substrings of s split around runs of whitespace characters, as defined by unicode.IsSpace.
  • FieldsFuncSeq returns an iterator over substrings of s split around runs of Unicode code points satisfying f(c).

I don't like the Seq suffixes, but I do understand why they picked them.

@viewsharp
Copy link
Author

I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators

Following this example, it seems more appropriate to rename Iterate() to Rows() / Items() / Values(). Because the Seq suffix is ​​used when there was already a Foo() or FooBar() function, and we want a similar FooSec() function that returns a sequence

@gbarr
Copy link

gbarr commented Nov 26, 2024

I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators

Following this example, it seems more appropriate to rename Iterate() to Rows() / Items() / Values(). Because the Seq suffix is ​​used when there was already a Foo() or FooBar() function, and we want a similar FooSec() function that returns a sequence

I agree that the Seq suffix is only being used when an iterator implementation of an existing function is being added.

My preference would be for Iterate() to be renamed as Rows(). Second preference of Items(). Values() to me implies each iteration returns a single value and not a collection of values.

@viewsharp
Copy link
Author

I would prefer Items().
Rows() is associated with sql.Rows and can be treated as raw data.

@benjaco
Copy link

benjaco commented Dec 1, 2024

It's great to see iterators coming to sqlc. Adding memory pooling as an option on top of this would make this truly a game-changer for large result sets. I would imagine saving the pool in a public variable, and let the callside have the responsibility to put the object back. Thoughts? Maybe out of scope for this one?

@MrBanja
Copy link

MrBanja commented Dec 20, 2024

I'm up for this one. The only thing that blocks us from migrating from pure SQLx is the absence of an iterator.

@kyleconroy
Copy link
Collaborator

kyleconroy commented Dec 20, 2024

Not to blow up our existing design, but I just read this post (https://blog.thibaut-rousseau.com/blog/writing-testing-a-paginated-api-iterator/) which makes me think we may want to consider the following:

func example(ctx context.Context, q *Queries) {
	for city, err := range rows.IterCities(ctx) {
		// do something with city
	}
}

The upside to this design is that it's difficult to use incorrectly. We don't have to decide on a name for the Iter function, since it's removed. You also can't forget to call Close() on the underlying rows resource.

@gbarr
Copy link

gbarr commented Dec 20, 2024

The problem with using Seq2 that returns err is that if the code breaks out of the loop, any error returned from the call r.Close() is lost as there is no way to get it

@sgielen
Copy link

sgielen commented Dec 22, 2024

Just to add my 2cts to the discussion: it seems weird to me to have the error, and have to check it, in every loop iteration. That’s an advantage of the initial proposal.

Since we can start the query in the first iteration, and break inside the loop, we don’t need Close. We can use Err() to determine if the query or any of the row fetches failed.

I personally prefer:

func example(ctx context.Context, q *Queries) error {
	cities := q.IterCities(ctx)
	for city := range cities.Rows() {	
		// do something with city
	}
	return cities.Err()
}

Over:

func example(ctx context.Context, q *Queries) error {
	for city, err := range q.IterCities(ctx) {
		if err != nil {
			return err
		}
		
		// do something with city
	}
}

Simply because of the lack of err clutter inside the loop.

This does assume that:

  • when you call IterCities(), we don’t perform the query yet, so just in case you never call Iterate() we don’t have a dangling query. (Also the added benefit that IterCities doesn’t return an err itself, which is something pgx regrets in their API.)
  • Iterate() performs the query and starts streaming the results. If anything fails, it sets the Err() result and returns. After the last row we close the query, set the Err() result to nil and return.
  • If the loop body breaks (the yield func returns false) we close the query and set Err() to the result of that close operation, then return.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants