PG Adapter Presently Only Fakes Streaming #251

danfinlay · 2015-02-07T17:38:13Z

I was processing a pretty big dataset the other day, and naturally was feeling pretty grateful that Model supports streaming results, until der_On and I discovered:

The PG Adapter Only Fakes Streaming!

It inherits from EventedQueryProcessor, which inherits from an EventEmitter, and only fakes a Stream object, but without the critical back-pressure handling and memory-efficiency of actual streams!

You can see an example of how this is presently implemented here.

One good reason this may have been done originally, is that the core PG module does not support streams.

Fortunately, there is a module, node-pg-query-stream that does exactly what we need here.

I'm adding this issue to help guide efforts at adding proper streaming to the PG adapter, and am going to be taking a shot myself, although this will definitely be the deepest I've ever delved into Model, so no promises! (Only callbacks)

The text was updated successfully, but these errors were encountered:

mde · 2015-02-07T17:41:08Z

So excited to see this happening!

danfinlay · 2015-02-07T17:47:42Z

Does the two-year-old streams branch have tests that might be useful here?

mde · 2015-02-07T18:05:03Z

I believe the only streaming tests we have are here: https://github.com/geddy/model/blob/master/test/integration/adapters/streaming.js

danfinlay · 2015-02-07T18:05:36Z

Since that's the base sql adapter using the EventedQueryProcessor, it's looking to me like the mysql adapter isn't using using real streams either, even though the Mysql module natively supports streams.

One thing that seems a little tricky to me right now is how to swap out the individual database's streaming mechanisms.

I've also always wondered where exactly the SQL responses are converted into Model objects. Seems like it would be pretty cool to set up a through stream that converts the SQL responses into Model objects with no extra work...

danfinlay · 2015-02-07T20:14:55Z

Converting to real streams with the current architecture is a little strange, posting my thoughts here so others could chime in:

There is currently an innerProcessor that initially makes a minimal query to retrieve an array of IDs before querying for full models and associations. My current considerations center around whether this is necessary for a streaming adapter, and if so, how should it properly be implemented.

While a simple solution would be to make the initial query non-streaming, and then stream back the full results, I feel like if we're going to do streaming, we should do it really properly, and this means making no assumptions like "The array of IDs is a reasonable size to return without streaming and hold in memory".

One way would be for the initial query to stream back the IDs, in which case we have to decide how frequently to make the larger association requests.

My hope is that the initial query isn't necessary if using true streams. Curious for someone who is familiar with this pre-query to chime in on its purpose. I swear you could do most of these sql queries without a pre-query, but I may be missing something.

danfinlay · 2015-02-07T20:19:01Z

Oh I see now: JOINs don't work with limit/offset.

So basically we'd have to implement both ways:

Non-limiting/offsetting streaming requests can have a direct stream, but requests with limit or offset need IDs first, and so the question of "how often to query for subsequent results" remains.

It almost makes me think it could be a config variable or request option, how many records to fetch at a time when streaming with a limit or offset. That's a pretty obscure variable, but I wouldn't want to bury it as an assumption...

danfinlay · 2015-02-07T20:21:49Z

What if instead of making a pre-query, we made a subquery?

I'm pretty sure both Postgres and MySQL support this:

Instead of requesting IDs, then making a second request with an IN clause to that array, you simply include that first query inside the IN clause to begin with. There might be an extra step to select the id field from that returned table, if you had to request other values to filter/sort on.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PG Adapter Presently Only Fakes Streaming #251

PG Adapter Presently Only Fakes Streaming #251

danfinlay commented Feb 7, 2015

mde commented Feb 7, 2015

danfinlay commented Feb 7, 2015

mde commented Feb 7, 2015

danfinlay commented Feb 7, 2015

danfinlay commented Feb 7, 2015

danfinlay commented Feb 7, 2015

danfinlay commented Feb 7, 2015

PG Adapter Presently Only Fakes Streaming #251

PG Adapter Presently Only Fakes Streaming #251

Comments

danfinlay commented Feb 7, 2015

The PG Adapter Only Fakes Streaming!

mde commented Feb 7, 2015

danfinlay commented Feb 7, 2015

mde commented Feb 7, 2015

danfinlay commented Feb 7, 2015

danfinlay commented Feb 7, 2015

danfinlay commented Feb 7, 2015

danfinlay commented Feb 7, 2015