
Reduce repeating parameters #28768

Closed
JeremyMahieu opened this issue Aug 17, 2022 · 19 comments
Labels
closed-no-further-action · customer-reported

Comments

@JeremyMahieu

JeremyMahieu commented Aug 17, 2022

When doing a query where multiple values are inserted, these values are turned into parameters and added multiple times, even if the values are the same.
[screenshot: generated SQL where the same value is added as several separate parameters]

It would be a shorter query, less network data and shorter logs if the parameters were not repeated, and potentially some performance savings on the database side as well, at the cost of checking whether the parameter already exists.
[screenshot: the same SQL with duplicate values collapsed into a single reused parameter]
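The screenshots haven't survived in this text capture. As a rough, hypothetical reconstruction of the difference they showed (table and column names invented):

```csharp
// Current behavior: every value slot gets its own parameter,
// even when the values are identical:
//
//   INSERT INTO [Orders] ([CustomerId], [BillingCustomerId], [Status])
//   VALUES (@p0, @p1, @p2);
//   -- @p0 = 42, @p1 = 42, @p2 = 'Open'
//
// Proposed behavior: identical values share a single parameter:
//
//   INSERT INTO [Orders] ([CustomerId], [BillingCustomerId], [Status])
//   VALUES (@p0, @p0, @p1);
//   -- @p0 = 42, @p1 = 'Open'
```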

@roji
Member

roji commented Aug 18, 2022

This is more complicated than it looks.

For example, two different parameters can have the same .NET value, but have a different facet on them (size, precision, scale); in that case we cannot deduplicate them as above. You can start comparing facets as well, but in #28769 you're also slowing down the general case for everyone, since every time a parameter is added you iterate through the entire parameter list to do the check; I don't think this makes sense just for the case where a parameter value is duplicated (that's generally rare). This could maybe be done more efficiently, but the complexity and overhead are probably not worth it; most parameter values are very short (number, short string).
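To make the trade-off concrete, here is a standalone sketch (not EF Core's actual code) of deduplication that avoids the linear scan in #28769 by hashing on the value plus its facets. Even this version adds hashing overhead to every parameter added, and parameters with equal values but different facets still cannot be merged:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

static class ParameterDedup
{
    // The key includes the facets, not just the value: two parameters with
    // the same value but a different Size/Precision/Scale must stay distinct.
    private record struct Key(object? Value, SqlDbType Type, int Size, byte Precision, byte Scale);

    // Returns the placeholder name to use in the SQL, reusing an existing
    // parameter when an identical (value + facets) one was already added.
    public static string GetOrAdd(
        SqlCommand command,
        Dictionary<Key, string> seen, // one dictionary per command
        object? value, SqlDbType type,
        int size = 0, byte precision = 0, byte scale = 0)
    {
        var key = new Key(value, type, size, precision, scale);
        if (seen.TryGetValue(key, out var existing))
            return existing; // duplicate: reference the earlier parameter

        var name = "@p" + command.Parameters.Count;
        var parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.SqlDbType = type;
        parameter.Size = size;
        parameter.Precision = precision;
        parameter.Scale = scale;
        parameter.Value = value ?? DBNull.Value;
        command.Parameters.Add(parameter);
        seen[key] = name;
        return name;
    }
}
```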

One additional thing to keep in mind is that some providers don't support referencing the same parameter twice from SQL (e.g. ODBC uses ? as the parameter placeholder). This in itself isn't a blocker (we could disable this optimization there, and we probably don't support those scenarios well anyway), but it's yet more added complexity.
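For concreteness, the placeholder difference (generic SQL, not taken from the issue):

```csharp
// Named placeholders (e.g. SQL Server) can reference one parameter twice:
//
//   SELECT * FROM t WHERE a = @p0 OR b = @p0   -- one parameter, two uses
//
// ODBC-style positional placeholders cannot: each '?' consumes the next
// parameter in order, so the same value has to be sent twice:
//
//   SELECT * FROM t WHERE a = ? OR b = ?       -- two parameters required
```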

@JeremyMahieu
Author

Right. I thought this would be better as a postprocessing step: right before building the final query, substituting parameter names by using the string version of the values that would end up in the query. But then, what if someone relied on the parameter names?

So it's not significant enough of an optimisation to warrant the cost/complexity...

@roji
Member

roji commented Aug 18, 2022

One more point here is that if someone really is sending duplicate parameter values a lot, they should probably reconsider their schema and possibly redesign it in a way which avoids that. In other words, parameter deduplication only saves network bandwidth, but everything is still duplicated in the database itself. So I'm again not so sure this is a good idea.

@JeremyMahieu
Author

JeremyMahieu commented Aug 18, 2022

Where I noticed this for my case, these were foreign keys.
Also, these could be inserts of multiple rows, meaning you'd get queries with more parameters than there are columns. I don't know where the cutoff of using insert/update statements to modify many rows is, but it could be a big percentage of some types of queries.

Are there any standardized cases where we could measure how much of an impact this could have?

@ajcvickers
Member

> Where I noticed this for my case, these were foreign keys.

@roji This might be interesting as a specific optimization in the update pipeline. All the FK values that are for the same column and come from a relationship with the same principal (hence the same value) could be a single parameter in the batch.

@roji
Member

roji commented Aug 18, 2022

> This might be interesting as a specific optimization in the update pipeline. All the FK values that are for the same column and come from a relationship with the same principal (hence the same value) could be a single parameter in the batch.

That's an interesting idea - you're thinking of a scenario where we insert a principal with many dependents, right?

One problem is that this wouldn't work with the new batching API, where the same parameter set isn't shared across different SQL statements (each batch command has its own parameter set). The exception to that is SQL Server bulk insert mode, where we use a single command with MERGE, so that would still work even with the new batching API.
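A hedged sketch of that constraint using Npgsql, which already implements the new DbBatch API (table, column, and connection string are invented):

```csharp
using Npgsql;

var connectionString = "Host=localhost;Database=test"; // assumed
await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();

// Each batch command carries its own parameter collection, so the shared
// FK value is re-sent for every statement; it can't be a single parameter.
const int sharedFk = 42;
await using var batch = new NpgsqlBatch(conn);
for (var i = 0; i < 2; i++)
{
    var cmd = new NpgsqlBatchCommand("INSERT INTO dependent (principal_id) VALUES (@fk)");
    cmd.Parameters.Add(new NpgsqlParameter("fk", sharedFk)); // duplicated per command
    batch.BatchCommands.Add(cmd);
}
await batch.ExecuteNonQueryAsync();
```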

Also, foreign keys tend to be particularly small (typically an int, maybe a GUID), so the gains may not be worth it.

@JeremyMahieu

> Also, these could be inserts of multiple rows, meaning you'd get queries with more parameters than there are columns. I don't know where the cutoff of using insert/update statements to modify many rows is, but it could be a big percentage of some types of queries.

Do you mean a scenario where you're inserting many rows in the same SaveChanges which largely have the same data in them? That seems like a very specific scenario...

In any case, I'm not sure about the relevance of the "cutoff of using insert/update statements to modify many rows"... The SQL Server provider does use MERGE when inserting multiple rows (but not when updating), but regardless of whether INSERT or MERGE is used, the parameters are essentially the same.

@JeremyMahieu
Author

JeremyMahieu commented Aug 18, 2022

> That's an interesting idea - you're thinking of a scenario where we insert a principal with many dependents, right?

Yes, or moving dependents from one principal to another. For example, 20 students change to a new class at the end of a semester: 20 student rows would get their ClassId column changed, and for this, 20 duplicate parameters would be created.
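For concreteness, a sketch of that scenario (entity and property names invented):

```csharp
// 20 students move to a new class. Each UPDATE in the SaveChanges batch
// gets its own ClassId parameter, all carrying the same value.
var students = await context.Students
    .Where(s => s.ClassId == oldClassId)
    .ToListAsync();

foreach (var student in students)
    student.ClassId = newClassId; // identical value for all 20 rows

await context.SaveChangesAsync(); // ~20 duplicate ClassId parameters today
```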

> …whether INSERT or MERGE is used, the parameters are essentially the same.

Ah, so large dataset changes would suffer from the same problem. Sorry, I'm not too familiar with everything. Disregard the "cutoff" comment; I assumed large operations wouldn't work with similar queries.

Another scenario where this might happen is in select statements where you use multiple Where(o => names.Contains(o.name)) filters. This names list would end up being serialized into a parameter, which could potentially be very long and used multiple times.
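Roughly this shape (query invented; whether and how the list is sent as a parameter depends on the provider and EF version):

```csharp
// The same potentially long `names` list appears in two predicates.
var names = new List<string> { "alice", "bob" /* ...possibly thousands... */ };

var results = await context.Documents
    .Where(d => names.Contains(d.CreatedBy))
    .Where(d => names.Contains(d.ModifiedBy)) // the list is sent again
    .ToListAsync();
```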

@JeremyMahieu
Author

> the same parameter set isn't shared across different SQL statements (each batch command has its own parameter set)

Use a temp parameter table to be used across the transaction. 🙃

@roji
Member

roji commented Aug 18, 2022

> Another scenario where this might happen is in select statements where you use multiple Where(o => names.Contains(o.name)) filters. This names list would end up being serialized into a parameter, which could potentially be very long and used multiple times.

Not sure I understand this one - where would this names list be used multiple times? Each query you execute is its own command (and round trip), with its own parameters; so you wouldn't be able to share the same parameter across multiple queries. In theory, when we enable query batching one day (either for the built-in split query scenario (#10878), or via a user-facing batching API (#10879)), then that could become meaningful. But it would mean the batched queries have to again resend the same parameter, i.e. two batched queries which happen to do the same names.Contains(o.Name) with the same names parameter. That seems like an extremely narrow scenario, and as above, it would also be incompatible with the new ADO.NET batching API which we're planning to implement, where each command has its own parameter set.

@JeremyMahieu
Author

JeremyMahieu commented Aug 18, 2022

Also think about default values.
One real-life example I have is an insert of 8 rows, with 21 parameters each: 168 parameters in total.
Most common values:

  • 48 times: False
  • 16 times: '0'
  • 16 times: NULL (DbType = Guid)
  • 8 times: NULL (DbType = Int32) (not sure if this could be grouped with the previous one)
  • 8 times: 'MyRedactedEnum' (Nullable = false) (Size = 4000)
  • 8 times a string that represents a barcode: 'abcdefg' (Size = 4000)
  • 8 times some float: 1.510000228881836
  • 8 times a default date: 0001-01-01T00:00:00.0000000
  • 8 times the same guid
  • The rest is unique data

So these 128 parameters could be reduced to 8 or 9 parameters, which would more than halve the query length.
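Working through those counts:

```csharp
// duplicated slots: 48 + 16 + 16 + 8 + 8 + 8 + 8 + 8 + 8 = 128
// after dedup:      9 distinct values (8 if the two NULLs can merge)
//                   + (168 - 128) = 40 unique parameters
//                   = ~49 parameters instead of 168
```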

@ErikEJ
Contributor

ErikEJ commented Aug 18, 2022

@JeremyMahieu Have you measured the impact on performance (via ADO.NET)?

@JeremyMahieu
Author

JeremyMahieu commented Aug 18, 2022

> @JeremyMahieu Have you measured the impact on performance (via ADO.NET)?

No, nothing of this nature. Could I use the benchmarks that are in the EFCore project for this? I'd like some sort of standardized scenario.

@JeremyMahieu
Author

> where would this names list be used multiple times?

You're right, I have never done this; it would be very rare/specific.

@roji
Member

roji commented Aug 18, 2022

> Could I use the benchmarks that are in the EFCore project for this? I'd like some sort of standardized scenario.

Not really - this is a very specific scenario; none of our benchmarks specifically check for repeated parameter values. But it should be trivial to do an ADO.NET benchmark (best to leave EF out of it) with BenchmarkDotNet, which measures the actual impact of duplicated vs. deduplicated parameters. I suspect you won't see a very meaningful difference.
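A hedged sketch of such a benchmark (connection string, query, and values are invented; both methods are dominated by the round trip, which is why the difference is likely to be small):

```csharp
using BenchmarkDotNet.Attributes;
using Microsoft.Data.SqlClient;

public class ParameterDedupBenchmark
{
    private const string ConnectionString = "Server=localhost;Integrated Security=true"; // assumed
    private SqlConnection _connection = null!;

    [GlobalSetup]
    public void Setup()
    {
        _connection = new SqlConnection(ConnectionString);
        _connection.Open();
    }

    [Benchmark(Baseline = true)]
    public int Duplicated()
    {
        using var cmd = _connection.CreateCommand();
        cmd.CommandText = "SELECT @p0 + @p1 + @p2 + @p3"; // four parameters, same value
        for (var i = 0; i < 4; i++)
            cmd.Parameters.Add(new SqlParameter("@p" + i, 42));
        return (int)cmd.ExecuteScalar();
    }

    [Benchmark]
    public int Deduplicated()
    {
        using var cmd = _connection.CreateCommand();
        cmd.CommandText = "SELECT @p0 + @p0 + @p0 + @p0"; // one parameter, four uses
        cmd.Parameters.Add(new SqlParameter("@p0", 42));
        return (int)cmd.ExecuteScalar();
    }

    [GlobalCleanup]
    public void Cleanup() => _connection.Dispose();
}
```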

@stevendarby
Contributor

Sending the additional parameters probably doesn't have a significant performance impact, but the number of rows inserted per batch does have an impact on performance, and the maximum you can insert per batch is limited by the number of parameters you can include in one command.

The example above of 21 parameters per row means inserting a max of ~100 rows at a time. It's possible a larger batch would be more optimal, if a reduced number of parameters allowed it.
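The arithmetic behind that estimate (SQL Server allows roughly 2100 parameters per command):

```csharp
Console.WriteLine(2100 / 21); // 100 rows per batch with 21 parameters per row
```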

@roji
Member

roji commented Aug 23, 2022

@stevendarby that's theoretically true, though in practice the 2100-parameter limit is rarely an actual limiting factor for batching.

The long-term plan for the 2100 parameter limit is to switch to using the new ADO.NET batching API (#18990); this would mean that instead of a single multistatement command, where all statements share a single parameter collection (as today), we'd batch together multiple commands, each with its own parameter collection. This would mean that the 2100 parameter limit will apply to a single statement, rather than to the entire batch. Unfortunately, for this to work, both EF and SqlClient would have to implement the batching API, which will likely take some time.

In any case, trying to increase batch sizes by deduplicating parameter values would only work for the very specific cases where values are actually duplicated. While this could help some specific scenarios, it overall seems like a questionable reason to go into parameter deduplication, which has complexity and performance overhead costs that everyone would need to pay.

@stevendarby
Contributor

stevendarby commented Aug 23, 2022

Sure, it's a consideration for @JeremyMahieu's benchmarking if they do end up doing that.

@stevendarby
Contributor

FWIW there is a related issue for reducing parameters in queries - #24476. This issue seems distinct from it, but just mentioning in case it's worth considering them together.

@ajcvickers
Member

Note from triage: we don't plan to do anything here as of now. Compelling perf analysis could change that.

@ajcvickers closed this as not planned Aug 30, 2022
@ajcvickers added the closed-no-further-action label and removed the type-enhancement label Aug 30, 2022