Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary *last column* in ORDER BY when joining for collection #19828

Closed
roji opened this issue Feb 7, 2020 · 39 comments · Fixed by #24360
Closed

Remove unnecessary *last column* in ORDER BY when joining for collection #19828

roji opened this issue Feb 7, 2020 · 39 comments · Fixed by #24360
Assignees
Labels
area-perf area-query breaking-change closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Feb 7, 2020

When loading related one-to-many entities, EF adds ORDER BY clauses to make sure all related entities for a given entity are grouped together.

However, we can remove the last ORDER BY clause as it is unnecessary and causes more work (see #19571).

@PaulARoy
Copy link

FYI I have a repro case where I get a timeout (> 40s) retrieving a single line (amongst 10) solely due to this (removing the ORDER BY causes the query to fetch data instantly).

I have a few owned entities which have owned collections.

@LiangZugeng
Copy link

We encountered the same issue here, we're using Polemo MySQL provider, in our case, the auto generated ORDER BY of Include() forces MySQL to use temp table which had significant performance impact, the query usually take less than 1ms to finish without ORDER BY and it took 14s when ORDER BY applied, and we also caught several exceptions from MySQL reporting 'Can't change size of file (OS errno 28 - No space left on device)', turns out MySQL tries to allocate space for temp tables and the file exceeds 16MB which is the default settings on our MySQL server.

Please do take this seriously as ORDER BY is performed differently on different SQL products, we had to split the LINQ query into three separate queries to workaround it, but I'm not sure how this will affect other queries in the system, it's basically impossible to reproduce this in the testing environment without extensive load tests as the performance impact is closely related to the complexity of the query as well as the result set data volume. Hopefully the fix comes sooner.

Generated SQL:

SELECT `t`.`Id`, `t`.`ApprovedTime`, `t`.`Approver`, `t`.`ApproverComments`, `t`.`AutoCreatePO`, `t`.`CancelReason`, `t`.`ChargedAmount`, `t`.`CreatedTime`, `t`.`Creator`, `t`.`CustomerId`, `t`.`DeliveryDate`, `t`.`ImageFileInfo`, `t`.`IsInternalOrder`, `t`.`LastUpdatedTime`, `t`.`Notes`, `t`.`OrderNumber`, `t`.`OrderType`, `t`.`PackagingMethod`, `t`.`Priority`, `t`.`ProjectName`, `t`.`SalesOrderCompleted`, `t`.`SalesRep`, `t`.`Status`, `t`.`TenantId`, `t`.`TotalAmount`, `t`.`UniqueId`, `t1`.`Id`, `t1`.`Area`, `t1`.`AreaName`, `t1`.`ClientSalesOrderNumber`, `t1`.`ClientWorkOrderNumber`, `t1`.`CreatedTime`, `t1`.`Creator`, `t1`.`CuttingMachineNumber`, `t1`.`CuttingOperator`, `t1`.`ImageFileInfo`, `t1`.`InternalWorkOrderNumber`, `t1`.`LastUpdatedTime`, `t1`.`Length`, `t1`.`OrderPlacingDate`, `t1`.`PackingCaseId`, `t1`.`Position`, `t1`.`Price`, `t1`.`ProcessingDate`, `t1`.`SalesOrderId`, `t1`.`Status`, `t1`.`StockInOperator`, `t1`.`StockInTime`, `t1`.`StockOutOperator`, `t1`.`StockOutTime`, `t1`.`StockingAreaId`, `t1`.`TenantId`, `t1`.`TileNumber`, `t1`.`Width`, `t3`.`Id`, `t3`.`CaseNumber`, `t3`.`CreatedTime`, `t3`.`Creator`, `t3`.`LastUpdatedTime`, `t3`.`SalesOrderId`, `t3`.`Status`, `t3`.`StockInOperator`, `t3`.`StockInTime`, `t3`.`StockOutOperator`, `t3`.`StockOutTime`, `t3`.`StockingAreaId`, `t3`.`TenantId`, `t3`.`UniqueId`
FROM (
    SELECT `s`.`Id`, `s`.`ApprovedTime`, `s`.`Approver`, `s`.`ApproverComments`, `s`.`AutoCreatePO`, `s`.`CancelReason`, `s`.`ChargedAmount`, `s`.`CreatedTime`, `s`.`Creator`, `s`.`CustomerId`, `s`.`DeliveryDate`, `s`.`ImageFileInfo`, `s`.`IsInternalOrder`, `s`.`LastUpdatedTime`, `s`.`Notes`, `s`.`OrderNumber`, `s`.`OrderType`, `s`.`PackagingMethod`, `s`.`Priority`, `s`.`ProjectName`, `s`.`SalesOrderCompleted`, `s`.`SalesRep`, `s`.`Status`, `s`.`TenantId`, `s`.`TotalAmount`, `s`.`UniqueId`
    FROM `SalesOrders` AS `s`
    WHERE (`s`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420') AND (`s`.`Id` = 1620)
    LIMIT 2
) AS `t`
LEFT JOIN (
    SELECT `t0`.`Id`, `t0`.`Area`, `t0`.`AreaName`, `t0`.`ClientSalesOrderNumber`, `t0`.`ClientWorkOrderNumber`, `t0`.`CreatedTime`, `t0`.`Creator`, `t0`.`CuttingMachineNumber`, `t0`.`CuttingOperator`, `t0`.`ImageFileInfo`, `t0`.`InternalWorkOrderNumber`, `t0`.`LastUpdatedTime`, `t0`.`Length`, `t0`.`OrderPlacingDate`, `t0`.`PackingCaseId`, `t0`.`Position`, `t0`.`Price`, `t0`.`ProcessingDate`, `t0`.`SalesOrderId`, `t0`.`Status`, `t0`.`StockInOperator`, `t0`.`StockInTime`, `t0`.`StockOutOperator`, `t0`.`StockOutTime`, `t0`.`StockingAreaId`, `t0`.`TenantId`, `t0`.`TileNumber`, `t0`.`Width`
    FROM `Tiles` AS `t0`
    WHERE `t0`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t1` ON `t`.`Id` = `t1`.`SalesOrderId`
LEFT JOIN (
    SELECT `t2`.`Id`, `t2`.`CaseNumber`, `t2`.`CreatedTime`, `t2`.`Creator`, `t2`.`LastUpdatedTime`, `t2`.`SalesOrderId`, `t2`.`Status`, `t2`.`StockInOperator`, `t2`.`StockInTime`, `t2`.`StockOutOperator`, `t2`.`StockOutTime`, `t2`.`StockingAreaId`, `t2`.`TenantId`, `t2`.`UniqueId`
    FROM `TilePackingCases` AS `t2`
    WHERE `t2`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t3` ON `t`.`Id` = `t3`.`SalesOrderId`
ORDER BY `t`.`Id`, `t1`.`Id`, `t3`.`Id`

Execution Results Comparison (first 3 with ORDER BY on, last 3 without ORDER BY)

image

MySQL Execution Plan (note the red circle around ORDER)

image

@roji
Copy link
Member Author

roji commented Jun 16, 2020

@LiangZugeng @PaulARoy thanks for posting here - note that this is only about removing the last ORDER BY; previous ones are necessary in order to properly materialize results.

What could be helpful for us to evaluate the priority of this, is a runnable code sample which clearly shows that the last ORDER BY has a significant impact on perf.

@PaulARoy
Copy link

I'm sorry but it's been 2 month, we fixed it and switched to postgres in between.
The best way to reach it was to stack owned entities / owned collections.

The last order by was the only cause of perf downgrade.

@LiangZugeng
Copy link

I will spend some time to write some code to reproduce the perf degrade issue this weekend, I will be targeting MySQL as it’s where we found the issue. Will post a Github repo here later for this.

@grietine
Copy link

grietine commented Sep 2, 2020

What is the status about this issue?

@roji
Copy link
Member Author

roji commented Sep 2, 2020

@grietine the issue is in our backlog, which means it won't be part of the upcoming 5.0 release. We'll probably consider it for 6.0.

@dandenton

This comment has been minimized.

@roji

This comment has been minimized.

@roji

This comment has been minimized.

@dandenton

This comment has been minimized.

@roji

This comment has been minimized.

@rabberbock
Copy link

Our team is using Pomelo and MySql and this is causing real performance issues.

Are there any workarounds for the time being beyond just loading the related entities and join them manually?

@roji roji changed the title Remove unnecessary last ORDER BY when joining Remove unnecessary *last* ORDER BY when joining Nov 5, 2020
@roji
Copy link
Member Author

roji commented Nov 5, 2020

@rabberbock please note that this issue on only about removing the last ORDER BY when joining; if you're concerned about non-last ORDER BY clauses, see #19571 (and also consider trying out split queries).

If you're convinced you're seeing a perf issue because of the last ORDER BY, it would be useful to have a code sample that shows this - it would help us bump up the priority of this issue.

@rabberbock
Copy link

rabberbock commented Nov 5, 2020

@roji Thanks for the pointers! I can try to put something together. The gist of it is as follows:

The execution plan with the last ORDER BY is the following:

image

Without the last ORDER BY it's:

image

So with the last ORDER BY it uses tmp table, which in our case really degrades performance.

Without the last ORDER BY it runs in miliseconds, with the ORDER BY the same query takes a minute.

Here is a link to the MySql docs that explain when a tmp table is used. https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html.

What would you like to see in a code sample beyond the above execution plans?

@roji
Copy link
Member Author

roji commented Nov 5, 2020

Thanks for this info @rabberbock! I think the above (and the link to the MySQL docs) should be sufficient IMHO.

@roji roji removed this from the Backlog milestone Nov 5, 2020
@ajcvickers ajcvickers added this to the 6.0.0 milestone Nov 6, 2020
roji added a commit to dotnet/EntityFramework.Docs that referenced this issue Aug 16, 2021
roji added a commit to dotnet/EntityFramework.Docs that referenced this issue Aug 16, 2021
roji added a commit to dotnet/EntityFramework.Docs that referenced this issue Aug 17, 2021
roji added a commit that referenced this issue Aug 18, 2021
roji added a commit that referenced this issue Aug 18, 2021
roji added a commit that referenced this issue Aug 19, 2021
@ajcvickers ajcvickers modified the milestones: 6.0.0-rc1, 6.0.0 Nov 8, 2021
@dariusonsched
Copy link

@mikhail-khalizev - Love your work-around, but it does not work for me. It is raising the following exception:

Expression of type 'MyNamespace.Extensions.RemoveOrderingExpression' cannot be used for parameter of type 'System.Linq.IQueryable1[MyNamespace.Entities.Company]' of method 'MyNamespace.Entities.Company FirstOrDefault[Company](System.Linq.IQueryable1[MyNamespace.Entities.Company])' (Parameter 'arg0')"

For a straight-forward query:

_dbContext.Set<Company>().AsQueryable().AsNoTracking().Include(c => c.Business).RemoveOrdering().FirstOrDefaultAsync()

Any thoughts? I copied your code 'as-is'...

@roji
Copy link
Member Author

roji commented Jun 4, 2022

@mikhail-khalizev @dariusonsched removing all orderings from a query as the above RemoveOrdering does (#19828 (comment)) is dangerous and could result in incorrect results. EF Core generates the orderings it needs in order to perform its job, interfering with that may have various undefined consequences.

@dariusonsched
Copy link

@roji thank you for the insight, though I must then not be understanding how this is/was resolved in EF 6 (we are still on EF 5).

@roji
Copy link
Member Author

roji commented Jun 4, 2022

@dariusonsched this issue tracked removing the last ORDER BY, which indeed isn't necessary. The others are necessary.

@stevendarby
Copy link
Contributor

stevendarby commented Jun 5, 2022

I would clarify further that - as the title now states - it’s about removing the last column in the ORDER BY and not the last ORDER BY as a whole (there could possibly be more than one set of ORDER BY statements in the whole query and the workaround in #19828 (comment) was predicated on this misunderstanding)

@vdolek
Copy link

vdolek commented Aug 16, 2022

Strange thing happens to me with this new feature. Surprisingly it significantly slows down my query.

Described here:
https://stackoverflow.com/questions/73375578/performance-downgrade-after-upgrading-ef-core-5-to-6

Does anybody know what may cause this?

@roji
Copy link
Member Author

roji commented Aug 16, 2022

@vdolek it's very unlikely that removing an ORDER BY makes a query slower... I'd first confirm this by executing the SQL directly against the database, outside of your EF Core application. This would remove other changes in your application, the .NET version, the EF version...

In any case, if you see a problem, please open a new issue with the full repro details.

@pantonis
Copy link

pantonis commented Aug 17, 2022

I m on EF 6.0.7 with Pomelo.EntityFrameworkCore.MySql 6.0.2 and the order by is still getting appended when doing an Order by. It causes huge performance degradation on my query. If I run the query without the order by it is super super fast. Adding it back causes slow perf. Do I have to do anything to remove the order by when running INNER and LEFT Join?

@vdolek
Copy link

vdolek commented Aug 17, 2022

@roji I have tried that and really when I run those two queries directly, the one without tle last order by clause has bad performance (5 min vs 10ms).

@roji
Copy link
Member Author

roji commented Aug 17, 2022

@pantonis this issue only removed the very last ordering (see #19571 for details), and only if it's added by EF for internal materialization purposes. If you're convinced you're seeing an unnecessary last ordering, please open a new issue with a full repro and we'll investigate.

@vdolek here as well, please post the full query, as well as a the database schema with data so we can repro this here. If you'd like to share that privately, you can send an email to the address on my github profile.

@mguinness
Copy link

@vdolek Can you provide the output using SET SHOWPLAN_TEXT ON; for both queries?

@roji
Copy link
Member Author

roji commented Aug 17, 2022

@vdolek and others, please open a new issue rather than continue posting here - that would be better.

@theFra985
Copy link

I'm still having issues with the default ORDER BY found at the end of the query when handling selections with a lot of inclusions (mainly for serving complex objects from a REST endpoint).
I took the @chazt3n workaround and optimized it a bit to keep the explicitly stated sorting queries working removing only the redundant primary keys sorting (I know it's not actually redundant as per MySQL documentation it's not guaranteed to be reliable without).

So, when using this workaround keep in mind that queries will NOT be idempotent.

/// <summary>
/// Workaround to avoid MySql out of memory on queries with bigger field lists<br/>
/// Inspired by <a href="https://github.com/dotnet/efcore/issues/19828#issuecomment-847222980">chazt3n workaround</a>
/// </summary>
/// <remarks>
/// The interceptor works on the assumption of the first field of the main SELECT statement being also the first field
/// in the ORDER BY statement after the explicitly requested OrderBy expressions
/// </remarks>
public class RemoveLastOrderByInterceptor : DbCommandInterceptor {
	public const string QueryTag = "RemoveLastOrderBy";

	public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
		DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
		CancellationToken token = new()) {
		try {
			TryApplyPatch(command);
		} catch (Exception ex) { // The exception handling can be different based on your requirements
			// _logger.LogError(ex, "Failed to intercept query.");
			Console.WriteLine(ex);
			throw; // Fails forcefully to avoid unexpected silent behaviours
		}

		return base.ReaderExecutingAsync(command, eventData, result, token);
	}

	private static bool TryApplyPatch(DbCommand command) {
		const string orderBy = "ORDER BY";
		const string select = "SELECT ";
		var query = command.CommandText;
		int idx;
		if (!query.StartsWith("-- "))
			// Check if the command is tagged
			return false;
		var separatorIdx = query.IndexOf("\n\n", StringComparison.Ordinal);
		if (separatorIdx < 0
		    || query.IndexOf(QueryTag, 0, separatorIdx + 1, StringComparison.Ordinal) < 0)
			// Efficiently checks if the tags block contains the required QueryTag
			return false;
		if ((idx = query.LastIndexOf(orderBy, StringComparison.Ordinal)) < 0)
			// The query doesn't have an ORDER BY statement
			return false;
		// Using StringBuilder to avoid string allocation issues
		// While using early versions of .NET Framework there would be buffer allocation exceptions so it's
		// necessary to remove part of the pre-allocated string before appending (or specify capacity explicitly)
		var cmd = new StringBuilder(query);
		// Identify first SELECT field
		var start = query.IndexOf(select, StringComparison.Ordinal);
		if (start >= 0) {
			var nextIdx = query.IndexOf(",", start, StringComparison.Ordinal);
			var fromIdx = query.IndexOf("FROM", start, StringComparison.Ordinal);
			// Support both selection with only one value and multi value selection
			var end = nextIdx < 0 ? fromIdx : Math.Min(nextIdx, fromIdx);
			var from = start + select.Length;
			// Assemble first selected field query
			var firstField = cmd.ToString(from, end - from);
			// Check if the ORDER BY starts with a different field than the first selected field and, in such
			// case, identifies the index where the "redundant" ORDER BY begins
			var orderStart = query.IndexOf(firstField, idx, StringComparison.Ordinal);
			if (orderStart > idx + orderBy.Length + 1)
				idx = orderStart - 2 /* Subtract 2 characters to take account of the trailing comma */;
		}

		// Cut ORDER BY statement or remove it entirely
		command.CommandText = cmd
			.Remove(idx, query.Length - idx)
			.Append(';')
			.ToString();
		return true;
	}
}

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-perf area-query breaking-change closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.