Cosmos: Deal with missing property values #13131
Comments
Another related aspect of this is shown in the following query:
SELECT * FROM Customers c
WHERE c.Address.City = null
This query will only return documents that have an Address in which the City is set to null. If the whole address is null or missing in a document, that document will not be returned. Assuming that we agree that getting the documents in which the whole Address is null is the most expected behavior, we can easily compensate by rewriting the predicate to add "null protection":
SELECT * FROM Customers c
WHERE c.Address = null OR c.Address.City = null
But still, for any document in which the Address property is completely missing, there doesn't seem to be anything reasonable we can do to compensate. |
@smitpatel and I discussed this further. He found that we could expand null comparisons in predicates to use the IS_DEFINED() function in order to compensate for the current behavior. This would only help with WHERE, not with ORDER BY. However, a predicate that filters a large number of documents based on IS_DEFINED() would presumably be expensive (because it would need to scan all the documents). We are currently discussing this with the Cosmos DB team to see if we get a recommendation or if these are behaviors they would consider changing in the future. |
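To make the proposed expansion concrete, here is a minimal in-memory sketch. It does not talk to Cosmos DB and is not EF Core's translation; it only models the semantics described in this thread, where a path through a missing or null Address counts as undefined. The sample documents and the TryGetCity helper are hypothetical, and the raw string literals assume C# 11 or later.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical sample documents standing in for a Customers container.
string[] json =
{
    """{ "id": "1", "Address": { "City": "Seattle" } }""",
    """{ "id": "2", "Address": { "City": null } }""",
    """{ "id": "3", "Address": null }""",
    """{ "id": "4" }""",
};

var docs = json.Select(j => JsonDocument.Parse(j).RootElement).ToList();

// Models: WHERE c.Address.City = null
// True only when City is defined and its value is null.
var plain = docs.Where(d =>
    TryGetCity(d, out var city) && city.ValueKind == JsonValueKind.Null);

// Models: WHERE NOT IS_DEFINED(c.Address.City) OR c.Address.City = null
var expanded = docs.Where(d =>
    !TryGetCity(d, out var city) || city.ValueKind == JsonValueKind.Null);

Console.WriteLine(string.Join(", ", plain.Select(Id)));    // prints "2"
Console.WriteLine(string.Join(", ", expanded.Select(Id))); // prints "2, 3, 4"

static string? Id(JsonElement doc) => doc.GetProperty("id").GetString();

// Models path navigation: returns false when any segment of
// Address.City is undefined (Address missing, or Address not an object).
static bool TryGetCity(JsonElement doc, out JsonElement city)
{
    city = default;
    return doc.TryGetProperty("Address", out var address)
        && address.ValueKind == JsonValueKind.Object
        && address.TryGetProperty("City", out city);
}
```

The expanded form picks up the documents where Address is null or missing, which is the compensation discussed above. Whether the cost of evaluating IS_DEFINED() over a large container is acceptable is still the open question.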
Assigning to @divega to write conclusion. We will probably not do anything special about this. |
I am following up with the Cosmos DB team to close on some details. My understanding is that they have long-term plans to extend Cosmos DB indexing, so that the behavior will be more predictable for ORDER BY (it will start returning rows for which the property is not defined, either at the end or at the beginning of the query results), but I am not sure whether what they do will help with WHERE directly, or whether it will make expanding In the meantime, I believe we should plan to implement a combination of options:
|
I got additional information from the Cosmos DB team:
This leaves us with only two choices to make:
|
@divega to document triage decisions |
Triage decisions:
|
@divega is this already in the works? Or already implemented in some way? |
@NickSevens we didn't get to it in 3.0. As @AndriySvyryd mentioned, we considered number 2 to be the highest priority. It would be great if you could give some details of when and why you need the method, to help us prioritize. I am clearing the milestone to discuss in triage when this could fit. |
Sure thing @divega, I'm trying to get paged results, in which I'm ordering by an optional property. So essentially I'm calling 2 queries: one ordering the data which has the property, one which doesn't have the property (essentially adding all NULL and Undefined at the end).
var orderedData = context.Items.OrderBy(i => i.MyProp).ToList();
var nulledData = context.Items.Where(i => !i.IsDefined());
orderedData.AddRange(nulledData); |
@AndriySvyryd I have split this into dotnet/EntityFramework.Docs#1712 and because this seems better for tracking purposes. Since we said we won't proactively add compensation for missing properties, I think we can close this as fixed now. Your thoughts? @NickSevens the new issue I created at #17722 should be what you want. Please consider voting for it. Notice that my attempt at a code snippet of how the API would be used looks a bit different from yours. Also, in a conversation with the Cosmos DB team last year, they told me that they were planning to make changes to indexing so that when you issue a SELECT query with something like an ORDER BY on a property, they would start including document instances in which the property is missing. I can follow up on my side, but have you tested this recently on Cosmos DB? |
I have, but that still doesn’t work. I believe NULL values are included, but undefined are not. |
Thanks @NickSevens. I have sent an email to my contacts in the Cosmos DB team to find out if there is an ETA for this being addressed. I agree this increases the priority of providing access to IS_DEFINED() in LINQ queries. |
Awesome. Thanks for looking into it @divega |
@divega We still need to implement decision 2, do you want to split it out as well? |
@AndriySvyryd, ok, I will. @NickSevens I got an answer that might be useful:
|
Closing as duplicate since we haven't done anything yet, but we now have separate issues tracking the actions we decided on. |
I get a bad request when I try to query the following using PowerShell. However, I get the correct results when I query the same directly in the container |
@snehashankar Get-CosmosDbDocument is not owned by the EF team, file an issue at https://github.com/PlagueHO/CosmosDB |
Since Cosmos DB is a schema-less database, it is possible to reference a property that is not defined in all documents in a query. For example, the following query:
Returns the following results for a collection that contains three documents that do not contain MissingProperty:
When EF Core models mapped to Cosmos DB evolve, we expect that it will be common to use a new version of an entity type that contains a property that is not defined in existing documents already stored in the database.
From the perspective of materialization, this could be dealt with by just skipping properties that are missing in the store. This would result in the properties on the objects keeping whatever value they were initialized to. For example, for optional properties, a missing value in the store would become equivalent to the property being null.
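As a rough illustration of the "skip missing properties" idea, the sketch below hand-materializes an object from a JSON document. This is not how EF Core's materializer is implemented; the Customer type, its properties, and the Materialize helper are all hypothetical, and the raw string literal assumes C# 11 or later.

```csharp
#nullable enable
using System;
using System.Text.Json;

// A stored document written before Nickname and Status were added to the model.
var doc = JsonDocument.Parse("""{ "id": "1", "Name": "Ada" }""").RootElement;

var customer = Materialize(doc);

// prints "Ada, (null), Active"
Console.WriteLine($"{customer.Name}, {customer.Nickname ?? "(null)"}, {customer.Status}");

// Assigns only the properties that are present in the document;
// anything missing keeps the value set by the initializer.
static Customer Materialize(JsonElement doc)
{
    var customer = new Customer();
    if (doc.TryGetProperty("Name", out var name)) customer.Name = name.GetString();
    if (doc.TryGetProperty("Nickname", out var nick)) customer.Nickname = nick.GetString();
    if (doc.TryGetProperty("Status", out var status)) customer.Status = status.GetString() ?? "";
    return customer;
}

// Hypothetical entity type used only for this sketch.
class Customer
{
    public string? Name { get; set; }
    public string? Nickname { get; set; }      // optional: missing in store reads as null
    public string Status { get; set; } = "Active"; // missing in store keeps the default
}
```

The optional Nickname ends up null, and Status keeps its initializer value, which matches the behavior described above.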
However, there is an important caveat with this approach: because of how indexing works in Cosmos DB, queries that reference the missing property anywhere other than in the projection could return unexpected results. For example:
If a property that is missing in some documents is referenced in a predicate that tests it against null, only documents that contain the property will be returned.
If a property that is missing in some documents is referenced in the sort expression of an ORDER BY clause, documents that contain the property with any value (including null) will be sorted, but documents that do not define the property will be filtered out because they are not in the index used to resolve ORDER BY.
Although for ORDER BY there is a way we could compensate by issuing two separate queries (the first one to get all the data, and the second one to get the order and potentially less data), it seems that this could be relatively expensive. This approach would not help for the WHERE clause case because it could require all the data from the collection to be retrieved.
But for WHERE we could use IS_DEFINED (see #13131 (comment)).
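The ORDER BY behavior and one possible two-step compensation can be modeled in memory as follows. This is only a sketch of the semantics described in this issue, not a query against Cosmos DB; the Rank property and the sample documents are hypothetical, and a missing dictionary key stands in for an undefined property. The second step here fetches only the documents without the property, which is one variant of the two-query idea rather than the exact split described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each document is a property bag; a missing key models an undefined property.
var docs = new List<Dictionary<string, object?>>
{
    new() { ["id"] = "1", ["Rank"] = 2 },
    new() { ["id"] = "2", ["Rank"] = null },
    new() { ["id"] = "3" },                  // Rank is not defined
    new() { ["id"] = "4", ["Rank"] = 1 },
};

// Models ORDER BY c.Rank: documents without Rank are dropped,
// and null sorts before numeric values.
var ordered = docs
    .Where(d => d.ContainsKey("Rank"))
    .OrderBy(d => d["Rank"] as int?)
    .ToList();

Console.WriteLine(string.Join(", ", ordered.Select(d => d["id"]))); // prints "2, 4, 1"

// Models a second query with WHERE NOT IS_DEFINED(c.Rank),
// appended so that no document is lost.
var missing = docs.Where(d => !d.ContainsKey("Rank"));
ordered.AddRange(missing);

Console.WriteLine(string.Join(", ", ordered.Select(d => d["id"]))); // prints "2, 4, 1, 3"
```

Placing the documents without the property at the end is an arbitrary choice in this sketch; they could equally be placed first.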
What can we do?
The alternatives I can see are: