-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index with booleans #2994
Index with booleans #2994
Conversation
Included changes from develop
Thanks so much for the PR submission! Some comments/questions, not meant to be exhaustive at this point: (a) I thought we decided in the discussion in #2735 that if A = [1, 2, 3, 4] then A[[true, false, true]] is a dimension error, but the comment above seems to say this PR does not complain and returns [1,3]. Why not adhere to the recommended design? (b) Reviewing this PR revealed the pre-existing oddity that if B = [1,2;3,4] then (in the parser, easier to test) B[2,2] = 4 and B[2,[2,1]] = [[4,3]] (both pretty much as expected) but B[2,[2]] = 4 . I would expect the latter to produce [[4]]. (Or quite possibly B(2,[2,1]) to produce [4,3] and B(2,[2]) to produce [4], while B([2],[2,1]) produced [[4,3]].) The current behavior seems to prevent certain dimensionality invariants (based on the dimensions of the matrix being indexed and the shape of the indices) from holding true. I bring these up because they produce the very surprising behavior that B([false, true], [true, true])) is [[3,4]] while B([[false,true]],[[true, false]]) is 3. It seems like a problem to me that the type of the result differs for indices that are identical in type and shape and differ only in the value of a single boolean inside the indices. Hence, should I file the indexing oddity of B[2,[2,1]] vs B[2,[2]] as an issue? And should the boolean indexing wait until that newly surfaced issue is resolved? Or another possibility is to leave B(2,[2]) alone but still have B([false,true],[false,true]) return [[4]]. Thoughts, @josdejong ? |
Thanks for the review Glen! (a)I was testing different methods to accomplish this but in the end I couldn't make it work thinking it would introduce breaking changes. The issue I couldn't resolve is.
In the end I thought to present this PR that is comparable to what Octave does even though is not as agreed (b)I think this behavior is related to #2344, it might make some cases simpler but in this case it's very odd. I think that behavior is as intended but let's wait for Jos's comments. In the context of using logic inside of index it's not that odd.
Maybe a proposal could be that if the index comes from an array, then the result should match the source regardless if it has singleton dimensions.
I think these issues could be addressed independently Just as an example this is what the parser does with an array of booleans.
|
On (a), I guessed as much, but it seems like it's a case of implementation pushing behavior around. If the behavior desired is the one agreed on in #2735, then it seems like we need to find an implementation that supports that. For example, it could be that index objects are enhanced to allow vectors of booleans within them, and the "index" function just passes such specifiers through, so that subset can have all of the information and deal with it. There may be other possible implementations that would yield the desired behavior, just brainstorming. On (b),
On the contrary, I find this extremely odd: depending on what the exact condition is, the result might be a vector or a number? That seems like it will be painful, since I will have to then write the expressions that use the result to handle either. It would seem much preferable that whatever the condition is, it should produce an array/vector as a result, that might have just one (or zero) entries.
I agree that the existing shape-of-indexing-results oddity is the underlying problem, but if it were up to me, I'd first resolve that concern (in a separate issue/PR) before merging this boolean-indexing stuff, because of the very counterintuitive results it produces in the Boolean case. There are various options on how to modify the shapes of indexing results, so I would also suggest first an issue or a discussion to hash out the desired behavior, before a PR addressing the point. But let's see how @josdejong wants to go -- another position might be to leave the shape of results of numeric indexing alone, and only fix the vector-vs-number inconsistency for Boolean indexing. That's not what I would do, but it's certainly possible to implement. |
Fair points I agree that the flexibility of showing either a number or an array depending on the case is odd for a program. I was thinking that in a different context wouldn't be that odd. Let's imagine a stack of paper in a box and I try to get them according to an index. If many pages are found they are returned in a manilla envelope, but when only one page is found there is no envelope, I might be glad to see the found page rather than complain about the missing envelope. both a) and b) are consistent with the way numpy indexes, which would be really nice to have in mathjs, but might need more core changes. In fact the other proposed change that an empty index gets you an empty array makes it even more odd that:
As if a set of many pages is presented in an envelope, a single page is without the envelope and no pages found gets you only the empty envelope. So I mostly agree with you, at some point I thought about changing the behavior of index further as you mentioned but thought it might be unwelcome as it wasn't agreed upon. In fact I wasn't sure about returning an empty index (currently it just throws an error) I see some partial functionality that can be used in the meantime something better comes along, but as you mentioned, let's wait for Jos to weigh in. As a reference, the current implementation of broadcasting is missing some optimization as making something like numpy: nditer would require some core changes in the matrix algorithm suite. Currently it is using the mathjs functions to make copies of matrices (requiring more memory) but it's actually working in the meantime an optimized solution can be implemented. |
Thanks David! The logic behind the behavior of the function I do not really have a strong opinion about how For the new filtering behavior it makes sense to me to always return a Matrix/Array, and not unwrap the output when it contains only a single value. |
Hi Jos, thanks for the review Only as a reference, It works like this by default:
And now it includes this behavior with
|
Thanks, that is clear. Whilst there is a clear use case to let |
Here is a possible case for that.
For me it's the same logic as with regular indexing. If the index is a calculated variable, it's uncertain if the index will contain one or many numbers. Of course B could have various values equal to 5, so it's not predictable if the indexing with booleans will return one or many numbers. In the references I know, it usually returns an array (the same as regular indexing in those references). So I don't know if it will also be confusing if the behavior of indexing is different if it comes from an array of booleans. |
Hm, yeah that is true, if you have a variable in I just find it odd to have something like |
Personally I wouldn't. I would prefer a predictable way as it can be used for other steps. I will look into a way for an array of booleans to always return predictable outcomes regarding of config. If there is some confusion later regarding regular indexing we can take a look into it. I think this is an in between state at some point it will need to be able to do something like.
Currently this throws a dimensional error but we will need to take a look at some point regarding broadcasting the 6.
|
In this version:
This behavior of returning a number when indexing with a single number or an array with a single element when indexing with an array of booleans is equivalent to numpy's way of indexing. One difference is that indexing with booleans returns the same number of dimensions as the source but numpy reduces the other dimensions. |
Hi Glen and Jos, I'm really glad you are liking the results so far. Thanks for the comments and review! I agree these are a lot of changes at once, I wasn't planning for this. So thanks for the effort of reviewing this many changes at once and being so understanding of this whole process. |
Hi, I think this latest commit covers most of the issues. Please let me know if something is missing. |
Thanks David for adding even more tests. Last question: do we keep the |
Sorry my bad, I was certain I removed that as soon as you asked, either I messed something up with my setup or removed some of that and forgot about other places I think it's better now. |
OK I took a look through, specific comments above. In general, I think this now leaves prior numeric/numeric array indexing alone, which I think is appropriate, and has the agreed-upon behavior for boolean indexing. So I have no objections to its merge. I will comment a bit on #2344 |
Good catch on this latest fix to your PR. Should you add a test that would have exercised the problem? |
Thanks, Glen Ok, I will include a test where an array to broadcast to a size that is equal to the array will broadcast correctly (and not skip the broadcast process). I was concerned it might be too specific. |
Thanks! IMHO no test for something that actually went awry at any point is too specific. |
Makes sense, I'll keep that in mind, thanks! |
Thanks for reviewing Glen. @dvd101x there is still one open comment from Glen about code duplication. Can you reply on that? Besides that I think the PR is ready to be merged :). |
Thanks Jos, I think Glen's comment about code duplication is addressed with the latest commit. If no other topic is unresolved, I think this is ready to be merged. |
Yup, that's the sort of thing I had in mind, thanks. I have no objections to this being merged. |
Awesome 🎉. Thanks for your patience David! Going to merge your PR now. |
Published now in |
Hi, regarding #2735 here is a way to do indexing with an array of booleans.
new capabilities
Main changes:
index
Checks if it's getting an array of only booleans and converts such arguments to a corresponding vector of numeric indices (comparable to octave and numpy)
index([true, false, true])
is likeindex([1, 0, 1])
index([true, false, true])
works like:index([0, 2])
index([1, 3])
in the parsersubset
Can use an empty
index
(comparable to octave and numpy)subset(value, index)
returns an empty value of the same type if it has an empty indexsubset(value, index, replacement)
, returns the value unmodified if it has an empty index