Index with booleans #2994

dvd101x · 2023-07-09T17:24:11Z

Hi, regarding #2735 here is a way to do indexing with an array of booleans.

new capabilities

a = [1, 2, 3, 4];
a[[true, false, true, false]]   # yields [1, 3]
a[a > 2]                        # yields [3, 4]

b = [1, 2, 4, 8]; 
a[b < 4]                        # yields [1, 2]

a[a > 2 and a < 4]              # yields 3
a[a > 2 and b < 8]              # yields 3
a[a > 2 and b < 3]              # yields []

A = [1, 2; 3, 4];
A[[false, true], 2]             # yields 4

Main changes:

`index`

Checks whether it receives an array containing only booleans and converts such arguments to the corresponding vector of numeric indices (comparable to Octave and NumPy).

Currently, index([true, false, true]) behaves like index([1, 0, 1]).
The proposal is that index([true, false, true]) works like:
- index([0, 2]) in JavaScript (zero-based)
- index([1, 3]) in the parser (one-based)
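
As a rough illustration of the conversion described above, here is a minimal sketch in plain JavaScript. It is independent of the mathjs source: `booleansToIndices` and its `base` parameter are hypothetical names chosen for this example, not mathjs functions.

```javascript
// Hypothetical helper, not part of mathjs: turn a boolean mask into the
// positions of its `true` entries. `base` is 0 for JavaScript-style indexing
// and 1 for the one-based indexing used by the expression parser.
function booleansToIndices (mask, base = 0) {
  if (!mask.every(item => typeof item === 'boolean')) {
    throw new TypeError('Expected an array containing only booleans')
  }
  const indices = []
  mask.forEach((flag, position) => {
    if (flag) indices.push(position + base)
  })
  return indices
}

console.log(booleansToIndices([true, false, true]))    // [ 0, 2 ]
console.log(booleansToIndices([true, false, true], 1)) // [ 1, 3 ]
```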

`subset`

Can use an empty index (comparable to Octave and NumPy).

subset(value, index) returns an empty value of the same type if the index is empty
subset(value, index, replacement) returns the value unmodified if the index is empty
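
The empty-index rules can be sketched as follows for the one-dimensional array case. This is only an illustration under simplifying assumptions: `getSubset` and `setSubset` are hypothetical helpers, not the mathjs `subset` implementation, and they take plain arrays of zero-based positions.

```javascript
// Hypothetical one-dimensional helpers, not the mathjs implementation.

// Reading: an empty list of positions selects nothing, so the result is an
// empty array (an empty value of the same type as the input).
function getSubset (values, positions) {
  return positions.map(i => values[i])
}

// Writing: an empty list of positions replaces nothing, so the result is an
// unmodified copy of the input.
function setSubset (values, positions, replacement) {
  const result = values.slice()
  positions.forEach((i, k) => { result[i] = replacement[k] })
  return result
}

console.log(getSubset([10, 20, 30], []))     // []
console.log(setSubset([10, 20, 30], [], [])) // [ 10, 20, 30 ]
```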

Note:
It doesn't check whether the size of the array of booleans matches the dimension of the value in subset (similar to Octave, but unlike NumPy).

Included changes from develop

gwhitney · 2023-07-10T17:59:39Z

Thanks so much for the PR submission!

Some comments/questions, not meant to be exhaustive at this point:

(a) I thought we decided in the discussion in #2735 that if A = [1, 2, 3, 4] then A[[true, false, true]] is a dimension error, but the comment above seems to say this PR does not complain and returns [1,3]. Why not adhere to the recommended design?

(b) Reviewing this PR revealed the pre-existing oddity that if B = [1,2;3,4] then (in the parser, easier to test) B[2,2] = 4 and B[2,[2,1]] = [[4,3]] (both pretty much as expected) but B[2,[2]] = 4 . I would expect the latter to produce [[4]]. (Or quite possibly B(2,[2,1]) to produce [4,3] and B(2,[2]) to produce [4], while B([2],[2,1]) produced [[4,3]].) The current behavior seems to prevent certain dimensionality invariants (based on the dimensions of the matrix being indexed and the shape of the indices) from holding true. I bring these up because they produce the very surprising behavior that B([false, true], [true, true]) is [[3,4]] while B([[false,true]],[[true, false]]) is 3. It seems like a problem to me that the type of the result differs for indices that are identical in type and shape and differ only in the value of a single boolean inside the indices. Hence, should I file the indexing oddity of B[2,[2,1]] vs B[2,[2]] as an issue? And should the boolean indexing wait until that newly surfaced issue is resolved? Or another possibility is to leave B(2,[2]) alone but still have B([false,true],[false,true]) return [[4]]. Thoughts, @josdejong ?

dvd101x · 2023-07-10T19:51:07Z

Thanks for the review Glen!

(a)

I tested different methods to accomplish this, but in the end I couldn't make it work, as I thought it would introduce breaking changes. The issue I couldn't resolve is:

subset has the information about the size of the matrix and receives an index (not the array of booleans), so it doesn't know the size of the array of booleans that generated the index.
index is the one that receives the array (and checks whether it contains only booleans), but it doesn't know the size of the matrix it will be used with.

In the end I decided to present this PR, which is comparable to what Octave does, even though it is not what was agreed.

(b)

I think this behavior is related to #2344, it might make some cases simpler but in this case it's very odd. I think that behavior is as intended but let's wait for Jos's comments.

In the context of using logic inside of index it's not that odd.

A = [10, 20, 30, 40];
A[A > 20] # yields [30, 40]
A[A < 20] # yields 10

Maybe a proposal could be that if the index comes from an array, then the result should match the source regardless if it has singleton dimensions.

A = [1, 2; 3, 4]
A[1, 2]          # shall return number 2
A[[1], 2]        # shall return [2]
A[1, [2]]        # shall return [2]
A[[1], [2]]      # shall return [2]

I think these issues could be addressed independently

Just as an example this is what the parser does with an array of booleans.

B = [10, 20, 30];
B[[true]]             # yields 10
B[[true, true]]       # yields [10, 10]
B[[true, true, true]] # yields [10, 10, 10]
B[B > 5]              # yields [10, 10, 10]

gwhitney · 2023-07-10T23:21:25Z

On (a), I guessed as much, but it seems like it's a case of implementation pushing behavior around. If the behavior desired is the one agreed on in #2735, then it seems like we need to find an implementation that supports that. For example, it could be that index objects are enhanced to allow vectors of booleans within them, and the "index" function just passes such specifiers through, so that subset can have all of the information and deal with it. There may be other possible implementations that would yield the desired behavior, just brainstorming.

On (b),

> In the context of using logic inside of index it's not that odd.
> A = [10, 20, 30, 40];
> A[A > 20] # yields [30, 40]
> A[A < 20] # yields 10

On the contrary, I find this extremely odd: depending on what the exact condition is, the result might be a vector or a number? That seems like it will be painful, since I will have to then write the expressions that use the result to handle either. It would seem much preferable that whatever the condition is, it should produce an array/vector as a result, that might have just one (or zero) entries.

> Maybe a proposal could be that if the index comes from an array, then the result should match the source regardless if it has singleton dimensions.
> A = [1, 2; 3, 4]
> A[1, 2]          # shall return number 2
> A[[1], 2]        # shall return [2]
> A[1, [2]]        # shall return [2]
> A[[1], [2]]      # shall return [2]
> I think these issues could be addressed independently

I agree that the existing shape-of-indexing-results oddity is the underlying problem, but if it were up to me, I'd first resolve that concern (in a separate issue/PR) before merging this boolean-indexing stuff, because of the very counterintuitive results it produces in the Boolean case. There are various options on how to modify the shapes of indexing results, so I would also suggest first an issue or a discussion to hash out the desired behavior, before a PR addressing the point. But let's see how @josdejong wants to go -- another position might be to leave the shape of results of numeric indexing alone, and only fix the vector-vs-number inconsistency for Boolean indexing. That's not what I would do, but it's certainly possible to implement.

dvd101x · 2023-07-11T01:02:22Z

Fair points

I agree that the flexibility of showing either a number or an array depending on the case is odd for a program.

I was thinking that in a different context it wouldn't be that odd. Let's imagine a stack of paper in a box, and I try to get pages according to an index. If many pages are found, they are returned in a manila envelope, but when only one page is found there is no envelope. I might be glad to see the found page rather than complain about the missing envelope.

Both (a) and (b) are consistent with the way NumPy indexes, which would be really nice to have in mathjs, but might need more core changes.

In fact the other proposed change that an empty index gets you an empty array makes it even more odd that:

A = [10, 20, 30, 40];
A[A > 20] # yields [30, 40]
A[A < 20] # yields 10
A[A > 50] # yields [ ]

As if a set of many pages is presented in an envelope, a single page is without the envelope and no pages found gets you only the empty envelope.

So I mostly agree with you, at some point I thought about changing the behavior of index further as you mentioned but thought it might be unwelcome as it wasn't agreed upon. In fact I wasn't sure about returning an empty index (currently it just throws an error)

I see some partial functionality that can be used until something better comes along, but as you mentioned, let's wait for Jos to weigh in.

As a reference, the current implementation of broadcasting is missing some optimization, since building something like NumPy's nditer would require some core changes in the matrix algorithm suite. Currently it uses the mathjs functions to make copies of matrices (requiring more memory), but it works until an optimized solution can be implemented.

josdejong · 2023-07-12T13:48:18Z

Thanks David!

The logic behind the behavior of the function subset is: when the output contains a single, scalar value, unwrap it from the matrix and return the (numeric) value. When the output contains multiple values, return a matrix. I'm just happy with the current behavior, it is very practical when getting/setting individual values in a matrix that you don't have to worry about wrapping/unwrapping the value correctly. Also, I do like this proposal in #2344 to be able to disable this "smart unwrapping" feature if you need predictability.

I do not really have a strong opinion about how A[[1], 2] and A[1, [2]] should behave. I think in practice you will not enter a nested array with just one value, so it is a bit of a theoretic case to me. It makes sense to let the output be an array then, instead of the current behavior of returning an unwrapped number.

For the new filtering behavior it makes sense to me to always return a Matrix/Array, and not unwrap the output when it contains only a single value.
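
The two policies discussed here can be contrasted in a small plain-JavaScript sketch. The helper names `unwrapIfSingle` and `filterByMask` are hypothetical and only illustrate the idea; they are not taken from the mathjs source.

```javascript
// Hypothetical sketch of the two policies; not mathjs code.

// "Smart unwrapping": a selection holding exactly one value is returned as
// the bare value, anything else stays an array.
function unwrapIfSingle (selected) {
  return selected.length === 1 ? selected[0] : selected
}

// Filtering with a boolean mask: always returns an array, whether it ends up
// with zero, one, or many entries.
function filterByMask (values, mask) {
  return values.filter((_, i) => mask[i])
}

const A = [10, 20, 30, 40]
console.log(unwrapIfSingle([A[0]]))               // 10
console.log(filterByMask(A, A.map(x => x > 20)))  // [ 30, 40 ]
console.log(filterByMask(A, A.map(x => x < 20)))  // [ 10 ]
console.log(filterByMask(A, A.map(x => x > 50)))  // []
```

With the second policy, code that consumes the result never has to branch on whether it received a number or an array.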

dvd101x · 2023-07-12T20:05:24Z

Hi Jos, thanks for the review

Only as a reference, it works like this by default:

A = [10, 20, 30, 40];
A[A > 20] # yields [30, 40]
A[A < 20] # yields 10
A[A > 50] # yields [ ]

And now it includes this behavior with math.config({ predictable: true }), according to #2344:

A = [10, 20, 30, 40];
A[A > 20] # yields [30, 40]
A[A < 20] # yields [10]
A[A > 50] # yields [ ]

josdejong · 2023-07-13T13:32:39Z

Thanks, that is clear.

Whilst there is a clear use case to let A[0] return a number 10 (and I would like to keep that behavior like it is), I do not see a use case for the filtering functionality returning a number, since a filtering operation in general has an array as input and an array with an arbitrary number of items as output. So I would like A[A < 20] to return [10] rather than 10 (always predictable). What do you think about that? Does that make sense to you? Or do you see use cases for that?

dvd101x · 2023-07-13T14:53:46Z

Here is a possible case for that.

A = [1, 2, 3];
B = [4, 5, 6];
A[B == 5] # yields 2 or [2]

For me it's the same logic as with regular indexing. If the index is a calculated variable, it's uncertain if the index will contain one or many numbers.

Of course B could have various values equal to 5, so it's not predictable if the indexing with booleans will return one or many numbers.

In the references I know, it usually returns an array (the same as regular indexing in those references). So I don't know if it will also be confusing if the behavior of indexing is different if it comes from an array of booleans.

josdejong · 2023-07-13T15:07:27Z

Hm, yeah that is true, if you have a variable in A[x], the output depends on whether x holds a number or a range for example. So, yeah, there is some logic to it.

I just find it odd to have something like A[A < 20] or A[B == 5] return a number. Would you want that in practice?

dvd101x · 2023-07-13T15:24:09Z

Personally I wouldn't. I would prefer a predictable way as it can be used for other steps.

I will look into a way for an array of booleans to always return predictable outcomes regardless of config. If there is some confusion later regarding regular indexing, we can take a look into it.

I think this is an in-between state; at some point it will need to be able to do something like:

A = 1:4;
A[A > 2] = 6;
A          # yields [1, 2, 6, 6]

Currently this throws a dimension error, but at some point we will need to look into broadcasting the 6.

A[3:4] = 6
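
A minimal sketch of what such a broadcasting assignment could look like, written in plain JavaScript for the one-dimensional case. `assignWhere` is a hypothetical helper used only for illustration; it is not how mathjs implements replacement.

```javascript
// Hypothetical helper, not mathjs code: replace the entries selected by a
// boolean mask. A scalar replacement is broadcast to every selected entry;
// an array replacement must supply exactly one value per selected entry.
function assignWhere (values, mask, replacement) {
  if (mask.length !== values.length) {
    throw new RangeError(`Dimension mismatch (${mask.length} != ${values.length})`)
  }
  const selectedCount = mask.filter(Boolean).length
  const isScalar = !Array.isArray(replacement)
  if (!isScalar && replacement.length !== selectedCount) {
    throw new RangeError(`Dimension mismatch (${replacement.length} != ${selectedCount})`)
  }
  let next = 0 // position of the next unused replacement value
  return values.map((value, i) => {
    if (!mask[i]) return value
    return isScalar ? replacement : replacement[next++]
  })
}

const A = [1, 2, 3, 4]
console.log(assignWhere(A, A.map(x => x > 2), 6))      // [ 1, 2, 6, 6 ]
console.log(assignWhere(A, A.map(x => x > 2), [7, 8])) // [ 1, 2, 7, 8 ]
```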

dvd101x · 2023-07-14T14:50:52Z

In this version:

The Index class now:
- Converts an array of booleans to an array of numbers
- Has a property to indicate the size of the array of booleans

subset now:
- Returns predictable subsets when indexing with an array of booleans
- For other cases, returns predictable subsets if config.predictable is set
- Validates that the array of booleans has the same size as the Array|Matrix, and throws a dimension error otherwise
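
The idea of letting the index remember the size of the boolean array, so that the consumer can validate it against the value, can be sketched as below. This is a plain-JavaScript illustration under simplifying assumptions (one dimension, zero-based positions); `createIndex`, `getSubset`, and the `sourceSize` field are hypothetical names, not the mathjs Index class or its properties.

```javascript
// Hypothetical sketch in plain JavaScript; this is not the mathjs Index class.
function createIndex (dimension) {
  const isMask = dimension.length > 0 &&
    dimension.every(item => typeof item === 'boolean')
  if (!isMask) {
    // Numeric positions: nothing to remember about a mask.
    return { positions: dimension.slice(), sourceSize: null }
  }
  const positions = []
  dimension.forEach((flag, i) => { if (flag) positions.push(i) })
  // Remember how long the mask was so the consumer can validate it later.
  return { positions, sourceSize: dimension.length }
}

function getSubset (values, index) {
  if (index.sourceSize !== null && index.sourceSize !== values.length) {
    throw new RangeError(
      `Dimension mismatch (${index.sourceSize} != ${values.length})`)
  }
  return index.positions.map(i => values[i])
}

const A = [1, 2, 3, 4]
console.log(getSubset(A, createIndex([true, false, true, false]))) // [ 1, 3 ]

try {
  getSubset(A, createIndex([true, false, true]))
} catch (err) {
  console.log(err.message) // Dimension mismatch (3 != 4)
}
```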

Returning a number when indexing with a single number, and an array with a single element when indexing with an array of booleans, is equivalent to NumPy's way of indexing.

One difference is that indexing with booleans returns the same number of dimensions as the source, whereas NumPy reduces the other dimensions.

dvd101x · 2023-07-20T05:32:57Z

Hi Glen and Jos,

I'm really glad you are liking the results so far. Thanks for the comments and review!

I agree these are a lot of changes at once; I wasn't planning for this. So thanks for the effort of reviewing this many changes at once and being so understanding of this whole process.

dvd101x · 2023-07-22T00:48:26Z

Hi, I think this latest commit covers most of the issues. Please let me know if something is missing.

josdejong · 2023-07-24T11:55:41Z

Thanks David for adding even more tests.

Last question: do we keep the config.predictable in this PR or not? See #2994 (comment). I see you marked the discussion as resolved, does that mean that you do want to keep it in this PR (so we'll schedule this PR for a new major version)?

dvd101x · 2023-07-24T15:00:09Z

Sorry, my bad. I was certain I had removed that as soon as you asked; either I messed something up with my setup, or I removed some of it and forgot about other places.

I think it's better now.

docs/datatypes/matrices.md

src/function/matrix/subset.js

src/type/matrix/MatrixIndex.js

gwhitney · 2023-07-24T19:19:02Z

OK I took a look through, specific comments above. In general, I think this now leaves prior numeric/numeric array indexing alone, which I think is appropriate, and has the agreed-upon behavior for boolean indexing. So I have no objections to its merge. I will comment a bit on #2344

gwhitney · 2023-07-25T15:59:48Z

Good catch on this latest fix to your PR. Should you add a test that would have exercised the problem?

dvd101x · 2023-07-25T20:18:48Z

Thanks, Glen

OK, I will include a test where an array being broadcast to a size equal to its own is handled correctly (and the broadcast process is not skipped). I was concerned it might be too specific.

gwhitney · 2023-07-25T21:06:43Z

> I was concerned it might be too specific.

Thanks! IMHO no test for something that actually went awry at any point is too specific.

dvd101x · 2023-07-26T00:43:49Z

Makes sense, I'll keep that in mind, thanks!

josdejong · 2023-07-27T10:11:56Z

Thanks for reviewing Glen.

@dvd101x there is still one open comment from Glen about code duplication. Can you reply on that? Besides that I think the PR is ready to be merged :).

dvd101x · 2023-07-27T17:59:55Z

Thanks Jos,

I think Glen's comment about code duplication is addressed with the latest commit.

If no other topic is unresolved, I think this is ready to be merged.

gwhitney · 2023-07-28T00:15:45Z

Yup, that's the sort of thing I had in mind, thanks. I have no objections to this being merged.

josdejong · 2023-07-28T07:29:28Z

Awesome 🎉. Thanks for your patience David! Going to merge your PR now.

josdejong · 2023-08-23T14:08:40Z

Published now in v11.10.0, thanks again!

dvd101x and others added 13 commits April 24, 2023 16:40

Included math to syntax when missing

104310f

Included solveODE

431ee46

renamed initialStep as firstStep

1475b41

Included tests for solveODE

b4cac47

Test the full state instead of the final state

ca78866

Fixed issue with tolerance

bc0dd3f

Merge branch 'develop' into master

ffc5a2f

Indexing with an array of booleans

af74198

Merge pull request #1 from josdejong/develop

2884488

Included changes from develop

Indexing with booleans and with empty

219745d

Changed index embedded docs

2f12038

removed solveODE

f5f96dd

typos on tests

436e566

included config.predictable

5b07776

dvd101x added 2 commits July 14, 2023 08:04

Throws an error if the size doesn't match

9dbed0a

Included config predictable to get subset

c7f5135

dvd101x added 3 commits July 14, 2023 09:23

Merge branch 'develop' into index-with-booleans

b89f023

Can do replacement by broadcasting

b71c2f9

DenseMatrix set can broadcast first

c175020

Test coverage for subset

4691ad0

Merge branch 'develop' into index-with-booleans

f02a86b

dvd101x added 3 commits July 24, 2023 14:37

Removed config predictable from subset

2726573

Merge branch 'index-with-booleans' of https://github.com/dvd101x/mathjs…

64f105e

… into index-with-booleans

Removed config from index and sparseMatrix

2221408

gwhitney reviewed Jul 24, 2023

gwhitney mentioned this pull request Jul 24, 2023

subset of matrix should always return the same type (matrix) #2344

Open

dvd101x added 3 commits July 24, 2023 22:55

Redaction and typos

748a2aa

Cleanup unnecesary changes

57d36e9

fixed issue when there is no need to broadcast

f77905a

Inline ifs

28f6e6d

Included specific broadcasting test

45745f5

Reduced repetition

4f8c1de

Merge branch 'develop' into index-with-booleans

f63423c

josdejong merged commit 49c793b into josdejong:develop Jul 28, 2023

dvd101x deleted the index-with-booleans branch July 28, 2023 17:45
