Drop identity values problem #28

simpletonDL · 2019-10-15T12:58:51Z

Hello, I don't understand how to keep GraphBLAS from storing zeros (identity values) as explicit entries. I found the following in the documentation:

The entries in the pattern of A can take on any value, including the implicit value, whatever it happens to be. This differs slightly from MATLAB, which always drops all explicit zeros from its sparse matrices. This is a minor difference but it cannot be done in GraphBLAS.

What should I do if I want to always drop identity values after some operations?
Below is a simple example of a matrix multiplication that generates an identity (zero) value.

    GrB_Matrix a, b;
    GrB_Matrix_new(&a, GrB_INT64, 2, 2);
    GrB_Matrix_new(&b,GrB_INT64, 2, 2);

    GrB_Matrix_setElement(a, 2, 0, 0);
    GrB_Matrix_setElement(a, -2, 0, 1);
    GrB_Matrix_setElement(b, 1, 0, 0);
    GrB_Matrix_setElement(b, 1, 1, 0);

    GrB_Monoid monoid;
    GrB_Semiring semiring;

    GrB_Monoid_new_INT64(&monoid, GrB_PLUS_INT64, (int64_t) 0);
    GrB_Semiring_new(&semiring, monoid, GrB_TIMES_INT64);

    GrB_Matrix matrix_new;
    GrB_Matrix_new(&matrix_new, GrB_INT64, 2, 2);

    GrB_mxm(matrix_new, GrB_NULL, GrB_NULL, semiring, a, b, GrB_NULL);
    GxB_print(matrix_new, GxB_SHORT);

The output matrix contains one entry that is equal to zero:

...
row: 0 : 1 entries [0:0]
    column 0: int64 0

In the real task, I need to use custom types and custom operations, but first I want to solve this small problem. Can you help me, please?

DrTimothyAldenDavis · 2019-10-15T14:57:55Z

A very good question. It points out a feature of GraphBLAS, since "zero" can differ depending on the semiring (in a path distance problem, for example, an edge of weight zero is very different than no edge at all). So zeros cannot be dropped automatically inside GraphBLAS.

But there are cases when you do want to delete entries, like all explicit zeros.

It takes a second step to delete entries from a matrix. If you are using SuiteSparse:GraphBLAS, then you can use the following to drop explicit zeros from the GrB_Matrix A. This works for any matrix, including any user-defined type.

GxB_select (A, NULL, NULL, GxB_NONZERO, A, NULL, NULL) ;

GxB_select can also be used to drop any other particular value (or range of values, using, say, GxB_GT_ZERO, which keeps only those entries greater than zero, dropping values that are zero or less). GxB_GT_ZERO only works for the 11 built-in types, while GxB_NONZERO works for any type, including user-defined types. For user-defined types, it checks to see if the bit pattern is all zero, and keeps those that have at least one 1 bit in them. So if your typedef is a struct with "holes" in it, this might not always work as expected.
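
The caveat about "holes" can be sketched in plain C. The struct and function names below are made up for illustration, and the bitwise test is only a guess at the spirit of what is described above, not the library's actual code. The point is that a byte-by-byte test also looks at padding bytes, whose contents C does not pin down after ordinary member assignments, while a field-by-field test ignores them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-defined type. Most ABIs insert padding bytes
   after 'tag' so that 'amount' is properly aligned. */
typedef struct {
    char    tag;
    int64_t amount;
} my_entry;

/* Sets every byte of the entry, padding included, to zero. */
static void clear_entry(my_entry *e)
{
    assert(e != NULL);
    memset(e, 0, sizeof *e);
}

/* Bitwise test: true only if every byte, padding included, is 0. */
static bool is_zero_bitwise(const my_entry *e)
{
    const unsigned char *bytes = (const unsigned char *) e;
    for (size_t k = 0; k < sizeof(my_entry); k++) {
        if (bytes[k] != 0) {
            return false;
        }
    }
    return true;
}

/* Field-by-field test: ignores whatever the padding bytes contain. */
static bool is_zero_fieldwise(const my_entry *e)
{
    return e->tag == 0 && e->amount == 0;
}
```

One way to keep the two tests in agreement is to clear the whole struct with memset before filling in its fields, and to avoid copying entries by plain struct assignment where the padding matters.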

If you are using another GraphBLAS library, you need to use the matrix as its own mask (assuming A has a built-in type, not a user-defined type).

GrB_assign (A, A, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;

where Replace is a descriptor with the replace option turned on.

If A has a user-defined type, you first have to create a boolean matrix, where M(i,j) = 0 if A(i,j) is zero, or M(i,j)=1 otherwise. That can be done with a user-defined typecast function, via GrB_apply:

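void my_typecast_func (void *z, const void *x)
{
     // x points to one entry of the user-defined type.
     // my_type_is_zero is a hypothetical user-written test that returns
     // true if that entry counts as zero; it is not part of GraphBLAS.
     bool result = !my_type_is_zero (x) ;
     *((bool *) z) = result ;
}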

GrB_UnaryOp_new (&My_typecast_function, my_typecast_func, GrB_BOOL, My_type) ;
GrB_Matrix_new (&M, GrB_BOOL, nrows, ncols) ;
GrB_apply (M, NULL, NULL, My_typecast_function, A, NULL) ;
GrB_assign (A, M, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;

(technically speaking, all the "NULL"s above should be GrB_NULL ... but NULL works the same as GrB_NULL in SuiteSparse:GraphBLAS).
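
The two-step recipe (build a boolean mask, then keep only the masked entries) can be modeled in plain C on a list of tuples. This is only a sketch of the idea: `my_type`, `toy_tuple`, `toy_apply` and `toy_keep_masked` are invented here and stand in for the user-defined type, GrB_apply and the masked GrB_assign with replace. Real GraphBLAS objects are opaque and work differently inside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-defined type and toy tuple storage. */
typedef struct { double re, im; } my_type;
typedef struct { size_t row, col; my_type val; } toy_tuple;

/* Same signature as the typecast function above: z = (x is nonzero). */
static void my_typecast_func(void *z, const void *x)
{
    const my_type *v = (const my_type *) x;
    *((bool *) z) = (v->re != 0.0 || v->im != 0.0);
}

/* Step 1 (plays the role of GrB_apply): mask[k] = op(t[k].val). */
static void toy_apply(bool *mask, const toy_tuple *t, size_t n,
                      void (*op)(void *, const void *))
{
    assert(mask != NULL && t != NULL && op != NULL);
    for (size_t k = 0; k < n; k++) {
        op(&mask[k], &t[k].val);
    }
}

/* Step 2 (plays the role of the masked assign with replace):
   keep t[k] only where mask[k] is true. Returns the new count. */
static size_t toy_keep_masked(toy_tuple *t, const bool *mask, size_t n)
{
    assert(t != NULL && mask != NULL);
    size_t kept = 0;
    for (size_t k = 0; k < n; k++) {
        if (mask[k]) {
            t[kept++] = t[k];
        }
    }
    return kept;
}
```

Calling toy_apply and then toy_keep_masked on three tuples whose middle value is (0, 0) leaves two tuples, in their original order.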

tgmattso · 2019-10-15T15:35:42Z

You know, that GxB_select() is a darn useful function. We should add it to the next GraphBLAS release.

DrTimothyAldenDavis · 2019-10-15T17:09:02Z

Yes, GxB_select is very useful. I used it for both MIT GraphChallenge solutions, and for some parts of LAGraph. The triangle count needs the equivalent of L=tril(A) in MATLAB (extract the lower triangular part). That is tricky to do in pure GraphBLAS. You can't do it with a mask. The only way to do it is with GrB_extractTuples, and then delete the tuples you don't want. Tedious... I also needed it for the ReLU, to drop values that were less than or equal to zero. So it seems to be an important function. GxB_select acts kind of like a functional mask, which GraphBLAS doesn't have.
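
For a sense of what the "extract the tuples, then delete the ones you don't want" route involves, here is a plain C sketch of the filtering step only. It assumes the row indices, column indices and values have already been extracted into three parallel arrays; `toy_tril` is a made-up helper, and the extraction and the rebuild of the matrix are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep only tuples on or below the diagonal (row >= col), compacting
   the three parallel arrays in place. Returns the number kept. */
static size_t toy_tril(uint64_t *rows, uint64_t *cols, double *vals, size_t n)
{
    assert(rows != NULL && cols != NULL && vals != NULL);
    size_t kept = 0;
    for (size_t k = 0; k < n; k++) {
        if (rows[k] >= cols[k]) {
            rows[kept] = rows[k];
            cols[kept] = cols[k];
            vals[kept] = vals[k];
            kept++;
        }
    }
    return kept;
}
```

The surviving tuples would then be used to build a new matrix, which is where the tedium mentioned above comes from.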

gsvgit · 2019-10-16T05:52:51Z

Hello.

I have the same question. I clearly understand why it is not a good idea to remove zero values. But what if I explicitly specify zero as the identity in the monoid? In the path distance problem the identity is not zero, so we should not delete zeros, but I think that we can remove minus infinity, which is the identity. So the question is about identities: is it possible to drop identities automatically during operations over sparse matrices?

DrTimothyAldenDavis · 2019-10-16T10:13:22Z

A matrix doesn’t remain in a single semiring in an algorithm. It can be used in multiple semirings, and there are several examples of this. So the value that isn’t there is suddenly different: it changes with the semiring. As a result, it’s impossible to automatically drop any values.

ScottKolo · 2019-10-16T14:31:06Z

I agree that this is not something that should be done automatically, but it would be convenient to have a utility method or canonical way of doing it. The GxB_select approach seems to be that, so I also agree with the calls to get that into the standard (this application alone justifies it in my opinion).

I think the usual argument here is that not dropping identity values in some cases could result in a lot of fill-in down the road, leading to performance issues.

Maybe an LAGraph utility function would be a nice middle ground?

DrTimothyAldenDavis · 2019-10-16T15:24:25Z

Yes, adding it to LAGraph would be a good idea. It would use an #ifdef so that the GxB_select can be used if SuiteSparse:GraphBLAS is in use, and would use GrB* functions otherwise. I have a function in my MATLAB interface to do this as well, as A = GrB.prune (A). By default, it prunes zeros. To prune other values equal to the identity id, use A = GrB.prune (A, id).

aydinbuluc · 2019-10-16T15:26:37Z

I am actually surprised that we managed to not include an easy way to do this in GraphBLAS. Prune has always existed in CombBLAS (by now, for a decade).

gsvgit · 2019-10-16T15:30:34Z

Ah... I see. @DrTimothyAldenDavis thank you for the explanation!

And even for an operation like GrB_mxm, where we must specify the semiring, do we still not have enough information to drop the identities of that semiring automatically?
Suppose the following case.

1. I have sparse matrices A and B without explicit zero values.
2. I perform the matrix multiplication A * B over a semiring where zero is the identity. The result is a matrix C in which some cells hold an explicit zero (because we do not drop it) and some cells hold an implicit zero. The first (and, for me, principal) question is why the behavior of the operation is not consistent with the specified semiring. The second was mentioned by @ScottKolo: such behavior can lead to poor performance.
3. Now I want to use C in an operation over a semiring in which zero is not the identity. And now I'm confused, because in terms of the result type all implicit and explicit zeros are equal, but in terms of the argument type (C is an argument of an operation over another semiring) implicit values and explicit zeros are different.

So, I guess that

1. The behavior of an operation with a specified semiring should be consistent with that semiring.
2. If I want to switch from one semiring to another, I should do it explicitly, for example by using the select function.

DrTimothyAldenDavis · 2019-10-16T15:52:17Z

Automatic dropping of zeros (say in MATLAB) is an awful thing to do. But it's perfect to add as a non-default option, where the user is able to prune things easily at any time. But it can't be done automatically, for many reasons:

First of all, it breaks the semirings in GraphBLAS. Switching between semirings causes all implicit values to change but not explicit values, so the explicit zero is never the same thing as an implicit entry that is not present in the pattern. The matrix has no tag that tells what semiring it's in, nor a tag to say what the implicit value is, so there's no select function to change a matrix from one semiring to another.

Second, it destroys all the graph theoretic structure in the resulting matrices. There are things I could do inside MATLAB, but I can't because it drops zeros all the time (MATLAB uses my solvers for x=A\b, and I also do C=A*B when A and/or B are sparse, inside MATLAB). In GraphBLAS, in the future, I could speed up GrB_mxm on a sequence of matrices with the same pattern, so the pattern of the result never changes. That way, I could cache the symbolic analysis and reuse it. Zoom ... but if you make me drop things, this breaks and I can't do it.

Third, it's slow. If a few zeros are in the matrix, it's faster to leave them there, and prune as needed. Changing the pattern of a matrix can cause a huge slowdown. Zombies are better for this (that's a long story... http://aldenmath.com/my-friendly-zombie/ ). (I should probably turn our discussion into a blog post there because this is a very important question).

Fourth, there are times in GraphBLAS where you want to keep all zeros. GraphBLAS does not have a different object for dense or sparse matrices, as MATLAB does. There are times when dense is faster ... say a vector of size n that gives the depth of each node in a breadth-first search. That vector starts out sparse (empty, actually) and slowly accumulates entries until it becomes dense. But each time new entries get added, I have to redo the whole data structure (in my implementation). So it's far faster to start it dense, with explicit zeros (or whatever identity values it needs). In this case, any kind of automatic dropping is bad.

Fifth, it's unpredictable. Say the result is floating point epsilon, because of roundoff. So it is kept. But on another machine the result is zero. So you get a different combinatoric result depending on what your roundoff is, what your compiler -O flag is, what your compiler is, if you're in parallel or not, on the GPU or not ... ack. Now try to explore a bug where your pattern differs from what you expect. Turn on -g, and your bug goes away. Heisenbug. Nasty.

Having said all this, some algorithms really do need to drop entries that match some specific criterion, like "drop all zeros", "drop all nans", "drop all entries <= 0", and even "drop all entries that satisfy some condition determined by a function f (aij, i, j, m, n, thunk) where aij is the value, i and j are the indices, thunk is some user-defined 'scalar', etc". That can be used for all sorts of things, like L=tril(A) in MATLAB, which cannot be done easily in pure GraphBLAS. So I absolutely agree that it needs to be simple to drop things. It just can never be done automatically.
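
On the unpredictability point, a tiny plain C illustration (the helper name `sum_in_order` is made up): the same three terms summed in two different orders give an exact zero in one case and a nonzero in the other. A parallel reduction that reorders the terms could therefore flip whether an entry would be "dropped". The values are chosen so that the 1.0 is lost next to 1e20 in IEEE double arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the terms strictly left to right. */
static double sum_in_order(const double *terms, size_t n)
{
    assert(terms != NULL);
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        sum += terms[k];
    }
    return sum;
}
```

Summing {1e20, 1.0, -1e20} this way yields 0.0, while {1e20, -1e20, 1.0} yields 1.0, so an automatic "drop zeros" rule would keep the entry in one run and delete it in the other.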

mcmillan03 · 2019-10-16T16:07:01Z

IIRC, it was because "it could be done with a combination of existing operations." GxB_select is in the list of issues to consider for inclusion in the next update to the spec. By the way, using something called GrB_select() to "remove" unwanted values from a matrix/vector is a bit counter-intuitive. "Prune" implies a heuristic which might be useful (especially supporting binaryops and scalar constants as one input). Soliciting ideas for names.

mcmillan03 · 2019-10-16T16:12:32Z

Mathematicians, please chime in... what I am about to say is a secondhand explanation that was given to me years ago. Note that you are using a semiring, which does not define an additive inverse (e.g. "minus"). The production of a "zero" is happenstance (because an additive inverse operation occurred somewhere, either by adding a negated value or by subtraction, neither of which is part of the semiring). I would defer to the more mathematically inclined to correct my understanding.

DrTimothyAldenDavis · 2019-10-16T16:14:06Z

GxB_select was named that way because it doesn't prune. It selects entries for the output. So for example, for the sparse deep neural network, to select only positive entries, I do GxB_select (A, ... , GxB_GT_ZERO ...), and to delete explicit zeros I do GxB_select (A, ... , GxB_NONZERO ...), which selects all nonzeros. GxB_select keeps all entries A(i,j) for which the selectop f (aij,i,j,m,n,thunk) is true, just as the mask M(i,j)=true selects the entry (i,j) to be written to the result. I'm open to other naming alternatives, though. I considered something with the word "mask" in it, but it acts differently than the mask so I avoided that name as potentially confusing.
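
The "keep what the operator says is true" semantics can be sketched in plain C with a function-pointer predicate. Everything below (`toy_entry`, `toy_select_op`, `toy_select` and the two predicates) is invented for illustration. The real select operators also receive the matrix dimensions, which are left out here for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy entry: position plus value. */
typedef struct { size_t row, col; double val; } toy_entry;

/* Predicate in the spirit of f(aij, i, j, thunk): true means "keep". */
typedef bool (*toy_select_op)(double aij, size_t i, size_t j, double thunk);

/* Keep entries strictly greater than thunk (thunk = 0 gives a ReLU-like filter). */
static bool keep_gt_thunk(double aij, size_t i, size_t j, double thunk)
{
    (void) i; (void) j;
    return aij > thunk;
}

/* Keep entries that differ from thunk (thunk = 0 drops explicit zeros). */
static bool keep_ne_thunk(double aij, size_t i, size_t j, double thunk)
{
    (void) i; (void) j;
    return aij != thunk;
}

/* Copy to 'out' every entry of 'in' for which op returns true.
   'out' must have room for n entries. Returns the number kept. */
static size_t toy_select(toy_entry *out, const toy_entry *in, size_t n,
                         toy_select_op op, double thunk)
{
    assert(out != NULL && in != NULL && op != NULL);
    size_t kept = 0;
    for (size_t k = 0; k < n; k++) {
        if (op(in[k].val, in[k].row, in[k].col, thunk)) {
            out[kept++] = in[k];
        }
    }
    return kept;
}
```

Passing the indices to the predicate is what would let the same routine express positional filters such as a lower-triangular selection, not just value tests.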

szarnyasg · 2019-10-16T16:57:17Z

Maybe GxB_keep would be a better name? The documentation says

Each entry A(i,j) is evaluated with the operator, which returns true if the entry is to be kept in the output, or false if it is not to appear in the output.

For me, this name works well for simple cases such as keeping the lower triangular part of a matrix. Not sure about the more complex cases (i.e. ones with a mask) though.
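To make the "keep" reading concrete, here is a minimal plain-C sketch that does not use GraphBLAS at all. The names `entry_t`, `keep_fn`, `keep_nonzero`, `keep_tril` and `keep_entries` are made up for illustration; they only mimic the documented behaviour that an entry survives when the operator returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One explicit entry of a sparse matrix in coordinate (COO) form. */
typedef struct {
    size_t  row;
    size_t  col;
    int64_t val;
} entry_t;

/* Hypothetical predicate type: returning true means "keep this entry". */
typedef bool (*keep_fn)(const entry_t *e);

/* Keep only explicit nonzeros, i.e. drop explicit zeros. */
bool keep_nonzero(const entry_t *e) { return e->val != 0; }

/* Keep only the lower triangular part, diagonal included. */
bool keep_tril(const entry_t *e) { return e->col <= e->row; }

/* Copy the entries of `in` for which `keep` is true into `out`
 * (which must have room for n entries) and return how many were kept. */
size_t keep_entries(const entry_t *in, size_t n, keep_fn keep, entry_t *out)
{
    assert(in != NULL && out != NULL && keep != NULL);
    size_t kept = 0;
    for (size_t k = 0; k < n; k++) {
        if (keep(&in[k])) {
            out[kept++] = in[k];
        }
    }
    return kept;
}
```

With this reading, "drop zeros" and "keep the lower triangle" are the same operation with different predicates, which is roughly why both "select" and "keep" feel natural as names.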

tgmattso · 2019-10-16T17:04:06Z

GrB_keep() is a nice name, but I still like GrB_select() better. From an SQL point of view, I’m used to using SELECT to choose the items I want to pull into a table. So the name is quite intuitive to people with exposure to SQL.


mcmillan03 · 2019-10-16T17:10:02Z

Okay...if I understand correctly the select approach still results in a two step process with the formation of an intermediate matrix. Is there need for a one step process?


simpletonDL · 2019-10-17T01:04:16Z

Thank you very much for a very detailed explanation of this issue, it's great!)

I understand why automatic dropping of zeros isn't allowed in GraphBLAS and why it would break the framework. I'm not asking for this to become the default behaviour, but I would be very grateful if you could add a way to change the behaviour of an operation (an entry in the operation descriptor, or something similar). Here is a more realistic example where that would be very useful and where the selection operation alone wouldn't be enough.

The problem appears when the graph changes dynamically, e.g. when edges are added step by step. Suppose we want to find all paths in a directed graph that satisfy some conditions. For simplicity, let's assume the conditions are predicates A, B, C, so each entry of the matrix G that represents the graph is a subset of {A, B, C}. A predicate P belongs to G[i][j] iff there is a path from node i to node j that satisfies P.

We also have rules that allow two paths to be merged when the end vertex of the first is the start vertex of the second. E.g. if some path from i to j satisfies predicate B and some path from j to k satisfies predicate C, then the merged path from i to k satisfies predicate A. These rules define the semiring: its elements are subsets of {A, B, C}, and the multiplication operation applies all rules to a pair of subsets. In the example above, {B} multiplied by {C} is {A}, but {C} multiplied by {C} is the empty set, because there is no such rule. The addition operation, you guessed it, is a simple union of subsets.
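A rough sketch of these two operations in plain C, assuming the subsets are encoded as bitmasks. The type `predset_t`, the `RULES` table and the function names are illustrative assumptions, and only the single rule mentioned above (B then C gives A) is included. The functions take `void *` arguments so that they have the general shape of a GraphBLAS user-defined binary operator, but nothing here calls GraphBLAS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each predicate is one bit; a subset of {A, B, C} is a bitmask. */
enum { PRED_A = 1, PRED_B = 2, PRED_C = 4 };
typedef uint8_t predset_t;

/* A merge rule: left predicate followed by right predicate yields result. */
typedef struct {
    predset_t left;
    predset_t right;
    predset_t result;
} rule_t;

/* Only the rule from the example above; a real rule table is an assumption. */
static const rule_t RULES[] = { { PRED_B, PRED_C, PRED_A } };

/* "Multiplication": apply every rule to the pair of subsets (x, y). */
void predset_times(void *z, const void *x, const void *y)
{
    assert(z != NULL && x != NULL && y != NULL);
    predset_t a = *(const predset_t *) x;
    predset_t b = *(const predset_t *) y;
    predset_t out = 0;  /* the empty set is the additive identity */
    for (size_t r = 0; r < sizeof RULES / sizeof RULES[0]; r++) {
        if ((a & RULES[r].left) && (b & RULES[r].right)) {
            out |= RULES[r].result;
        }
    }
    *(predset_t *) z = out;
}

/* "Addition": union of the two subsets. */
void predset_plus(void *z, const void *x, const void *y)
{
    assert(z != NULL && x != NULL && y != NULL);
    *(predset_t *) z = (predset_t) (*(const predset_t *) x | *(const predset_t *) y);
}
```

Here the value 0 plays the role of the "zero" I am talking about: `predset_times` returns it whenever no rule applies.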

At the start of the algorithm we initialize the matrix with some subsets (which ones doesn't matter), so the base case is the answer for paths of length 1. Then we multiply the matrix by itself and get the answer for paths of length 2. Then we take the union of the matrices for paths of length 1 and 2, and multiply again to get the answer for paths of length 3. And so on. I believe that these iterations will converge :D
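One possible reading of this iteration, sketched on a tiny dense matrix in plain C: repeat G = G union (G * G) until nothing changes. The size `N`, the single rule and all function names are assumptions for illustration. A real implementation would use sparse matrices; the dense form is only meant to show the fixed point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 3  /* number of vertices in this toy graph */

enum { PRED_A = 1, PRED_B = 2, PRED_C = 4 };
typedef uint8_t predset_t;

/* Single illustrative rule: B followed by C yields A; otherwise empty set. */
predset_t apply_rules(predset_t x, predset_t y)
{
    return ((x & PRED_B) && (y & PRED_C)) ? PRED_A : 0;
}

/* One step: G = G union (G * G). Returns true if any entry changed. */
bool closure_step(predset_t g[N][N])
{
    predset_t next[N][N];
    bool changed = false;
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            predset_t acc = g[i][j];
            for (size_t k = 0; k < N; k++) {
                acc |= apply_rules(g[i][k], g[k][j]);
            }
            next[i][j] = acc;
            if (acc != g[i][j]) {
                changed = true;
            }
        }
    }
    memcpy(g, next, sizeof next);
    return changed;
}

/* Iterate to the fixed point; returns the number of steps that changed G.
 * It terminates because sets only grow and there are finitely many of them. */
int closure(predset_t g[N][N])
{
    assert(g != NULL);
    int steps = 0;
    while (closure_step(g)) {
        steps++;
    }
    return steps;
}
```

Most of the `apply_rules` calls in the inner loop return the empty set, which is exactly where the unwanted explicit values would come from in a sparse product.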

In this algorithm I need to write my own function to implement the set multiplication. There are cases where the result of the operation is the empty set (when no rule applies). In the current version I have to store this empty set as an explicit value, but in the semantics of the algorithm it is equivalent to an entry that is absent from the pattern, because it means "there is no path between these vertices that satisfies at least one predicate".

This causes serious trouble. The worst of it can happen right after the first matrix multiplication, because a huge number of explicit "zero" values appear. Even if we clear all unnecessary values after each operation with the selection operation, the zeros still appear after the multiplication, and before the selection runs they can, in the worst case, exhaust memory and swap and get the process killed. And this is a real-life case.

The other problem is performance, due to many unnecessary operations. First we have to add the unnecessary explicit value, and then we have to delete it. It seems a little strange.

So it would be very nice to be able to change the behaviour of operations (make them drop unnecessary values), or to return a special value from the user-defined operation function, or something similar. This would make GraphBLAS more flexible.

In conclusion, I want to thank everyone) I am very pleased to participate in the conversation.

DrTimothyAldenDavis · 2019-10-17T01:33:31Z

I see. Currently, if you're using SuiteSparse:GraphBLAS, it's a single call to GxB_select after each multiplication to drop the zeros.

It would be very hard to change my internal functions to drop zeros during the computation of GrB_mxm. There might be variants of GrB_mxm that could exploit it though. I'd have to think it over. (I currently have 10 different variants of C=A*B ... for each of 1,040 different semirings, as well, plus the generic ones. Yes, I know that's crazy ...).

Some of the 10 variants might be able to drop zeros on the fly, to save time and memory, but not very many of them. Most will have to compute all of C and then drop zeros when they are done. This is because of how the matrix multiplication gets done. Even sequentially, most of the methods are best done as two phases: one to do the symbolic work (deciding what the pattern of the result is) and the second phase computes the values. The first symbolic phase doesn't know the values so it can't drop anything.

When computing in parallel, the multiphase approach is essential for parallelism. Dropping on the fly, during the GrB_mxm, makes the size of the result unpredictable, and thus not parallel.

But it would be possible to include a nondefault descriptor option which would drop zeros *after* any GrB* function is done, just before it returns its result to the caller. So in that case, the performance of the following 2 cases would be the same:

    GrB_mxm (C, ...)
    GxB_select (C, NULL, NULL, GxB_NONZERO, C, NULL, NULL) ;

or with a possible descriptor:

    // this doesn't work yet; just a straw man
    GrB_Descriptor_set (desc, GxB_DROP, true) ;
    GrB_Descriptor_set (desc, GxB_DROPVALUE, 0) ;
    GrB_mxm (C, ..., desc) ;

Since I do the GxB_select in parallel, it's not an in-place algorithm, in either case. I have to create a new copy of the matrix with the zeros dropped.

The advantage of the latter example, with the descriptor, would be that you would create the descriptor only once. You could then just pass the "drop descriptor" to all your favorite GrB* functions.

I'll have to see what the C API committee thinks about this idea, of adding the GxB_DROP* fields to the descriptor. It's an extension (GxB*) but I like to get their feedback before I add things like this, anyway.

But the descriptor would likely not speed things up very much, at least at first, as compared to having you call GxB_select at particular points in your code.

Is including GxB_select after your GrB_mxm (and any GrB* function) enough to solve your memory thrashing problem?
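To illustrate why the drop produces a new copy rather than working in place, here is a minimal plain-C sketch on a CSR matrix. It is not the SuiteSparse implementation; `csr_t`, `csr_drop_value` and `csr_free` are hypothetical names. It uses the same count-then-fill idea as the two-phase multiply: the first pass only sizes the result, the second pass writes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal CSR matrix: rowptr has nrows + 1 entries. */
typedef struct {
    size_t   nrows;
    size_t  *rowptr;
    size_t  *colidx;
    int64_t *val;
} csr_t;

void csr_free(csr_t *m)
{
    free(m->rowptr);
    free(m->colidx);
    free(m->val);
}

/* Build in `out` a copy of `in` without the entries equal to `drop`.
 * Returns 0 on success, -1 if an allocation fails. */
int csr_drop_value(const csr_t *in, int64_t drop, csr_t *out)
{
    assert(in != NULL && out != NULL);
    size_t n = in->nrows;
    out->nrows = n;
    out->colidx = NULL;
    out->val = NULL;
    out->rowptr = malloc((n + 1) * sizeof *out->rowptr);
    if (out->rowptr == NULL) return -1;

    /* Phase 1 (count): how many entries survive in each row. */
    out->rowptr[0] = 0;
    for (size_t i = 0; i < n; i++) {
        size_t kept = 0;
        for (size_t p = in->rowptr[i]; p < in->rowptr[i + 1]; p++) {
            if (in->val[p] != drop) kept++;
        }
        out->rowptr[i + 1] = out->rowptr[i] + kept;
    }

    /* Allocate exactly what the result needs (at least one slot). */
    size_t nnz = out->rowptr[n];
    out->colidx = malloc((nnz ? nnz : 1) * sizeof *out->colidx);
    out->val = malloc((nnz ? nnz : 1) * sizeof *out->val);
    if (out->colidx == NULL || out->val == NULL) {
        csr_free(out);
        return -1;
    }

    /* Phase 2 (fill): each row writes to its own precomputed range,
     * so rows could be processed independently. */
    for (size_t i = 0; i < n; i++) {
        size_t q = out->rowptr[i];
        for (size_t p = in->rowptr[i]; p < in->rowptr[i + 1]; p++) {
            if (in->val[p] != drop) {
                out->colidx[q] = in->colidx[p];
                out->val[q] = in->val[p];
                q++;
            }
        }
    }
    return 0;
}
```

Note that the input, zeros included, is still fully in memory while the copy is built, which is the peak-memory concern raised above.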


simpletonDL · 2019-10-17T02:13:21Z

The problem occurs exactly between GrB_mxm and GxB_select, because the zeros reappear between those two calls. GxB_select can reduce the problem, but it can't solve it. If dropping identity values during the matrix multiplication (only for user-defined operations) is impossible because of how the algorithm works, I will understand.

simpletonDL · 2019-10-17T02:16:49Z

In this case, I think, the issue can be closed. Thank you very much for your answer!

aydinbuluc · 2019-10-17T02:47:38Z


We had implemented a generalized version of this drop back in the KDT days, and it still exists in CombBLAS, though sparsely documented. It is both unfortunate and reassuring at the same time to see all the design decisions of KDT/CombBLAS percolating back up to GraphBLAS. I never brought this topic up because the solution we had was ugly and I had hoped some other solution would come up in the meantime.

Here is how it worked. Each scalar multiplication operator had an additional field that it was allowed to modify: returnedSAID. If that field was set to yes, then the subsequent addition would not happen and the output would not be materialized. The symbolic phase could keep track of the number of times returnedSAID was true and not allocate space for them. This was our in-band signaling mechanism. The semiring implementer would decide what that particular semiring’s SAID (which stood for “semiring additive identity”) was. For built-in semirings, this is pretty obvious.

Let's say we wanted to eliminate all edges that do not satisfy certain criteria. In this case, it was some sort of "date" field being recent enough on the edge payload and the edge type being a "retweet". Then the scalar multiplication of the semiring would be something like (copied from here: https://people.eecs.berkeley.edu/~aydin/CombBLAS/html/_twitter_edge_8h_source.html ):

    static VECTYPE filtered_select2nd(const TwitterEdge & arg1, const VECTYPE & arg2, time_t & sincedate)
    {
        ....
        if(arg1.isRetwitter() && arg1.LastTweetBy(sincedate)) // T1 is of type edges for BFS
        {
            return arg2;
        }
        else
        {
            SR::returnedSAID(true);
            return VECTYPE();
        }
    }

I guess all I am saying is that we can incorporate this into GraphBLAS as you suggested, but the way you described would still materialize that “drop” object in the sparse matrix, albeit temporarily, and then remove it. In-band SAID signaling solved that very efficiently and in a rather general way for us, albeit in a relatively ugly way.
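For readers who prefer C, the in-band idea can be sketched roughly as follows. This is not CombBLAS code: `mult_said_fn`, `times_flag_zero` and `dot_skip_said` are invented names, and the flag is passed as a return value instead of a semiring field. The multiply reports whether its product is the SAID, and the accumulation skips such products, so an output entry exists only if at least one product contributed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical multiply signature: writes the product to *z and returns
 * true when the product is the semiring additive identity (SAID). */
typedef bool (*mult_said_fn)(int64_t *z, int64_t x, int64_t y);

/* Example multiply for plus-times: a zero product is flagged as SAID. */
bool times_flag_zero(int64_t *z, int64_t x, int64_t y)
{
    *z = x * y;
    return *z == 0;
}

/* Dot product of two dense length-n vectors under "plus".
 * Returns true and writes *out only if some product was not SAID;
 * otherwise *out is left untouched and no entry should be stored. */
bool dot_skip_said(const int64_t *a, const int64_t *b, size_t n,
                   mult_said_fn mult, int64_t *out)
{
    assert(a != NULL && b != NULL && mult != NULL && out != NULL);
    bool present = false;
    int64_t acc = 0;
    for (size_t k = 0; k < n; k++) {
        int64_t p;
        if (mult(&p, a[k], b[k])) {
            continue;  /* SAID: skip the addition entirely */
        }
        acc = present ? acc + p : p;
        present = true;
    }
    if (present) {
        *out = acc;
    }
    return present;
}
```

One consequence worth noting: this only suppresses products that are themselves the identity. A sum that cancels to zero, such as 2*1 + (-2)*1, still produces an explicit zero entry.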


simpletonDL · 2019-10-17T02:50:48Z

If the first phase of the matrix multiplication computes the pattern of the result, then memory will be allocated for the identity values in any case. So I understand that there is no way to reduce memory use in this case (maybe it would work only for some special built-in types).

DrTimothyAldenDavis · 2019-10-17T15:20:43Z

I think there is a better solution, one that uses the predicate B as a mask. There's no need to drop zeros, just don't compute them in the first place (that is the purpose of the mask).

In your earlier email, you wrote:

The problem will appear if we want to change the graph dynamically, e.g. adding edges step by step. Suppose we want to find all paths in the directed graph which satisfy some conditions. For simplicity, let's assume that conditions are predicates A, B, C so the entry of the matrix G that corresponds to graph is a subset of {A, B, C}. So predicate P belongs to G[i][j] iff there is a path from node i to j, which satisfies the predicate P. Also, we have some rules, that allow merging two paths, whose final vertices coincide. E.g. if some path from i to j satisfies the predicate B and some path from j to k satisfy predicate C, then the path from i to k satisfies the predicate A. Thus these rules constitute the semiring, whose elements are subsets of {A, B, C} and binary multiplication operation corresponds to applying all rules to subsets. In the example above, {B} multiplied by {C} is {A}, but {C} multiplied by {C} is empty set, because there isn't such kind of rule. The addition operation, you guessed it, is a simple union of subsets.

This sounds like the GraphBLAS operation G<B>=A*C'. Let me ask the following. Is the following computation being done? I will write it as if it considers all i and j, but don't fear, this is not what I do. Just the mathematical specification:

Let A, B, C be square boolean matrices of dimension n. The matrix G will be n by n and is currently empty.

    for all i = 1 to n
        for all j = 1 to n
            if (B (i,j)) is true then
                for all k = 1 to n
                    G (i,j) = G(i,j) OR (A(i,k) AND C (j,k))

If that is what you want to compute, then it is a very fast GrB_mxm computation, G<B> = A*C'. I do not take O(n^3) to do the above computation. The above is just a simple mathematical specification of what is computed by the following:

    GrB_Descriptor_new (&desc) ;
    GrB_Descriptor_set (desc, GrB_INP1, GrB_TRAN) ;
    GrB_mxm (G, B, NULL, GxB_LOR_LAND_BOOL, A, C, desc) ;

The GxB_LOR_LAND_BOOL is the boolean semiring. If instead you want A and C to be integer, and want to compute the following:

    for all i = 1 to n
        for all j = 1 to n
            if (B (i,j)) is true then
                for all k = 1 to n
                    G (i,j) = G(i,j) + (A(i,k) * C (k,j))

Then the computation is also fast, just with a different semiring.

If A and C are stored in their default format, which is by row (CSR), then this will use my masked dot product, internally. That function is very fast, very parallel, and very memory efficient. It will use no more than O (nne(B)) memory, where nne(B) is the number of explicit entries present in B. I do not need to transpose the matrix C to compute G<B>=A*C'.

If G is changing dynamically, then you might also consider using an accumulator operator, like G<B> += A*C'.

The matrix B(i,j) does not have to be boolean. It can be any built-in type, which are all typecastable to bool. So in that case, the above pseudocode would read "if B(i,j) is nonzero then...".

You may still want to drop zeros after the fact, if G(i,j) is computed yet becomes explicitly zero. That could happen if A or C have negative entries in them, or if there is an accum operator and G(i,j) starts out negative and then becomes zero after the accumulation occurs.

Is this what you want to compute?
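The boolean specification above can be checked on a small dense example in plain C. This sketch only mirrors the mathematical specification and is not how GrB_mxm works internally; the fixed size `N` and the function name `masked_lor_land_abt` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 3  /* dimension of the toy matrices */

/* Dense reference for G<B> = A*C' over the LOR-LAND semiring.
 * Positions where the mask b is false are skipped and left untouched.
 * Returns the number of (i,j) positions that were actually computed. */
size_t masked_lor_land_abt(bool g[N][N], bool b[N][N],
                           bool a[N][N], bool c[N][N])
{
    assert(g != NULL && b != NULL && a != NULL && c != NULL);
    size_t computed = 0;
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            if (!b[i][j]) {
                continue;  /* outside the mask: no work, no entry */
            }
            bool acc = g[i][j];
            for (size_t k = 0; k < N; k++) {
                /* C is indexed (j,k) because the second input is transposed */
                acc = acc || (a[i][k] && c[j][k]);
            }
            g[i][j] = acc;
            computed++;
        }
    }
    return computed;
}
```

The work is bounded by the number of true entries in the mask rather than by n^2 positions. A masked position can still come out false, which is the case where dropping zeros after the fact may still be wanted.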


simpletonDL · 2019-10-18T10:00:20Z

Oh, it’s interesting, I’ll think about it and try to do it, and say whether it worked out

johnrgilbert · 2019-10-19T20:14:35Z

This is a great discussion about an issue that's been interesting and important (and sometimes controversial) going back to KDT, CombBLAS, and even sparse Matlab in the early 1990s. I just planted a link to it on GraphBLAS.org :-)

DrTimothyAldenDavis closed this as completed Nov 1, 2019

szarnyasg mentioned this issue Jul 28, 2021

Update set method for sparse to only add non-zero elements lessthanoptimal/ejml#145


galaxy001 mentioned this issue Feb 20, 2024

How to drop zeros for a matrix ? python-graphblas/python-graphblas#539

Open
