Drop identity values problem #28
Comments
You know, that GxB_select() is a darn useful function. We should add it to the next GraphBLAS release.
--tim
From: Tim Davis
Date: Tuesday, October 15, 2019 at 7:58 AM
Subject: Re: [GraphBLAS/LAGraph] Drop identity values problem (#28)
A very good question. It points out a feature of GraphBLAS, since "zero" can differ depending on the semiring (in a path distance problem, for example, an edge of weight zero is very different than no edge at all). So zeros cannot be dropped automatically inside GraphBLAS.
But there are cases when you do want to delete entries, like all explicit zeros.
It takes a second step to delete entries from a matrix. If you are using SuiteSparse:GraphBLAS, then you can use the following to drop explicit zeros from the GrB_Matrix A. This works for any matrix, including any user-defined type.
GxB_select (A, NULL, NULL, GxB_NONZERO, A, NULL, NULL) ;
GxB_select can also be used to drop any other particular value (or range of values, using, say, GxB_GT_ZERO, which keeps only those entries greater than zero, dropping values that are zero or less). GxB_GT_ZERO only works for the 11 built-in types, while GxB_NONZERO works for any type, including user-defined types. For user-defined types, it checks to see if the bit pattern is all zero, and keeps those that have at least one 1 bit in them. So if your typedef is a struct with "holes" in it, this might not always work as expected.
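To make the caveat about structs with "holes" concrete, here is a minimal sketch in plain C. It does not call GraphBLAS and is not SuiteSparse's code; `my_edge`, `any_bit_set`, and `my_edge_is_nonzero` are names made up for illustration. It contrasts a bit-pattern test of the kind described above with a field-wise test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-defined type.  On most ABIs the compiler inserts padding
   bytes ("holes") between tag and weight. */
typedef struct
{
    char tag ;
    double weight ;
} my_edge ;

/* Bit-pattern test: true if at least one bit of the object is set.
   Padding bytes are inspected along with the members. */
bool any_bit_set (const void *x, size_t size)
{
    assert (x != NULL) ;
    const unsigned char *bytes = (const unsigned char *) x ;
    for (size_t k = 0 ; k < size ; k++)
    {
        if (bytes [k] != 0) return (true) ;
    }
    return (false) ;
}

/* Field-wise test: looks only at the members, never at padding bytes. */
bool my_edge_is_nonzero (const my_edge *e)
{
    return (e->tag != 0 || e->weight != 0.0) ;
}
```

The two tests can disagree. A `weight` of `-0.0` compares equal to zero but has its sign bit set, and padding bytes that were never cleared can hold anything. Clearing the whole struct with `memset` before filling it in avoids the padding problem but not the `-0.0` one.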
If you are using another GraphBLAS library, you need to use the matrix as its own mask (assuming A has a built-in type, not a user-defined type).
GrB_assign (A, A, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;
where Replace is a descriptor with the replace option turned on.
If A has a user-defined type, you first have to create a boolean matrix, where M(i,j) = 0 if A(i,j) is zero, or M(i,j)=1 otherwise. That can be done with a user-defined typecast function, via GrB_apply:
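// my_type_is_zero is a placeholder (not a GraphBLAS function) for your own
// test of whether a value of the user-defined type counts as zero
void my_typecast_func (void *z, const void *x)
{
    bool result = !my_type_is_zero (x) ;    // false if x is zero, true if x is nonzero
    (*((bool *) z)) = result ;
}
// My_typecast_function is a GrB_UnaryOp and M is a GrB_Matrix
GrB_UnaryOp_new (&My_typecast_function, my_typecast_func, GrB_BOOL, My_type) ;
GrB_Matrix_new (&M, GrB_BOOL, nrows, ncols) ;
GrB_apply (M, NULL, NULL, My_typecast_function, A, NULL) ;              // M = (bool) A
GrB_assign (A, M, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;   // A<M,replace> = A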
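The effect of those two steps can be sketched without GraphBLAS at all. The following is only an illustration on tuple arrays, assuming a made-up user type `my_pair`. `my_pair_to_bool` plays the role of the typecast operator, and `keep_where_mask_true` plays the role of the masked assign with replace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-defined entry type: a complex-like pair. */
typedef struct { double re ; double im ; } my_pair ;

/* Step 1, the role of the typecast operator: z is false for a zero value and
   true otherwise.  Same (void *z, const void *x) shape as a unary operator. */
void my_pair_to_bool (void *z, const void *x)
{
    const my_pair *value = (const my_pair *) x ;
    (*((bool *) z)) = (value->re != 0.0 || value->im != 0.0) ;
}

/* Step 2, the role of the masked assign with replace: entry k survives only
   if mask [k] is true.  Entries are stored as tuples (row [k], col [k],
   val [k]).  Compacts the arrays in place and returns the new entry count. */
size_t keep_where_mask_true (size_t *row, size_t *col, my_pair *val,
    const bool *mask, size_t nvals)
{
    assert (row != NULL && col != NULL && val != NULL && mask != NULL) ;
    size_t kept = 0 ;
    for (size_t k = 0 ; k < nvals ; k++)
    {
        if (mask [k])
        {
            row [kept] = row [k] ;
            col [kept] = col [k] ;
            val [kept] = val [k] ;
            kept++ ;
        }
    }
    return (kept) ;
}
```

With three entries whose values are (1,0), (0,0) and (0,-2), the mask comes out as true, false, true, and only the first and third entries remain.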
(technically speaking, all the "NULL"s above should be GrB_NULL ... but NULL works the same as GrB_NULL in SuiteSparse:GraphBLAS).
|
Yes, GxB_select is very useful. I used it for both MIT GraphChallenge
solutions, and for some parts of LAGraph.
The triangle count needs the equivalent of L=tril(A) in MATLAB (extract the lower
triangular part).
That is tricky to do in pure GraphBLAS. You can't do it with a mask. The
only way to do it is with GrB_extractTuples,
and then delete the tuples you don't want. Tedious...
I also needed it for the ReLU, to drop values that were less than or equal
to zero.
So it seems to be an important function. GxB_select acts kind of like a
functional mask,
which GraphBLAS doesn't have.
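The "extract the tuples and delete the ones you don't want" route for L=tril(A) might look like the sketch below. It is only the filtering step, written in plain C under the assumption that the tuples have already been extracted into three arrays. `tuples_tril` is a made-up name, not a GraphBLAS or LAGraph function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keeps only the tuples on or below the diagonal (row >= col), which is what
   tril (A) keeps.  rows, cols, vals are assumed to hold the nvals tuples of A,
   for example as filled in by GrB_Matrix_extractTuples.  The arrays are
   compacted in place and the new tuple count is returned. */
size_t tuples_tril (uint64_t *rows, uint64_t *cols, double *vals, size_t nvals)
{
    assert (rows != NULL && cols != NULL && vals != NULL) ;
    size_t kept = 0 ;
    for (size_t k = 0 ; k < nvals ; k++)
    {
        if (rows [k] >= cols [k])
        {
            rows [kept] = rows [k] ;
            cols [kept] = cols [k] ;
            vals [kept] = vals [k] ;
            kept++ ;
        }
    }
    return (kept) ;
}
```

The surviving tuples would then be used to build a new matrix, for example with GrB_Matrix_build. That is three passes over the data (extract, filter, build), which is the tedium referred to above.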
|
Hello. I have the same question. I clearly understand why it is not a good idea to remove zero values. But what if I explicitly specify zero as the identity in the monoid? In the path distance problem the identity is not zero, so we should not delete zeros, but I think we can remove minus infinity, which is the identity. So the question is about identities: is it possible to drop identities automatically during operations over sparse matrices? |
A matrix doesn’t remain in a single semiring throughout an algorithm. It can be
used in multiple semirings, and there are several examples of this.
So the value that isn’t there is suddenly different: it changes with the
semiring.
As a result, it’s impossible to automatically drop any values.
|
I agree that this is not something that should be done automatically, but it would be convenient to have a utility method or canonical way of doing it. The GxB_select approach seems to be that, so I also agree with the calls to get that into the standard (this application alone justifies it in my opinion). I think the usual argument here is that not dropping identity values in some cases could result in a lot of fill-in down the road, leading to performance issues. Maybe an LAGraph utility function would be a nice middle ground? |
Yes, adding it to LAGraph would be a good idea. It would use an #ifdef so
that the GxB_select can be used if SuiteSparse:GraphBLAS is in use, and
would use GrB* functions otherwise.
I have a function in my MATLAB interface to do this as well, as A =
GrB.prune (A). By default, it prunes zeros. To prune other values equal
to the identity id, use A = GrB.prune (A, id).
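A rough sketch of what pruning by an identity value amounts to, on a sparse vector stored as (index, value) pairs. This is plain C for illustration only. `prune_value` is a made-up name and is not the MATLAB interface or an LAGraph function.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Removes every entry whose value equals id, compacting idx and val in place.
   Returns the new number of entries.  Note that an id of NaN never matches,
   since NaN != NaN. */
size_t prune_value (size_t *idx, double *val, size_t nvals, double id)
{
    assert (idx != NULL && val != NULL) ;
    size_t kept = 0 ;
    for (size_t k = 0 ; k < nvals ; k++)
    {
        if (val [k] != id)
        {
            idx [kept] = idx [k] ;
            val [kept] = val [k] ;
            kept++ ;
        }
    }
    return (kept) ;
}
```

Calling it with `id = -INFINITY` drops the identity of a max-based monoid and leaves explicit zeros alone. Calling it with `id = 0.0` gives the default behavior described above.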
|
I am actually surprised that we managed to not include an easy way to do
this in GraphBLAS. Prune always existed in CombBLAS (by now, for a decade)
|
Automatic dropping of zeros (say in MATLAB) is an awful thing to do. But
it's perfect to add as a non-default option, where the user is able to
prune things easily at any time. But it can't be done automatically, for
many reasons:
First of all, it breaks the semirings in GraphBLAS. Switching between
semirings causes all implicit values to change but not explicit values, so
the explicit zero is never the same thing as an implicit entry that is not
present in the pattern. The matrix has no tag that tells what semiring
it's in, nor a tag to say what the implicit value is, so there's no select
function to change a matrix from one semiring to another.
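A small plain-C sketch of why an explicit zero is not the same as a missing entry once the semiring changes. It assumes a made-up layout in which `in_pattern [k]` records whether edge k is stored at all. Nothing here is GraphBLAS code.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* One relaxation in the (min,+) semiring: the shortest distance to a node via
   its n possible incoming edges.  in_pattern [k] says whether edge k is stored
   at all; weight [k] is its value when stored.  A missing edge behaves like
   +infinity here, not like zero. */
double min_plus_relax (const bool *in_pattern, const double *weight,
    const double *dist, size_t n)
{
    assert (in_pattern != NULL && weight != NULL && dist != NULL) ;
    double best = INFINITY ;
    for (size_t k = 0 ; k < n ; k++)
    {
        if (in_pattern [k])
        {
            double candidate = dist [k] + weight [k] ;
            if (candidate < best) best = candidate ;
        }
    }
    return (best) ;
}
```

With `dist = {5, 7}` and a stored edge of weight 0 from the first node, the result is 5. If that explicit zero had been dropped, the result would be +infinity, that is, no path. In the conventional plus-times semiring the same dropped zero would have changed nothing.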
Second, it destroys all the graph theoretic structure in the resulting
matrices. There are things I could do inside MATLAB, but I can't because
it drops zeros all the time (MATLAB uses my solvers for x=A\b, and I also
do C=A*B when A and/or B are sparse, inside MATLAB). In GraphBLAS, in the
future, I could speed up GrB_mxm on a sequence of matrices with the same
pattern, so the pattern of the result never changes. That way, I could
cache the symbolic analysis and reuse it. Zoom ... but if you make me
drop things, this breaks and I can't do it.
Third, it's slow. If a few zeros are in the matrix, it's faster to leave
them there, and prune as needed. Changing the pattern of a matrix can
cause a huge slowdown. Zombies are better for this (that's a long story...
http://aldenmath.com/my-friendly-zombie/ ). (I should probably turn our
discussion into a blog post there because this is a very important
question).
Fourth, there are times in GraphBLAS where you want to keep all zeros.
GraphBLAS does not have a different object for dense or sparse matrices, as
MATLAB does. There are times when dense is faster ... say a vector of size
n, that gives the depth of each node in a breadth-first-search. That
vector starts out sparse (empty, actually) and slowly accumulates entries
until it becomes dense. But each time new entries get added, I have to
redo the whole data structure (in my implementation). So it's far faster
to start it dense, with explicit zeros (or whatever identity values it
needs). In this case, any kind of automatic dropping is bad.
Fifth, it's unpredictable. Say the result is floating point epsilon,
because of roundoff. So it is kept. But in another machine the result is
zero. So you get a different combinatoric result depending on what your
roundoff is, what your compiler -O flag is, what your compiler is, if
you're in parallel or not, on the GPU or not ... ack. Now try to explore
a bug where your pattern differs from what you expect. Turn on -g, and
your bug goes away. Heisenbug. Nasty.
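The roundoff point can be reproduced in a few lines. This sketch assumes IEEE 754 double arithmetic; the function names are made up. The same three numbers are summed in two different orders, as a parallel reduction might do, and the difference is tested against zero the way an automatic drop rule would.

```c
#include <assert.h>
#include <stdbool.h>

/* (a + b) + c, with volatile temporaries so each step is rounded to double. */
double sum_left (double a, double b, double c)
{
    volatile double t = a + b ;
    volatile double s = t + c ;
    return (s) ;
}

/* a + (b + c): the same sum, associated the other way. */
double sum_right (double a, double b, double c)
{
    volatile double t = b + c ;
    volatile double s = a + t ;
    return (s) ;
}

/* The rule an automatic "drop zeros" policy would apply to a result. */
bool entry_would_be_kept (double value)
{
    return (value != 0.0) ;
}

/* Self-check of the claim: mathematically x - y is zero, numerically it is not. */
void roundoff_demo (void)
{
    double x = sum_left (0.1, 0.2, 0.3) ;
    double y = sum_right (0.1, 0.2, 0.3) ;
    assert (!entry_would_be_kept (x - x)) ;   /* exact zero: dropped */
    assert (entry_would_be_kept (x - y)) ;    /* roundoff residue: kept */
}
```

So whether the entry exists in the pattern would depend on the order of summation, which is exactly the kind of thing that changes with thread count, compiler flags, or hardware.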
Having said all this, some algorithms do need to drop
entries that match some specific criterion, like "drop all zeros", "drop
all nans", "drop all entries <= 0", and even "drop all entries that satisfy
some condition determined by a function f (aij, i, j, m, n, thunk) where
aij is the value, i and j are the indices, thunk is some user-defined
'scalar', etc". That can be used for all sorts of things, like L=tril(A)
in MATLAB, which cannot be done easily in pure GraphBLAS.
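A sketch of that general form, in plain C on tuple arrays. The callback type `keep_fn` and the function `select_entries` are assumptions made for illustration, with the arguments in the order listed above. They are not the actual GxB_select operator signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* f (aij, i, j, m, n, thunk): returns true if entry (i,j) is to be kept. */
typedef bool (*keep_fn) (double aij, uint64_t i, uint64_t j,
    uint64_t m, uint64_t n, const void *thunk) ;

/* Keeps the entries of an m-by-n matrix (given as tuples) for which f is
   true.  Compacts the arrays in place and returns the new entry count. */
size_t select_entries (uint64_t *rows, uint64_t *cols, double *vals,
    size_t nvals, uint64_t m, uint64_t n, keep_fn f, const void *thunk)
{
    assert (f != NULL) ;
    size_t kept = 0 ;
    for (size_t k = 0 ; k < nvals ; k++)
    {
        if (f (vals [k], rows [k], cols [k], m, n, thunk))
        {
            rows [kept] = rows [k] ;
            cols [kept] = cols [k] ;
            vals [kept] = vals [k] ;
            kept++ ;
        }
    }
    return (kept) ;
}

/* Value-based: keep entries strictly greater than the double in thunk. */
bool keep_greater_than (double aij, uint64_t i, uint64_t j,
    uint64_t m, uint64_t n, const void *thunk)
{
    (void) i ; (void) j ; (void) m ; (void) n ;
    return (aij > *((const double *) thunk)) ;
}

/* Value-based: drop NaNs (a NaN is the only value not equal to itself). */
bool keep_not_nan (double aij, uint64_t i, uint64_t j,
    uint64_t m, uint64_t n, const void *thunk)
{
    (void) i ; (void) j ; (void) m ; (void) n ; (void) thunk ;
    return (aij == aij) ;
}

/* Index-based: drop the diagonal, ignoring the value entirely. */
bool keep_offdiag (double aij, uint64_t i, uint64_t j,
    uint64_t m, uint64_t n, const void *thunk)
{
    (void) aij ; (void) m ; (void) n ; (void) thunk ;
    return (i != j) ;
}
```

One loop serves every criterion; only the predicate and the thunk change. A tril-style predicate would compare `i` and `j` in the same way `keep_offdiag` does.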
So I absolutely agree that it needs to be simple to drop things. It just
can never be done automatically.
-- Tim
On Wed, Oct 16, 2019 at 10:30 AM Semyon wrote:
Ah... I see. @DrTimothyAldenDavis <https://github.com/DrTimothyAldenDavis>
thank you for the explanation!
Even for an operation like GrB_mxm, where we must specify the
semiring, do we still not have enough information to drop identities of the
given semiring automatically?
Consider the following case.
1. I have sparse matrices A and B without explicit zero values.
2. I perform matrix multiplication A * B over a semiring where zero is the
identity. The result is a matrix C in which some cells hold an
explicit zero (because we do not drop it) and some cells hold an
implicit zero. The first (and, for me, principal) question is why the
behavior of the operation is not consistent with the specified semiring. The second
was mentioned by @ScottKolo <https://github.com/ScottKolo>: such
behavior can lead to poor performance.
3. Now I want to use C in an operation over a semiring in which zero is not the
identity. And now I'm confused, because in terms of the result type all
implicit and explicit zeros are equal, but in terms of the argument type (C
is an argument of an operation over another semiring) implicit values and
explicit zeros are different.
So, I guess that
1. The behavior of an operation with a specified semiring should be consistent
with that semiring.
2. If I want to switch from one semiring to another, I should do it
explicitly, for example by using the select function.
|
IIRC, it was because "it could be done with a combination of existing
operations." GxB_select is in the list of issues to consider for inclusion
in the next update to the spec.
By the way, using something called GrB_select() to "remove" unwanted values
from a matrix/vector is a bit counter-intuitive. "Prune" implies a
heuristic which might be useful (especially supporting binaryops and scalar
constants as one input). Soliciting ideas for names.
|
Mathematicians, please chime in... what I am about to say is a secondhand
explanation that was given to me years ago.
Note that you are using a semiring, which does not define an additive inverse
(e.g. "minus"). The production of a "zero" is happenstance (because an
additive inverse operation occurred somewhere, either by adding a negated
value or by subtraction, neither of which is part of the semiring). I would defer
to the more mathematically inclined to correct my understanding.
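To illustrate how such a happenstance zero arises, here is a plain-C sketch of one output entry of C = A*B over the conventional plus-times semiring. The sparse-vector layout and the name `plus_times_dot` are assumptions for illustration; this is not how any GraphBLAS library is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Dot product of two sparse vectors given as (index, value) pairs sorted by
   index.  Sets *present to true if at least one index appears in both
   vectors, meaning the output entry exists in the pattern whatever its
   numerical value turns out to be. */
double plus_times_dot (const size_t *ai, const double *ax, size_t anz,
    const size_t *bi, const double *bx, size_t bnz, bool *present)
{
    assert (present != NULL) ;
    double sum = 0.0 ;
    size_t p = 0, q = 0 ;
    *present = false ;
    while (p < anz && q < bnz)
    {
        if (ai [p] < bi [q])
        {
            p++ ;
        }
        else if (ai [p] > bi [q])
        {
            q++ ;
        }
        else
        {
            sum += ax [p] * bx [q] ;    /* one term of the sum */
            *present = true ;
            p++ ;
            q++ ;
        }
    }
    return (sum) ;
}
```

With a row (1, -1) and a column (1, 1), the terms cancel: the entry is present and its value is zero. The cancellation needs the negated value -1, which is the additive-inverse behavior mentioned above.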
|
GxB_select was named that way because it doesn't prune. It selects entries
for the output. So for example, for the sparse deep neural network, to
select only positive entries, I do
GxB_select (A, ... , GxB_GT_ZERO...).
and to delete explicit zeros I do
GxB_select (A, ... , GxB_NONZERO ...).
which selects all nonzeros. GxB_select keeps all entries A(i,j) for which the
selectop f (aij,i,j,m,n,thunk) is true, just as the mask M(i,j)=true
selects the entry (i,j) to be written to the result.
I'm open to other naming alternatives, though. I considered something with
the word "mask" in it, but it acts differently than the mask so I avoided
that name as potentially confusing.
|
GrB_keep() is a nice name, but I still like GrB_select() better.
From an SQL point of view, I’m used to using SELECT to choose the items I want to pull into a table. So the name is quite intuitive to people with exposure to SQL.
-Tim
From: Gabor Szarnyas
Date: Wednesday, October 16, 2019 at 9:57 AM
Subject: Re: [GraphBLAS/LAGraph] Drop identity values problem (#28)
Maybe GxB_keep would be a better name? The documentation says
Each entry A(i,j) is evaluated with the operator, which returns true if the entry is to be kept in the output, or false if it is not to appear in the output.
For me, this name works well for simple cases such as keeping the lower triangular part of a matrix. Not sure about the more complex cases (i.e. ones with masks) though.
|
Okay... if I understand correctly, the select approach still results in a
two-step process with the formation of an intermediate matrix. Is there a need
for a one-step process?
…On Wed, Oct 16, 2019 at 9:14 AM Tim Davis ***@***.***> wrote:
GxB_select was named that way because it doesn't prune. It selects entries
for the output. So for example, for the sparse deep neural network, to
select only positive entries, I do
GxB_select (A, ... , GxB_GT_ZERO...).
and to delete explicit zeros I do
GxB_select (A, ... , GxB_NONZERO ...).
which selects all zeros. GxB_select keeps all entries A(i,j) for which the
selectop f (aij,i,j,m,n,thunk) is true, just as the mask M(i,j)=true
selects the entry (i,j) to be written to the result.
I'm open to other naming alternatives, though. I considered something with
the word "mask" in it, but it acts differently than the mask so I avoided
that name as potentially confusing.
On Wed, Oct 16, 2019 at 11:07 AM Doc McMillan ***@***.***> wrote:
> IIRC, it was because "it could be done with a combination of existing operations." GxB_select is in the list of issues to consider for inclusion in the next update to the spec.
>
> By the way, using something called GrB_select() to "remove" unwanted values from a matrix/vector is a bit counter-intuitive. "Prune" implies a heuristic which might be useful (especially supporting binaryops and scalar constants as one input). Soliciting ideas for names
>
> On Wed, Oct 16, 2019 at 8:26 AM Aydin Buluc ***@***.***> wrote:
>
> > I am actually surprised that we managed to not include an easy way to do this in GraphBLAS. Prune always existed in CombBLAS (by now, for a decade)
> >
> > On Wed, Oct 16, 2019 at 8:24 AM Tim Davis ***@***.***> wrote:
> >
> > > Yes, adding it to LAGraph would be a good idea. It would use an #ifdef so that the GxB_select can be used if SuiteSparse:GraphBLAS is in use, and would use GrB* functions otherwise.
> > >
> > > I have a function in my MATLAB interface to do this as well, as A = GrB.prune (A). By default, it prunes zeros. To prune other values equal to the identity id, use A = GrB.prune (A, id).
> > >
> > > On Wed, Oct 16, 2019 at 9:31 AM Scott Kolodziej <***@***.***> wrote:
> > >
> > > > I agree that this is not something that should be done *automatically,* but it would be convenient to have a utility method or canonical way of doing it. The GxB_select approach seems to be that, so I also agree with the calls to get that into the standard (this application alone justifies it in my opinion).
> > > >
> > > > I think the usual argument here is that not dropping identity values in some cases could result in a lot of fill-in down the road, leading to performance issues.
> > > >
> > > > Maybe an LAGraph utility function would be a nice middle ground?
|
Thank you very much for such a detailed explanation of this issue, it's great!) I understand why automatic dropping of zeros isn't allowed in GraphBLAS and why it would break the framework. So I am not asking for this to be the default, but I would be very grateful if you could add a way to change the behaviour of an operation (an entry in the operation descriptor, or something else). I want to give an additional, more real-life example where this would be very useful and where the selection operation would not be enough.
The problem appears if we want to change the graph dynamically, e.g. adding edges step by step. Suppose we want to find all paths in a directed graph that satisfy some conditions. For simplicity, let's assume that the conditions are predicates A, B, C, so each entry of the matrix G that corresponds to the graph is a subset of {A, B, C}. A predicate P belongs to G[i][j] iff there is a path from node i to node j that satisfies P. We also have some rules that allow joining two paths when the first ends at the vertex where the second starts. E.g. if some path from i to j satisfies the predicate B and some path from j to k satisfies the predicate C, then the path from i to k satisfies the predicate A. These rules constitute the semiring: its elements are subsets of {A, B, C}, and the binary multiplication operation applies all rules to a pair of subsets. In the example above, {B} multiplied by {C} is {A}, but {C} multiplied by {C} is the empty set, because there is no such rule. The addition operation, you guessed it, is a simple union of subsets.
At the beginning of the algorithm, we initialize the matrix with some subsets (which ones doesn't matter), so the base of the algorithm is the answer for paths of length 1. Then we multiply the matrix by itself and get the answer for paths of length 2. Then we take the union of the matrices for paths of length 1 and 2 and repeat the multiplication to get the answers for paths of length 3. And so on. I believe that these iterations will converge :D
In this algorithm, I need to write my own function to implement set multiplication. There is a case where the result of the operation is the empty set (when there is no suitable rule). In the current version, I have to store an explicit value for this empty set, but in the semantics of the algorithm it is equivalent to an entry that does not occur in the pattern at all, because it means "there is no path between these vertices that satisfies at least one predicate".
That leads to serious trouble. The worst of it can happen right after the first matrix multiplication, because a huge number of explicit "zero" values appears. Even if we clear all unnecessary values after each operation with the selection operation, the zero values come back after each multiplication, and before the selection runs they can, in the worst case, fill up memory, spill into swap, and kill the process. And this is a real-life case.
The other problem is performance, because of the many unnecessary operations. First we have to add an unnecessary explicit value, and then we have to delete it. It seems a little strange.
So it would be very nice to be able to change the behaviour of operations (make them drop unnecessary values), or to return a special value from a user-defined operation function, or something else. This would make it more flexible. In conclusion, I want to thank everyone) I am very pleased to participate in the conversation. |
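The semiring described in this comment can be sketched in a few lines of C, which makes it easier to see where the empty set comes from. This is a minimal sketch under assumptions: subsets of {A, B, C} are encoded as 3-bit masks, the only rule is the one given in the example (B followed by C yields A), and the names `Set`, `Rule`, `set_add` and `set_mul` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A subset of {A, B, C} stored as a 3-bit mask; 0 is the empty set. */
typedef unsigned char Set;
enum { PRED_A = 1 << 0, PRED_B = 1 << 1, PRED_C = 1 << 2, ALL_PREDS = 7 };

/* One rule: "left followed by right yields result" (hypothetical encoding). */
typedef struct { Set left, right, result; } Rule;

/* The single rule from the example: B then C gives A. */
static const Rule rules[] = { { PRED_B, PRED_C, PRED_A } };
static const size_t nrules = sizeof rules / sizeof rules[0];

/* Semiring "addition": union of subsets. Its identity is the empty set. */
static Set set_add(Set x, Set y) {
    return (Set) (x | y);
}

/* Semiring "multiplication": apply every rule whose left predicate is in x
   and whose right predicate is in y, and collect the results. */
static Set set_mul(Set x, Set y) {
    assert((x & ~ALL_PREDS) == 0 && (y & ~ALL_PREDS) == 0);
    Set z = 0;
    for (size_t r = 0; r < nrules; r++) {
        if ((x & rules[r].left) && (y & rules[r].right)) {
            z = set_add(z, rules[r].result);
        }
    }
    return z;   /* 0 (the empty set) when no rule applies */
}
```

With this encoding, `set_mul` returns 0 whenever no rule matches, and a multiply that stores every product would keep that 0 as an explicit entry, which is the fill-in the comment is worried about.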
I see.
Currently, if you're using SuiteSparse:GraphBLAS, it's a single call to
GxB_select after each multiplication,
to drop the zeros.
It would be very hard to change my internal functions to drop zeros during
the computation of GrB_mxm. There might
be variants of GrB_mxm that could exploit it though. I'd have to think it
over. (I currently have 10 different variants
of C=A*B ... for each 1,040 different semirings, as well, plus the generic
ones. Yes, I know that's crazy ...).
Some of the 10 variants might be able to drop zeros on the fly, to save
time and memory, but not very many of them.
Most will have to compute all of C and then drop zeros when they are done.
This is because of how the matrix
multiplication gets done. Even sequentially, most of the methods are best
done as two phases: one to do the
symbolic work (deciding what the pattern of the result is) and the second
phase computes the values. The first
symbolic phase doesn't know the values so it can't drop anything.
When computing in parallel, the multiphase approach is essential for
parallelism. Dropping on the fly, during the GrB_mxm,
makes the size of the result unpredictable, and thus not parallel.
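The two-phase structure described above can be illustrated with a small row-by-row multiply in plain C. This is a sketch under assumptions, not the SuiteSparse:GraphBLAS implementation: it uses a hypothetical `Csr` struct with integer values, square n-by-n matrices, the conventional (+, *) semiring, and a sequential loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Compressed sparse row matrix with n rows and n columns. */
typedef struct { int n; int *ptr; int *idx; int *val; } Csr;

/* Phase 1 (symbolic): count the entries in each row of C = A*B by looking
   only at the patterns. Values are never read, so nothing can be dropped. */
static void symbolic(const Csr *A, const Csr *B, int *cptr) {
    int n = A->n;
    int *mark = malloc((size_t) n * sizeof *mark);
    assert(mark != NULL);
    for (int j = 0; j < n; j++) mark[j] = -1;
    cptr[0] = 0;
    for (int i = 0; i < n; i++) {
        int count = 0;
        for (int pa = A->ptr[i]; pa < A->ptr[i + 1]; pa++) {
            int k = A->idx[pa];
            for (int pb = B->ptr[k]; pb < B->ptr[k + 1]; pb++) {
                int j = B->idx[pb];
                if (mark[j] != i) { mark[j] = i; count++; }
            }
        }
        cptr[i + 1] = cptr[i] + count;
    }
    free(mark);
}

/* Phase 2 (numeric): fill the preallocated pattern of C with values.
   C->ptr must come from symbolic(); C->idx and C->val must have room
   for C->ptr[n] entries. */
static void numeric(const Csr *A, const Csr *B, Csr *C) {
    int n = A->n;
    int *pos = malloc((size_t) n * sizeof *pos);   /* column -> slot in C */
    assert(pos != NULL);
    for (int j = 0; j < n; j++) pos[j] = -1;
    for (int i = 0; i < n; i++) {
        int next = C->ptr[i];
        for (int pa = A->ptr[i]; pa < A->ptr[i + 1]; pa++) {
            int k = A->idx[pa];
            for (int pb = B->ptr[k]; pb < B->ptr[k + 1]; pb++) {
                int j = B->idx[pb];
                if (pos[j] < C->ptr[i]) {   /* first time j appears in row i */
                    pos[j] = next;
                    C->idx[next] = j;
                    C->val[next] = 0;
                    next++;
                }
                C->val[pos[j]] += A->val[pa] * B->val[pb];
            }
        }
    }
    free(pos);
}

/* Counts stored entries whose value is zero, i.e. the explicit zeros. */
static int count_explicit_zeros(const Csr *C) {
    int zeros = 0;
    for (int p = 0; p < C->ptr[C->n]; p++) {
        if (C->val[p] == 0) zeros++;
    }
    return zeros;
}
```

Because `symbolic` fixes the size of every row before any value exists, an entry such as 1*1 + (-1)*1 still gets a slot and ends up stored as an explicit zero. Removing it needs a later pass over C, which is the role a select-style step plays.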
But it would be possible to include a nondefault descriptor option which
would drop zeros *after* any
GrB* function is done, just before it returns its result to the caller. So
in that case,
the performance of the following 2 cases would be the same:
GrB_mxm (C, ...) ;
GxB_select (C, NULL, NULL, GxB_NONZERO, C, NULL, NULL) ;
or with a possible descriptor:
// this doesn't work yet; just a straw man
GrB_Descriptor_set (desc, GxB_DROP, true) ;
GrB_Descriptor_set (desc, GxB_DROPVALUE, 0) ;
GrB_mxm (C, ..., desc) ;
Since I do the GxB_select in parallel, it's not an in-place algorithm, in
either case. I have to create a new copy of
the matrix with the zeros dropped. The advantage of the latter example,
with the descriptor, would be that you
wouldn't have to create the descriptor all the time. You could just pass
the "drop descriptor" to all your favorite
GrB* functions.
I'll have to see what the C API committee thinks about this idea, of adding
the GxB_DROP* fields to the descriptor.
It's an extension (GxB*) but I like to get their feedback before I add
things like this, anyway.
But the descriptor would likely not speed things up very much, at least at
first, as compared to having you
call GxB_select at particular points in your code.
Is including GxB_select after your GrB_mxm (and any GrB* function) enough
to solve your memory thrashing problem?
|
The problem occurs exactly between GrB_mxm and GxB_select, because the zeros reappear in between. GxB_select can reduce the problem, but it can't solve it. If dropping identity values during matrix multiplication (only for user-defined operations) is impossible because of how the algorithm works, I will understand. |
In this case, I think, the issue can be closed. Thank you very much for your answer! |
On Wed, Oct 16, 2019 at 6:33 PM Tim Davis ***@***.***> wrote:
I'll have to see what the C API committee thinks about this idea, of adding
the GxB_DROP* fields to the descriptor.
It's an extension (GxB*) but I like to get their feedback before I add
things like this, anyway.
We had implemented a generalized version of this drop back in the KDT days,
and it still exists in CombBLAS, though sparsely documented.
It is both unfortunate and reassuring at the same time to see all the
design decisions of KDT/CombBLAS percolating back up to GraphBLAS. I never
brought this topic up because the solution we had was ugly and I had hoped
some other solution would come up in the meantime.
Here is how it worked. Each scalar multiplication operator had an
additional field that they were allowed to modify: returnedSAID. If that
field was set to yes, then the subsequent addition would not happen and the
output would not be materialized. The symbolic phase could keep track of
the number of times returnedSAID was true and not allocate space for them.
This was our in-band signaling mechanism. The semiring implementer would
decide what that particular semiring’s SAID (which stood for “semiring
additive identity”) was. For built-in semirings, this is pretty obvious.
Let's say we wanted to eliminate all edges that do not satisfy certain
criteria. In this case, it was some sort of "date" field being recent
enough on the edge payload and the edge type being a "retweet". Then the
scalar multiplication of the semiring would be something like (copied from
here:
https://people.eecs.berkeley.edu/~aydin/CombBLAS/html/_twitter_edge_8h_source.html
)
static VECTYPE filtered_select2nd(const TwitterEdge & arg1, const VECTYPE & arg2, time_t & sincedate)
{
    ....
    if(arg1.isRetwitter() && arg1.LastTweetBy(sincedate)) // T1 is of type edges for BFS
    {
        return arg2;
    }
    else
    {
        SR::returnedSAID(true);
        return VECTYPE();
    }
}
I guess all I am saying is that we can incorporate this to GraphBLAS as you
suggested but the way you described would still materialize that “drop”
object in the sparse matrix, albeit temporarily, and then remove it.
In-band SAID signaling solved that very efficiently and in a rather general
way for us, albeit in a relatively ugly way.
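The in-band signaling idea can be sketched in plain C as a multiply operator that reports whether its result should exist at all. This is not the CombBLAS code: the `Edge` struct, the `expand` helper and the reading of the date test as "last tweet at or after `since`" are assumptions made for illustration, and the boolean return value stands in for `returnedSAID`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge payload, loosely modeled on the filter in the text. */
typedef struct { int dst; bool is_retweet; long last_tweet; } Edge;

/* Multiply operator with an in-band signal. Returns false when the product
   is the semiring additive identity; in that case *out is left untouched
   and the caller must neither add nor store anything. */
static bool filtered_select2nd(const Edge *e, int x, long since, int *out) {
    if (e->is_retweet && e->last_tweet >= since) {
        *out = x;
        return true;
    }
    return false;   /* plays the role of returnedSAID(true) */
}

/* Expands one frontier value x along a list of outgoing edges. Writes a
   (dst, value) pair only for products that were not the identity, and
   returns how many pairs were written. */
static size_t expand(const Edge *edges, size_t nedges, int x, long since,
                     int *out_dst, int *out_val) {
    assert(edges != NULL && out_dst != NULL && out_val != NULL);
    size_t n = 0;
    for (size_t p = 0; p < nedges; p++) {
        int v;
        if (filtered_select2nd(&edges[p], x, since, &v)) {
            out_dst[n] = edges[p].dst;
            out_val[n] = v;
            n++;
        }
    }
    return n;
}
```

The point of the design is that a filtered-out product never occupies a slot in the output, so there is nothing to remove afterwards. The cost is that the operator's signature carries an extra channel beyond the value itself.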
|
If the first part of matrix multiplication computes the pattern of the result, then memory will be allocated for the identity values in any case. So I understand that there is no way to save memory in this case (maybe it works only for some special built-in types). |
I think there is a better solution, one that uses the predicate B as a
mask. There's no need to drop zeros, just don't compute them in the first
place (that is the purpose of the mask). In your earlier email, you wrote:
The problem will appear if we want to change the graph dynamically, e.g.
adding edges step by step. Suppose we want to find all paths in the
directed graph which satisfy some conditions. For simplicity, lets assume
that conditions are predicates A, B, C so the entry of the matrix G that
corresponds to graph is a subset of {A, B, C}. So predicate P belongs to
G[i][j] iff there is a path from node i to j, which satisfies the predicate
P. Also, we have some rules, that allow merging two paths, whose final
vertices coincide. E.g. if some path from i to j satisfies the predicate B
and some path from j to k satisfy predicate C, then the path from i to k
satisfies the predicate A. Thus these rules constitute the semiring, whose
elements are subsets of {A, B, C} and binary multiplication operation
corresponds to applying all rules to subsets. In the example above, {B}
multiplied by {C} is {A}, but {C} multiplied by {C} is empty set, because
there isnt such kind of rule. The addition operation, you guessed it, is a
simple union of subsets.
This sounds like the GraphBLAS operation G<B>=A*C'. Let me ask the
following. Is the following computation being done? I will write it as if
it considers all i and j, but don't fear, this is not what I do. Just the
mathematical specification:
Let A, B, C be square boolean matrices of dimension n. The matrix G will
be n by n and is currently empty.
for all i = 1 to n
    for all j = 1 to n
        if (B (i,j)) is true then
            for all k = 1 to n
                G (i,j) = G(i,j) OR (A(i,k) AND C (j,k))
If that is what you want to compute, then it is a very fast GrB_mxm
computation, G<B> = A*C'.
I do not take O(n^3) to do the above computation. The above is just a
simple mathematical specification
of what is computed by the following:
GrB_Descriptor_new (&desc) ;
GrB_Descriptor_set (desc, GrB_INP1, GrB_TRAN) ;
GrB_mxm (G, B, NULL, GxB_LOR_LAND_BOOL, A, C, desc) ;
The GxB_LOR_LAND_BOOL is the boolean semiring. If instead you want A and C
to be integer, and want to compute the following:
for all i = 1 to n
    for all j = 1 to n
        if (B (i,j)) is true then
            for all k = 1 to n
                G (i,j) = G(i,j) + (A(i,k) * C (j,k))
Then the computation is also fast, just with different semiring.
If A and C are stored in their default format, which is by row (CSR), then
this will use my masked dot product, internally.
That function is very fast, very parallel, and very memory efficient. It
will use no more than O (nne(B)) memory, where
nne(B) is the number of explicit entries present in B. I do not need to
transpose the matrix C to compute G<B>=A*C'.
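The masked dot product idea can be sketched in plain C to show why the work and memory are bounded by the entries of B. This is an illustration under assumptions, not the SuiteSparse:GraphBLAS kernel: matrices are pattern-only boolean CSR (every stored entry is true), column indices within a row are sorted, and `Pattern`, `row_dot` and `masked_dot` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Pattern of an n-by-n boolean CSR matrix: a stored entry means "true". */
typedef struct { int n; const int *ptr; const int *idx; } Pattern;

/* Dot product of row i of A and row j of C under (OR, AND): true iff the
   two rows share a column index. Assumes sorted column indices. */
static bool row_dot(const Pattern *A, int i, const Pattern *C, int j) {
    int pa = A->ptr[i], pc = C->ptr[j];
    while (pa < A->ptr[i + 1] && pc < C->ptr[j + 1]) {
        if (A->idx[pa] < C->idx[pc]) pa++;
        else if (A->idx[pa] > C->idx[pc]) pc++;
        else return true;
    }
    return false;
}

/* G<B> = A*C': visits only the positions (i,j) stored in B. g[p] is the
   result for the p-th entry of B: true means G(i,j) is present, false
   means no entry is produced there. Returns the number of entries of G. */
static int masked_dot(const Pattern *B, const Pattern *A, const Pattern *C,
                      bool *g) {
    assert(B->n == A->n && B->n == C->n);
    int hits = 0;
    for (int i = 0; i < B->n; i++) {
        for (int p = B->ptr[i]; p < B->ptr[i + 1]; p++) {
            g[p] = row_dot(A, i, C, B->idx[p]);
            if (g[p]) hits++;
        }
    }
    return hits;
}
```

Since row j of C is read directly, C never has to be transposed, and the only output storage is one slot per entry of B. Positions outside the mask are never computed, so they cannot produce explicit zeros.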
If G is changing dynamically, then you might also consider using an
accumulator operator, like G<B> += A*C'.
The matrix B(i,j) does not have to be boolean. It can be any built-in
type, which are all typecastable to bool.
So in that case, the above pseudocode would read "if B(i,j) is nonzero
then...".
You may still want to drop zeros after the fact, if G(i,j) is computed yet
becomes explicitly zero.
That could happen if A or C have negative entries in them, or if there is
an accum operator and G(i,j)
starts out negative and then becomes zero after the accumulation occurs.
Is this what you want to compute?
|
Oh, it’s interesting, I’ll think about it and try to do it, and say whether it worked out |
This is a great discussion about an issue that's been interesting and important (and sometimes controversial) going back to KDT, CombBLAS, and even sparse Matlab in the early 1990s. I just planted a link to it on GraphBLAS.org :-) |
Hello, I don't understand how to make GraphBLAS not write explicit zeros (identity values). I found the following in the documentation:
What should I do if I want to always drop identity values after some operations?
Below is a simple example of matrix multiplication that generates an identity (zero) value.
The output matrix contains one entry that is equal to zero:
In the real task, I need to use custom types and custom operations, but first I want to solve this small problem. Can you help me, please?