Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop identity values problem #28

Closed
simpletonDL opened this issue Oct 15, 2019 · 25 comments
Closed

Drop identity values problem #28

simpletonDL opened this issue Oct 15, 2019 · 25 comments

Comments

@simpletonDL
Copy link

Hello, I don`t understand how to make GraphBlass not write implicit zeroes (identity values). I found in the documentation the following:

The entries in the pattern of A can take on any value, including the implicit value, whatever it happens to be. This differs slightly from MATLAB, which always drops all explicit zeros from its sparse matrices. This is a minor difference but it cannot be done in GraphBLAS.

What I should do, if I want to always drop identity values after some operations?
Below there is a simple example of matrix multiplication that generates identity (zero) value.

    GrB_Matrix a, b;
    GrB_Matrix_new(&a, GrB_INT64, 2, 2);
    GrB_Matrix_new(&b,GrB_INT64, 2, 2);

    GrB_Matrix_setElement(a, 2, 0, 0);
    GrB_Matrix_setElement(a, -2, 0, 1);
    GrB_Matrix_setElement(b, 1, 0, 0);
    GrB_Matrix_setElement(b, 1, 1, 0);

    GrB_Monoid monoid;
    GrB_Semiring semiring;

    GrB_Monoid_new_INT64(&monoid, GrB_PLUS_INT64, (int64_t) 0);
    GrB_Semiring_new(&semiring, monoid, GrB_TIMES_INT64);

    GrB_Matrix matrix_new;
    GrB_Matrix_new(&matrix_new, GrB_INT64, 2, 2);

    GrB_mxm(matrix_new, GrB_NULL, GrB_NULL, semiring, a, b, GrB_NULL);
    GxB_print(matrix_new, GxB_SHORT);

The output matrix contains one entry that equal to zero:

...
row: 0 : 1 entries [0:0]
    column 0: int64 0

In the real task, I need to use custom types and custom operations, but at first, I want to solve this small problem. Can you help me, please?

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 15, 2019

A very good question. It points out a feature of GraphBLAS, since "zero" can differ depending on the semiring (in a path distance problem, for example, an edge of weight zero is very different than no edge at all). So zeros cannot be dropped automatically inside GraphBLAS.

But there are cases when you do want to delete entries, like all explicit zeros.

It takes a second step to delete entries from a matrix. If you are using SuiteSparse:GraphBLAS, then you can use the following to drop explicit zeros from the GrB_Matrix A. This works for any matrix, including any user-defined type.

GxB_select (A, NULL, NULL, GxB_NONZERO, A, NULL, NULL) ;

GxB_select can also be used to drop any other particular value (or range of values, using, say, GxB_GT_ZERO, which keeps only those entries greater than zero, dropping values that are zero or less). GxB_GT_ZERO only works for the 11 built-in types, while GxB_NONZERO works for any type, including user-defined types. For user-defined types, it checks to see if the bit pattern is all zero, and keeps those that have at least one 1 bit in them. So if your typedef is a struct with "holes" in it, this might not always work as expected.

If you are using another GraphBLAS library, you need to use the matrix as its own mask (assuming A has a built-in type, not a user-defined type).

GrB_assign (A, A, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;

where Replace is a descriptor with the replace option turned on.

If A has a user-defined type, you first have to create a boolean matrix, where M(i,j) = 0 if A(i,j) is zero, or M(i,j)=1 otherwise. That can be done with a user-defined typecast function, via GrB_apply:

void my_typecast_func (void *z, const void *x)
{
     bool result = 0 if x is zero, 1 if x is nonzero
     *((bool *) z) = result ;
}

GrB_UnaryOp_new (&My_typecast_function, my_typecast_func, GrB_BOOL, My_type) ;
GrB_Matrix_new (&M, GrB_BOOL, nrows, ncols) ;
GrB_apply (M, NULL, NULL, My_typecast_function, A, NULL) ;
GrB_assign (A, M, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;

(technically speaking, all the "NULL"s above should be GrB_NULL ... but NULL works the same as GrB_NULL in SuiteSparse:GraphBLAS).

@tgmattso
Copy link
Collaborator

tgmattso commented Oct 15, 2019 via email

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 15, 2019 via email

@gsvgit
Copy link

gsvgit commented Oct 16, 2019

Hello.

I have the same question. I clearly understand why it is not good idea to remove zero values. But what if I explicitly specify zero as an identity in the monoid? In path distance problem the identity is not zero, so we should not delete zeroes, but I think that we can remove minus infinity which is identity. So, the question is about identities: is it possible to drop identities out automatically during operations over sparse matrices?

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 16, 2019 via email

@ScottKolo
Copy link
Contributor

I agree that this is not something that should be done automatically, but it would be convenient to have a utility method or canonical way of doing it. The GxB_select approach seems to be that, so I also agree with the calls to get that into the standard (this application alone justifies it in my opinion).

I think the usual argument here is that not dropping identity values in some cases could result in a lot of fill-in down the road, leading to performance issues.

Maybe an LAGraph utility function would be a nice middle ground?

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 16, 2019 via email

@aydinbuluc
Copy link
Collaborator

aydinbuluc commented Oct 16, 2019 via email

@gsvgit
Copy link

gsvgit commented Oct 16, 2019

Ah... I see. @DrTimothyAldenDavis thank you for the explanation!

And even for such operation like GrB_mxm where we should specify semiring, we still have no enough information to drop identities of the given semiring automatically?
Suppose the next case.

  1. I have sparse matrices A and B without explicit zero values.
  2. I perform matrix multiplication A * B over semiring where zero is identity. The result is matrix C in which the value of some cells is explicit zero (because we do not drop it out), and the value of some cells is implicit zero. The first (and principal for me) question here is why the behavior of operation is not agreed with specified semiring? And the second is mentioned by @ScottKolo: such behavior can lead to poor performance.
  3. Now I want to use C in operation over semiring in which zero is not identity. And now I'm confused. Because in terms of the result type all implicit and explicit zeros are equal. But in terms of argument type (C is an argument of operation over another semiring) implicit values and explicit zeros are different.

So, I guess that

  1. The behavior of operation with specified semiring should be agreed with this semiring.
  2. If I want to switch from one semiring to another, I should do it explicitly by using the select function, for example.

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 16, 2019 via email

@mcmillan03
Copy link
Member

mcmillan03 commented Oct 16, 2019 via email

@mcmillan03
Copy link
Member

mcmillan03 commented Oct 16, 2019 via email

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 16, 2019 via email

@szarnyasg
Copy link
Contributor

szarnyasg commented Oct 16, 2019

Maybe GxB_keep would be a better name? The documentation says

Each entry A(i,j) is evaluated with the operator, which returns true if the entry is to be kept in the output, or false if it is not to appear in the output.

For me, this name works well for simple cases such as keeping the lower triangular part of a matrix. Not sure about the more complex cases (i.e. ones with a mask) though.

@tgmattso
Copy link
Collaborator

tgmattso commented Oct 16, 2019 via email

@mcmillan03
Copy link
Member

mcmillan03 commented Oct 16, 2019 via email

@simpletonDL
Copy link
Author

simpletonDL commented Oct 17, 2019

Thank you very much for a very detailed explanation of this issue, it's great!)

I understand why automatic dropping of zeros isn't allowed in GraphBlas and why it breaks the framework. So I don't pretend to include this feature by default, but I would be very grateful if you, if possible, could add the ability to change operation behaviour (like some entry in the operation descriptor or something else). So I want to give an additional more real-life example when it would be very useful and the selection operation wouldn't be enough.

The problem will appear if we want to change the graph dynamically, e.g. adding edges step by step. Suppose we want to find all paths in the directed graph which satisfy some conditions. For simplicity, let's assume that conditions are predicates A, B, C so the entry of the matrix G that corresponds to graph is a subset of {A, B, C}. So predicate P belongs to G[i][j] iff there is a path from node i to j, which satisfies the predicate P.

Also, we have some rules, that allow merging two paths, whose final vertices coincide. E.g. if some path from i to j satisfies the predicate B and some path from j to k satisfy predicate C, then the path from i to k satisfies the predicate A. Thus these rules constitute the semiring, whose elements are subsets of {A, B, C} and binary multiplication operation corresponds to applying all rules to subsets. In the example above, {B} multiplied by {C} is {A}, but {C} multiplied by {C} is empty set, because there isn't such kind of rule. The addition operation, you guessed it, is a simple union of subsets.

At the beginning of the algorithm, we initialize matrix with some subsets (which ones don't matter), so the base of the algorithm is the answer for paths of length 1. Then we multiply the matrix by itself and receive an answer for paths of length 2. Then we get the union of matrices for paths of length 2 and 1 and can repeat multiplication to get answers for paths of length 3. And so on. I believe, that these iterations will converge :D

In this algorithm, I need to create own function to implement set multiplication. And there is a case when the result of the operation is empty set (when there is no suitable rule). In the current version, I have to set the explicit value of this empty set, but in the semantics of algorithm, it is equal to an explicit value, which doesn't occur in the pattern, because it means "there isn't the path that satisfies at least one predicates between this vertices".

That gets into serious troubles. The main terrible thing can happen even after the first matrix multiplication due to the appearance of a huge number of "zeroes" explicit values. Even if we clear all unnecessary values after each operation thanks to the selection operation, the zeroes values will come to us after multiplication, and before selection in the worst case will permeate the swap memory, get out of there and kill the process. And this is a real-life case.

The other problem is performance due to many unnecessary operations. At first, We have to add implicit unnecessary value, at second, delete this value. It seems a little strange.

So it would be very nice to be able to change the behaviour of operations (make it drop unnecessary values) or to return from user-defined operation function special value or something else. This will make it more flexible.

In conclusion, I want to thank everyone) I am very pleased to participate in the conversation.

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 17, 2019 via email

@simpletonDL
Copy link
Author

There is a problem exactly after GrB_mxm and before GxB_select, because zeroes respawn between these moments. GxB_select_ can reduce the problem, but it can`t solve it. If dropping identity values is impossible during the matrix multiplication (only for user-defined operations) due to algorithmic features, I will understand/

@simpletonDL
Copy link
Author

In this case, I think, the issue can be closed. Thank you very much for your answer!

@aydinbuluc
Copy link
Collaborator

aydinbuluc commented Oct 17, 2019 via email

@simpletonDL
Copy link
Author

If the first part of matrix multiplication computes the pattern of result, then in any case memory will be allocated for identity values. So I understand that there is no way to increase memory in this case (may be, it works only for some special built-in types).

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Oct 17, 2019 via email

@simpletonDL
Copy link
Author

Oh, it’s interesting, I’ll think about it and try to do it, and say whether it worked out

@johnrgilbert
Copy link

This is a great discussion about an issue that's been interesting and important (and sometimes controversial) going back to KDT, CombBLAS, and even sparse Matlab in the early 1990s. I just planted a link to it on GraphBLAS.org :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants