
Make length(A.nzval)==nnz(A) #30662 #30676

Closed · wants to merge 18 commits

Conversation

abraunst
Contributor

@abraunst abraunst commented Jan 10, 2019

Addresses #30662. See also: #26560, #30435

@andreasnoack
Member

I suspect that we'd have to audit the code more carefully for places that might use the current assumption. We might also want to add more checks for this since if we make this change then people should be able to rely on it consistently. First of all, we'd have to decide if we want to make this change.

@abraunst
Contributor Author

I suspect that we'd have to audit the code more carefully for places that might use the current assumption. We might also want to add more checks for this since if we make this change then people should be able to rely on it consistently. First of all, we'd have to decide if we want to make this change.

Sure. I was not suggesting we rush things; I thought that having a patch could help the discussion.

@KlausC
Contributor

KlausC commented Jan 10, 2019

Compared to #30435 there is essentially one difference:
sizehint! is applied to the data with a size capped by the length of the new array.
The size of rowval/nzval is 0 in both cases.

@StefanKarpinski
Sponsor Member

Wasn't the old guarantee looser than the new one? I.e. previously you could assume that nnz(A) ≤ length(A.nzval), which is still true, but now you know that it's an exact equality?

@stevengj stevengj added the domain:arrays:sparse Sparse arrays label Jan 10, 2019
@abraunst
Contributor Author

abraunst commented Jan 11, 2019

Wasn't the old guarantee looser than the new one? I.e. previously you could assume that nnz(A) ≤ length(A.nzval), which is still true, but now you know that it's an exact equality?

Yes, exactly. Note that even though length(A.nzval)==length(similar(A, shape).nzval) held as well, it wasn't documented (and is questionable, certainly when prod(shape) < length(A.nzval)). My preference would be to go all the way and not even call sizehint! (i.e. along the lines of #30435) unless it is shown to help. But in the absence of that, this PR at least brings back consistency...
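For concreteness, the two invariants being discussed can be checked directly. A minimal sketch (whether the strict equality holds depends on the Julia version and on how A was produced):

```julia
using SparseArrays

A = sprand(10, 10, 0.3)

# The looser guarantee, which held both before and after this PR:
@assert nnz(A) <= length(A.nzval)

# The stricter guarantee this PR proposes (exact equality); computed here
# as a Bool rather than asserted, since it need not hold on all versions:
isconsistent = length(A.nzval) == length(A.rowval) == nnz(A)
```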

@KlausC
Contributor

KlausC commented Jan 11, 2019

My opinion is that sizehint! should not be performed automatically inside similar, because the impact on memory requirements is the same as with the original resize!.
I support @stevengj 's proposal to define sizehint! for SparseMatrixCSC.
For SparseVector a sizehint! would be handy as well.
I also support the idea of keeping length(S.rowval) == length(S.nzval) == nnz(S) as an invariant, while using sizehint! to allocate extra space if that is required.
The argument in #30435, that removing the resize to maximal length is a breaking change, has not been confirmed in practice yet, because so far not a single example has been found where the allocated space is used.
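A sizehint! method for SparseMatrixCSC along the lines proposed might look roughly like this. This is a sketch, not the actual PR code: it reserves capacity in the two buffers without changing their lengths or nnz(S):

```julia
using SparseArrays

# Sketch: reserve room for n stored entries in both buffers.
# Does not change length(S.rowval), length(S.nzval), or nnz(S).
function Base.sizehint!(S::SparseMatrixCSC, n::Integer)
    sizehint!(S.rowval, n)
    sizehint!(S.nzval, n)
    return S
end

S = sprand(100, 100, 0.05)
k = nnz(S)
sizehint!(S, 2k)     # reserve space for future growth
@assert nnz(S) == k  # stored entries unchanged
```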

@abraunst
Contributor Author

abraunst commented Jan 11, 2019

My opinion is that sizehint! should not be performed automatically inside similar, because the impact on memory requirements is the same as with the original resize!.
I support @stevengj 's proposal to define sizehint! for SparseMatrixCSC.
For SparseVector a sizehint! would be handy as well.
I also support the idea of keeping length(S.rowval) == length(S.nzval) == nnz(S) as an invariant, while using sizehint! to allocate extra space if that is required.
The argument in #30435, that removing the resize to maximal length is a breaking change, has not been confirmed in practice yet, because so far not a single example has been found where the allocated space is used.

I agree. But given that there is no general consensus on removing the allocation, I think that at least the changes in this PR are an improvement in terms of consistency. I may have misunderstood, but I think that @stevengj's suggestion was not to remove the allocation from similar (just a bit of refactoring so as to leave a sizehint!(::SparseMatrixCSC, n) method that could help in other cases).

@stevengj
Member

The new sizehint! should have a NEWS.md item.

@stevengj stevengj added needs news A NEWS entry is required for this change and removed needs news A NEWS entry is required for this change labels Jan 11, 2019
@abraunst
Contributor Author

I suspect that we'd have to audit the code more carefully for places that might use the current assumption. We might also want to add more checks for this since if we make this change then people should be able to rely on it consistently. First of all, we'd have to decide if we want to make this change.

@andreasnoack what kind of checks would you suggest?

@Sacha0
Member

Sacha0 commented Jan 21, 2019

Crossposting from #30662:

Ref. #20464 for previous discussion of this topic. Decoupling of buffer length and number of stored entries is a well established property of compressed sparse data structures, and is regularly exploited in sparse linear algebra libraries; ref. #20464 (comment) particularly for expansion on use cases. Best!

@andreasnoack
Member

andreasnoack commented Jan 22, 2019

what kind of checks would you suggest?

Maybe we want to add some @assert length(A.nzval) == length(A.rowval) == A.colptr[end] - 1 in some of the places where we make use of this.
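The suggested assertion could be factored into a small helper, for example (the name checkbuffers and its placement are hypothetical, not part of SparseArrays):

```julia
using SparseArrays

# Hypothetical internal helper: assert that the buffer lengths agree
# with the number of stored entries recorded in colptr.
function checkbuffers(A::SparseMatrixCSC)
    @assert length(A.nzval) == length(A.rowval) == A.colptr[end] - 1
    return A
end

A = sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0])
checkbuffers(A)  # passes for a consistently built matrix
```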

@abraunst
Contributor Author

Crossposting from #30662:

Ref. #20464 for previous discussion of this topic. Decoupling of buffer length and number of stored entries is a well established property of compressed sparse data structures, and is regularly exploited in sparse linear algebra libraries; ref. #20464 (comment) particularly for expansion on use cases. Best!

I think that you meant #30435. I've replied here.

@ViralBShah
Member

Should the proposed checks be part of this PR?

@andreasnoack
Member

Should the proposed checks be part of this PR?

I think it would make sense to include them here, but it might not be obvious where to put them. However, it would be great if @abraunst would skim through the code for uses of nnz and direct references to .colptr to see where adding the check would be useful.

@abraunst
Contributor Author

Should the proposed checks be part of this PR?

I think it would make sense to include them here, but it might not be obvious where to put them. However, it would be great if @abraunst would skim through the code for uses of nnz and direct references to .colptr to see where adding the check would be useful.

Maybe the best would be just to throw on construction if the buffer sizes are not right. If the user modifies the buffers post-construction, they're on their own, no?
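A constructor-time check of that kind might look roughly like this (a sketch wrapping, not replacing, the actual inner constructor; the name checked_csc is hypothetical):

```julia
using SparseArrays

# Sketch: validate buffer sizes before handing them to SparseMatrixCSC.
function checked_csc(m::Int, n::Int, colptr, rowval, nzval)
    length(rowval) == length(nzval) == colptr[end] - 1 ||
        throw(ArgumentError("buffer lengths must equal the number of stored entries"))
    return SparseMatrixCSC(m, n, colptr, rowval, nzval)
end

# 2x2 identity in CSC form: one stored entry per column.
A = checked_csc(2, 2, [1, 2, 3], [1, 2], [1.0, 1.0])
```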

@andreasnoack
Member

If the user modifies the buffers post-construction, he's on his/her own, no?

Mostly but it wouldn't hurt if we could detect an inconsistency before a segfault. However, I was mainly thinking that these checks would be useful internally in SparseArrays to ensure that the array buffers stay consistent with the assumption within methods of the module.

@abraunst
Contributor Author

abraunst commented Jan 30, 2019

So I hopefully implemented @andreasnoack's request. It was harder than I expected...

With respect to the previous version:

  • I introduced a capacity function to query the size of the underlying allocated buffer of a Vector. I barely know what I'm doing here, so please check carefully (also, is the name ok?).
  • I added buffer checks at many crucial points. Consistency of the buffers should now always be maintained when methods return.
  • I hopefully kept all the carefully placed performance tricks regarding buffer allocations in broadcast/map... a benchmark run is desperately needed though.
  • I think that @Sacha0 should in particular check the code regarding sparse broadcast/map, as it seems he's the main author (or so says git blame)
  • I removed two tests targeting behaviour on "illegal" sparse structures
  • rebased on current master

@abraunst
Contributor Author

abraunst commented Jan 30, 2019

Forgot SuiteSparse checks...

EDIT: indeed SuiteSparse builds many intermediate "illegal" SparseMatrixCSC. Will have a look as soon as possible.

Contributor

@KlausC KlausC left a comment


I fully agree, with a few exceptions:

  • The new function name capacity might be misleading - I would prefer sizehint, because it is in a way the getter function corresponding to sizehint!.
  • The function sortBuffers! might be better called sort_CSC_buffers! or similar.

@abraunst
Contributor Author

abraunst commented Jan 31, 2019

Could someone kindly invoke @nanosoldier to check performance regressions against master?

@abraunst
Contributor Author

abraunst commented Mar 8, 2019

the provided length to sizehint! should be respected by the library, i.e. the allocation should be exactly the requested size

What does "exactly" mean? Even malloc has some overhead and granularity. Or for example, if the buffer is size 100 and you call sizehint!(a, 99) should we try to re-allocate it?

As exactly as possible, of course. This referred to enlarging the buffer; I think we should do nothing if the value passed to sizehint! is smaller than the buffer size (just because I understand that this is what GNU C++'s reserve does; no particular opinion of my own).

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Mar 8, 2019

As exactly as possible of course.

Anything is possible, but giving the exact buffer size you asked for is often bad for performance. It's the memory subsystem's job to worry about these things and do a good job, trading off speed and overhead appropriately. Insisting on exactness is a form of micromanaging the memory system.

@abraunst
Contributor Author

abraunst commented Mar 8, 2019

As exactly as possible of course.

Define "possible". Anything is possible, but giving the exact buffer size you asked for is often bad for performance and it's the memory subsystem's job to worry about these things and do a good job. Insisting on exactness is a form of micromanaging the memory system.

In my opinion, the role of sizehint! should essentially be to allow micromanaging in (performance- or memory-critical) situations where the memory subsystem would not be able to make a good guess. Currently there is no way to do it... Of course, maybe I'm completely wrong and this was not the intended scope of that function. Still, I can imagine many situations where the user knows the correct size but Julia will overshoot by about a factor of 2.

@StefanKarpinski
Sponsor Member

The purpose of sizehint! is to allow pre-allocating the amount of memory you expect to need while still allowing you to use push! to incrementally add elements instead of needing to grow the vector and separately keep track of which elements are real and which are junk. This style of programming is also inherently tolerant of being off by a bit: if you do sizehint!(1_000_000) and the actual size ends up being 1_000_012 that's fine and you've still saved a lot of reallocation along the way.

What if Julia's memory subsystem decides that it only gives array memory regions whose sizes are multiples of 16 bytes? Should calling sizehint! just override that and force the memory subsystem to give you a chunk of memory that isn't a multiple of 16? What if the trailing remainder of memory up to the next multiple of 16 will never be used by any other array? If you allocate a 999_992-byte array of floats and then want to push another value onto it should we refuse to let you do so and force the entire thing to be copied even though nothing else could possibly be using the remaining 8 bytes?

@abraunst
Contributor Author

abraunst commented Mar 8, 2019

The purpose of sizehint! is to allow pre-allocating the amount of memory you expect to need while still allowing you to use push! to incrementally add elements instead of needing to grow the vector and separately keep track of which elements are real and which are junk. This style of programming is also inherently tolerant of being off by a bit: if you do sizehint!(1_000_000) and the actual size ends up being 1_000_012 that's fine and you've still saved a lot of reallocation along the way.

Sure, my point is that the user is likely much better informed about the distribution of this possible "off by a bit" length than the memory subsystem. I'm probably too new to Julia to be discussing this (if I'm talking nonsense, I apologize). But I find the lack of fine control over memory allocation a bit unsettling (not for casual use of course, but for libraries and performance-critical code).

What if Julia's memory subsystem decides that it only gives array memory regions whose sizes are multiples of 16 bytes? Should calling sizehint! just override that and force the memory subsystem to give you a chunk of memory that isn't a multiple of 16? What if the trailing remainder of memory up to the next multiple of 16 will never be used by any other array? If you allocate a 999_992-byte array of floats and then want to push another value onto it should we refuse to let you do so and force the entire thing to be copied even though nothing else could possibly be using the remaining 8 bytes?

I feel like a straw man argument is being used here. Of course it's not really important to be precise to the bit (that was in my mind roughly the meaning of "as exactly as possible"); the point is that we are doing predictive pre-allocation and purposely allocating more (sometimes twice as much) than the user prediction.

EDIT: maybe it would be better to move this particular discussion somewhere else (e.g. Discourse), as it is separate from the PR. I still don't understand the outcome of triage with respect to this PR (which is not about performance at all).

@KlausC
Contributor

KlausC commented Mar 9, 2019

I found this spot in sparsematrix.jl, which silently assumes that nnz(XI) == length(XI.rowval) == length(XI.nzval). That means an exception is thrown if that is not the case.

rowval[(1 : nnzX[i]) .+ nnz_sofar] = X[i].rowval .+ mX_sofar
nzval[(1 : nnzX[i]) .+ nnz_sofar] = X[i].nzval

This one assumes (only) that length(XI.rowval) == length(XI.nzval):

if nnzX[i] == length(XI.rowval)
    rowval[(1 : nnzX[i]) .+ nnz_sofar] = XI.rowval
    nzval[(1 : nnzX[i]) .+ nnz_sofar] = XI.nzval
else
    rowval[(1 : nnzX[i]) .+ nnz_sofar] = XI.rowval[1:nnzX[i]]
    nzval[(1 : nnzX[i]) .+ nnz_sofar] = XI.nzval[1:nnzX[i]]
end

That underlines @StefanKarpinski 's statement

I just don’t think that’s a state that the matrix should normally be left in.

@abraunst
Contributor Author

abraunst commented Mar 9, 2019

I found this spot in sparsematrix.jl, which silently assumes that nnz(XI) == length(XI.rowval) == length(XI.nzval). That means an exception is thrown

Yes, this is too error prone. Another example is the recently fixed #31187.

@Sacha0
Member

Sacha0 commented Mar 10, 2019

I don't think it's all that drastic a change. The structure is still the same, the only question is whether extra room after the end of rowval and nzval is explicitly managed by the sparse matrix user or by Julia's memory system. That doesn't fundamentally change the data structure—the in-memory representation is exactly the same.

Though the data structure doesn't fundamentally change, moving away from in some sense explicitly managing memory, and towards reliance on the memory management system / concomitant use of higher level operations, is something of a paradigm shift for those of us used to a world where CSC/CSR by definition involves the former :). It implies writing different sorts of code and perhaps different sorts of thinking. It also seems like the logical conclusion of this direction would be removal of the last element of colptr, which would fundamentally change the data structure? Best!

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Mar 11, 2019

It also seems like the logical conclusion of this direction would be removal of the last element of colptr, which would fundamentally change the data structure?

I don't think that's really necessary—the one extra value isn't costing that much and allows these vectors to be passed as-is to libraries that expect this representation. This also relates to the reason, in my understanding, that capacity is useful in conjunction with this change: if you're calling a library that can make good use of extra capacity in the rowval and nzval vectors you can pass capacity(rowval) and capacity(nzval) instead of length(rowval) and length(nzval) and then call resize!(rowval) and resize!(nzval) with whatever the new last value in colptr is and everything should work fine. I guess the problem with that scenario is: what does the library do if there isn't enough room? I don't have an answer to that and I don't even know if this is a realistic scenario.
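The capacity/resize! scenario described above can be sketched as follows. This is only an illustration under stated assumptions: capacity here is stubbed as the current buffer length, whereas the one proposed in the PR would report the actual allocated size, and the external library call is omitted:

```julia
using SparseArrays

# Stub for illustration: the PR's capacity would report the allocated
# buffer size, which may exceed length(v).
capacity(v::Vector) = length(v)

A = sprand(5, 5, 0.4)

# A library that writes extra stored entries could be told the true
# buffer sizes instead of the current lengths:
cap_r, cap_v = capacity(A.rowval), capacity(A.nzval)

# ...the external library call would go here, possibly updating A.colptr...

# Afterwards, shrink the buffers back to the stored-entry count:
newnnz = A.colptr[end] - 1
resize!(A.rowval, newnnz)
resize!(A.nzval, newnnz)
@assert length(A.nzval) == nnz(A)
```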

@StefanKarpinski
Sponsor Member

I feel like a straw man argument is being used here. Of course it's not really important to be precise to the bit (that was in my mind roughly the meaning of "as exactly as possible"); the point is that we are doing predictive pre-allocation and purposely allocating more (sometimes twice as much) than the user prediction.

It's not intentionally a strawman argument at all. I was taking "as exactly as possible" literally, which would seem to object to this kind of small over-allocation. So it seems that the only issue is regarding large over-allocation. I would point out that large over-allocation is only a problem if you do it yourself and insist on filling values with something. If you let the system do it, it's free since the kernel doesn't actually allocate memory until you use it. So sure, we seem to be allocating 2x what's required, but most of those memory pages are not allocated at the time and will never be allocated if they're not used.

@abraunst
Contributor Author

I feel like a straw man argument is being used here. Of course it's not really important to be precise to the bit (that was in my mind roughly the meaning of "as exactly as possible"); the point is that we are doing predictive pre-allocation and purposely allocating more (sometimes twice as much) than the user prediction.

It's not intentionally a strawman argument at all. I was taking "as exactly as possible" literally, which would seem to object to this kind of small over-allocation. So it seems that the only issue is regarding large over-allocation. I would point out that large over-allocation is only a problem if you do it yourself and insist on filling values with something. If you let the system do it, it's free since the kernel doesn't actually allocate memory until you use it. So sure, we seem to be allocating 2x what's required, but most of those memory pages are not allocated at the time and will never be allocated if they're not used.

I didn't know (or didn't recall) this, thanks for taking the time to explain. I'm starting to understand now :-) What confused me the most is the fact that g++ seems to exactly allocate the requested size on reserve.

@Sacha0
Member

Sacha0 commented Mar 12, 2019

I don't think that's really necessary—the one extra value isn't costing that much and allows these vectors to be passed as-is to libraries that expect this representation.

I agree that, practically speaking, it's neither necessary nor advantageous to remove the last element of colptr at this stage. Rather, my point was that hypothetically being able to remove the last element, which might be a logical step in the abstract, implies there is at least a paradigm shift involved for those familiar with traditional CSC (given that such removal is not possible with traditional CSC), even though the data structure may not strictly change at this stage. Best!

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Mar 13, 2019

What confused me the most is the fact that g++ seems to exactly allocate the requested size on reserve.

I'm curious about how you're determining this? If you're writing C or C++ code, you're allocating memory with malloc directly or indirectly, in which case your malloc implementation is also not really giving you an exact shrink-wrapped chunk of memory.

@abraunst
Contributor Author

What confused me the most is the fact that g++ seems to exactly allocate the requested size on reserve.

I'm curious about how you're determining this? If you're writing C or C++ code, you're allocating memory with malloc directly or indirectly, in which case your malloc implementation is also not really giving you an exact shrink-wrapped chunk of memory.

What do you mean by shrink-wrapped? What I mean is just that the standard library does not seem to ask malloc for more than what was requested by the user.

#include <vector>
#include <iostream>

using namespace std;

int main()
{
        vector<int> v(1000000);
        cout << v.capacity() << endl;
        v.reserve(1000001);
        cout << v.capacity() << endl;
}

gives as output here

1000000
1000001

In contrast, in julia (using capacity from the PR) I get:

julia> v = fill(0, 1000000);

julia> capacity(v)
1000000

julia> sizehint!(v, 1000001);

julia> capacity(v)
2000000

I understand now that the pages are not necessarily mapped to physical memory until access, but it still seems to me that the C++ policy on how much to malloc is different. Is this correct?

@ViralBShah
Member

Bump - what do we need to do here to get this over the hump?

@ViralBShah
Member

Bumping here. What should we do here?

@ViralBShah
Member

Bump @andreasnoack and @abraunst

@abraunst
Contributor Author

abraunst commented Jul 7, 2019

Bump @andreasnoack and @abraunst

I kind of lost the stamina to keep pushing on this, as the outcome of triage wasn't clear (I asked for clarification but did not receive any). I simultaneously got swamped with other things. What I recall about this is:

  • The original change was not about performance at all, just about consistency, eliminating the error-proneness of having length(A.nzval) != nnz(A)
  • The parallel overallocation scheme that is in place was designed by @Sacha0, and apparently was there for a reason. So, I tried to keep everything in place (this involved adding the capacity function) and just change the consistency part.
  • At some point, it was deemed that this parallel overallocation scheme may actually be unnecessary. I don't know if there is evidence to support this...

I don't think it would be wise to work on this anymore until a decision is made on how to proceed...

@StefanKarpinski
Sponsor Member

Agree, sorry for the unused effort on your part! It may still end up getting used but you’ve certainly done your part and then some.

@abraunst
Contributor Author

abraunst commented Jul 8, 2019

Agree, sorry for the unused effort on your part! It may still end up getting used but you’ve certainly done your part and then some.

NP, I'll try to contribute somewhere else when I get some time back, at least until the direction is more clear here 😄

@ViralBShah
Member

Yes, I was also unclear on what the final resolution was here. Let's wait for @andreasnoack to chime in. It would be nice to resolve and get this done.

@StefanKarpinski
Sponsor Member

The issue was not the doing (it's done) but the disagreement about whether or not it's desirable.

@andreasnoack
Member

In the current form, I believe that this PR depends on #31287. Might be an option to just have an internal capacity function for now.

@vtjnash
Sponsor Member

vtjnash commented Apr 12, 2021

Can we eliminate the addition of capacity from this? I've split out one portion of this into #40444, but it seems like most of the rest of this could still be merged on top of that, by dropping those extra lines.

@abraunst
Contributor Author

abraunst commented Apr 12, 2021

Can we eliminate the addition of capacity from this? I've split out one portion of this into #40444, but it seems like most of the rest of this could still be merged on top of that, by dropping those extra lines.

I can try to have a look at this again as soon as I get some time, if no one else wants to do it. Unfortunately, it seems that the push! problem got even worse in these three (EDIT: only two, but I overallocated one just in case) years, so the overallocation scheme seems to still be needed:

julia> function f()
           x=zeros(0);
           sizehint!(x, 10^7)
           for i=1:10^7
               push!(x, i)
           end
           x
       end
f (generic function with 1 method)

julia> function g()
           x=zeros(0);
           resize!(x, 10^7)
           @inbounds for i=1:10^7
               x[i]=i
           end
           x
       end
g (generic function with 1 method)

julia> @btime f();
  127.796 ms (2 allocations: 76.29 MiB)

julia> @btime g();
  36.686 ms (2 allocations: 76.29 MiB)

@vtjnash
Sponsor Member

vtjnash commented Apr 19, 2021

Moved to #40523

@vtjnash vtjnash closed this Apr 19, 2021
@vtjnash vtjnash removed the status:triage This should be discussed on a triage call label Apr 22, 2021