Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayDist storing distribution info for darrays. Deprecated d* functions... #5452

Closed
wants to merge 2 commits into from
Closed

ArrayDist storing distribution info for darrays. Deprecated d* functions... #5452

wants to merge 2 commits into from

Conversation

amitmurthy
Copy link
Contributor

This PR does the following:

  • Added ArrayDist which is an abstract type. Subtypes are SharedDist and ChunkedDist. Both store the array dimensions and list of processes participating in the distribution.
  • ChunkedDist also stores the chunked indexes for the DArray
  • d* functions have been deprecated in favor of the regular convenience constructors accepting a ChunkedDist

Will update documentation if the above is OK.

Updates to SharedArray using a SharedDist will be made in a separate PR.

@amitmurthy
Copy link
Contributor Author

cc: @JeffBezanson - please have a look.

dims::NTuple{N,Int}
pids::Vector{Int}

SharedDist(dims, pids) = new(dims, pids)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, since default constructors now look like this. The default outer constructor will also infer N given dims.

@JeffBezanson
Copy link
Sponsor Member

Good. Just a couple small comments. Will definitely merge this after doc updates and rebase.

@amitmurthy
Copy link
Contributor Author

Updated docs and rebased.
Also closes #5481 .

@amitmurthy
Copy link
Contributor Author

Bump @JeffBezanson

end


size(ad::ArrayDist) = ad.dims
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be inconsistent with getindex below; i.e. the size does not give the valid range of indexes for getindex. That could be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. How about size(ChunkedDist) returning the size of the indexes, while size(SharedDist) returns the length of the pids vector. Both describe the size of the distribution.

size(DArray) and size(SharedArray) will return the size of the complete array. There will not be a function to retrieve dims from an ArrayDist - folks have to use ad.dims

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed an update.

@JeffBezanson
Copy link
Sponsor Member

@StefanKarpinski @ViralBShah
I think this is definitely the way forward, since we can't keep adding letter prefixes to zeros (dzeros, etc.), but the syntax zeros(ChunkedDist(dims)) is a bit verbose. My first thought is that it won't pass the Viral and/or Alan tests. Maybe wrap ChunkedDist in a function dist for example? Any other name ideas?

@StefanKarpinski
Copy link
Sponsor Member

Yeah, this has always bothered me. It's a clear sign of lack of composability every time you have parallel universes of functions that differ by prefix or suffix. How about something like this zeros(Dist,dims) and zeros(Shared,dims).

@JeffBezanson
Copy link
Sponsor Member

Come to think of it, the name SharedDist is kind of an oxymoron, since "shared" and "distributed" are opposites in parallel programming terms. Renaming them Shared and Dist would make sense.

@amitmurthy
Copy link
Contributor Author

@StefanKarpinski , Dist represents 3 things

  • the dimensions of the array,
  • the pids participating in the distribution and
  • the distribution of the array over pids

So it can be something like zeros(Dist(dims; pids, dist)) where optional keyword arguments allow the user to specify the pids and dist, i.e., number of parts in each dimension. Default values for the same are workers() and a split along the larger dimension first.

For SharedArrays it can be zeros(Shared(dims; pids))

@amitmurthy
Copy link
Contributor Author

Closed due to #10334

@amitmurthy amitmurthy closed this Feb 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants