Huawei 1230 #30
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). The following commits have not yet signed CLA. 6d735ba | A complete overhaul of the HAN code. Among many other things:
Clean up the fallback collective support.
Communicator: provide ompi_comm_split_with_info to split and provide info at the same time COLL HAN: use info keys instead of component-level variable to communicate topology level between abstraction layers
COLL HAN: Fix topology handling
Signed-off-by: Xi Luo [email protected] Conflicts: There was a bug allowing for partial packing of non-data elements (such as loop Signed-off-by: George Bosilca [email protected] Signed-off-by: Aboorva Devarajan [email protected] Signed-off-by: George Bosilca [email protected] Import the HAN collective into 4.1 [v4.1.x] pml/ucx: fix zero sized datatype transfers This code was invoked twice. Leave it solely in OPAL_CONFIGURE_SETUP, Signed-off-by: Jeff Squyres [email protected] If defined, use SOURCE_DATE_EPOCH environment variable; make the build Thanks Bernhard M. Wiedemann for bringing this to our attention. Fixes open-mpi#3759 NOTE: This was cherry-picked from master, and slightly modified / Signed-off-by: Gilles Gouaillardet [email protected] MacOS does not have "readlink -f" or "realpath", so use the Signed-off-by: Jeff Squyres [email protected] There are several different flavors of date(1) out there. Try a few Signed-off-by: Jeff Squyres [email protected] Signed-off-by: Joseph Schuchart [email protected] Signed-off-by: Joshua Hursey [email protected] Signed-off-by: Joshua Hursey [email protected] Signed-off-by: Jeff Squyres [email protected] OPAL: fix string buffer allocation for large env variables [v4.1.x] v4.1.x: coll/adapt and coll/han: fix trivial compiler warnings Coverity complained about uninitialized variables; ensure that they Signed-off-by: Jeff Squyres [email protected] Slightly improve comments and update some whitespace. No code or logic changes. Signed-off-by: Jeff Squyres [email protected] v4.1.x: Keyval parse tweaks v4.1.x: reproducible builds + portability fix v4.1.x: Update Internal PMIx to OpenPMIx v3.2.1rc1 Signed-off-by: Jeff Squyres [email protected] NEWS: More updates for v4.1.0 PGI (20.4) compiler do not define this intrinsic, so only build Signed-off-by: Gilles Gouaillardet [email protected] Signed-off-by: Jeff Squyres [email protected] No code or logic changes. 
Add commit about why it's ok to use $srcdir here Signed-off-by: Jeff Squyres [email protected] v4.1.x: Some getdate.sh fixes
Signed-off-by: Sergey Oblomov [email protected] Conflicts: op/avx: check for _mm512_mullo_epi64() AVX512 intrinsic Fix mistake in orterun(1) (i.e., mpirun(1)) with an example using the This is not a cherry-pick from master because PRRTE has replaced ORTE Signed-off-by: Jeff Squyres [email protected] Add descriptive definitions of "slot" and "processor element" at the Also add a little blurb in the --use-hwthread-cpus description about how This is not a cherry-pick from master because PRRTE has replaced ORTE Signed-off-by: Jeff Squyres [email protected] Add some nroff markup into the paragraph, just to clearly delineate This is not a cherry-pick from master because PRRTE has replaced ORTE Signed-off-by: Jeff Squyres [email protected] v4.1.x: orterun.1in: fix minor mistake in :PE=2 example and add more descriptions/explanations If PMIX_PACKAGE_RANK is available, uses this value to select between multiple Some of the information in master branch is not available for the multi-NIC Signed-off-by: Nikola Dancejic [email protected] Ensure we always pass the cpuset as well as the locality string for each Signed-off-by: Ralph Castain [email protected] The mca parameters coll_tuned_*_algorithm are ignored unless coll_tuned_use_dynamic_rules is true so mention that in the description. 
Signed-off-by: Joseph Schuchart [email protected] Signed-off-by: Joseph Schuchart [email protected] Bcast: scatter_allgather and scatter_allgather_ring expect N_elem >= N_procs In all cases, the implementations will fall back to a linear implementation, Signed-off-by: Joseph Schuchart [email protected] v4.1.x: Using package_rank to select between NIC of equal distance from the process Signed-off-by: Joseph Schuchart [email protected] These selections seem harmful in my measurements and don't seem to be Signed-off-by: Joseph Schuchart [email protected] Signed-off-by: Joshua Hursey [email protected] Fix some issues with dynamic algorithm selection in coll/tuned This commit removes the unnecessary call to Signed-off-by: Raghu Raja [email protected] v4.1.x: Update Internal PMIx to OpenPMIx v3.2.1 [v4.1.x] mtl/ofi: Check cq_data_size without querying providers again Signed-off-by: Valentin Petrov [email protected] Mark the node as "unusable" so it does not get included when computing Signed-off-by: Ralph Castain [email protected] V4.1.x coll/hcoll: svatterv inplace fix MPI_Ialltoallw() and friends take a const MPI_Datatype types[] argument. Signed-off-by: Gilles Gouaillardet [email protected] PML/UCX: improved error processing in MPI_Recv - v4.1 v4.1.x: Correctly skip the "mpirun" node when launching orted on it The total size depends on number of ranks so the usual ranges don't work. Signed-off-by: Joseph Schuchart [email protected] COLL TUNED: Use per-rank data size instead of total size for decision [4.1.x] Signed-off-by: Pak Lui [email protected] v4.1.x: oshmem/tools/oshmem_info: fix an issue with fortran keyword when comp… There are no manpages in v3.2. 
Signed-off-by: Ralph Castain [email protected] Remove PMIx man page setup The selectable list is sorted with lowest to highest priority so the Signed-off-by: Joseph Schuchart [email protected] Signed-off-by: Joseph Schuchart [email protected] Also make coll/tuned the default for shared memory communication Signed-off-by: Joseph Schuchart [email protected] This has shown to be more effective in achieving overlap Signed-off-by: Joseph Schuchart [email protected] Fix preference treatment in coll/base [v4.1.x] Signed-off-by: Joseph Schuchart [email protected] Signed-off-by: Joseph Schuchart [email protected]
Signed-off-by: Gilles Gouaillardet [email protected] This is a one-off commit for the release branches that fixes Signed-off-by: Gilles Gouaillardet [email protected] Thanks FX Coudert for reporting this issue and pointing Refs. open-mpi#8218 Signed-off-by: Gilles Gouaillardet [email protected] (back-ported from commit open-mpi/ompi@3f45ced) OSC RDMA: put memory for each process into separate pages [4.1.x] v4.1.x: configury reproducibility fixes v4.1.x: autogen.pl: patch libtool.m4 for OSX Big Sur Fixes open-mpi#8195. This PR doesn't fix all the warnings from open-mpi#8195, but This is an adaptation of 14aa5fa from Signed-off-by: Jeff Squyres [email protected] Release the hounds! Signed-off-by: Jeff Squyres [email protected] v4.1.x: fix many warnings VERSION: 4.1.0rc4 Exclude HAN, don't include it. Signed-off-by: Joseph Schuchart [email protected] v4.1.x: coll/han: fix coll preference selection in mca_coll_han_comm_create_new Signed-off-by: Leonid Genkin [email protected] Add some "const"s that needed to be applied here on the v4.1.x branch, Signed-off-by: Jeff Squyres [email protected] Remove the pack/unpack pragma around net/if.h on MacOS, which Signed-off-by: Brian Barrett [email protected] Open MPI doesn't support any transports on MacOS which require Fixes open-mpi#5671 Signed-off-by: Brian Barrett [email protected] v4.1.x: Fix missed compiler warnings Only get the locality string and output binding message when requested Signed-off-by: Ralph Castain [email protected] v4.1.x: Fix the verbose output in ess base Signed-off-by: Charles Shereda [email protected] Replace usage of the deprecated NB API of UCX with NBX v4.1.x: Fixed uninitialized memory access bug in base64 encoding Signed-off-by: Ralph Castain [email protected] PMIx reigstration callback functions are used when regitering PMIx This patch adjusts two such callback functions: Both of them employes the following code structure: static void xxx_callback(int status, } The workflow is:
The expected behavior of the registration callback functions therefore However, on ARM based systems, the expected behavior is not guaranteed To address this issue, this patch added a call to opal_atomic_wmb() Signed-off-by: Wei Zhang [email protected] [v4.1.x] ompi : add memory barrier in PMIx registration callback v4.1.x: Update PMIx to v3.2.2 Updating VERSION and NEWS for the 4.1.0rc5 release. Signed-off-by: Raghu Raja [email protected] VERSION: 4.1.0rc5 Assign all cpu's on node to the daemon Signed-off-by: Ralph Castain [email protected] v4.1.x: Update Slurm launch support this commit syncs ompio related directories in v4.1.x to master. The efforts to bring the lustre performance fixes and support for external32 data representation over were too overwhelming when dealing with every single pr individually. There are a very few minor modification that had to be done for syncing:
Tested so far with the ompio testsuite as well as hdf5-1.10.5 testsuite (testphdf5, t_shapesame, t_bigio) on an XFS file system. Signed-off-by: Edgar Gabriel [email protected] Huzzah! Signed-off-by: Jeff Squyres [email protected] ompio: resync v4.1 branch to master Signed-off-by: Jeff Squyres [email protected] v4.1.0: README and VERSION final updates Signed-off-by: Jeff Squyres [email protected] VERSION: Onward to v4.1.1
See open-mpi#6995 Signed-off-by: Bert Wesarg [email protected] Currently, mca_btl_ofi_put (get, aop, afop, acswp) will allocate In normal code path, this completion object when processing completion fi_write/fi_read/fi_atomic/fi_fetch_atomic/fi_compare_atomic, there will be no completion entry from libfabric, in this case the This patch address the issue by calling opal_free_list_return() in cherry picked from: 01f5d68 Signed-off-by: Wei Zhang [email protected] [v4.1.x] btl/ofi: fix memory leaks in error handling path oshmem/mca/sshmem: Fix build with
Signed-off-by: Joshua Hursey [email protected] Fixes open-mpi#8305 Fixes open-mpi#8340 Signed-off-by: George Bosilca [email protected] v4.1.x: Generalized request fix Signed-off-by: Tim Wickberg [email protected] If the user asks for a hostfile/hostlist inside of a managed allocation, Signed-off-by: Austen Lauria [email protected] For example: $. bsub -n 40 -m "node1 node2" mpirun -np 6 -host node1:2,node2:4 hostname would not map two hostname processes to node1 and four to node2. Signed-off-by: Austen Lauria [email protected] Signed-off-by: Joshua Hursey [email protected]
Signed-off-by: Joshua Hursey [email protected]
it is possible to skip some intrinsic tests by setting some environment variables to "no" before invoking configure:
try since the former is less likely to conflict with user-provided CFLAGS. Thanks to Bart Oldeman for pointing this out.
Refs. open-mpi#8323 Signed-off-by: Gilles Gouaillardet [email protected]
Because some distros restrict the compile architecture
Identify all the vectorial functions used and classify them according to
Signed-off-by: George Bosilca [email protected] The test now has the ability to add a shift to all or to any of the Signed-off-by: George Bosilca [email protected] This patch added call to opal_set_using_threads() in orted/main.c, This is because orted used multiple threads. Without OPAL's multi-thread support, OPAL_RELEASE will not use This patch is applied to 4.1.x directly because orte has been Signed-off-by: Wei Zhang [email protected] Fix external PMIx v4.x check v4.1: Fix segv when launching with static ports v4.1.x: Fix a couple managed allocation issues. Bring the more flexible AVX* support in 4.1 [4.1.x] orte/orted: enable OPAL's mutli-thread support v4.1.x: mtl/ofi: Add mising cq_data_size in hints for ofi mtl icc does not define the AVX* macros if the corresponding -m architecture Signed-off-by: George Bosilca [email protected] This commit fixes a bug discovered while debugging issue open-mpi#8350 Running our testsuite on Mac OS revealed that posted a large number of non-blocking read/write operations leads to an error message on this platform. A fix is already available and will be committed shortly. The issue stems from limitations on macOs and the concurrent number of aio_read/aio_write operations that can be pending. While the code already handled that correctly for a single request, this bug exposed that the overall limited has to be respected across all pending requests. The solution is to invoke mca_common_ompio_progress if we cannot post new aio operations. 
Fixes issue open-mpi#8368 Signed-off-by: Edgar Gabriel [email protected] v4.1.x: Enable AVX support with Intel compilers v4.1.x: fbtl/posix: ensure progressing aio requests Revert "v4.1.x: Update Slurm launch support" opal_common_ucx_del_proc call fails if pmix doesn't implement fence_nb Signed-off-by: Sami Ilvonen [email protected] With this patch the best PML is selected earlier, before finalizing Signed-off-by: George Bosilca [email protected] For direct modex, all procs publish the selected pml module Signed-off-by: Dipti Kothari [email protected] Thanks to Andreas Lösel for bringing the outdated docs to our Signed-off-by: Jeff Squyres [email protected] Thanks to Andreas Lösel for raising the inaccurate statement to our Signed-off-by: Jeff Squyres [email protected] Signed-off-by: Ralph Castain [email protected] v4.1.x: MPI_Init_thread(3) man page updates v4.1.x: Update the PML selection/check logic to avoid direct modex "storms" Current implementation of pml check protocol causes extra (corresponds to master 36b64cb) Signed-off-by: Valentin Petrov [email protected] V4.1.x PML/UCX: don't do pml_check_selected call Benchmarks are showing better performance when not using the __atomic Signed-off-by: Nathan Hjelm [email protected] opal: disable the __atomic built-in atomics by default on AArch64 Download config.guess|sub from A future commit will install these files if they are newer than what Signed-off-by: Jeff Squyres [email protected] Per open-mpi#8410, have autogen.pl We also skip updating anything in the 3rd-party tree; we don't really Signed-off-by: Jeff Squyres [email protected] v4.1.x: Use newer config.guess / config.sub files when relevant Signed-off-by: Ralph Castain [email protected] In optimized builds, CFLAGS contains various optimizations such as -O3, To prevent CFLAGS from being polluted elsewhere in the make tree, build Fixes open-mpi#7757 Signed-off-by: Austen Lauria [email protected] Let Slurm know that our daemons are not MPI tasks v4.1.x: Make 
sure MPIR_Breakpoint() is compiled without CFLAGS. Add fence_nb to flux pmix MCA enums make it easier for users to see/set MCA flag values. Also For example, ompi_output shows all the valid values: Signed-off-by: Jeff Squyres [email protected] v4.1.x: op_avx: use MCA enum flags instead of integer values If MPI_MODE_SEQUENTIAL was used when opening the file, the special displacement MPI_DISPLACEMENT_CURRENT Signed-off-by: Edgar Gabriel [email protected] aio_return returns the number of bytes written/read, and can indicate a partial completion. Signed-off-by: Edgar Gabriel [email protected] you cannot access parts of a file if the file view contains a description This fix was triggered by an investigation into mpich/test/mpi/io/tst_fileview testcase. Signed-off-by: Edgar Gabriel [email protected] Pr/v4.1.x mpich3.4 tstsuite fixes This fixes a bug when ob1 was not selected as the pml but osc/rdma may be In the future btl/sm should be made more resilient. Fixes open-mpi#8434 Signed-off-by: Nathan Hjelm [email protected] Remove accidental double registration (which resulted in a double Signed-off-by: Jeff Squyres [email protected] opal_show_help() can dedup output across ranks when using mpirun. Print Signed-off-by: Raghu Raja [email protected] v4.1.x: op_avx: Fix MCA enum flags osc/rdma: ensure bml add_procs has been called for all local procs common/ofi: Use opal_show_help() to call out lack of locality info v4.1.x: UCX: initialize cuda from ucx pml component Signed-off-by: Raghu Raja [email protected] NEWS and VERSION updates for 4.1.1rc1 Signed-off-by: Ralph Castain [email protected] v4.1.x: Update PMIx to v3.2.3 This will replace the old "Signed-off-by checker" and "Commit email
Signed-off-by: Jeff Squyres [email protected] Signed-off-by: Jeff Squyres [email protected] Signed-off-by: Jeff Squyres [email protected] v4.1.x: git commit checker GitHub action This commit fixes a number of bugs in the handling of derived
References open-mpi#6275 Signed-off-by: Gilles Gouaillardet [email protected] This commit rearranges the accumulate code so that network AMOs can be Signed-off-by: Nathan Hjelm [email protected] This commit fixes a resource leak when using network atomics. There Signed-off-by: Nathan Hjelm [email protected] This fixes an issue in osc/rdma when AMOs are used for accumulate operations
Signed-off-by: Nathan Hjelm [email protected] Signed-off-by: Howard Pritchard [email protected] There were some instances where the exclusive lock needed some Signed-off-by: Austen Lauria [email protected] Osc fixes backport v4.1.x The functionality was migrated to Closes open-mpi#8508. [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898743 Signed-off-by: Bert Wesarg [email protected] Signed-off-by: Yossi Itigin [email protected] Signed-off-by: Aboorva Devarajan [email protected] v4.1.x: ucx: disable version 1.8 Prevent a "deadlock" scenario, when one of the processes leave the Signed-off-by: George Bosilca [email protected] Prevent the establishment of new BTL connections during matching This completes and fixes current code for coll/Han:
Fix:
Signed-off-by: Emmanuel Brelle [email protected] v4.1.x: pml/ob1: fix build issue in CUDA path v4.1.x: Bull 2020 update of coll/han Github shows both the "outer" and "inner" names on the CI line in the Signed-off-by: Jeff Squyres [email protected] v4.1.x: git commit checker better name Add "pml_ucx_tls" parameter to control the transports to include or Add "pml_ucx_devices" parameter to control the devices which make UCX Signed-off-by: Yossi Itigin [email protected] v4.1.x: ucx: check supported transports and devices for setting priority Signed-off-by: Joshua Hursey [email protected] v4.1.x: fs/lustre: Remove unneeded includes Signed-off-by: Joshua Hursey [email protected] v4.1: Check for librt when building LSF support
Signed-off-by: Sergey Oblomov [email protected] To avoid checking sentinel process pointers to the original Signed-off-by: Aboorva Devarajan [email protected] Make sure the definition of the MPIR_Proctable Otherwise, the debugger (such as gdb) won't know Since the MPIR_proctable should be accessed from See issue: open-mpi#8563 Signed-off-by: Austen Lauria [email protected] v4.1.x: Fix case where debuggers cannot read the MPIR proctable. v4.1.x: ompi/group: fix proc pointer comparison in groups SPML/UCX: removed direct dependency to SPML UCX - v4.1 Signed-off-by: George Bosilca [email protected] v4.1.x: A new binomial scatter using packed data on intermediary processes. in order to work around a bug in older gcc versions on x86_64, It was recently found that this did introduce some performance regression, So simply use an asm memory globber to both workaround older gcc bugs Thanks S. Biplab Raut for bringing this issue to our attention. Refs. open-mpi#8603 Signed-off-by: Gilles Gouaillardet [email protected] (cherry picked from commit d7e3f87) Signed-off-by: Joshua Hursey [email protected] v4.1.x: gcc_builtin: fix performance regression on x86_64 v4.1.x: Fix/Cleanup the return value documentation for mpirun Signed-off-by: Raghu Raja [email protected] This PR reduces memory consumption in non-root and non-leaf processes of binomial tree algorithm for Scatter operation. 
Signed-off-by: Mikhail Kurnosov [email protected] Signed-off-by: Raghu Raja [email protected] v4.1.x: coll/base: reduce memory consumption in Scatter NEWS and VERSION updates for 4.1.1rc2 The text seems to have been copied from MPI_Win_allocate and was Signed-off-by: Joseph Schuchart [email protected] Signed-off-by: Yossi Itigin [email protected] The builtins used by default on Power have been This changes the defaults for all compilers sans xl, including: Previously, all of the above were using C11 or Bonus: Signed-off-by: Austen Lauria [email protected] Fix man page for MPI_Win_attach [4.1.x] v4.1.x: Powerpc atomics: Force usage of powerpc assembly. pml/ucx: ignore request leak by default, override by mca param This is a one-off commit for the release branch. Signed-off-by: Gilles Gouaillardet [email protected] ofi: fix typo in macro name Code snippet appears to be C not Fortran. Signed-off-by: Harumi Kuno [email protected] Actual file names have substring: xor_to_all Signed-off-by: Harumi Kuno [email protected] v4.1.x: man pages updates A performance regression was reported when using the workaround So only use the workaround on x86_64 when a busted GCC compiler is used. Thanks S. Biplab Raut for reporting this issue. Signed-off-by: Gilles Gouaillardet [email protected] (back-ported from commit open-mpi/ompi@711c8c2) v4.1.x: atomic/gcc_builtin: only apply the workaround when required. Fixes open-mpi#7308 Signed-off-by: Ralph Castain [email protected] When configured Signed-off-by: Ralph Castain [email protected] Add the userid to the vader backing file path
Signed-off-by: Sergey Oblomov [email protected] This patch is only in v4.x as code in v5.x was rewritten to use FI_HMEM Refs: 8762 [v4.1.x] mtl/ofi: Disable CUDA convertor for specified ofi providers OSHMEM/SEGMENT-REGISTRATION: added segment filtering - V4.1 Retrieve cpuset when configured with pmix rte Signed-off-by: Brian Barrett [email protected] dist: Prep for 4.1.1rc3 See open-mpi#8823 for the details. Signed-off-by: Artem Polyakov [email protected] See open-mpi#8823 for more details. Signed-off-by: Artem Polyakov [email protected] Signed-off-by: George Bosilca [email protected] Signed-off-by: Christoph Niethammer [email protected] Signed-off-by: Austen Lauria [email protected] This commit fixes the support for heterogeneous environments and Signed-off-by: George Bosilca [email protected] When unpacking a partial predefined element check the boundaries of the Signed-off-by: George Bosilca [email protected] Signed-off-by: George Bosilca [email protected] bot:notacherrypick pmix/pmix3x: Fix internal PMIx discovery logic. Backport/datatype Signed-off-by: Brian Barrett [email protected] dist: Update VERSION and README for v4.1.1rc4 📝 Please access here to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment: /check-cla to verify. Thanks.
No description provided.