Spelling fixes and CI updates (#31)
* updated to test the latest HDF5 release
* fixed spelling and added spell checker to GH actions
brtnfld authored Sep 3, 2024
1 parent 3e52d5a commit 9c90cc3
Showing 20 changed files with 71 additions and 51 deletions.
6 changes: 6 additions & 0 deletions .codespellrc
@@ -0,0 +1,6 @@
# Ref: https://github.com/codespell-project/codespell#using-a-config-file
[codespell]
skip = .git,.codespellrc
check-hidden = true
#ignore-regex =
#ignore-words-list =
14 changes: 14 additions & 0 deletions .github/workflows/codespell.yml
@@ -0,0 +1,14 @@
# GitHub Action to automate the identification of common misspellings in text files
# https://github.com/codespell-project/codespell
# https://github.com/codespell-project/actions-codespell
name: codespell
on: [push, pull_request]
permissions:
contents: read
jobs:
codespell:
name: Check for spelling errors
runs-on: ubuntu-latest
steps:
- uses: actions/[email protected]
- uses: codespell-project/actions-codespell@master
8 changes: 4 additions & 4 deletions .github/workflows/hdf5-latest.yml
@@ -14,19 +14,19 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout vol-cache
-uses: actions/checkout@v4.1.1
+uses: actions/checkout@v3
- name: Checkout latest HDF5 release
run: |
wget https://github.com/HDFGroup/hdf5/releases/latest/download/hdf5.tar.gz
tar xzf hdf5.tar.gz
ln -sf hdf5-* hdf5
- name: Checkout Argobots
-uses: actions/checkout@v4.1.1
+uses: actions/checkout@v3
with:
repository: pmodels/argobots
path: abt
- name: Checkout vol-async
-uses: actions/checkout@v4.1.1
+uses: actions/checkout@v3
with:
repository: hpc-io/vol-async
path: vol-async
@@ -84,7 +84,7 @@ jobs:
ctest --output-on-failure
- name: Upload
-uses: actions/upload-artifact@v4
+uses: actions/upload-artifact@v3
with:
name: git.txt
path: ${{ runner.workspace }}/vol-cache/hdf5/git.txt
6 changes: 3 additions & 3 deletions README.md
@@ -50,7 +50,7 @@ cd hdf5
make all install
```

-When running configure, ake sure you **DO NOT** have the option "--disable-shared".
+When running configure, make sure you **DO NOT** have the option "--disable-shared".

### Build Argobots library

@@ -118,7 +118,7 @@ Currently, we use environmental variables to enable and disable the cache functi
### Parallel write

* **write_cache.cpp** is the benchmark code for evaluating the parallel write performance. In this testing case, each MPI rank has a local
-buffer BI to be written into a HDF5 file organized in the following way: [B0|B1|B2|B3]|[B0|B1|B2|B3]|...|[B0|B1|B2|B3]. The repeatition of [B0|B1|B2|B3] is the number of iterations
+buffer BI to be written into a HDF5 file organized in the following way: [B0|B1|B2|B3]|[B0|B1|B2|B3]|...|[B0|B1|B2|B3]. The repetition of [B0|B1|B2|B3] is the number of iterations
* --dim D1 D2: dimension of the 2D array [BI] // this is the local buffer size
* --niter NITER: number of iterations. Notice that the data is accumulately written to the file.
* --scratch PATH: the location of the raw data
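
The layout above implies a simple offset rule. As a quick illustration (this helper is not code from write_cache.cpp; the names and the element-count convention are assumed), rank `rank`'s D1 x D2 block for iteration `iter` starts at:

```cpp
#include <cstddef>

// Sketch only: element offset of one rank's block in a file laid out as
// [B0|B1|...|B(nproc-1)] repeated once per iteration.
std::size_t block_offset_elements(int iter, int rank, int nproc,
                                  std::size_t d1, std::size_t d2) {
  std::size_t block = d1 * d2;  // elements written by one rank per iteration
  return (static_cast<std::size_t>(iter) * nproc + rank) * block;
}
```

Multiplying by the element size gives the byte offset into the file.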
@@ -146,7 +146,7 @@ This will generate a hdf5 file, images.h5, which contains 8192 samples. Each 224

For the read benchmark, it is important to isolate the DRAM caching effect. By default, during the first iteration, the system will cache all the data on the memory (RSS), unless the memory capacity is not big enough to cache all the data. This ends up with a very high bandwidth at second iteration, and it is independent of where the node-local storage are.

-To remove the cache / buffering effect for read benchmarks, one can allocate a big array that is close to the size of the RAM, so that it does not have any extra space to cache the input HDF5 file. This can be achieve by setting ```MEMORY_PER_PROC``` (memory per process in Giga Byte). **However, this might cause the compute node to crash.** The other way is to read dummpy files by seeting ```CACHE_NUM_FILES``` (number of dummpy files to read per process).
+To remove the cache / buffering effect for read benchmarks, one can allocate a big array that is close to the size of the RAM, so that it does not have any extra space to cache the input HDF5 file. This can be achieve by setting ```MEMORY_PER_PROC``` (memory per process in Giga Byte). **However, this might cause the compute node to crash.** The other way is to read dummpy files by setting ```CACHE_NUM_FILES``` (number of dummpy files to read per process).
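
As a rough sketch of the ```MEMORY_PER_PROC``` idea (illustrative only; the actual benchmark may implement it differently), each process allocates and touches a buffer close to its share of RAM so the page cache has nowhere left to keep the input file:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Sketch only: reserve and touch ~gigabytes GiB so the OS page cache cannot
// hold the input HDF5 file between read iterations.
char *occupy_memory(std::size_t gigabytes) {
  std::size_t nbytes = gigabytes << 30;                // GiB -> bytes
  char *p = static_cast<char *>(std::malloc(nbytes));
  if (p != nullptr)
    std::memset(p, 1, nbytes);                         // fault in every page
  return p;                                            // keep alive for the whole run
}
```

As the paragraph above warns, oversizing this allocation can bring down the compute node.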

## Citation
If you use Cache VOL, please cite the following paper
4 changes: 2 additions & 2 deletions benchmarks/prepare_dataset.cpp
@@ -9,7 +9,7 @@
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

/*
-This file is for testing reading the data set in parallel in data paralle
+This file is for testing reading the data set in parallel in data parallel
training. We assume that the dataset is in a single HDF5 file. Each dataset is
stored in the following way:
@@ -19,7 +19,7 @@
When we read the data, each rank will read a batch of sample randomly or
contiguously from the HDF5 file. Each sample has a unique id associate with
-it. At the begining of epoch, we mannually partition the entire dataset with
+it. At the beginning of epoch, we manually partition the entire dataset with
nproc pieces - where nproc is the number of workers.
*/
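
A minimal sketch of the per-epoch partitioning this comment describes (illustrative only; the helper name and the even-split policy are assumptions, not code from prepare_dataset.cpp):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch only: split num_samples sample ids into nproc near-equal contiguous
// pieces and return the ids owned by `rank`; shuffling the result would give
// the "random" read mode.
std::vector<std::size_t> my_sample_ids(std::size_t num_samples, int nproc, int rank) {
  std::size_t base  = num_samples / nproc;
  std::size_t extra = num_samples % nproc;
  std::size_t begin = static_cast<std::size_t>(rank) * base +
                      std::min<std::size_t>(rank, extra);
  std::size_t count = base + (static_cast<std::size_t>(rank) < extra ? 1 : 0);
  std::vector<std::size_t> ids(count);
  for (std::size_t i = 0; i < count; ++i)
    ids[i] = begin + i;
  return ids;
}
```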
12 changes: 6 additions & 6 deletions benchmarks/read_cache.cpp
@@ -9,7 +9,7 @@
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

/*
-This code is to prototying the idea of incorparating node-local storage
+This code is for prototyping the idea of incorporating node-local storage
into repeatedly read workflow. We assume that the application is reading
the same dataset periodically from the file system. Out idea is to bring
the data to the node-local storage in the first iteration, and read from
@@ -119,17 +119,17 @@ void clear_cache(char *rank) {

using namespace std;

-int msleep(long miliseconds) {
+int msleep(long milliseconds) {
struct timespec req, rem;

-if (miliseconds > 999) {
-req.tv_sec = (int)(miliseconds / 1000); /* Must be Non-Negative */
-req.tv_nsec = (miliseconds - ((long)req.tv_sec * 1000)) *
+if (milliseconds > 999) {
+req.tv_sec = (int)(milliseconds / 1000); /* Must be Non-Negative */
+req.tv_nsec = (milliseconds - ((long)req.tv_sec * 1000)) *
1000000; /* Must be in range of 0 to 999999999 */
} else {
req.tv_sec = 0; /* Must be Non-Negative */
req.tv_nsec =
-miliseconds * 1000000; /* Must be in range of 0 to 999999999 */
+milliseconds * 1000000; /* Must be in range of 0 to 999999999 */
}
return nanosleep(&req, &rem);
}
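
One caveat worth noting about the helper above (an observation, not a change made in this commit): nanosleep() can return early with errno set to EINTR if a signal arrives, in which case rem holds the unslept time. A hedged sketch of a variant that resumes until the full delay has elapsed:

```cpp
#include <cerrno>
#include <ctime>

// Sketch only (not part of the benchmarks): resume sleeping from the
// remainder whenever nanosleep() is interrupted by a signal.
int msleep_uninterrupted(long milliseconds) {
  struct timespec req;
  req.tv_sec  = milliseconds / 1000;                /* whole seconds */
  req.tv_nsec = (milliseconds % 1000) * 1000000L;   /* 0 .. 999999999 */
  while (nanosleep(&req, &req) == -1 && errno == EINTR)
    ;                                               /* req now holds the remainder */
  return 0;
}
```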
10 changes: 5 additions & 5 deletions benchmarks/write.cpp
@@ -44,17 +44,17 @@ void mkdirRecursive(const char *path, mode_t mode) {
mkdir(opath, mode);
}

-int msleep(long miliseconds) {
+int msleep(long milliseconds) {
struct timespec req, rem;

-if (miliseconds > 999) {
-req.tv_sec = (int)(miliseconds / 1000); /* Must be Non-Negative */
-req.tv_nsec = (miliseconds - ((long)req.tv_sec * 1000)) *
+if (milliseconds > 999) {
+req.tv_sec = (int)(milliseconds / 1000); /* Must be Non-Negative */
+req.tv_nsec = (milliseconds - ((long)req.tv_sec * 1000)) *
1000000; /* Must be in range of 0 to 999999999 */
} else {
req.tv_sec = 0; /* Must be Non-Negative */
req.tv_nsec =
-miliseconds * 1000000; /* Must be in range of 0 to 999999999 */
+milliseconds * 1000000; /* Must be in range of 0 to 999999999 */
}
return nanosleep(&req, &rem);
}
10 changes: 5 additions & 5 deletions benchmarks/write_cache.cpp
@@ -31,17 +31,17 @@
#include <unistd.h>
//#include "h5_async_lib.h"

-int msleep(long miliseconds) {
+int msleep(long milliseconds) {
struct timespec req, rem;

-if (miliseconds > 999) {
-req.tv_sec = (int)(miliseconds / 1000); /* Must be Non-Negative */
-req.tv_nsec = (miliseconds - ((long)req.tv_sec * 1000)) *
+if (milliseconds > 999) {
+req.tv_sec = (int)(milliseconds / 1000); /* Must be Non-Negative */
+req.tv_nsec = (milliseconds - ((long)req.tv_sec * 1000)) *
1000000; /* Must be in range of 0 to 999999999 */
} else {
req.tv_sec = 0; /* Must be Non-Negative */
req.tv_nsec =
-miliseconds * 1000000; /* Must be in range of 0 to 999999999 */
+milliseconds * 1000000; /* Must be in range of 0 to 999999999 */
}
return nanosleep(&req, &rem);
}
2 changes: 1 addition & 1 deletion docs/pdf-docs/cache_vol.tex
@@ -159,7 +159,7 @@ \subsection{Cache management policy and APIs}
\item \function{H5Fcache\_create} -- creating a cache in the system’s local storage if it is not yet been created. This is for parallel write.
\item \function{H5Fcache\_remove} -- removing the cache associated with the file in the system's local storage (This will call \function{H5LSremove\_cache}). After \function{H5Fcache\_remove} is called, \function{H5Dwrite} will directly write data to the parallel file system.
\end{itemize}
-\item Dataset cace related functions (for read only)
+\item Dataset cache related functions (for read only)
\begin{itemize}
\item \function{H5Dcache\_create}-- reserving space for the data
\item \function{H5Dcache\_remove} -- clearing the cache on the local storage related to the dataset Besides these, we will also have the following two functions for prefetching / reading data from the cache
2 changes: 1 addition & 1 deletion docs/source/bestpractices.rst
@@ -9,7 +9,7 @@ Write workloads
1) MPI Thread multiple should be enabled for optimal performance;
2) There should be enough compute work after the H5Dwrite calls to overlap with the data migration from the fast storage layer to the parallel file system;
3) The compute work should be inserted in between H5Dwrite and H5Dclose. For iterative checkpointing workloads, one can postpone the dataset close and group close calls after next iteration of compute. The API functions are provided to do this.
-4) If there are multiple H5Dwrite calls issued consecutatively, one should pause the async excution first and then restart the async execution after all the H5Dwrite calls were issued.
+4) If there are multiple H5Dwrite calls issued consecutatively, one should pause the async execution first and then restart the async execution after all the H5Dwrite calls were issued.
5) For check pointing workloads, it is better to open / close the file only once to avoid unnecessary overhead on setting and removing file caches.
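
A minimal serial sketch of the ordering suggested in item 3 (standard HDF5 calls only; the Cache VOL itself would be enabled separately, e.g. through the file access property list or environment variables, and the file and dataset names here are made up):

```cpp
#include <hdf5.h>
#include <unistd.h>
#include <vector>

int main() {
  std::vector<double> buf(1 << 20, 3.14);  // one checkpoint buffer
  hsize_t dims[1] = {buf.size()};

  hid_t file  = H5Fcreate("ckpt.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  hid_t space = H5Screate_simple(1, dims, NULL);
  hid_t dset  = H5Dcreate(file, "step0", H5T_NATIVE_DOUBLE, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

  H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf.data());

  sleep(1);        // stand-in for the next iteration's compute work

  H5Dclose(dset);  // close only after the compute step, per item 3
  H5Sclose(space);
  H5Fclose(file);
  return 0;
}
```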

An application may have the following HDF5 operations to write check point data:
2 changes: 1 addition & 1 deletion docs/source/cacheapi.rst
@@ -16,7 +16,7 @@ Beside using environment variable setup, the Cache VOL connector provides a set

This enable the finer control of the cache effect for any specific file through the file access property list. The environment variable "HDF5_CACHE_WR" and "HDF5_CACHE_RD" will enable or disable the cache effect for all the files. In our design, the environment variable override the specific setting from the file access property list.

-* Pause/restart all async data migration operations. This is particular useful for the cases when we have multiple writes lauched consecutively. One can pause the async execution before the dataset write calls, and then start the async execution. This allows the main thread to stage all the data from different writes at once and then the I/O thread starts migrating them to the parallel file system, to avoid potential contension effect between the main thread and the I/O thread.
+* Pause/restart all async data migration operations. This is particular useful for the cases when we have multiple writes launched consecutively. One can pause the async execution before the dataset write calls, and then start the async execution. This allows the main thread to stage all the data from different writes at once and then the I/O thread starts migrating them to the parallel file system, to avoid potential contension effect between the main thread and the I/O thread.

.. code-block::
2 changes: 1 addition & 1 deletion docs/source/gettingstarted.rst
@@ -11,7 +11,7 @@ Some configuration parameters used in the instructions:
export HDF5_VOL_DIR=/path/to/vols/install/dir
export ABT_DIR=/path/to/argobots/install/dir
-We suggest the user to put all the VOL dynamic libraries (such as async, cache_ext, daos, ect) into the same folder: HDF5_VOL_DIR to allow stacking multiple connectors.
+We suggest the user to put all the VOL dynamic libraries (such as async, cache_ext, daos, etc) into the same folder: HDF5_VOL_DIR to allow stacking multiple connectors.

Installation
============
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -3,7 +3,7 @@
HDF5 Cache VOL Connector
===============================================================

-As the scientific computing enters in the to exascale and big dataera, the amount of data produced by the simulation is significantly increased. Meanwhile, data analytics and artificial intelligence have risen up to become two important pillars in scientific computing, both of which are data intensive workloads. Therefore, being able to store or load data efficiently to and from the storage system becomes increasingly important to scientific computing. On the hardware level, many pre-exascale and exascale systems are designed to be equiped with fast storage layer in between the compute node memory and the parallel file system. Examples include burst buffer NVMes SSDs on Summit, Theta, Polaris and the upcoming Frontier system. It is a challenging problem to effectively incorparate these fast storage layer to improve the parallel I/O performance.
+As the scientific computing enters in the to exascale and big dataera, the amount of data produced by the simulation is significantly increased. Meanwhile, data analytics and artificial intelligence have risen up to become two important pillars in scientific computing, both of which are data intensive workloads. Therefore, being able to store or load data efficiently to and from the storage system becomes increasingly important to scientific computing. On the hardware level, many pre-exascale and exascale systems are designed to be equipped with fast storage layer in between the compute node memory and the parallel file system. Examples include burst buffer NVMes SSDs on Summit, Theta, Polaris and the upcoming Frontier system. It is a challenging problem to effectively incorporate these fast storage layer to improve the parallel I/O performance.

We design a HDF5 Cache Virtual Object Layer (VOL) connector that provides support for reading and writing data directly from / to the fast storage layer, while performing the data migration between the fast storage layer and permanent global parallel file system in the background, to allow hiding majority of the I/O overhead behind the computation of the application. The VOL framework provides an easy-to-use programming interface for the application to adapt.
