Re-organization of tutorials #284

Merged
merged 10 commits into from
Feb 5, 2024
File renamed without changes.
8 changes: 4 additions & 4 deletions tutorials/_toc.yml
@@ -2,10 +2,10 @@
# Learn more at https://jupyterbook.org/customize/toc.html

format: jb-book
root: noisepy_scedc_tutorial.ipynb
root: noise_configuration.md
chapters:
- file: noise_configuration.md
- file: noisepy_datastore.ipynb
- file: get_started.ipynb
- file: noisepy_datastore.ipynb
- file: noisepy_scedc_tutorial.ipynb
- file: CLI.md
- file: cloud/aws.md
- file: README
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ABL.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ACP.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ADO.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.AGM.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ALP.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ARV.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.AVM.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.BAI.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.BAK.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.BAR.h5
Binary file not shown.
26 changes: 18 additions & 8 deletions tutorials/noise_configuration.md
@@ -4,32 +4,42 @@ Welcome to NoisePy!

**NoisePy** is software for computing large-scale cross-correlations on HPC and cloud infrastructure. The difference between the two targets lies in the back-end data format, which is optimized either for file systems (H5) or for object storage (npz/mseed).

**NoisePy** also offers tools for ambient noise monitoring (velocity and attenuation) and for Earth imaging (measuring phase and group velocities).

NoisePy leverages several published efforts; please consider citing:

* Jiang, C., Denolle, M. 2020. NoisePy: a new high-performance python tool for ambient noise seismology. Seismological Research Letters. 91, 1853-1866. https://doi.10.1785/0220190364.
* Yuan C, Bryan J, Denolle M. Numerical comparison of time-, frequency-and wavelet-domain methods for coda wave interferometry. Geophysical Journal International. 2021 Aug;226(2):828-46. https://doi.org/10.1093/gji/ggab140
* Yang X, Bryan J, Okubo K, Jiang C, Clements T, Denolle MA. Optimal stacking of noise cross-correlation functions. Geophysical Journal International. 2023 Mar;232(3):1600-18. https://doi.org/10.1093/gji/ggac410


We gratefully acknowledge support from the [Packard Foundation](https://www.packard.org)


## NoisePy Workflow

The data processing in NoisePy consists of three steps:

1. **(Optional) Step 0 - Download**: The `download()` function or the `noisepy download` CLI command can be used to download data from an FDSN web service. Alternatively, data from an [S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/scedc-pds) can be copied locally using the `aws` CLI, or streamed directly from S3. For users who want to work entirely locally, this step prepares and organizes the data in a ``DataStore``.
2. **Step 1 - Cross Correlation**: Computes cross-correlations for pairs of stations/channels. This can be done with either the `cross_correlate()` function or the `noisepy cross_correlate` CLI command.
3. **Step 2 - Stacking**: This step takes the cross-correlation computations across multiple timespans and stacks them for a given station/channel pair. This can be done with either the `stack_cross_correlations()` function or the `noisepy stack` CLI command (see the sketch after this list).
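
As an illustrative sketch of how these steps chain together through the function API (the store classes, `ConfigParameters`, and the argument order shown here are assumptions drawn from the tutorials, not a verified recipe — consult the tutorials for exact signatures):

```python
# Sketch only: the exact signatures and store choices are assumptions
# based on the tutorials; see those for working end-to-end examples.
from noisepy.seis import cross_correlate, stack_cross_correlations
from noisepy.seis.asdfstore import ASDFRawDataStore, ASDFCCStore, ASDFStackStore
from noisepy.seis.datatypes import ConfigParameters

config = ConfigParameters()                 # default processing parameters
raw_store = ASDFRawDataStore("./raw_data")  # Step 0 output (downloaded data)
cc_store = ASDFCCStore("./ccf")             # Step 1 writes CCFs here
stack_store = ASDFStackStore("./stack")     # Step 2 writes stacks here

cross_correlate(raw_store, config, cc_store)             # Step 1
stack_cross_correlations(cc_store, stack_store, config)  # Step 2
```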

### Data Storage

NoisePy accesses data through three abstract store classes: `DataStore`, `CrossCorrelationDataStore`, and `StackStore`. Concrete implementations are provided for the ASDF (H5), miniSEED, Zarr, TileDB, and npy formats.

0. [optional] data download: for users who want to work entirely locally, this step prepares and organizes the data in a ``RawDataStore``.
1. Cross correlations: data may be streamed from the DataStore, which can be hosted on the cloud. Pre-processing and cross-correlations are done for each time chunk (e.g., one day for broadband data), and the cross-correlations for each chunk are saved to a ``CrossCorrelationDataStore``.
2. Stacking: data is aggregated and stacked over all time periods. Stacked data are stored in a ``StackStore``.

The workflow is illustrated in the figure below; a short example of opening concrete stores follows it.
<img src="../docs_old/figures/data_flow.png">

## Applications
### Monitoring
NoisePy includes various functions to measure dv/v; please check the tutorials. The software reads the ``CrossCorrelationDataStore``, aggregates the data, and measures dv/v. The outputs are tabular data in CSV.
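
Since the outputs are plain CSV, they can be inspected with standard tools; for instance (the file name and columns here are hypothetical):

```python
import pandas as pd

# Hypothetical file and column names; the actual ones depend on the
# monitoring workflow that produced the table.
dvv = pd.read_csv("dvv_results.csv")
print(dvv.head())
```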

### Imaging
NoisePy includes functions to measure phase and group velocity dispersion curves. The software reads the ``StackStore`` and outputs the curves as tabular data in CSV.

2 changes: 1 addition & 1 deletion tutorials/noisepy_scedc_tutorial.ipynb
@@ -504,7 +504,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.12"
}
},
"nbformat": 4,
251 changes: 251 additions & 0 deletions tutorials/noisepy_scoped_download_s3_store.ipynb
@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Download S3-hosted Noisepy Data\n",
"\n",
"This notebook is designed to query cross-corelations data calculated by noisepy, hosted on S3, and downloaded locally.\n",
"\n",
"This notebook assumes that you have installed the noisepy package. It installs Python tools for MongoDB, queries our SCOPED data base, and parse the S3-hosted data into the ASDF H5 data format."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pymongo in /Users/marinedenolle/miniconda3/envs/noisepy/lib/python3.10/site-packages (4.6.1)\n",
"Requirement already satisfied: dnspython<3.0.0,>=1.16.0 in /Users/marinedenolle/miniconda3/envs/noisepy/lib/python3.10/site-packages (from pymongo) (2.4.2)\n"
]
}
],
"source": [
"!pip install pymongo"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pinged your deployment. You successfully connected to MongoDB!\n"
]
}
],
"source": [
"from pymongo.mongo_client import MongoClient\n",
"from pymongo.server_api import ServerApi\n",
"\n",
"uri = \"mongodb+srv://user:[email protected]/?retryWrites=true&w=majority\"\n",
"\n",
"# Create a new client and connect to the server\n",
"client = MongoClient(uri, server_api=ServerApi('1'))\n",
"\n",
"# Send a ping to confirm a successful connection\n",
"try:\n",
" client.admin.command('ping')\n",
" print(\"Pinged your deployment. You successfully connected to MongoDB!\")\n",
"except Exception as e:\n",
" print(e)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Station Pair Collection:\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb6'), 'station0': 'ABL', 'station1': 'ABL', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ABL/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb7'), 'station0': 'ABL', 'station1': 'ACP', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ACP/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb8'), 'station0': 'ABL', 'station1': 'ADO', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ADO/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb9'), 'station0': 'ABL', 'station1': 'AGM', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.AGM/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eba'), 'station0': 'ABL', 'station1': 'ALP', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ALP/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebb'), 'station0': 'ABL', 'station1': 'ARV', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ARV/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebc'), 'station0': 'ABL', 'station1': 'AVM', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.AVM/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebd'), 'station0': 'ABL', 'station1': 'BAI', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.BAI/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebe'), 'station0': 'ABL', 'station1': 'BAK', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.BAK/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebf'), 'station0': 'ABL', 'station1': 'BAR', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.BAR/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"\n",
"Station Collection:\n",
"{'_id': ObjectId('656eac931fec6658167a34fd'), 'name': 'ABL', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ABL\n",
"{'_id': ObjectId('656eac931fec6658167a3546'), 'name': 'ACP', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ACP\n",
"{'_id': ObjectId('656eac931fec6658167a355d'), 'name': 'ADO', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ADO\n",
"{'_id': ObjectId('656eac931fec6658167a3575'), 'name': 'AGM', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"AGM\n",
"{'_id': ObjectId('656eac931fec6658167a3597'), 'name': 'ALP', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ALP\n",
"{'_id': ObjectId('656eac931fec6658167a35f4'), 'name': 'ARV', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ARV\n",
"{'_id': ObjectId('656eac931fec6658167a3650'), 'name': 'AVM', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"AVM\n",
"{'_id': ObjectId('656eac931fec6658167a369b'), 'name': 'BAI', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"BAI\n",
"{'_id': ObjectId('656eac931fec6658167a3702'), 'name': 'BAK', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"BAK\n",
"{'_id': ObjectId('656eac931fec6658167a374c'), 'name': 'BAR', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"BAR\n"
]
}
],
"source": [
"db = client.scoped_noise\n",
"station_pair_collection = db.station_pair\n",
"station_collection = db.station\n",
"\n",
"# Query the first 10 records in the station_pair collection\n",
"station_pair_records = station_pair_collection.find().limit(10)\n",
"\n",
"print(\"Station Pair Collection:\")\n",
"for record in station_pair_records:\n",
" print(record)\n",
"\n",
"# Query the first 10 records in the station collection\n",
"station_records = station_collection.find().limit(10)\n",
"\n",
"print(\"\\nStation Collection:\")\n",
"for record in station_records:\n",
" print(record)\n",
" sta_source = record[\"name\"]\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-12-06 14:30:45,060 4717313536 INFO numpystore.__init__(): store creating at s3://scoped-noise/scedc_CI_2022_stack/, mode=a, storage_options={'s3': {'anon': False}}\n",
"2023-12-06 14:30:45,061 4717313536 INFO numpystore.__init__(): Numpy store created at s3://scoped-noise/scedc_CI_2022_stack/\n",
"2023-12-06 14:30:52,062 4717313536 INFO hierarchicalstores._load_src(): Loading directory cache for CI.ABL - ix: 0\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Time to get station pairs: 6.9995436668396 seconds\n",
"Time to get timespans: 0.1572427749633789 seconds\n",
"Timespan: 2022-01-01T00:00:00+0000 - 2023-01-01T00:00:00+0000\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
". Memory: 388 MB: 100%|██████████| 10/10 [00:00<00:00, 31.01it/s]\n",
"2023-12-06 14:30:52,556 4717313536 INFO utils.log_raw(): TIMING: 0.3359 secs. for loading 10 stacks\n"
]
}
],
"source": [
"import os\n",
"import noisepy\n",
"from noisepy.seis.asdfstore import ASDFStackStore\n",
"from noisepy.seis.numpystore import NumpyStackStore\n",
"import time as time\n",
"\n",
"stack_data_path = \"s3://scoped-noise/scedc_CI_2022_stack/\"\n",
"S3_STORAGE_OPTIONS = {\"s3\": {\"anon\": False}}\n",
"stack_store = NumpyStackStore(stack_data_path, storage_options=S3_STORAGE_OPTIONS)\n",
"\n",
"# Get list of station pairs (~47k pairs)\n",
"t0=time.time()\n",
"pairs = stack_store.get_station_pairs()\n",
"t1=time.time() \n",
"print(f\"Time to get station pairs: {t1-t0} seconds\")\n",
"# Get the first timespan available for the first pair\n",
"t2=time.time()\n",
"ts = stack_store.get_timespans(*pairs[0])[0]\n",
"t3=time.time()\n",
"print(f\"Time to get timespans: {t3-t2} seconds\")\n",
"print(f\"Timespan: {ts}\")\n",
"\n",
"# Read some stacks (10?) from S3/numpy\n",
"stacks_10 = stack_store.read_bulk(ts, pairs[0:10]) \n",
"\n",
"# write them to ASDF\n",
"output= \"./asdf_data\"\n",
"os.makedirs(output, exist_ok=True)\n",
"asdf_store = ASDFStackStore(output)\n",
"for ((src,rec), stacks) in stacks_10:\n",
" asdf_store.append(ts, src, rec, stacks)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ASDF file [format version: 1.0.3]: 'asdf_data/CI.ABL/CI.ABL_CI.ABL.h5' (239.7 KB)\n",
"\tContains 0 event(s)\n",
"\tContains waveform data from 0 station(s).\n",
"\tContains 1 type(s) of auxiliary data: Allstack_linear\n"
]
},
{
"data": {
"text/plain": [
"(8001,)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pyasdf\n",
"\n",
"\n",
"df = pyasdf.ASDFDataSet(\"./asdf_data/CI.ABL/CI.ABL_CI.ABL.h5\", mode=\"r\")\n",
"print(df)\n",
"df.auxiliary_data.Allstack_linear.ZZ.data.shape"
]
}
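,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, we can plot the stacked ZZ component we just read back. This is only a sketch: the x-axis is left in samples, since the lag/sampling metadata are not loaded in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Load the stacked ZZ cross-correlation into a numpy array and plot it\n",
"cc = np.array(df.auxiliary_data.Allstack_linear.ZZ.data)\n",
"plt.plot(cc)\n",
"plt.xlabel(\"sample index\")\n",
"plt.ylabel(\"amplitude\")\n",
"plt.title(\"CI.ABL-CI.ABL Allstack_linear ZZ\")\n",
"plt.show()"
]
}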
],
"metadata": {
"kernelspec": {
"display_name": "noisepy",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}