Re-organization of tutorials #284

Merged
merged 10 commits into from
Feb 5, 2024
File renamed without changes.
8 changes: 4 additions & 4 deletions tutorials/_toc.yml
@@ -2,10 +2,10 @@
# Learn more at https://jupyterbook.org/customize/toc.html

format: jb-book
root: noisepy_scedc_tutorial.ipynb
root: noise_configuration.md
chapters:
- file: noise_configuration.md
- file: noisepy_datastore.ipynb
- file: get_started.ipynb
- file: noisepy_datastore.ipynb
- file: noisepy_scedc_tutorial.ipynb
- file: CLI.md
- file: cloud/aws.md
- file: README
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ABL.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ACP.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ADO.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.AGM.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ALP.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.ARV.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.AVM.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.BAI.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.BAK.h5
Binary file not shown.
Binary file added tutorials/asdf_data/CI.ABL/CI.ABL_CI.BAR.h5
Binary file not shown.
26 changes: 18 additions & 8 deletions tutorials/noise_configuration.md
@@ -4,32 +4,42 @@ Welcome to NoisePy!

**NoisePy** is software for computing large-scale cross-correlations on HPC and cloud infrastructure. The difference between the two targets lies in the back-end data format, which is optimized either for file systems (H5) or for object storage (npz/mseed).

**NoisePy** also offers tools for ambient noise monitoring (velocity and attenuation) and for Earth imaging (measuring phase and group velocities).

NoisePy leverages several published efforts; please consider citing:

* Jiang, C., Denolle, M. 2020. NoisePy: a new high-performance python tool for ambient noise seismology. Seismological Research Letters. 91, 1853-1866. https://doi.10.1785/0220190364.
* Yuan C, Bryan J, Denolle M. Numerical comparison of time-, frequency-and wavelet-domain methods for coda wave interferometry. Geophysical Journal International. 2021 Aug;226(2):828-46. https://doi.org/10.1093/gji/ggab140
* Yang X, Bryan J, Okubo K, Jiang C, Clements T, Denolle MA. Optimal stacking of noise cross-correlation functions. Geophysical Journal International. 2023 Mar;232(3):1600-18. https://doi.org/10.1093/gji/ggac410


We gratefully acknowledge support from the [Packard Foundation](https://www.packard.org)


## NoisePy Workflow

The data processing in NoisePy consists of three steps:

1. **(Optional) Step 0 - Download**: The `download()` function or the `noisepy download` CLI command can be used to download data from an FDSN web service. Alternatively, data from an [S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/scedc-pds) can be copied locally using the `aws` CLI, or streamed directly from S3. For users who want to work entirely locally, this step prepares and organizes the data in a ``DataStore``.
2. **Step 1 - Cross Correlation**: Computes cross-correlations for pairs of stations/channels. This can be done with either the `cross_correlate()` function or the `noisepy cross_correlate` CLI command.
3. **Step 2 - Stacking**: This step takes the cross-correlation computations across multiple timespans and stacks them for a given station/channel pair. This can be done with either the `stack_cross_correlations()` function or the `noisepy stack` CLI command (see the sketch after this list).
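
As an illustrative sketch of how these steps chain together through the function API (the store classes, `ConfigParameters`, and the argument order shown here are assumptions drawn from the tutorials, not a verified recipe — consult the tutorials for exact signatures):

```python
# Sketch only: the exact signatures and store choices are assumptions
# based on the tutorials; see those for working end-to-end examples.
from noisepy.seis import cross_correlate, stack_cross_correlations
from noisepy.seis.asdfstore import ASDFRawDataStore, ASDFCCStore, ASDFStackStore
from noisepy.seis.datatypes import ConfigParameters

config = ConfigParameters()                 # default processing parameters
raw_store = ASDFRawDataStore("./raw_data")  # Step 0 output (downloaded data)
cc_store = ASDFCCStore("./ccf")             # Step 1 writes CCFs here
stack_store = ASDFStackStore("./stack")     # Step 2 writes stacks here

cross_correlate(raw_store, config, cc_store)             # Step 1
stack_cross_correlations(cc_store, stack_store, config)  # Step 2
```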

### Data Storage

NoisePy accesses data through three abstract store classes: `DataStore`, `CrossCorrelationDataStore`, and `StackStore`. Concrete implementations are provided for the ASDF (H5), miniSEED, Zarr, TileDB, and npy formats.

0. [optional] data download: for users who want to work entirely locally, this step prepares and organizes the data in a ``RawDataStore``.
1. Cross correlations: data may be streamed from the DataStore, which can be hosted on the cloud. Pre-processing and cross-correlations are done for each time chunk (e.g., one day for broadband data), and the cross-correlations for each chunk are saved to a ``CrossCorrelationDataStore``.
2. Stacking: data is aggregated and stacked over all time periods. Stacked data are stored in a ``StackStore``.

The workflow is illustrated in the figure below; a short example of opening concrete stores follows it.
<img src="../docs_old/figures/data_flow.png">

## Applications
### Monitoring
NoisePy includes various functions to measure dv/v; please check the tutorials. The software reads the ``CrossCorrelationDataStore``, aggregates the data, and measures dv/v. The outputs are tabular data in CSV.
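
Since the outputs are plain CSV, they can be inspected with standard tools; for instance (the file name and columns here are hypothetical):

```python
import pandas as pd

# Hypothetical file and column names; the actual ones depend on the
# monitoring workflow that produced the table.
dvv = pd.read_csv("dvv_results.csv")
print(dvv.head())
```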

### Imaging
NoisePy includes functions to measure phase and group velocity dispersion curves. The software reads the ``StackStore`` and outputs the curves as tabular data in CSV.

2 changes: 1 addition & 1 deletion tutorials/noisepy_scedc_tutorial.ipynb
@@ -504,7 +504,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.12"
}
},
"nbformat": 4,
251 changes: 251 additions & 0 deletions tutorials/noisepy_scoped_download_s3_store.ipynb
@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Download S3-hosted Noisepy Data\n",
"\n",
"This notebook is designed to query cross-corelations data calculated by noisepy, hosted on S3, and downloaded locally.\n",
"\n",
"This notebook assumes that you have installed the noisepy package. It installs Python tools for MongoDB, queries our SCOPED data base, and parse the S3-hosted data into the ASDF H5 data format."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pymongo in /Users/marinedenolle/miniconda3/envs/noisepy/lib/python3.10/site-packages (4.6.1)\n",
"Requirement already satisfied: dnspython<3.0.0,>=1.16.0 in /Users/marinedenolle/miniconda3/envs/noisepy/lib/python3.10/site-packages (from pymongo) (2.4.2)\n"
]
}
],
"source": [
"!pip install pymongo"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pinged your deployment. You successfully connected to MongoDB!\n"
]
}
],
"source": [
"from pymongo.mongo_client import MongoClient\n",
"from pymongo.server_api import ServerApi\n",
"\n",
"uri = \"mongodb+srv://user:[email protected]/?retryWrites=true&w=majority\"\n",
"\n",
"# Create a new client and connect to the server\n",
"client = MongoClient(uri, server_api=ServerApi('1'))\n",
"\n",
"# Send a ping to confirm a successful connection\n",
"try:\n",
" client.admin.command('ping')\n",
" print(\"Pinged your deployment. You successfully connected to MongoDB!\")\n",
"except Exception as e:\n",
" print(e)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Station Pair Collection:\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb6'), 'station0': 'ABL', 'station1': 'ABL', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ABL/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb7'), 'station0': 'ABL', 'station1': 'ACP', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ACP/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb8'), 'station0': 'ABL', 'station1': 'ADO', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ADO/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eb9'), 'station0': 'ABL', 'station1': 'AGM', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.AGM/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72eba'), 'station0': 'ABL', 'station1': 'ALP', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ALP/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebb'), 'station0': 'ABL', 'station1': 'ARV', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.ARV/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebc'), 'station0': 'ABL', 'station1': 'AVM', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.AVM/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebd'), 'station0': 'ABL', 'station1': 'BAI', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.BAI/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebe'), 'station0': 'ABL', 'station1': 'BAK', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.BAK/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"{'_id': ObjectId('656f38bbd5ca665876e72ebf'), 'station0': 'ABL', 'station1': 'BAR', 'starttime': '2022-01-01T00:00:00+00:00', 'endtime': '2023-01-01T00:00:00+00:00', 'storage_mode': 'url', 'url': 's3://scoped-noise/scedc_CI_2022_stack/CI.ABL/CI.BAR/2022_01_01_00_00_00T2023_01_01_00_00_00.tar.gz'}\n",
"\n",
"Station Collection:\n",
"{'_id': ObjectId('656eac931fec6658167a34fd'), 'name': 'ABL', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ABL\n",
"{'_id': ObjectId('656eac931fec6658167a3546'), 'name': 'ACP', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ACP\n",
"{'_id': ObjectId('656eac931fec6658167a355d'), 'name': 'ADO', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ADO\n",
"{'_id': ObjectId('656eac931fec6658167a3575'), 'name': 'AGM', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"AGM\n",
"{'_id': ObjectId('656eac931fec6658167a3597'), 'name': 'ALP', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ALP\n",
"{'_id': ObjectId('656eac931fec6658167a35f4'), 'name': 'ARV', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"ARV\n",
"{'_id': ObjectId('656eac931fec6658167a3650'), 'name': 'AVM', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"AVM\n",
"{'_id': ObjectId('656eac931fec6658167a369b'), 'name': 'BAI', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"BAI\n",
"{'_id': ObjectId('656eac931fec6658167a3702'), 'name': 'BAK', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"BAK\n",
"{'_id': ObjectId('656eac931fec6658167a374c'), 'name': 'BAR', 'elevation': -1.7976931348623157e+308, 'lat': -1.7976931348623157e+308, 'location': '', 'lon': -1.7976931348623157e+308, 'network': 'CI'}\n",
"BAR\n"
]
}
],
"source": [
"db = client.scoped_noise\n",
"station_pair_collection = db.station_pair\n",
"station_collection = db.station\n",
"\n",
"# Query the first 10 records in the station_pair collection\n",
"station_pair_records = station_pair_collection.find().limit(10)\n",
"\n",
"print(\"Station Pair Collection:\")\n",
"for record in station_pair_records:\n",
" print(record)\n",
"\n",
"# Query the first 10 records in the station collection\n",
"station_records = station_collection.find().limit(10)\n",
"\n",
"print(\"\\nStation Collection:\")\n",
"for record in station_records:\n",
" print(record)\n",
" sta_source = record[\"name\"]\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-12-06 14:30:45,060 4717313536 INFO numpystore.__init__(): store creating at s3://scoped-noise/scedc_CI_2022_stack/, mode=a, storage_options={'s3': {'anon': False}}\n",
"2023-12-06 14:30:45,061 4717313536 INFO numpystore.__init__(): Numpy store created at s3://scoped-noise/scedc_CI_2022_stack/\n",
"2023-12-06 14:30:52,062 4717313536 INFO hierarchicalstores._load_src(): Loading directory cache for CI.ABL - ix: 0\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Time to get station pairs: 6.9995436668396 seconds\n",
"Time to get timespans: 0.1572427749633789 seconds\n",
"Timespan: 2022-01-01T00:00:00+0000 - 2023-01-01T00:00:00+0000\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
". Memory: 388 MB: 100%|██████████| 10/10 [00:00<00:00, 31.01it/s]\n",
"2023-12-06 14:30:52,556 4717313536 INFO utils.log_raw(): TIMING: 0.3359 secs. for loading 10 stacks\n"
]
}
],
"source": [
"import os\n",
"import noisepy\n",
"from noisepy.seis.asdfstore import ASDFStackStore\n",
"from noisepy.seis.numpystore import NumpyStackStore\n",
"import time as time\n",
"\n",
"stack_data_path = \"s3://scoped-noise/scedc_CI_2022_stack/\"\n",
"S3_STORAGE_OPTIONS = {\"s3\": {\"anon\": False}}\n",
"stack_store = NumpyStackStore(stack_data_path, storage_options=S3_STORAGE_OPTIONS)\n",
"\n",
"# Get list of station pairs (~47k pairs)\n",
"t0=time.time()\n",
"pairs = stack_store.get_station_pairs()\n",
"t1=time.time() \n",
"print(f\"Time to get station pairs: {t1-t0} seconds\")\n",
"# Get the first timespan available for the first pair\n",
"t2=time.time()\n",
"ts = stack_store.get_timespans(*pairs[0])[0]\n",
"t3=time.time()\n",
"print(f\"Time to get timespans: {t3-t2} seconds\")\n",
"print(f\"Timespan: {ts}\")\n",
"\n",
"# Read some stacks (10?) from S3/numpy\n",
"stacks_10 = stack_store.read_bulk(ts, pairs[0:10]) \n",
"\n",
"# write them to ASDF\n",
"output= \"./asdf_data\"\n",
"os.makedirs(output, exist_ok=True)\n",
"asdf_store = ASDFStackStore(output)\n",
"for ((src,rec), stacks) in stacks_10:\n",
" asdf_store.append(ts, src, rec, stacks)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ASDF file [format version: 1.0.3]: 'asdf_data/CI.ABL/CI.ABL_CI.ABL.h5' (239.7 KB)\n",
"\tContains 0 event(s)\n",
"\tContains waveform data from 0 station(s).\n",
"\tContains 1 type(s) of auxiliary data: Allstack_linear\n"
]
},
{
"data": {
"text/plain": [
"(8001,)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pyasdf\n",
"\n",
"\n",
"df = pyasdf.ASDFDataSet(\"./asdf_data/CI.ABL/CI.ABL_CI.ABL.h5\", mode=\"r\")\n",
"print(df)\n",
"df.auxiliary_data.Allstack_linear.ZZ.data.shape"
]
}
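,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, we can plot the stacked ZZ component we just read back. This is only a sketch: the x-axis is left in samples, since the lag/sampling metadata are not loaded in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Load the stacked ZZ cross-correlation into a numpy array and plot it\n",
"cc = np.array(df.auxiliary_data.Allstack_linear.ZZ.data)\n",
"plt.plot(cc)\n",
"plt.xlabel(\"sample index\")\n",
"plt.ylabel(\"amplitude\")\n",
"plt.title(\"CI.ABL-CI.ABL Allstack_linear ZZ\")\n",
"plt.show()"
]
}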
],
"metadata": {
"kernelspec": {
"display_name": "noisepy",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}