TMIFVP

TMI for Fun Vertical Profiles

or

Template Matching to Identify Features in Vertical Profiles

Collaborators

Anna-Lena Deppenmeier: NCAR

Daniel Wang: Virginia Institute of Marine Science, William & Mary

Karina Ramos Musalem: University of British Columbia

Kathy Gunn: University of Miami

Paige Lavin: University of Washington

Regina Lionheart: University of Washington

The Problem

Interested in finding a specific feature in your depth profiles? Want to know where those features are clustered? Then TMIVP, built upon the ideas of the visionary non-conformist demigod Procrustes, is for you.

Between unpredictable mixed layer temperature profiles, various clines, and nutrient spikes, understanding and searching through CTD profiles (along with other types of profiles) are a persistant issue in environmental science. TMIVP aims to match and quantify the difference between a template profile and user-submitted data, clearly highlighting features for quality matching scores and location of problematic or interesting attributes. More broadly, this pipeline could be used to analyze and compare many types of graphs and profiles, including but not limited to:

Michaelis-Menten population curves to characterize population growth
Metabolomic and proteomic spectrographic peaks to understand analytical drift, matrix effects and other obscuring variations
Environmental time series data to identify outliers

Below, our cut-to-fit-the-bed template matching pipeline.

The Solution

Our group used the principles of cross-correlation to vertically search profiles and match expected oceanographic profile feature templates to real user data.

TMIFVP currently supplies three typical templates for matching.

Steps to Achieve Profile Nirvana

1. User provides two 1-dimensional arrays to the interface, one of which is temperature (for the purpose of this example; multiple variables are possible) and the other depth. Those two arrays are returned as a table and visualized profiles.

Example Table and Graph

Depth = y	Temperature = x
5	29.66
23	27.62
32	24.81
50	22.81
59	21.58
68	21.15
77	20.38

2. TMIFVP provides a "template" shape that has typical numbers and dimensions for an oceanographic profile. This will act as a pattern matching feature for real data.

The three templates currently used for matching are 1. Exponential, 2. Gaussian, and 3. Step profile.
Templates should be scaled towards the smaller end of the spectrum: the cross-correlation won't work if the template is larger than the profile. Additionally, if the template is too large it will smooth the entire profile. Instead of selecting small similar features, it may grab spaces between features that have a similar shape.
There are several template options in 2d array format. Below is an example of a possible template. Choose the template that you would expect to find in your data depending on location, season, etc.

This would be used to match real user data, such as the example below.

3. TMIFVP uses the SciPy cross-correlation function to match the two arrays: pattern & profile.

The cross-correlation function will find the location of a possible matching pattern. Maximum correlation occurs when they are the same size.
Profile is segmented depending on where the maximum correlation occurs.

4. Optimize template to match the chosen segment.

Within each segment:

In case a step template: Use Procrustes analysis to align template to the profile chunk. In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The function handles the scaling of the template matching for you. The name Procrustes (Greek: Προκρούστης) refers to a bandit from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off (source: Wikipedia).

In case of Gaussian or exponential templates, use a normal fit.

For all templates, calculate the similarity using the root means square error, or RMSE. RMSE is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed. The RMSD represents the square root of the second sample moment of the differences between predicted values and observed values or the quadratic mean of these differences (source: Wikipedia).

TBD

5. Find best template-chunk pair for each profile, ie, the highest similarity score.

This produces a quality score from 0 - 1, resulting from matching the template and the chunk of the profile. TBD

6. Produce figures and output from TMIFVP.

Histogram of quality distribution

>

Seaborn jointplot of physical location and quality score relationship.

Map of profile locations, shaded by quality. The darker the shade, the higher the quality score.

Background Reading

Useful links:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate.html

A short explanation of cross correlation and convolution (slides)

https://users.cs.northwestern.edu/~pardo/courses/eecs352/lectures/MPM16-topic8-Convolution.pdf

A slightly graphic description of the inspiration for the Procrustes analysis.

https://en.wikipedia.org/wiki/Procrustes

Name		Name	Last commit message	Last commit date
Latest commit History 107 Commits
.ipynb_checkpoints		.ipynb_checkpoints
__pycache__		__pycache__
data		data
old_work		old_work
.gitignore		.gitignore
Fake_data.ipynb		Fake_data.ipynb
MASTER-Exponential.ipynb		MASTER-Exponential.ipynb
NorthAtlanticCTDData_A05_AR01_1998.mat		NorthAtlanticCTDData_A05_AR01_1998.mat
README.md		README.md
ReadInNAtlCTDFile.py		ReadInNAtlCTDFile.py
SADCP_TAO5b_uv_5_140W_gridded.cdf		SADCP_TAO5b_uv_5_140W_gridded.cdf
data_glider_ctd_casts.mat		data_glider_ctd_casts.mat
g_fitting.py		g_fitting.py
kathy_figures.ipynb		kathy_figures.ipynb
kg_final_figures (1).ipynb		kg_final_figures (1).ipynb
read_ADCP.py		read_ADCP.py
read_glider_ctd_data.ipynb		read_glider_ctd_data.ipynb
scr.py		scr.py
signal_correlation_calculation.ipynb		signal_correlation_calculation.ipynb
template_matching.py		template_matching.py
test_glider_data.ipynb		test_glider_data.ipynb
tmivp_main.py		tmivp_main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TMIFVP

TMI for Fun Vertical Profiles

or

Template Matching to Identify Features in Vertical Profiles

Collaborators

The Problem

The Solution

Steps to Achieve Profile Nirvana

1. User provides two 1-dimensional arrays to the interface, one of which is temperature (for the purpose of this example; multiple variables are possible) and the other depth. Those two arrays are returned as a table and visualized profiles.

2. TMIFVP provides a "template" shape that has typical numbers and dimensions for an oceanographic profile. This will act as a pattern matching feature for real data.

3. TMIFVP uses the SciPy cross-correlation function to match the two arrays: pattern & profile.

4. Optimize template to match the chosen segment.

5. Find best template-chunk pair for each profile, ie, the highest similarity score.

6. Produce figures and output from TMIFVP.

Background Reading

About

Releases

Packages

Contributors 6

Languages

oceanhackweek/ohw19-projects-TMIVP

Folders and files

Latest commit

History

Repository files navigation

TMIFVP

TMI for Fun Vertical Profiles

or

Template Matching to Identify Features in Vertical Profiles

Collaborators

The Problem

The Solution

Steps to Achieve Profile Nirvana

1. User provides two 1-dimensional arrays to the interface, one of which is temperature (for the purpose of this example; multiple variables are possible) and the other depth. Those two arrays are returned as a table and visualized profiles.

2. TMIFVP provides a "template" shape that has typical numbers and dimensions for an oceanographic profile. This will act as a pattern matching feature for real data.

3. TMIFVP uses the SciPy cross-correlation function to match the two arrays: pattern & profile.

4. Optimize template to match the chosen segment.

5. Find best template-chunk pair for each profile, ie, the highest similarity score.

6. Produce figures and output from TMIFVP.

Background Reading

About

Resources

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages