Reverse Image Search

Summary

My goal is to run a reverse image search on a target interference pattern to find other interference patterns that resemble it.

Exploration

I found interference patterns such as the following in mnt_blpd7/datax/dl/GBT_57436_51432_HIP77257_fine.h5.

I chose a much smaller portion of the data as the target interference to improve performance.

Model

I tried ResNet50 and a model developed by Peter Ma. In the end, ResNet50 with ImageNet weights worked better for reverse image search.

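from tensorflow.keras.applications import ResNet50  # Assumes the Keras build bundled with TensorFlow
model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))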

Generating Input Data

The variable interval is the width of each image in frequency, that is, the difference between its starting point (f_start) and its stopping point (f_stop). The variable step is the spacing between the starting points (f_start) of consecutive images. Both can be changed as needed. I set interval to a small value (256 * 2.79e-6) to keep the program efficient. Setting step to a value smaller than interval makes consecutive images overlap, which can give more accurate results at the cost of more images to process.

I used skimage.transform.resize to resize each frequency interval to the shape (1, 224, 224, 3), which is the input shape ResNet50 takes here. The resulting data_list is a list of arrays of that shape. Each array is treated as an image when it is passed to the feature extraction function, and the extracted features are then compared with one another to find the nearest neighbors.

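import blimpy
import numpy as np
from skimage.transform import resize

start = 1530
stop = 1535
interval = 256 * 2.79e-6    # Difference between the starting point (`f_start`) and the stopping point (`f_stop`) of each image
step = interval             # Difference between the starting points (`f_start`) of consecutive images
data_list = []
# `url` must point to the .h5 file and is defined elsewhere
wf = blimpy.Waterfall(url, load_data=True, f_start=start, f_stop=stop)
for i in np.arange(start, stop, step):
    # Use the unrounded start so every window spans exactly `interval`;
    # rounding to 3 decimals is coarser than the window width (~7.1e-4).
    fstart, fstop = i, i + interval
    _, sub_data = wf.grab_data(f_start=fstart, f_stop=fstop)
    # Resize to the 4-D shape that is passed to ResNet50
    resized_data = resize(sub_data, (1, 224, 224, 3))
    data_list.append(resized_data)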
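
To make the roles of interval and step concrete, here is a minimal sketch of the window arithmetic on its own, without blimpy. The helper window_bounds is hypothetical and not part of the notebook. It derives each start from an integer index instead of relying on accumulated floats, which is a design assumption on my part rather than something the original code does.

```python
import math


def window_bounds(start, stop, interval, step):
    """Return (f_start, f_stop) pairs whose starts cover [start, stop).

    Hypothetical helper for illustration only.
    The last window may extend past `stop`, as in the loop above.
    """
    n_windows = math.ceil((stop - start) / step)
    return [(start + k * step, start + k * step + interval)
            for k in range(n_windows)]


interval = 256 * 2.79e-6

# step == interval: windows sit side by side with no overlap
windows = window_bounds(1530, 1535, interval, step=interval)

# step == interval / 2: each window overlaps half of the next one
overlapping = window_bounds(1530, 1535, interval, step=interval / 2)

print(len(windows), len(overlapping))
```

Halving step roughly doubles the number of images, so the feature extraction and neighbor search take correspondingly longer.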

Data Preprocessing

I took the logarithm of the data and then scaled the result to the range 0 to 1.

def preprocess_input(data):
    # Log-compress the dynamic range, then min-max scale to [0, 1]
    log_input = np.log(data)
    scale_input = (log_input - log_input.min()) / (log_input.max() - log_input.min())
    return scale_input
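
np.log returns -inf for zeros and NaN for negative values, and either would propagate through the scaling. If the data can contain such values, one option is to clip before taking the logarithm. The sketch below works under that assumption; preprocess_input_safe and its eps parameter are hypothetical names that do not appear in the notebook.

```python
import numpy as np


def preprocess_input_safe(data, eps=1e-12):
    """Log-transform and min-max scale to [0, 1], guarding against bad values.

    Hypothetical variant for illustration; `eps` is an assumed floor.
    """
    data = np.asarray(data, dtype=np.float64)
    # Clip so that zeros and negative values do not produce -inf or NaN
    log_input = np.log(np.clip(data, eps, None))
    span = log_input.max() - log_input.min()
    if span == 0:
        # Constant input: nothing to scale, return all zeros
        return np.zeros_like(log_input)
    return (log_input - log_input.min()) / span


sample = np.array([[0.0, 1.0], [10.0, 100.0]])
scaled = preprocess_input_safe(sample)
print(scaled.min(), scaled.max())
```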

Feature Extraction

This function preprocesses the input array with the preprocess_input function above, runs it through the model, and returns the flattened, L2-normalized feature vector.

from numpy.linalg import norm

def extract_features(input_arr, model):
    # input_arr has shape (1, 224, 224, 3)
    preprocessed_arr = preprocess_input(input_arr)
    features = model.predict(preprocessed_arr, verbose=0)
    flattened_features = features.flatten()
    # L2-normalize so that every feature vector has unit length
    normalized_features = flattened_features / norm(flattened_features)
    return normalized_features
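
With include_top=False and a 224 x 224 x 3 input, Keras ResNet50 outputs a feature map of shape (1, 7, 7, 2048), so each flattened vector has 7 * 7 * 2048 = 100352 entries. The sketch below illustrates the flatten and normalize steps without downloading any weights. StubModel is a hypothetical stand-in that only mimics that output shape with random numbers, and flatten_and_normalize is an illustrative name; neither is part of the notebook.

```python
import numpy as np
from numpy.linalg import norm


class StubModel:
    """Hypothetical stand-in that mimics the output shape of ResNet50 without its top."""

    def predict(self, batch, verbose=0):
        rng = np.random.default_rng(0)
        # One 7 x 7 x 2048 feature map per input image
        return rng.random((batch.shape[0], 7, 7, 2048))


def flatten_and_normalize(features):
    """Flatten a feature map and scale it to unit L2 norm."""
    flat = features.flatten()
    return flat / norm(flat)


batch = np.zeros((1, 224, 224, 3))
vector = flatten_and_normalize(StubModel().predict(batch, verbose=0))
print(vector.shape, norm(vector))
```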

Generating Features

This part of the code applies the extract_features function to each array in data_list and stores the resulting features in feature_list.

feature_list = [extract_features(data, model) for data in data_list]

Finding the Nearest Neighbor

I used NearestNeighbors from sklearn.neighbors to find the nearest neighbors with the cosine and Euclidean metrics. In this case, both yielded the same result, which is expected because the feature vectors are normalized to unit length.

from sklearn.neighbors import NearestNeighbors
neighbors = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='cosine').fit(feature_list)

or,

neighbors = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='euclidean').fit(feature_list)
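
The agreement between the two metrics can be checked numerically. For unit-length vectors u and v, the squared Euclidean distance equals 2 * (1 - u·v), which is twice the cosine distance, so both metrics rank neighbors in the same order. The sketch below verifies this with NumPy alone; the random vectors are placeholders standing in for feature_list, not real features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "features": 50 random vectors, each scaled to unit length
vectors = rng.random((50, 16))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
query = vectors[0]

# Cosine distance is 1 - cosine similarity; for unit vectors that is 1 - dot product
cosine_dist = 1.0 - vectors @ query
euclid_dist = np.linalg.norm(vectors - query, axis=1)

# Indices of the five nearest vectors under each metric
cosine_order = np.argsort(cosine_dist)[:5]
euclid_order = np.argsort(euclid_dist)[:5]
print(cosine_order, euclid_order)
```

If the features were not normalized, the two metrics could rank neighbors differently, because Euclidean distance would then also depend on vector length.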

Finding the Nearest Neighbor of a Certain Interval

The following image is the pattern with an f_start of 1530 + 256 * 2.79e-6 * 3015 and an interval of 256 * 2.79e-6. This pattern is the element at index 3015 (zero-based) of feature_list.

start = 1530 + 256 * 2.79e-6 * 3015
stop = start + 256 * 2.79e-6
wf.plot_waterfall(f_start=start, f_stop=stop)

distances, indices = neighbors.kneighbors([feature_list[3015]])

The indices are [3015, 3014, 5538, 3348, 3981]. They are sorted by ascending distance from the pattern at index 3015. The first one is the target pattern itself.

# The 1st nearest neighbor, excluding the pattern itself
start = 1530 + 256 * 2.79e-6 * 3014
stop = start + 256 * 2.79e-6
wf.plot_waterfall(f_start=start, f_stop=stop)

# The 2nd nearest neighbor, excluding the pattern itself
start = 1530 + 256 * 2.79e-6 * 5538
stop = start + 256 * 2.79e-6
wf.plot_waterfall(f_start=start, f_stop=stop)

# The 3rd nearest neighbor, excluding the pattern itself
start = 1530 + 256 * 2.79e-6 * 3348
stop = start + 256 * 2.79e-6
wf.plot_waterfall(f_start=start, f_stop=stop)

# The 4th nearest neighbor, excluding the pattern itself
start = 1530 + 256 * 2.79e-6 * 3981
stop = start + 256 * 2.79e-6
wf.plot_waterfall(f_start=start, f_stop=stop)
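
The plots above repeat the same arithmetic for each index. A minimal sketch of that mapping as a loop follows; index_to_range is a hypothetical helper, and it assumes step equals interval, as in the data generation code. The plotting call is left commented out because it needs the loaded Waterfall object.

```python
BASE = 1530                 # f_start of the first window
INTERVAL = 256 * 2.79e-6    # width of each window
STEP = INTERVAL             # spacing between window starts (assumed equal to INTERVAL)


def index_to_range(index, base=BASE, interval=INTERVAL, step=STEP):
    """Map a position in feature_list back to its (f_start, f_stop) range.

    Hypothetical helper for illustration only.
    """
    f_start = base + step * index
    return f_start, f_start + interval


neighbor_indices = [3015, 3014, 5538, 3348, 3981]
for rank, idx in enumerate(neighbor_indices):
    f_start, f_stop = index_to_range(idx)
    # rank 0 is the query pattern itself
    print(rank, idx, f_start, f_stop)
    # wf.plot_waterfall(f_start=f_start, f_stop=f_stop)
```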
