This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Review tool for large "projects" #1019

Open
baagaard-usgs opened this issue Oct 5, 2022 · 5 comments
Labels
feature Major new feature

Comments

@baagaard-usgs
Collaborator

baagaard-usgs commented Oct 5, 2022

Difficulty of per-earthquake reports for large "projects"

For gmprocess "projects" with hundreds of earthquakes, it is difficult to quickly review the data using the report and station maps that are one file per earthquake. For example, viewing all of the waveforms results in opening hundreds of PDF files. Likewise, viewing all of the station maps results in opening hundreds of HTML files. Additionally, we often want to view only a subset of the information.

Desired Solution

Use cases

  • Review all records with status==passed to check for records that we want to fail
  • Review all records with status==failed to check for records we want to pass
  • Review records for problematic earthquakes (multiple earthquakes, large fraction of failed records)
  • Review records for problematic stations (large fraction of failed records)
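
As a rough sketch of the "problematic stations" use case, the failed fraction per station could be computed along these lines. The record layout (network, station, status), the station codes, and the 0.5 threshold are all assumptions for illustration, not gmprocess structures.

```python
from collections import defaultdict


def problematic_stations(records, max_failed_fraction=0.5):
    """Return {(network, station): failed_fraction} for stations above the threshold.

    `records` is a hypothetical iterable of (network, station, status) tuples,
    with status either "passed" or "failed".
    """
    totals = defaultdict(int)
    failed = defaultdict(int)
    for network, station, status in records:
        key = (network, station)
        totals[key] += 1
        if status == "failed":
            failed[key] += 1
    return {
        key: failed[key] / totals[key]
        for key in totals
        if failed[key] / totals[key] > max_failed_fraction
    }


# Made-up station codes and statuses.
records = [
    ("NC", "AAA", "passed"),
    ("NC", "AAA", "failed"),
    ("NC", "AAA", "failed"),
    ("CE", "BBB", "passed"),
]
flagged = problematic_stations(records)
```

The same grouping keyed on earthquake id instead of station would cover the "problematic earthquakes" case.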

Earthquake-centric view

  • Main table of earthquakes (one row per earthquake)
    • Each row shows earthquake id, magnitude, date, total number of records, number of records that passed, number of records that failed
    • User selection of earthquakes (table rows) (single, multiple, or range)
  • Option to view secondary table with one row per record for earthquakes selected from main table
    • Each row shows network code, station code, channel code (e.g., HH,HN), epicentral distance, pass/fail status
    • User selection of records (table rows) (single, multiple, or range)
    • View waveforms for selected records (table rows)
    • View map of selected earthquakes and stations (be able to click on station to view waveforms)
  • Table rows can be sorted based on any column
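
A minimal sketch of the main-table row and the sort-on-any-column behavior, assuming a plain dataclass per earthquake. The class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EarthquakeRow:
    """One row of the (hypothetical) main earthquake table."""

    event_id: str
    magnitude: float
    date: str  # ISO 8601 string, so string order matches date order
    n_passed: int
    n_failed: int

    @property
    def n_total(self):
        return self.n_passed + self.n_failed


def sort_rows(rows, column, descending=False):
    """Return a new list of rows sorted by the named column."""
    return sorted(rows, key=lambda row: getattr(row, column), reverse=descending)


# Made-up rows for illustration.
rows = [
    EarthquakeRow("eq_a", 4.5, "2022-01-03", 10, 2),
    EarthquakeRow("eq_b", 6.1, "2022-02-14", 40, 25),
    EarthquakeRow("eq_c", 5.2, "2021-11-30", 7, 0),
]
by_failed = sort_rows(rows, "n_failed", descending=True)
by_date = sort_rows(rows, "date")
```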

Station-centric view

  • Main table of stations (one row per station)
    • Each row shows network code, station code, channel code (e.g., HN,HH), total number of records, number of records that passed, number of records that failed
    • User selection of stations (table rows) (single, multiple, or range)
  • Option to view secondary table with one row per record for stations selected from main table
    • Each row shows earthquake id, earthquake magnitude, date, epicentral distance, pass/fail status
    • User selection of records (table rows) (single, multiple, or range)
    • View waveforms for selected records (table rows)
    • View map of selected earthquakes and stations (be able to click on station to view waveforms)
  • Table rows can be sorted based on any column
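
One possible way to turn a single, multiple, or range selection in the main table into the rows of the secondary table is sketched below. The pick format (an index or an inclusive `(first, last)` tuple), the dictionary keys, and the station codes are assumptions.

```python
def resolve_selection(n_rows, picks):
    """Expand picks (row indices or inclusive (first, last) ranges) into sorted unique indices."""
    selected = set()
    for pick in picks:
        if isinstance(pick, tuple):
            first, last = pick
            selected.update(range(first, last + 1))
        else:
            selected.add(pick)
    # Drop anything that falls outside the table.
    return sorted(i for i in selected if 0 <= i < n_rows)


# Made-up main-table rows (stations) and records.
stations = [("NC", "AAA"), ("NC", "BBB"), ("CE", "CCC"), ("CE", "DDD"), ("BK", "EEE")]
records = [
    {"event_id": "eq_a", "station": ("NC", "BBB"), "distance_km": 12.0, "status": "passed"},
    {"event_id": "eq_b", "station": ("CE", "DDD"), "distance_km": 85.5, "status": "failed"},
    {"event_id": "eq_b", "station": ("BK", "EEE"), "distance_km": 40.2, "status": "passed"},
]

# Select row 0 plus the range of rows 2 through 3.
indices = resolve_selection(len(stations), [0, (2, 3)])
chosen = {stations[i] for i in indices}
secondary = [rec for rec in records if rec["station"] in chosen]
```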

Waveform panel

  • View a single station record at a time, with the ability to advance to the next/previous selected station record
  • Current station record being viewed is highlighted in corresponding row in table and symbol on map
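
The next/previous behavior could be backed by a small cursor over the selected records, sketched here. The class name and record ids are hypothetical, and stopping at the ends (rather than wrapping around) is just one possible choice.

```python
class RecordCursor:
    """Tracks which of the selected records is currently shown."""

    def __init__(self, selected_ids):
        if not selected_ids:
            raise ValueError("selection is empty")
        self._ids = list(selected_ids)
        self._pos = 0

    @property
    def current(self):
        return self._ids[self._pos]

    def next(self):
        # Stay on the last record instead of wrapping around.
        self._pos = min(self._pos + 1, len(self._ids) - 1)
        return self.current

    def previous(self):
        # Stay on the first record instead of wrapping around.
        self._pos = max(self._pos - 1, 0)
        return self.current


cursor = RecordCursor(["rec1", "rec2", "rec3"])
```

The table and map would then highlight whatever `cursor.current` returns.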

Map panel

  • Shows current selected earthquakes and stations
  • Color station symbols by pass/fail status and whether record is currently selected
  • Color earthquake symbols by whether record is currently selected
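
The coloring rules might reduce to a lookup like the following. The palette is purely an assumption, since no colors are specified here.

```python
# Hypothetical palette keyed on (status, currently selected).
STATION_COLORS = {
    ("passed", False): "lightgreen",
    ("passed", True): "darkgreen",
    ("failed", False): "lightcoral",
    ("failed", True): "darkred",
}


def station_color(status, selected):
    """Color for a station symbol given pass/fail status and selection state."""
    return STATION_COLORS[(status, selected)]


def earthquake_color(selected):
    """Color for an earthquake symbol given selection state."""
    return "orange" if selected else "gray"
```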

Implementation Ideas

  • Could be a Python or web-based application (a JavaScript-based application may be easiest to maintain in the long term given USGS Earthquake Hazards Program expertise)
  • Pieces of this tool already exist
  • Can start with subset of features and interaction and incrementally add more
  • Waveform plots could be images (like the PDF reports) stored on the filesystem
  • Panels (tables, map, waveforms) could become pop-out windows (click on button to display panel as a separate window)
  • Should probably use a database back end rather than HDF5 workspace files
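
To illustrate what a database back end could offer, here is a minimal sketch using SQLite from the Python standard library. The choice of SQLite, the table and column names, and the sample rows are all assumptions, not an agreed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-table layout: one row per earthquake, one row per record.
conn.executescript(
    """
    CREATE TABLE earthquakes (
        event_id TEXT PRIMARY KEY,
        magnitude REAL,
        origin_date TEXT
    );
    CREATE TABLE records (
        event_id TEXT REFERENCES earthquakes(event_id),
        network TEXT,
        station TEXT,
        channel TEXT,
        distance_km REAL,
        status TEXT CHECK (status IN ('passed', 'failed'))
    );
    """
)
conn.execute("INSERT INTO earthquakes VALUES ('eq_a', 5.0, '2022-10-05')")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("eq_a", "NC", "AAA", "HN", 10.0, "passed"),
        ("eq_a", "NC", "BBB", "HH", 25.0, "passed"),
        ("eq_a", "CE", "CCC", "HN", 60.0, "failed"),
    ],
)
# One query yields the counts needed for a main-table row.
summary = conn.execute(
    """
    SELECT e.event_id,
           COUNT(*),
           SUM(r.status = 'passed'),
           SUM(r.status = 'failed')
    FROM earthquakes AS e
    JOIN records AS r ON r.event_id = e.event_id
    GROUP BY e.event_id
    """
).fetchall()
```

With this layout the per-earthquake and per-station tables become `GROUP BY` queries rather than scans over hundreds of workspace files.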
@baagaard-usgs
Collaborator Author

@emthompson-usgs @gferragu Having just spent a couple days manually reviewing 8000+ records for Turkey, I would like to push up the priority for a GUI review tool. I think we should break the development into stages to speed up initial delivery.

Stages

  1. Simple review GUI tool that collects data from the HDF5 workspace files. Display images of waveforms rather than interactive plotting. Allow display of only "passed" stations.
  2. Add more functionality for viewing record information (view/hide provenance and processing information), showing station(s) on a map, manually marking "passed" records as "failed", and saving the list of manually marked stations to a file.
  3. Ability to generate interactive plot (static image would be default for faster loading) of station waveforms.
  4. Replace HDF5 backend with a database.
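
For the stage 2 step of saving manually marked records, a simple file format may be enough to start. The sketch below assumes JSON and a hypothetical (event id, network, station) key; neither is an existing gmprocess convention.

```python
import json
import os
import tempfile


def save_manual_failures(path, record_keys):
    """Write manually failed record keys, sorted, as a JSON list."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(record_keys), fh, indent=2)


def load_manual_failures(path):
    """Read the keys back as a set of tuples."""
    with open(path, encoding="utf-8") as fh:
        return {tuple(key) for key in json.load(fh)}


# Made-up keys: (event id, network code, station code).
manual_failures = {("eq_b", "NC", "BBB"), ("eq_a", "CE", "CCC")}
path = os.path.join(tempfile.mkdtemp(), "manual_failures.json")
save_manual_failures(path, manual_failures)
restored = load_manual_failures(path)
```

Sorting before writing keeps the file stable between sessions, which makes it easier to diff or keep under version control.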

GUI framework

I came across pynecone as a Python framework for generating web applications. It is built upon the ReactJS framework. This might be more sustainable than a JavaScript application, because more of us know Python.

@emthompson-usgs
Member

@baagaard-usgs I fully support this idea. To try to get this done as efficiently as possible, I think we need to try to develop a plan to divvy up tasks.

@gferragu
Collaborator

@baagaard-usgs @emthompson-usgs I've been looking through Pynecone for a bit and it sounds awesome. If it facilitated more involvement in development it could be very useful. There is a good bit of Python dev needed already in the current gm-app though, for what it's worth. If there's a desire I can articulate that via GitLab more. I'm a little concerned about how new it is though (alpha in Dec 2022) and how we distribute/host it. I don't know enough about how the Pynecone distribution works (their servers? USGS servers?), but having a tried and tested framework that can be packaged as a cross-platform app without hosting is a point for Electron, in my opinion. Not muddling through JS as much is a big point for Pynecone though, as I am not a JS expert. I have some notes I'll post in Teams, along with a link to the current (albeit fledgling) status of the Electron app code.

@baagaard-usgs
Collaborator Author

@gferragu Notes on the current status and Electron would be very helpful. I, too, am concerned about the lifespan of pynecone. Additionally, the examples all look really simple, and doing what we want may require something more sophisticated.

@gferragu
Collaborator

@baagaard-usgs yep, definitely no-frills, but it's sussing out a baseline for interaction with gmprocess via your scripting updates, I/O for HDF5 workspace files, simple SQLite database interaction, Flask-Electron communication, rendering processes, etc.

The goal, I think, was to eventually (1) incorporate a fleshed-out gm-db database schema based on Mike's motionfetcher repo that we could really lean into rather than HDF5 workspaces and (2) embed the more advanced, Python-based plotting of Hadi's Dash/Plotly code to make a more cohesive review tool.
