Increasing CUAHSI WDC-related AOI search limit beyond 1500 km2 #2756

Closed
emiliom opened this issue Apr 3, 2018 · 9 comments

emiliom commented Apr 3, 2018

I've investigated CUAHSI WDC search API performance issues, following up on our last Monitor MW call. Briefly, I:

  • Reviewed existing, relevant catalog APIs.
  • Performed a series of API performance tests with 1° x 1° AOI boxes (8,500 - 10,700 km2) across the country, and compiled the results.
  • Contacted CUAHSI (Tony Castronova and Martin Seul) to ask about their use of SOLR and which catalog APIs were new (relative to our development efforts last summer) and recommended.
  • Reviewed previous Azavea work and findings on this topic from last year, during BiG-CZ portal development efforts.

Summary of my findings and recommendations

  • The only new relevant API is GetSeriesMetadataCountOrData. Its response is consistently slower than the one we currently use, GetSeriesCatalogForBox2. The only near-term use I can foresee is its ability to return only a count of the series records it would return; that count-only response is extremely fast and could guide further client actions, including client-side ("self") pagination. A rough sketch of this count-first strategy follows after this list.
  • All catalog APIs leverage SOLR.
  • No existing catalog API is paginated.
  • We should stick with the current catalog API, GetSeriesCatalogForBox2.
  • Azavea tests in Oct 2017 concluded that an 8,000 km2 AOI was unworkable. Those tests ran into timeouts, problems with the Python SOAP library suds, and a problem in internal application caching code (it looks like that caching is no longer done).
  • I did not encounter any actual failures on either the CUAHSI WDC server end or my client end.
  • I see no current reason why the AOI could not be safely enlarged to at least 3,000 km2, possibly 5,000 km2. Results will be slower, and we'll need to decide what's acceptable. I believe 3,000 km2 will not cause any unacceptable slowdowns for users.
  • CUAHSI is open to providing more performant APIs, including direct access to SOLR requests. This would undoubtedly improve server and client performance, but it's unclear how long it will take CUAHSI to do this, and whether Azavea has the time/funds to make corresponding changes.
  • There are strategies we can explore on the client side, and suggestions we could make to CUAHSI about resources that would enable smarter client searching, but all would require development time.
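
As a concrete illustration of the count-first idea in the first bullet, here is a minimal sketch. It is only illustrative: `get_series_count` and `get_series_catalog` are hypothetical wrappers standing in for a count-only GetSeriesMetadataCountOrData call and a GetSeriesCatalogForBox2 call (their real signatures are not shown in this thread), and the 5,000-record threshold is an assumed value that would need to be tuned from timing tests.

```python
# Count-first sketch: ask the catalog how many series a box would return
# before deciding whether to fetch them all or subdivide the box.
# `get_series_count` and `get_series_catalog` are hypothetical wrappers; the
# real web-service signatures are not shown here.

MAX_SERIES_PER_REQUEST = 5000  # assumed threshold; tune from timing tests


def fetch_catalog(bbox, get_series_count, get_series_catalog):
    """bbox = (xmin, ymin, xmax, ymax) in decimal degrees (lon/lat)."""
    if get_series_count(bbox) <= MAX_SERIES_PER_REQUEST:
        return get_series_catalog(bbox)  # one full catalog request is fine

    # Too many records: split the box in half along its longer side
    # ("self pagination" on the client) and recurse into each half.
    # Series lying on the split line may appear in both halves, so the
    # caller would need to de-duplicate the combined results.
    xmin, ymin, xmax, ymax = bbox
    if (xmax - xmin) >= (ymax - ymin):
        xmid = (xmin + xmax) / 2.0
        halves = [(xmin, ymin, xmid, ymax), (xmid, ymin, xmax, ymax)]
    else:
        ymid = (ymin + ymax) / 2.0
        halves = [(xmin, ymin, xmax, ymid), (xmin, ymid, xmax, ymax)]

    results = []
    for half in halves:
        results.extend(fetch_catalog(half, get_series_count, get_series_catalog))
    return results
```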

Detailed information and discussion of API tests and related previous Azavea assessments

  1. My API performance tests and comparisons are summarized in the table below. The Jupyter notebook I used for this assessment, CUAHSI_HISCentral_AOI_service_tests.ipynb, can be accessed here; see the descriptions at the top of the notebook. The notebook was run once for each AOI listed in the table. The specific results shown in the notebook snapshot (for the "1° N of the above PA/DRB point" AOI) differ from the ones listed in the table because the data are dynamic and factors such as CUAHSI server loads and network latency are not constant: the notebook results were run today, Monday April 2 at 3:40pm PT, while the results in the table were run on Saturday March 24 (weekend server loads are probably lighter). Each result is for a search based on a 1° x 1° box ("square" in lat-lon coordinates) centered at the point listed. Search requests were issued with suds-jurko; a minimal sketch of such a timed request is included after this list. The last 3 columns show response times (including suds processing time) for 3 APIs:
    • GSCFB2 = GetSeriesCatalogForBox2 (currently used in the MMW portal)
    • GSCFB3 = GetSeriesCatalogForBox3
    • GSMCD = GetSeriesMetadataCountOrData (the newer API we're investigating)
| Location | lat-lon center | AOI (km2) | series count | non-grid series count | GSCFB2 | GSCFB3 | GSMCD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Texas, south of Austin | 30.0, -97.5 | 10,707 | 5,288 | 4,488 | 20.5 s | 53.0 s | 36.9 s |
| Just N of the Schuylkill river near Philly | 40.1, -75.5 | 9,457 | 23,001 | 22,205 | 86.0 s | 181.0 s | 178.0 s |
| 1° N of the above PA/DRB point | 41.1, -75.5 | 9,317 | 16,744 | 15,944 | 60.0 s | 110.0 s | 128.0 s |
| Central Iowa | 42.0, -93.0 | 9,188 | 1,618 | 818 | 6.77 s | 12.4 s | 11.2 s |
| Halfway between Olympia, WA and Portland, OR | 46.5, -123.0 | 8,511 | 9,226 | 8,426 | 44.7 s | 73.0 s | 69.0 s |
  2. The API currently used in the portal, GetSeriesCatalogForBox2, clearly yields the fastest response times. I believe this is simply because it handles a slimmer set of metadata attributes than the other two APIs.

  3. I did not encounter any strict failures on either the WDC server end or the client (my laptop) side. Requests that returned more records (as many as 23K, close to the 25K limit, at least for GetSeriesMetadataCountOrData) were simply slower, but never actually failed. Client Python SOAP processing and deserialization with "suds" never failed either, unlike what Azavea reported in Oct 2017. The only possible reason I can think of for the failures Azavea reported is that they may have been using the original, very old and unmaintained suds package rather than its fork and more current replacement, suds-jurko, which I used.

  4. Notes:
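
For reference, here is the minimal sketch mentioned in item 1 of how each timed box search was issued with suds-jurko. The WSDL URL and the argument names for GetSeriesCatalogForBox2 are written from memory rather than copied from the notebook, so treat them as assumptions to be checked against the live WSDL.

```python
# A minimal sketch, assuming suds-jurko is installed, of timing a
# GetSeriesCatalogForBox2 request for a 1° x 1° box around a center point.
# The WSDL URL and argument names are assumptions, not verified here.
import time
from suds.client import Client  # provided by the suds-jurko fork

HIS_CENTRAL_WSDL = "http://hiscentral.cuahsi.org/webservices/hiscentral.asmx?WSDL"


def timed_box_search(lat_center, lon_center, half_deg=0.5):
    """Return (series record count, elapsed seconds) for a box search."""
    client = Client(HIS_CENTRAL_WSDL)
    start = time.time()
    # Argument names (xmin/xmax/ymin/ymax, conceptKeyword, networkIDs,
    # beginDate, endDate) are assumed from the WSDL naming conventions.
    response = client.service.GetSeriesCatalogForBox2(
        xmin=lon_center - half_deg,
        xmax=lon_center + half_deg,
        ymin=lat_center - half_deg,
        ymax=lat_center + half_deg,
        conceptKeyword="",
        networkIDs="",
        beginDate="",
        endDate="",
    )
    elapsed = time.time() - start
    # The response wraps a list of series records; guard in case it is empty.
    series = response.SeriesRecord if hasattr(response, "SeriesRecord") else []
    return len(series), elapsed


# Example: the "Central Iowa" AOI from the table.
# count, seconds = timed_box_search(42.0, -93.0)
```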

cc @aufdenkampe

emiliom commented Apr 4, 2018

Don Setiawan (UW) has deployed the Wikiwatershed/MMW App locally on his laptop, for development and testing.

We figured out where the CUAHSI WDC AOI limit of 1,500 km2 was set, and changed it to 5,000 km2. We then ran a test search on a squarish polygon search area that's 4,093 km2 (the actual area of the enclosing rectangular AOI issued to the CUAHSI WDC catalog API would most likely be larger), and were able to get a response (4,954 records). See screenshot below.

This generally confirms my suggestion that increasing the AOI limit to 3,000 km2, if not larger, is most likely just fine, especially after ensuring suds-jurko is being used, as we did.

screenshot from 2018-04-03 15-16-13
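
To illustrate the parenthetical above about the enclosing rectangle being larger than the drawn polygon: the catalog request is box-based, so its effective AOI is the polygon's lon-lat bounding box. The sketch below is a rough approximation only; the function and the equirectangular area formula are my own, not taken from the MMW code, which may compute areas differently.

```python
# Rough approximation of the bounding-box area actually sent to the catalog
# API, given a polygon's lon/lat vertices. Not the MMW implementation.
import math


def bbox_area_km2(lons, lats):
    """Approximate area (km2) of the lon/lat bounding box of a polygon."""
    xmin, xmax = min(lons), max(lons)
    ymin, ymax = min(lats), max(lats)
    km_per_deg_lat = 111.32
    mean_lat = math.radians((ymin + ymax) / 2.0)
    km_per_deg_lon = 111.32 * math.cos(mean_lat)
    return (xmax - xmin) * km_per_deg_lon * (ymax - ymin) * km_per_deg_lat


# e.g., a 1° x 1° box centered near 40°N covers roughly
# 111.32 * (111.32 * cos(40°)) ≈ 9,500 km2, consistent with the table above.
```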

@rajadain

Shares points with #2760

rajadain self-assigned this Apr 18, 2018
rajadain added a commit that referenced this issue Apr 19, 2018
Monitor: Increase Area of Interest Limit with suds-jurko

Connects #2756
Connects #2760
@aufdenkampe

Is this on staging? I would love to test it!

@rajadain

Staging deployments are currently failing due to a third party dependency failure, but I'll comment here as soon as it is ready.

@rajadain

@aufdenkampe this is now on staging:

[screenshot]

Sorry for the delay.

emiliom commented Apr 23, 2018

Thanks @rajadain. I tried it out using the Brandywine-Christina HUC 8 (1,960 km2), and the CUAHSI WDC search stops with an error icon (FYI, the CINERGI search also yields the error icon).

For our reference, what's the new AOI size being used on staging? Based on your exchanges at #2784 (comment), it seems like it's 5,000 km2, but I'm not totally sure.

@lsetiawan and I have been able to run searches with larger HUC 8 polygon AOIs than the one I'm reporting here, on his laptop deployment; I can't imagine that his laptop has more resources than your staging cloud environment. Anyway, we'll try to run this specific HUC 8 search later today and report back.

@lsetiawan

(Emilio here, masquerading as Don) We've run a similar AOI test on Don's laptop with his app deployment. HUC polygon selection is not enabled in his deployment, so we created an AOI using free-draw that roughly matched the Brandywine-Christina HUC 8, but was a bit larger (2,235 vs 1,960 km2). The WDC search took around a minute, but completed successfully, returning just short of 5,000 records. See screenshot.

So, we don't know why it fails on the staging app.

screenshot from 2018-04-23 13-26-59

@rajadain

@emiliom our staging environments are running on smaller machines than production in an effort to keep hosting costs down. We just upped the staging VM from a t2.micro with 1GB RAM to a t2.small, which has 2GB, and I can now see results for Brandywine-Christina.

We're also expanding our development VMs in #2803 to allow larger areas of interest to be run while in development. I had to increase my app VM's allocation from 1GB to 2GB RAM to get this shape to work. Did you and Don have to do the same?

emiliom commented Apr 25, 2018

@rajadain thanks for the info and update. I do realize that the staging VM was bound to be less capable than the production VM, for the reasons you state. Out of curiosity (and for comparison), how much RAM is allocated to the production VM?

The deployment we're using is just Don's newish mid-range laptop, as is. Obviously we have no intention of recreating or approximating a hardware environment like the cloud-based staging or production allocations used for the Wikiwatershed app. Our focus is the development work for adding the new Water Quality Portal catalog; the CUAHSI WDC tests have just been a side benefit and opportunity.
