Increasing CUAHSI WDC-related AOI search limit beyond 1500 km2 #2756

Closed
emiliom opened this issue Apr 3, 2018 · 9 comments

emiliom commented Apr 3, 2018

I've investigated CUAHSI WDC search API performance issues, following up on our last Monitor MW call. Briefly, I:

  • Reviewed existing, relevant catalog APIs.
  • Performed a series of API performance tests with 1° x 1° AOI boxes (8,500 - 10,700 km2) across the country, and compiled the results.
  • Contacted CUAHSI (Tony Castronova and Martin Seul) to ask about their use of SOLR and which catalog APIs were new (relative to our development efforts last summer) and recommended.
  • Reviewed previous Azavea work and findings on this topic from last year, during BiG-CZ portal development efforts.

Summary of my findings and recommendations

  • The only new relevant API is GetSeriesMetadataCountOrData. Its response is consistently slower than the one we currently use, GetSeriesCatalogForBox2. The only near-term use I can foresee is its ability to return only a count of the series records it would return; that count-only response is extremely fast and could guide further client actions, including client-side ("self") pagination. A rough sketch of this count-first strategy follows after this list.
  • All catalog APIs leverage SOLR.
  • No existing catalog API is paginated.
  • We should stick with the current catalog API, GetSeriesCatalogForBox2.
  • Azavea tests in Oct 2017 concluded that an 8,000 km2 AOI was unworkable. Those tests ran into timeouts, problems with the Python SOAP library suds, and a problem in internal application caching code (it looks like that caching is no longer done).
  • I did not encounter any actual failures on either the CUAHSI WDC server end or my client end.
  • I see no current reason why the AOI could not be safely enlarged to at least 3,000 km2, possibly 5,000 km2. Results will be slower, and we'll need to decide what's acceptable. I believe 3,000 km2 will not cause any unacceptable slowdowns for users.
  • CUAHSI is open to providing more performant APIs, including direct access to SOLR requests. This would undoubtedly improve server and client performance, but it's unclear how long it will take CUAHSI to do this, and whether Azavea has the time/funds to make corresponding changes.
  • There are strategies we can explore on the client side, and suggestions we could make to CUAHSI about resources that would enable smarter client searching, but all would require development time.
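
As a concrete illustration of the count-first idea in the first bullet, here is a minimal sketch. It is only illustrative: `get_series_count` and `get_series_catalog` are hypothetical wrappers standing in for a count-only GetSeriesMetadataCountOrData call and a GetSeriesCatalogForBox2 call (their real signatures are not shown in this thread), and the 5,000-record threshold is an assumed value that would need to be tuned from timing tests.

```python
# Count-first sketch: ask the catalog how many series a box would return
# before deciding whether to fetch them all or subdivide the box.
# `get_series_count` and `get_series_catalog` are hypothetical wrappers; the
# real web-service signatures are not shown here.

MAX_SERIES_PER_REQUEST = 5000  # assumed threshold; tune from timing tests


def fetch_catalog(bbox, get_series_count, get_series_catalog):
    """bbox = (xmin, ymin, xmax, ymax) in decimal degrees (lon/lat)."""
    if get_series_count(bbox) <= MAX_SERIES_PER_REQUEST:
        return get_series_catalog(bbox)  # one full catalog request is fine

    # Too many records: split the box in half along its longer side
    # ("self pagination" on the client) and recurse into each half.
    # Series lying on the split line may appear in both halves, so the
    # caller would need to de-duplicate the combined results.
    xmin, ymin, xmax, ymax = bbox
    if (xmax - xmin) >= (ymax - ymin):
        xmid = (xmin + xmax) / 2.0
        halves = [(xmin, ymin, xmid, ymax), (xmid, ymin, xmax, ymax)]
    else:
        ymid = (ymin + ymax) / 2.0
        halves = [(xmin, ymin, xmax, ymid), (xmin, ymid, xmax, ymax)]

    results = []
    for half in halves:
        results.extend(fetch_catalog(half, get_series_count, get_series_catalog))
    return results
```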

Detailed information and discussion of API tests and related previous Azavea assessments

  1. My API performance tests and comparisons are summarized in the table below. The Jupyter notebook I used for this assessment, CUAHSI_HISCentral_AOI_service_tests.ipynb, can be accessed here; see the descriptions at the top of the notebook. The notebook was run once for each AOI listed in the table. The specific results shown in the notebook snapshot (for the "1° N of the above PA/DRB point" AOI) differ from the ones listed in the table because the data are dynamic and factors such as CUAHSI server loads and network latency are not constant: the notebook results were run today, Monday April 2 at 3:40pm PT, while the results in the table were run on Saturday March 24 (weekend server loads are probably lighter). Each result is for a search based on a 1° x 1° box ("square" in lat-lon coordinates) centered at the point listed. Search requests were issued with suds-jurko; a minimal sketch of such a timed request is included after this list. The last 3 columns show response times (including suds processing time) for 3 APIs:
    • GSCFB2 = GetSeriesCatalogForBox2 (currently used in the MMW portal)
    • GSCFB3 = GetSeriesCatalogForBox3
    • GSMCD = GetSeriesMetadataCountOrData (the newer API we're investigating)
| Location | lat-lon center | AOI (km2) | series count | non-grid series count | GSCFB2 | GSCFB3 | GSMCD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Texas, south of Austin | 30.0, -97.5 | 10,707 | 5,288 | 4,488 | 20.5 s | 53.0 s | 36.9 s |
| Just N of the Schuylkill river near Philly | 40.1, -75.5 | 9,457 | 23,001 | 22,205 | 86.0 s | 181.0 s | 178.0 s |
| 1° N of the above PA/DRB point | 41.1, -75.5 | 9,317 | 16,744 | 15,944 | 60.0 s | 110.0 s | 128.0 s |
| Central Iowa | 42.0, -93.0 | 9,188 | 1,618 | 818 | 6.77 s | 12.4 s | 11.2 s |
| Halfway between Olympia, WA and Portland, OR | 46.5, -123.0 | 8,511 | 9,226 | 8,426 | 44.7 s | 73.0 s | 69.0 s |
  2. The API currently used in the portal, GetSeriesCatalogForBox2, clearly yields the fastest response times. I believe this is simply because it handles a slimmer set of metadata attributes than the other two APIs.

  3. I did not encounter any strict failures on either the WDC server end or the client (my laptop) side. Requests that returned more records (as many as 23K, close to the 25K limit, at least for GetSeriesMetadataCountOrData) were simply slower, but never actually failed. Client Python SOAP processing and deserialization with "suds" never failed either, unlike what Azavea reported in Oct 2017. The only possible reason I can think of for the failures Azavea reported is that they may have been using the original, very old and unmaintained suds package rather than its fork and more current replacement, suds-jurko, which I used.

  4. Notes:
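
For reference, here is the minimal sketch mentioned in item 1 of how each timed box search was issued with suds-jurko. The WSDL URL and the argument names for GetSeriesCatalogForBox2 are written from memory rather than copied from the notebook, so treat them as assumptions to be checked against the live WSDL.

```python
# A minimal sketch, assuming suds-jurko is installed, of timing a
# GetSeriesCatalogForBox2 request for a 1° x 1° box around a center point.
# The WSDL URL and argument names are assumptions, not verified here.
import time
from suds.client import Client  # provided by the suds-jurko fork

HIS_CENTRAL_WSDL = "http://hiscentral.cuahsi.org/webservices/hiscentral.asmx?WSDL"


def timed_box_search(lat_center, lon_center, half_deg=0.5):
    """Return (series record count, elapsed seconds) for a box search."""
    client = Client(HIS_CENTRAL_WSDL)
    start = time.time()
    # Argument names (xmin/xmax/ymin/ymax, conceptKeyword, networkIDs,
    # beginDate, endDate) are assumed from the WSDL naming conventions.
    response = client.service.GetSeriesCatalogForBox2(
        xmin=lon_center - half_deg,
        xmax=lon_center + half_deg,
        ymin=lat_center - half_deg,
        ymax=lat_center + half_deg,
        conceptKeyword="",
        networkIDs="",
        beginDate="",
        endDate="",
    )
    elapsed = time.time() - start
    # The response wraps a list of series records; guard in case it is empty.
    series = response.SeriesRecord if hasattr(response, "SeriesRecord") else []
    return len(series), elapsed


# Example: the "Central Iowa" AOI from the table.
# count, seconds = timed_box_search(42.0, -93.0)
```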

cc @aufdenkampe

emiliom commented Apr 4, 2018

Don Setiawan (UW) has deployed the Wikiwatershed/MMW App locally on his laptop, for development and testing.

We figured out where the CUAHSI WDC AOI limit of 1,500 km2 was set, and changed it to 5,000 km2. We then ran a test search on a squarish polygon search area that's 4,093 km2 (the actual area of the enclosing rectangular AOI issued to the CUAHSI WDC catalog API would most likely be larger), and were able to get a response (4,954 records). See screenshot below.

This generally confirms my suggestion that increasing the AOI limit to 3,000 km2, if not larger, is most likely just fine, especially after ensuring suds-jurko is being used, as we did.

screenshot from 2018-04-03 15-16-13
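
To illustrate the parenthetical above about the enclosing rectangle being larger than the drawn polygon: the catalog request is box-based, so its effective AOI is the polygon's lon-lat bounding box. The sketch below is a rough approximation only; the function and the equirectangular area formula are my own, not taken from the MMW code, which may compute areas differently.

```python
# Rough approximation of the bounding-box area actually sent to the catalog
# API, given a polygon's lon/lat vertices. Not the MMW implementation.
import math


def bbox_area_km2(lons, lats):
    """Approximate area (km2) of the lon/lat bounding box of a polygon."""
    xmin, xmax = min(lons), max(lons)
    ymin, ymax = min(lats), max(lats)
    km_per_deg_lat = 111.32
    mean_lat = math.radians((ymin + ymax) / 2.0)
    km_per_deg_lon = 111.32 * math.cos(mean_lat)
    return (xmax - xmin) * km_per_deg_lon * (ymax - ymin) * km_per_deg_lat


# e.g., a 1° x 1° box centered near 40°N covers roughly
# 111.32 * (111.32 * cos(40°)) ≈ 9,500 km2, consistent with the table above.
```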

@rajadain

Shares points with #2760

rajadain self-assigned this Apr 18, 2018
rajadain added a commit that referenced this issue Apr 19, 2018
Monitor: Increase Area of Interest Limit with suds-jurko

Connects #2756
Connects #2760
@aufdenkampe

Is this on staging? I would love to test it!

@rajadain

Staging deployments are currently failing due to a third party dependency failure, but I'll comment here as soon as it is ready.

@rajadain

@aufdenkampe this is now on staging:

[screenshot]

Sorry for the delay.

emiliom commented Apr 23, 2018

Thanks @rajadain. I tried it out using the Brandywine-Christina HUC 8 (1,960 km2), and the CUAHSI WDC search stops with an error icon (FYI, the CINERGI search also yields the error icon).

For our reference, what's the new AOI size being used on staging? Based on your exchanges at #2784 (comment), it seems like it's 5,000 km2, but I'm not totally sure.

@lsetiawan and I have been able to run searches with larger HUC 8 polygon AOIs than the one I'm reporting here, on his laptop deployment; I can't imagine that his laptop has more resources than your staging cloud environment. Anyway, we'll try to run this specific HUC 8 search later today and report back.

@lsetiawan

(Emilio here, masquerading as Don) We've run a similar AOI test on Don's laptop with his app deployment. HUC polygon selection is not enabled in his deployment, so we created an AOI using free-draw that roughly matched the Brandywine-Christina HUC 8, but was a bit larger (2,235 vs 1,960 km2). The WDC search took around a minute, but completed successfully, returning just short of 5,000 records. See screenshot.

So, we don't know why it fails on the staging app.

screenshot from 2018-04-23 13-26-59

@rajadain

@emiliom our staging environments are running on smaller machines than production in an effort to keep hosting costs down. We just upped the staging VM from a t2.micro with 1GB RAM to a t2.small, which has 2GB, and I can now see results for Brandywine-Christina.

We're also expanding our development VMs in #2803 to allow larger areas of interest to be run while in development. I had to increase my app VM's allocation from 1GB to 2GB RAM to get this shape to work. Did you and Don have to do the same?

emiliom commented Apr 25, 2018

@rajadain thanks for the info and update. I do realize that the staging VM was bound to be less capable than the production VM, for the reasons you state. Out of curiosity (and for comparison), how much RAM is allocated to the production VM?

The deployment we're using is just Don's newish mid-range laptop, as is. Obviously we have no intention of recreating or approximating a hardware environment like the cloud-based staging or production allocations used for the Wikiwatershed app. Our focus is the development work for adding the new Water Quality Portal catalog; the CUAHSI WDC tests have just been a side benefit and opportunity.
