
Patches for v6.5 release #748

Merged · 47 commits · Nov 11, 2021

Conversation

@abyrd (Member) commented Sep 17, 2021

I am accumulating patches here leading up to the September 2021 v6.5 release.

In addition to the items described in detail below, this PR contains various small but important fixes to:

  • normalize file extensions on DataSources in FileStorage
  • change some API endpoint paths and parameters to match UI usage
  • add range and null checking to some API parameters
  • stop tagging workers with projectId (workers can migrate between projects)
  • ensure all sidecar files of a shapefile are pulled from S3 before reading it

In-memory upload FileItems

I ran into a problem creating new opportunity datasets. On upload, a checkState asserting that all uploaded items are readable files failed. Unlike Bundle and DataSource uploads, OpportunityDataset uploads were not using the method HttpUtils.getRequestFiles(), which customizes the FileItemFactory so that even tiny files (like shapefile projection sidecars) are written to disk files instead of being held in memory. In the case of creating opportunity dataset grids from shapefiles this wasn't hurting anything, because the conversion process explicitly writes the shapefile FileItems out to a temporary directory. Rather than remove the checkState (which does verify required conditions for other code paths handling uploaded files), I changed all such code paths to use the same HttpUtils utility method.
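For reference, a minimal sketch of that zero-threshold approach, assuming Apache commons-fileupload (the wrapper method and class names here are hypothetical, not the actual HttpUtils code):

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class UploadSketch {
    /**
     * Rough equivalent of what HttpUtils.getRequestFiles() is described as doing: parse a
     * multipart request with a size threshold of zero so that every part, however small
     * (e.g. a .prj sidecar), ends up in a disk file rather than an in-memory byte buffer.
     */
    public static Map<String, List<FileItem>> parseUploadToDisk (HttpServletRequest request, File tempDir) {
        // A threshold of 0 bytes means no part is retained purely as an in-memory buffer.
        DiskFileItemFactory factory = new DiskFileItemFactory(0, tempDir);
        ServletFileUpload upload = new ServletFileUpload(factory);
        try {
            return upload.parseParameterMap(request);
        } catch (FileUploadException e) {
            throw new RuntimeException("Failed to parse multipart upload.", e);
        }
    }
}
```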

Long term it would probably be better to remove the zero-threshold factory, allowing small FileItems (including non-file form fields) to remain in memory, and then revise moveFilesIntoStorage so it explicitly calls FileItem.write() to put each item in a temporary location or directly into FileStorage.

Confusion was caused here by the fact that, contrary to its documentation, FileItem does not return null when getStoreLocation() is called on an in-memory FileItem. It returns the name of the temporary file where it would save the data if the file were to exceed the size threshold.
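Given that behavior, a safer check consults isInMemory() rather than trusting getStoreLocation() alone; a minimal sketch (the helper is hypothetical):

```java
import java.io.File;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItem;

public class FileItemChecks {
    /**
     * Return the on-disk file backing this item, or null if it is still an in-memory buffer.
     * DiskFileItem.getStoreLocation() alone is not a reliable signal: for in-memory items it
     * returns the temp file the item would use, which may not actually exist on disk.
     */
    public static File diskFileOrNull (FileItem item) {
        if (item.isInMemory() || !(item instanceof DiskFileItem)) {
            return null;
        }
        File stored = ((DiskFileItem) item).getStoreLocation();
        return (stored != null && stored.isFile()) ? stored : null;
    }
}
```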

LODES extract progress and DataGroups

We were extracting LODES data in a background task that shows up on /api/activity, but we were not reporting any progress, not putting the resulting layers into a DataGroup, and not reporting the WorkProduct. Because we still have the "classic" LODES upload status reports, I considered just bypassing the background task activity reporting, but didn't want to move backward. So with this change we now report both kinds of progress. The new one shows much better incremental progress, so I believe we could now stop displaying the classic progress updates at the top left of the UI.

Observations from seeing the two side by side: the new progress text transitions from green to blue upon completion, while in the classic progress the entire background flips from blue to green when it finishes. The new progress reporting is clearly better, but the "flip to green" effect of the old reporting is more satisfying, and it makes me more instantly aware that the task has finished.

This task now reports a work product of OPPORTUNITY_DATASET with group=true, which the UI does not know how to link to. The suggested approach for the release is to leave both progress mechanisms enabled, but temporarily stop setting the workProduct on the backend so the UI doesn't try to display a broken link.

Note also that OpportunityDatasets have a sourceId field which should probably be dataSourceId to match AggregationArea, but I'm not changing this now because it might require a database migration.

abyrd and others added 18 commits September 17, 2021 14:02
All parameters are required so a missing (null) header value will
already prevent startup. Throwing an exception here fails immediately,
preventing reporting of any other missing parameters.
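As a generic illustration of the fail-slow pattern this preserves, collecting every missing required parameter and reporting them together rather than throwing on the first one (all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConfigValidationSketch {
    /**
     * Collect every missing required property and report them all at once, rather than
     * failing on the first one encountered.
     */
    public static void checkRequired (Properties config, List<String> requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            if (config.getProperty(key) == null) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            // One startup failure message that lists all missing parameters.
            throw new IllegalStateException("Missing required configuration: " + String.join(", ", missing));
        }
    }
}
```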
Use standard HttpUtils.getRequestFiles in OpportunityDatasetController
which causes all uploaded files to be disk files rather than in-memory
byte buffers. Uploads were failing due to small Shapefile sidecars not
being written to disk.

Also added clearer messages to the checkState calls that were failing.
Add a dataGroupId to OpportunityDatasets. We were assigning an (unsaved)
DataSource, but not putting them in DataGroups.

Also stronger typing on parameters (pass DataGroup not String id)
also generalized one log message for local use
workers do not stick to a single projectId, only to a single bundleId
previously we were using the uppercase enum value names
on newly started backends, we were fetching only the .shp file and not the other sidecar files
This allows us to remove the old non-array fields in mongo records for
older regional analysis results by copying their values into
single-element arrays. We will still find older files by fallback.
Also return empty string instead of null in LocalFilesController and
document why: when the Spark framework sees a null response it thinks the
path is not mapped to a controller.
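A rough sketch of that Spark behavior (the route path and file handling are illustrative, not the actual LocalFilesController code):

```java
import static spark.Spark.get;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalFilesSketch {
    public static void main (String[] args) {
        // Hypothetical route serving files from a local directory by writing directly to the
        // raw servlet output stream.
        get("/files/:name", (request, response) -> {
            Path file = Paths.get("/tmp/files", request.params("name")); // illustrative base path
            if (!Files.exists(file)) {
                response.status(404);
                return "File not found.";
            }
            Files.copy(file, response.raw().getOutputStream());
            // Returning null here would make Spark think the path is not mapped to a controller
            // and report a 404, so return an empty string instead.
            return "";
        });
    }
}
```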
abyrd and others added 11 commits September 28, 2021 14:10
some of these scenarios are huge and we're considering removing them
from the database
These are usually built in bundleController and cached. When they were
rebuilt by gtfsCache, the feedId was falling back on the filename
(which was a bundle-scoped feedId rather than just the feedId) or even
on the feedId declared by the feed itself, not known to be unique.
Add "Access-Control-Max-Age" header set to one day to all preflight requests. This allows browsers to cache the "Options" preflight request instead of sending it for each HTTP request.
Not generic nodes etc. which may lack lat/lon values
Add "Access-Control-Max-Age" header
Assertions elsewhere assume no travel time will exceed the
request.maxTripDurationMinutes limit. This appears to only be an issue
when cutoff times are lower than access mode-specific travel time limits
abyrd and others added 17 commits October 25, 2021 17:09
This was observed in testing when running many single point and regional
analyses. Perhaps a regional worker started up and was assigned the
same IP address as a single point worker that had recently shut down
from inactivity. The resulting exception must be caught so this IP will
be unregistered for this workerCategory and a new one started/found.
returning partial regional analysis results was only half-implemented,
which was causing us to mishandle requests on incomplete jobs.
Ideally we'd infer from the database record that this analysis does not
have any gridded results and immediately return a 404. As a short-term
fix, we check assertions only when an older-format file is actually
found, and return a 404 to cover a wider range of cases: files that are
missing because they are not even supposed to exist, and files that
should exist but have gone missing for other unknown reasons.
rather than silently taking a legacy code path
Add method to check for spanning 180 degree meridian.
Move methods into GeometryUtils for reuse.
Check envelope of uploaded OSM and GTFS data.
Check envelope of WebMercatorGridPointSet when created.
Check envelope of StreetLayer when it's loaded.
Text describing range-checked object is now substituted into messages.
Consistently throw more general exception type.
Check that data actually contains points near the antimeridian.
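A rough sketch of what an envelope-based antimeridian check could look like, assuming JTS and WGS84 longitude/latitude coordinates (the method names and thresholds are illustrative, not the ones added to GeometryUtils):

```java
import org.locationtech.jts.geom.Envelope;

public class AntimeridianCheckSketch {
    /**
     * Heuristic: an envelope built from raw longitudes of data crossing the 180-degree meridian
     * appears to span almost the whole globe, because points just east of the antimeridian have
     * longitudes near -180 while points just west have longitudes near +180. The thresholds
     * below are illustrative only.
     */
    public static boolean appearsToSpanAntimeridian (Envelope envelopeWgs84) {
        return envelopeWgs84.getMinX() < -179 && envelopeWgs84.getMaxX() > 179;
    }

    public static void validateEnvelope (Envelope envelopeWgs84, String description) {
        if (appearsToSpanAntimeridian(envelopeWgs84)) {
            throw new IllegalArgumentException(
                description + " appears to cross the 180-degree meridian, which is not supported."
            );
        }
    }
}
```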
these files should be valid except for the fact that they cross 180 deg
This reverts commit f399f90.
This is a double-revert, re-establishing the original use of
getPriority methods instead of priority fields. The hastily
applied patch where we shifted to using the field instead of
methods was based on a false belief that old MapDBs had this
field. In fact the field was a temporary addition during
development, and the original state was without the field.
We don't want to save them to the MapDB file anyway, and simply adding
error instances to MapDB files causes (de)serializer configuration to be
stored in the file. This makes it more fragile: any change to error
class fields can break reopening MapDB files even after all instances
of those errors have been cleared out.

We can't readily stream the errors out to JSON without accumulating in
memory, because we also want to summarize them for storage in Mongo.
Conceivably we could build up the summary at the same time as streaming
them out to JSON.
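A sketch of that conceivable approach using Jackson streaming, writing each error to JSON while tallying a summary in memory (the error class and its fields are stand-ins for the real error types):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ErrorStreamingSketch {

    /** Hypothetical minimal error record standing in for the real error classes. */
    public static class GtfsError {
        public String type;
        public String message;
    }

    /**
     * Stream each error into a JSON array as it is encountered, while tallying a per-type
     * summary that could later be stored in Mongo, so the full error list is not also
     * accumulated as a separate in-memory collection just for summarization.
     */
    public static Map<String, Integer> writeErrorsAndSummarize (Iterable<GtfsError> errors, OutputStream out)
            throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        Map<String, Integer> countsByType = new HashMap<>();
        try (JsonGenerator generator = mapper.getFactory().createGenerator(out)) {
            generator.writeStartArray();
            for (GtfsError error : errors) {
                mapper.writeValue(generator, error);             // stream one error out to JSON
                countsByType.merge(error.type, 1, Integer::sum); // and build the summary as we go
            }
            generator.writeEndArray();
        }
        return countsByType;
    }
}
```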
@ansoncfit marked this pull request as ready for review November 11, 2021 01:23
@ansoncfit (Member) left a comment:

Approving for merge to dev
