
Patches for v6.5 release #748

Merged · 47 commits · Nov 11, 2021

Conversation

@abyrd (Member) commented Sep 17, 2021

I am accumulating patches here leading up to the September 2021 v6.5 release.

In addition to the items described in detail below, this PR contains various small but important fixes to:

  • normalize file extensions on DataSources in FileStorage
  • change some API endpoint paths and parameters to match UI usage
  • add range and null checking to some API parameters
  • stop tagging workers with projectId (workers can migrate between projects)
  • ensure all sidecar files of a shapefile are pulled from S3 before reading it

In-memory upload FileItems

I ran into a problem creating new opportunity datasets. On upload, a checkState asserting that all uploaded items are readable files failed. Unlike Bundle and DataSource uploads, OpportunityDataset uploads were not using the method HttpUtils.getRequestFiles(), which customizes the FileItemFactory so that even tiny files (like shapefile projection sidecars) are written to disk files instead of being held in memory. In the case of creating opportunity dataset grids from shapefiles this wasn't hurting anything, because the conversion process explicitly writes the shapefile FileItems out to a temporary directory. Rather than remove the checkState (which does verify required conditions for other code paths handling uploaded files), I changed all such code paths to use the same HttpUtils utility method.
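For reference, a minimal sketch of that zero-threshold approach, assuming Apache commons-fileupload (the wrapper method and class names here are hypothetical, not the actual HttpUtils code):

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class UploadSketch {
    /**
     * Rough equivalent of what HttpUtils.getRequestFiles() is described as doing: parse a
     * multipart request with a size threshold of zero so that every part, however small
     * (e.g. a .prj sidecar), ends up in a disk file rather than an in-memory byte buffer.
     */
    public static Map<String, List<FileItem>> parseUploadToDisk (HttpServletRequest request, File tempDir) {
        // A threshold of 0 bytes means no part is retained purely as an in-memory buffer.
        DiskFileItemFactory factory = new DiskFileItemFactory(0, tempDir);
        ServletFileUpload upload = new ServletFileUpload(factory);
        try {
            return upload.parseParameterMap(request);
        } catch (FileUploadException e) {
            throw new RuntimeException("Failed to parse multipart upload.", e);
        }
    }
}
```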

Long term it would probably be better to remove the zero-threshold factory, allowing small FileItems (including non-file form fields) to remain in memory, and then revise moveFilesIntoStorage so it explicitly calls FileItem.write() to put each item in a temporary location or directly into FileStorage.

Confusion was caused here by the fact that, contrary to its documentation, FileItem does not return null when getStoreLocation() is called on an in-memory FileItem. It returns the name of the temporary file where it would save the data if the file were to exceed the size threshold.
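Given that behavior, a safer check consults isInMemory() rather than trusting getStoreLocation() alone; a minimal sketch (the helper is hypothetical):

```java
import java.io.File;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItem;

public class FileItemChecks {
    /**
     * Return the on-disk file backing this item, or null if it is still an in-memory buffer.
     * DiskFileItem.getStoreLocation() alone is not a reliable signal: for in-memory items it
     * returns the temp file the item would use, which may not actually exist on disk.
     */
    public static File diskFileOrNull (FileItem item) {
        if (item.isInMemory() || !(item instanceof DiskFileItem)) {
            return null;
        }
        File stored = ((DiskFileItem) item).getStoreLocation();
        return (stored != null && stored.isFile()) ? stored : null;
    }
}
```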

LODES extract progress and DataGroups

We were extracting LODES data in a background task that shows up on /api/activity, but we were not reporting any progress, not putting the resulting layers into a DataGroup, and not reporting the WorkProduct. Because we still have the "classic" LODES upload status reports, I considered just bypassing the background task activity reporting, but didn't want to move backward. So with this change we now report both kinds of progress. The new one shows much better incremental progress, so I believe we could now stop displaying the classic progress updates at the top left of the UI.

Observations from seeing the two side by side: the new progress text transitions from green to blue upon completion, while in the classic progress the entire background flips from blue to green when it finishes. The new progress reporting is clearly better, but the "flip to green" effect of the old reporting is more satisfying, and it makes me more instantly aware that the task has finished.

This task now reports a work product of OPPORTUNITY_DATASET with group=true, which the UI does not know how to link to. The suggested approach for the release is to leave both progress mechanisms enabled, but temporarily stop setting the workProduct on the backend so the UI doesn't try to display a broken link.

Note also that OpportunityDatasets have a sourceId field which should probably be dataSourceId to match AggregationArea, but I'm not changing this now because it might require a database migration.

abyrd and others added 18 commits September 17, 2021 14:02
All parameters are required so a missing (null) header value will
already prevent startup. Throwing an exception here fails immediately,
preventing reporting of any other missing parameters.
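As a generic illustration of the fail-slow pattern this preserves, collecting every missing required parameter and reporting them together rather than throwing on the first one (all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConfigValidationSketch {
    /**
     * Collect every missing required property and report them all at once, rather than
     * failing on the first one encountered.
     */
    public static void checkRequired (Properties config, List<String> requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            if (config.getProperty(key) == null) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            // One startup failure message that lists all missing parameters.
            throw new IllegalStateException("Missing required configuration: " + String.join(", ", missing));
        }
    }
}
```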
Use standard HttpUtils.getRequestFiles in OpportunityDatasetController
which causes all uploaded files to be disk files rather than in-memory
byte buffers. Uploads were failing due to small Shapefile sidecars not
being written to disk.

Also added clearer messages to the checkState calls that were failing.
Add a dataGroupId to OpportunityDatasets. We were assigning an (unsaved)
DataSource, but not putting them in DataGroups.

Also stronger typing on parameters (pass DataGroup not String id)
also generalized one log message for local use
workers do not stick to a single projectId, only to a single bundleId
previously we were using the uppercase enum value names
on newly started backends, we were fetching only the .shp file and not the other sidecar files
This allows us to remove the old non-array fields in mongo records for
older regional analysis results by copying their values into
single-element arrays. We will still find older files by fallback.
Also return empty string instead of null in LocalFilesController and
document why: when the Spark framework sees a null response it thinks the
path is not mapped to a controller.
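A rough sketch of that Spark behavior (the route path and file handling are illustrative, not the actual LocalFilesController code):

```java
import static spark.Spark.get;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalFilesSketch {
    public static void main (String[] args) {
        // Hypothetical route serving files from a local directory by writing directly to the
        // raw servlet output stream.
        get("/files/:name", (request, response) -> {
            Path file = Paths.get("/tmp/files", request.params("name")); // illustrative base path
            if (!Files.exists(file)) {
                response.status(404);
                return "File not found.";
            }
            Files.copy(file, response.raw().getOutputStream());
            // Returning null here would make Spark think the path is not mapped to a controller
            // and report a 404, so return an empty string instead.
            return "";
        });
    }
}
```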
abyrd and others added 11 commits September 28, 2021 14:10
some of these scenarios are huge and we're considering removing them
from the database
These are usually built in bundleController and cached. When they were
rebuilt by gtfsCache, the feedId was falling back on the filename
(which was a bundle-scoped feedId rather than just the feedId) or even
on the feedId declared by the feed itself, not known to be unique.
Add "Access-Control-Max-Age" header set to one day to all preflight requests. This allows browsers to cache the "Options" preflight request instead of sending it for each HTTP request.
Not generic nodes etc. which may lack lat/lon values
Add "Access-Control-Max-Age" header
Assertions elsewhere assume no travel time will exceed the
request.maxTripDurationMinutes limit. This appears to only be an issue
when cutoff times are lower than access mode-specific travel time limits
abyrd and others added 17 commits October 25, 2021 17:09
This was observed in testing when running many single point and regional
analyses. Perhaps a regional worker started up and was assigned the
same IP address as a single point worker that had recently shut down
from inactivity. The resulting exception must be caught so this IP will
be unregistered for this workerCategory and a new one started/found.
returning partial regional analysis results was only half-implemented,
which was causing us to mishandle requests on incomplete jobs.
Ideally we'd infer from the database record that this analysis does not
have any gridded results and immediately return a 404. As a short-term
fix, we check assertions only when an older-format file is actually
found, and return a 404 to cover a wider range of cases: files that are
missing because they are not even supposed to exist, and files that
should exist but have gone missing for other unknown reasons.
rather than silently taking a legacy code path
Add method to check for spanning 180 degree meridian.
Move methods into GeometryUtils for reuse.
Check envelope of uploaded OSM and GTFS data.
Check envelope of WebMercatorGridPointSet when created.
Check envelope of StreetLayer when it's loaded.
Text describing range-checked object is now substituted into messages.
Consistently throw more general exception type.
Check that data actually contains points near the antimeridian.
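A rough sketch of what an envelope-based antimeridian check could look like, assuming JTS and WGS84 longitude/latitude coordinates (the method names and thresholds are illustrative, not the ones added to GeometryUtils):

```java
import org.locationtech.jts.geom.Envelope;

public class AntimeridianCheckSketch {
    /**
     * Heuristic: an envelope built from raw longitudes of data crossing the 180-degree meridian
     * appears to span almost the whole globe, because points just east of the antimeridian have
     * longitudes near -180 while points just west have longitudes near +180. The thresholds
     * below are illustrative only.
     */
    public static boolean appearsToSpanAntimeridian (Envelope envelopeWgs84) {
        return envelopeWgs84.getMinX() < -179 && envelopeWgs84.getMaxX() > 179;
    }

    public static void validateEnvelope (Envelope envelopeWgs84, String description) {
        if (appearsToSpanAntimeridian(envelopeWgs84)) {
            throw new IllegalArgumentException(
                description + " appears to cross the 180-degree meridian, which is not supported."
            );
        }
    }
}
```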
these files should be valid except for the fact that they cross 180 deg
This reverts commit f399f90.
This is a double-revert, re-establishing the original use of
getPriority methods instead of priority fields. The hastily
applied patch where we shifted to using the field instead of
methods was based on a false belief that old MapDBs had this
field. In fact the field was a temporary addition during
development, and the original state was without the field.
We don't want to save them to the MapDB file anyway, and simply adding
error instances to MapDB files causes (de)serializer configuration to be
stored in the file. This makes it more fragile: any change to error
class fields can break reopening MapDB files even after all instances
of those errors have been cleared out.

We can't readily stream the errors out to JSON without accumulating in
memory, because we also want to summarize them for storage in Mongo.
Conceivably we could build up the summary at the same time as streaming
them out to JSON.
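A sketch of that conceivable approach using Jackson streaming, writing each error to JSON while tallying a summary in memory (the error class and its fields are stand-ins for the real error types):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ErrorStreamingSketch {

    /** Hypothetical minimal error record standing in for the real error classes. */
    public static class GtfsError {
        public String type;
        public String message;
    }

    /**
     * Stream each error into a JSON array as it is encountered, while tallying a per-type
     * summary that could later be stored in Mongo, so the full error list is not also
     * accumulated as a separate in-memory collection just for summarization.
     */
    public static Map<String, Integer> writeErrorsAndSummarize (Iterable<GtfsError> errors, OutputStream out)
            throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        Map<String, Integer> countsByType = new HashMap<>();
        try (JsonGenerator generator = mapper.getFactory().createGenerator(out)) {
            generator.writeStartArray();
            for (GtfsError error : errors) {
                mapper.writeValue(generator, error);             // stream one error out to JSON
                countsByType.merge(error.type, 1, Integer::sum); // and build the summary as we go
            }
            generator.writeEndArray();
        }
        return countsByType;
    }
}
```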
@ansoncfit marked this pull request as ready for review November 11, 2021 01:23
@ansoncfit (Member) left a comment:

Approving for merge to dev
