Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'get_aggregation_unit' action and 'aggregation_unit' params for all spatial aggregates #5278

Merged
merged 46 commits into from
Aug 12, 2022

Conversation

jc-harrison
Copy link
Member

@jc-harrison jc-harrison commented Jul 7, 2022

Closes #5141, #4816

I have:

  • Formatted any Python files with black
  • Brought the branch up to date with master
  • Added any relevant Github labels
  • Added tests for any new additions
  • Added or updated any relevant documentation
  • Added an Architectural Decision Record (ADR), if appropriate
  • Added an MPLv2 License Header if appropriate
  • Updated the Changelog

Description

Adds a new flowmachine server action, get_aggregation_unit, which returns the aggregation unit associated with a spatial aggregate query (or None if the query is not a spatial aggregate). To support this, all exposed query class instances corresponding to a spatial aggregate query now have an aggregation_unit attribute or property.

Also adds a dump-only aggregation_unit parameter to all spatial aggregates that do not have an explicit aggregation_unit parameter - a dump-only field is not accepted as an input parameter, but it is included in the API spec (marked as "read-only") so that FlowAPI can use this parameter to determine whether a query kind is a spatial aggregate when generating permission scopes. The dump-only aggregation_unit fields also mean that the aggregation_unit property of an exposed query will be included if the query is serialised - while that's a nice feature, it's not something we currently rely on for anything.

Adds a docs section on exposing a new query kind, since there are an increasing number of steps that need to be included in an exposed query.

Also a few changes to query schemas that are not directly related to the original aims of this PR:

  • Defines OneOfQuerySchema (subclass of OneOfSchema) for nesting a choice of multiple sub-query schemas. Consequences are that all exposed query classes now need a query_kind class attribute, and we no longer need a workaround in FlowAPI to mark query_kind parameters as required.
  • Fixes some custom validators so that errors in nested fields do not cause validation errors to be masked by key/attribute errors (Modal location aggregation-unit consistency validation masks other validation errors #4816)
  • Ensures that all required parameters are marked as such in the query schemas
  • Defines a Direction field, to replace the duplicated code used for direction fields in several query schemas

@codecov
Copy link

codecov bot commented Jul 7, 2022

Codecov Report

Merging #5278 (64a9ff7) into master (7a4594a) will decrease coverage by 17.27%.
The diff coverage is 86.02%.

@@             Coverage Diff             @@
##           master    #5278       +/-   ##
===========================================
- Coverage   93.43%   76.15%   -17.28%     
===========================================
  Files         276      277        +1     
  Lines       10724    10847      +123     
  Branches     1274     1284       +10     
===========================================
- Hits        10020     8261     -1759     
- Misses        576     2333     +1757     
- Partials      128      253      +125     
Impacted Files Coverage Δ
flowapi/flowapi/api_spec.py 35.13% <ø> (-57.89%) ⬇️
flowapi/flowapi/permissions.py 83.92% <ø> (-9.83%) ⬇️
flowclient/flowclient/query_specs.py 100.00% <ø> (ø)
...hine/core/server/query_schemas/aggregation_unit.py 68.42% <40.00%> (-31.58%) ⬇️
...machine/flowmachine/core/server/action_handlers.py 57.05% <56.52%> (-29.48%) ⬇️
...machine/core/server/query_schemas/custom_fields.py 80.30% <66.66%> (-1.37%) ⬇️
...ine/core/server/query_schemas/visited_most_days.py 66.66% <66.66%> (+6.66%) ⬆️
...re/server/query_schemas/mobility_classification.py 91.30% <75.00%> (-8.70%) ⬇️
...achine/core/server/query_schemas/labelled_flows.py 92.30% <76.92%> (-7.70%) ⬇️
...core/server/query_schemas/unique_visitor_counts.py 80.00% <78.57%> (-5.00%) ⬇️
... and 160 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jc-harrison jc-harrison marked this pull request as draft July 7, 2022 12:39

In a new file 'flowmachine/core/server/query_schemas/my_query.py', define a new class `MyQueryExposed`. This class is responsible for constructing the appropriate `MyQuery` object from parameter values supplied in an API call.

There are two options here: if users should be able to select a random sample of the rows from this query result, the exposed query class should inherit from [`BaseExposedQueryWithSampling`](../flowmachine/flowmachine/core/server/query_schemas/base_query_with_sampling/#class-baseexposedquerywithsampling) (this is usually appropriate for _individual-level_ queries). If it does not make sense to allow random sampling of the query result (as is usually the case for _aggregate_ queries), the exposed query class should inherit from [`BaseExposedQuery`](../flowmachine/flowmachine/core/server/query_schemas/base_exposed_query/#class-baseexposedquery).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might consider changing the names of these really.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, could do. What would you suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about BaseIndividualQuery and BaseAggregateQuery, but I'm not sure it makes sense given that it may not be universally the case that these are the appropriate bases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, I don't know whether it will always be the case that we'd want to allow sampling for all non-aggregates and never for aggregates.

I wonder whether we'd be better-off rolling the sampling stuff into BaseExposedQuery (i.e. apply sampling if there's a sampling parameter, otherwise don't), and turning BaseQueryWithSamplingSchema into a mixin - then we don't have to worry about having two different bases.

For this action handler the `action_params` are exactly the query kind plus the
parameters needed to construct the query.
"""
def _load_query_object(params: dict) -> "BaseExposedQuery":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to make this a class method on BaseExposedQuery?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not straightforward without leading to circular imports. And on reflection, I'm not sure it makes all that much sense anyway - the core of this function is just FlowmachineQuerySchema().load(params); the rest is just catching appropriate errors and manipulating them into a form that's convenient for constructing an appropriate ZMQReply, so I think it makes sense to keep this alongside the action handlers.

query_kind = fields.String(validate=OneOf(["active_at_reference_location_counts"]))
unique_locations = fields.Nested(UniqueLocationsSchema())
reference_locations = fields.Nested(ReferenceLocationSchema())
query_kind = fields.String(validate=OneOf([__model__.query_kind]), required=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved into BaseSchema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, I think - the validator needs to get the query kind from __model__.query_kind, and this has to be set when defining the class. It would be neater if we could move this into BaseSchema, but I can't see a way to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use __init_subclass__ to attach it to the class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh true, yes, that should work.

Copy link
Member Author

@jc-harrison jc-harrison Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, no, doesn't seem to work. I tried adding

def __init_subclass__(cls, **kwargs):
    super().__init_subclass__(**kwargs)
    if hasattr(cls, "__model__"):
        cls.query_kind = fields.String(validate=OneOf([cls.__model__.query_kind]), required=True)

and while subclasses get a query_kind class attribute as expected, it doesn't get included as a field to be deserialised. Looks like marshmallow is doing some magic behind the scenes that doesn't play nicely with this approach.


In a new file 'flowmachine/core/server/query_schemas/my_query.py', define a new class `MyQueryExposed`. This class is responsible for constructing the appropriate `MyQuery` object from parameter values supplied in an API call.

There are two options here: if users should be able to select a random sample of the rows from this query result, the exposed query class should inherit from [`BaseExposedQueryWithSampling`](../flowmachine/flowmachine/core/server/query_schemas/base_query_with_sampling/#class-baseexposedquerywithsampling) (this is usually appropriate for _individual-level_ queries). If it does not make sense to allow random sampling of the query result (as is usually the case for _aggregate_ queries), the exposed query class should inherit from [`BaseExposedQuery`](../flowmachine/flowmachine/core/server/query_schemas/base_exposed_query/#class-baseexposedquery).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about BaseIndividualQuery and BaseAggregateQuery, but I'm not sure it makes sense given that it may not be universally the case that these are the appropriate bases.

@jc-harrison jc-harrison marked this pull request as ready for review July 7, 2022 14:50
@cypress
Copy link

cypress bot commented Jul 7, 2022



Test summary

43 0 0 0Flakiness 0


Run details

Project FlowAuth
Status Passed
Commit 64a9ff7
Started Aug 12, 2022 12:27 PM
Ended Aug 12, 2022 12:40 PM
Duration 13:25 💡
OS Linux Debian - 10.5
Browser Electron 100

View run in Cypress Dashboard ➡️


This comment has been generated by cypress-bot as a result of this project's GitHub integration settings. You can manage this integration in this project's settings in the Cypress Dashboard

@jc-harrison jc-harrison added enhancement New feature or request FlowMachine Issues related to FlowMachine bug Something isn't working labels Jul 8, 2022
@jc-harrison jc-harrison requested a review from greenape July 8, 2022 16:12
@jc-harrison jc-harrison added the ready-to-merge Label indicating a PR is OK to automerge label Aug 12, 2022
@mergify mergify bot merged commit 65a2a05 into master Aug 12, 2022
@mergify mergify bot deleted the explicit-aggregation-unit branch August 12, 2022 12:41
@Thingus Thingus mentioned this pull request Aug 16, 2022
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request FlowMachine Issues related to FlowMachine ready-to-merge Label indicating a PR is OK to automerge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explicit 'aggregation_unit' for all spatial aggregates
2 participants