refactor(feature-flags): optimize UX and maintenance #563

Merged

Conversation

heitorlessa
Contributor

@heitorlessa heitorlessa commented Jul 26, 2021

Issue #, if available: #494

Description of changes:

Checklist

New UX

from typing import List

from aws_lambda_powertools.utilities.feature_flags import FeatureFlags, AppConfigStore

app_config = AppConfigStore(
    environment="test",
    application="powertools",
    name="test_conf_name",
    cache_seconds=600,
)

feature_flags: FeatureFlags = FeatureFlags(store=app_config)

is_my_feature_active: bool = feature_flags.evaluate(name="my_feature", context={}, default=False)

# TODO: This needs reviewing, possibly meant as List[Dict[str, bool]]???
all_features: List[str] = feature_flags.get_enabled_features(context={})

Using the envelope argument to pluck feature flags from an arbitrary key in the fetched config

app_config = AppConfigStore(
    environment="test",
    application="powertools",
    name="test_conf_name",
    cache_seconds=600,
    envelope="features"    # feature toggles are under `features` key in fetched config
)

feature_flags: FeatureFlags = FeatureFlags(store=app_config)
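
As a rough illustration of what the envelope does, the sketch below plucks the `features` key out of a fetched payload. The `fetched_config` payload and the `pluck` helper are hypothetical: they use plain dictionary lookups as a simplified stand-in, whereas the diff further down suggests the utility applies the envelope through its JMESPath helper, which supports richer expressions.

```python
from typing import Any, Dict

# Hypothetical payload as it might come back from the store.
fetched_config: Dict[str, Any] = {
    "logging": {"level": "INFO"},
    "features": {
        "my_feature": {"default": True},
    },
}


def pluck(config: Dict[str, Any], envelope: str) -> Any:
    """Follow a dot-separated path of keys (a simplified stand-in for JMESPath)."""
    current: Any = config
    for key in envelope.split("."):
        current = current[key]
    return current


# Only the feature flag definitions are handed over for evaluation.
flags = pluck(fetched_config, "features")
print(flags)
```

With the envelope in place, unrelated configuration (here the hypothetical `logging` key) can live in the same document without being treated as a feature flag.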

Changes

Major changes were made during initial doc writing. Due to the size, we're splitting this PR to separate out the actual docs.

  • General
    • Rename feature_toggles to feature_flags
    • Refactor error loggers to contextual exceptions
  • Improve exceptions
    • Consider additional fields to improve exception handling such as error type, feature, etc
      • String concatenation ended up slowing things down, dropping it
    • Consider fine-grained exceptions instead of a single ConfigurationError
    • Rename ConfigurationError to InvalidSchemaError
  • Refactor Schema to ease maintenance
    • Simplify schema by removing features key
    • Make each rule in rules unique by using Dict over List, and remove rule_name
    • Review long and redundant names such as "feature": { "feature_default_value"...}, "rules": [{"rule_name"...}]
    • Rename feature_default_value to default
    • Rename value_when_applies to when_match
    • Refactor schema into multiple validators to ease maintenance
    • Use module logger over class logger
    • Create classes for each validation over multiple methods
    • Add typing extensions as feature flags and other features become difficult to maintain without things like TypedDict, improved generics for mypy, etc.
      - Note: MyPy doesn't support TypedDict when you use a variable name as a dict key; ignoring it
    • Rename Action to RuleAction enum
    • Suggest which RuleActions are valid when an invalid one is provided
  • Refactor ConfigurationStore to FeatureFlags
    • ConfigurationStore to FeatureFlags
    • rules_context to context
    • Rename get_feature method to evaluate
    • Rename get_all_enabled_feature_toggles to get_enabled_features
  • Refactor SchemaFetcher to StoreProvider
    • SchemaFetcher to StoreProvider
    • Use base.py for interfaces for consistency (e.g. Metrics, Tracer, etc.)
    • AppConfigFetcher to AppConfigStore
    • Rename AppConfig constructor parameters for consistency (e.g. configuration_name -> name, service -> application)
    • Rename value_if_missing param to default for consistency with Python conventions (e.g. os.getenv("VAR", default=False))
    • Rename get_configuration method to get_json_configuration
  • Complete docstrings and logging
    • Store section
    • FeatureFlags section
    • Schema section
    • Add schema specification
  • Test conditional value key being a dictionary
  • Load test

Time allowing

  • Alternative to testing feature flag private method e.g. test_is_rule_matched_no_matches

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@pull-request-size pull-request-size bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 26, 2021
@boring-cyborg boring-cyborg bot added the documentation Improvements or additions to documentation label Jul 26, 2021
@codecov-commenter

codecov-commenter commented Jul 26, 2021

Codecov Report

Merging #563 (a0ae9da) into develop (dfe42b1) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff            @@
##           develop     #563   +/-   ##
========================================
  Coverage    99.95%   99.95%           
========================================
  Files          113      113           
  Lines         4477     4533   +56     
  Branches       243      245    +2     
========================================
+ Hits          4475     4531   +56     
  Partials         2        2           
Impacted Files Coverage Δ
aws_lambda_powertools/shared/jmespath_utils.py 100.00% <100.00%> (ø)
...bda_powertools/utilities/feature_flags/__init__.py 100.00% <100.00%> (ø)
...da_powertools/utilities/feature_flags/appconfig.py 100.00% <100.00%> (ø)
..._lambda_powertools/utilities/feature_flags/base.py 100.00% <100.00%> (ø)
...a_powertools/utilities/feature_flags/exceptions.py 100.00% <100.00%> (ø)
...owertools/utilities/feature_flags/feature_flags.py 100.00% <100.00%> (ø)
...ambda_powertools/utilities/feature_flags/schema.py 100.00% <100.00%> (ø)
...wertools/utilities/idempotency/persistence/base.py 99.36% <100.00%> (ø)
aws_lambda_powertools/utilities/validation/base.py 100.00% <100.00%> (ø)
...ambda_powertools/utilities/validation/validator.py 100.00% <100.00%> (ø)
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dfe42b1...a0ae9da.

@pull-request-size pull-request-size bot removed the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 30, 2021
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 30, 2021
@heitorlessa heitorlessa changed the title from "docs(feature_flags): initial skeleton" to "docs(feature_flags): new feature flags utility" Jul 30, 2021
@heitorlessa
Contributor Author

Rough understanding of the schema as I'm going through before documenting it

## Schema Rules

# `features` MUST be present
# `feature_default_value` MUST be present
# MUST be at least ONE feature
# `rules` MIGHT be present
# `rule_name` must be present
# `rule_default_value` must be BOOL
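
The rules listed above could be checked along these lines. This is only a sketch of the rough understanding stated in this comment, against the pre-refactor key names: `validate_schema` is a hypothetical helper, it raises a plain `ValueError` rather than any of the utility's own exceptions, and it assumes that a missing `rule_default_value` counts as invalid.

```python
from typing import Any, Dict


def validate_schema(schema: Dict[str, Any]) -> None:
    """Hypothetical validator for the rules listed above (old key names)."""
    features = schema.get("features")
    if not isinstance(features, dict):
        raise ValueError("`features` MUST be present")
    if not features:
        raise ValueError("there MUST be at least one feature")

    for name, feature in features.items():
        if "feature_default_value" not in feature:
            raise ValueError(f"{name}: `feature_default_value` MUST be present")

        # `rules` is optional, so an absent key simply means nothing to check.
        for rule in feature.get("rules", []):
            if "rule_name" not in rule:
                raise ValueError(f"{name}: `rule_name` must be present")
            # Assumption: a missing `rule_default_value` is treated as invalid.
            if not isinstance(rule.get("rule_default_value"), bool):
                raise ValueError(f"{name}: `rule_default_value` must be a bool")


# A feature without rules passes, since `rules` is optional.
validate_schema({"features": {"my_feature": {"feature_default_value": True}}})
```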

@heitorlessa
Contributor Author

heitorlessa commented Aug 2, 2021

We also need to look into how to make it less error-prone to create feature flags, whether that's for prod use or unit testing.

There are a number of conditions under which validation could fail, and crafting the dictionary by hand will likely lead to us causing production issues.

@michaelbrewer
Contributor

@risenberg-cyberark @heitorlessa - this is looking good. Are we still labeling this as beta, in case there is some small refactoring later?

@heitorlessa
Contributor Author

@risenberg-cyberark @heitorlessa - this is looking good. Are we still labeling this as beta, in case there is some small refactoring later?

After this refactoring, I'd go with not labelling as beta, unless something major shows up during the load testing.

After launch, we could also consider a generic AnalyticsProvider with an emit method. This will enrich customers' understanding of their customers and what features might have had enough data points (used, not used).

Two things pending, then we merge: 1/ Exception Handling: InvalidSchema over ConfigurationError, and catch Store exceptions and raise accordingly, 2/ Load testing: I'm slightly concerned about performance so need to check whether Store cache is sufficient or if we need to reduce the amount of get_configuration calls that happen all the time -- could be 1ms for all validation logic, dunno.

@heitorlessa
Contributor Author

Exceptions are now improved. Just need an extra test for condition values that are not just strings to be sure we got it right.

I'll load test tomorrow, merge it, and get on with the docs with @am29d to launch it by Friday EOD/Monday in the worst case scenario (life happens!)

@heitorlessa heitorlessa marked this pull request as ready for review August 3, 2021 18:18
@heitorlessa heitorlessa added this to the 1.19.0 milestone Aug 3, 2021
@michaelbrewer
Contributor

@heitorlessa minor patch update

diff --git a/aws_lambda_powertools/utilities/feature_flags/__init__.py b/aws_lambda_powertools/utilities/feature_flags/__init__.py
index 1edbea1..5514a50 100644
--- a/aws_lambda_powertools/utilities/feature_flags/__init__.py
+++ b/aws_lambda_powertools/utilities/feature_flags/__init__.py
@@ -1,5 +1,4 @@
-"""Advanced feature toggles utility
-"""
+"""Advanced feature toggles utility"""
 from .appconfig import AppConfigStore
 from .base import StoreProvider
 from .exceptions import ConfigurationStoreError
diff --git a/aws_lambda_powertools/utilities/feature_flags/appconfig.py b/aws_lambda_powertools/utilities/feature_flags/appconfig.py
index ea3f5dd..6dd6292 100644
--- a/aws_lambda_powertools/utilities/feature_flags/appconfig.py
+++ b/aws_lambda_powertools/utilities/feature_flags/appconfig.py
@@ -69,10 +69,13 @@ class AppConfigStore(StoreProvider):
         """
         try:
             # parse result conf as JSON, keep in cache for self.max_age seconds
-            config = self._conf_store.get(
-                name=self.name,
-                transform=TRANSFORM_TYPE,
-                max_age=self.cache_seconds,
+            config = cast(
+                dict,
+                self._conf_store.get(
+                    name=self.name,
+                    transform=TRANSFORM_TYPE,
+                    max_age=self.cache_seconds,
+                ),
             )
 
             if self.envelope:
@@ -80,6 +83,6 @@ class AppConfigStore(StoreProvider):
                     data=config, envelope=self.envelope, jmespath_options=self.jmespath_options
                 )
 
-            return cast(dict, config)
+            return config
         except (GetParameterError, TransformParameterError) as exc:
             raise ConfigurationStoreError("Unable to get AWS AppConfig configuration file") from exc

@boring-cyborg boring-cyborg bot added the internal Maintenance changes label Aug 4, 2021
@heitorlessa
Contributor Author

When doing the load testing I found a silent bug - We're swallowing all types of exceptions, including AccessDenied when fetching config from AppConfig.

I'll work on a generic StoreClientError and ensure these and future exceptions from any providers are bubbled up correctly
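
To illustrate the difference between swallowing and bubbling up, here is a minimal sketch. Everything in it is hypothetical: the `StoreClientError` and `AccessDenied` classes are local stand-ins defined for the example, and `fetch_swallowing` shows just one way such a silent bug can look, not the code that was actually fixed.

```python
from typing import Any, Callable, Dict


class StoreClientError(Exception):
    """Hypothetical stand-in for the generic store error proposed above."""


class AccessDenied(Exception):
    """Hypothetical provider error, standing in for an access-denied failure."""


def fetch_swallowing(fetch: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Anti-pattern: any failure silently turns into an empty configuration."""
    try:
        return fetch()
    except Exception:
        return {}


def fetch_bubbling(fetch: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap provider failures in one generic error and keep the original cause."""
    try:
        return fetch()
    except Exception as exc:
        raise StoreClientError(f"Unable to fetch configuration: {exc}") from exc


def denied_fetch() -> Dict[str, Any]:
    raise AccessDenied("not authorized to read configuration")


# The swallowing variant hides the permission problem entirely.
print(fetch_swallowing(denied_fetch))

try:
    fetch_bubbling(denied_fetch)
except StoreClientError as err:
    # `raise ... from exc` preserves the provider error as __cause__.
    print(type(err.__cause__).__name__, "-", err)
```

Chaining with `from exc` means callers can catch a single store-level exception while the provider's original error stays available for debugging.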

@heitorlessa heitorlessa changed the title from "refactor(feature_flags): optimize UX and maintenance" to "WIP refactor(feature_flags): optimize UX and maintenance" Aug 4, 2021
@heitorlessa heitorlessa changed the title from "WIP refactor(feature_flags): optimize UX and maintenance" to "refactor(feature_flags): optimize UX and maintenance" Aug 4, 2021
@heitorlessa
Contributor Author

heitorlessa commented Aug 4, 2021

Function memory: 128M
Load params: 10 concurrent reqs, 2K TPS

First load test with a single feature being evaluated

All virtual users finished
Summary report @ 22:27:28(+0200) 2021-08-04
  Scenarios launched:  10
  Scenarios completed: 10
  Requests completed:  2000
  Mean response/sec: 161.03
  Response time (msec):
    min: 30
    max: 1122
    median: 49
    p95: 67
    p99: 155.5
  Scenario counts:
    0: 10 (100%)
  Codes:
    200: 2000

Second load test with 100 features that are all evaluated to True

All virtual users finished
Summary report @ 22:45:41(+0200) 2021-08-04
  Scenarios launched:  10
  Scenarios completed: 10
  Requests completed:  2000
  Mean response/sec: 148.92
  Response time (msec):
    min: 39
    max: 1284
    median: 52
    p95: 79.5
    p99: 150.5
  Scenario counts:
    0: 10 (100%)
  Codes:
    200: 2000

Third load test without Tracer

All virtual users finished
Summary report @ 22:52:12(+0200) 2021-08-04
  Scenarios launched:  10
  Scenarios completed: 10
  Requests completed:  2000
  Mean response/sec: 154.68
  Response time (msec):
    min: 39
    max: 1013
    median: 49
    p95: 71
    p99: 119.5
  Scenario counts:
    0: 10 (100%)
  Codes:
    200: 2000

Fourth load test without Tracer with a 5m duration

All virtual users finished
Summary report @ 22:57:47(+0200) 2021-08-04
  Scenarios launched:  10
  Scenarios completed: 10
  Requests completed:  2000
  Mean response/sec: 7.13
  Response time (msec):
    min: 42
    max: 179
    median: 49
    p95: 59
    p99: 97
  Scenario counts:
    0: 10 (100%)
  Codes:
    200: 2000

Code snippet:

from aws_lambda_powertools import Tracer
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.event_handler.api_gateway import ApiGatewayResolver
from aws_lambda_powertools.utilities.feature_flags import FeatureFlags, AppConfigStore

app_config = AppConfigStore(
    application="powertools",
    environment="test",
    name="test_conf_name",
    cache_seconds=600,
)

tracer = Tracer()
app = ApiGatewayResolver()
feature_flags: FeatureFlags = FeatureFlags(store=app_config)


@app.get("/hello")
def hello():
    username = app.current_event.get_query_string_value(
        "username", default_value="lessa"
    )

    ctx = {"username": f"{username}"}
    # is_my_feature_active: bool = feature_flags.evaluate(
    #     name="my_feature", context=ctx, default=False
    # )

    all_features: list[str] = feature_flags.get_enabled_features(context=ctx)
    return {"message": f"{username}", "all_enabled_features": all_features}
    # return {"message": f"{username}", "my_feature_enabled": is_my_feature_active}


@tracer.capture_lambda_handler
def lambda_handler(event, context: LambdaContext):
    return app.resolve(event, context)

Load test for 60s but with AppConfig cache set to 5s to check on overhead after warm

All virtual users finished
Summary report @ 10:32:41(+0200) 2021-08-05
  Scenarios launched:  10
  Scenarios completed: 10
  Requests completed:  2000
  Mean response/sec: 30.98
  Response time (msec):
    min: 41
    max: 1533
    median: 51
    p95: 71
    p99: 122
  Scenario counts:
    0: 10 (100%)
  Codes:
    200: 2000

@heitorlessa heitorlessa self-assigned this Aug 4, 2021
@heitorlessa
Contributor Author

Happy with the performance and we can merge now. This is now maintainable, it has docstrings to aid customers using it as well as maintainers, I've also created a schema specification, and it's easier to extend the feature schema, plus more compact and concise.

After launch, when time allows, we can easily add an AnalyticsProvider where one could bring their own to track who's using features that match conditions.

Thanks a lot @michaelbrewer for the help in the refactoring too.

@am29d we're now ready to move onto the docs, then release it.

@heitorlessa heitorlessa added feature New feature or functionality internal Maintenance changes and removed internal Maintenance changes labels Aug 4, 2021
@heitorlessa heitorlessa merged commit 92d4a6d into aws-powertools:develop Aug 4, 2021
@heitorlessa heitorlessa deleted the docs/dynamic-feature-toggles branch August 4, 2021 21:06
@pcolazurdo

pcolazurdo commented Aug 5, 2021

if we need to reduce the amount of get_configuration calls that happen all the time -- could be 1ms for all validation logic, dunno.

We discussed using memoization to speed this up, but @risenberg-cyberark pointed out that the rules could be really large in some cases and the memory used may have the opposite effect. This may be something to revisit if performance tests aren't great.

@heitorlessa heitorlessa changed the title from "refactor(feature_flags): optimize UX and maintenance" to "refactor(feature-flags): optimize UX and maintenance" Aug 10, 2021
@sthulb sthulb added the feature_flags Feature Flags utility label Feb 10, 2023
Labels
documentation Improvements or additions to documentation feature_flags Feature Flags utility feature New feature or functionality internal Maintenance changes size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tests
6 participants