Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Revamp config management #1104

Closed
wants to merge 25 commits into from

Conversation

llucax
Copy link
Contributor

@llucax llucax commented Nov 15, 2024

No description provided.

@github-actions github-actions bot added part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) part:config Affects the configuration management labels Nov 15, 2024
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will continue checking on Monday :)

src/frequenz/sdk/config/_manager.py Outdated Show resolved Hide resolved
@llucax
Copy link
Contributor Author

llucax commented Nov 22, 2024

Updated to hardcode limit=1 (I guess it never makes sense to receive old configs, and the interface is already complicated enough) and to split the commits so it is easier to review.

Still pretty much a WIP, but the bulk should be done, I need to cleanup, document and test now 😬

@llucax llucax self-assigned this Nov 22, 2024
@github-actions github-actions bot added the part:actor Affects an actor ot the actors utilities (decorator, etc.) label Dec 9, 2024
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished at Add support for skipping None configs

src/frequenz/sdk/config/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/config/_manager.py Show resolved Hide resolved
src/frequenz/sdk/config/_manager.py Show resolved Hide resolved
async def new_receiver( # pylint: disable=too-many-arguments # noqa: DOC502
self,
*,
wait_for_first: bool = False,
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider

  • removing wait_for_first_timeout from constructor arguments.
  • changing wait_for_first to `wait_for_first_timeout: timedelta | None = None```` :

If wait_for_first is None don't wait for it first config, if timedelta(X)- wait for X seconds.

Interface will simplify a little - we will have one config instead of 2.
Unless you see any use case to wait infinitelly for the first config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the presentation of this to a wider audience I'm really considering to remove the wait for first completely, I think it adds a lot of complexity only to deal with a very niche an obscure situation where there might be a bug and for some reason we don't receive the first config.

I think we can improve debugability of that situation by adding a timeout to the reading of the config files in the config managing actor instead (if we don't have it yet), which would be probably the only place where this could really hang. Also we might add more logging before and after reading the first config, so if you see a "Waiting for first config" without an immediate "Got the initial config: %s", config, that would mean the config reading somehow got stuck. We can now do this in the ConfigManager itself, so users don't need to remember to add these log lines.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm even more convinced we should remove this, as if we keep this option, the new_receiver() method needs to be async, and if it is async, it can be used in constructors, which complicates initialization even further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, lets remove it :) It is not so difficult to write it now (after you simplified other things)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally moved it to an utility function. This removes all the complexity from the config manager, and if you need this functionality, you can just do a await wait_for_first() and that will get the default timeout if you don't pass one explicitly.

src/frequenz/sdk/config/_manager.py Outdated Show resolved Hide resolved
Comment on lines +153 to +160
# This is tricky, because a str is also a Sequence[str], if we would use only
# Sequence[str], then a regular string would also be accepted and taken as
# a sequence, like "key" -> ["k", "e", "y"]. We should never remove the str from
# the allowed types without changing Sequence[str] to something more specific,
# like list[str] or tuple[str].
key: str | Sequence[str] | None = None,
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try ConfigParser? : https://docs.python.org/3/library/configparser.html

Then we would simplify require that key should follows toml key structure so:
"key.xyz" or "logging.`frequenz.boo`"

And remove _get_key method :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check if it works with toml files...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of accepting dot notation, but thought it might be better to keep the flexibility of allowing dots in the keys, which is supported by TOML. Also since in most cases we'll just use "key" or [key, "sub_key"], or even key.append(sub_key), as we probably want to do a chain for sub-keys, it sounds like at the end it might be more convenient to use sequence than the dot notation, which would mean something like f"{key}.sub_key".

In any case, ConfigParser is not an option as it uses some weird INI file format, not TOML, and we decided to go with TOML for config files. ConfigParser is a very old beast, with a non-standard format, so I would rather avoid it for something new.

Copy link
Contributor Author

@llucax llucax Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried simplifying this but Python really sucks in terms of sequences of strings.

  • I tried accepting only Sequence[str] but it doesn't work because a str is also a Sequence[str], so it will keep accepting both, we'll just interpret a str as a sequence of characters, which is not what we want.

  • I tried accepting only tuple[str], but it sucks that a tuple with only one element needs a trailing comma, which is another common pitfall in Python ("str" == ("str")). Also tuple("str") == ("s", "t", "r") because "str" is a Sequence[str] 🙄

  • I tried accepting only list[str] but then we can't use an explicit default, as lists are mutable, and we hit just one more common Python pitfall. For example with this function:

    def f(l: list[str] = ["logging"]) -> None:
        l.append("this sucks")
        print(l)
    f()

    Prints:

    ['logging', 'this sucks']
    

python -m this

There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.

I guess the problem is I'm not Dutch.


With this in mind, maybe accepting dot notation with plain strings might be worth considering, but I'm also a bit hesitant about having to parse these strings, and parsing back ticks, my feeling is we could get into another problem and then have subtle bugs if there is an unmatched tick or stuff like that.

So I will keep it as is for now and we can reconsider it in a near future. If we go that route, I think I would just assume some restrictions on valid keys. We already have restrictions when creating the config files,I think only ASCII letters [A-Z] and digits [0-9] plus underscore _ and dot . are accepted. I just didn't want to have this limitation hardcoded in the SDK for now, as maybe some user would want to write their config files manually and not have these restrictions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, lets keep it as it is for now.
Wow such simple thing turns out to be so complicated...

Comment on lines +443 to +464
value = config
current_path = []
for subkey in key:
current_path.append(subkey)
if value is None:
return None
match value.get(subkey):
case None:
return None
case Mapping() as new_value:
value = new_value
case _:
subkey_str = ""
if len(key) > 1:
subkey_str = f" when looking for sub-key {key!r}"
_logger.error(
"Found key %r%s but it's not a mapping, returning None: config=%r",
current_path[0] if len(current_path) == 1 else current_path,
subkey_str,
config,
)
return None
value = new_value
return value


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test it but this should work the same and is shorter.

subconfig = config
for idx, subkey in enumerate(key):
    subconfig = subconfig.get(subkey, default=None)
    if subconfig is None:
        return None
   if not isinstnace(subcofnig, Mapping):
        curr_key = '.'.join(key[:idx])
        substr = curr_key or  f" when looking for sub-key {key!r}"
        _logger.error(
                    "Found key %r%s but it's not a mapping, returning None: config=%r",
                    current_path[0] if len(current_path) == 1 else current_path,
                    subkey_str,
                    config,
                )
       return None
return subconfig

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that this is basically the same but removing current_path and using idx instead to get it, and removing the match?

It is indeed shorter but I find the longer version more clear, at least it took me a while to understand the proposed code, I actually like the fact about match that makes all the possibilities very clear and separated, but I guess this probably depends on how each brain works, so I can totally get if for you your code is more clear. I'm happy to change it if that's the case, but I'm not so keep if it is just shorter (but not clearer).

Maybe using the index to get a slice instead of creating a new list can be a bit more efficient and more or less as readable, so I could change only that too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok , previous is more clear to you, then lets leave it :)

src/frequenz/sdk/config/_manager.py Show resolved Hide resolved
This is to be able to use type guards in with `Receiver.filter` to
narrow the received type and the new `WithPrevious` class.

Signed-off-by: Leandro Lucarella <[email protected]>
This class instantiates and starts the `ConfigManagingActor` and creates
the channel needed to communicate with it. It offers a convenience
method to get receivers to receive configuration updates.

Signed-off-by: Leandro Lucarella <[email protected]>
This option is useful for when retrieving the configuration for the
first time, as users might want to block the program's initialization
until the configuration is read.

Signed-off-by: Leandro Lucarella <[email protected]>
When this option is enabled, configurations will be sent to the receiver
only if they are different from the previously received configuration.

Signed-off-by: Leandro Lucarella <[email protected]>
This is useful for actors to be able to subscribe to a particular key
with the actor configuration, avoiding spurious updates and noise in
the received configuration.

Signed-off-by: Leandro Lucarella <[email protected]>
This commit adds support for validating configurations with a schema.
This is useful to ensure the configuration is correct and to receive it
as an instance of a dataclass.

The `new_receiver` method now accepts a `schema` argument that is used
to validate the configuration. If the configuration doesn't pass the
validation, it will be ignored and an error will be logged.

The schema is expected to be a dataclass, which is used to create a
marshmallow schema to validate the configuration. To customize the
schema, you can use `marshmallow_dataclass.dataclass` to specify extra
metadata.

Signed-off-by: Leandro Lucarella <[email protected]>
Configuration can be nested, and actors could have sub-actors that need
their own configuration, so we need to support filtering the
configuration by a sequence of keys.

For example, if the configuration is `{"key": {"subkey": "value"}}`, and
the key is `["key", "subkey"]`, then the receiver will get only
`"value"`.

Signed-off-by: Leandro Lucarella <[email protected]>
For every path where we create a new receiver, we now assert we end up
with the correct type, as the code proved to be a bit weak, and we
sometime ended up with a `Receiver[Any]` without noticing.

Signed-off-by: Leandro Lucarella <[email protected]>
This is useful for cases where the the receiver can't react to `None`
configurations, either because it is handled externally or because it
should just keep the previous configuration.

Signed-off-by: Leandro Lucarella <[email protected]>
This global instance can be used as a single point where any actor can
obtain a receiver to receive configuration updates.

Signed-off-by: Leandro Lucarella <[email protected]>
This means using background service with multiple inheritance, so when
calling `super().__init__()` it can properly ignore all the keyword
arguments it doesn't use.

This also means now `Actor` can be used as a *mixin*, as it doesn't
provide its own `__init__()`.

Signed-off-by: Leandro Lucarella <[email protected]>
This schema is provided to use as a default, and might be extended in
the future to support more commonly used fields that are not provided
by marshmallow by default.

To use the quantity schema we need to bump the `frequenz-quantities`
dependency and add the optional `marshmallow` dependency.

Signed-off-by: Leandro Lucarella <[email protected]>
This class is mainly provided as a guideline on how to implement actors
that can be reconfigured, so actor authors don't forget to do the basic
steps to allow reconfiguration, and to have a common interface and
pattern when creating reconfigurable actors.

Signed-off-by: Leandro Lucarella <[email protected]>
The `LoggingConfigUpdatingActor` now inherits also from `Reconfigurable`
to ensure a consistent initialization.

This also means the actor can now receive `None` as configuration (for
example if the configuration is removed), in which case it will go back
to the default configuration.

Signed-off-by: Leandro Lucarella <[email protected]>
The `Reconfigurable` mixin will be removed as it seems to be overkill
and confusing.

This reverts commit 3102990ef2d57c2a804a2c41391a512de13fb189.

Signed-off-by: Leandro Lucarella <[email protected]>
This class proved to be more confusing than helpful when presented to
a wider audience, it seems to be better to propose ways to implement
actors as documentation.

This reverts commit 3201348.

Signed-off-by: Leandro Lucarella <[email protected]>
Since `Reconfigurable` was removed we don't need this anymore. The added
possibility to use as a mixin doesn't seem to justify the loss of safety
of adding arbitrary keyword arguments to the constructor.

This reverts commit 5eb64165e8603bb50a5c0eb661de7c5770022cb2.

Signed-off-by: Leandro Lucarella <[email protected]>
It seems to be more clear and easier to test to always pass a config
manager explicitly, so we just remove the global instance, which was
also a bit confusing when presented.

This reverts commit 77295dc.

Signed-off-by: Leandro Lucarella <[email protected]>
This feature is adding too much complexity, just to make it slightly
easier to debug a very specific case, when a receiver gets stuck without
being able to receive any messages at all.

We will improve logging instead, so we can still debug this case without
adding this complexity to the API.

This also allows us to make the `new_receiver` method sync.

This reverts commit 29725c4.

Signed-off-by: Leandro Lucarella <[email protected]>
This seems to be a very niche feature that adds quite a bit of
complexity. Users than need this kind of raw access can just get
a receiver from the `config_channel` themselves and do the
processing they need.

Now the `schema` is required, was renamed to `config_class` for extra
clarity and is a positional-only argument.

Signed-off-by: Leandro Lucarella <[email protected]>
This is what most users will need, so we better make it the default.

Signed-off-by: Leandro Lucarella <[email protected]>
@llucax
Copy link
Contributor Author

llucax commented Dec 10, 2024

I just updated this PR with a LOT of simplifications based on the feedback after presenting it to a wider audience.

For now I made all reverting explicit, just in case to have more time and feedback about it. In the final PR I will probably just remove the commits that will not be part of the final diff completely.

The main changes are:

  • Removed support for plain mappings
  • Removed the Reconfigurable mixin
  • Removed the global instance
  • Made unknown=EXCLUDE the default in ConfigManager
  • Removed wait_for_first (and wait_for_first_timeout (improving logging for reading config files a bit).

I have still pending to make it a background service.

async def new_receiver( # pylint: disable=too-many-arguments # noqa: DOC502
self,
*,
wait_for_first: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, lets remove it :) It is not so difficult to write it now (after you simplified other things)

Comment on lines 188 to 201
if skip_unchanged:
receiver = receiver.filter(WithPrevious(not_equal_with_logging))
receiver = receiver.filter(WithPrevious(_NotEqualWithLogging(key)))

if wait_for_first:
async with asyncio.timeout(self.wait_for_first_timeout.total_seconds()):
await receiver.ready()

return receiver
if key is None:
return receiver

return receiver.map(lambda config: config.get(key))


def not_equal_with_logging(
old_config: Mapping[str, Any], new_config: Mapping[str, Any]
) -> bool:
"""Return whether the two mappings are not equal, logging if they are the same."""
if old_config == new_config:
_logger.info("Configuration has not changed, skipping update")
_logger.debug("Old configuration being kept: %r", old_config)
return False
return True
class _NotEqualWithLogging:
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should simplify code a lot if you do map.filter instead of filter.map :)
In other words you first get the key and then apply all filters.

        if key is not None:
            receiver = receiver.map(lambda config: config.get(key))

        if skip_unchanged:
            # Now WithPrevious takes config[key] instead of whole config so it should simplify, too.
            receiver = receiver.filter(WithPrevious(_NotEqualWithLogging(key)))

        return receiver


class _NotEqualWithLogging:
    """A predicate that returns whether the two mappings are not equal.

    If the mappings are equal, a logging message will be emitted indicating that the
    configuration has not changed for the specified key.
    """

    def __init__(self, key: str | None = None) -> None:
        """Initialize this instance.

        Args:
            key: The key to use in the logging message.
        """
        self._key = key

    def __call__(
        self, old_config: Mapping[str, Any] | None, new_config: Mapping[str, Any] | None
    ) -> bool:
        """Return whether the two mappings are not equal, logging if they are the same."""
        if new_config != old_config:
            key_str = f" for key '{self._key}'" if self._key else ""
            _logger.info("Configuration%s has changed, updating", key_str)
            _logger.debug("New configuration%s being applied: %r", key_str, new_config)
            return True
        return False

Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above should also work with other commits. But maybe I am missing something


        if key is not None:
            receiver = receiver.map(lambda config: _get_key(config, key))

        if skip_unchanged:
           # _NotEqualWithLogging will simplify
            receiver = receiver.filter(WithPrevious(_NotEqualWithLogging(key)))

        receiver = receiver.map(
            # load_config_with_logging  will simplify
            lambda config: load_config_with_logging(
                config_class, config, base_schema=base_schema, **marshmallow_load_kwargs
            )
        ).filter(_is_valid_or_none)

        if skip_none:
            receiver = receiver.filter(_is_dataclass)
        return receiver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words you first get the key and then apply all filters.

Yes! I can't believe I missed that before, but in the new version I also reversed the oder of map and filter and it made the code much simpler!

Comment on lines +9 to +10
class BaseConfigSchema(QuantitySchema):
"""A base schema for configuration classes."""
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Just wondering, why you create BaseConfigSchema?
    User can just use QuantitySchema

  2. BaseConfigSchema is not used anywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Just wondering, why you create BaseConfigSchema?
    User can just use QuantitySchema

TL;DR: It is mostly to have a more future-proof API.

Maybe it is over-engineering for now, but my thought was that users might want to inherit from BaseConfigSchema and add their own custom fields. If in the future we have some other library providing a base schema like QuantitySchema, we can make BaseConfigSchema inherit from that too, and the user's code doesn't need to change, it will automatically get the new fields if they are inheriting from BaseConfigSchema.

If we use QuantitySchema (so users inherit from that), and then we want to ship a base schema with more fields than QuantitySchema, the users will either need to update their code to now inherit from BaseConfigSchema, or they won't get the new fields.

  1. BaseConfigSchema is not used anywhere

Yeah, draft PR :-P Fixed in the new version.

@@ -34,6 +34,7 @@ def __init__( # pylint: disable=too-many-arguments
*,
auto_start: bool = True,
force_polling: bool = True,
logging_config_key: str | Sequence[str] | None = "logging",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we will always have this enabled.
Maybe to simplify code we could just enable this by default and not make it configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this option adds a lot of complexity, it is just a couple of extra ifs, and gives a bit of extra flexibility to external users that might not want to allow reconfiguring the logging. And if you just want to disable that, it means you cannot use the config manager and miss on a lot of other useful features. Is not like making the new_receiver() method less flexible, where we have a escape hatch (just do manager.config_channel.new_receiver() to create your own), here you can't use the config manager at all if you want to customize only that one small part.

I don't have a strong opinion though, given the trade-offs I prefer for it to stay but I'm also open to remove it (it should be pretty easy to add it back if needed).

Comment on lines +324 to +326
if "unknown" not in marshmallow_load_kwargs:
marshmallow_load_kwargs["unknown"] = marshmallow.EXCLUDE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we would like to raise exception if there are any unknown fields.

RAISE (default): raise a ValidationError if there are any unknown fields
EXCLUDE: exclude unknown fields
INCLUDE: accept and include the unknown fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's too restrictive for a config. For example, if you deploy a new version that has new options, and you configure one of the new options. Then you find a regression and want to roll back. If you don't rollback the config too, then your app doesn't start anymore.

In the new version I'm actually logging warnings if unknown fields are found, I think that's the most balanced approach, as having unknown fields go unnoticed could also cause issues (a typo in a config variable name might end up having a misconfigured app without noticing).

@llucax
Copy link
Contributor Author

llucax commented Dec 17, 2024

I will close this PR and create a new one, because in the new iteration things changed too much.

@llucax llucax closed this Dec 17, 2024
@llucax llucax mentioned this pull request Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:actor Affects an actor ot the actors utilities (decorator, etc.) part:config Affects the configuration management part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.)
Projects
Development

Successfully merging this pull request may close these issues.

2 participants