Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide an ability to configure HPI sources directly #340

Open
Stvad opened this issue Nov 30, 2022 · 3 comments
Open

Provide an ability to configure HPI sources directly #340

Stvad opened this issue Nov 30, 2022 · 3 comments
Labels
backend Related to indexing/serving documentation documentation/readme enhancements

Comments

@Stvad
Copy link

Stvad commented Nov 30, 2022

Having to set up both promnesia and HPI backends considerably adds to the complexity of the setup.

IMO it'd be a considerable improvement if I could just pass a relevant configuration to HPI sources directly from Promnesia config.

So if I'm configuring Roam:

SOURCES = [
    Source(roamresearch),
]

would be something like

SOURCES = [
    Source(roamresearch, export_path="path/to/json"),
]

And then a user does not need to figure out how to set up HPI config which is literally just used to provide that export path string

@karlicoss
Copy link
Owner

karlicoss commented Dec 31, 2022

So let's say promnesia module takes in a JSON-like config object (instead of specific config fields, it can't interpret them anyway):

SOURCES = [
    Source(roamresearch, hpi_config={"export_path": "path/to/json", "error_policy": "throw"}),
]

Then the promnesia module could do some dynamic magic ✨ inside and set the corresponding HPI config, roughly something like import my.config; my.config.roamresearch = convert_json_to_object(hpi_config).

There are however some issues with this approach. It might work well in simple cases like roamresearch. Its hpi module is my.roamreasearch, its hpi config.py section is class roamresearch: ..., so it's straightforward.

Consider something more complicated like twitter module:

https://github.com/karlicoss/promnesia/blob/master/src/promnesia/sources/twitter.py#L9-L11

In promnesia it's just importing my.twitter.all and using whatever it spits out -- pretty simple.

However if you wanted to make promnesia.sources.twitter somehow configure HPI for itself, this would be quite tricky.
It's hard to say what HPI modules correspond to it!
Not only in the sense that it's not very well documented -- but it might actually depend on 'subsources' configuration and user's HPI overlays.

E.g. the default HPI is this: https://github.com/karlicoss/promnesia/blob/master/src/promnesia/sources/twitter.py#L9-L11 -- it's trying to merge twitter GDPR archive + Twint export. These would correspond to twitter_archive and twint sections in the config.

However, in my personal overlay I'm also using data from Talon app on top of that
https://github.com/karlicoss/hpi-personal-overlay/blob/master/src/my/twitter/all.py#L10-L13

... and the config section for Talon is twitter.talon.

So that means hpi_config would have to take multiple config sections, something like this:

SOURCES = [
    Source(twitter, hpi_configs=[
        {"twitter_archive": {"export_path": ...}},
        {"talon"          : ...},
        {"twitter.twint"  : ...},.
    ]),
]

To be honest it looks less bad than I expected before I started typing the response, so maybe it's a feasible option after all 😅

However perhaps more realistic is making it something like this:

class hpi_config:
    class roamresearch:
        export_path = "/path/to/json"

    class twitter_archive:
        export_path = ...

    class twitter:
        class twint:
            export_path = ....
    ....


<some magic helper to load hpi_config object into actual hpi>, e.g. hpi.load_extra_config(hpi_config)

SOURCES = [
    Source(roamresearch),
    Source(twitter),
]

I guess this is still simple enough, presumably the most annoying bit is having to init a separate config file, but if it's in the same file it's better?
This way it would keep promnesia modules intact, and agnostic of exact HPI module internals.

I actually wonder if it's possible to write this magic config loader even within the current HPI/promnesia implementation (although need to experiment to find out).
There are a couple of examples of generating configs dynamically in runtime

https://github.com/karlicoss/HPI/blob/4a04c09f314e10a4db8f35bf1ecc10e4d0203223/tests/config.py
https://github.com/karlicoss/HPI/blob/5ef277526577daaa115223e79a07a064ffa9bc85/tests/demo.py
https://github.com/karlicoss/HPI/blob/8422c6e420f5e274bd1da91710663be6429c666c/my/core/cfg.py#L31

So maybe something like this:

class hpi_config:
    .... # same as above

import my.config
for key in dir(hpi_config):
    # this likely won't work, might need something more elaborate)
    setattr(my.config, key, getattr(hpi_config, key))

SOURCES = [
    S(roamresearch),
    S(twitter),
]

@karlicoss karlicoss added backend Related to indexing/serving documentation documentation/readme enhancements labels Dec 31, 2022
@Stvad
Copy link
Author

Stvad commented Feb 4, 2023

Apologies for a delayed response!

I originally was thinking about something closer to the latter approach where you can define the HPI configs in a way that you usually define them and Promnesia can understand and consume that (presumably including installing the HPI dependency if it's not doing that already)

@karlicoss
Copy link
Owner

Note to self -- been playing with it today, and at the moment it's achievable with something like this (needs improvement of course, but at least it's already possible in principle)

def reset_modules() -> None:
    '''
    A hack to 'unload' HPI modules, otherwise some modules might cache the config
    '''
    import sys
    import re
    to_unload = [m for m in sys.modules if re.match(r'my[.]?', m)]
    for m in to_unload:
        del sys.modules[m]


def brindex(path: PathIsh):
    reset_modules()
    from my.core.cfg import tmp_config
    with tmp_config() as cfg:
        from my.core.common import classproperty, Paths, get_files
        class browser:   # adhoc user-defined config
            class export:
                @classproperty
                def export_path(cls) -> Paths:
                    return get_files(path, glob='**/*')[:1]
        cfg.browser = browser
        yield from browser_new.index()

karlicoss added a commit to karlicoss/HPI that referenced this issue Feb 9, 2023
properly reload/unload the relevant modules so hopefully no more weird hacks should be required

relevant
- karlicoss/promnesia#340
- #46
karlicoss added a commit to karlicoss/HPI that referenced this issue Feb 9, 2023
properly reload/unload the relevant modules so hopefully no more weird hacks should be required

relevant
- karlicoss/promnesia#340
- #46
karlicoss added a commit to karlicoss/HPI that referenced this issue Feb 9, 2023
properly reload/unload the relevant modules so hopefully no more weird hacks should be required

relevant
- karlicoss/promnesia#340
- #46
karlicoss added a commit to karlicoss/HPI that referenced this issue Feb 9, 2023
properly reload/unload the relevant modules so hopefully no more weird hacks should be required

relevant
- karlicoss/promnesia#340
- #46
karlicoss added a commit to karlicoss/HPI that referenced this issue Feb 9, 2023
properly reload/unload the relevant modules so hopefully no more weird hacks should be required

relevant
- karlicoss/promnesia#340
- #46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Related to indexing/serving documentation documentation/readme enhancements
Projects
None yet
Development

No branches or pull requests

2 participants