Idea/Discussion: Analytics V2 #2025

Open
winged opened this issue Jun 29, 2023 · 0 comments
winged commented Jun 29, 2023

Current Situation

The analytics functionality is currently built directly into Caluma. This allows
for tight integration and reusability of the Django models.

However, it also loads Caluma itself with analytics processing, which may slow
down the "transactional" processing (mixing OLAP and OLTP workloads in one service).

Users also want other forms of output, like CSV, Excel, or even graphics. Integrating
this would blow up the "core" Caluma even more, both in terms of (disk) volume and
workload.

The goal

In the long term, we will need to extract the analytics functionality into its own
service. The (live/production) data would be periodically synchronized and
transformed into a structure that is better suited for analytics processing (snowflake / star schema).

Caluma Analytics currently tries to "tabularize" the tree structure by providing specific selections when a "parent" object has multiple sub-objects. For example, when starting with cases, we have: Case -> Workitem[task-x,newest] -> Document -> Answer[question-x] -> (possible value extraction).
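To make the "tabularizing" idea concrete, here is a minimal in-memory sketch of following such a path and reducing it to one cell per case. The classes and function names below are hypothetical stand-ins for illustration only; they are not Caluma's actual models or the Analytics API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


# Hypothetical, simplified stand-ins for the Caluma tree structure.
@dataclass
class Answer:
    question: str
    value: object


@dataclass
class Document:
    answers: List[Answer] = field(default_factory=list)


@dataclass
class WorkItem:
    task: str
    created_at: datetime
    document: Optional[Document] = None


@dataclass
class Case:
    id: str
    work_items: List[WorkItem] = field(default_factory=list)


def select_work_item(case: Case, task: str, selector: str) -> Optional[WorkItem]:
    """Reduce the work items of one task to zero or one per case."""
    if selector not in ("newest", "oldest"):
        raise ValueError(f"unknown selector: {selector}")
    candidates = [wi for wi in case.work_items if wi.task == task]
    if not candidates:
        return None
    pick = max if selector == "newest" else min
    return pick(candidates, key=lambda wi: wi.created_at)


def extract_cell(case: Case, task: str, selector: str, question: str):
    """Follow Case -> WorkItem[task, selector] -> Document -> Answer[question]."""
    work_item = select_work_item(case, task, selector)
    if work_item is None or work_item.document is None:
        return None
    for answer in work_item.document.answers:
        if answer.question == question:
            return answer.value
    return None


def tabularize(cases, task: str, selector: str, question: str):
    """Produce one flat row per case, with None where the path has no value."""
    return [
        {"case_id": case.id, question: extract_cell(case, task, selector, question)}
        for case in cases
    ]
```

Each selection step (task plus "newest"/"oldest", then question) picks at most one child, which is what lets the tree be read as a table with one row per case.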

When building the new analytics service, this "pre-aggregation" could be done at the database level, which would drastically reduce query complexity. Taking the above example, the Caluma schema would be structured into the following tables:

[image: diagram of the resulting analytics tables]

To explain: the work item's primary key in Caluma is its UUID. In the Analytics service, its primary key would be the combination of case id, task slug, and an additional selector that reduces the number of work items to zero or one per case (like "newest" or "oldest"), allowing a "tabular" reading of the tree structure.
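As a rough sketch of how such a composite key might look, the following uses SQLite from Python's standard library. The table and column names (`work_item`, `analytics_work_item`, `work_item_id`, and so on) and the `sync_work_items` helper are assumptions made for illustration; they do not reflect Caluma's real schema or any decided design.

```python
import sqlite3

# Hypothetical schema: a simplified source table and the pre-aggregated
# analytics table keyed by (case_id, task_slug, selector).
SCHEMA = """
CREATE TABLE work_item (
    id TEXT PRIMARY KEY,          -- UUID in the transactional model
    case_id TEXT NOT NULL,
    task_slug TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE analytics_work_item (
    case_id TEXT NOT NULL,
    task_slug TEXT NOT NULL,
    selector TEXT NOT NULL CHECK (selector IN ('newest', 'oldest')),
    work_item_id TEXT NOT NULL,
    created_at TEXT NOT NULL,
    PRIMARY KEY (case_id, task_slug, selector)
);
"""


def sync_work_items(conn: sqlite3.Connection) -> None:
    """Rebuild the analytics table with at most one row per key."""
    conn.execute("DELETE FROM analytics_work_item")
    # The sort direction comes from this fixed tuple, never from user input.
    for selector, direction in (("newest", "DESC"), ("oldest", "ASC")):
        conn.execute(
            f"""
            INSERT INTO analytics_work_item
                (case_id, task_slug, selector, work_item_id, created_at)
            SELECT w.case_id, w.task_slug, ?, w.id, w.created_at
            FROM work_item AS w
            WHERE w.id = (
                SELECT x.id
                FROM work_item AS x
                WHERE x.case_id = w.case_id AND x.task_slug = w.task_slug
                -- id is used as a tie-breaker so exactly one row is picked
                ORDER BY x.created_at {direction}, x.id {direction}
                LIMIT 1
            )
            """,
            (selector,),
        )
    conn.commit()
```

With a table like this, an analytics query can join on `(case_id, task_slug, selector)` directly instead of resolving "newest" or "oldest" at query time.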

This would imply having custom visibility and permissions in the Analytics service, as the data model would not match the one used in Caluma itself. However, we think this is not necessarily a bad thing, as the requirements may differ anyway, and
users may be allowed to see things in aggregates that they wouldn't be allowed to see in detail.
