
984: Refactoring: separate planning and execution phases in key dataset manipulation services (#994)

Conversation

zaychenko-sergei (Contributor) commented Dec 17, 2024

Description

Closes: #984

Extracted a "planner" and an "executor" for compaction, reset, set watermark, and push ingest, and partially for polling ingest.
Renamed long-running "executors" to "agents".
Introduced MetadataQueryService to absorb simple queries that, from the interface point of view, do not have to be defined at the level of the metadata chain.
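The planner/executor split described above can be sketched roughly as follows. All names and signatures here are illustrative, not the actual kamu APIs (the real services are async and operate on dataset handles); the point is that the planner reads state and produces an immutable plan, while the executor only applies it:

```rust
// Hypothetical sketch of the planning/execution split. The plan is a pure
// value: it can be inspected, logged, or unit-tested without side effects.

/// Output of the planning phase; purely descriptive, no side effects.
#[derive(Debug, PartialEq)]
struct CompactionPlan {
    old_head: String,       // HEAD read during planning
    slices_to_merge: usize, // how much work the executor will do
}

/// Planning phase: reads current state and decides what to do.
struct CompactionPlanner;

impl CompactionPlanner {
    fn plan(&self, current_head: &str, data_slices: usize) -> CompactionPlan {
        CompactionPlan {
            old_head: current_head.to_string(),
            slices_to_merge: data_slices,
        }
    }
}

/// Execution phase: applies a previously computed plan.
struct CompactionExecutor;

impl CompactionExecutor {
    fn execute(&self, plan: &CompactionPlan) -> String {
        // In a real service this would rewrite data files and commit a new
        // block; here we just derive a new "head" to show the phase boundary.
        format!("{}-compacted-{}", plan.old_head, plan.slices_to_merge)
    }
}
```

One benefit of this shape is testability: the planner can be exercised without any storage, and the executor receives everything it needs inside the plan.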

Checklist before requesting a review

  • Unit and integration tests added
  • Compatibility:
    • Network APIs: ✅
    • Workspace layout and metadata: ✅
    • Configuration: ✅
    • Container images: ✅
  • Observability:
    • Tracing / metrics: ✅
    • Alerts: ✅
  • Documentation:
  • Downstream effects:

src/infra/ingest-datafusion/src/writer.rs (review thread; outdated, resolved)
The review thread below refers to this excerpt from the diff:

```rust
#[async_trait::async_trait]
impl MetadataQueryService for MetadataQueryServiceImpl {
```
Comment (Member):
Would these methods be better suited in MetadataChainExt extension trait?

To me a service encapsulates complex interactions between several components, but here we have nothing but a stateless wrapper on top of MetadataChain's visitor interface.

MetadataChain is also the place where we validate complex domain rules about which sequences of events are valid and which aren't (e.g. two active push sources with the same name), so I don't think the argument that this is some "higher-level logic" works in this case either.
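The reviewer's alternative, putting such queries on an extension trait rather than a separate service, could look roughly like this. All trait and method names are illustrative placeholders, not the real kamu traits:

```rust
// Hypothetical sketch of the extension-trait alternative: derived queries
// expressed directly over any metadata chain via a blanket impl.

trait MetadataChain {
    /// Event kinds in the chain, oldest first (simplified to strings here).
    fn blocks(&self) -> &[&'static str];
}

/// Extension trait: default methods answer derived questions about the chain.
trait MetadataChainExtSketch: MetadataChain {
    fn has_push_source(&self) -> bool {
        self.blocks().iter().any(|b| *b == "AddPushSource")
    }
}

// Blanket impl: every MetadataChain gets the helpers for free.
impl<T: MetadataChain + ?Sized> MetadataChainExtSketch for T {}

/// Toy chain implementation for demonstration.
struct InMemoryChain(Vec<&'static str>);

impl MetadataChain for InMemoryChain {
    fn blocks(&self) -> &[&'static str] {
        &self.0
    }
}
```

The trade-off is that every caller of the helper sees `MetadataChain` in scope, which is exactly what the author objects to below.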

Reply (Contributor, Author):

I'm not sure. I have a feeling that for high-level queries about a dataset's current state, we shouldn't be considering that there is some metadata chain behind it. We just want to know key characteristics of the dataset's current state. The fact that this maps to a MetadataChain algorithm is an implementation detail. Besides, these questions could be answered by a database in the future.
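The author's point, that the chain walk is an implementation detail behind a stable query interface, can be sketched as a trait with interchangeable backends. Again, all names here are hypothetical, not the actual `MetadataQueryService` API:

```rust
// Hypothetical sketch: callers ask about the dataset's current state and
// need not know whether the answer comes from walking a metadata chain or
// from an indexed database.

trait MetadataQuery {
    fn current_watermark(&self) -> Option<u64>;
}

/// Today: the answer is derived by scanning chain blocks, newest first.
struct ChainBackedQuery {
    block_watermarks: Vec<Option<u64>>, // per-block watermark, oldest first
}

impl MetadataQuery for ChainBackedQuery {
    fn current_watermark(&self) -> Option<u64> {
        // Walk from the newest block back until a watermark is found.
        self.block_watermarks.iter().rev().find_map(|w| *w)
    }
}

/// In the future: the same question answered from an indexed store.
struct DbBackedQuery {
    indexed_watermark: Option<u64>,
}

impl MetadataQuery for DbBackedQuery {
    fn current_watermark(&self) -> Option<u64> {
        self.indexed_watermark
    }
}
```

Callers depend only on `MetadataQuery`, so swapping the chain scan for a database lookup requires no changes on their side.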

Reply (Contributor, Author):

MetadataChainExt deals with a much lower level of abstraction. It is mostly about visiting the blocks one way or another, emphasizing explicit knowledge of the chain structure.

zaychenko-sergei force-pushed the 984-refactoring-separate-planning-and-execution-phases-in-key-dataset-manipulation-services branch from 00506be to 6abd459 on December 18, 2024 10:07
zaychenko-sergei merged commit 447c30c into master on Dec 18, 2024
0 of 6 checks passed
zaychenko-sergei deleted the 984-refactoring-separate-planning-and-execution-phases-in-key-dataset-manipulation-services branch on December 18, 2024 10:07
s373r added a commit that referenced this pull request Dec 19, 2024
* Do not show usage error for --all flag (#960)

* Do not show usage error for --all flag

When the --all flag is set for the `repo delete` command
and there are no repositories to delete, do not show a usage error.

* Improve args validation

* Improve args validation, e2e tests

* Typo corrected in feature flags (#974)

* Images, kamu-base-git: fix collision of executable files (#975)

* 868 api server provide feature flags for UI (#976)

Separated runtime and UI configuration flags. UI config is provided by API server too.

* Release v0.210.0 + minor deps

* 854 persistent storage of dataset dependencies graph (#973)

Dependency graph service moved to 'datasets' domain.
Defined dataset dependency repository interface and created 3 implementations.
No more postponed initialization, organized initial setup in the form of an indexer.
Added telemetry extensions on the way.
Tests for repositories, stabilized other tests.
Cascading effect on delete within the dataset entry domain.

* v0.211.0 + minor deps

* Fixed image building (#977)

Replaced cascade delete of dataset entries in graph with more explicit events to allow orphan upstream dependencies where only ID is given

* Upgrade to datafusion 43

* Use thiserror v2 throughout

* trust-dns-resolver => hickory-resolver + minor deps

* Fix non-sequential offsets on ingest

* 0.212.0

* Use KAMU_CONTAINER_RUNTIME_TYPE env var in Makefile (#991)

* Use KAMU_CONTAINER_RUNTIME_TYPE env var in Makefile
* Make podman default engine for e2e tests

* Backporting changes from Private Datasets feature branch (#992)

* Backport tweaks

* Add doc strings

* Remove unused deps

* Remove outdated test

* CHANGELOG.md: update

* Tips after self-review

* Delete env var on dataset delete (#993)

* Delete env var on dataset delete

* 984 refactoring separate planning and execution phases in key dataset manipulation services (#994)

* Draft split of `CompactionService` into planner and execution parts

* Compaction cleanups

* Compacting more cleanups

* Compacting: read old HEAD on planning phase

* Reset service split on planner and execution

* Extracted `MetadataQueryService` to query polling sources, push sources, and set transform, instead of doing so in the ingest/transform planners

* DataWriterMetadataState became part of polling ingest item at the planning phase

* Setting watermark: separate planner and execution service

* Push ingest service prepared for split

* Push ingest split into planning and executing

* Made some order in infra/core services

* {Flow,Task,Outbox}Executor=>Agent

* Unified naming of planners and executors

* Revised telemetry in refactored components

* Review: DataWriterDataFusionBuilder flattened

* changelog

* v0.123.0 + minor deps

* kamu-dev-base: include short commit hash as well (#995)

* v0.213.1: less aggressive telemetry with `DataWriterMetadataState`

---------

Co-authored-by: Andrii Demus <[email protected]>
Co-authored-by: Sergei Zaychenko <[email protected]>
Co-authored-by: Sergii Mikhtoniuk <[email protected]>
Co-authored-by: Roman Boiko <[email protected]>
Successfully merging this pull request may close these issues.

Refactoring: Separate planning and execution phases in key dataset manipulation services