984 refactoring separate planning and execution phases in key dataset manipulation services #994
Conversation
```rust
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

#[async_trait::async_trait]
impl MetadataQueryService for MetadataQueryServiceImpl {
```
Would these methods be better suited in the MetadataChainExt extension trait?
To me, a service encapsulates complex interactions between several components, but here we have nothing but a stateless wrapper on top of MetadataChain's visitor interface.
MetadataChain is also where we validate complex domain rules about which sequences of events are valid and which aren't (e.g. two active push sources with the same name), so I don't think the argument that this is some "higher-level logic" works in this case either.
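For illustration, a minimal sketch of the extension-trait shape being suggested, using purely hypothetical types rather than kamu's actual API: read-only helpers become default methods that any chain implementation picks up via a blanket impl, with no separate service needed.

```rust
// Hypothetical minimal stand-in for MetadataChain (not the real kamu trait).
trait MetadataChain {
    fn blocks(&self) -> Vec<String>;
}

// Hypothetical extension trait: stateless read-only helpers as default methods,
// with a blanket impl so every chain gets them without a separate service.
trait MetadataChainExt: MetadataChain {
    fn last_block(&self) -> Option<String> {
        self.blocks().into_iter().last()
    }
}

impl<T: MetadataChain + ?Sized> MetadataChainExt for T {}

// Toy chain implementation for the example.
struct InMemoryChain(Vec<String>);

impl MetadataChain for InMemoryChain {
    fn blocks(&self) -> Vec<String> {
        self.0.clone()
    }
}

fn main() {
    let chain = InMemoryChain(vec!["Seed".into(), "SetPollingSource".into()]);
    // The helper is available directly on the chain: prints Some("SetPollingSource").
    println!("{:?}", chain.last_block());
}
```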
I'm not sure. I have a feeling that for high-level queries about a dataset's current state, we shouldn't have to consider that there is a metadata chain behind it; we just want to know key characteristics of that state. The fact that this maps onto a MetadataChain algorithm is an implementation detail. Besides, these questions could be answered from a database in the future.
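As a sketch of that argument, again with hypothetical types rather than the real interfaces from this PR: if callers only see a high-level query service, a chain-scanning implementation and a database-backed one become interchangeable.

```rust
// Hypothetical types only; the real MetadataQueryService in kamu differs.
#[derive(Debug, Clone)]
struct PushSource {
    name: String,
}

// High-level query interface: callers ask about the dataset's current state
// and do not know how the answer is computed.
trait MetadataQueryService {
    fn active_push_sources(&self) -> Vec<PushSource>;
}

// One implementation derives the answer by scanning chain events...
struct ChainBackedQueries {
    events: Vec<String>,
}

impl MetadataQueryService for ChainBackedQueries {
    fn active_push_sources(&self) -> Vec<PushSource> {
        self.events
            .iter()
            .filter_map(|e| e.strip_prefix("AddPushSource:"))
            .map(|name| PushSource { name: name.to_string() })
            .collect()
    }
}

// ...while another reads a precomputed projection, e.g. from a database,
// without callers noticing the difference.
struct DbBackedQueries {
    cached: Vec<PushSource>,
}

impl MetadataQueryService for DbBackedQueries {
    fn active_push_sources(&self) -> Vec<PushSource> {
        self.cached.clone()
    }
}

fn main() {
    let svc: Box<dyn MetadataQueryService> = Box::new(ChainBackedQueries {
        events: vec!["Seed".into(), "AddPushSource:http".into()],
    });
    println!("{:?}", svc.active_push_sources());
}
```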
MetadataChainExt deals with a much lower level of abstraction. It is mostly about visiting the blocks one way or another, emphasizing explicit knowledge of the chain structure.
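A minimal sketch of that lower-level, visitor-style traversal, with hypothetical event and visitor types (kamu's real visitor API is richer): the caller explicitly drives the walk over the chain and decides when to stop.

```rust
// Hypothetical event and visitor types; not kamu's actual definitions.
enum MetadataEvent {
    Seed,
    SetPollingSource { url: String },
}

// The visitor is called once per block, newest to oldest, and returns
// whether traversal should continue.
trait BlockVisitor {
    fn visit(&mut self, event: &MetadataEvent) -> bool;
}

// Example visitor: find the most recent polling source and stop there.
struct FindPollingSource {
    found: Option<String>,
}

impl BlockVisitor for FindPollingSource {
    fn visit(&mut self, event: &MetadataEvent) -> bool {
        if let MetadataEvent::SetPollingSource { url } = event {
            self.found = Some(url.clone());
            return false; // stop: only the latest source matters
        }
        true
    }
}

// The caller explicitly walks the chain structure, block by block.
fn accept(blocks_newest_first: &[MetadataEvent], visitor: &mut dyn BlockVisitor) {
    for event in blocks_newest_first {
        if !visitor.visit(event) {
            break;
        }
    }
}

fn main() {
    let chain = vec![
        MetadataEvent::SetPollingSource { url: "https://example.com".into() },
        MetadataEvent::Seed,
    ];
    let mut visitor = FindPollingSource { found: None };
    accept(&chain, &mut visitor);
    println!("{:?}", visitor.found);
}
```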
Branch updated from 00506be to 6abd459 (compare): "… set transform, instead of ingest/transform planners"
* Do not show usage error for --all flag (#960)
  * Do not show usage error for --all flag: when the --all flag is set for the `repo delete` command and there are no repositories to delete, do not show a usage error.
  * Improve args validation
  * Improve args validation, e2e tests
* Typo corrected in feature flags (#974)
* Images, kamu-base-git: fix collision of executable files (#975)
* 868 api server provide feature flags for UI (#976): separated runtime and UI configuration flags; UI config is provided by the API server too.
* Release v0.210.0 + minor deps
* 854 persistent storage of dataset dependencies graph (#973): dependency graph service moved to 'datasets' domain. Defined dataset dependency repository interface and created 3 implementations. No more postponed initialization, organized initial setup in the form of an indexer. Added telemetry extensions on the way. Tests for repositories, stabilized other tests. Cascading effect on delete within the dataset entry domain.
* v0.211.0 + minor deps
* Fixed image building (#977): replaced cascade delete of dataset entries in graph with more explicit events to allow orphan upstream dependencies where only ID is given
* Upgrade to datafusion 43
* Use thiserror v2 throughout
* trust-dns-resolver => hickory-resolver + minor deps
* Fix non-sequential offsets on ingest
* 0.212.0
* Use KAMU_CONTAINER_RUNTIME_TYPE env var in Makefile (#991)
  * Use KAMU_CONTAINER_RUNTIME_TYPE env var in Makefile
  * Make podman the default engine for e2e tests
* Backporting changes from Private Datasets feature branch (#992)
  * Backport tweaks
  * Add doc strings
  * Remove unused deps
  * Remove obsolete test
  * CHANGELOG.md: update
  * Tips after self-review
* Delete env var on dataset delete (#993)
* 984 refactoring separate planning and execution phases in key dataset manipulation services (#994)
  * Draft split of `CompactionService` into planner and execution parts
  * Compaction cleanups
  * Compacting: more cleanups
  * Compacting: read old HEAD on planning phase
  * Reset service split on planner and execution
  * Extracted `MetadataQueryService` - to query polling, push sources and set transform, instead of ingest/transform planners
  * `DataWriterMetadataState` became part of polling ingest item at the planning phase
  * Setting watermark: separate planner and execution service
  * Push ingest service prepared for split
  * Push ingest split on planning and executing
  * Made some order in infra/core services
  * {Flow,Task,Outbox}Executor => Agent
  * Unified naming of planners and executors
  * Revised telemetry in refactored components
  * Review: DataWriterDataFusionBuilder flattened
  * changelog
* v0.123.0 + minor deps
* kamu-dev-base: include short commit hash as well (#995)
* v0.213.1: less aggressive telemetry with `DataWriterMetadataState`

---------

Co-authored-by: Andrii Demus <[email protected]>
Co-authored-by: Sergei Zaychenko <[email protected]>
Co-authored-by: Sergii Mikhtoniuk <[email protected]>
Co-authored-by: Roman Boiko <[email protected]>
Description
Closes: #984
Extracted "planner" and "executor" for compacting, reset, set watermark, push ingest, partially polling ingest.
Renamed long-running "executors" to "agents".
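As a rough sketch of the planner/executor split mentioned above, using hypothetical names rather than the actual services introduced in this PR: planning is a read-only phase that produces a plan object, and execution is the only phase that mutates state.

```rust
// Hypothetical names; the real planner/executor services differ in detail.

// A plan is plain data produced by the read-only planning phase.
struct CompactionPlan {
    old_head: String,
    slices_to_merge: Vec<String>,
}

// Planner: inspects the current state and decides what should happen,
// without side effects (it records the HEAD it planned against).
struct CompactionPlanner;

impl CompactionPlanner {
    fn build_plan(&self, old_head: &str, slices: &[String]) -> CompactionPlan {
        CompactionPlan {
            old_head: old_head.to_string(),
            slices_to_merge: slices.to_vec(),
        }
    }
}

// Executor: the only phase that performs side effects, applying the plan.
struct CompactionExecutor;

impl CompactionExecutor {
    fn execute(&self, plan: CompactionPlan) -> String {
        // A real executor would rewrite data slices and advance the chain head;
        // here we just fabricate a new head id for illustration.
        format!(
            "head-after-{}-merging-{}-slices",
            plan.old_head,
            plan.slices_to_merge.len()
        )
    }
}

fn main() {
    let plan = CompactionPlanner.build_plan("abc123", &["slice-1".into(), "slice-2".into()]);
    let new_head = CompactionExecutor.execute(plan);
    println!("{new_head}");
}
```

One benefit of this shape is that the plan can be validated, logged, or tested in isolation before anything is mutated.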
Introduced MetadataQueryService to absorb simple queries that, from the interface point of view, do not have to be defined at the level of the metadata chain.
Checklist before requesting a review