Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Accumulator::evaluate and Accumulator::state to take &mut self #8934

Closed
alamb opened this issue Jan 21, 2024 · 0 comments · Fixed by #8925
Closed

Change Accumulator::evaluate and Accumulator::state to take &mut self #8934

alamb opened this issue Jan 21, 2024 · 0 comments · Fixed by #8925
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jan 21, 2024

Is your feature request related to a problem or challenge?

As part of #8849, I wanted to build up the output state (the distinct string values) during accumulation and then generate output directly (without a copy)

However, I couldn't implement a zero copy algorithm because evaluate and merge take a &self not a &mut self (well I worked around it with a Mutex 🤮 ).

The need to copy intermediate state likely doesn't matter for most Accumulators as they only emit a single scalar value where the cost of copying is pretty low. However, for ones that emit significant internal state (like count DISTINCT) using the same internal state can save a lot of copying (again, see #8849 for an example)

Also, the actual Accumulator instances are never used after a call to evaluate/state, so the state of the accumulator after this call is never used again.

Describe the solution you'd like

I would like to change Accumulator::evaluate and Accumulator::state to take &mut self

This is also consistent with GroupsAccumulator which takes mut for its state and evaluate functions:

pub trait GroupsAccumulator: Send {
...
    fn evaluate(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Arc<dyn Array>, DataFusionError>;
    fn state(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Vec<Arc<dyn Array>>, DataFusionError>;
...

Describe alternatives you've considered

No response

Additional context

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant