get latest item optimization #33

Open
beasteers opened this issue Jul 18, 2022 · 1 comment
@beasteers (Contributor)

I just wanted to share this recipe for how you can set up the queue to unpickle only the latest item and skip all the others. This is useful if you have processes running at different rates and need one process to efficiently drop every message that isn't the latest, which is especially applicable to real-time applications.

This can be done quite simply with a wrapper class, as shown below, but maybe people would find it useful as a general feature? Building it into the core would avoid the slightly hacky self.actual_loads trick, but it's no big deal either way:

import faster_fifo
import faster_fifo_reduction
import queue


class Queue(faster_fifo.Queue):
    def __init__(self, *a, **kw):
        super().__init__(*a, **kw)
        # have the loads function be a noop
        self.actual_loads = self.loads
        self.loads = lambda x: x

    def get_many(self, *a, raw=False, **kw):
        xs = super().get_many(*a, **kw)
        return xs if raw else [self.actual_loads(x) for x in xs]

    def get_latest(self, block=True, **kw):
        '''Get the latest value, without wasting time unpickling
        values you're not going to use.
        '''
        xs = self.get_many(block=block, raw=True, **kw)
        return self.actual_loads(xs[-1]) if xs else None

    def get_latest_nowait(self):
        return self.get_latest(block=False)


if __name__ == '__main__':
    # wrap the deserializer so it prints, to show what's actually being unpickled
    og_loads = Queue.loads
    def loads(x):
        x = og_loads(q, x)  # og_loads is unbound, so pass the instance explicitly
        print('loaded', x)
        return x

    # setup the queue
    q = Queue(loads=loads)
    for i in range(6):
        print('put', i)
        q.put(i)

    # get the latest value
    print(q.get_latest())
    # the queue is empty now, so this raises queue.Empty
    try:
        print(q.get_latest_nowait())
    except queue.Empty:
        print('no more messages, as expected.')

    # try it again for good measure
    for i in range(6):
        print('put', i)
        q.put(i)
    print(q.get_latest_nowait())

An addition that might also be useful: getting the N latest values?
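For what it's worth, here's a pure-stdlib sketch of that idea (the name `latest_n` and the list of raw blobs are illustrative, not faster_fifo API). It's the same trick as `get_latest` above: grab everything raw and deserialize only the items you keep.

```python
import pickle

# Illustrative sketch only, not part of faster_fifo:
# given raw pickled blobs oldest-first, unpickle just the n newest.
def latest_n(raw_blobs, n, loads=pickle.loads):
    return [loads(b) for b in raw_blobs[-n:]]

blobs = [pickle.dumps(i) for i in range(6)]
print(latest_n(blobs, 2))  # the first four blobs are never unpickled
```

In the wrapper class this would just be `xs = self.get_many(raw=True)` followed by `[self.actual_loads(x) for x in xs[-n:]]`.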

@alex-petrenko (Owner)

Wow, that's pretty cool! :D
Awesome idea @beasteers
I didn't expect that a simple overloading of loads would enable functionality like that!

I'm pretty sure you could save a bit more performance by adding this to the library core, although probably not much. And implementing it elegantly would require some amount of refactoring.

How about a compromise for now: would you like to write it up as a README section in a PR? If more people find it useful, we can work on adding it as a core feature.

BTW, if I had a use case like this, I would consider using a single shared object protected with a multiprocessing.Lock().
Then you don't need a queue at all, and you'd never run into buffer overflow (Queue.Full) issues if you're lagging too far behind.
If you use shared memory of some sort you can even save more time this way, because you neither pickle nor unpickle.
This would not work for last_N of course.
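A minimal sketch of that shared-slot pattern, assuming a single numeric value for simplicity (the names `publish`/`read_latest` are made up; in a real app the writer and reader would live in different processes and share `lock` and `latest`):

```python
import multiprocessing as mp

lock = mp.Lock()
latest = mp.Value('d', 0.0, lock=False)  # raw shared double, guarded manually

def publish(x):
    with lock:
        latest.value = x  # just overwrite: stale values are dropped, no buffer to fill

def read_latest():
    with lock:
        return latest.value

publish(1.5)
publish(2.5)
print(read_latest())
```

For larger objects you'd use a shared memory buffer instead of `mp.Value`, which is where the "no pickling at all" savings come from.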

Also, an obvious thought: if you need only the latest object but want to keep previous objects, it sounds like you need a faster-lifo, not a faster-fifo :D
That is, a multiprocessing Stack.
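A rough sketch of what such a multiprocessing Stack could look like (hypothetical, not a faster_fifo class, and far slower than the real thing since every operation goes through a Manager server process):

```python
import multiprocessing as mp

class Stack:
    '''A toy LIFO shared between processes: a Manager-backed list guarded by a Lock.'''
    def __init__(self):
        self._mgr = mp.Manager()  # keep a reference so the server process stays alive
        self._items = self._mgr.list()
        self._lock = mp.Lock()

    def put(self, item):
        with self._lock:
            self._items.append(item)

    def get_nowait(self):
        with self._lock:
            if len(self._items) == 0:
                raise IndexError('stack is empty')
            return self._items.pop()  # LIFO: newest item first

s = Stack()
s.put('old')
s.put('new')
print(s.get_nowait())  # newest item comes out first
```

A production version would of course use a shared circular buffer like faster_fifo does, just popping from the tail instead of the head.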
