Add possibility to retrospectively promote subset collections to full collections #655

Open
Zehvogel opened this issue Aug 22, 2024 · 3 comments

Comments

@Zehvogel
Contributor

See also discussion in: key4hep/k4FWCore#226

@nathanwbrei
Contributor

One general approach to solving this that piques my interest is a "tree-shaking" algorithm. The user would specify which collections must be output. Podio then starts by marking these collections as "live". It follows all associations backwards, across all collections, marking everything it encounters as live. Anything not live at the end does not get written. This would strike a good balance between saving space and preserving associations and hence data integrity. If the runtime cost is too high, we could reduce it by having the user specify exactly which collections need to be pruned.
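As a rough illustration of the marking pass described above, here is a minimal sketch in plain Python. It is not podio code: the `references` dictionary, the collection names, and the `find_live_objects` helper are all hypothetical stand-ins, and objects are modeled simply as `(collection, index)` pairs.

```python
from collections import deque


def find_live_objects(references, requested_collections):
    """Return the ids of all objects reachable from the requested collections.

    references: dict mapping an object id (collection, index) to the list of
                object ids it refers to (a hypothetical stand-in for relations).
    requested_collections: names of the collections the user wants written.
    """
    live = set()
    # Seed the traversal with every object in a requested collection.
    queue = deque(oid for oid in references if oid[0] in requested_collections)
    while queue:
        oid = queue.popleft()
        if oid in live:
            continue  # already visited; this also guards against reference cycles
        live.add(oid)
        queue.extend(references.get(oid, ()))
    return live


# Toy event: one requested collection pointing into a larger one.
references = {
    ("SkimmedParticles", 0): [("MCParticles", 2)],
    ("MCParticles", 0): [],
    ("MCParticles", 1): [],
    ("MCParticles", 2): [("MCParticles", 0)],  # e.g. a parent relation
    ("Hits", 0): [("MCParticles", 1)],
}

live = find_live_objects(references, {"SkimmedParticles"})
not_written = set(references) - live
```

In this toy input, `("MCParticles", 1)` and `("Hits", 0)` are never reached from the requested collection, so they end up in `not_written`. The cost is linear in the number of objects plus references visited, which is where restricting the set of prunable collections would help.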

@tmadlener
Collaborator

That sounds like an interesting approach. I think it could work, although there might be some edge cases to consider. One potential issue is the following: all objects are identified by their ObjectID, consisting of a collectionID and an index into that collection. We would have to make sure that these are properly set before any writing happens. I think (and this needs to be verified) that this should work, because the final setting of all of these happens in prepareForWrite, just before we write things. So as long as objects are pruned before that point, we should be able to get the index set correctly.
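To make the indexing concern concrete, the following is a minimal sketch, again in plain Python and not podio's actual implementation. It assumes object ids are `(collection_id, index)` pairs and that the set of surviving objects is already known; `reindex_after_pruning` and `rewrite_references` are hypothetical helpers that only model the kind of bookkeeping that has to be settled before anything is written.

```python
def reindex_after_pruning(collection_sizes, live):
    """Map old (collection_id, index) ids to new ids for the surviving objects.

    collection_sizes: dict mapping collection_id -> number of objects before pruning.
    live: set of (collection_id, index) ids that survive pruning.
    """
    remap = {}
    for coll_id, size in collection_sizes.items():
        new_index = 0
        for old_index in range(size):
            if (coll_id, old_index) in live:
                # Survivors keep their relative order but get contiguous indices.
                remap[(coll_id, old_index)] = (coll_id, new_index)
                new_index += 1
    return remap


def rewrite_references(references, remap):
    """Rewrite all references of surviving objects to use the new ids.

    Assumes the live set is closed under references (every target of a
    surviving object also survives); otherwise a KeyError is raised.
    """
    return {
        remap[src]: [remap[dst] for dst in targets]
        for src, targets in references.items()
        if src in remap
    }


# Toy input: collection 1 has three objects, collection 2 has one.
collection_sizes = {1: 3, 2: 1}
live = {(2, 0), (1, 2), (1, 0)}
references = {
    (2, 0): [(1, 2)],
    (1, 0): [],
    (1, 1): [],        # pruned
    (1, 2): [(1, 0)],
}

remap = reindex_after_pruning(collection_sizes, live)
rewritten = rewrite_references(references, remap)
```

Here the object at old index 2 in collection 1 moves to index 1 once index 1 is pruned, and the reference from `(2, 0)` has to follow it. If indices were fixed before pruning, that reference would point at the wrong slot, which is why the ordering relative to the final index assignment matters.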

@tmadlener
Collaborator

As discussed during the EDM4hep meeting on Sep 10, we don't think a truly generic solution is possible for this. Hence, we decided that, at least for the foreseeable future, developments in this direction will (and should) focus on implementing the necessary functionality for specific use cases where the expected outcome is well defined, e.g. skimming MCParticles.

3 participants