
pickle support #81

Open
adizhol opened this issue Feb 1, 2023 · 4 comments

Comments


adizhol commented Feb 1, 2023

Please add pickle support.

Now there's an error:

  pickle.dump(
  File "stringsource", line 2, in pykdtree.kdtree.KDTree.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

djhoese commented Feb 1, 2023

What do you mean by "now"? This has always been this way I assume. If you have an idea for implementing a fix then feel free to make a pull request or at least propose it here.


djhoese commented Feb 1, 2023

I realize now that by "now" you mean "currently it produces an error". Understood.

I'm curious: what is your use case?


adizhol commented Feb 1, 2023

hmmm... I have a class that contains a kdtree object. I want to save the object to disk for re-use later.
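For that use case there is an interim workaround that doesn't require changes to pykdtree: keep the raw input points on your containing class, drop the un-picklable tree in `__getstate__`, and rebuild it in `__setstate__` after unpickling. The sketch below is a hedged illustration of the pattern only; `PointIndex` and its sorted-list "index" are hypothetical stand-ins for a class that would hold a pykdtree `KDTree` built from the stored points.

```python
import pickle

class PointIndex:
    """Pattern for pickling a class that holds an un-picklable index.

    Only the raw points are pickled; the index is rebuilt on load.
    The sorted-list index here is a stand-in for a pykdtree KDTree.
    """

    def __init__(self, points):
        self.points = list(points)
        self._index = self._build_index()

    def _build_index(self):
        # Stand-in for something like KDTree(np.asarray(self.points));
        # any deterministic structure rebuilt from self.points works.
        return sorted(range(len(self.points)), key=self.points.__getitem__)

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_index")  # drop the un-picklable part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._index = self._build_index()  # rebuild after unpickling

idx = PointIndex([3.0, 1.0, 2.0])
clone = pickle.loads(pickle.dumps(idx))
```

The trade-off is that the tree is reconstructed on every load, which costs build time but keeps the pickle small and version-independent.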


djhoese commented Feb 3, 2023

We actually have a similar "need" in the Pytroll/Satpy community, where we use dask together with pykdtree's KDTree. Our usage works fine in dask's threaded context, where everything is on the same system and nothing needs to be serialized. But in the distributed context of multiple processes or multiple nodes, where things do have to be serialized, we crash the processing because dask can't serialize the KDTree.

The hard part of serializing the KDTree is that the actual tree part of the structure is built from low-level C structs:

typedef struct
{
    ${DTYPE} cut_val;
    int8_t cut_dim;
    uint32_t start_idx;
    uint32_t n;
    ${DTYPE} cut_bounds_lv;
    ${DTYPE} cut_bounds_hv;
    struct Node_${DTYPE} *left_child;
    struct Node_${DTYPE} *right_child;
} Node_${DTYPE};

typedef struct
{
    ${DTYPE} *bbox;
    int8_t no_dims;
    uint32_t *pidx;
    struct Node_${DTYPE} *root;
} Tree_${DTYPE};

If the algorithm were implemented as a contiguous block of memory, then it'd be pretty simple to say "pretend this C pointer points to an array of numbers, let numpy serialize it", but it isn't that simple. I'm not sure of the easiest way to go about this without rewriting large chunks of the C code. I suppose we could write a C function that traverses the C structs and converts them into a single array. I wonder how that affects look-ups...

That said, @storpipfugl and @mraspaud, what do you think about performance and memory usage if the C code were rewritten to allocate a single block of memory for all the nodes at once, if that's even possible? If it is possible algorithm-wise, wouldn't that be much faster than allocating a possibly non-contiguous piece of memory for every Node struct?
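For illustration, the "traverse the structs and convert them into arrays" idea amounts to replacing child pointers with integer indices into parallel arrays. The pure-Python sketch below is only a toy model under that assumption; the `Node` class loosely mirrors a couple of fields from the C struct above and is not pykdtree's actual layout.

```python
class Node:
    """Toy pointer-style tree node, loosely mirroring the C struct's
    cut_val/cut_dim/left_child/right_child fields."""

    def __init__(self, cut_val, cut_dim, left=None, right=None):
        self.cut_val, self.cut_dim = cut_val, cut_dim
        self.left, self.right = left, right

def flatten(root):
    """Pre-order walk turning pointers into integer indices (-1 = no child).

    The four parallel lists are plain numeric arrays, so they could be
    handed to numpy (or pickle) for serialization.
    """
    cut_vals, cut_dims, lefts, rights = [], [], [], []

    def visit(node):
        if node is None:
            return -1
        i = len(cut_vals)
        cut_vals.append(node.cut_val)
        cut_dims.append(node.cut_dim)
        lefts.append(-1)   # patched below once the child index is known
        rights.append(-1)
        lefts[i] = visit(node.left)
        rights[i] = visit(node.right)
        return i

    visit(root)
    return cut_vals, cut_dims, lefts, rights

def rebuild(cut_vals, cut_dims, lefts, rights, i=0):
    """Inverse of flatten: turn index-based arrays back into a pointer tree."""
    if i == -1:
        return None
    return Node(cut_vals[i], cut_dims[i],
                rebuild(cut_vals, cut_dims, lefts, rights, lefts[i]),
                rebuild(cut_vals, cut_dims, lefts, rights, rights[i]))

root = Node(2.5, 0, Node(1.0, 1), Node(4.0, 1, None, Node(5.0, 0)))
arrays = flatten(root)
clone = rebuild(*arrays)
```

The same index-based layout is what an "allocate all nodes in one block" rewrite of the C code would produce naturally, since a node's children would live at known offsets in the block instead of behind malloc'd pointers.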
