
pickle support #81

Open
adizhol opened this issue Feb 1, 2023 · 4 comments

Comments


adizhol commented Feb 1, 2023

Please add pickle support.

Now there's an error:

  pickle.dump(
  File "stringsource", line 2, in pykdtree.kdtree.KDTree.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

djhoese commented Feb 1, 2023

What do you mean by "now"? This has always been this way I assume. If you have an idea for implementing a fix then feel free to make a pull request or at least propose it here.


djhoese commented Feb 1, 2023

I realize now that by "now" you mean "currently it produces an error". Understood.

I'm curious: what is your use case?


adizhol commented Feb 1, 2023

hmmm... I have a class that contains a kdtree object. I want to save the object to disk for re-use later.
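For that use case there is an interim workaround that doesn't require changes to pykdtree: keep the raw input points on your containing class, drop the un-picklable tree in `__getstate__`, and rebuild it in `__setstate__` after unpickling. The sketch below is a hedged illustration of the pattern only; `PointIndex` and its sorted-list "index" are hypothetical stand-ins for a class that would hold a pykdtree `KDTree` built from the stored points.

```python
import pickle

class PointIndex:
    """Pattern for pickling a class that holds an un-picklable index.

    Only the raw points are pickled; the index is rebuilt on load.
    The sorted-list index here is a stand-in for a pykdtree KDTree.
    """

    def __init__(self, points):
        self.points = list(points)
        self._index = self._build_index()

    def _build_index(self):
        # Stand-in for something like KDTree(np.asarray(self.points));
        # any deterministic structure rebuilt from self.points works.
        return sorted(range(len(self.points)), key=self.points.__getitem__)

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_index")  # drop the un-picklable part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._index = self._build_index()  # rebuild after unpickling

idx = PointIndex([3.0, 1.0, 2.0])
clone = pickle.loads(pickle.dumps(idx))
```

The trade-off is that the tree is reconstructed on every load, which costs build time but keeps the pickle small and version-independent.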


djhoese commented Feb 3, 2023

We actually have a similar "need" in the Pytroll/Satpy community, where we use dask together with pykdtree's KDTree. Our usage works fine in dask's threaded context, where everything is on the same system and nothing needs to be serialized. But in the distributed context of multiple processes or multiple nodes, where things do have to be serialized, we crash the processing because dask can't serialize the KDTree.

The hard part of serializing the KDTree is that the actual tree part of the structure is built from low-level C structs:

typedef struct
{
    ${DTYPE} cut_val;
    int8_t cut_dim;
    uint32_t start_idx;
    uint32_t n;
    ${DTYPE} cut_bounds_lv;
    ${DTYPE} cut_bounds_hv;
    struct Node_${DTYPE} *left_child;
    struct Node_${DTYPE} *right_child;
} Node_${DTYPE};

typedef struct
{
    ${DTYPE} *bbox;
    int8_t no_dims;
    uint32_t *pidx;
    struct Node_${DTYPE} *root;
} Tree_${DTYPE};

If the algorithm were implemented as a contiguous block of memory, then it'd be pretty simple to say "pretend this C pointer points to an array of numbers, let numpy serialize it", but it isn't that simple. I'm not sure of the easiest way to go about this without rewriting large chunks of the C code. I suppose we could write a C function that traverses the C structs and converts them into a single array. I wonder how that affects look-ups...

That said, @storpipfugl and @mraspaud, what do you think about performance and memory usage if the C code were rewritten to allocate a single block of memory for all the nodes at once, if that's even possible? If it is possible algorithm-wise, wouldn't that be much faster than allocating a possibly non-contiguous piece of memory for every Node struct?
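For illustration, the "traverse the structs and convert them into arrays" idea amounts to replacing child pointers with integer indices into parallel arrays. The pure-Python sketch below is only a toy model under that assumption; the `Node` class loosely mirrors a couple of fields from the C struct above and is not pykdtree's actual layout.

```python
class Node:
    """Toy pointer-style tree node, loosely mirroring the C struct's
    cut_val/cut_dim/left_child/right_child fields."""

    def __init__(self, cut_val, cut_dim, left=None, right=None):
        self.cut_val, self.cut_dim = cut_val, cut_dim
        self.left, self.right = left, right

def flatten(root):
    """Pre-order walk turning pointers into integer indices (-1 = no child).

    The four parallel lists are plain numeric arrays, so they could be
    handed to numpy (or pickle) for serialization.
    """
    cut_vals, cut_dims, lefts, rights = [], [], [], []

    def visit(node):
        if node is None:
            return -1
        i = len(cut_vals)
        cut_vals.append(node.cut_val)
        cut_dims.append(node.cut_dim)
        lefts.append(-1)   # patched below once the child index is known
        rights.append(-1)
        lefts[i] = visit(node.left)
        rights[i] = visit(node.right)
        return i

    visit(root)
    return cut_vals, cut_dims, lefts, rights

def rebuild(cut_vals, cut_dims, lefts, rights, i=0):
    """Inverse of flatten: turn index-based arrays back into a pointer tree."""
    if i == -1:
        return None
    return Node(cut_vals[i], cut_dims[i],
                rebuild(cut_vals, cut_dims, lefts, rights, lefts[i]),
                rebuild(cut_vals, cut_dims, lefts, rights, rights[i]))

root = Node(2.5, 0, Node(1.0, 1), Node(4.0, 1, None, Node(5.0, 0)))
arrays = flatten(root)
clone = rebuild(*arrays)
```

The same index-based layout is what an "allocate all nodes in one block" rewrite of the C code would produce naturally, since a node's children would live at known offsets in the block instead of behind malloc'd pointers.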
