Handling None values in data #3

kb0304 · 2019-09-11T07:06:11Z

The data can contain None values. Here is the approach I came up with to handle them; I look forward to hearing your comments.

We ignore all Nones at the start and the end of the data. The method for sampling a point from a bucket can then be modified as follows (the bucketing method is the same as before):

// Pseudocode
// let a_avg be the average of all the triangle areas calculated so far

if (bucket is all Nones){
    return None
}

if (left bucket is all Nones && right bucket is all Nones){
    // Maybe there could be a criterion for choosing among the available non-None points?
    return first non-None element
}

if (left bucket is all Nones && right bucket is not all Nones){
    // let r_avg[x], r_avg[y] be the averages of the non-None points in the right bucket
    return the point (p[x], p[y]) that maximizes the triangle area 0.5 * |r_avg[y] - p[y]| * |r_avg[x] - p[x]|
}

if (left bucket is not all Nones && right bucket is all Nones){
    // let l_avg[x], l_avg[y] be the averages of the non-None points in the left bucket
    return the point (p[x], p[y]) that maximizes the triangle area 0.5 * |l_avg[y] - p[y]| * |l_avg[x] - p[x]|
}
      
Calculate the average only using non None values. 
Compute the area of each point, let p_max be the point with the maximum area, and the max area be a_max

// Idea: None is the most significant sample if there are enough Nones in the bucket
// and the triangle areas computed for the remaining points are not significant enough
if (number of Nones in the bucket > bucket_size/2 && a_max < a_avg){
    return None
}

return p_max
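
To make the proposal easier to discuss, here is a minimal Python sketch of the steps above. It is not the library's implementation. The names `trim_nones` and `select_point` are hypothetical, points are assumed to be `(x, y)` tuples where a missing sample has `y = None`, and `bucket_size` is taken to be the length of the bucket. The pseudocode does not say which triangle is used in the general case, so the sketch assumes the triangle formed by the left bucket average, the candidate point, and the right bucket average.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: a point is (x, y), and y is None for a missing sample.
Point = Tuple[float, Optional[float]]


def trim_nones(data: Sequence[Point]) -> List[Point]:
    """Drop the leading and trailing points whose y value is None."""
    start, end = 0, len(data)
    while start < end and data[start][1] is None:
        start += 1
    while end > start and data[end - 1][1] is None:
        end -= 1
    return list(data[start:end])


def _valid(bucket: Sequence[Point]) -> List[Point]:
    """Return only the points that carry a y value."""
    return [p for p in bucket if p[1] is not None]


def _average(points: Sequence[Point]) -> Tuple[float, float]:
    """Average x and y over points that are already known to be non-None."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def select_point(bucket: Sequence[Point], left: Sequence[Point],
                 right: Sequence[Point], a_avg: float) -> Optional[Point]:
    """Pick the sample for `bucket`; None means "emit a gap".

    `a_avg` is the running average of the areas selected so far and is
    maintained by the caller (an assumption of this sketch).
    """
    valid = _valid(bucket)
    if not valid:  # bucket is all Nones
        return None

    left_valid, right_valid = _valid(left), _valid(right)
    if not left_valid and not right_valid:
        return valid[0]  # first non-None element

    if not left_valid or not right_valid:
        # Only one neighbour has data: use the area formula from the post.
        ax, ay = _average(right_valid or left_valid)
        return max(valid, key=lambda p: 0.5 * abs(ay - p[1]) * abs(ax - p[0]))

    # General case (assumed triangle: left average, point, right average).
    (lx, ly), (rx, ry) = _average(left_valid), _average(right_valid)

    def area(p: Point) -> float:
        return 0.5 * abs((lx - rx) * (p[1] - ly) - (lx - p[0]) * (ry - ly))

    p_max = max(valid, key=area)
    n_none = len(bucket) - len(valid)
    if n_none > len(bucket) / 2 and area(p_max) < a_avg:
        return None  # the gap is treated as the most significant sample
    return p_max
```

A caller would run `trim_nones` once on the whole series, bucket the result as before, and then call `select_point` per bucket while updating `a_avg` from the areas it has seen.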
