Handling None values in data #3

Open
kb0304 opened this issue Sep 11, 2019 · 0 comments

kb0304 commented Sep 11, 2019

The data can contain None values, and I thought of the following approach to handle them. Looking forward to hearing your comments on it.

We ignore all the Nones at the start and end of the data. The method for sampling a point from a bucket can then be modified as follows (the bucketing method is the same as before):
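A minimal Python sketch of the trimming step, assuming the series is a plain Python list in which a missing sample is `None`. `strip_none_ends` is a hypothetical helper name, not something from the project. Interior Nones are deliberately kept so that the sampling rule can deal with them.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def strip_none_ends(data: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Drop the Nones at the start and end of data; interior Nones are kept."""
    start = 0
    # Skip leading Nones.
    while start < len(data) and data[start] is None:
        start += 1
    end = len(data)
    # Skip trailing Nones (never moving past start).
    while end > start and data[end - 1] is None:
        end -= 1
    return list(data[start:end])
```

For example, `strip_none_ends([None, (0, 1.0), None, (2, 3.0), None])` returns `[(0, 1.0), None, (2, 3.0)]`, and an all-None series becomes an empty list.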

// Pseudocode
let a_avg be the average of all the areas calculated so far

if (bucket is all Nones){
    return None
}

if (left bucket is all Nones && right bucket is all Nones){
    // Maybe there could be a criterion for choosing among the available non-None points?
    return first non-None element
}

if (left bucket is all Nones && right bucket is not all Nones){
    // let r_avg[x], r_avg[y] be the average of the non-None points in the right bucket
    return the point (p[x], p[y]) that maximizes the triangle area 0.5 * |r_avg[y] - p[y]| * |r_avg[x] - p[x]|
}

if (left bucket is not all Nones && right bucket is all Nones){
    // let l_avg[x], l_avg[y] be the average of the non-None points in the left bucket
    return the point (p[x], p[y]) that maximizes the triangle area 0.5 * |l_avg[y] - p[y]| * |l_avg[x] - p[x]|
}

calculate the averages using only non-None values
compute the triangle area for each non-None point; let p_max be the point with the maximum area and a_max be that area

// Idea: None is the most significant sample if there are enough Nones in the bucket
// and the triangle areas computed for the remaining points are not significant enough
if (number of Nones in the bucket > bucket_size/2 && a_max < a_avg) {
    return None
}

return p_max
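A rough Python sketch of the sampling rule above, under assumptions that do not come from the project: each element is either `None` or an `(x, y)` tuple; `select_point`, `prev_selected`, and the helper functions are hypothetical names; the caller supplies `a_avg`, the running average of the areas computed so far; and, assuming the existing code follows the standard Largest-Triangle-Three-Buckets (LTTB) step, the general case uses the previously selected point as the left vertex and falls back to the left bucket's non-None average when that selection was `None`.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
MaybePoint = Optional[Point]


def non_none(bucket: Sequence[MaybePoint]) -> List[Point]:
    """The points of a bucket that are not None."""
    return [p for p in bucket if p is not None]


def average(points: Sequence[Point]) -> Point:
    """Mean x and mean y of a non-empty list of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def two_point_area(anchor: Point, p: Point) -> float:
    """0.5 * |dx| * |dy|, the area used when one neighbouring bucket is all None."""
    return 0.5 * abs(p[0] - anchor[0]) * abs(p[1] - anchor[1])


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of triangle abc via the 2D cross product, as in standard LTTB."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))


def select_point(
    left: Sequence[MaybePoint],
    bucket: Sequence[MaybePoint],
    right: Sequence[MaybePoint],
    prev_selected: MaybePoint,
    a_avg: float,
) -> MaybePoint:
    points = non_none(bucket)
    if not points:                      # bucket is all Nones
        return None

    left_pts, right_pts = non_none(left), non_none(right)
    if not left_pts and not right_pts:  # both neighbours are all Nones
        return points[0]                # first non-None element
    if not left_pts:                    # only the right bucket has data
        r_avg = average(right_pts)
        return max(points, key=lambda p: two_point_area(r_avg, p))
    if not right_pts:                   # only the left bucket has data
        l_avg = average(left_pts)
        return max(points, key=lambda p: two_point_area(l_avg, p))

    # General case. ASSUMPTION: left vertex is the previous selection,
    # or the left bucket's non-None average if that selection was None.
    a = prev_selected if prev_selected is not None else average(left_pts)
    c = average(right_pts)              # right average over non-None points only
    areas = [triangle_area(a, b, c) for b in points]
    a_max = max(areas)
    p_max = points[areas.index(a_max)]

    # None wins if more than half the bucket is None and the best
    # point's area is below the running average area.
    if len(bucket) - len(points) > len(bucket) / 2 and a_max < a_avg:
        return None
    return p_max
```

The edge cases return early, so the majority-None rule only applies in the general case, which matches the order of the checks in the pseudocode. How `a_avg` is updated (for instance, from every area in `areas` or only from each bucket's `a_max`) is left to the caller here.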