K nearest neighbors Algorithm #723
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   95.02%   95.07%   +0.04%
==========================================
  Files         303      304       +1
  Lines       22577    22809     +232
==========================================
+ Hits        21454    21685     +231
- Misses       1123     1124       +1

☔ View full report in Codecov by Sentry.
This commit adds the K-nearest neighbors algorithm. Important points regarding the commit:
1. Euclidean distance is used to compute the distance between two points.
2. Only classification has been implemented, using a majority vote among the k nearest neighbors of an input point.
3. An enum has been created for distance computation so that it can be extended to other distance techniques/formulae such as Manhattan distance.
4. The classification label is of type 'String'.
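For readers skimming the thread, here is a minimal, self-contained sketch of the classification step described in point 2 (majority vote among the k nearest neighbors). This is illustrative only, not the PR's code; the name majority_vote and the (distance, label) representation are assumptions made for the example.

use std::collections::HashMap;

// Illustrative sketch, not the PR's implementation.
// Given a (distance, label) pair for every training point, return the label
// that occurs most often among the k closest points.
fn majority_vote(mut scored: Vec<(f64, String)>, k: usize) -> Option<String> {
    // Sort ascending by distance so the first k entries are the nearest neighbors.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));

    // Count how often each label appears among the k nearest neighbors.
    let mut votes: HashMap<String, usize> = HashMap::new();
    for (_, label) in scored.into_iter().take(k) {
        *votes.entry(label).or_insert(0) += 1;
    }

    // The label with the most votes wins (ties resolved arbitrarily here).
    votes.into_iter().max_by_key(|(_, count)| *count).map(|(label, _)| label)
}

fn main() {
    let scored = vec![
        (0.5, "Sport".to_string()),
        (1.2, "Sport".to_string()),
        (2.0, "News".to_string()),
    ];
    assert_eq!(majority_vote(scored, 2), Some("Sport".to_string()));
}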
fn eucledian_distance(source_point: &Point, destination_point: &Point) -> f64 {
    let distance = square_root(
        (destination_point.x - source_point.x) * (destination_point.x - source_point.x)
            + (destination_point.y - source_point.y) * (destination_point.y - source_point.y),
    );
    abs(distance)
}
Why not use the existing implementation?
We can. However, I have changed my code to use n-dimensional data; the existing implementation only handles 2D data.
x: f64, // x-axis
y: f64, // y-axis
label: String, // A label denotes the classification of the data. For instance, 'Football' might be labeled as 'Sport'
These comments do not provide any additional value and can be removed.
Why do you restrict yourself to 2D data?
Yeah, I agree. I started with 2D data; I have now updated the code to support n-dimensional data.
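To make the n-dimensional change concrete, a sketch of what a generalized Euclidean distance could look like is shown below. The struct layout and the field name coordinates are assumptions for illustration, not necessarily the PR's final shape.

// Hypothetical n-dimensional point: coordinates of arbitrary length plus a class label.
struct Point {
    coordinates: Vec<f64>,
    label: String,
}

// Euclidean distance generalized to n dimensions: the square root of the sum of
// squared per-coordinate differences. Both points are assumed to have the same dimension.
fn euclidean_distance(a: &Point, b: &Point) -> f64 {
    a.coordinates
        .iter()
        .zip(b.coordinates.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}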
neighbors: Vec<&Point>, // The training data which essentially consists of a set of points on the X-Y axis represented as a vector of Points
input_point: Point, // The input point requiring classification
k: usize, // The value of 'K'. For example, if K equals 4, classification is determined by the majority vote among the 4 nearest neighbors
distance_computation: DistanceMeasurementFormula, // An enum employed to specify the technique/formula for calculating the distance between two points
Why not make it a function pointer, so the caller could define their own distance without modifying your code?
Yeah, I was also thinking about using traits to achieve the same. But I assume traits add a vtable lookup overhead, which can be avoided with function pointers. I have updated the code accordingly.
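Below is a rough sketch of what the function-pointer version might look like. The type alias DistanceFn, the helper names, and the (Vec<f64>, String) training representation are assumptions for illustration only, not the PR's final API.

// The caller supplies any metric matching this signature; a plain `fn` pointer
// is a statically known, non-capturing function, so no vtable lookup is involved.
type DistanceFn = fn(&[f64], &[f64]) -> f64;

fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

fn manhattan(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f64>()
}

// Find the label of the single nearest training point under the supplied metric
// (the k = 1 case, kept short on purpose).
fn nearest_label<'a>(
    training: &'a [(Vec<f64>, String)],
    query: &[f64],
    distance: DistanceFn,
) -> Option<&'a str> {
    training
        .iter()
        .min_by(|(a, _), (b, _)| {
            distance(a.as_slice(), query).total_cmp(&distance(b.as_slice(), query))
        })
        .map(|(_, label)| label.as_str())
}

fn main() {
    let training = vec![
        (vec![0.0, 0.0], "A".to_string()),
        (vec![5.0, 5.0], "B".to_string()),
    ];
    // The same classifier works with either metric; swapping is just passing a different fn.
    assert_eq!(nearest_label(&training, &[1.0, 1.0], euclidean), Some("A"));
    assert_eq!(nearest_label(&training, &[4.0, 4.0], manhattan), Some("B"));
}

One nuance worth noting: a generic parameter bound by Fn(&[f64], &[f64]) -> f64 would also avoid dynamic dispatch (only dyn Fn goes through a vtable), so the trade-off between a plain fn pointer and a generic is mainly API simplicity versus support for closures that capture state.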
…ters for KNN Implementation. This commit: 1. Replaces the enum used for distance computations with function pointers. 2. Adds support for multiple dimensions; previously the algorithm only considered two-dimensional data, and the new change lets the input be n-dimensional. 3. Applies the Euclidean distance formula to n-dimensional data.
Hi, I made the changes suggested. Let me know if there is anything else that's needed :)
This pull request has been automatically marked as abandoned because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Please ping one of the maintainers once you commit the changes requested or make improvements on the code. If this is not the case and you need some help, feel free to ask for help in our Gitter channel. Thank you for your contributions!
Pull Request Template
Description
Please include a summary of the change and which issue (if any) is fixed.
A brief description of the algorithm and your implementation method can be helpful too. If the implemented method/algorithm is not so
well-known, it would be helpful to add a link to an article explaining it with more details.
Type of change
Please delete options that are not relevant.
Checklist:
- I ran cargo clippy --all -- -D warnings just before my last commit and fixed any issue that was found.
- I ran cargo fmt just before my last commit.
- I ran cargo test just before my last commit and all tests passed.
- I added my algorithm to the corresponding mod.rs file within its own folder, and in any parent folder(s).
- I added my algorithm to DIRECTORY.md with the correct link.
- I checked CONTRIBUTING.md and my code follows its guidelines.

Please make sure that if there is a test that takes too long to run (> 300ms), you #[ignore] that or try to optimize your code or make the test easier to run. We have this rule because we have hundreds of tests to run; if each one of them took 300ms, we would have to wait for a long time.