
K nearest neighbors Algorithm #723

Closed
wants to merge 9 commits into from

Conversation

haswanth10

This commit adds the K-nearest neighbors algorithm.

Important points regarding this commit:

  1. Euclidean distance is used to compute the distance between two points.
  2. Only classification has been implemented, using a majority vote among the k nearest neighbors of an input point.
  3. An enum has been created for distance computation so that it can be extended to other distance formulae, such as Manhattan distance.
  4. The classification label is of type 'String'.
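The approach described in points 1–4 can be sketched roughly as follows. This is a minimal illustration under assumptions, not the PR's actual code: `Point`, `classify`, and the flat `coords` vector are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the described approach: Euclidean distance plus a
// majority vote among the k nearest labeled points.
#[derive(Clone)]
struct Point {
    coords: Vec<f64>,
    label: String,
}

fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

fn classify(neighbors: &[Point], input: &[f64], k: usize) -> Option<String> {
    // Sort training points by distance to the input point.
    let mut by_distance: Vec<&Point> = neighbors.iter().collect();
    by_distance.sort_by(|a, b| {
        euclidean_distance(&a.coords, input)
            .partial_cmp(&euclidean_distance(&b.coords, input))
            .unwrap()
    });
    // Majority vote among the k nearest.
    let mut votes: HashMap<&str, usize> = HashMap::new();
    for p in by_distance.iter().take(k) {
        *votes.entry(p.label.as_str()).or_insert(0) += 1;
    }
    votes
        .into_iter()
        .max_by_key(|&(_, count)| count)
        .map(|(label, _)| label.to_string())
}
```

A production version would avoid recomputing distances inside the comparator and handle NaN coordinates instead of calling `unwrap`.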

Pull Request Template

Description

Please include a summary of the change and which issue (if any) is fixed.
A brief description of the algorithm and your implementation method can be helpful too. If the implemented method/algorithm is not
well known, it would be helpful to add a link to an article explaining it in more detail.

Type of change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)

Checklist:

  • [x] I ran the below commands using the latest version of rust nightly.
  • [x] I ran cargo clippy --all -- -D warnings just before my last commit and fixed any issue that was found.
  • [x] I ran cargo fmt just before my last commit.
  • [x] I ran cargo test just before my last commit and all tests passed.
  • [x] I added my algorithm to the corresponding mod.rs file within its own folder, and in any parent folder(s).
  • [x] I added my algorithm to DIRECTORY.md with the correct link.
  • [x] I checked CONTRIBUTING.md and my code follows its guidelines.

Please make sure that if there is a test that takes too long to run (> 300ms), you #[ignore] it or
try to optimize your code or make the test easier to run. We have this rule because we have hundreds of
tests to run; if each one of them took 300ms, we would have to wait for a long time.
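For reference, Rust's built-in `#[ignore]` attribute implements exactly this rule: an ignored test is skipped by the default `cargo test` run and executed only with `cargo test -- --ignored`. The function names below are illustrative.

```rust
// Stand-in for real long-running work (hypothetical example).
fn expensive_computation() -> u64 {
    (0..1_000u64).sum()
}

#[cfg(test)]
mod tests {
    use super::expensive_computation;

    #[test]
    #[ignore] // excluded from the default `cargo test` run
    fn slow_exhaustive_check() {
        assert_eq!(expensive_computation(), 499_500);
    }
}
```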

@haswanth10 haswanth10 requested review from imp2002 and vil02 as code owners May 21, 2024 01:43
@haswanth10 haswanth10 marked this pull request as draft May 21, 2024 01:45
@codecov-commenter

codecov-commenter commented May 21, 2024

Codecov Report

Attention: Patch coverage is 99.56897% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.07%. Comparing base (0b8ba06) to head (56bd101).

Files Patch % Lines
...chine_learning/optimization/k_nearest_neighbors.rs 99.56% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   95.02%   95.07%   +0.04%     
==========================================
  Files         303      304       +1     
  Lines       22577    22809     +232     
==========================================
+ Hits        21454    21685     +231     
- Misses       1123     1124       +1     


@haswanth10 haswanth10 marked this pull request as ready for review May 21, 2024 23:22
Haswanth Dasiga added 4 commits May 21, 2024 16:24
 This commit adds the K-nearest neighbors algorithm.

 Important points regarding this commit:
   1. Euclidean distance is used to compute the distance between two points.
   2. Only classification has been implemented, using a majority vote among the k nearest neighbors of an input point.
   3. An enum has been created for distance computation so that it can be extended to other distance formulae, such as Manhattan distance.
   4. The classification label is of type 'String'.
@haswanth10 haswanth10 changed the title Knn haswanth K nearest neighbors Algorithm May 21, 2024
Comment on lines 34 to 41
fn eucledian_distance(source_point: &Point, destination_point: &Point) -> f64 {
    let distance = square_root(
        (destination_point.x - source_point.x) * (destination_point.x - source_point.x)
            + (destination_point.y - source_point.y) * (destination_point.y - source_point.y),
    );
    abs(distance)
}
}
Member

Why not use the existing implementation?

Author

We can. However, I have changed my code to use n-dimensional data. The existing implementation is only for 2D data.
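The n-dimensional generalization mentioned in this reply could look like the following sketch. The function name and slice-based signature are assumptions, not necessarily the PR's final code.

```rust
// Sketch: Euclidean distance generalized from 2D to n dimensions.
// Assumes both slices have the same length; a production version might
// return a Result instead of panicking via debug_assert.
fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}
```

Squaring also makes the final `abs` in the 2D snippet unnecessary, since the sum of squares is already non-negative.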

Comment on lines 12 to 14
x: f64, // x-axis
y: f64, // y-axis
label: String, // A label denotes the classification of the data. For instance, 'Football' might be labeled as 'Sport'
Member

These comments do not provide any additional value and can be removed.

Why do you restrict yourself to 2D data?

Author

Yeah, I agree. I started with 2D data. I have now updated the code to support n-dimensional data.

neighbors: Vec<&Point>, // The training data which essentially consists of a set of points on the X-Y axis represented as vector of Points,
input_point: Point, // The input point requiring classification
k: usize, // The value of 'K'. For example, if K equals 4, classification is determined by the majority vote among the 4 nearest neighbors
distance_computation: DistanceMeasurementFormula, // An enum employed to specify the technique/formula for calculating the distance between two points
Member

Why not make it a function pointer, so the caller could define their own distance without modifying your code?

Author

Yeah, I was also thinking about using traits to achieve the same. But I assume traits add a vtable lookup overhead, which can be avoided with function pointers. I updated the code accordingly.
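The function-pointer design discussed here might look like the following sketch, replacing the `DistanceMeasurementFormula` enum; the names `DistanceFn`, `nearest`, and `manhattan` are hypothetical. An `fn` pointer is called directly with no trait-object vtable, and callers can pass any metric with a matching signature.

```rust
// Sketch: the distance metric as a plain function pointer, so callers can
// supply their own formula without modifying the KNN code.
type DistanceFn = fn(&[f64], &[f64]) -> f64;

// Returns the index of the point closest to `input` under `distance`.
fn nearest(points: &[Vec<f64>], input: &[f64], distance: DistanceFn) -> Option<usize> {
    points
        .iter()
        .enumerate()
        .min_by(|a, b| {
            distance(a.1, input)
                .partial_cmp(&distance(b.1, input))
                .unwrap()
        })
        .map(|(i, _)| i)
}

// A caller-defined metric: Manhattan (taxicab) distance.
fn manhattan(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}
```

A trait bound such as `F: Fn(&[f64], &[f64]) -> f64` would be equally static (monomorphized, no vtable) and would additionally accept capturing closures; dynamic dispatch only arises with `dyn Fn` trait objects.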

Haswanth Dasiga and others added 3 commits May 28, 2024 18:13
…ters for KNN Implementation

This commit
1. Replaces the enum used for distance computations with function pointers.
2. Adds support for multiple dimensions. Previously the algorithm only considered two-dimensional data; the new change lets the input be n-dimensional data.
3. Applies the Euclidean distance formula to n-dimensional data.
@haswanth10 haswanth10 requested a review from vil02 May 29, 2024 01:24
@haswanth10
Author

haswanth10 commented Jun 6, 2024

Hi, I made the changes suggested. Let me know if there is anything else that's needed :)


github-actions bot commented Jul 6, 2024

This pull request has been automatically marked as abandoned because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 6, 2024

Please ping one of the maintainers once you commit the changes requested or make improvements on the code. If this is not the case and you need some help, feel free to ask for help in our Gitter channel. Thank you for your contributions!

@github-actions github-actions bot closed this Jul 13, 2024
3 participants