Chapter 8 - KNN - Notebook - index selection is offset #52

Open
bakergreg opened this issue May 18, 2021 · 0 comments

../homlr/notebooks/08-knn.nb.html
Header: Measuring similarity, Figure 8.1

Issue:
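The returned index is offset by -1 for every neighbor whose row ID is greater than `home`.
Because the search runs against `df[-home, ]`, the index refers to row positions in the reduced data. Using it to define knn_homes from ames_train therefore does not capture the correct rows unless other changes are made.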

> nrow(df)
[1] 2049
> nrow(df[-home,])
[1] 2048
> nrow(ames_train)
[1] 2049

The removed row is not typically standardized.

home <- 70 # used this ID as an example to show the offset
k = 10

Original: as.vector(FNN::knnx.index(df[-home, ], df[home, ], k = k))
[1] 411 296 69 70 513 183 293 186 184 515

Updated: FNN::knn.index(df[], k = k)[home, ]
[1] 412 297 69 71 514 184 294 187 185 516

If a neighbor's ID is greater than `home`, the ID returned by the original code is offset by -1.
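
The offset arises because the search reports row positions within `df[-home, ]`, not row IDs of `df`. A minimal base-R sketch of that effect, and of one possible way to map positions back to the original IDs, follows. The toy matrix and the `nearest()` helper are hypothetical stand-ins for `df` and `FNN::knnx.index()`, used only so the example runs without extra packages.

```r
# Hypothetical toy data standing in for df: one feature, six rows
toy <- matrix(c(1, 2, 3, 3.4, 5, 6), ncol = 1)
home <- 3
k <- 2

# Brute-force Euclidean search (assumed stand-in for FNN::knnx.index)
nearest <- function(data, query, k) {
  d <- sqrt(rowSums(sweep(data, 2, query)^2))
  order(d)[seq_len(k)]
}

# Search the data with the home row removed, as in the original code
reduced <- toy[-home, , drop = FALSE]
pos <- nearest(reduced, toy[home, ], k)   # positions within `reduced`

# Positions at or after `home` are one less than the original row ID.
# Mapping through the retained row IDs restores the original numbering.
orig_ids <- seq_len(nrow(toy))[-home]
index <- orig_ids[pos]

print(pos)    # 3 2  (positions in the reduced data)
print(index)  # 4 2  (row IDs in the full data)
```

With this mapping, the original `df[-home, ]` call could be kept and only the returned index translated before subsetting `ames_train`.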

ORIGINAL

home <- 30
k = 10

index <- as.vector(FNN::knnx.index(df[-home, ], df[home, ], k = k))
knn_homes <- ames_train[c(home, index), ]

UPDATED

index <- FNN::knn.index(df[], k = k)[home,]

  • change the function to knn.index() and select the `home` row outside the function call; as.vector() is also no longer required

or

index <- as.vector(FNN::knnx.index(df[], df[home, ], k = k+1))

  • replace df[-home, ] with df[]; this keeps df the same length, so `home` is part of the search set
  • add k = k+1 -- knnx.index() will return `home` as one of the neighbors (typically as the first one, unless there are duplicates)

index <- index[!index %in% home]

  • this will remove the home id from index
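
The `k + 1` variant above can be sketched in base R as follows. The toy matrix and the `nearest()` helper are hypothetical stand-ins for `df` and `FNN::knnx.index()`, and the final `head()` call is an added guard (not part of the code above) so the result never exceeds `k` entries.

```r
# Hypothetical toy data standing in for df: one feature, six rows
toy <- matrix(c(1, 2, 3, 3.4, 5, 6), ncol = 1)
home <- 3
k <- 2

# Brute-force Euclidean search (assumed stand-in for FNN::knnx.index)
nearest <- function(data, query, k) {
  d <- sqrt(rowSums(sweep(data, 2, query)^2))
  order(d)[seq_len(k)]
}

# Search the full data, asking for one extra neighbour to make room for home
index <- nearest(toy, toy[home, ], k + 1)

# Drop home by ID rather than by position, then trim to k as a safeguard
index <- index[!index %in% home]
index <- head(index, k)

print(index)  # 4 2  (row IDs in the full data, no offset)
```

Dropping `home` by ID instead of assuming it is the first element keeps the result correct even when another row ties with it at distance zero.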

Potential issues with this updated code:
My code is not clean, and there is likely a better way.

With the full df as the search set, the target home is included in the returned index. Removing `home` is required for the rest of the code to work.

If rows are duplicated in df, the first duplicate is returned as the first record.
For example, rows 102 and 143 of iris are identical (see iris[c(102, 143), ]), so ID 102 would be selected first if the target is 143.
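
The duplicate case can be checked with the built-in `iris` data. This is a base-R sketch only: the distance ranking below uses `order()`, which breaks ties by position, as an assumed stand-in for the FNN search described above.

```r
# Rows that repeat an earlier row exactly
dup_rows <- which(duplicated(iris))
print(dup_rows)  # 143

# Confirm that rows 102 and 143 hold the same values
same_values <- all(iris[102, ] == iris[143, ])
print(same_values)  # TRUE

# Rank all rows by Euclidean distance to the target on the numeric columns
X <- as.matrix(iris[, 1:4])
target <- 143
d <- sqrt(rowSums(sweep(X, 2, X[target, ])^2))
first_two <- order(d)[1:2]

# Both rows are at distance zero; the earlier row ID comes first
print(first_two)  # 102 143
```

Because the target is not guaranteed to come back first, removing it by ID (`index[!index %in% home]`) is safer than dropping the first element.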

@bakergreg bakergreg changed the title Chapter 8 - KNN - Notebook Chapter 8 - KNN - Notebook - index selection is offset May 18, 2021