Chapter 8 - KNN - Notebook - index selection is offset #52

bakergreg · 2021-05-18T18:06:13Z

../homlr/notebooks/08-knn.nb.html
Header Measuring similarity Figure 8.1

Issue:
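The returned index is off by one (ID - 1) for every neighbor whose row ID is greater than `home`, because knnx.index() reports positions within df[-home, ], which has one row fewer than ames_train.
When that index is used to build knn_homes from ames_train, the wrong rows are selected unless the index is corrected.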

> nrow(df)
[1] 2049
> nrow(df[-home, ])
[1] 2048
> nrow(ames_train)
[1] 2049

The removed row is not typically standardized.

home <- 70 # used this ID as an example to show the offset
k = 10

Original: as.vector(FNN::knnx.index(df[-home, ], df[home, ], k = k))
[1] 411 296 69 70 513 183 293 186 184 515

Updated: FNN::knn.index(df[], k = k)[home, ]
[1] 412 297 69 71 514 184 294 187 185 516

If a neighbor's ID is greater than `home`, the returned index is offset by -1.
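As a quick check of the offset, the sketch below takes the positions printed by the original call and shifts every position at or after `home` up by one. The name `index_fixed` is hypothetical, and this is only an illustration of the offset in base R, not necessarily the fix the notebook should adopt. The result matches the values printed by the updated call.

```r
# Positions returned by the original call, copied from the output above.
home <- 70
index <- c(411, 296, 69, 70, 513, 183, 293, 186, 184, 515)

# df[-home, ] drops one row, so every position >= home refers to the
# next row of the full data; shift those positions up by one.
index_fixed <- ifelse(index >= home, index + 1, index)
index_fixed
```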

ORIGINAL

home <- 30
k = 10

index <- as.vector(FNN::knnx.index(df[-home, ], df[home, ], k = k))
knn_homes <- ames_train[c(home, index), ]

UPDATED

index <- FNN::knn.index(df[], k = k)[home,]

Change the function to knn.index() and select the home row outside the function call. as.vector() is also no longer required.

or

index <- as.vector(FNN::knnx.index(df[], df[home, ], k = k+1))

Replacing df[-home, ] with df[] keeps df at its full length, so home is included in the search set.
Setting k = k + 1 compensates for this: knnx.index() will return home itself (typically as the first neighbor, unless there are duplicate rows).

index <- index[!index %in% home]

This removes the home ID from index.
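The k + 1 approach can be tried without FNN on toy data. The sketch below is a brute-force stand-in for knnx.index(), written in base R. The random matrix, `home <- 7`, and `k <- 3` are assumptions chosen only for illustration.

```r
set.seed(1)
df <- matrix(rnorm(40), ncol = 2)  # 20 rows of toy data (assumption)
home <- 7
k <- 3

# Euclidean distance from the target row to every row, itself included.
d <- sqrt(rowSums(sweep(df, 2, df[home, ])^2))

# Ask for k + 1 neighbors, because the target is its own nearest neighbor.
index <- order(d)[seq_len(k + 1)]

# Drop the target so that exactly k neighbors remain.
index <- index[!index %in% home]
index
```

Because the search runs over the full data, the positions in `index` are already row IDs of the original table and need no shifting.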

Potential issues with this updated code:
My code is not clean, and there is probably a better way.

The target home is included in the index initially returned by the search, so it must be removed for the rest of the code to work.

If rows are duplicated in df, the first of the duplicate observations is returned as the first record.
For example, the rows in iris[c(102, 143), ] are identical, so ID 102 would be selected first even if the target is 143.
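The duplicate case can be reproduced with a brute-force distance computation in base R. This is a sketch, not FNN itself, so tie-breaking inside knnx.index() may differ. `k <- 3` is an arbitrary choice.

```r
x <- as.matrix(iris[, 1:4])  # numeric columns only
home <- 143
k <- 3

# Euclidean distance from the target row to every row, itself included.
d <- sqrt(rowSums(sweep(x, 2, x[home, ])^2))

zero_rows <- which(d == 0)   # rows 102 and 143 are identical
first <- order(d)[1]         # 102: ties are kept in row order, so home is not first

# Filtering by ID still removes the target wherever it appears.
index <- order(d)[seq_len(k + 1)]
index <- index[!index %in% home]
```

Filtering with `%in%` rather than dropping the first element is what keeps this case correct: the duplicate row 102 stays in the result and only the target is removed.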

bakergreg changed the title ~~Chapter 8 - KNN - Notebook~~ Chapter 8 - KNN - Notebook - index selection is offset May 18, 2021
