Issue:
For any neighbor whose row ID is greater than home, index holds ID - 1 instead of the ID, because the search runs on df[-home, ].
When index is then used to build knn_homes from ames_train, the wrong rows are selected unless the code is changed.
ORIGINAL
index <- as.vector(FNN::knnx.index(df[-home, ], df[home, ], k = k))
knn_homes <- ames_train[c(home, index), ]
UPDATED
index <- FNN::knn.index(df[], k = k)[home,]
Change the function to knn.index() and select the home row outside the function call. as.vector() is also no longer required.
or
index <- as.vector(FNN::knnx.index(df[], df[home, ], k = k+1))
Remove df[-home, ]: this keeps df the same length, so home itself is part of the search set.
Set k = k + 1: knnx.index() will return home as well (typically as the first neighbor, unless there are duplicate rows).
index <- index[!index %in% home]
This removes the home ID from index.
Potential issues with this updated code:
My code is not clean and there is a better way.
The target home is included in the original index selection. Removing home is required for the rest of the code to work.
If rows are duplicated in df, the first of the duplicate observations is returned as the first record.
For example, in iris[c(102, 143), ], ID 102 would be selected first if the target is 143.
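The duplicate case can be checked directly on iris. This is a base-R sketch that uses a brute-force distance ordering rather than FNN, so the tie-breaking shown here is an assumption about how ties resolve, not a statement about FNN's internals. The names x, d and nearest are made up for the illustration.

```r
x <- as.matrix(iris[, 1:4])   # numeric columns only
home <- 143

# Row 143 is an exact duplicate of row 102
which(duplicated(iris))

# Brute-force Euclidean distances from the target to every row, itself included
d <- sqrt(rowSums(sweep(x, 2, x[home, ])^2))

# order() breaks ties by position, so 102 comes before 143
nearest <- order(d)[1:3]

# Dropping home by value works wherever it lands in the result
nearest[!nearest %in% home]
```

Filtering with %in% rather than dropping the first element is what keeps the result correct when a duplicate row sorts ahead of home.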
bakergreg changed the title from "Chapter 8 - KNN - Notebook" to "Chapter 8 - KNN - Notebook - index selection is offset" on May 18, 2021
../homlr/notebooks/08-knn.nb.html
Header Measuring similarity Figure 8.1
> nrow(df)
[1] 2049
> nrow(df[-home, ])
[1] 2048
> nrow(ames_train)
[1] 2049
The removed row is not typically standardized.
home <- 70
# used this ID as an example to show the offset
k = 10
Original
as.vector(FNN::knnx.index(df[-home ,], df[home, ], k = k))
[1] 411 296 69 70 513 183 293 186 184 515
Updated
FNN::knn.index(df[], k = k)[home,]
[1] 412 297 69 71 514 184 294 187 185 516
If an ID is greater than home, the ID returned by the original code is offset by -1.
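The offset can be reproduced without the Ames data. The following is a minimal base-R sketch, assuming a small random matrix toy and a hypothetical brute-force helper nn_index() that stands in for the FNN calls. Neither appears in the notebook.

```r
# Hypothetical stand-in for a k-NN index search (brute force, Euclidean).
# Returns the positions in `data` of the k rows closest to `query`.
nn_index <- function(data, query, k) {
  d <- sqrt(rowSums(sweep(data, 2, query)^2))
  order(d)[seq_len(k)]
}

set.seed(1)
toy  <- matrix(rnorm(40), ncol = 2)  # 20 made-up observations
home <- 7
k    <- 5

# Search with the home row dropped: positions refer to toy[-home, ]
idx_dropped <- nn_index(toy[-home, ], toy[home, ], k)

# Search on the full data with k + 1, then remove home itself
idx_full <- nn_index(toy, toy[home, ], k + 1)
idx_full <- idx_full[!idx_full %in% home]

# Positions at or above home are shifted down by one in idx_dropped,
# so adding 1 to them recovers the row numbers of the full data
remapped <- ifelse(idx_dropped >= home, idx_dropped + 1, idx_dropped)
print(rbind(idx_dropped, remapped, idx_full))
```

Remapping the shifted positions is a third option alongside the two fixes proposed in this issue. Searching the full data and filtering out home avoids the arithmetic altogether.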