-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create Median Imputation feature #130
Conversation
fredabisai
commented
Nov 20, 2023
- Create Median Imputation feature
- Write Unit Test for Median Imputation
for (String value : datasetColumn) { | ||
if (!isNumeric(value)) { | ||
throw new RuntimeException( | ||
"AVERAGE imputation strategy can not be used on categorical features. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be Median?
} | ||
} | ||
|
||
private boolean isNumeric(String value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about value.isBlank(), why is a blank String numeric. I dont say this is wrong, just want to understand.
return value.matches("-?\\d+(\\.\\d+)?") || value.isBlank(); | ||
} | ||
|
||
private double calculateMedian(String[] datasetColumn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will run way to slow
|
||
private double calculateMedian(String[] datasetColumn) { | ||
List<String> filteredDatasetColumn = | ||
Arrays.stream(datasetColumn).filter((value) -> !value.isBlank()).toList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method utilizes two separate streams - one for filtering non-blank values and another for converting to Double objects. This approach increases overhead due to the creation of multiple streams and lists.
You use List of Doubles, this is much less efficient compared double[] here. So pls use an array and instead of Collections.sort use Arrays.sort(numbers);
this will outperform Collections.sort by far.
List<String> filteredDatasetColumn = | ||
Arrays.stream(datasetColumn).filter((value) -> !value.isBlank()).toList(); | ||
List<Double> filteredDatasetColumnInNumbers = | ||
new java.util.ArrayList<>(filteredDatasetColumn.stream().map(Double::parseDouble).toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use static import here.