Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Median Imputation feature #130

Merged
merged 5 commits into from
Nov 20, 2023
Merged

Conversation

fredabisai
Copy link
Collaborator

  • Create Median Imputation feature
  • Write Unit Test for Median Imputation

for (String value : datasetColumn) {
if (!isNumeric(value)) {
throw new RuntimeException(
"AVERAGE imputation strategy can not be used on categorical features. "
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be Median?

}
}

private boolean isNumeric(String value) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about value.isBlank(), why is a blank String numeric. I dont say this is wrong, just want to understand.

return value.matches("-?\\d+(\\.\\d+)?") || value.isBlank();
}

private double calculateMedian(String[] datasetColumn) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will run way to slow


private double calculateMedian(String[] datasetColumn) {
List<String> filteredDatasetColumn =
Arrays.stream(datasetColumn).filter((value) -> !value.isBlank()).toList();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method utilizes two separate streams - one for filtering non-blank values and another for converting to Double objects. This approach increases overhead due to the creation of multiple streams and lists.

You use List of Doubles, this is much less efficient compared double[] here. So pls use an array and instead of Collections.sort use Arrays.sort(numbers); this will outperform Collections.sort by far.

List<String> filteredDatasetColumn =
Arrays.stream(datasetColumn).filter((value) -> !value.isBlank()).toList();
List<Double> filteredDatasetColumnInNumbers =
new java.util.ArrayList<>(filteredDatasetColumn.stream().map(Double::parseDouble).toList());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use static import here.

@Samyssmile Samyssmile merged commit 2502a0a into main Nov 20, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants