Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parsing of non-finite values #3942
Fix parsing of non-finite values #3942
Changes from 1 commit
c87b8e2
48b6e2c
182e924
7d81cdb
c3ab42f
fa06bf6
fbf8637
84c3af8
af9c942
c75f756
f81ed85
25a7a29
5562014
e29ae45
6272a30
fd2a504
108c6de
be702c0
1aea9d9
4f06459
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great clean code @mjmckp ;)
Instead of allocating a string, and since you already have a lambda, what about defining a case-insensitive comparison lambda and use
std::equal
to check the "inf" and "nan" values below?Although this is the rare branch there might be longer strings than inf or nan which might be parsed here and might slow down our parsing without need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example: https://thispointer.com/c-case-insensitive-string-comparison-using-stl-c11-boost-library/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this hardly seems worth it, this branch is rarely invoked, meanwhile a colossal amount of strings are being allocated in splitting and parsing the input file, so these few extra allocations are a drop in the ocean.
I think our time is better spent adding robust round-trip tests to ensure major bugs like this don't occur again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Will you add such tests?
I actually run such tests but on an external lgbm provider and didn't have nan nor inf on my model, but would prefer to see them in lgbm's CI so any breakage is detected immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could assist in adding the tests, any idea where these should go and how best to implement them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjmckp #3555 has been just merged. We will really appreciate new GTest-compatible tests in this or in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, would you mind pointing me towards a similar kind of test that I can use as a starting point please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we don't have any tests yet. This is something that we should concentrate on in the near future. For now, I think you can take a look at tests from @AlbertoEAF in #3997.
https://github.com/microsoft/LightGBM/pull/3997/files#diff-c363eba6eda99d9e560f8341a1fc8fe02e885d2256db2482e1c543430a25666d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjmckp I've merged this PR with the aim to not delay the upcoming release. Please feel free to add tests in a new PR. We'll be very grateful! And thanks a lot for the bug fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StrikerRUS Thanks a lot, I'll get up to speed on how the new tests work and add some tests for this in a new PR soon.