WIP: BUG: ABNORMAL_TERMINATION test from real data #212

Open
wants to merge 2 commits into master
Conversation

@Eh2406 Eh2406 commented May 25, 2018

At yesterday's meeting with @semcogli and @janowicz we discussed whether #211 would have a practical impact on our models, so I thought I would run an experiment. On our computer, I updated our copy of UrbanSim to include #211 and refit all the elcm models.

Indeed, many of our elcm models reliably throw the non-convergence warning 'ABNORMAL_TERMINATION_IN_LNSRCH'. According to the internet, that diagnostic code is usually caused by programmer error, specifically a bad gradient calculation.

To dig in, I tried to make a minimal reproducible example: I took one of our models and reduced the sample size and removed columns as far as I could while still triggering the ABNORMAL_TERMINATION. That became the new test_mnl_estimate_temp. It needs a better name, or better yet to be put in the same form as the parametric tests.

With an MRE in hand, I wanted to look deeper: is the gradient calculation correct? I added a wrapper function mnl_loglik_print that prints the difference between the gradient returned by mnl_loglik and the gradient as approximated by scipy.optimize.approx_fprime. Indeed, the gradient does not look correct! In fact, it doesn't look good on most of the existing tests.
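
For context, the check is roughly the following (a simplified sketch, not the actual mnl_loglik_print; loglik here stands in for any objective that returns a (value, gradient) pair):

```python
import numpy as np
from scipy.optimize import approx_fprime

def gradient_check(loglik, beta, *args, eps=1e-6):
    """Compare an analytic gradient against a finite-difference one.

    Assumes loglik(beta, *args) returns (objective value, gradient),
    the same calling convention L-BFGS-B uses when given a combined
    objective/gradient function.
    """
    _, analytic = loglik(beta, *args)
    numeric = approx_fprime(beta, lambda b: loglik(b, *args)[0], eps)
    diff = np.abs(np.asarray(analytic).ravel() - numeric)
    print("max abs difference:     ", diff.max())
    print("max relative difference:", (diff / (np.abs(numeric) + 1e-12)).max())
```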

Eh2406 commented May 25, 2018

Edit: Closed by accident.

I am having trouble getting the calculus to agree with the code.
@smmaurer, @timothyb0912, you are more familiar with this; does the math look correct?

@Eh2406 Eh2406 closed this May 25, 2018
@smmaurer (Member) commented:

Hi @Eh2406, this is really interesting. I'm not familiar enough with the estimation code to have a good sense of what's going on, though.

One thing I'm curious about is how much the non-convergence is affecting the estimation results. When there's a non-converging model and you reduce the sample size until the error disappears, do the coefficients and standard errors change or are they similar?

Another thing to try could be to estimate some of the same models using PyLogit, which is a completely different codebase, and compare the results. This is a bit of a pain because the data formats and model expressions are different, but here's a notebook with an example. ("ChoiceModels engine" is essentially urbansim.urbanchoice.mnl and "PyLogit engine" is pylogit.mnl.MNL.)

https://github.com/UDST/choicemodels/blob/master/notebooks/Destination-choice-models-02.ipynb

I'm going to take a look and see if I can get non-convergence with some Bay Area data.

@smmaurer smmaurer reopened this May 28, 2018
Eh2406 commented May 29, 2018

One thing I'm curious about is how much the non-convergence is affecting the estimation results.

I believe the technically correct answer is "unspecified", in the sense that scipy makes no guarantees about what the returned value will be. It can return the initial point; we are seeing this in some of our data sets. It can return a point related to the stopping condition, in this case the last place tested while determining that our gradient is incorrect. Most perniciously, it can return the correct answer, as in #211, which makes it hard to notice that there is a problem.
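
(For reference, this is roughly what checking the flag looks like with scipy's fmin_l_bfgs_b, the routine that emits ABNORMAL_TERMINATION_IN_LNSRCH. negative_loglik is a placeholder for a function returning (value, gradient), not the actual urbansim call.)

```python
from scipy.optimize import fmin_l_bfgs_b

def fit_and_check(negative_loglik, beta0, args=()):
    # negative_loglik(beta, *args) must return (objective value, gradient).
    beta, fval, info = fmin_l_bfgs_b(negative_loglik, beta0, args=args)
    if info['warnflag'] != 0:
        # warnflag 2 covers "stopped for other reasons", which is where
        # ABNORMAL_TERMINATION_IN_LNSRCH shows up; the returned beta may be
        # the starting point, a partial step, or (if we are lucky) the optimum.
        print("did not converge:", info['task'])
    return beta, fval, info
```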

Another thing to try could be to estimate some of the same models using PyLogit.

That does seem like a good test. Different code, different bugs. From a brief review of the PyLogit code, I don't see them testing the success flag, so they are also not reporting on non-convergence bugs.

I'm going to take a look and see if I can get non-convergence with some Bay Area data.

Thank you. And thank you for taking this bug report seriously.

My attempt to work through the calculus will come in the next post.

timothyb0912 commented May 29, 2018

@Eh2406 I don't test the success flag because I've often found it to be meaningless.

I've seen many examples where the gradient is essentially zero (e.g. all terms being on the order of 1e-07) but the success flag says that the optimizer did not converge. I've also seen cases where the optimizer "converged", in the sense that the change in the function evaluations was less than a small threshold, yet the gradient shows that a local optimum hadn't yet been reached.

I've checked my gradients using automatic differentiation (which has higher accuracy than the finite difference approximations used in approx_fprime) and they've all been correct to every reported decimal place. You can also find the analytic derivations for the MNL's gradient and hessian here.

If you do test against PyLogit, I'd suggest directly examining the reported gradient instead of paying attention to the success flag. The gradient can be viewed by accessing the gradient attribute on a fitted model object.
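
Roughly like this (an untested sketch with toy long-format data; the column names and specification are just placeholders, not a model from this PR):

```python
from collections import OrderedDict
import numpy as np
import pandas as pd
import pylogit

# Toy long-format data: 2 choice situations x 3 alternatives each.
long_df = pd.DataFrame({
    "obs_id":      [1, 1, 1, 2, 2, 2],
    "alt_id":      [1, 2, 3, 1, 2, 3],
    "choice":      [1, 0, 0, 0, 0, 1],
    "travel_time": [10.0, 20.0, 30.0, 15.0, 5.0, 25.0],
})

# One travel_time coefficient shared across all three alternatives.
specification = OrderedDict()
specification["travel_time"] = [[1, 2, 3]]

model = pylogit.create_choice_model(data=long_df,
                                    alt_id_col="alt_id",
                                    obs_id_col="obs_id",
                                    choice_col="choice",
                                    specification=specification,
                                    model_type="MNL")
model.fit_mle(np.zeros(1))

# Rather than trusting a success flag, inspect the gradient at the
# reported solution; it should be essentially zero at a local optimum.
print(np.abs(model.gradient).max())
```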

Eh2406 commented May 29, 2018

I don't test the success flag as I've found it to often be meaningless.

Indeed, I stand corrected. My previous post is overstated. It is more accurate to say that most forms of non-convergence return a reasonable approximation to the correct result. My impression is that ABNORMAL_TERMINATION_IN_LNSRCH is not one of them.

I've checked my gradients using automatic differentiation

Automatic differentiation would be a much better test of the gradient implementation! It is immensely comforting to know you have tested PyLogit so thoroughly! Do you have advice on how to use AD to test the urbansim implementation?
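
To make sure I understand the idea, something like this is what I have in mind (a rough, untested sketch using the autograd package; the reference log-likelihood below is my own plain restatement of the MNL formula, not urbansim's mnl_loglik):

```python
import autograd.numpy as np
from autograd import grad

def mnl_loglik_reference(beta, x, chosen):
    """Plain MNL log-likelihood, written so autograd can differentiate it.

    x:      (n_obs, n_alts, n_vars) array of alternative attributes
    chosen: (n_obs, n_alts) one-hot array marking the chosen alternative
    """
    utilities = np.dot(x, beta)                          # (n_obs, n_alts)
    logsums = np.log(np.sum(np.exp(utilities), axis=1))  # (n_obs,)
    return np.sum(np.sum(chosen * utilities, axis=1) - logsums)

# Gradient with respect to beta, by automatic differentiation.
ad_gradient = grad(mnl_loglik_reference)

# Then, for some test beta / x / chosen:
#     np.allclose(ad_gradient(beta, x, chosen), gradient_under_test)
```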

You can also find the analytic derivations for the MNL's gradient

Thank you! I will go read that now! My aforementioned attempts to work through the calculus on my own have not been going well.

Eh2406 commented Jun 4, 2018

Work got busy last week and I did not have time to dig in as much as I'd like, but here is what I have found so far.

The example reproduces with only the first column of data.

The gradient calculation matches the one from @timothyb0912's paper. So maybe it has to do with the underflow checks, or something else not considered by that paper? Specifically, log(prob) == -inf. @timothyb0912, how does PyLogit handle underflow?
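
For concreteness, the kind of underflow handling I have in mind is the usual log-sum-exp trick, which computes log(prob) without ever forming prob (a sketch, not a claim about what either library currently does):

```python
import numpy as np
from scipy.special import logsumexp

def log_probabilities(utilities):
    """Log of MNL choice probabilities for a (n_obs, n_alts) utility array,
    computed in log space so the result stays finite even when the
    probabilities themselves underflow to zero."""
    return utilities - logsumexp(utilities, axis=1, keepdims=True)

def log_probabilities_naive(utilities):
    """Naive version: exp() can overflow or underflow, and np.log(0.0) is -inf."""
    expu = np.exp(utilities)
    return np.log(expu / expu.sum(axis=1, keepdims=True))
```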
