Facility for model comparison #64

Open
fgregg opened this issue May 15, 2015 · 5 comments

fgregg (Contributor) commented May 15, 2015

  • Have the current model output its predictions in CSV
  • Write a naive model script that also outputs predictions in CSV
  • Write a script that takes the prediction CSVs and reports model performance on the test sample

When developing alternate models, this final script will make it easy to evaluate them against the current one.
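
A rough sketch of what that final script could look like, assuming each prediction CSV carries an inspection ID and a score, and that the test-sample outcomes live in their own CSV (all file and column names below are made up):

```r
# Sketch: read prediction CSVs and report performance on the test sample.
# `inspection_id`, `score`, and `violation` are placeholder column names.

outcomes <- read.csv("test_outcomes.csv")      # inspection_id, violation (0/1)

report <- function(pred_file) {
  preds <- read.csv(pred_file)                 # inspection_id, score
  dat <- merge(preds, outcomes, by = "inspection_id")
  predicted <- as.integer(dat$score > 0.5)     # naive cutoff, for illustration only
  c(accuracy = mean(predicted == dat$violation),
    hit_rate = mean(dat$violation[predicted == 1]))
}

sapply(c("model_predictions.csv", "naive_model_predictions.csv"), report)
```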

orborde commented May 22, 2015

Looking at the code and white paper, it seems like the way you evaluated the model was to use the glm output to create an inspection "schedule" (a list of the order in which to conduct the inspections) and then analyze how quickly that schedule located the violations, as opposed to looking at the model's confusion matrix or other traditional measures of model performance.

So an evaluation script should probably take the "schedule" as input and rerun the analyses in the white paper to compute some metrics. I'm planning on hacking one together today in the course of trying some other ML techniques on this dataset.
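
Something along these lines, assuming the schedule is just an ordered vector of inspection IDs and we know which of those inspections actually turned up critical violations (names are hypothetical):

```r
# Sketch: evaluate a proposed schedule (inspection IDs in the order they would
# be performed) by how early the known violations appear in it.
evaluate_schedule <- function(schedule_ids, violation_ids) {
  hits  <- schedule_ids %in% violation_ids
  curve <- cumsum(hits) / length(violation_ids)   # fraction of violations found after k inspections
  list(mean_position_of_violation = mean(which(hits)),              # lower is better
       found_after_25pct = curve[ceiling(length(schedule_ids) * 0.25)])
}

# e.g. evaluate_schedule(schedule_ids = my_schedule, violation_ids = observed_violations)
```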

geneorama (Member) commented

@orborde you are exactly correct. The individual glm scores are used to sort the inspections into a schedule, and that schedule matters more than the individual scores. I don't know of a way to directly optimize schedule performance; hopefully optimizing the scores results in a better schedule.
(btw, thanks for introducing the word schedule. That's a useful addition to the vocabulary of this project.)
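
For instance, going from scores to a schedule is just a sort (hypothetical column names):

```r
# Sketch: scores -> schedule. Inspect establishments in decreasing score order.
scores   <- read.csv("glm_scores.csv")     # establishment_id, score (placeholders)
schedule <- scores$establishment_id[order(scores$score, decreasing = TRUE)]
head(schedule)                             # the first establishments to visit
```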

geneorama (Member) commented

Sorry that this has been taking so long; I've been busy with a few other things.

Here's an update on what I'm thinking for the plan:

Refactor the 30 script to only "run the model"; specifically (a rough sketch follows this list):

  • Import pre-calculated features and raw data
  • Put the data into a form that works for the model
    • Convert to proper class (e.g. matrix / numeric)
    • Manage factors (currently with model.matrix)
  • Create test / train index
  • Run model
  • Calculate prediction (test and train)
  • Save prediction
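
A rough sketch of those steps, using a plain logistic glm as a stand-in for the actual model, with made-up file names, column names, and cutoff date:

```r
# Sketch of the slimmed-down "run the model" script. File names, column names,
# the cutoff date, and the use of a plain logistic glm are all placeholders.

# Import pre-calculated features and raw data
dat <- na.omit(read.csv("precalculated_features.csv"))
dat$Inspection_Date <- as.Date(dat$Inspection_Date)

# Put the data into a form that works for the model:
# convert to proper class and manage factors with model.matrix
dat$Inspector <- factor(dat$Inspector)
X <- model.matrix(~ Inspector + pastCritical + ageAtInspection, data = dat)
y <- as.numeric(dat$criticalFound)

# Create test / train index (time-based)
iiTrain <- which(dat$Inspection_Date <  as.Date("2014-07-01"))
iiTest  <- which(dat$Inspection_Date >= as.Date("2014-07-01"))

# Run model
fit <- glm.fit(x = X[iiTrain, ], y = y[iiTrain], family = binomial())

# Calculate prediction (test and train)
dat$score <- plogis(as.numeric(X %*% fit$coefficients))

# Save prediction
write.csv(dat[, c("Inspection_ID", "score")], "model_30_predictions.csv", row.names = FALSE)
```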

The plots and benchmarks should go into another report / file, which will provide a clearer comparison.

I was thinking it would be nice to make a demonstration "31" file that has an alternative model, and an accompanying report that compares the results between 30 and 31. That way someone could just pick up from there and have a facility for comparison.

For the 31 demonstration file, I was thinking it would be nice to simply use the past "average" value, similar to how the baselines look in Kaggle competitions. Rather than having a "submission", the user could compile results in the report. To guard against overfitting, we would check that the results still make sense on even more recent data.
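
As a sketch, the 31 baseline could be as simple as scoring each test-period inspection by the establishment's past average violation rate (again, made-up names):

```r
# Sketch of a naive "31" baseline: score each test-period inspection by the
# establishment's past average violation rate. Names are placeholders.
dat <- read.csv("precalculated_features.csv")
dat$Inspection_Date <- as.Date(dat$Inspection_Date)
train <- dat[dat$Inspection_Date <  as.Date("2014-07-01"), ]
test  <- dat[dat$Inspection_Date >= as.Date("2014-07-01"), ]

past_avg <- aggregate(criticalFound ~ License, data = train, FUN = mean)
names(past_avg)[2] <- "score"

baseline <- merge(test[, c("Inspection_ID", "License")], past_avg,
                  by = "License", all.x = TRUE)
baseline$score[is.na(baseline$score)] <- mean(train$criticalFound)  # unseen licenses: overall rate
write.csv(baseline[, c("Inspection_ID", "score")], "model_31_predictions.csv", row.names = FALSE)
```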

This might be a separate issue, but perhaps it would be nice to publish the 40 prediction scripts. The model uses the food inspection history, but the prediction uses current business licenses as the basis. Ultimately the logic in the prediction script would be important for testing on new samples, especially if this is going to be an ongoing evaluation. @tomschenkjr - you may have some thoughts on this?

geneorama (Member) commented

@orborde or @fgregg Do you have any recommendations for best practices for model comparison?

As I mentioned in @orborde's pull request, the format of the food inspection data has changed dramatically as of last year, and there is a need to reconsider the model.


orborde commented Apr 1, 2019

I don't have any "best practices" in mind offhand. I do think that generating an inspection schedule and simulating to see how quickly that schedule finds violations, or how efficiently (in terms of number of violations per inspection performed), is a solid approach.

Note that you'll need to be careful not to directly evaluate your inspection schedule on the data used to train the model generating that schedule. See https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets
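
Concretely, a time-based split is probably the simplest safeguard here: fit on earlier inspections, build the schedule for later ones, and only score it there. A sketch with hypothetical names:

```r
# Sketch: fit only on earlier inspections, schedule and score only the later ones.
dat <- na.omit(read.csv("precalculated_features.csv"))   # placeholder file/columns
dat$Inspection_Date <- as.Date(dat$Inspection_Date)

cutoff <- as.Date("2014-07-01")
train  <- dat[dat$Inspection_Date <  cutoff, ]
test   <- dat[dat$Inspection_Date >= cutoff, ]

fit <- glm(criticalFound ~ pastCritical + ageAtInspection, data = train, family = binomial)
test$score <- predict(fit, newdata = test, type = "response")

# Schedule the held-out period by score and see how fast violations turn up
ord   <- order(test$score, decreasing = TRUE)
found <- cumsum(test$criticalFound[ord]) / sum(test$criticalFound)
plot(found, type = "l", xlab = "inspections performed", ylab = "fraction of violations found")
```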

Beyond that, I don't know enough about your problem space to give you more specific advice. Let me know what you wind up trying, though!
