xgboost RF bump for n=10M #14

Closed
szilard opened this issue May 16, 2015 · 4 comments

Comments

@szilard
Owner

szilard commented May 16, 2015

Moved "something weird happens for the largest data size (n=10M) - the trend for Run time and AUC "breaks", see figures main README" issue from #2 here.

@szilard
Owner Author

szilard commented May 16, 2015

@tqchen says: "I now think the bump in running time was due to cache-line issues, as there is some non-consecutive memory access going on in xgboost. Having a larger number of rows could mean a lower cache hit rate, but the impact should not be large, as this has to do with micro-level optimization.

I have pushed some optimizations that do prefetching, which should in general improve the speed of xgboost. It would be great if you could run another round of tests."
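The cache explanation above can be illustrated with a small micro-benchmark. This is only a sketch, not xgboost's code: it sums the same array once in consecutive order and once in a shuffled order, so the visiting order is the only thing that changes. The names `sum_in_order` and `time_access` and the array size are assumptions made for illustration, and in pure Python the interpreter overhead hides much of the gap that a C++ implementation would see.

```python
import random
import time
from array import array


def sum_in_order(values, order):
    """Sum values[i] for each i in order; only the visiting order varies."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def time_access(values, order):
    """Return (sum, elapsed seconds) for one pass over values in the given order."""
    start = time.perf_counter()
    total = sum_in_order(values, order)
    return total, time.perf_counter() - start


n = 1_000_000  # assumed size, chosen to exceed typical CPU cache sizes
values = array("d", (float(i % 97) for i in range(n)))

sequential = list(range(n))         # consecutive access pattern
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # non-consecutive access pattern

seq_total, seq_time = time_access(values, sequential)
rnd_total, rnd_time = time_access(values, shuffled)

print(f"sequential: {seq_time:.3f}s  shuffled: {rnd_time:.3f}s")
```

Both passes do the same arithmetic and return the same sum, so any timing difference comes from the memory access pattern. The actual numbers depend on the machine and are not meant to reproduce the benchmark results.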

@tqchen

tqchen commented May 16, 2015

Thanks. I should note that the bump in the trend is still likely to exist, but its impact should be limited because of the micro-level effect I mentioned. At least now we know the cause of this phenomenon :)

@tqchen

tqchen commented May 16, 2015

As for the AUC part, I find that, at least for boosting, treating all the dates and times as integers gives a definitely better result.
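A minimal sketch of what treating dates and times as integers could look like, next to a one-hot encoding for comparison. The column names, the `c-` prefix on the labels, and the helpers `to_int` and `one_hot` are all hypothetical; the thread does not show the actual preprocessing.

```python
def to_int(raw):
    """Parse a label such as 'c-8' or '1934' into an integer.

    The 'c-' prefix is an assumed label format, used here only for illustration.
    """
    text = str(raw).strip()
    if text.startswith("c-"):
        text = text[2:]
    return int(text)


def one_hot(raw, levels):
    """Encode raw as a 0/1 indicator vector over the given levels."""
    return [1 if raw == level else 0 for level in levels]


# Hypothetical record with date and time fields stored as strings.
row = {"Month": "c-8", "DayofMonth": "c-21", "DayOfWeek": "c-7", "DepTime": "1934"}

# Integer encoding: one ordered numeric column per field.
columns = ("Month", "DayofMonth", "DayOfWeek", "DepTime")
as_integers = [to_int(row[name]) for name in columns]

# One-hot encoding of the month alone already needs 12 columns.
month_levels = [f"c-{m}" for m in range(1, 13)]
month_one_hot = one_hot(row["Month"], month_levels)

print(as_integers)    # [8, 21, 7, 1934]
print(month_one_hot)  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

With the integer encoding a tree can separate a whole range of values with a single threshold split (for example, month at most 6), whereas one-hot columns isolate one level per split. That may be part of why the integer treatment helps boosting here, though the thread does not confirm the reason.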

@szilard
Owner Author

szilard commented May 17, 2015

I think that's a reasonable explanation. I re-ran it and there was a significant improvement for n=10M (from 4800 sec to 3000 sec). The time vs. size curve is still convex, though (see updated graphs in README), but your previous comments may explain this.
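For reference, the size of that improvement works out as follows. The two run times are the ones quoted above; the variable names are only illustrative.

```python
before_sec = 4800  # n=10M run time before the prefetching change
after_sec = 3000   # n=10M run time after re-running

speedup = before_sec / after_sec
reduction = 1 - after_sec / before_sec

print(f"speedup: {speedup:.2f}x, run time reduced by {reduction:.1%}")
# speedup: 1.60x, run time reduced by 37.5%
```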
