Working with larger, custom datasets #133
Hey @ajhepburn -- to help debug this, can you give me the following information:
@jfkirk Hi, I've a feeling this may be on my end. I've switched to running your keras Book Crossing example just as is, and about 5 epochs in, Python kills my process. The TensorFlow backend and GPU usage seem to be fine, so I have no idea why there appears to be a memory leak, and I was wondering if you've come across anything similar. Here's what is dumped when I run it:
EDIT: I must've completely missed the [...]. As for the original issue, I think I wasn't forming my tensors correctly: I was trying to work with a dataset which did not have explicit ratings, so I must've messed up somewhere. I'm going to try your keras implementation and hopefully won't run into many issues.
Spoke too soon, I trained with a
This is still with the Book Crossing dataset. I'm also using a GeForce 1060 with 6GB of GPU memory.
I am currently using TensorRec for my master's project, and have been following the MovieLens guide on getting started with the library.
My dataset is a CSV file in which each row represents a tweet made by a user, plus potentially useful item metadata for a content-based system. The issue is that a single week's worth of data consists of roughly 48,000 entries, and there is only a one-to-one interaction between each tweet's author and the tweet itself.
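For context, this is roughly how I'm building the interactions matrix from the CSV. It's a minimal sketch rather than my actual code, and the column names ("user_id", "tweet_id") are placeholders, not the real schema:

```python
import pandas as pd
from scipy import sparse

tweets = pd.read_csv("tweets_week.csv")  # placeholder filename

# Map users and tweets to contiguous integer indices
user_index = tweets["user_id"].astype("category").cat.codes
item_index = tweets["tweet_id"].astype("category").cat.codes

n_users = user_index.max() + 1
n_items = item_index.max() + 1

# One implicit interaction per row (author -> tweet), kept as a sparse matrix
interactions = sparse.coo_matrix(
    ([1.0] * len(tweets), (user_index, item_index)),
    shape=(n_users, n_items),
)
```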
I initially trained the model on a month's worth of data, which caused Python to crash with an out-of-memory error, as I am running this on a machine with 16GB of RAM. I narrowed this down to a week's worth for the purposes of building a baseline. I managed to train a CF model, with the poor results that I had expected due to the lack of interactions. In order to improve precision and recall evaluation scores, I tried to add item metadata using scikit-learn's MultiLabelBinarizer, as the guide describes, but this seems to crash when I train the model.
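The metadata step looks roughly like the sketch below, assuming hashtag-style metadata (the `item_tags` list is purely illustrative). Setting `sparse_output=True` on MultiLabelBinarizer returns a scipy sparse matrix instead of a dense array, which I suspect matters at this scale:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative metadata: one list of tags per item, in item-index order
item_tags = [
    ["python", "tensorflow"],
    ["recsys"],
    ["python", "recsys", "nlp"],
]

# sparse_output=True avoids materialising a large (n_items x n_tags) dense array
mlb = MultiLabelBinarizer(sparse_output=True)
item_features = mlb.fit_transform(item_tags)

print(item_features.shape)  # (n_items, n_unique_tags)
```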
I was wondering if there are any optimisation methods to prevent this from crashing, or if it is possible to fit the models from an Iterable instead of holding everything in memory? I can't afford to shrink the dataset any further without losing valuable information.
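For completeness, the training step is roughly the following, continuing from the sketches above and following the getting-started guide; `n_components=10` is an arbitrary choice, and the user features here are just an indicator (identity) matrix since I'm not using user metadata in this sketch:

```python
import tensorrec
from scipy import sparse

# interactions, item_features and n_users come from the sketches above.
# Indicator-only user features, as in the guide's simplest setup.
user_features = sparse.identity(n_users, format="csr")

model = tensorrec.TensorRec(n_components=10)

# All inputs are scipy sparse matrices
model.fit(interactions, user_features, item_features, epochs=10, verbose=True)
```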
These are the errors I get when running any model of a larger size.