Memory Issue #69

Open
saicharishmavalluri opened this issue Oct 22, 2021 · 3 comments

@saicharishmavalluri

Hello @guxd,
I downloaded the real dataset from Google Drive and trained the model for 2 epochs, which worked fine. However, when I run code embedding with the last epoch as the optimal checkpoint, the cell is terminated after running for some time. Searching online suggests this is a RAM issue and that upgrading the RAM would fix it.
Is there a workaround, such as decreasing batch_size, chunk_size, or some other parameter?
(Current settings: 'batch_size': 100, 'chunk_size': 100000)

Update: I tried decreasing batch_size to 100 and then 64, but I still face the same issue.

[Screenshot: codeembed_error]

@guxd
Owner

guxd commented Oct 22, 2021

How about reducing chunk_size? You can also track the variable vecs and check whether its memory is actually released after calling vecs = [].
Since your memory is limited, you could also try a smaller codebase.
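One way to act on this advice is to encode the codebase one chunk at a time, write each chunk's vectors out, and only then reset `vecs`. The sketch below is a minimal illustration under stated assumptions: `encode` stands in for whatever model call produces the code vectors and `save_chunk` for however each chunk is persisted (both are hypothetical names, not this repository's API). It uses the standard-library `tracemalloc` module to record how much traced memory remains after each `vecs = []`.

```python
import tracemalloc
from typing import Callable, Iterator, List, Sequence

Vectors = List[List[float]]


def iter_chunks(items: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def embed_in_chunks(
    snippets: Sequence[str],
    encode: Callable[[Sequence[str]], Vectors],   # hypothetical model call
    chunk_size: int,
    save_chunk: Callable[[int, Vectors], None],   # hypothetical persistence hook
) -> List[int]:
    """Encode snippets chunk by chunk and return the traced memory (bytes)
    still allocated right after each `vecs = []` reset."""
    tracemalloc.start()
    remaining = []
    try:
        for chunk_id, chunk in enumerate(iter_chunks(snippets, chunk_size)):
            vecs = encode(chunk)        # only this chunk's vectors are held in memory
            save_chunk(chunk_id, vecs)  # persist them before dropping the reference
            vecs = []                   # drop the reference so the chunk can be freed
            current, _peak = tracemalloc.get_traced_memory()
            remaining.append(current)
    finally:
        tracemalloc.stop()
    return remaining
```

If the numbers in the returned list keep growing from chunk to chunk, something (for example a `save_chunk` that appends to an in-memory list) is still holding on to old vectors. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory managed by a deep-learning framework (such as GPU tensors) may not show up here and would need to be checked with that framework's own tools or with OS-level process memory.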

@saicharishmavalluri
Author


@guxd, when you say "small codebase", do you mean using the dummy dataset instead of the real dataset?
Also, in the preprocessing step, how did you extract the <method name, API sequence, tokens, description> tuples from the Java code snippets?

@guxd
Owner

guxd commented Oct 24, 2021

I mean using a subset of the use.XXX.h5 files from Google Drive, for example only 1 million code snippets.
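Restricting the codebase helps because the stored code vectors grow linearly with the number of snippets. Below is a rough back-of-the-envelope estimate, assuming float32 vectors (4 bytes per value) and a hypothetical embedding size of 512, which is not taken from the project's configuration.

```python
def vec_memory_gib(n_snippets: int, dim: int, bytes_per_value: int = 4) -> float:
    """Approximate size in GiB of an n_snippets x dim array of fixed-size values.

    Ignores container and framework overhead, so real usage will be higher.
    """
    return n_snippets * dim * bytes_per_value / 2**30


if __name__ == "__main__":
    # 1 million snippets with hypothetical 512-dim float32 vectors
    print(f"{vec_memory_gib(1_000_000, 512):.2f} GiB")  # 1.91 GiB
```

Under these assumptions, halving the number of snippets halves the estimate, which is why a 1-million-snippet subset is far easier to fit in limited RAM than a larger codebase.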
