It's keeping all processed lines in memory, making it useless for processing big files #286
Hi,

Could you try the following? This version only subscribes to each row and should not keep parsed results in memory:

```javascript
var csv = require("csvtojson");
console.log("pid", process.pid);
csv().fromFile("./1.csv")
  .subscribe(function (data) {
    return new Promise(function (resolve, reject) {
      setTimeout(function () {
        resolve();
      }, 0);
    });
  })
  .on("done", function () {
    console.log(process.memoryUsage());
  });
```

The variant below, however:

```javascript
var csv = require("csvtojson");
console.log("pid", process.pid);
csv().fromFile("./1.csv")
  .subscribe(function (data) {
    return new Promise(function (resolve, reject) {
      setTimeout(function () {
        resolve();
      }, 0);
    });
  })
  .then(function () { // this implicitly tells the parser that the user wants all JSON objects in a final array
    console.log(process.memoryUsage());
  });
```

This will keep the result in memory, because `.then` requires the full array of JSON objects to be built. Please let me know if this matches what you have experienced. Thanks.
In my case I need to call `.then` because I have to wait for the file to be fully processed before doing more things. Maybe it would be great to expose this as an option.
added parameter "needEmitAll" |
Thanks, I think I may have run into this same issue.
Hi all, thanks for documenting this issue. I'm having a related memory problem. I'm not sure whether it's due to the library or to how I've structured my code, but I'm hoping someone can point me in the right direction. I require the complete JSON result to be built because I need to do further processing afterwards, so depending on the CSV file I could be consuming a lot of memory:

```javascript
const content = await csv().fromString(csvString);
```

The problem I'm having is that when the function finishes executing, the garbage collector does not seem to be collecting the array of JSON objects once the function finishes or goes out of scope, which is what I'd expect to happen (eventually). Even forcing a collection with `global.gc()` doesn't work. Is there a possibility that something in the csvtojson library is maintaining references to elements in the JSON array, preventing collection?
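For reference, a minimal scoping sketch in plain JavaScript (no csvtojson involved) of the behaviour the comment above expects: if the only reference to the parsed array lives inside a function, the array becomes eligible for collection as soon as the function returns, provided nothing else (such as a library-internal cache) still references its elements.

```javascript
// Naive split-based parsing purely for illustration; real CSVs with quoted
// fields need a proper parser.
function summarize(csvString) {
  const rows = csvString
    .trim()
    .split("\n")
    .slice(1) // drop the header line
    .map(function (line) { return line.split(","); });
  // ...further processing of `rows` happens here...
  return rows.length; // only a small value escapes the scope
}

console.log(summarize("a,b\n1,2\n3,4\n5,6")); // → 3
```

If memory still grows after all references like `content` go out of scope, the suspicion about a retained internal reference is the natural next thing to rule out.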
I'm processing big files: more than 800 MB, 3.7M lines, and some 10x bigger than that. Doing this works fine, but I'm noticing that internally the library keeps, for every parsed line, two things: the parsed line itself (raw string) and also some kind of object.

By looking at the library's code I found (I don't know if it's the right place) that there's a `Result.needEmitAll` getter that evaluates to true or false based on some calculations. Could you please explain how I can force this to be false? When I force it to false (by changing the source code in `node_modules`), the difference over the same number of iterations is notable.