
Memory leak and incorrect behavior of stop/resume #200

Open
georgyfarniev opened this issue Jun 15, 2019 · 3 comments

Comments

@georgyfarniev

Hello.

Thanks for creating this parser; it's fast and well documented, really a wonderful job. Unfortunately, I ran into issues with pausing the stream. I need to pause it so I can write a record to the database before continuing.

I noticed incorrect behavior when I call stop/resume in the closing-tag handler, and I got an out-of-memory crash on a big XML file. Unfortunately I cannot provide that file, as it contains corporate information.

To reproduce, use the scripts below. The first script generates a large XML file, and the second demonstrates the error. Although there are 100000 records in sample.xml, the parser reports fewer closed tags (in my case, 3274). If I remove the stop and resume calls, it reports 100000 as it should.

On another file (which unfortunately I cannot provide) it causes an out-of-memory crash. I will try to create an auto-generated file that reproduces this issue.

Sample generator:

const fs = require('fs')
const stream = require('stream')

const SAMPLE_RECORDS_COUNT = 100000
const SAMPLE_PATH = './sample.xml'

class DataStream extends stream.Readable {
  constructor() {
    super()
    this.count = 0
  }

  _read() {
    // First push: XML declaration and the opening root tag
    if (this.count === 0) {
      this.push('<?xml version="1.0" encoding="UTF-8"?><root>')
      this.count++
      return
    }

    this.count++

    // After SAMPLE_RECORDS_COUNT children, close the root and end the stream
    if (this.count > SAMPLE_RECORDS_COUNT) {
      this.push('</root>')
      this.push(null)
    } else {
      this.push('<child>text</child>\n')
    }
  }
}

const ws = fs.createWriteStream(SAMPLE_PATH)
const ds = new DataStream()

ds.pipe(ws).on('close', () => {
  console.log('done')
  process.exit()
})

Reproduce:

const fs = require('fs')
const expat = require('node-expat')

console.log('started')

const parser = new expat.Parser('UTF-8')
const rs = fs.createReadStream('./sample.xml')

// No-op handler; only endElement matters for this repro
parser.on('startElement', (name, attrs) => {
})

let count = 0
parser.on('endElement', (name) => {
  count++
  // Calling stop()/resume() synchronously inside the handler triggers the bug
  parser.stop()
  parser.resume()
})

parser.on('error', console.error)
rs.on('end', () => console.log(count))

rs.pipe(parser)
@DanielDornhardt

Having the same issue... is there maybe some idea for a workaround?

@georgyfarniev
Author

Having the same issue... is there maybe some idea for a workaround?

@DanielDornhardt, you have to call resume on a later tick, not synchronously inside the handler. E.g.:

parser.on('endElement', (name) => {
  count++
  parser.stop()

  executeSomePromise().then(() => parser.resume())
  // or
  process.nextTick(() => parser.resume())
})

It's not well documented, but I found it.
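
For completeness, here's a minimal sketch of that pattern applied to the repro above. saveToDb() is just a hypothetical stand-in for whatever async work (e.g. your database write) has to finish before parsing resumes:

const fs = require('fs')
const expat = require('node-expat')

const parser = new expat.Parser('UTF-8')
const rs = fs.createReadStream('./sample.xml')

// Hypothetical async work; replace with your real database write.
function saveToDb (name) {
  return new Promise((resolve) => setImmediate(resolve))
}

let count = 0
parser.on('endElement', (name) => {
  count++
  parser.stop()                        // suspend parsing first
  saveToDb(name)                       // do the async work...
    .then(() => parser.resume())       // ...then resume on a later tick
    .catch((err) => parser.emit('error', err))
})

parser.on('error', console.error)
rs.on('end', () => console.log(count))

rs.pipe(parser)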

@DanielDornhardt

Hi @georgyfarniev, thank you for helping us out... I'm not sure yet whether it'll work out, but we're working on it.
