Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

You've found a bug in xml-rs, caused by calls to push_pos() #227

Closed
acuifex opened this issue Jul 13, 2023 · 3 comments
Closed

You've found a bug in xml-rs, caused by calls to push_pos() #227

acuifex opened this issue Jul 13, 2023 · 3 comments

Comments

@acuifex
Copy link

acuifex commented Jul 13, 2023

Hi!

I'm doing what the message told me.
Note that I'm not a rust dev and i have no idea what I'm doing.

I'm messing around with this project: https://github.com/NeKzor/lp
serde-xml-rs has a similar issue: RReverser/serde-xml-rs#205

thread 'main' panicked at 'You've found a bug in xml-rs, caused by calls to push_pos() in states that don't end up emitting events.
            This case is ignored in release mode, and merely causes document positions to be out of sync.
            Please file a bug and include the XML document that triggers this assert.'

Package versions pulled from Cargo.lock:
serde-xml-rs: 0.6.0
xml-rs: 0.8.15

url from where the xml was pulled
xml file itself

@NeKzor
Copy link

NeKzor commented Jul 14, 2023

This is an interesting bug which I did not encounter two years ago.

Here's a minimal reproducible example:

use std::io::BufReader;

use xml::{
    reader::{EventReader, XmlEvent},
    ParserConfig,
};

fn main() -> std::io::Result<()> {
    let data = r#"<root>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
    <item><![CDATA[]]></item>
</root>"#;

    let config = ParserConfig::new().cdata_to_characters(true);

    let reader = BufReader::new(data.as_bytes());
    let parser = EventReader::new_with_config(reader, config);

    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{:spaces$}+{name}", "", spaces = depth * 2);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{:spaces$}-{name}", "", spaces = depth * 2);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
            _ => {}
        }
    }

    Ok(())
}

Turns out that the panic occurs after exactly 14 empty CDATA values with cdata_to_characters enabled. I really wonder why that is...

@NeKzor
Copy link

NeKzor commented Jul 14, 2023

I bisected now all older versions where it still worked and found the commit that introduced the regression: 444a7c2

Not sure what this change does but the hard-coded 16 value there is very close to the magic number 14 which makes the event reader panic.

@kornelski
Copy link
Collaborator

kornelski commented Jul 19, 2023

This issue has always been happening, but instead of being caught, it was silently causing incorrect line numbers and ever-increasing memory usage.

The max depth for push_pos should be finite, because it's a stack of XML syntax constructs being parsed (like element has an attribute, attribute has an entity, and then the XML syntax doesn't go any more detailed than that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants