Unable to unmarshal embedded xhtml #116

Open
simar7 opened this issue Jul 30, 2020 · 3 comments

@simar7

simar7 commented Jul 30, 2020

Hi! First off, thanks for the amazing software. It really does help a lot!

I just had a quick question about an XML document I was trying to unmarshal using the generated structs.

Most of the document gets parsed just fine, except for the embedded XHTML elements. One example is the following:

<Extended_Description>
    <xhtml:p>Such a scenario is commonly observed when:</xhtml:p>
    <xhtml:ol>
        <xhtml:li>A web application authenticates a user without first invalidating the existing session, thereby continuing to use the session already associated with the user.</xhtml:li>
        <xhtml:li>An attacker is able to force a known session identifier on a user so that, once the user authenticates, the attacker has access to the authenticated session.</xhtml:li>
        <xhtml:li>The application or container uses predictable session identifiers. In the generic exploit of session fixation vulnerabilities, an attacker creates a new session on a web application and records the associated session identifier. The attacker then causes the victim to associate, and possibly authenticate, against the server using that session identifier, giving the attacker access to the user's account through the active session.</xhtml:li>
    </xhtml:ol>
</Extended_Description>

In this case, the output after unmarshaling contains only the text of the <p> element. The <ol> and <li> items don't show up.

Here's the schema if it would help: https://cwe.mitre.org/data/xsd/cwe_schema_latest.xsd

Let me know if you need any more information or if you'd like me to try something. If you can point me to the code involved in generating this, I'd be happy to look through it and submit a PR if I manage to solve it.

@simar7
Author

simar7 commented Jul 30, 2020

Some more info: this is the type generated from the schema.

// The StructuredTextType complex type is used to allow XHTML content embedded within standard string data. Some common elements are: <BR/> to insert a line break, <UL><LI/></UL> to create a bulleted list, <OL><LI/></OL> to create a numbered list, and <DIV style="margin-left: 40px"></DIV> to create a new indented section.
type StructuredTextType []string

func (a StructuredTextType) MarshalXML(e *xml.Encoder, start xml.StartElement) error {
	var output struct {
		ArrayType string   `xml:"http://schemas.xmlsoap.org/wsdl/ arrayType,attr"`
		Items     []string `xml:" item"`
	}
	output.Items = []string(a)
	start.Attr = append(start.Attr, xml.Attr{Name: xml.Name{Space: " ", Local: "xmlns:ns1"}, Value: "http://www.w3.org/2001/XMLSchema"})
	output.ArrayType = "ns1:anyType[]"
	return e.EncodeElement(&output, start)
}
func (a *StructuredTextType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) (err error) {
	var tok xml.Token
	for tok, err = d.Token(); err == nil; tok, err = d.Token() {
		if tok, ok := tok.(xml.StartElement); ok {
			var item string
			if err = d.DecodeElement(&item, &tok); err == nil {
				*a = append(*a, item)
			}
		}
		if _, ok := tok.(xml.EndElement); ok {
			break
		}
	}
	return err
}

@droyo
Owner

droyo commented Jul 31, 2020

Thanks for the report.

The StructuredTextType type from your schema has the mixed="true" attribute, so xsdgen should be generating a struct with a ,chardata field instead of what you got. Something like:

type StructuredTextType struct {
    Value string `xml:",chardata"`
}

I'm not sure why that's not happening. My guess is that the SOAPArrayAsSlice pass is accidentally removing the mixed content model. What happens if you don't include this optimization? You can see the "Customizing the behavior of xsdgen" section of https://blog.aqwari.net/xml-schema-go/ for instructions on how to choose what optimizations are included.

If that's the issue, we could add a check to this optimization for a mixed content model.

@simar7
Author

simar7 commented Oct 10, 2020

Hi @droyo, I gave that a try. This was my generator program:

package main

import (
	"log"
	"os"

	"aqwari.net/xml/xsdgen"
)

func main() {
	var cfg xsdgen.Config
	cfg.Option(
		xsdgen.LogOutput(log.New(os.Stderr, "", 0)),
		xsdgen.IgnoreElements("virtual", "sequence"))

	cfg.Option([]xsdgen.Option{
		xsdgen.IgnoreAttributes("id", "href", "ref", "offset"),
		xsdgen.Replace(`[._ \s-]`, ""),
		xsdgen.PackageName("ws"),
		xsdgen.HandleSOAPArrayType(),
		xsdgen.UseFieldNames(),
	}...)

	if err := cfg.GenCLI(os.Args[1:]...); err != nil {
		log.Fatal(err)
	}
}

But the result I got for StructuredTextType was the following:

type StructuredTextType struct {
	Items []string `xml:",any"`
}

There were no corresponding MarshalXML or UnmarshalXML methods.

Is this expected? What could I look into next? Let me know if there's anything else I can try here.


2 participants