FleXML: A Flexible XML Parser for Go

FleXML parses incomplete or invalid XML data. Unlike standard XML parsers that require valid, complete XML input, FleXML gracefully handles partial XML fragments and malformed documents, extracting as much structured data as possible.
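
To make the contrast concrete, the sketch below shows how Go's standard `encoding/xml` package reacts to a truncated fragment. It uses only the standard library; `strictParse` is a hypothetical helper name chosen for illustration and is not part of FleXML.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// strictParse is a hypothetical helper: it decodes the input with the
// standard library's strict parser and returns whatever error results.
func strictParse(input string) error {
	var v struct {
		Text string `xml:",chardata"`
	}
	return xml.Unmarshal([]byte(input), &v)
}

func main() {
	// A complete element decodes without error.
	fmt.Println("complete:", strictParse(`<key>Hello</key>`))
	// A fragment with no closing tag is rejected with a syntax error.
	fmt.Println("partial: ", strictParse(`<key>Hello`))
}
```

A strict parser reports the missing closing tag as an error and yields no usable result, which is the case FleXML is meant to cover.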

🌟 Features

- Partial XML Parsing: Extract data from incomplete XML fragments
  - `<key>Hello` → Element with text "Hello"
  - `<response><message>` → Nested element structure
- Invalid XML Handling: Process XML with mixed content and other issues
  - Handles text mixed with elements
  - Supports malformed or unquoted attributes
- Flexible Document Querying: Navigate and extract data easily
  - Find elements by name anywhere in the document
  - Extract text content and attribute values
- Streaming XML Processing: Process XML data in chunks or as a stream
  - Parse large XML files without loading them entirely into memory
  - Process XML as it arrives, ideal for network streams
  - Event-based API for efficient processing
- Resilient Parsing: Recovers gracefully from unexpected input
  - No panic on malformed input
  - Extracts maximum valid data even from corrupted XML
- Zero Dependencies: Pure Go implementation with no external dependencies
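
As an illustration of what tolerating malformed or unquoted attributes can involve, here is a minimal standalone sketch. It is not FleXML's implementation; `parseAttrs` and `isSpace` are hypothetical names, and the rules (bare names get an empty value, an unterminated quote keeps the rest of the input) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAttrs is a hypothetical, deliberately lenient attribute scanner.
// It accepts quoted values, unquoted values, bare names, and a quoted
// value whose closing quote never arrives.
func parseAttrs(s string) map[string]string {
	attrs := make(map[string]string)
	i := 0
	for i < len(s) {
		// Skip whitespace between attributes.
		for i < len(s) && isSpace(s[i]) {
			i++
		}
		// Read the attribute name up to '=' or whitespace.
		start := i
		for i < len(s) && s[i] != '=' && !isSpace(s[i]) {
			i++
		}
		name := s[start:i]
		if name == "" {
			if i < len(s) {
				i++ // stray '=': skip it and keep going
			}
			continue
		}
		if i >= len(s) || s[i] != '=' {
			attrs[name] = "" // bare attribute such as `disabled`
			continue
		}
		i++ // consume '='
		if i < len(s) && (s[i] == '"' || s[i] == '\'') {
			quote := s[i]
			i++
			end := strings.IndexByte(s[i:], quote)
			if end < 0 {
				attrs[name] = s[i:] // unterminated quote: keep the rest
				i = len(s)
				continue
			}
			attrs[name] = s[i : i+end]
			i += end + 1
			continue
		}
		// Unquoted value: read up to the next whitespace.
		start = i
		for i < len(s) && !isSpace(s[i]) {
			i++
		}
		attrs[name] = s[start:i]
	}
	return attrs
}

// isSpace reports whether b is an ASCII whitespace byte.
func isSpace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\n' || b == '\r'
}

func main() {
	attrs := parseAttrs(`id=123 name="Alice Smith" disabled title="Hel`)
	fmt.Println(attrs["id"], attrs["name"], attrs["title"])
}
```

The scanner never returns an error: every malformed case maps to some best-effort value, which is the general trade-off a forgiving parser makes.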

📦 Installation

go get github.com/jpoz/flexml

🚀 Quick Start

Parsing Valid XML

package main

import (
    "fmt"
    "github.com/jpoz/flexml"
)

func main() {
    // Parse complete XML
    xmlStr := `<response><message>Greetings</message></response>`
    
    doc, err := flexml.Parse(xmlStr)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    
    nodes, ok := doc.DeepFind("message")
    if ok {
        fmt.Printf("Message content: %s\n", nodes[0].GetText())
        // Output: Message content: Greetings
    }
}

Handling Partial XML

package main

import (
    "fmt"
    "github.com/jpoz/flexml"
)

func main() {
    // Example partial XML fragments
    partialXMLs := []string{
        `<key>Hello`,
        `<response><user id="123"`,
        `<data>Value</data></response>`,
    }

    // Process each fragment
    for _, xml := range partialXMLs {
        fmt.Printf("Processing: %s\n", xml)
        
        doc, err := flexml.Parse(xml)
        if err != nil {
            fmt.Printf("Notice (parsing continues): %v\n", err)
        }
        if doc == nil {
            // Defensive guard: skip the fragment if no document came back at all
            continue
        }

        // Even with errors, we can still extract data
        fmt.Printf("Document structure: %s\n", doc.String())
    }
}

Streaming XML Processing

package main

import (
    "fmt"
    "strings"
    "github.com/jpoz/flexml"
)

func main() {
    // Create a stream from a reader
    xmlData := `<users>
        <user id="1">
            <name>Alice</name>
            <email>[email protected]</email>
        </user>
        <user id="2">
            <name>Bob</name>
            <email>[email protected]</email>
        </user>
    </users>`
    
    reader := strings.NewReader(xmlData)
    stream, err := flexml.ParseStream(reader)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    
    // Process events as they are generated
    for stream.Next() {
        event := stream.Event()
        
        switch event.Type {
        case flexml.StartElement:
            fmt.Printf("Element start: %s\n", event.Name)
            if event.Name == "user" {
                if id, ok := event.Attributes["id"]; ok {
                    fmt.Printf("  User ID: %s\n", id)
                }
            }
        case flexml.EndElement:
            fmt.Printf("Element end: %s\n", event.Name)
        case flexml.Text:
            if event.Text != "" {
                fmt.Printf("Text: %s\n", event.Text)
            }
        }
    }
    
    if stream.Err() != nil {
        fmt.Printf("Stream error: %v\n", stream.Err())
    }
}

Reading Complete XML Nodes from Stream

package main

import (
    "fmt"
    "io"
    "os"
    "github.com/jpoz/flexml"
)

func main() {
    // Open a large XML file
    file, err := os.Open("large.xml")
    if err != nil {
        fmt.Printf("Error opening file: %v\n", err)
        return
    }
    defer file.Close()
    
    // Create a node reader
    reader := flexml.NewElementStreamReader(file)
    
    // Read and process complete nodes one at a time
    for {
        node, err := reader.ReadNode()
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Printf("Error: %v\n", err)
            continue
        }
        
        // Process the node
        if node.Type == flexml.ElementNode && node.Name == "item" {
            fmt.Printf("Found item: %s\n", node.GetText())
            
            // Check attributes
            if id, ok := node.GetAttribute("id"); ok {
                fmt.Printf("  ID: %s\n", id)
            }
        }
    }
}

Processing LLM Reasoning with `<think>` Tags

LLMs (Large Language Models) often use XML-like tags to structure their reasoning process. FleXML is well suited to parsing this output because it tolerates the unclosed or malformed tags such output can contain, and it can extract both the reasoning steps and the final answer:

package main

import (
    "fmt"
    "strings"
    "github.com/jpoz/flexml"
)

func main() {
    // Example LLM output with reasoning in <think> tags
    llmOutput := `<think>
To solve this problem, I'll need to find the sum of the first 100 positive integers.

I can use the formula: sum = n(n+1)/2
Where n is the number of integers we're adding.

For n = 100:
sum = 100(100+1)/2
sum = 100(101)/2
sum = 10100/2
sum = 5050
</think>

<answer>The sum of the first 100 positive integers is 5050.

I can find this using the formula sum = n(n+1)/2, where n is the number of integers.
For n = 100: sum = 100(100+1)/2 = 100(101)/2 = 10100/2 = 5050</answer>`

    // Parse the LLM output
    doc, err := flexml.Parse(llmOutput)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    
    // Extract the thinking process
    thinkNode, found := doc.FindOne("think")
    if found {
        thinking := strings.TrimSpace(thinkNode.GetText())
        fmt.Println("=== LLM Reasoning Process ===")
        fmt.Println(thinking)
        fmt.Println("============================")
        fmt.Println()
    }
    
    // Extract the answer
    answerNode, found := doc.FindOne("answer")
    if found {
        answer := strings.TrimSpace(answerNode.GetText())
        fmt.Println("=== Final Answer ===")
        fmt.Println(answer)
        fmt.Println("===================")
    }
    
    // You can also process the reasoning steps for additional analysis
    // For example, to extract mathematical calculations, specific reasoning steps, etc.
}

This example shows how you can use FleXML to:

- Parse LLM outputs that use XML-like tags to structure their thinking
- Extract and separate the reasoning process from the final answer
- Process incomplete or malformed XML that might be generated by LLMs

This is useful for applications that want to expose the LLM's reasoning process for review, educational purposes, or to provide transparency in AI decision-making.

You can also use the streaming API for processing LLM outputs as they arrive:

package main

import (
    "fmt"
    "strings"
    "github.com/jpoz/flexml"
)

func main() {
    // Simulate an API request receiving LLM output in chunks
    // In a real application, this would be replaced with an actual HTTP request
    // to an LLM API endpoint with streaming enabled
    llmOutputChunks := []string{
        "<th",
        "ink>\nLet me reason through this step by step.\n\nTo find",
        " the derivative of f(x) = x^2 * sin(x), I'll use the product rule:\n",
        "(uv)' = u'v + uv'\n\nLet u = x^2 and v = sin(x)\n",
        "Then u' = 2x and v' = cos(x)\n\n",
        "Applying the product rule:\nf'(x) = (2x)(sin(x)) + (x^2)(cos(x))",
        "\n= 2x*sin(x) + x^2*cos(x)",
        "</think>\n\n<answer>",
        "The derivative of f(x) = x^2 * sin(x) is:\n\nf'(x) = 2x*sin(x) + x^2*cos(x)",
        "</answer>",
    }
    
    // Create a stream
    stream := flexml.NewStream()
    
    // Variables to track state
    inThinking := false
    inAnswer := false
    thinking := ""
    answer := ""
    
    // Process the chunks as they arrive
    for _, chunk := range llmOutputChunks {
        // Add the chunk to the stream
        stream.AddData([]byte(chunk))
        
        // Process events from the stream
        for stream.Next() {
            event := stream.Event()
            
            switch event.Type {
            case flexml.StartElement:
                if event.Name == "think" {
                    inThinking = true
                    inAnswer = false
                    fmt.Println("Started receiving thinking process...")
                } else if event.Name == "answer" {
                    inThinking = false
                    inAnswer = true
                    fmt.Println("Started receiving answer...")
                }
                
            case flexml.EndElement:
                if event.Name == "think" {
                    inThinking = false
                    fmt.Println("Completed thinking section.")
                } else if event.Name == "answer" {
                    inAnswer = false
                    fmt.Println("Completed answer section.")
                }
                
            case flexml.Text:
                if inThinking {
                    thinking += event.Text
                    fmt.Printf("Thinking (partial): %s\n", strings.TrimSpace(event.Text))
                } else if inAnswer {
                    answer += event.Text
                    fmt.Printf("Answer (partial): %s\n", strings.TrimSpace(event.Text))
                }
            }
        }
    }
    
    // Signal that we're done adding data
    stream.EOF()
    
    // Process any remaining events
    for stream.Next() {
        // Similar event processing as above
    }
    
    // Final output
    fmt.Println("\n=== Final Thinking Process ===")
    fmt.Println(strings.TrimSpace(thinking))
    fmt.Println("\n=== Final Answer ===")
    fmt.Println(strings.TrimSpace(answer))
}

This streaming approach is particularly useful for:

- Processing real-time LLM outputs as they're generated
- Providing immediate feedback to users while the LLM is still thinking
- Working with very large outputs without waiting for the complete response
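
The chunked example above starts with a tag split across two chunks (`<th` followed by `ink>`). One way a streaming tokenizer can cope with that is to buffer input and emit a tag only once its closing `>` has arrived. The sketch below shows that idea in isolation; `chunkScanner`, `Feed`, and `Flush` are hypothetical names, and this is not how FleXML's `Stream` is necessarily implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkScanner is a hypothetical incremental tokenizer. It buffers input
// and only emits a tag once its closing '>' has arrived.
type chunkScanner struct {
	buf string
}

// Feed appends a chunk and returns every token that is now complete.
// Tags are returned with their angle brackets; anything else is text.
func (s *chunkScanner) Feed(chunk string) []string {
	s.buf += chunk
	var tokens []string
	for s.buf != "" {
		lt := strings.IndexByte(s.buf, '<')
		if lt < 0 {
			// No tag in sight: everything buffered is text.
			tokens = append(tokens, s.buf)
			s.buf = ""
			break
		}
		if lt > 0 {
			// Emit the text that precedes the next tag.
			tokens = append(tokens, s.buf[:lt])
			s.buf = s.buf[lt:]
		}
		gt := strings.IndexByte(s.buf, '>')
		if gt < 0 {
			// The tag is split across chunks: wait for more data.
			break
		}
		tokens = append(tokens, s.buf[:gt+1])
		s.buf = s.buf[gt+1:]
	}
	return tokens
}

// Flush returns whatever is still buffered, such as a tag that never closed.
func (s *chunkScanner) Flush() string {
	rest := s.buf
	s.buf = ""
	return rest
}

func main() {
	var s chunkScanner
	for _, chunk := range []string{"<th", "ink>Let me", " reason</think>"} {
		fmt.Printf("%q -> %q\n", chunk, s.Feed(chunk))
	}
}
```

Text is emitted eagerly in whatever pieces it arrives, which matches the partial `Text` events accumulated in the example above, while an incomplete tag is held back until it can be read whole.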

⚙️ How It Works

FleXML uses a custom parsing algorithm designed to be forgiving while still providing a useful document structure:

- Single-Pass Parser: Processes the input in a single pass for efficiency
- Node Hierarchy: Builds a tree of nodes (elements, text, comments)
- Automatic Recovery: Detects and handles common XML errors

The library handles problematic input by:

- Treating unclosed tags as valid elements
- Supporting text outside of elements
- Processing malformed attributes
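
The recovery rules above can be sketched as a small single-pass tree builder. This is a toy illustration under stated assumptions, not FleXML's actual algorithm: the `node` type, `parseLenient`, and `render` are hypothetical, and attributes, comments, and processing instructions are ignored to keep it short.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical tree node; an empty Name marks a text node.
type node struct {
	Name     string
	Text     string
	Children []*node
}

// parseLenient builds a tree in a single pass over the input. It never
// returns an error: unclosed elements stay in the tree, stray closing
// tags are ignored, and text outside any element hangs off the root.
func parseLenient(input string) *node {
	root := &node{Name: "#document"}
	stack := []*node{root}
	for input != "" {
		top := stack[len(stack)-1]
		lt := strings.IndexByte(input, '<')
		if lt != 0 {
			// Text up to the next tag, or to the end of the input.
			end := lt
			if lt < 0 {
				end = len(input)
			}
			top.Children = append(top.Children, &node{Text: input[:end]})
			input = input[end:]
			continue
		}
		// Find the end of the tag; a tag cut off by EOF runs to the end.
		gt := strings.IndexByte(input, '>')
		if gt < 0 {
			gt = len(input)
		}
		tag := input[1:gt]
		input = strings.TrimPrefix(input[gt:], ">")
		if strings.HasPrefix(tag, "/") {
			// Close the nearest open element with this name, if any.
			name := strings.TrimSpace(tag[1:])
			for i := len(stack) - 1; i > 0; i-- {
				if stack[i].Name == name {
					stack = stack[:i]
					break
				}
			}
			continue
		}
		selfClosing := strings.HasSuffix(tag, "/")
		fields := strings.Fields(strings.TrimSuffix(tag, "/"))
		if len(fields) == 0 {
			continue // "<>" carries no element name
		}
		el := &node{Name: fields[0]}
		top.Children = append(top.Children, el)
		if !selfClosing {
			stack = append(stack, el)
		}
	}
	return root
}

// render prints the tree in a compact bracket form for inspection.
func render(n *node) string {
	if n.Name == "" {
		return fmt.Sprintf("%q", n.Text)
	}
	parts := make([]string, 0, len(n.Children))
	for _, c := range n.Children {
		parts = append(parts, render(c))
	}
	return n.Name + "[" + strings.Join(parts, " ") + "]"
}

func main() {
	for _, in := range []string{`<key>Hello`, `<response><message>`, `<data>Value</data></response>`} {
		fmt.Println(render(parseLenient(in)))
	}
}
```

The open-element stack is what makes recovery cheap: anything still on the stack at end of input is simply left in the tree, and a closing tag with no matching open element is dropped rather than reported.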

🔍 API Reference

Document

- Parse(xml string) (*Document, error) - Parses an XML string into a Document
- Document.DeepFind(name string) ([]*Node, bool) - Searches for nodes with the given name recursively
- Document.FindOne(name string) (*Node, bool) - Finds the first node with the given name
- Document.String() string - Returns a string representation of the document
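
To show what a recursive search with this `(nodes, ok)` return shape looks like, here is a small sketch over a stand-in tree. The `elem` type and the lowercase `deepFind` and `findOne` functions are hypothetical and written for this example; they mirror the documented signatures but are not FleXML's code.

```go
package main

import "fmt"

// elem is a hypothetical stand-in for a parsed element.
type elem struct {
	Name     string
	Children []*elem
}

// deepFind walks the tree depth-first and collects every element with
// the given name. The boolean reports whether anything matched.
func deepFind(root *elem, name string) ([]*elem, bool) {
	var found []*elem
	var walk func(n *elem)
	walk = func(n *elem) {
		if n.Name == name {
			found = append(found, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return found, len(found) > 0
}

// findOne returns the first match in document order and stops early.
func findOne(root *elem, name string) (*elem, bool) {
	if root.Name == name {
		return root, true
	}
	for _, c := range root.Children {
		if n, ok := findOne(c, name); ok {
			return n, true
		}
	}
	return nil, false
}

func main() {
	doc := &elem{Name: "response", Children: []*elem{
		{Name: "message"},
		{Name: "meta", Children: []*elem{{Name: "message"}}},
	}}
	all, ok := deepFind(doc, "message")
	fmt.Println(len(all), ok)
	_, ok = findOne(doc, "missing")
	fmt.Println(ok)
}
```

Checking the boolean before indexing into the result, as the Quick Start example does with `nodes[0]`, avoids an out-of-range access when nothing matches.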

Node

- Node.GetAttribute(name string) (string, bool) - Returns the value of an attribute
- Node.GetText() string - Returns the text content of a node
- Node.Type - The type of node (ElementNode, TextNode, CommentNode, ProcessingInstructionNode)

Streaming

- ParseStream(r io.Reader) (*Stream, error) - Creates a stream parser from an io.Reader
- NewStream() - Creates a new XML stream parser
- Stream.AddData(data []byte) - Adds more data to the stream parser
- Stream.EOF() - Signals that no more data will be added
- Stream.Next() bool - Advances to the next event, returns false when done
- Stream.Event() *Event - Returns the current event
- Stream.Err() error - Returns any error that occurred during parsing

Node Streaming

- NewElementStreamReader(r io.Reader) *ElementStreamReader - Creates a reader that yields complete XML nodes from an io.Reader
- ElementStreamReader.ReadNode() (*Node, error) - Reads the next complete XML node
- ParseReader(r io.Reader) (*StreamDocument, error) - Parses XML from an io.Reader into a StreamDocument
- StreamDocument.DeepFind(name string) ([]*Node, bool) - Searches for nodes in the streamed document
- StreamDocument.FindOne(name string) (*Node, bool) - Finds the first matching node in the streamed document

🧪 Testing

The test suite covers both valid and invalid XML parsing. Run it with:

make test
