FleXML parses incomplete or invalid XML data. Unlike standard XML parsers that require valid, complete XML input, FleXML gracefully handles partial XML fragments and malformed documents, extracting as much structured data as possible.
-
Partial XML Parsing: Extract data from incomplete XML fragments
<key>Hello
→ Element with text "Hello"<response><message>
→ Nested element structure
-
Invalid XML Handling: Process XML with mixed content and other issues
- Handles text mixed with elements
- Supports malformed or unquoted attributes
-
Flexible Document Querying: Navigate and extract data easily
- Find elements by name anywhere in the document
- Extract text content and attribute values
-
Streaming XML Processing: Process XML data in chunks or as a stream
- Parse large XML files without loading them entirely into memory
- Process XML as it arrives, ideal for network streams
- Event-based API for efficient processing
-
Resilient Parsing: Recovers gracefully from unexpected input
- No panic on malformed input
- Extracts maximum valid data even from corrupted XML
-
Zero Dependencies: Pure Go implementation with no external dependencies
go get github.com/jpoz/flexml
package main
import (
"fmt"
"github.com/jpoz/flexml"
)
func main() {
// Parse complete XML
xmlStr := `<response><message>Greetings</message></response>`
doc, err := flexml.Parse(xmlStr)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
nodes, ok := doc.DeepFind("message")
if ok {
fmt.Printf("Message content: %s\n", nodes[0].GetText())
// Output: Message content: Greetings
}
}
package main
import (
"fmt"
"github.com/jpoz/flexml"
)
func main() {
// Example partial XML fragments
partialXMLs := []string{
`<key>Hello`,
`<response><user id="123"`,
`<data>Value</data></response>`,
}
// Process each fragment
for _, xml := range partialXMLs {
fmt.Printf("Processing: %s\n", xml)
doc, err := flexml.Parse(xml)
if err != nil {
fmt.Printf("Notice (parsing continues): %v\n", err)
}
// Even with errors, we can still extract data
fmt.Printf("Document structure: %s\n", doc.String())
}
}
package main
import (
"fmt"
"strings"
"github.com/jpoz/flexml"
)
func main() {
// Create a stream from a reader
xmlData := `<users>
<user id="1">
<name>Alice</name>
<email>[email protected]</email>
</user>
<user id="2">
<name>Bob</name>
<email>[email protected]</email>
</user>
</users>`
reader := strings.NewReader(xmlData)
stream, err := flexml.ParseStream(reader)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
// Process events as they are generated
for stream.Next() {
event := stream.Event()
switch event.Type {
case flexml.StartElement:
fmt.Printf("Element start: %s\n", event.Name)
if event.Name == "user" {
if id, ok := event.Attributes["id"]; ok {
fmt.Printf(" User ID: %s\n", id)
}
}
case flexml.EndElement:
fmt.Printf("Element end: %s\n", event.Name)
case flexml.Text:
if event.Text != "" {
fmt.Printf("Text: %s\n", event.Text)
}
}
}
if stream.Err() != nil {
fmt.Printf("Stream error: %v\n", stream.Err())
}
}
package main
import (
"fmt"
"io"
"os"
"github.com/jpoz/flexml"
)
func main() {
// Open a large XML file
file, err := os.Open("large.xml")
if err != nil {
fmt.Printf("Error opening file: %v\n", err)
return
}
defer file.Close()
// Create a node reader
reader := flexml.NewElementStreamReader(file)
// Read and process complete nodes one at a time
for {
node, err := reader.ReadNode()
if err == io.EOF {
break
}
if err != nil {
fmt.Printf("Error: %v\n", err)
continue
}
// Process the node
if node.Type == flexml.ElementNode && node.Name == "item" {
fmt.Printf("Found item: %s\n", node.GetText())
// Check attributes
if id, ok := node.GetAttribute("id"); ok {
fmt.Printf(" ID: %s\n", id)
}
}
}
}
LLMs (Large Language Models) often use XML-like tags to structure their reasoning process. FleXML is perfect for parsing this output, extracting both the reasoning steps and the final answer:
package main
import (
"fmt"
"strings"
"github.com/jpoz/flexml"
)
func main() {
// Example LLM output with reasoning in <think> tags
llmOutput := `<think>
To solve this problem, I'll need to find the sum of the first 100 positive integers.
I can use the formula: sum = n(n+1)/2
Where n is the number of integers we're adding.
For n = 100:
sum = 100(100+1)/2
sum = 100(101)/2
sum = 10100/2
sum = 5050
</think>
<answer>The sum of the first 100 positive integers is 5050.
I can find this using the formula sum = n(n+1)/2, where n is the number of integers.
For n = 100: sum = 100(100+1)/2 = 100(101)/2 = 10100/2 = 5050</answer>`
// Parse the LLM output
doc, err := flexml.Parse(llmOutput)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
// Extract the thinking process
thinkNode, found := doc.FindOne("think")
if found {
thinking := strings.TrimSpace(thinkNode.GetText())
fmt.Println("=== LLM Reasoning Process ===")
fmt.Println(thinking)
fmt.Println("============================")
fmt.Println()
}
// Extract the answer
answerNode, found := doc.FindOne("answer")
if found {
answer := strings.TrimSpace(answerNode.GetText())
fmt.Println("=== Final Answer ===")
fmt.Println(answer)
fmt.Println("===================")
}
// You can also process the reasoning steps for additional analysis
// For example, to extract mathematical calculations, specific reasoning steps, etc.
}
This example shows how you can use FleXML to:
- Parse LLM outputs that use XML-like tags to structure their thinking
- Extract and separate the reasoning process from the final answer
- Process incomplete or malformed XML that might be generated by LLMs
This is useful for applications that want to expose the LLM's reasoning process for review, educational purposes, or to provide transparency in AI decision-making.
You can also use the streaming API for processing LLM outputs as they arrive:
package main
import (
"fmt"
"io"
"net/http"
"strings"
"github.com/jpoz/flexml"
)
func main() {
// Simulate an API request receiving LLM output in chunks
// In a real application, this would be replaced with an actual HTTP request
// to an LLM API endpoint with streaming enabled
llmOutputChunks := []string{
"<th",
"ink>\nLet me reason through this step by step.\n\nTo find",
" the derivative of f(x) = x^2 * sin(x), I'll use the product rule:\n",
"(uv)' = u'v + uv'\n\nLet u = x^2 and v = sin(x)\n",
"Then u' = 2x and v' = cos(x)\n\n",
"Applying the product rule:\nf'(x) = (2x)(sin(x)) + (x^2)(cos(x))",
"\n= 2x*sin(x) + x^2*cos(x)",
"</think>\n\n<answer>",
"The derivative of f(x) = x^2 * sin(x) is:\n\nf'(x) = 2x*sin(x) + x^2*cos(x)",
"</answer>"
}
// Create a stream
stream := flexml.NewStream()
// Variables to track state
inThinking := false
inAnswer := false
thinking := ""
answer := ""
// Process the chunks as they arrive
for _, chunk := range llmOutputChunks {
// Add the chunk to the stream
stream.AddData([]byte(chunk))
// Process events from the stream
for stream.Next() {
event := stream.Event()
switch event.Type {
case flexml.StartElement:
if event.Name == "think" {
inThinking = true
inAnswer = false
fmt.Println("Started receiving thinking process...")
} else if event.Name == "answer" {
inThinking = false
inAnswer = true
fmt.Println("Started receiving answer...")
}
case flexml.EndElement:
if event.Name == "think" {
inThinking = false
fmt.Println("Completed thinking section.")
} else if event.Name == "answer" {
inAnswer = false
fmt.Println("Completed answer section.")
}
case flexml.Text:
if inThinking {
thinking += event.Text
fmt.Printf("Thinking (partial): %s\n", strings.TrimSpace(event.Text))
} else if inAnswer {
answer += event.Text
fmt.Printf("Answer (partial): %s\n", strings.TrimSpace(event.Text))
}
}
}
}
// Signal that we're done adding data
stream.EOF()
// Process any remaining events
for stream.Next() {
// Similar event processing as above
}
// Final output
fmt.Println("\n=== Final Thinking Process ===")
fmt.Println(strings.TrimSpace(thinking))
fmt.Println("\n=== Final Answer ===")
fmt.Println(strings.TrimSpace(answer))
}
This streaming approach is particularly useful for:
- Processing real-time LLM outputs as they're generated
- Providing immediate feedback to users while the LLM is still thinking
- Working with very large outputs without waiting for the complete response
FleXML uses a custom parsing algorithm designed to be forgiving while still providing a useful document structure:
- Single-Pass Parser: Processes the input in a single pass for efficiency
- Node Hierarchy: Builds a tree of nodes (elements, text, comments)
- Automatic Recovery: Detects and handles common XML errors
The library intelligently handles problematic input by:
- Treating unclosed tags as valid elements
- Supporting text outside of elements
- Processing malformed attributes
Parse(xml string) (*Document, error)
- Parses an XML string into a DocumentDeepFind(name string) ([]*Node, bool)
- Searches for nodes with the given name recursivelyFindOne(name string) (*Node, bool)
- Finds the first node with the given nameString() string
- Returns a string representation of the document
GetAttribute(name string) (string, bool)
- Returns the value of an attributeGetText() string
- Returns the text content of a nodeType
- The type of node (ElementNode, TextNode, CommentNode, ProcessingInstructionNode)
ParseStream(r io.Reader) (*Stream, error)
- Creates a stream parser from an io.ReaderNewStream()
- Creates a new XML stream parserStream.AddData(data []byte)
- Adds more data to the stream parserStream.Next() bool
- Advances to the next event, returns false when doneStream.Event() *Event
- Returns the current eventStream.Err() error
- Returns any error that occurred during parsing
NewElementStreamReader(r io.Reader) *ElementStreamReader
- Creates a reader for XML stream eventsElementStreamReader.ReadNode() (*Node, error)
- Reads the next complete XML nodeParseReader(r io.Reader) (*StreamDocument, error)
- Parses XML from an io.Reader into a StreamDocumentStreamDocument.DeepFind(name string) ([]*Node, bool)
- Searches for nodes in the streamed documentStreamDocument.FindOne(name string) (*Node, bool)
- Finds the first matching node in the streamed document
The library includes comprehensive test coverage for both valid and invalid XML parsing:
make test