xmltodict

xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute", 
        "and": {
            "many": [
                "elements", 
                "more elements"
            ]
        }, 
        "plus": {
            "@a": "complex", 
            "#text": "element as well"
        }
    }
}

Namespace support

By default, xmltodict does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing process_namespaces=True will make it expand namespaces for you:

>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True

It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True

Streaming mode

xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:

>>> def handle_artist(_, artist):
...     print artist['name']
...     return True
>>> 
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...

Also you can use generator, for this item_depth has to be greater than 0 and no item_callback. Since the generator uses a thread under the hood you need to use it from within a with statement so that the thread is correctly disposed of:

>>> with xmltodict.parse(GzipFile('discogs_artists.xml.gz'), item_depth=2) as gen:
...    artists_names = [artist['name'] for _, artist in gen]
... artists_names
['A Perfect Circle', 'Fantômas', 'King Crimson', 'Chris Potter']

It can also be used from the command line to pipe objects to a script like this:

import sys, marshal
while True:
    _, article = marshal.load(sys.stdin)
    print article['title']

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ cat enwiki.dicts.gz | gunzip | script1.py
$ cat enwiki.dicts.gz | gunzip | script2.py
...

Roundtripping

You can also convert in the other direction, using the unparse() method:

>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print unparse(mydict, pretty=True)
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2014-02-16T23:10:12Z</last_updated>
</response>

Ok, how do I get it?

You just need to

$ pip install xmltodict

There is an official Fedora package for xmltodict. If you are on Fedora or RHEL, you can do:

$ sudo yum install python-xmltodict

Donate

If you love xmltodict, consider supporting the author on Gittip.

Name		Name	Last commit message	Last commit date
Latest commit History 143 Commits
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
ez_setup.py		ez_setup.py
setup.py		setup.py
tox.ini		tox.ini
xmltodict.py		xmltodict.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

xmltodict

Namespace support

Streaming mode

Roundtripping

Ok, how do I get it?

Donate

About

Releases

Packages

Languages

License

solvebio/xmltodict

Folders and files

Latest commit

History

Repository files navigation

xmltodict

Namespace support

Streaming mode

Roundtripping

Ok, how do I get it?

Donate

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages