The lxml.etree Tutorial (2024)

Author: Stefan Behnel

This tutorial briefly overviews the main concepts of the ElementTree API as implemented by lxml.etree, and some simple enhancements that make your life as a programmer easier.

Contents

  • The Element class
  • Elements are lists
  • Elements carry attributes
  • Elements contain text
  • Tree iteration

A common way to import lxml.etree is as follows:

>>> from lxml import etree

If your code only uses the ElementTree API and does not rely on any functionality that is specific to lxml.etree, you can also use the following import chain as a fallback to the original ElementTree:

try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5+
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5+
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

To aid in writing portable code, this tutorial makes it clear in the examples which part of the presented API is an extension of lxml.etree over the original ElementTree API, as defined by Fredrik Lundh's ElementTree library.

An Element is the main container object for the ElementTree API. Most of the XML tree functionality is accessed through this class. Elements are easily created through the Element factory:

>>> root = etree.Element("root")

The XML tag name of elements is accessed through the tag property:

>>> print(root.tag)
root

Elements are organised in an XML tree structure. To create child elements and add them to a parent element, you can use the append() method:

>>> root.append( etree.Element("child1") )

However, a much more efficient and more common way to do this is through the SubElement factory. It accepts the same arguments as the Element factory, but additionally requires the parent as the first argument:

>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")

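To see that this is really XML, you can serialise the tree you have created. In Python 3, tostring() returns bytes by default; passing encoding="unicode" makes it return a str that prints cleanly:

>>> print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>
  <child1/>
  <child2/>
  <child3/>
</root>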
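The following is a minimal sketch of the same steps that also runs when lxml is not installed. It imports lxml.etree and falls back to the standard library's xml.etree.ElementTree, which is an assumption of this sketch in the spirit of the import chain above. pretty_print is left out because it is specific to lxml.

```python
try:
    from lxml import etree  # preferred implementation
except ImportError:
    import xml.etree.ElementTree as etree  # standard-library fallback

root = etree.Element("root")
root.append(etree.Element("child1"))       # create first, then attach
child2 = etree.SubElement(root, "child2")  # create and attach in one call
child3 = etree.SubElement(root, "child3")

# encoding="unicode" makes tostring() return str instead of bytes
xml_text = etree.tostring(root, encoding="unicode")
print(xml_text)
```

lxml writes empty elements as `<child1/>`, while the standard library writes `<child1 />`. Both are the same XML.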

To make access to these subelements as easy and straightforward as possible, elements behave much like normal Python lists:

>>> child = root[0]
>>> print(child.tag)
child1
>>> for child in root:
...     print(child.tag)
child1
child2
child3
>>> if len(root):  # test for children explicitly
...     print("root has children!")
root has children!
>>> root.insert(0, etree.Element("child0"))
>>> start = root[:1]
>>> end = root[-1:]
>>> print(start[0].tag)
child0
>>> print(end[0].tag)
child3
>>> root[0] = root[-1]
>>> for child in root:
...     print(child.tag)
child3
child1
child2

Note how the last element was moved to a different position in the last example. This is a difference from the original ElementTree (and from lists), where elements can sit in multiple positions of any number of trees. In lxml.etree, elements can only sit in one position of one tree at a time.
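You can check this difference directly. The sketch below appends an element that already has a parent to a second parent, then counts the children of the first parent. It falls back to the standard library when lxml is missing. The flag name `MOVES` is only an illustrative name.

```python
try:
    from lxml import etree
    MOVES = True   # lxml.etree: an element has exactly one position
except ImportError:
    import xml.etree.ElementTree as etree
    MOVES = False  # plain ElementTree: the same object can be shared

first = etree.Element("first")
second = etree.Element("second")
item = etree.SubElement(first, "item")

second.append(item)  # attach the same element to a second parent

print("children of first:", len(first))
print("children of second:", len(second))
```

With lxml.etree, `first` ends up empty because `item` was moved. With plain ElementTree, both parents list the same object.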

If you want to copy an element to a different position, consider creating an independent deep copy using the copy module from Python's standard library:

>>> from copy import deepcopy
>>> element = etree.Element("neu")
>>> element.append(deepcopy(root[1]))
>>> print(element[0].tag)
child1
>>> print([c.tag for c in root])
['child3', 'child1', 'child2']
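The sketch below shows that the deep copy is independent of the original: changing an attribute on the copy leaves the original element, and its place in the tree, untouched. It assumes lxml.etree and falls back to xml.etree.ElementTree. The attribute name `state` is made up for illustration.

```python
from copy import deepcopy

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root")
original = etree.SubElement(root, "child1")
original.set("state", "original")

other = etree.Element("neu")
clone = deepcopy(original)  # independent copy, attributes included
other.append(clone)

clone.set("state", "changed")  # only the copy is modified

print(original.get("state"), clone.get("state"))
```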

To retrieve a 'real' Python list of all children (or a shallow copy of the element children list), you can call the getchildren() method:

>>> children = root.getchildren()
>>> print(type(children) is list)
True
>>> for child in children:
...     print(child.tag)
child3
child1
child2
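getchildren() is marked as deprecated in lxml, and the standard library's ElementTree removed it in Python 3.9. The sketch below uses list(element), which builds the same kind of real list and works with both implementations. It falls back to xml.etree.ElementTree when lxml is absent.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root")
for name in ("child3", "child1", "child2"):
    etree.SubElement(root, name)

children = list(root)  # a real list with the same contents as getchildren()
children.reverse()     # reordering the list leaves the tree untouched

print([c.tag for c in children])
print([c.tag for c in root])
```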

The way up in the tree is provided through the getparent() method:

>>> root is root[0].getparent()  # lxml.etree only!
True
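getparent() has no counterpart in the standard library's ElementTree. A common workaround there is to build a child-to-parent dictionary once. This is a general technique, not something lxml provides, and the helper name `build_parent_map` is hypothetical. The sketch runs with either library.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

def build_parent_map(tree_root):
    """Hypothetical helper: map every element below tree_root to its parent."""
    return {child: parent for parent in tree_root.iter() for child in parent}

root = etree.Element("root")
a = etree.SubElement(root, "a")
b = etree.SubElement(a, "b")

parents = build_parent_map(root)
print(parents[b].tag)  # the parent of <b> is <a>
```

The map is a snapshot, so it has to be rebuilt after the tree changes. With lxml.etree, `b.getparent()` gives the same answer without any bookkeeping.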

The siblings (or neighbours) of an element are accessed through the getprevious() and getnext() methods:

>>> root[0] is root[1].getprevious()  # lxml.etree only!
True
>>> root[1] is root[0].getnext()  # lxml.etree only!
True
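When only the plain ElementTree API is available, you can find a sibling by locating the element in its parent's child list. The sketch below does this with a hypothetical helper `sibling(parent, elem, offset)`, where an offset of -1 means previous and +1 means next. Unlike getprevious() and getnext(), it needs the parent passed in explicitly.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

def sibling(parent, elem, offset):
    """Hypothetical helper: the child `offset` positions from elem, or None."""
    children = list(parent)
    for index, child in enumerate(children):
        if child is elem:  # compare by identity, not by value
            target = index + offset
            if 0 <= target < len(children):
                return children[target]
            return None
    raise ValueError("elem is not a child of parent")

root = etree.Element("root")
first = etree.SubElement(root, "first")
second = etree.SubElement(root, "second")

print(sibling(root, second, -1).tag)  # previous sibling of <second>
print(sibling(root, first, +1).tag)   # next sibling of <first>
print(sibling(root, first, -1))       # <first> has no previous sibling
```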

XML elements support attributes. You can create them directly in the Element factory:

>>> root = etree.Element("root", interesting="totally")
>>> print(etree.tostring(root, encoding="unicode"))
<root interesting="totally"/>
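Keyword arguments only work for attribute names that are valid Python identifiers. The Element factory in both lxml.etree and ElementTree also accepts a dictionary as its second argument (attrib), which covers names such as `data-id` or `class`. The attribute names in this sketch are made up.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# "data-id" contains a hyphen and "class" is a Python keyword,
# so neither can be passed as a keyword argument; use a dict instead.
item = etree.Element("item", {"data-id": "42", "class": "note"})

print(item.get("data-id"))
print(item.get("class"))
```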

Fast and direct access to these attributes is provided by the set() and get() methods of elements:

>>> print(root.get("interesting"))
totally
>>> root.set("interesting", "somewhat")
>>> print(root.get("interesting"))
somewhat
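Like dict.get(), get() returns None for a missing attribute and also accepts a default value as a second argument. The sketch below shows both cases and an overwrite with set(). It falls back to xml.etree.ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root", interesting="totally")

print(root.get("missing"))         # None for an unknown attribute
print(root.get("missing", "n/a"))  # or the default you pass in

root.set("interesting", "somewhat")  # overwrites the existing value
print(root.get("interesting"))
```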

However, a very convenient way of dealing with them is through the dictionary interface of the attrib property:

>>> attributes = root.attrib
>>> print(attributes["interesting"])
somewhat
>>> print(attributes.get("hello"))
None
>>> attributes["hello"] = "Guten Tag"
>>> print(attributes.get("hello"))
Guten Tag
>>> print(root.get("hello"))
Guten Tag
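Because attrib behaves like a dictionary, the usual mapping operations (`in`, `items()`, `del`) also work, and changes go straight through to the element. A short sketch, assuming lxml.etree with a fallback to xml.etree.ElementTree:

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root", interesting="somewhat")
attributes = root.attrib

attributes["hello"] = "Guten Tag"  # writes through to the element
print("hello" in attributes)
print(sorted(attributes.items()))

del attributes["hello"]            # removes the attribute from root
print(root.get("hello"))
```

To keep a snapshot that does not follow later changes to the element, copy it with `dict(root.attrib)`.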

Elements can contain text:

>>> root = etree.Element("root")
>>> root.text = "TEXT"
>>> print(root.text)
TEXT
>>> print(etree.tostring(root, encoding="unicode"))
<root>TEXT</root>
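A newly created element has no text: its text property is None, not an empty string. Assigning None removes the text again. This sketch runs with either lxml.etree or the standard-library fallback.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root")
print(root.text)  # None: a new element has no text

root.text = "TEXT"
with_text = etree.tostring(root, encoding="unicode")

root.text = None  # assigning None removes the text again
without_text = etree.tostring(root, encoding="unicode")

print(with_text)
print(without_text)
```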

In many XML documents (so-called data-centric documents), this is the only place where text can be found. It is encapsulated by a leaf tag at the very bottom of the tree hierarchy.

However, if XML is used for tagged text documents such as (X)HTML, text can also appear between different elements, right in the middle of the tree:

<html><body>Hello<br/>World</body></html>

Here, the <br/> tag is surrounded by text. This is often referred to as document-style XML. Elements support this through their tail property. It contains the text that directly follows the element, up to the next element in the XML tree:

>>> html = etree.Element("html")
>>> body = etree.SubElement(html, "body")
>>> body.text = "TEXT"
>>> print(etree.tostring(html, encoding="unicode"))
<html><body>TEXT</body></html>
>>> br = etree.SubElement(body, "br")
>>> print(etree.tostring(html, encoding="unicode"))
<html><body>TEXT<br/></body></html>
>>> br.tail = "TAIL"
>>> print(etree.tostring(html, encoding="unicode"))
<html><body>TEXT<br/>TAIL</body></html>
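Parsing the markup from the document-style example shows where each piece of text ends up. fromstring() parses a string into an element in both lxml.etree and ElementTree. This sketch uses either one.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

html = etree.fromstring("<html><body>Hello<br/>World</body></html>")
body = html[0]
br = body[0]

print(repr(body.text))  # text before the first child of <body>
print(repr(br.text))    # <br/> itself contains no text
print(repr(br.tail))    # text following <br/>, still inside <body>
```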

These two properties are enough to represent any text content in an XML document. If you want to read the text without the intermediate tags, however, you have to recursively concatenate all text and tail attributes in the correct order. A simpler way to do this is XPath:
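For comparison with XPath, here is the recursive concatenation spelled out. This is a sketch with a hypothetical helper `collect_text` that assumes the tree contains only elements (no comments or processing instructions). Both libraries also provide itertext(), which yields the same pieces.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

def collect_text(elem):
    """Hypothetical helper: concatenate all text below elem in document order.

    The tail of elem itself is excluded because it lies outside elem.
    """
    parts = [elem.text or ""]
    for child in elem:
        parts.append(collect_text(child))  # text inside the child
        parts.append(child.tail or "")     # text after the child
    return "".join(parts)

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")
br.tail = "TAIL"

print(collect_text(html))
print("".join(html.itertext()))  # built-in alternative in both libraries
```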

>>> print(html.xpath("string()"))  # lxml.etree only!
TEXTTAIL
>>> print(html.xpath("//text()"))  # lxml.etree only!
['TEXT', 'TAIL']

If you want to use this more often, you can wrap it in a function:

>>> buildTextList = etree.XPath("//text()")  # lxml.etree only!
>>> print(buildTextList(html))
['TEXT', 'TAIL']

For problems like the above, where you want to recursively traverse the tree and do something with its elements, tree iteration is a very convenient solution. Elements provide a tree iterator for this purpose. It yields elements in document order, i.e. in the order their tags would appear if you serialised the tree to XML:

>>> root = etree.Element("root")
>>> etree.SubElement(root, "child").text = "Child 1"
>>> etree.SubElement(root, "child").text = "Child 2"
>>> etree.SubElement(root, "another").text = "Child 3"
>>> print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>
  <child>Child 1</child>
  <child>Child 2</child>
  <another>Child 3</another>
</root>
>>> for element in root.getiterator():
...     print(element.tag, '-', element.text)
root - None
child - Child 1
child - Child 2
another - Child 3
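getiterator() is marked as deprecated in lxml and was removed from the standard library's ElementTree in Python 3.9. iter() is the current spelling and exists in both. The sketch below is a portable version of the loop above. It collects (tag, text) pairs so the result is easy to inspect.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"

# iter() walks the whole subtree in document order, starting with root itself
pairs = [(element.tag, element.text) for element in root.iter()]
for tag, text in pairs:
    print(tag, "-", text)
```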

If you know you are only interested in a single tag, you can pass its name to getiterator() to have it filter for you:

>>> for element in root.getiterator("child"):
...     print(element.tag, '-', element.text)
child - Child 1
child - Child 2
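A tag filter on the tree iterator searches the entire subtree, not just the direct children. findall() with a plain tag name only looks one level down. The sketch below adds a nested `<child>` (the `wrapper` element is made up) to show the difference, using iter() so it runs with either library.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
wrapper = etree.SubElement(root, "wrapper")
etree.SubElement(wrapper, "child").text = "Nested"

# iter(tag) searches the entire subtree ...
deep = [e.text for e in root.iter("child")]
# ... whereas findall(tag) only looks at direct children
direct = [e.text for e in root.findall("child")]

print(deep)
print(direct)
```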

In lxml.etree, elements provide further iterators for all directions in the tree: children, parents (or rather ancestors) and siblings.
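A short sketch of these lxml-only iterators: iterchildren(), iterancestors() and itersiblings(), where `preceding=True` walks backwards. Plain ElementTree has no equivalents, so the demo is skipped when lxml is not installed. The tree layout is made up for illustration.

```python
try:
    from lxml import etree
    HAVE_LXML = True
except ImportError:
    HAVE_LXML = False  # these iterators do not exist in plain ElementTree

if HAVE_LXML:
    root = etree.Element("root")
    a = etree.SubElement(root, "a")
    b = etree.SubElement(root, "b")
    c = etree.SubElement(root, "c")
    d = etree.SubElement(b, "d")

    print([e.tag for e in root.iterchildren()])             # direct children
    print([e.tag for e in d.iterancestors()])               # parent first, then upwards
    print([e.tag for e in b.itersiblings()])                # following siblings
    print([e.tag for e in b.itersiblings(preceding=True)])  # preceding siblings
else:
    print("lxml is not installed; skipping the lxml-only iterators")
```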
