Python Xml

Python 2.x Python XML Parsing

XML stands for eXtensible Markup Language. You can learn through our website's XML Tutorial

XML is designed to transport and store data.

XML is a set of rules that define semantic markup, which divides documents into many parts and identifies these parts.

It is also a meta-markup language, which defines the syntax language used to define other domain-specific, semantic, structured markup languages.

Python XML Parsing

Common XML programming interfaces include DOM and SAX. These two interfaces handle XML files differently and are suitable for different occasions.

Python has three methods to parse XML: SAX, DOM, and ElementTree:

1. SAX (simple API for XML)

The Python standard library contains a SAX parser. SAX uses an event-driven model, triggering various events during the parsing of XML and calling user-defined callback functions to handle XML files.

2. DOM (Document Object Model)

Parses XML data into a tree in memory, operating on XML by manipulating the tree.

3. ElementTree

ElementTree is like a lightweight DOM with convenient and friendly APIs. It has good code availability, fast speed, and consumes less memory.

Note: Since DOM needs to map XML data to a tree in memory, it is both slow and memory-intensive. SAX reads XML files in a streaming manner, which is faster and uses less memory, but requires users to implement callback functions (handlers).

The XML example file movies.xml used in this chapter contains the following content:

movies.xml

War, ThrillerDVD2003PG10Talk about a US-Japan warAnime, Science FictionDVD1989R8A schientific fictionAnime, ActionDVD4PG10Vash the Stampede!ComedyVHSPG2Viewable boredom

Python Uses SAX to Parse XML

SAX is an event-driven API.

Using SAX to parse XML documents involves two parts: Parser and Event Handler.

The parser is responsible for reading the XML document and sending events to the event handler, such as element start and element end events.

The event handler is responsible for responding to events and processing the passed XML data.

1. Processing large files;
2. Only needing part of the file's content, or obtaining specific information from the file.
3. When you want to build your own object model.

In Python, using the SAX method to process XML requires importing the parse function from xml.sax and ContentHandler from xml.sax.handler.

Introduction to ContentHandler Class Methods

characters(content) method

Call timing:

From the beginning of a line until before encountering a tag, if there are characters, the value of content will be these strings.
From one tag until before encountering the next tag, if there are characters, the value of content will be these strings.
From one tag until before encountering a line break, if there are characters, the value of content will be these strings.

The tags can be start tags or end tags.

startDocument() method

Called when the document starts.

endDocument() method

Called when the parser reaches the end of the document.

startElement(name, attrs) method

Called when encountering an XML start tag. name is the tag name, and attrs is a dictionary of tag attribute values.

endElement(name) method

Called when encountering an XML end tag.

make_parser Method

The following method creates a new parser object and returns it.

xml.sax.make_parser()

Parameter Description:

parser_list - Optional parameter, parser list

parser Method

The following method creates a SAX parser and parses the XML document:

xml.sax.parse(xmlfile, contenthandler[, errorhandler])

Parameter Description:

xmlfile - XML filename
contenthandler - Must be a ContentHandler object
errorhandler - If this parameter is specified, errorhandler must be a SAX ErrorHandler object

parseString Method

The parseString method creates an XML parser and parses the XML string:

xml.sax.parseString(xmlstring, contenthandler[, errorhandler])

Parameter Description:

xmlstring - XML string
contenthandler - Must be a ContentHandler object
errorhandler - If this parameter is specified, errorhandler must be a SAX ErrorHandler object

Python Parses XML Example

Example

import xml.sax class MovieHandler(xml.sax.ContentHandler): def __init__ (self): self.CurrentData = ""self.type = ""self.format = ""self.year = ""self.rating = ""self.stars = ""self.description = ""def startElement(self, tag, attributes): self.CurrentData = tag if tag == "movie": print" *****Movie***** "title = attributesprint"Title:", title def endElement(self, tag): if self.CurrentData == "type": print"Type:", self.type elif self.CurrentData == "format": print"Format:", self.format elif self.CurrentData == "year": print"Year:", self.year elif self.CurrentData == "rating": print"Rating:", self.rating elif self.CurrentData == "stars": print"Stars:", self.stars elif self.CurrentData == "description": print"Description:", self.description self.CurrentData = ""def characters(self, content): if self.CurrentData == "type": self.type = content elif self.CurrentData == "format": self.format = content elif self.CurrentData == "year": self.year = content elif self.CurrentData == "rating": self.rating = content elif self.CurrentData == "stars": self.stars = content elif self.CurrentData == "description": self.description = content if( __name__  == " __main__ "): parser = xml.sax.make_parser()parser.setFeature(xml.sax.handler.feature_namespaces, 0)Handler = MovieHandler()parser.setContentHandler(Handler)parser.parse("movies.xml")

The execution result of the above code is as follows:

 ***** Movie ***** Title: Enemy BehindType: War, ThrillerFormat: DVD Year: 2003Rating: PG Stars: 10Description: Talk about a US-Japan war  ***** Movie ***** Title: TransformersType: Anime, Science FictionFormat: DVD Year: 1989Rating: R Stars: 8Description: A schientific fiction  ***** Movie ***** Title: TrigunType: Anime, ActionFormat: DVD Rating: PG Stars: 10Description: Vash the Stampede! ***** Movie ***** Title: IshtarType: ComedyFormat: VHS Rating: PG Stars: 2Description: Viewable boredom

For complete SAX API documentation, please refer to Python SAX APIs

Using xml.dom to Parse XML

The Document Object Model (DOM) is a standard programming interface recommended by the W3C organization for processing extensible markup languages.

A DOM parser reads the entire document at once when parsing an XML document, saving all elements of the document in a tree structure in memory. Then you can use different functions provided by DOM to read or modify the content and structure of the document, or write the modified content back to an XML file.

In Python, xml.dom.minidom is used to parse XML files. The example is as follows:

Example

from xml.dom.minidom import parse import xml.dom.minidom DOMTree = xml.dom.minidom.parse("movies.xml")collection = DOMTree.documentElement if collection.hasAttribute("shelf"): print"Root element : %s" % collection.getAttribute("shelf")movies = collection.getElementsByTagName("movie")for movie in movies: print" *****Movie***** "if movie.hasAttribute("title"): print"Title: %s" % movie.getAttribute("title")type = movie.getElementsByTagName('type')print"Type: %s" % type.childNodes.data format = movie.getElementsByTagName('format')print"Format: %s" % format.childNodes.data rating = movie.getElementsByTagName('rating')print"Rating: %s" % rating.childNodes.data description = movie.getElementsByTagName('description')print"Description: %s" % description.childN

YouTip

Python Xml

Python 2.x Python XML Parsing

Python XML Parsing

1. SAX (simple API for XML)

2. DOM (Document Object Model)

3. ElementTree

movies.xml

Python Uses SAX to Parse XML

Introduction to ContentHandler Class Methods

make_parser Method

parser Method

parseString Method

Python Parses XML Example

Example

Using xml.dom to Parse XML

Example

📂 Categories