Developing XML-enabled C programs with libxml2
A beginner's guide
By David Turover

This document is in the public domain.

For brevity's sake, the code in this document contains no error checking. In real life, you will want to check for NULL pointers and function return values.

Introduction

libxml2 is a library of functions for handling XML data. A simple example:

```c
#include <stdio.h>
#include <libxml/parser.h>

int main(){
    xmlDocPtr doc;
    xmlNodePtr nodeLevel1;
    xmlNodePtr nodeLevel2;

    doc = xmlParseFile("xmlfile.xml");

    /* Walk the top-level nodes and, under each one, its children */
    for(nodeLevel1 = doc->children; nodeLevel1 != NULL; nodeLevel1 = nodeLevel1->next){
        printf("%s\n", (const char *)nodeLevel1->name);
        for(nodeLevel2 = nodeLevel1->children; nodeLevel2 != NULL; nodeLevel2 = nodeLevel2->next){
            printf("\t%s\n", (const char *)nodeLevel2->name);
        }
    }

    xmlSaveFile("xmlfile_copy.xml", doc);  /* write the document back out */
    xmlFreeDoc(doc);                       /* release the whole tree */
    return 0;
}
```

The above code, compiled against libxml2 (for example with `gcc example.c $(xml2-config --cflags --libs)`, which supplies both the header path and -lxml2), should print the names of the nodes in the top two levels of an XML file and save a copy of the file.

Explanation of Introduction

The xmlDocPtr is a pointer to an xmlDoc structure, which represents an XML data source. You load an XML file with the xmlParseFile() function, which takes the name of an XML file as a parameter and returns a pointer to a new xmlDoc structure (or NULL on failure). When done, you release this memory with the xmlFreeDoc() function. You can export an xmlDocPtr's data as an XML file with the xmlSaveFile() function.

The xmlNodePtr points to a single element or other node of an XML document. Each xmlNode has a children member, an xmlNodePtr to the first of that node's children, and a name member, a string containing the name of the element it represents, or the word "text" for a text node.

The xmlNodePtr is the basic structure used to traverse an XML document with libxml2. It contains several xmlNodePtrs which can be used to move around the document. If there is no other node in a particular direction, the pointer is NULL.
- xmlNodePtr->children: the first child of the node
- xmlNodePtr->last: the node's last child
- xmlNodePtr->parent: the current node's parent node
- xmlNodePtr->next: the next sibling node
- xmlNodePtr->prev: the previous sibling node
- xmlNodePtr->doc: the xmlDocPtr for the document containing this node

[Diagram: the current node ("You Are Here") with ->parent above it, ->prev and ->next on either side, and ->children and ->last leading down to its first and last children.]

Although the diagram suggests that nodes are simply elements, be aware that runs of whitespace between elements are also nodes (text nodes). Taken literally, the example in the introduction therefore also prints "text" for each run of whitespace between elements, because the child and sibling pointers lead to those whitespace nodes as well as to the elements.

Checking for text nodes

You can check what type of xmlNode you have by looking at the xmlNodePtr->type member, which holds one of the following values of the xmlElementType enumeration: XML_ELEMENT_NODE, XML_ATTRIBUTE_NODE, XML_TEXT_NODE, XML_CDATA_SECTION_NODE, XML_ENTITY_REF_NODE, XML_ENTITY_NODE, XML_PI_NODE, XML_COMMENT_NODE, XML_DOCUMENT_NODE, XML_DOCUMENT_TYPE_NODE, XML_DOCUMENT_FRAG_NODE, XML_NOTATION_NODE, XML_HTML_DOCUMENT_NODE, XML_DTD_NODE, XML_ELEMENT_DECL, XML_ATTRIBUTE_DECL, XML_ENTITY_DECL, XML_NAMESPACE_DECL, XML_XINCLUDE_START, XML_XINCLUDE_END, XML_DOCB_DOCUMENT_NODE.

The only ones you need to care about right now are XML_TEXT_NODE and XML_ELEMENT_NODE.

Handling a Node

An XML node generally consists of a start tag carrying optional attributes, its content (text, possibly mixed with child elements), and an end tag. The running example here is an element whose text reads "Hello World" and which also contains a child element.

The things you can manipulate are the node itself, the node's attributes, and the node's contents.

Attributes

Working with the attributes of a node is fairly straightforward: you use the xmlGetProp() function to get an attribute's value and the xmlSetProp() function to set an attribute's value (creating the attribute if it does not exist yet). If you want to know whether an attribute exists, use the xmlHasProp() function. If you want to remove an attribute completely, use xmlUnsetProp().
```c
xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);
xmlGetProp(xmlNodePtr node, const xmlChar *name);
xmlHasProp(xmlNodePtr node, const xmlChar *name);
xmlUnsetProp(xmlNodePtr node, const xmlChar *name);
```

xmlGetProp() returns a newly allocated string (or NULL if the attribute is absent) that must be freed with the xmlFree() function when you are done with it, or else your program will have a memory leak.

Content

Working with content is less intuitive. The content of a node is not simply what the node contains: it is the text of the node and all of its descendants, with the element tags stripped out. Thus the content of the node in the example above would be "Hello World", with the child element nowhere to be seen. If you try adding element tags to a node's content, libxml2 will escape their < and > characters (writing them out as &lt; and &gt;) rather than create new elements.

To work with content, then, you use the xmlNodeSetContent() and xmlNodeGetContent() functions to set or retrieve a node's content, or the xmlNodeAddContent() function to append to a node's content. Note that xmlNodeSetContent() replaces the node's existing children, and it treats & as the start of an entity reference, so text containing a literal & should first be escaped with xmlEncodeSpecialChars().

```c
xmlNodeSetContent(xmlNodePtr node, const xmlChar *content);
xmlNodeAddContent(xmlNodePtr node, const xmlChar *content);
xmlNodeGetContent(xmlNodePtr node);
```

As with xmlGetProp(), you must use xmlFree() on the result of xmlNodeGetContent() or else you will have a memory leak.

To print everything an element contains, tags included, rather than just its content, use xmlElemDump():

```c
xmlElemDump(FILE *output, xmlDocPtr doc, xmlNodePtr node);
```

Strings: xmlChar* versus char*

xmlChar* is the string type used by libxml2. An xmlChar is an unsigned char, and libxml2 expects these strings to hold UTF-8 text. You can cast freely between char* and xmlChar*; libxml2 provides the BAD_CAST macro as shorthand for the (xmlChar *) cast.

Creating a New Node

To create a node from scratch and add it to a document:

```c
xmlNodePtr newNode = xmlNewNode(NULL, BAD_CAST "name");  /* standalone node */
xmlNodePtr nodeParent = xmlDocGetRootElement(doc);     /* root element (doc->children may be a comment or DTD) */
xmlNodePtr node = xmlDocCopyNode(newNode, doc, 1);     /* recursive copy that belongs to doc */
xmlFreeNode(newNode);                                  /* the original is no longer needed */
xmlAddChild(nodeParent, node);                         /* attach the copy to the tree */
```

The xmlNewNode() function allocates memory for a new node. A node that is not part of any document must be freed with xmlFreeNode() when you are done with it, as newNode is here; a node that has been added to a document, like node, is freed along with the document by xmlFreeDoc().
The NULL in xmlNewNode() is where an xmlNsPtr namespace pointer would go if the node were to be assigned to a particular namespace; we are not using namespaces right now, so it is left as NULL.

The xmlDocCopyNode() function does not insert anything into the target document. Instead, it returns a copy of the node (including its attributes and children when the last argument is 1) that belongs to that document but is not yet attached to its tree. To add the node to the document, you must then use another function such as xmlAddChild(), xmlAddSibling(), xmlAddNextSibling(), or xmlAddPrevSibling(). (xmlAddChild() also updates the document pointer of a node it adopts, so for a freshly created node the copy step is optional.)

Summary of xmlNode Members and Simple Interface Functions

- type: node type (usually XML_ELEMENT_NODE or XML_TEXT_NODE)
- name: string containing the element's name, or "text" for a text node
- children: first child of the node
- last: last child of the node
- parent: parent node
- next: next sibling node
- prev: previous sibling node
- doc: the document containing this node

```c
xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);
xmlGetProp(xmlNodePtr node, const xmlChar *name);
xmlHasProp(xmlNodePtr node, const xmlChar *name);
xmlUnsetProp(xmlNodePtr node, const xmlChar *name);
xmlNodeSetContent(xmlNodePtr cur, const xmlChar *content);
xmlNodeAddContent(xmlNodePtr cur, const xmlChar *content);
xmlNodeGetContent(xmlNodePtr cur);
xmlElemDump(FILE *output, xmlDocPtr doc, xmlNodePtr node);
```

For more information, read the API docs at: http://xmlsoft.org/html/libxml-tree.html

Addendum: Using LibXSLT

Example:

```c
#include <libxml/parser.h>
#include <libxslt/xsltInternals.h>
#include <libxslt/transform.h>

int main(){
    xmlDocPtr doc = xmlParseFile("xmlfile.xml");
    xsltStylesheetPtr xsl = xsltParseStylesheetFile(BAD_CAST "xslfile.xsl");
    xmlDocPtr result = xsltApplyStylesheet(xsl, doc, NULL);

    xmlSaveFile("stylesheet_output.xml", result);

    xmlFreeDoc(doc);
    xmlFreeDoc(result);
    xsltFreeStylesheet(xsl);  /* also frees the stylesheet's own xmlDoc */
    return 0;
}
```

This program must be linked with libxslt as well as libxml2 (for example, using the flags from `xslt-config --cflags --libs`).

The XSL file is available, as an xmlDocPtr, as xsltStylesheetPtr->doc.

The NULL in xsltApplyStylesheet() is where you would pass parameters to the stylesheet, as a NULL-terminated array of name/value string pairs. That is beyond the scope of this document.
If you want to add nodes to an XSL stylesheet, you will need to give the nodes the XSL namespace. The stylesheet's root element already declares that namespace, so you can look it up with xmlSearchNsByHref() instead of creating a new, unattached xmlNs. Here xslDoc is the stylesheet as an xmlDocPtr (for example, the xsltStylesheetPtr->doc mentioned above):

```c
xmlNodePtr root = xmlDocGetRootElement(xslDoc);          /* <xsl:stylesheet> */
/* Reuse the xsl namespace declared on the root element */
xmlNsPtr xslNamespace = xmlSearchNsByHref(xslDoc, root,
        (xmlChar *)"http://www.w3.org/1999/XSL/Transform");
xmlNodePtr varNode = xmlNewNode(xslNamespace, (xmlChar *)"variable");
xmlNodeSetContent(varNode, (xmlChar *)"forty-two");
xmlSetProp(varNode, (xmlChar *)"name", (xmlChar *)"my_variable");
/* Append as the last top-level element; any xsl:import elements must stay first */
xmlAddChild(root, varNode);
```

This creates an xsl:variable element whose name attribute is my_variable and which contains the value "forty-two", that is, <xsl:variable name="my_variable">forty-two</xsl:variable>.

See also:
http://xmlsoft.org/XSLT/html/libxslt-transform.html
http://www.doc.eng.cmu.ac.th/ldp/howto/libxslt-1.0.1/tutorial/libxslttutorial.html

Addendum: Using XPath

```c
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>

int main(){
    xmlChar *xpath = BAD_CAST "/foo[@bar='baz']";
    xmlDocPtr doc = xmlParseFile("xmlfile.xml");
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    xmlXPathObjectPtr result = xmlXPathEvalExpression(xpath, context);

    if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
        printf("No result\n");
    }

    xmlXPathFreeObject(result);
    xmlXPathFreeContext(context);
    xmlFreeDoc(doc);  /* free the document last; the results point into it */
    return 0;
}
```

The xmlXPathEvalExpression() function returns NULL if the xmlChar* is not a valid XPath expression (or if evaluation otherwise fails). Otherwise, it returns an xmlXPathObjectPtr. When the expression selects nodes, xmlXPathObjectPtr->nodesetval is an xmlNodeSetPtr, a structure specific to XPath that holds pointers to the nodes that matched the expression. It may be NULL when nothing matched, which xmlXPathNodeSetIsEmpty() takes into account.

- xmlNodeSetPtr->nodeTab is an array of xmlNodePtr to the results.
- xmlNodeSetPtr->nodeNr is the length of the nodeTab array.

An xmlXPathObject must be freed with the xmlXPathFreeObject() function.

See also:
http://xmlsoft.org/html/libxml-xpath.html
http://www.doc.eng.cmu.ac.th/ldp/howto/libxslt-1.0.1/internals.html

Addendum: Custom Functions

These are a couple of functions written for the semester project.
```c
#include <stdio.h>
#include <string.h>
#include <libxml/tree.h>
#include <libxml/xpath.h>

/** srSeekChildNodeNamed(): Get a pointer to the first child with the given name.
 *  Returns a pointer into the tree itself (not a copy), or NULL if none matches. */
xmlNodePtr srSeekChildNodeNamed(xmlNodePtr p, char *name){
    if(p == NULL || name == NULL)
        return NULL;
    for(p = p->children; p != NULL; p = p->next){
        if(p->name && (strcmp((char *)p->name, name) == 0)){
            return p;
        }
    }
    return NULL;
}

/* srXPath(): Avoid having to use contexts in your code.
 * Returns an xmlXPathObjectPtr that you must free with xmlXPathFreeObject(),
 * or NULL on error. */
xmlXPathObjectPtr srXPath(xmlChar *str, xmlDocPtr doc){
    xmlXPathContextPtr xpContext;
    xmlXPathObjectPtr xpResult;

    if(str == NULL){
        fprintf(stderr, "Error: srXPath(): NULL received for xpath string\n");
        return NULL;
    }
    if(doc == NULL){
        fprintf(stderr, "Error: srXPath(): xmlDocPtr is NULL\n");
        return NULL;
    }
    xpContext = xmlXPathNewContext(doc);
    if(xpContext == NULL){
        fprintf(stderr, "Error: srXPath(): Failed to create xpath context\n");
        return NULL;
    }
    xpResult = xmlXPathEvalExpression(str, xpContext);
    xmlXPathFreeContext(xpContext);  /* the result does not depend on the context */
    return xpResult;
}
```