blog.humaneguitarist.org

fun with lxml, part 2

[Sat, 09 Apr 2011 15:56:34 +0000]
Just following up on a previous post [http://blog.humaneguitarist.org/2011/03/07/fun-with-libxml/] from about a month ago ...

Per a request, I need to tweak some software of mine to let a user specify a parent element in an XML document and, in turn, retrieve the values of its child elements. Big deal. That's what XSLT is for - blah, blah, blah. But this is specifically for PubMed XML exports and turning those into Excel files [http://blog.humaneguitarist.org/projects/pubmed2xl/].

Anyway, the user needs to be able to pick the value of a given child element (i.e. by position) and have it placed into an Excel cell. Alternatively, all the children's values need to go into one cell, separated by a delimiter.

So before I try to tinker with the software, I want to work out a solution using test code:

    from lxml import etree

    ##### Step 1
    # make an XML example
    xml = '<a> \
      <b> \
        <c>cee1</c> \
        <d>dee1</d> \
        <c>cee2</c> \
        <d>dee2</d> \
      </b> \
      <b>bee</b> \
      <c>cee3</c> \
    </a>'

    ##### Step 2
    # parse the XML example
    parseXML = etree.XML(xml)

    ##### Step 3
    # get the first (i.e. the zero-th) <b> element;
    # findall() returns a list of every <b>, so [0] picks the first one
    first_b = parseXML.findall('.//b')[0]

    ##### Step 4
    # get a list of all the children of that first <b> element
    # (list(element) replaces getchildren(), which lxml has deprecated)
    b_childList = list(first_b)

    ##### Step 5
    # make a new list called "c_list" holding only the *values* (text)
    # of the <c> elements that are children of our first <b> element
    c_list = []  # make an empty list to put things in
    for child in b_childList:
        if child.tag == 'c':
            c_list.append(child.text)

    ##### Step 6
    # print desired results
    for c in c_list:  # print all values, one per line
        print(c)
    print('-' * 4)  # print dash line for reading ease
    print('; '.join(c_list))  # print all values on one line with delimiter
    print('-' * 4)
    print(c_list[1])  # print only the second <c> element value

Here are the results:

    >>>
    cee1
    cee2
    ----
    cee1; cee2
    ----
    cee2
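Since the real goal is to let a user name the parent and child elements, here is a rough sketch of how the test code might be folded into a couple of reusable functions. The names `child_values` and `cell_value`, their parameters, and the fallback to the standard library's ElementTree when lxml isn't installed are all my own assumptions for illustration; none of this is taken from the actual pubmed2xl code.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library if lxml is not installed.
    import xml.etree.ElementTree as etree


def child_values(root, parent_tag, child_tag, parent_index=0):
    """Return the text of each direct <child_tag> child of one <parent_tag>.

    parent_index picks which matching parent to use (zero-based).
    Hypothetical helper, not part of lxml.
    """
    parents = root.findall('.//' + parent_tag)
    if parent_index >= len(parents):
        return []  # no such parent element in the document
    parent = parents[parent_index]
    return [child.text for child in parent if child.tag == child_tag]


def cell_value(values, position=None, delimiter='; '):
    """Build the string for one spreadsheet cell.

    With position=None, join every value with the delimiter;
    otherwise return only the value at that zero-based position.
    Hypothetical helper, not part of lxml.
    """
    if position is None:
        return delimiter.join(v or '' for v in values)
    if 0 <= position < len(values):
        return values[position] or ''
    return ''  # position out of range: leave the cell empty


# Same sample document as in the test code, written on one line.
SAMPLE = ('<a><b><c>cee1</c><d>dee1</d><c>cee2</c><d>dee2</d></b>'
          '<b>bee</b><c>cee3</c></a>')

root = etree.XML(SAMPLE)
c_values = child_values(root, 'b', 'c')
print(cell_value(c_values))              # all <c> values in one cell
print(cell_value(c_values, position=1))  # only the second <c> value
```

With the sample document this prints `cee1; cee2` and then `cee2`, matching the test results. Returning an empty string for a missing parent or an out-of-range position is just one possible choice; raising an error might suit the real software better.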