XSLT: a practical usage example with Pubmed records [Sat, 15 Aug 2009 17:48:08 +0000]
Update, December 10, 2010: If you are interested in getting PubMed citations into a spreadsheet application (Excel, etc.) please see PubMed2XL [http://blog.humaneguitarist.org/projects/pubmed2xl/]. PubMed2XL is free software that can convert PubMed citations into a Microsoft Excel file.
As part of my coursework for the University of Alabama SLIS [http://www.slis.ua.edu/] program, I took a database class last year. Long story short, one of the assignments was to create a Microsoft Access database based on Medline [http://www.nlm.nih.gov/pubs/factsheets/medline.html] records.
The records were provided for us, along with a Java-based script to parse the information into a tab-delimited format prior to import into Access.
For extra credit, we were given another script that would parse records from an Ovid [http://ovid.com] database. If we could find access to an Ovid dbase (I couldn't as they were all password protected, understandably), we could run the script, parse the records and bring them into Access for additional credit.
But there was a way to use a free source, PubMed, and still get the job done.
How? Well, PubMed allows article information to be exported as XML.
Once in XML, there was no need for a script to parse the information. From there it was simple to bring the information into Access. I found it easier to import it into Excel, clean it up, and then import that Excel data source into Access.
But what if you have OpenOffice [http://www.openoffice.org/]?
I'm not aware of a simple way to import XML documents into OpenOffice Calc (their spreadsheet app) or Base (their dbase app).
But by using XSLT [http://www.w3schools.com/xsl/], there's a way around this issue.
Here are the steps:
1. Conduct searches in PubMed [http://www.ncbi.nlm.nih.gov/pubmed/].
2. Send your articles to the Clipboard [http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp#pubmedhelp.Saving_citations_tem].
3. Set display to "XML".
4. Send the results to "File [http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp#pubmedhelp.Saving_citations_as_]".
5. Save the file as "pubmed_results.txt".
6. Change the file's extension from "txt" to "xml".
7. Open the document in a text editor.
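8. Above the DTD declaration (i.e. the line beginning with "<!DOCTYPE"), add an xml-stylesheet processing instruction that points to "pubmed_xslt.xsl".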
9. Re-save the file.
10. Then, download this file [http://blog.humaneguitarist.org/uploads/pubmed_xslt.xsl] to the same directory as your "pubmed_results.xml" file.
11. Now open "pubmed_results.xml" in your browser; it should display selected data as an HTML table.
12. From here, simply copy/paste the tabular data into OpenOffice Calc, clean it up as desired, save it as a ".ods" file, hook it up to OpenOffice Base, and design your queries, etc.
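If you'd rather not do steps 6 through 9 by hand, they can be scripted. Here's a minimal Python sketch, not part of the original workflow. It copies the saved text file to an ".xml" file and places a stylesheet processing instruction just above the DOCTYPE line. Note the assumptions: the processing instruction text below is the standard xml-stylesheet form pointing at "pubmed_xslt.xsl" in the same directory, which may not match the original line character for character, and the names add_stylesheet_pi and convert are made up for illustration.

```python
from pathlib import Path

# ASSUMPTION: the standard xml-stylesheet form, pointing at the stylesheet
# saved next to the XML file (see step 10).
STYLESHEET_PI = '<?xml-stylesheet type="text/xsl" href="pubmed_xslt.xsl"?>'


def add_stylesheet_pi(xml_text, pi=STYLESHEET_PI):
    """Return xml_text with the stylesheet instruction on its own line above the DOCTYPE."""
    if "<?xml-stylesheet" in xml_text:
        return xml_text  # already linked to a stylesheet; leave it alone
    doctype_at = xml_text.find("<!DOCTYPE")
    if doctype_at == -1:
        raise ValueError("no DOCTYPE declaration found")
    return xml_text[:doctype_at] + pi + "\n" + xml_text[doctype_at:]


def convert(src="pubmed_results.txt", dst="pubmed_results.xml"):
    """Read the file saved from PubMed and write the stylesheet-linked XML copy."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(add_stylesheet_pi(text), encoding="utf-8")
```

Calling convert() from the directory holding "pubmed_results.txt" should leave you with a "pubmed_results.xml" that is ready for step 10.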
And now you've got a totally Free (minus the cost of a laptop, internet connexion, etc.) desktop dbase of Medline results.
* Note that the XML stylesheet I provided [http://blog.humaneguitarist.org/uploads/pubmed_xslt.xsl] only displays certain info. You can always open the stylesheet in a text editor and set it to display more information, such as Abstract, etc.
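If you want to see where fields like the abstract live before editing the stylesheet, here's a rough Python sketch that pulls a few fields out of the same XML and writes them as tab-delimited text. This is a swapped-in alternative using Python's standard library, not what the stylesheet does. The element paths (PubmedArticle, MedlineCitation, PMID, Article/ArticleTitle, Article/Journal/Title, Article/Abstract/AbstractText) follow the PubMed XML layout as I understand it, and the function names and column choices are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["PMID", "Title", "Journal", "Abstract"]


def text_of(parent, path):
    """Return all text under the first element matching path, or '' if it is absent."""
    node = parent.find(path)
    return "".join(node.itertext()).strip() if node is not None else ""


def pubmed_to_rows(xml_text):
    """Turn PubMed XML into a list of dicts, one per PubmedArticle."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue  # nothing to extract from this record
        # An abstract may be split into several AbstractText sections; join them.
        abstract = " ".join(
            "".join(part.itertext()).strip()
            for part in citation.iterfind("Article/Abstract/AbstractText")
        )
        rows.append({
            "PMID": text_of(citation, "PMID"),
            "Title": text_of(citation, "Article/ArticleTitle"),
            "Journal": text_of(citation, "Article/Journal/Title"),
            "Abstract": abstract,
        })
    return rows


def rows_to_tsv(rows):
    """Serialize the rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The same element paths are what you would point the stylesheet at if you added an Abstract column there instead.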