blog.humaneguitarist.org

easy calls to OpenCalais with Python, daggummit!

[Fri, 23 Mar 2012 17:28:00 +0000]
Yesterday, I wrote this [http://blog.humaneguitarist.org/2012/03/22/make-you-some-facets-boy/] post about using Yahoo's deprecated term extraction web service to generate "subjects" - or whatever you want to call them - for an item based on the metadata housed in a Solr-compatible XML file. I'd also wondered about doing the same thing with OpenCalais [http://www.opencalais.com/].

Before we go any further, I'd just like to say I wrote that post from my hotel room. I'm writing today's from the Denver airport with about 2 hours to kill before my flight departs. And I'd also like to point out that when writing blog posts with spotty Wi-Fi connections, one should not compose their post online through WordPress. I'm using WordPad, and I should probably make that a habit.

Yeah, so anyway there's not that much good documentation on how to make calls on the Calais site. By "good" I mean there's no code sample to rip off. I'm sure it's perfectly fine for people who actually know what they're doing.

Using "The Google" I found this [http://www.flagonwiththedragon.com/2011/06/08/dead-simple-python-calls-to-open-calais-api/] helpful post on making calls to OpenCalais. While I found it very well written and the code very helpful, I didn't want to have "httplib2" as a dependency since it's not available out-of-the-box with Python 2.7, as far as I know. Nor did I want to do anything with JSON. I'm just trying to make a simple POST request to the OpenCalais REST API - is all.

Using that post's code as a starting point, I whipped up some simple Python without "httplib2". Note that this code passes three parameters to the API through the following variables:

* "myCalaisAPI_key": this is where to paste your API key once you get it from Calais here [http://www.opencalais.com/APIkey].
* "sampleText": this is a string of plain text to send to Calais for it to analyze and build terms for.
* "calaisParams": these are the options to pass to the service in XML format.

Note that I'm specifically requesting what I really want, "social tags", via the following option:

    c:enableMetadataType="GenericRelations,SocialTags"

... and I'm specifically requesting a simple result format as follows:

    c:outputFormat="Text/Simple"

There are other options, including RDF, that can be requested per the options mentioned on this [http://www.opencalais.com/documentation/calais-web-service-api/forming-api-calls/input-parameters] page.

If you look at the code, you can see I'm asking Calais to analyze some text about Tim Tebow [http://en.wikipedia.org/wiki/Tim_Tebow] since I was in Denver when the Denver Broncos [http://en.wikipedia.org/wiki/Denver_Broncos] football team acquired Peyton Manning [http://en.wikipedia.org/wiki/Peyton_Manning] and traded Tebow to the New York Jets. The text is from a USA Today article from, um, yesterday. The Jets, I'd like to state, are not worthy of a hyperlink. And that's only part of the reason I'm sad to see Tebow go there. Alas.

Anyway, here's the output below, followed by the code. Note that - as mentioned in the code - I'm using the slightly older REST API. But what do I care right now. I'm just testing.

Here's the output:

    <!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com.
    By using this service or the results of the service you agree to these terms of service.-->
    <!-- Company: HBO, Organization: New York Jets, Person: Tim Tebow, TVShow: Hard Knocks, -->
    <OpenCalaisSimple>
      <Description>
        <calaisRequestID>dafa6c80-b4f6-77b1-1363-de96bb7764f4</calaisRequestID>
        <id>http://id.opencalais.com/ODNr1ciDte8wwv0nU3G1jw</id>
        <about>http://d.opencalais.com/dochash-1/895ba8ff-4c32-3ae1-9615-9a9a9a1bcb39</about>
        <docTitle/>
        <docDate>2012-03-23 00:56:09.679</docDate>
        <externalMetadata/>
      </Description>
      <CalaisSimpleOutputFormat>
        <Company count="1" relevance="0.643" normalized="HBO &amp; Company">HBO</Company>
        <Organization count="1" relevance="0.643">New York Jets</Organization>
        <Person count="1" relevance="0.643">Tim Tebow</Person>
        <TVShow count="1" relevance="0.643">Hard Knocks</TVShow>
        <SocialTags>
          <SocialTag importance="2">Training camp<originalValue>Training camp (National Football League)</originalValue></SocialTag>
          <SocialTag importance="2">New York Jets<originalValue>New York Jets</originalValue></SocialTag>
          <SocialTag importance="2">Florida Gators football team<originalValue>2008 Florida Gators football team</originalValue></SocialTag>
          <SocialTag importance="1">Tim Tebow<originalValue>Tim Tebow</originalValue></SocialTag>
          <SocialTag importance="1">HBO<originalValue>HBO</originalValue></SocialTag>
          <SocialTag importance="1">Hard Knocks<originalValue>Hard Knocks (TV series)</originalValue></SocialTag>
          <SocialTag importance="1">Entertainment_Culture</SocialTag>
          <SocialTag importance="1">Sports</SocialTag>
        </SocialTags>
        <Topics>
          <Topic Taxonomy="Calais" Score="1.000">Entertainment_Culture</Topic>
          <Topic Taxonomy="Calais" Score="1.000">Sports</Topic>
        </Topics>
      </CalaisSimpleOutputFormat>
    </OpenCalaisSimple>

And the code:

    # this code is based on: http://www.flagonwiththedragon.com/2011/06/08/dead-simple-python-calls-to-open-calais-api/

    import sys
    import urllib, urllib2

    #########################
    ##### set API key and REST URL values.

    myCalaisAPI_key = '' # your Calais API key.
    calaisREST_URL = 'http://api.opencalais.com/enlighten/rest/' # this is the older REST interface.
    # info on the newer one: http://www.opencalais.com/documentation/calais-web-service-api/api-invocation/rest

    # alert user and shut down if the API key variable is still empty.
    if myCalaisAPI_key == '':
        print "You need to set your Calais API key in the 'myCalaisAPI_key' variable."
        sys.exit()

    #########################
    ##### set the text to ask Calais to analyze.

    # text from: http://www.usatoday.com/sports/football/nfl/story/2012-03-22/Tim-Tebow-Jets-hoping-to-avoid-controversy/53717542/1
    sampleText = '''
    Like millions of football fans, Tim Tebow caught a few training camp glimpses of the New York Jets during the summer of 2010 on HBO's Hard Knocks.
    '''

    #########################
    ##### set XML parameters for Calais.

    # see "Input Parameters" at: http://www.opencalais.com/documentation/calais-web-service-api/forming-api-calls/input-parameters
    calaisParams = '''
    <c:params xmlns:c="http://s.opencalais.com/1/pred/"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <c:processingDirectives
        c:contentType="text/txt"
        c:enableMetadataType="GenericRelations,SocialTags"
        c:outputFormat="Text/Simple"/>
      <c:userDirectives/>
      <c:externalMetadata/>
    </c:params>
    '''

    #########################
    ##### send data to Calais API.

    # see: http://www.opencalais.com/APICalls
    # urlencode() builds the form-encoded body; passing it to urlopen() makes the request a POST.
    dataToSend = urllib.urlencode({
        'licenseID': myCalaisAPI_key,
        'content': sampleText,
        'paramsXML': calaisParams
    })

    #########################
    ##### get API results and print them.

    results = urllib2.urlopen(calaisREST_URL, dataToSend).read()
    print results
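Since the social tags are the part I actually care about, here is a rough sketch of how they might be pulled out of a "Text/Simple" response with nothing but the standard library. This is not part of the script above: the names "extract_social_tags" and "sample_response" are made up for illustration, and the sample is a trimmed copy of the output shown earlier. It assumes real responses keep that same shape, which I haven't verified beyond this one call.

```python
# A sketch only: collect social tags from a "Text/Simple" response.
# "extract_social_tags" and "sample_response" are hypothetical names.
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown in the post (assumed to be representative).
sample_response = '''<OpenCalaisSimple>
  <CalaisSimpleOutputFormat>
    <Person count="1" relevance="0.643">Tim Tebow</Person>
    <SocialTags>
      <SocialTag importance="2">Training camp<originalValue>Training camp (National Football League)</originalValue></SocialTag>
      <SocialTag importance="1">Tim Tebow<originalValue>Tim Tebow</originalValue></SocialTag>
      <SocialTag importance="1">Sports</SocialTag>
    </SocialTags>
  </CalaisSimpleOutputFormat>
</OpenCalaisSimple>'''


def extract_social_tags(xml_text):
    """Return a list of (tag, importance) tuples found in the response."""
    root = ET.fromstring(xml_text)
    tags = []
    for node in root.iter('SocialTag'):
        # node.text is the label that precedes any <originalValue> child.
        label = (node.text or '').strip()
        if label:
            tags.append((label, node.get('importance')))
    return tags


for tag, importance in extract_social_tags(sample_response):
    print('%s (importance %s)' % (tag, importance))
```

With the script above, the same function could presumably be handed the "results" string directly, e.g. extract_social_tags(results). The importance values come back as strings because they are read straight from the XML attributes.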

COMMENTS

  1. nitin [2013-01-31 18:37:25]

    Hey Peter, I've been meaning to thank you for posting. It's been a while since I tried LiveWriter ... maybe I should give it another shot. I've tried uploading blogs directly from OpenOffice, too, but it seems both the blog editors and WordPress mangle a lot of the formatting/HTML and I have to do all kinds of manual correction. TRUE STORY: As I was writing this reply my internet cut out, so I pasted my response in a plain text editor and re-pasted it (then added this note). Creepy!

  2. Peter Quirk [2012-10-26 17:58:07]

    Thanks for the sample code. With reference to offline blog writing, have you tried Windows LiveWriter? I use it all the time and find it really useful for composition and publishing to multiple sites. You can get it from http://www.microsoft.com/en-us/download/details.aspx?id=8621 [http://www.microsoft.com/en-us/download/details.aspx?id=8621]