Dive Into Python: Python from novice to pro
Let's finally turn to the sample code that you saw at the beginning of this chapter, which does something more useful and exciting than getting the current temperature.
Google provides a SOAP API for programmatically accessing Google search results. To use it, you will need to sign up for Google Web Services.
Go to http://www.google.com/apis/ and create a Google account. This requires only an email address. After you sign up, you will receive your Google API license key by email. You will need to pass this key as a parameter whenever you call Google's search functions.
Also on http://www.google.com/apis/, download the Google Web APIs developer kit. This includes some sample code in several programming languages (but not Python), and more importantly, it includes the WSDL file.
Decompress the developer kit file and find GoogleSearch.wsdl. Copy this file to some permanent location on your local drive. You will need it later in this chapter.
Once you have your license key and the Google WSDL file in a known place, you can start poking around with Google Web Services.
>>> from SOAPpy import WSDL
>>> server = WSDL.Proxy('/path/to/your/GoogleSearch.wsdl')
>>> server.methods.keys()
[u'doGoogleSearch', u'doGetCachedPage', u'doSpellingSuggestion']
>>> callInfo = server.methods['doGoogleSearch']
>>> for arg in callInfo.inparams:
...     print arg.name.ljust(15), arg.type
key             (u'http://www.w3.org/2001/XMLSchema', u'string')
q               (u'http://www.w3.org/2001/XMLSchema', u'string')
start           (u'http://www.w3.org/2001/XMLSchema', u'int')
maxResults      (u'http://www.w3.org/2001/XMLSchema', u'int')
filter          (u'http://www.w3.org/2001/XMLSchema', u'boolean')
restrict        (u'http://www.w3.org/2001/XMLSchema', u'string')
safeSearch      (u'http://www.w3.org/2001/XMLSchema', u'boolean')
lr              (u'http://www.w3.org/2001/XMLSchema', u'string')
ie              (u'http://www.w3.org/2001/XMLSchema', u'string')
oe              (u'http://www.w3.org/2001/XMLSchema', u'string')
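Each arg.type in the listing above is a (namespace, name) pair. As a rough sketch of how you might turn those pairs into something friendlier, the following maps the three XML Schema type names that appear in the listing to Python types. The XSD_TO_PYTHON table and the python_type helper are illustrative assumptions, not part of SOAPpy, and the inparams list is hand-copied sample data so the snippet runs without a WSDL file.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"

# Assumed mapping from XML Schema type names to Python types; only the
# three names that appear in the doGoogleSearch listing are covered.
XSD_TO_PYTHON = {"string": str, "int": int, "boolean": bool}


def python_type(xsd_type):
    """Translate a (namespace, name) pair into a Python type."""
    namespace, name = xsd_type
    if namespace != XSD_NAMESPACE:
        raise ValueError("not an XML Schema type: %r" % (xsd_type,))
    return XSD_TO_PYTHON[name]


# Hand-copied sample of (parameter name, type pair) from the listing.
inparams = [
    ("key", (XSD_NAMESPACE, "string")),
    ("start", (XSD_NAMESPACE, "int")),
    ("filter", (XSD_NAMESPACE, "boolean")),
]

for name, xsd_type in inparams:
    print(name.ljust(15), python_type(xsd_type).__name__)
```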
Here is a brief synopsis of all the parameters to the doGoogleSearch function:
>>> from SOAPpy import WSDL
>>> server = WSDL.Proxy('/path/to/your/GoogleSearch.wsdl')
>>> key = 'YOUR_GOOGLE_API_KEY'
>>> results = server.doGoogleSearch(key, 'mark', 0, 10, False, "",
...     False, "", "utf-8", "utf-8")
>>> len(results.resultElements)
10
>>> results.resultElements[0].URL
'http://diveintomark.org/'
>>> results.resultElements[0].title
'dive into <b>mark</b>'
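Ten positional arguments are easy to get out of order. One possible way to tame the call above is a thin wrapper with named, defaulted arguments. This is only a sketch: google_search and its keyword names are hypothetical, the defaults simply repeat the values used in the session above, and FakeServer stands in for the WSDL.Proxy object so the example runs without SOAPpy or a license key.

```python
def google_search(server, key, query, start=0, max_results=10,
                  filter_duplicates=False, restrict="",
                  safe_search=False, language_restrict=""):
    """Call doGoogleSearch with named arguments (hypothetical helper).

    `server` is any object exposing doGoogleSearch with the ten
    positional parameters reported by the WSDL introspection.
    """
    return server.doGoogleSearch(
        key, query, start, max_results, filter_duplicates, restrict,
        safe_search, language_restrict,
        "utf-8", "utf-8",  # ie and oe: same values as the session above
    )


class FakeServer:
    """Stand-in for WSDL.Proxy; echoes the arguments it receives."""

    def doGoogleSearch(self, *args):
        return args


sent = google_search(FakeServer(), "YOUR_GOOGLE_API_KEY", "mark")
print(sent)
```

With a real proxy you would pass the server object from the session above in place of FakeServer() and get back the same results object.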
The results object contains more than the actual search results. It also contains information about the search itself, such as how long it took and how many results were found (even though only 10 were returned). The Google web interface shows this information, and you can access it programmatically too.
>>> results.searchTime
0.224919
>>> results.estimatedTotalResultsCount
29800000
>>> results.directoryCategories
[<SOAPpy.Types.structType item at 14367400>: {'fullViewableName': 'Top/Arts/Literature/World_Literature/American/19th_Century/Twain,_Mark', 'specialEncoding': ''}]
>>> results.directoryCategories[0].fullViewableName
'Top/Arts/Literature/World_Literature/American/19th_Century/Twain,_Mark'
This search took 0.224919 seconds. That does not include the time spent sending and receiving the actual SOAP XML documents. It's just the time that Google spent processing your request once it received it.
In total, there were approximately 30 million results. You can access them 10 at a time by changing the start parameter and calling server.doGoogleSearch again.
For some queries, Google also returns a list of related categories in the Google Directory. You can append each category's fullViewableName to http://directory.google.com/ to construct the link to the directory category page.
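The last two notes can be sketched in code: stepping through results by advancing start, and building a directory link from a category. Everything named here is an assumption made for illustration. iter_results and directory_url are hypothetical helpers, and fake_search imitates a wrapper around server.doGoogleSearch (pretending that exactly 25 results exist) so the snippet runs without SOAPpy.

```python
from types import SimpleNamespace

DIRECTORY_BASE = "http://directory.google.com/"


def iter_results(search, query, limit, page_size=10):
    """Yield up to `limit` result elements, one page at a time.

    `search(query, start, max_results)` is a hypothetical callable that
    wraps server.doGoogleSearch and returns its results object.
    """
    start = 0
    while start < limit:
        wanted = min(page_size, limit - start)
        page = search(query, start, wanted).resultElements
        if not page:  # the service has nothing more to return
            return
        for element in page:
            yield element
        start += len(page)


def directory_url(category):
    """Build a Google Directory link from a directoryCategories item."""
    return DIRECTORY_BASE + category.fullViewableName


def fake_search(query, start, max_results):
    # Stand-in for the SOAP call: pretends exactly 25 results exist.
    all_urls = ["http://example.com/%d" % i for i in range(25)]
    chunk = all_urls[start:start + max_results]
    return SimpleNamespace(
        resultElements=[SimpleNamespace(URL=url) for url in chunk])


urls = [r.URL for r in iter_results(fake_search, "mark", limit=30)]
print(len(urls))

category = SimpleNamespace(
    fullViewableName="Top/Arts/Literature/World_Literature/American/"
                     "19th_Century/Twain,_Mark",
    specialEncoding="")
print(directory_url(category))
```

The generator stops as soon as a page comes back empty, so asking for more results than exist is harmless.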