Google AJAX Language API with PHP
I had noticed some time ago that Google had released an API for their language translation service. A recent forum discussion made me revisit the API, and since I had a wee bit of time on my hands, I wrote a very rudimentary PHP class implementation of the API.
Google seems to like flaunting “AJAX”, in their APIs at least. So the API is called “Google AJAX Language API” and the main implementation is .. take a guess, AJAX. However, in addition to their pure JavaScript API, they also have a REST interface (of course JavaScript would need such an interface anyway).
The REST interface is just an HTTP endpoint (URL) that returns JSON. You just need to formulate an HTTP GET request passing the parameters described in the API documentation, and Google will send you a nicely formatted JSON response with the translated text and some other details.
Here is an example request, to translate “Hello World” to Italian.
http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=hello%20world&langpair=en|it
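The parameters are v=1.0 (the API version), q=hello world (URL-encoded as hello%20world) and langpair=en|it (source and target language, separated by a pipe).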
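To make the request format concrete, here is a minimal sketch of building that URL in PHP. The function name sketch_build_translate_url() is made up for this post and is not part of the class linked further down; it only assumes the endpoint and parameters shown above.

```php
<?php
// Hypothetical helper for illustration; not part of the linked class.
function sketch_build_translate_url($text, $from, $to) {
    $base = 'http://ajax.googleapis.com/ajax/services/language/translate';
    // rawurlencode() turns the space into %20 and the pipe into %7C.
    return $base
        . '?v=1.0'
        . '&q=' . rawurlencode($text)
        . '&langpair=' . rawurlencode($from . '|' . $to);
}

echo sketch_build_translate_url('hello world', 'en', 'it'), "\n";
```

Note that the pipe comes out percent-encoded as %7C here, which is the same request as the hand-written URL above.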
The JSON response looks like:
{"responseData": {"translatedText":"ciao mondo"}, "responseDetails": null, "responseStatus": 200}
So a simple implementation in PHP would be to use file_get_contents() to download the JSON text from the URL over HTTP. Here it is, wrapped in a PHP class:
http://code.google.com/p/php-language-api/source/browse/trunk/google.translator.php
Example Usage:
// example usage
$text = 'Welcome "to my " website.';
$trans_text = Google_Translate_API::translate($text, '', 'it');
if ($trans_text !== false) {
    echo $trans_text;
}
The class uses file_get_contents(), which assumes the allow_url_fopen directive is enabled in your PHP configuration (php.ini). This is usually the case; however, it can be disabled for security reasons, since it allows include() to use the URL wrapper and thus include remote files. A favourite exploit for attackers is to inject remote files into include() calls. (From PHP 5.2 onwards, remote includes are controlled by a separate allow_url_include directive.)
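If you want to fail gracefully when the directive is off, a check along these lines could run before the request is made. This is only a sketch: sketch_ini_flag_is_on() is a hypothetical helper, and it assumes ini_get() may hand back either "1"/"" or the literal php.ini value such as "On".

```php
<?php
// Hypothetical helper: interpret a raw ini_get() value as a boolean flag.
function sketch_ini_flag_is_on($raw) {
    // ini_get() returns false for unknown options; casting makes that ''.
    $value = strtolower(trim((string) $raw));
    return in_array($value, array('1', 'on', 'true', 'yes'), true);
}

if (!sketch_ini_flag_is_on(ini_get('allow_url_fopen'))) {
    echo "allow_url_fopen is disabled; file_get_contents() cannot open URLs.\n";
}
```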
The class doesn’t use a JSON parser; I think it’s a bit of an overhead to include a JSON library in PHP4. Instead it just uses regular expressions, and the PCRE regular expression functions are pretty fast. PHP5 has native JSON support, however (json_decode() from PHP 5.2), so the class could be modified to use the native JSON functions if you target PHP5 specifically.
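For comparison, here is roughly what the PHP5 route could look like. It is a sketch rather than the code in the class: sketch_extract_translation() is a hypothetical name, it assumes PHP 5.2 or later for json_decode(), and it assumes the response shape shown earlier.

```php
<?php
// Hypothetical helper for PHP 5.2+; not part of the linked class.
function sketch_extract_translation($json) {
    // Decode into associative arrays; invalid JSON yields null.
    $data = json_decode($json, true);
    if (!is_array($data)
        || !isset($data['responseStatus'])
        || $data['responseStatus'] != 200
        || !isset($data['responseData']['translatedText'])) {
        return false; // same "false on failure" convention as the usage example
    }
    return $data['responseData']['translatedText'];
}

$sample = '{"responseData": {"translatedText":"ciao mondo"}, "responseDetails": null, "responseStatus": 200}';
var_dump(sketch_extract_translation($sample));
```

A side benefit of json_decode() is that it decodes \uXXXX escapes in JSON strings by itself, so no separate unescaping step is needed on that path.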
Something I ran into was that Google returns not only UTF-8, but also Unicode escape sequences (which I'll call UTF-8 escape sequences here). That is, some characters, including ones outside the basic ASCII range and, as it turns out, some ASCII characters too, are escaped as \u followed by the character’s four-digit hex code point. For example, the & symbol becomes:
\u0026
Cool, aye? It sucks, though, because PHP does not understand this. JavaScript, which is the main way of invoking the Google Language API, understands it natively. PHP4 doesn’t even understand UTF-8. At first I resorted to this ugly function to unescape the UTF-8 escape sequences (that is, to convert them to actual UTF-8 byte sequences).
/**
 * Convert UTF-8 escape sequences in a string to UTF-8 bytes. Old version.
 * @param $str String
 * @return UTF-8 String
 */
function __unescapeUTF8EscapeSeq($str) {
    // Rewrite each \uXXXX as the HTML entity &#xXXXX; and let html_entity_decode() produce the UTF-8 bytes.
    return preg_replace_callback("/\\\u([0-9a-f]{4})/i", create_function('$matches', 'return html_entity_decode(\'&#x\'.$matches[1].\';\', ENT_NOQUOTES, \'UTF-8\');'), $str);
}
The function is ugly because it relies on html_entity_decode() to do the transformation for us. We just convert each UTF-8 escape sequence to an HTML escape sequence (a numeric HTML entity), then pass it to html_entity_decode(), which PHP handles well. In the end I decided on a more compatible function that uses bitwise operations instead. Both are included in the source for reference, however.
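To give an idea of the bitwise approach, here is a minimal sketch. It is not the function from the linked source; the sketch_* names are hypothetical, it only covers the four-hex-digit range (U+0000 to U+FFFF), and it does not combine surrogate pairs or handle an escaped backslash in front of the u.

```php
<?php
// Hypothetical names; this is not the function from the linked source.

// Encode one code point (0x0000-0xFFFF) as UTF-8 bytes using shifts and masks.
function sketch_codepoint_to_utf8($cp) {
    if ($cp < 0x80) {
        return chr($cp);                      // 1 byte:  0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))         // 2 bytes: 110xxxxx 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Named callback instead of a closure, so it also works on old PHP versions.
function sketch_unescape_callback($matches) {
    return sketch_codepoint_to_utf8(hexdec($matches[1]));
}

function sketch_unescape_unicode($str) {
    // Matches a literal backslash, "u", and exactly four hex digits.
    return preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'sketch_unescape_callback', $str);
}

echo sketch_unescape_unicode('caf\u00e9 \u0026 t\u00e8'), "\n";
```

The three branches correspond to the one-, two- and three-byte forms of UTF-8; the high bits of the code point go into the lead byte and each following byte carries six more bits.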
The code is at a very early stage of development and will be buggy. You can check out the latest sources via SVN:
svn checkout http://php-language-api.googlecode.com/svn/trunk/ php-language-api-read-only
Feel free to let me know on the Google project page if you find any bugs.


