Buca Bay - Always nice

I have a black rooster, its name is… sing along…

Bing copies Google results experiment was fixed

February 8

Over at SearchEngineLand there is a discussion about Bing copying Google’s search results, fueled by an experiment Google performed to prove this point.

In a nutshell, Google employees set up fake search results for meaningless phrases such as hiybbprqag to see whether Bing would show the same results. They then used IE (with the Bing Toolbar installed) to click on those results in Google, and monitored the results in Bing. After some time the same results showed up in Bing.

Bing Director Stefan Weitz has already explained some of the signals Bing uses to “rank” pages, which accounts for the results the Google experiment produced:

As you might imagine, we use multiple signals and approaches when we think about ranking, but like the rest of the players in this industry, we’re not going to go deep and detailed in how we do it. Clearly, the overarching goal is to do a better job determining the intent of the search, so we can guess at the best and most relevant answer to a given query.
Opt-in programs like the [Bing] toolbar help us with clickstream data, one of many input signals we and other search engines use to help rank sites. This “Google experiment” seems like a hack to confuse and manipulate some of these signals.

For one, the experiment was performed by Google employees in pursuit of the result they wanted and expected, so the conclusion is invalid regardless of the outcome. Had the outcome been different, they would likely have tried another approach to get the desired result, if they haven’t already.

The experiment only validates that Bing uses the user’s “clickstream” to determine which sites are important to that user. Google does the same with their results. No one complains there.

The Google engineers should already know this. They should know that what they are seeing is their own clickstream, not Google’s results. They are seeing themselves visiting that site often enough to make it important to them in the eyes of Bing. That is why Bing is showing those results, not because it copied Google. It is because the user clicked on it over and over.

The fact that the result was listed in Google’s search results may also be a factor, since that makes it a site the user would consider relevant to their objectives, but this was not tested in the experiment. In any case, that is no different from any website linking to another site, which gives it a measure of relevance for a certain query.

The only reason it coincides with Google’s results is that they made it so. They misled Bing into thinking they found those sites important. What did they expect? They have the same algorithm built into Google.

The only real question here is whether Bing should be using users’ clickstreams. If Google can, why shouldn’t Bing? Is there more for Bing to gain? Maybe. Is there more for end users to gain? Definitely. I wouldn’t want Google to forget which sites I personally find relevant, and the same goes for Bing.

Tsunami warning for Fiji, Twitter vs Local Radio vs Google

October 8

I was woken up this morning by Vara yelling into my ear: “There is a Tsunami warning for Fiji, wake up. Do you think it’s gonna come this far?”

We’re about a mile from the beach, but also at an elevation of about 50m or so. I replied, “No, it’s not gonna come this far.” I had been up all night on a project, so this was not the slightest bit interesting to me right then.

In my half-asleep state, however, each passing car started to sound like an approaching wave, crashing through the coconut trees and quickly tearing its way up the hill towards us. I decided it was time to wake up.

I started making my tea while listening to the radio. It was going on about the Tsunami warning, evacuations and so on. There was a report that the sea was retreating in the Yasawas. OK, this sounds like a real disaster.

The first thing that came to mind was to see if I could get more up-to-date information online. I went to http://twitter.com/ and did a search for “tsunami fiji”.

There were 2-3 updates every minute, most of them stating that the Tsunami warnings had already been withdrawn. The radio, however, was still going on about the warning. I didn’t really want to rely on Twitter alone, since most of it was just word of mouth.

I tried Google, which would not have been useful in this situation a week ago. However, early this month they introduced “search options”, which let you filter search results by date and show the most recent results first.

Google proved to be up to date, with trusted information. Twitter had been just as fast, or maybe a few minutes ahead, but it took weeding through a few posts to finally find a trusted source.

It didn’t take long for the Fiji Times to post an update on the cancelled Tsunami warning for Fiji, which was immediately picked up by Google as well as Twitter.

A few months ago I needed some very up-to-date information. Twitter was the best source, while Google was quite useless. Now Google seems to have realised that they need to provide realtime results. Twitter, however, still has the edge, with human interaction in near real time and a wider range of sources. For example, if the Tsunami had actually hit, you could have watched it from around the world via a Fiji Webcam link posted on Twitter.

Looking at this now, I’m amazed at how up to date the information on the web has become, especially on topics of wide human interest such as disasters, mostly thanks to Twitter. Forget the radio, I’m doing a Twitter search.

posted under news

Google AJAX Language API with PHP

January 20

I had noticed some time ago that Google had released an API for their language translation service. A recent forum discussion made me revisit the API, and since I had a wee bit of time on my hands, I wrote a very rudimentary PHP class implementation of the API.

Google seems to like flaunting “AJAX”, in their APIs at least. So the API is called “Google AJAX Language API” and the main implementation is… take a guess… AJAX. However, in addition to their pure JavaScript API, they also have a REST interface (of course, the JavaScript would need such an interface anyway).

The REST interface is just an HTTP endpoint (URL) that returns JSON. You just need to make an HTTP GET request passing the parameters described in the API documentation, and Google will send you a nicely formatted JSON response with the translated text and some other details.

Here is an example request, to translate “Hello World” to Italian.

http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=hello%20world&langpair=en|it

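The parameters are v=1.0 (the API version), q=hello world (the text to translate, URL-encoded) and langpair=en|it (the source and target languages, separated by |).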
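Rather than gluing the query string together by hand, PHP can build the same URL and take care of URL-encoding the text. This is a minimal sketch: build_translate_url() is a hypothetical helper, not part of the class linked below, and it only reproduces the endpoint and the v, q and langpair parameters from the example URL. PHP_QUERY_RFC3986 needs PHP 5.4 or later.

```php
<?php
/**
 * Build a request URL for the translate endpoint shown above.
 * Hypothetical helper for illustration; not part of the linked class.
 */
function build_translate_url($text, $from, $to)
{
    $params = array(
        'v'        => '1.0',             // API version, as in the example URL
        'q'        => $text,             // text to translate
        'langpair' => $from . '|' . $to, // "source|target", e.g. "en|it"
    );
    // PHP_QUERY_RFC3986 encodes spaces as %20 (like the example URL);
    // the "|" separator is encoded as %7C.
    return 'http://ajax.googleapis.com/ajax/services/language/translate?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo build_translate_url('hello world', 'en', 'it');
// http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=hello%20world&langpair=en%7Cit
```

The encoded %7C is equivalent to the literal | in the hand-written URL; servers decode both the same way.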

The JSON response looks like:

{"responseData": {"translatedText":"ciao mondo"}, "responseDetails": null, "responseStatus": 200}

So a simple implementation in PHP would be to use file_get_contents() to download the JSON text from the URL over HTTP and here it is in a PHP class.

http://code.google.com/p/php-language-api/source/browse/trunk/google.translator.php

Example Usage:

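<?php
// example usage: assumes google.translator.php (linked above) is in your include path
require_once 'google.translator.php';

$text = 'Welcome "to my " website.';
// arguments appear to be: text, source language, target language ('' presumably means auto-detect)
$trans_text = Google_Translate_API::translate($text, '', 'it');
if ($trans_text !== false) { // false signals a failed translation
	echo $trans_text;
}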

The class uses file_get_contents(), which assumes the allow_url_fopen directive is enabled in your PHP configuration (php.ini). This is usually the case, but it can be disabled for security reasons, since it allows include() to use the URL wrappers and thus include remote files (injecting remote files into include() calls is a favourite exploit for attackers). In PHP 5.2 and later, remote includes are controlled separately by the allow_url_include directive.
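If allow_url_fopen is off on your host, the same GET request can still be made through the cURL extension, when it is installed. The sketch below is an alternative under that assumption, not what the linked class does: available_transport() and fetch_url() are hypothetical names, and only standard calls (ini_get(), filter_var(), function_exists(), file_get_contents() and the curl_* functions) are used.

```php
<?php
/**
 * Report which HTTP transport is usable: 'fopen' when allow_url_fopen is on,
 * 'curl' when the cURL extension is loaded, or false when neither is.
 * Hypothetical helper for illustration.
 */
function available_transport()
{
    // ini_get() returns the directive as a string ("1", "", "On", ...).
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'fopen';
    }
    if (function_exists('curl_init')) {
        return 'curl';
    }
    return false;
}

/**
 * Download a URL with whichever transport is available.
 * Returns the response body, or false on failure.
 */
function fetch_url($url)
{
    $transport = available_transport();
    if ($transport === 'fopen') {
        return @file_get_contents($url); // false on failure
    }
    if ($transport === 'curl') {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't hang on a slow server
        return curl_exec($ch); // false on failure; handle is freed when $ch goes out of scope
    }
    return false;
}
```

Preferring file_get_contents() keeps the common case identical to the class; cURL is only a fallback.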

The class doesn’t use a JSON parser; I think it’s a bit of an overhead to include the JSON libraries in PHP4. Instead it just uses regular expressions, and the PCRE regular expression functions are pretty fast. PHP5 (from 5.2) has native JSON support, however, so if you target PHP5 specifically the class could be modified to use the native json_decode() function.
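To illustrate the trade-off, here are two ways to pull translatedText out of the sample response shown earlier. The regular expression is a simplified, hypothetical pattern (it is not copied from the class and assumes the translated text contains no escaped double quotes); the json_decode() version needs PHP 5.2 or later. Both function names are made up for this example.

```php
<?php
$json = '{"responseData": {"translatedText":"ciao mondo"}, "responseDetails": null, "responseStatus": 200}';

/** Regex approach (works on PHP4): grab the first "translatedText" string value. */
function extract_with_regex($json)
{
    // Simplified pattern: assumes the value contains no escaped quotes (\").
    if (preg_match('/"translatedText"\s*:\s*"([^"]*)"/', $json, $m)) {
        return $m[1];
    }
    return false;
}

/** Native approach (PHP 5.2+): decode into associative arrays. */
function extract_with_json_decode($json)
{
    $data = json_decode($json, true); // true => arrays instead of stdClass objects
    if (!is_array($data) || !isset($data['responseStatus']) || $data['responseStatus'] != 200
        || !isset($data['responseData']['translatedText'])) {
        return false; // invalid JSON, error status, or missing field
    }
    return $data['responseData']['translatedText'];
}

echo extract_with_regex($json), "\n";       // ciao mondo
echo extract_with_json_decode($json), "\n"; // ciao mondo
```

One difference matters for the next problem: json_decode() also turns \uXXXX escapes back into characters, while the regex returns them untouched.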

Something I ran into was that Google returns not only UTF-8 but also escape sequences: characters are escaped JavaScript-style as \u followed by the character’s four-digit hex code, and not only those outside the basic ASCII range. For example, the & symbol becomes:

\u0026

Cool, aye. It sucks, because PHP does not understand this. JavaScript, which is the main way of invoking the Google Language API, understands it natively, while PHP4 doesn’t even understand UTF-8. So at first I resorted to this ugly function to unescape the escape sequences (convert them to actual UTF-8 byte sequences).

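/**
 * Convert \uXXXX escape sequences in a string to UTF-8 bytes. Old version.
 * @param string $str
 * @return string UTF-8 string
 */
function __unescapeUTF8EscapeSeq($str) {
        // Turn each \uXXXX into the HTML entity &#xXXXX; and let html_entity_decode() emit UTF-8.
        // A closure (PHP 5.3+) replaces create_function(), which was removed in PHP 8.
        return preg_replace_callback('/\\\\u([0-9a-f]{4})/i', function ($matches) {
                return html_entity_decode('&#x' . $matches[1] . ';', ENT_NOQUOTES, 'UTF-8');
        }, $str);
}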

The function is ugly because it relies on html_entity_decode() to do the transformation for us: it converts each \u escape sequence into an HTML numeric entity (for example &#x0026;), then lets html_entity_decode(), which PHP handles well, produce the UTF-8 bytes. In the end I went with a more compatible function that uses bitwise operations instead. Both are included in the source for reference.
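The bitwise approach can be sketched as follows: read the four hex digits as a code point and split its bits across one to three UTF-8 bytes with shifts and masks. This is a minimal illustration, not the function from the repository; codepoint_to_utf8() and unescape_utf8_escape_seq() are hypothetical names, and the sketch does not combine surrogate pairs (characters above U+FFFF, which JSON writes as two \u escapes).

```php
<?php
/**
 * Encode a code point in the range 0..0xFFFF as UTF-8 using bit operations.
 * Hypothetical helper; surrogate halves (0xD800-0xDFFF) are not combined.
 */
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80) {                          // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {                         // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

/** Callback: $m[1] holds the four hex digits after \u. */
function unescape_utf8_escape_seq_cb($m)
{
    return codepoint_to_utf8(hexdec($m[1]));
}

/** Replace every \uXXXX escape in $str with the corresponding UTF-8 bytes. */
function unescape_utf8_escape_seq($str)
{
    // A named callback instead of a closure keeps this usable on old PHP versions.
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', 'unescape_utf8_escape_seq_cb', $str);
}

echo unescape_utf8_escape_seq('Tom \u0026 Jerry'); // Tom & Jerry
```

For example, \u00e9 (é, 0xE9) becomes the two bytes 0xC3 0xA9, and \u20AC (€) becomes 0xE2 0x82 0xAC.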

The code is in very early development and will be buggy. You can check out the latest sources via SVN:

svn checkout http://php-language-api.googlecode.com/svn/trunk/ php-language-api-read-only

Feel free to let me know on the Google project page if you find any bugs.

http://code.google.com/p/php-language-api/issues/list

Review of Google Chrome

September 11

Just testing out Google Chrome, an Open Source web browser developed by Google, and I must say I am very impressed. It is ahead of any current browser in many ways.

The interface is awesome: very simple and elegant. The most compelling aspect of Chrome, however, is how the development team designed it from the ground up to cater for today’s modern websites. The browser is designed like an operating system: each new tab runs in its own process, just like each application in your OS. JavaScript is executed in a JavaScript Virtual Machine, which means… speed!

I did encounter a bug, however, when opening the built-in JavaScript console. Nothing serious.

A small quirk is that you can’t view the list of history pages for a single tab like in Firefox and IE; you have to view the whole browser history. Other than that, the browser rocks. (update: You can view the list of pages by clicking and holding the back button.)

Another browser also means more time debugging JavaScript (and xHTML/CSS). But I’ve just tested a few of the web-based apps we’ve developed and they work fine, and I’ve heard the same from other developers. (update: I’ve found a few bugs: some Flash apps that talk to JS don’t work, the same bug I’ve seen in Safari.)

That reminds me: the browser testing is actually done automatically using Google’s index of sites. They claim to be able to test new Chrome builds against thousands of sites within half an hour, and I believe them. What a development edge over Mozilla; I don’t think Firefox has such a super testing process.

Now the only thing I haven’t looked into is how easy it would be to write plugins - what they have in store for plugin/extension developers.

Google Chrome even has the built-in functionality of the famous Firefox extension Firebug, with its Chrome Inspector. Good stuff.
