DoubleTrust Experimental Search

This blog is about a small web programming project (my first!) that I wrote over a weekend to combine and present search results from leading authorities in compact and useful ways. It's more of a learning experience, but hopefully it's useful too! I've been using it myself and have found it helpful.

Tuesday, May 31, 2005

Rainbow Experimental Search launched

I have wanted to learn web programming for the past year, so I finally spent some time on it this weekend. Since learning by doing is the best way, I decided to create a search hack, Rainbow Experimental Search, an idea that had been going around in my head for a month.

Initially I just wanted to write a small script to see how useful the results common to Google and Yahoo would be. As it took shape, it came to incorporate the following main ideas:

Relevance opinion from two authorities is better than one

The main idea was to see what the intersection of Google and Yahoo results looks like, and to present it in a way that is of practical use. I wanted to know whether the results from these two search engines are very similar or quite different. If they are different, am I missing out on some good links that one engine doesn't cover? Different people had different opinions, and many thought the results would be quite similar. It turns out the answer is somewhere in the middle: some results intersect, while many don't. The fraction of each depends on the search term. Popular terms like "movies" show a lot of intersection; searching for "virtual machine adaptation" doesn't.

The INTERSECTION shows the most relevant results according to BOTH engines, and the ORPHANS (results returned by only one engine) show the difference. Interesting results can be gleaned from the ORPHANS as well as the INTERSECTION.
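To make the INTERSECTION and ORPHANS idea concrete, here is a minimal sketch in Python (not the Perl the tool is actually written in). It assumes each engine's results are already available as a ranked list of URLs. The `normalize` rule and the ordering of the intersection by summed rank are my own assumptions for illustration, and all function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key.

    Assumed rule for this sketch: ignore the scheme, a leading "www.",
    host letter case, and a trailing slash on the path.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def rank_table(urls):
    """Map each normalized key to (rank, original URL), keeping the best rank."""
    table = {}
    for rank, url in enumerate(urls):
        table.setdefault(normalize(url), (rank, url))
    return table


def compare_results(first, second):
    """Split two ranked URL lists into an intersection and per-engine orphans."""
    a = rank_table(first)
    b = rank_table(second)
    common = a.keys() & b.keys()

    # Assumption: order shared results by the sum of their two ranks,
    # breaking ties by the rank in the first list.
    ordered = sorted(common, key=lambda k: (a[k][0] + b[k][0], a[k][0]))
    intersection = [a[k][1] for k in ordered]

    # Orphans keep the ranking order of the engine that returned them.
    orphans_first = [url for k, (_, url) in a.items() if k not in common]
    orphans_second = [url for k, (_, url) in b.items() if k not in common]
    return intersection, orphans_first, orphans_second
```

Calling `compare_results(google_urls, yahoo_urls)` with two made-up lists returns three lists: the links both engines agree on, then the links only the first engine found, then the links only the second found. How strictly two URLs should count as "the same" is a judgment call, and a looser or stricter `normalize` would change how large the intersection looks.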

Presentation Format

I was also curious about how to show results in a very compact form while still keeping them useful. In Google or Yahoo it can take a while to scroll through all the results, especially if I want to see 20-40 results for a query.

I also wanted to offer more ways to choose among results than ranking alone. The current results color-code the domain type and show size bars that make it easy to pick the page with the most content. This is useful if you are looking for a lot of material on some topic and want to compare pages by size.

The current format is an experiment: a table with DHTML balloons that gives a concise representation. The idea is to show different page properties side by side, so that the user can make a better-informed decision, and quickly too. With Google or Yahoo, I find it very difficult to identify, for example, the page with the most content among the first 10 to 40 results.

The current format shows:
  1. Popup balloons showing the description of each page. This gives a very concise summary of all the results, and the user can read the descriptions of the pages that look interesting.
  2. A graph of page sizes: allows very quick comparison among different page sizes.
  3. Color-coded domains: allows easy separation of different types of domains like .edu, .com, etc.
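The size graph and the domain colors can be sketched roughly as follows. This is Python for illustration only, not the tool's actual Perl and JavaScript. The color choices, the text-based bars, and the function names are all hypothetical, and the sketch assumes page sizes in bytes are already known for each result.

```python
from urllib.parse import urlsplit

# Hypothetical color scheme; the real tool's colors are not described here.
DOMAIN_COLORS = {"edu": "green", "com": "blue", "org": "orange", "gov": "purple"}
DEFAULT_COLOR = "gray"


def domain_color(url):
    """Pick a color from the last label of the host name (.edu, .com, ...)."""
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return DOMAIN_COLORS.get(tld, DEFAULT_COLOR)


def size_bars(sizes, max_width=20):
    """Scale page sizes to text bars, with the largest page at max_width."""
    largest = max(sizes, default=0)
    if largest <= 0:
        return ["" for _ in sizes]
    bars = []
    for size in sizes:
        if size <= 0:
            bars.append("")
        else:
            # Every non-empty page gets at least one character of bar.
            bars.append("#" * max(1, round(size * max_width / largest)))
    return bars
```

Scaling every bar against the largest page in the result set keeps the comparison relative: the longest bar is always the biggest page, whatever the absolute sizes are.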

It is hacked together in Perl CGI and JavaScript, both of which I am very new to.
It has some bugs, I'm sure. One known bug is in parsing the results for some unusual queries.

I don't have time to work on this during the week, but I might be able to spare some time next weekend. I'm a little busy with the quarter finishing and preparing for an internship.

Any feedback would be welcome!