Hacking History

Browsing the web securely

If you are browsing the web over HTTP (which is the only protocol most sites support today), you are essentially exposed to packet sniffing. This means that hackers on the same network as you can see the traffic going to and from your computer. I've written a short post about this here, explaining what the problem is and how you can go about solving it.

You can try out the accompanying proxy. It lets you make an HTTPS connection to a trusted host, which then forwards all your requests to the real server you are trying to reach. This removes the risk of your packets being sniffed on the local network.

Terminal Emulator in your browser to Search the Web

I've been experimenting with alternative UIs for searching the web more effectively, and this is another experiment in that direction. I had two options to start with:

Bring the browser to the terminal (people have tried this: links, lynx and other text-mode browsers), or
Bring the terminal to the browser. This experiment takes that direction, since it seems to be as yet unexplored.

Some observations and motivations:

I've noticed that when I'm viewing search results, I end up opening about four result pages on average for every search I perform. I do this because I expect one of these candidate pages to contain the answer I am looking for. The terminal-emulator-based UI makes this very easy.
Almost all search result UIs expect you to leave the search page, or to rely on the browser (and on yourself) to open links in a new window or tab. I think there is something to be gained by keeping the user on the search page. Showing statistics on the same page, letting the user run another search from right there, and letting the user browse his/her session's search history without leaving the page all seem to help the user keep track of the goal he/she set out to accomplish. I (and others I know) tend to get distracted by an interesting link and lose track of why I was searching in the first place. I have noticed that such a UI aids recall in these situations. I'm sure more interesting tools and features can be built if the user stays on one page. (Note: even though the complete history, etc. could be shown on subsequent pages, I've noticed that it feels different from having it on the same page.)
Search Suggestions: Conventional search engines use a drop-down list to show the user possible completions of his/her query, based on prefixes of past queries or of terms on the web. I've noticed that about 15 suggestions is the most that works well. Beyond that, the UI looks cluttered and feels like information overload. With the terminal UI, however, it feels natural to see upwards of 50 suggestions for a search phrase, so the user can get more out of suggestions in such a UI.
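The prefix lookup behind such suggestion lists can be sketched with a sorted array and a binary search. This is a minimal illustration under assumed inputs, not lib-face's implementation. The function name `suggest` and its parameters are hypothetical. Real engines also rank candidates (for example by query frequency), and this sketch omits that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: up to `limit` phrases from `sorted_phrases` that begin
// with `prefix`. Sorting puts every phrase sharing a prefix into one contiguous
// run, so a binary search finds the start of the run and a short scan collects it.
std::vector<std::string> suggest(const std::vector<std::string>& sorted_phrases,
                                 const std::string& prefix, std::size_t limit) {
    // Precondition check; it is O(n), so it only runs in debug builds (no NDEBUG).
    assert(std::is_sorted(sorted_phrases.begin(), sorted_phrases.end()));

    std::vector<std::string> out;
    // First phrase not less than `prefix`: the start of the matching run.
    auto it = std::lower_bound(sorted_phrases.begin(), sorted_phrases.end(), prefix);
    for (; it != sorted_phrases.end() && out.size() < limit; ++it) {
        if (it->compare(0, prefix.size(), prefix) != 0) break;  // the run has ended
        out.push_back(*it);
    }
    return out;
}
```

All phrases that share a prefix sit next to each other in sorted order, so raising `limit` from 15 to 50 only lengthens the scan. The lookup itself is still a single O(log n) binary search.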

Friday Hacker's Night (FHN) @ Directi

FHN is an initiative started by Sandeep Shetty, one of the tech leads at Directi. The agenda for the night is to work on anything you like. That may be something work-related that deadlines haven't left you time for, or your own personal projects. It happens (almost) every alternate Friday and is generally accompanied by pizza and beer. Two other very active FHN members are Rakesh Pai and Vishnu Iyengar.

I've worked on four major projects in the four FHN meets I've attended:

Lite and HTML-only versions of Duck Duck Go: these versions of the web interface are for people using text-only browsers, mobile browsers, browsers that don't support JavaScript, or browsers with JavaScript disabled.
lib-face: A fast auto-complete library for as-you-type search suggestions.
A search bot for Duck Duck Go. Add ~~duckduckbot@bot.im~~ im@ddg.gg to any Jabber chat network (Gmail, jabber.org, pandion.im, ~~Yahoo, AIM~~) and use it to search the web. The new bot is written in node.js, a very scalable runtime for building applications that perform a lot of I/O. You can also read the guest post I wrote on Gabriel's blog here.
A zero-click search widget that website authors can easily embed in their pages to give readers definitions and extra information about selected text. Readers do not need to install a browser plug-in, extension or add-on for this functionality. They also never have to update anything manually, because they get the latest version of the widget every time they visit the page.

Lyrics related

Searching for song lyrics has fascinated me for a few years now. Although many sites serve lyrics online, most are neither easily searchable nor do they present the lyrics in an enjoyable manner. Either their indexes are incomplete or their pages are filled with videos, ads and all sorts of distractions.

To overcome these drawbacks, I implemented a lyrics search library. It queries search engines and extracts lyrics from the pages they return, using a fuzzy document-matching and intersection-extraction algorithm.
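One simple way to illustrate "intersection extraction" is to treat two result pages for the same song as sequences of text lines and keep the longest run of lines they share. Site-specific headers, ads and footers tend to differ between sites, while the lyrics themselves match. This sketch is a hypothetical stand-in, not the library's actual algorithm. It assumes the pages have already been reduced to plain-text lines. Its "fuzzy" matching only ignores case and extra whitespace, and the names `normalize` and `common_block` are made up for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "fuzzy" normalization: lowercase and collapse runs of whitespace,
// so that cosmetic differences between two sites do not break a match.
std::string normalize(const std::string& line) {
    std::string out;
    bool pending_space = false;
    for (unsigned char c : line) {
        if (std::isspace(c)) {
            pending_space = !out.empty();  // remember a separator only after some text
        } else {
            if (pending_space) out.push_back(' ');
            pending_space = false;
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;  // trailing whitespace is never flushed
}

// Longest run of consecutive lines shared by pages `a` and `b` (compared after
// normalization), returned as the original lines of `a`. This is the classic
// O(n*m) longest-common-substring dynamic programme, applied to lines.
std::vector<std::string> common_block(const std::vector<std::string>& a,
                                      const std::vector<std::string>& b) {
    std::vector<std::string> na, nb;
    for (const auto& line : a) na.push_back(normalize(line));
    for (const auto& line : b) nb.push_back(normalize(line));

    // cur[j]: length of the shared run ending at na[i-1] and nb[j-1].
    std::vector<std::size_t> prev(nb.size() + 1, 0), cur(nb.size() + 1, 0);
    std::size_t best_len = 0, best_end = 0;  // best_end is an exclusive index into `a`
    for (std::size_t i = 1; i <= na.size(); ++i) {
        for (std::size_t j = 1; j <= nb.size(); ++j) {
            cur[j] = (na[i - 1] == nb[j - 1]) ? prev[j - 1] + 1 : 0;
            if (cur[j] > best_len) {
                best_len = cur[j];
                best_end = i;
            }
        }
        std::swap(prev, cur);
    }
    assert(best_len <= best_end);
    return std::vector<std::string>(
        a.begin() + static_cast<std::ptrdiff_t>(best_end - best_len),
        a.begin() + static_cast<std::ptrdiff_t>(best_end));
}
```

For real pages, a fuzzier line comparison (for example, tolerating small edit distances or punctuation differences) would probably be needed. The sketch keeps exact comparison of normalized lines for brevity.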

I also recently wrote a lyrics scraper for Duck Duck Go. Try it here.

C++ STL

The C++ STL is an absolutely wonderful collection of generic algorithms and containers. If you want to learn how to write top-quality algorithms and data structures, I highly recommend learning C++ just so you can read the STL and marvel at its beauty and simplicity.

C++ permits the EBO (empty base optimization): an empty base class (one with no non-static data members and no virtual functions) need not take up any space in the derived object. To exploit this optimization, libstdc++ containers were made to inherit from the Allocator class. As a consequence, suppose an Allocator class declares clear() (or any other function with the same signature as a container member function) as virtual and calls it internally. That call then dispatches to the container's corresponding method, not to the allocator's own method, which is what the allocator writer would expect. I fixed this behaviour, since it could be disastrous.
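A small sketch can show both points with made-up types. `EmptyAlloc`, `TrackingAlloc` and `Container` are hypothetical stand-ins, not libstdc++'s internal classes. The size checks assume a typical compiler that applies the EBO to single inheritance. The first half compares holding an empty allocator as a member with inheriting from it. The second half shows how a virtual `clear()` in the allocator ends up dispatching to the container's `clear()`.

```cpp
#include <string>

// An empty, stateless stand-in for an allocator.
struct EmptyAlloc {};

// Holding the allocator as a data member costs at least one byte plus padding...
struct HoldsAllocator {
    EmptyAlloc alloc;
    int* data;
};

// ...whereas inheriting from it lets the compiler apply the EBO.
struct InheritsAllocator : EmptyAlloc {
    int* data;
};

static_assert(sizeof(InheritsAllocator) == sizeof(int*), "empty base takes no space");
static_assert(sizeof(HoldsAllocator) > sizeof(int*), "empty member still takes space");

// Hypothetical allocator that declares clear() virtual and calls it
// internally, expecting its own clear() to run.
struct TrackingAlloc {
    std::string last_cleared = "none";
    virtual ~TrackingAlloc() = default;
    virtual void clear() { last_cleared = "allocator"; }
    void release_all() { clear(); }  // allocator-internal call
};

// A container that inherits from its allocator (to get the EBO) and has its
// own clear(). Same name and signature, so it silently overrides the
// allocator's virtual clear().
struct Container : TrackingAlloc {
    void clear() { last_cleared = "container"; }
};
```

Calling `release_all()` on a `Container` runs `Container::clear()`, even though the allocator's author only meant to call the allocator's own function. One general way to keep the EBO without this clash is to inherit from the allocator in a small internal helper struct that declares no container member functions, and to make that struct a data member of the container. This is offered as a design option, not as the fix that went into libstdc++.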

One of my applications exhibited the following usage pattern with linked lists. It read a bunch of strings from a file into a linked list, sorted them, did some processing and then freed the list, and it repeated this process for many other files. I noticed that the first file was always processed the fastest, while every subsequent file took longer. This was counter-intuitive, since I expected the cache to be hot after the first use! I traced it to the order in which memory was freed. Sorting a linked list reorders its nodes, and freeing them then added them back to the free list in that effectively random order. The next list was therefore built from nodes scattered across memory, so much of the CPU cache was wasted on neighbouring data that was never going to be used, and the extra cache misses slowed the application down. A brilliant tool called cachegrind (part of Valgrind) helped me verify my claims.

I wrote an allocator that avoids this pattern, in which non-adjacent blocks of free memory end up next to each other in the free list. I contributed it to libstdc++ as the bitmap allocator, so named because it uses bitmaps to keep track of used and free memory blocks. It also has a lower per-object overhead than the default allocator.
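The bitmap idea can be sketched as a fixed-size block pool in which one bit per block records whether that block is in use. This is a hypothetical, simplified illustration, not the code of libstdc++'s `__gnu_cxx::bitmap_allocator`. The class `BitmapPool` and its lowest-free-block policy are assumptions. It manages a single pre-sized chunk, serves one block size, and is not thread-safe. Because there is no free list, the order in which blocks are freed does not affect where the next allocations land.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool. Bit i of the bitmap is set while block i
// is in use. There is no free list: allocate() always hands out the lowest free
// block, so the order in which blocks were freed does not scatter later
// allocations. Single-threaded; block_size should be a multiple of the
// alignment of whatever is stored in the blocks.
class BitmapPool {
public:
    BitmapPool(std::size_t block_size, std::size_t block_count)
        : block_size_(block_size),
          count_(block_count),
          storage_(block_size * block_count),
          bits_((block_count + 63) / 64, 0) {}

    void* allocate() {
        for (std::size_t w = 0; w < bits_.size(); ++w) {
            if (bits_[w] == ~std::uint64_t{0}) continue;  // all 64 blocks in use
            for (std::size_t b = 0; b < 64; ++b) {
                const std::size_t idx = w * 64 + b;
                if (idx >= count_) break;  // past the last real block
                const std::uint64_t mask = std::uint64_t{1} << b;
                if ((bits_[w] & mask) == 0) {
                    bits_[w] |= mask;  // mark the block as in use
                    return storage_.data() + idx * block_size_;
                }
            }
        }
        throw std::bad_alloc();
    }

    // `p` must have come from allocate() on this pool.
    void deallocate(void* p) {
        const auto offset = static_cast<std::size_t>(
            static_cast<unsigned char*>(p) - storage_.data());
        assert(offset % block_size_ == 0 && offset / block_size_ < count_);
        const std::size_t idx = offset / block_size_;
        bits_[idx / 64] &= ~(std::uint64_t{1} << (idx % 64));  // mark the block as free
    }

private:
    std::size_t block_size_;
    std::size_t count_;
    std::vector<unsigned char> storage_;
    std::vector<std::uint64_t> bits_;
};
```

The per-object bookkeeping is a single bit rather than a header per block. The linear bitmap scan keeps the sketch short; a production allocator would remember where the last free block was found.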

Ingres on CC-NUMA

On CC-NUMA machines, memory access time depends on which memory is being accessed: memory attached to a CPU's own node is faster to reach than memory on another node. When run on such machines, the Ingres database server slows down because statistics for every query are maintained in a single place in the process image. As a result, processes executing on CPUs other than the one whose memory bank houses the statistics data experience a slow-down every time they update the statistics.

This problem was solved by having statistics data maintained separately for each running process and aggregating the data only when required (typically when the aggregate information is requested by a user).
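The per-process scheme can be sketched with worker slots standing in for Ingres server processes, an assumption made to keep the example in one program. `QueryStats` and `StatsSlot` are hypothetical names, and the 64-byte cache-line size is an assumption about the target hardware. Each worker updates only its own slot, and the slots are summed only when a total is requested. The sketch removes contention for a single shared location. It does not show placing each slot in memory local to its worker's node, which a real NUMA fix would also want.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One counter slot per worker, padded to its own cache line (64 bytes is an
// assumed line size) so that workers never write to a shared line.
struct alignas(64) StatsSlot {
    std::atomic<std::uint64_t> queries{0};
};

// Hypothetical statistics holder: updates go to the caller's own slot and the
// slots are summed only when an aggregate is requested.
class QueryStats {
public:
    explicit QueryStats(std::size_t workers) : slots_(workers) {}

    // Hot path: touches only this worker's slot.
    void record_query(std::size_t worker) {
        assert(worker < slots_.size());
        slots_[worker].queries.fetch_add(1, std::memory_order_relaxed);
    }

    // Cold path: aggregate on demand. Updates racing with this call may or may
    // not be included in the sum, which is acceptable for statistics.
    std::uint64_t total_queries() const {
        std::uint64_t sum = 0;
        for (const StatsSlot& slot : slots_)
            sum += slot.queries.load(std::memory_order_relaxed);
        return sum;
    }

private:
    std::vector<StatsSlot> slots_;
};
```

With N workers, each update costs one uncontended atomic increment. The O(N) summation is paid only when someone asks for the aggregate.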

StickyLinks

Many pages vanish from the internet over time. StickyLinks is an attempt to make a web link last forever, similar to the Internet Archive and WebCite.

Misc.

Fixed a bug in XMMS which caused it to crash if you deleted a song that was queued in the playlist.

Added a progress bar to the cp(1) UNIX utility. I haven't released it, since there is considerable disagreement about this in the community ;)