To find out more about any project, click the link in its title (if one exists).
With Node.js becoming popular among web developers, I felt the need for a library (basically a collection) of commonly used data structures and algorithms. The library currently contains data structures such as Trie, Stack, Queue, AVLTree (height-balanced BST), Binary Heap, and Min-Max-Heap, and algorithms such as lower_bound, upper_bound, binary search, partition, and selection (bisection related).
How do you automatically partition data and then execute queries on all replicas and provide a consolidated result set? TDDB is an experiment with databases and distributed query processing. This was my final year project as an undergraduate student. Made a man out of me!
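The question above can be illustrated with a toy scatter-gather example. This is only a minimal sketch and not TDDB's actual design: the function names (`partition_rows`, `local_count_by`, `count_by`), the CRC32-based hash partitioning, and the in-memory lists standing in for database nodes are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def partition_rows(rows, key, num_partitions):
    """Hash-partition rows (dicts) on `key` into `num_partitions` buckets.

    CRC32 is used only because it is stable across runs; a real system
    would pick its own partitioning function.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def local_count_by(partition, column):
    """The per-partition part of the query: count rows per value of `column`."""
    return Counter(row[column] for row in partition)


def count_by(partitions, column):
    """Run the local query on every partition and merge the partial results."""
    total = Counter()
    for part in partitions:
        total.update(local_count_by(part, column))
    return dict(total)


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Mumbai"},
    {"id": 3, "city": "Pune"},
    {"id": 4, "city": "Delhi"},
    {"id": 5, "city": "Pune"},
]
partitions = partition_rows(rows, "id", 3)
print(count_by(partitions, "city"))
```

The consolidated counts are the same no matter how the rows land in the partitions, which is the property a distributed query processor has to preserve.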
Can a message queue support millions of queues (one per user) and still remain responsive? Can such a system be self-monitoring and exhibit smooth failover characteristics? My evaluation of existing message queues turned up no product that satisfied these requirements. pymq is intended to fill this gap.
What do you do when you have 30 million phrases that people may search for and you want to help them by suggesting a set of 'k' possible completions based on what they have already typed? Furthermore, the completions should be ordered by a pre-defined rank.
Most current implementations (including Apache Solr) use an O(n log n) algorithm to get the candidate list and sort it by score. lib-face, on the other hand, returns the results in guaranteed O(k log n) time, making it very attractive for large-scale deployments.
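One way to reach an O(k log n) bound is sketched below. It is an illustration under stated assumptions, not necessarily how lib-face is implemented: phrases are kept sorted, a binary search finds the contiguous range sharing the prefix, a segment tree answers "which index in this range has the highest score", and a heap repeatedly splits ranges around the best index. The class name `TopKCompleter` and its methods are hypothetical, and the upper-bound trick assumes no phrase contains the code point U+10FFFF.

```python
import bisect
import heapq


class TopKCompleter:
    def __init__(self, scored_phrases):
        """scored_phrases: iterable of (phrase, score) pairs."""
        items = sorted(scored_phrases)
        self.phrases = [p for p, _ in items]
        self.scores = [s for _, s in items]
        n = len(items)
        self.size = 1
        while self.size < max(n, 1):
            self.size *= 2
        # tree[i] stores the index of the best-scoring phrase in node i's
        # range, or -1 if that range is empty.
        self.tree = [-1] * (2 * self.size)
        for i in range(n):
            self.tree[self.size + i] = i
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self._better(self.tree[2 * i], self.tree[2 * i + 1])

    def _better(self, a, b):
        if a == -1:
            return b
        if b == -1:
            return a
        return a if self.scores[a] >= self.scores[b] else b

    def _argmax(self, lo, hi):
        """Index of the highest score in [lo, hi), or -1 if the range is empty."""
        best = -1
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                best = self._better(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = self._better(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best

    def complete(self, prefix, k):
        """Return up to k phrases starting with prefix, highest score first."""
        lo = bisect.bisect_left(self.phrases, prefix)
        # Assumption: no phrase contains U+10FFFF, so this bounds the range.
        hi = bisect.bisect_left(self.phrases, prefix + "\U0010ffff")
        heap = []

        def push(a, b):
            m = self._argmax(a, b)
            if m != -1:
                heapq.heappush(heap, (-self.scores[m], m, a, b))

        push(lo, hi)
        results = []
        while heap and len(results) < k:
            _, m, a, b = heapq.heappop(heap)
            results.append(self.phrases[m])
            push(a, m)        # best of the part left of m
            push(m + 1, b)    # best of the part right of m
        return results


completer = TopKCompleter([
    ("how to cook", 5),
    ("how to code", 9),
    ("how to draw", 7),
    ("hello", 3),
    ("howl", 1),
])
print(completer.complete("how to", 2))
```

Each returned completion costs one heap pop and at most two range-maximum queries, so the work grows with k rather than with the number of phrases matching the prefix.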
NFS lets you share files across the network. However, it suffers from the following shortcomings:
p2p-fs tries to mitigate all of the above by using a peer-to-peer model for file sharing (much like BitTorrent).
Searching for song lyrics on popular search engines returns many mostly relevant results. However, the target pages are filled with ads, videos, and images. Anyone searching for the lyric text would not be interested in all that paraphernalia.
liblyric is an attempt to automate the process of scanning these individual result pages and extracting the textual content they have in common, in the hope that the common parts will be just the song's lyric text.
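The idea of keeping only what the result pages have in common can be sketched as a simple line vote. This is a toy illustration and not liblyric's actual technique: it assumes each page has already been reduced to plain text, and the names `normalize`, `common_lines`, and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def normalize(line):
    """Lowercase and collapse whitespace so near-identical lines compare equal."""
    return " ".join(line.lower().split())


def common_lines(pages, min_fraction=0.6):
    """Keep lines that appear on at least `min_fraction` of the pages.

    pages: list of plain-text strings, one per result page.
    Returns the surviving lines, in the order of the page that kept the most.
    """
    counts = Counter()
    for page in pages:
        seen = {normalize(line) for line in page.splitlines()}
        seen.discard("")
        counts.update(seen)  # each page votes at most once per line

    needed = min_fraction * len(pages)
    best = []
    for page in pages:
        kept = [
            line.strip()
            for line in page.splitlines()
            if normalize(line) and counts[normalize(line)] >= needed
        ]
        if len(kept) > len(best):
            best = kept
    return best


pages = [
    "BUY NOW\nTwinkle twinkle little star\nHow I wonder what you are\n",
    "twinkle twinkle little star\nClick here\nHow I wonder what you are",
    "Watch video\nTwinkle  twinkle little star\n",
]
print(common_lines(pages))
```

Ad and navigation lines differ from site to site, so they rarely collect enough votes, while the lyric lines repeat across pages and survive.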
The techniques that liblyric employs have turned out to give accurate results more than 90% of the time.
What does this user's email talk about? What products would he/she be interested in buying at this point in time? These are the broad questions that need to be answered for a user, given his/her email, so that relevant monetizable advertisements can be shown alongside the email. CAE tries to do this as accurately as possible.