FAQ
Below is the English FAQ. There is also a FAQ in French.
General
What is Seeks?
Seeks is a p2p pattern matching overlay network on top of existing search engines. It provides collaborative websearch capabilities by automatically grouping users based on the similarity of their queries, and letting them reorganize and evaluate the search results together. To this end, Seeks implements a websearch proxy and a distributed hashtable.
What does Seeks do again? I'm confused.
Seeks proposes that users share the queries they send to the main websearch engines. By doing so, users who perform similar queries can be automatically connected together through a p2p network. Such a group of users is called a search group. Within a search group, users and their machines interact to evaluate, organize and monitor the results of their queries. Users will also be able to connect to a search group and publish their own work (e.g. webpages, comments, tweets) directly to the group.
What are Seeks' main components?
Seeks implements two main elements:
- a websearch proxy, that is, a small piece of software sitting between your browser and the Internet, which provides a meta-search engine. More precisely, it intercepts your queries to the main search engines, then captures and reorganizes the results based on the consensus among several search engines (for now Google, Bing & Cuil). It does so quickly, and lets you benefit from more accurate search results.
- a p2p client, built on a distributed hashtable, that automatically groups users who perform similar queries, in real time.
For now, only the websearch proxy is available.
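The FAQ does not say how the proxy computes consensus among engines. As a minimal sketch only, assuming a Borda-count rule (an assumption, not Seeks' documented method), per-engine result lists can be merged so that URLs returned by several engines, and near the top of their lists, rank highest. The function name and engine labels are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def merge_results(rankings: dict[str, list[str]]) -> list[str]:
    """Merge per-engine ranked URL lists into a single consensus ranking.

    Borda-count rule (an illustrative assumption): in a list of n results,
    the URL at 0-based position p earns n - p points. URLs returned by
    several engines, and near the top, accumulate the most points.
    """
    scores: dict[str, int] = defaultdict(int)
    for urls in rankings.values():
        n = len(urls)
        for position, url in enumerate(urls):
            scores[url] += n - position
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Hypothetical result lists from three engines for the same query.
    results = {
        "engine_a": ["a.org", "b.org", "c.org"],
        "engine_b": ["b.org", "a.org", "d.org"],
        "engine_c": ["b.org", "c.org"],
    }
    print(merge_results(results))  # ['b.org', 'a.org', 'c.org', 'd.org']
```

A rule of this kind rewards agreement between engines using only the order of their results, which is all a meta-search proxy can observe; the engines' internal scores stay hidden.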
How is it that I can't find a similar project on the Web?
We're not sure. The idea of searching together on top of the most connected networks ever built in human history is simple, realistic, and technically feasible. Our tentative answers are, first, that there may not be much to gain money-wise (see below), and second, that habits may not be ready for collaborative websearch.
Why aren't you guys trying to make money with it?
We are well aware that the architecture provided by the Seeks project can be used in many ways. However, we believe that our original model for free and transparent collaborative websearch would be hampered by a standard business model, such as support through advertising. Also, the absence of a business model lets developers join the project without second thoughts. In short, we're working for the community, through open source software and transparent development.
What are the potential alternative uses of Seeks?
We foresee two main uses. First, Seeks could enable any application built on top of automatically formed groups of users. Second, the Seeks proxy, which controls input/output on the HTTP port, can be used locally on a personal machine or on a local network, and this opens the way to a whole new set of innovative applications. These include personal dashboards displaying information flows from the Web, the remixing of webpages, personal machine learning assistants that crawl the Web or help with information clustering and decision tasks, and the local clustering and analysis of images and other media.
Why aren't the big companies that run the main search engines doing what Seeks does?
We believe this goes against their current business model. These companies live on revenue from advertising that they target at websearch users based on their search queries. It is therefore unlikely that they would let users share their queries: anybody could then advertise anything to those users directly, breaking the mainstream advertising-based business model. Seeks aims at powering up websearch, nothing more. We believe technical innovation should not be hampered by business interests.
What development steps do you guys have in mind for Seeks for now?
We have four steps in mind; take a look at the Roadmap.
What's the point of being able to register one's own website through Seeks?
The point is that your website, or any other website, no longer needs to be discovered, crawled and ranked by a third-party search engine. You are in charge: choose a set of queries that describe your website, or the site you want to register. The site will then be recommended to users who perform queries similar to those you used to describe it. Eventually, we plan to provide algorithms that automatically generate queries from the content.
Is there a way to bypass regular server-based search engines? Isn't decentralized search theoretically slower?
Decentralized search is indeed several orders of magnitude slower than centralized search. However, Seeks is not a distributed search engine: it is a distributed hashtable of queries, users and content (mostly URLs). Search can be done by using regular (centralized) search engines and reworking their results collaboratively afterwards, and/or by registering content directly into Seeks' hashtable.
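To make the hashtable of queries and content concrete, the sketch below registers a URL under the queries that describe it, as in the website-registration answer above, and lets anyone issuing one of those queries find it. It assumes SHA-1 over a normalised query as the key and keeps everything in one Python dict; `QueryTable` and `query_key` are hypothetical names, and a real DHT would spread the keys across peers.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def query_key(query: str) -> str:
    """Map a query to a 160-bit key, as 40 hex digits.

    Lower-casing, collapsing whitespace and hashing with SHA-1 are
    assumptions made for this sketch.
    """
    normalised = " ".join(query.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


class QueryTable:
    """Hypothetical single-process stand-in for a DHT of queries -> URLs."""

    def __init__(self) -> None:
        self._store: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, url: str, queries: list[str]) -> None:
        # The site owner describes the content with the queries it answers.
        for query in queries:
            self._store[query_key(query)].add(url)

    def lookup(self, query: str) -> list[str]:
        # .get() avoids creating empty entries for unknown queries.
        return sorted(self._store.get(query_key(query), set()))


if __name__ == "__main__":
    table = QueryTable()
    table.register("https://example.org/recipes", ["vegan pancakes", "egg-free pancakes"])
    print(table.lookup("Vegan   Pancakes"))  # ['https://example.org/recipes']
    print(table.lookup("vegan waffles"))     # []
```

Only exact (normalised) matches are found here; matching similar but different queries is the job of locality sensitive hashing, discussed in the Technical section.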
Sharing
Isn't it dangerous to let my queries travel over the Internet for other users to see?
Today, your queries are already stored by private companies that run websearch services as a business. First, there is no reason to believe that sharing queries by making them public would be worse. More importantly, Seeks groups users who perform similar queries, in real time, so the other people who see your queries are people who have made similar queries themselves. In other words, why hide from your own crowd? Second, sharing leads to collaboration, which leads to an improved, more subtle and more precise treatment of information. So you share for a benefit, and you make the trade-off yourself. Don't share what you want to keep to yourself (well, to yourself and the search engines' databases, truly). Third, when querying the Web, you are most likely looking for human-generated information. You are not alone out there: what you are asking, others have asked before you, and have sometimes answered. Most of the time, your query is a well-known drop in an ocean of bits. We believe sharing is a reasonable option, backed by serious rules and technical protection of the information you may not want to divulge.
Shouldn't my queries be encrypted on the network?
Queries are hashed before being passed on the network. That is, your query never travels in the clear, only as a bunch of numbers. This plays a role similar to encryption. When it is to your benefit that the query reaches your peers in the clear, you will have the choice to send it that way. For this case we plan to provide dedicated encryption, but since you will be making the query itself (and not just the bunch of numbers) public, encryption should not be required.
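As an illustration of a query travelling "as a bunch of numbers", the sketch below turns a query into a 160-bit integer and shows how a peer that already holds the same query can recognize it by recomputing the hash. SHA-1 and the normalisation step are assumptions, and the function names are hypothetical.

```python
import hashlib


def query_digest(query: str) -> int:
    """Return a 160-bit integer derived from the query.

    SHA-1 over a lower-cased, whitespace-collapsed query is an assumption
    for illustration; Seeks' actual hashing scheme may differ.
    """
    normalised = " ".join(query.lower().split())
    return int.from_bytes(hashlib.sha1(normalised.encode("utf-8")).digest(), "big")


def matches(digest: int, known_query: str) -> bool:
    """True if `digest` was produced from `known_query`.

    Only the number travels; a peer that already holds the same query
    recomputes the digest and compares the two numbers.
    """
    return query_digest(known_query) == digest


if __name__ == "__main__":
    sent = query_digest("Seeks p2p search")    # what goes on the wire
    print(sent.bit_length() <= 160)            # True
    print(matches(sent, "seeks  P2P search"))  # True
    print(matches(sent, "seeks p2p"))          # False
```

The flip side of this design: since anyone can hash a guessed query and compare, a hash hides rare queries better than very common ones.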
What's my ID on the network? Can other users identify me?
No. Other users cannot identify you, except through your IP address. If you wish to hide it, you must use Tor or a similar anonymous routing system. Your Seeks ID on the network is a randomly generated 160-bit key.
Will people know what webpages I've been looking at?
Strictly speaking, no, but it is a little more complicated. In collaborative mode, Seeks will generate personalized rankings of websearch results. The ranking uses information from other users, mostly automatically computed scores on URLs visited from a websearch. So some smartasses could try to associate URLs with the IP addresses that might have visited them. However, the sharing of scores will be put under each user's control, so privacy can be preserved whenever needed.
Technical
What is locality sensitive hashing about?
Locality sensitive hashing (LSH) is a method for grouping similar elements. The general idea is to control hash collisions so that similar content ends up with colliding keys, whereas an ordinary hash function scatters even near-identical inputs. If you are interested in the theory, please start with the LSH page on Wikipedia.
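Here is a minimal MinHash-based LSH sketch, assuming queries are compared as sets of words (Seeks' actual features and parameters are not given here). Each query gets a 32-value signature split into 8 bands of 4 values, and two queries collide if they agree on a whole band. Each band key could serve as a DHT key so that similar queries meet at the same place, but that use is an assumption; blake2b as the hash family and all names are illustrative choices.

```python
from __future__ import annotations

import hashlib

BANDS, ROWS = 8, 4  # illustrative: signature of BANDS * ROWS = 32 values


def _hash(token: str, seed: int) -> int:
    """One member of a family of 64-bit hash functions, selected by `seed`."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(query: str) -> list[int]:
    """MinHash signature of the query's word set.

    For each seed, keep the minimum hash over all words; two word sets agree
    on a given position with probability equal to their Jaccard similarity.
    """
    words = set(query.lower().split())
    if not words:
        raise ValueError("empty query")
    return [min(_hash(w, seed) for w in words) for seed in range(BANDS * ROWS)]


def lsh_keys(query: str) -> set[tuple[int, tuple[int, ...]]]:
    """Split the signature into bands; each (band index, band values) is a key."""
    sig = minhash_signature(query)
    return {(b, tuple(sig[b * ROWS:(b + 1) * ROWS])) for b in range(BANDS)}


def collide(q1: str, q2: str) -> bool:
    """Two queries collide if they share at least one band key."""
    return not lsh_keys(q1).isdisjoint(lsh_keys(q2))


if __name__ == "__main__":
    print(collide("cheap flights to paris", "Paris cheap flights to"))  # True
    print(collide("cheap flights to paris", "cheap flights to paris france"))
    print(collide("cheap flights to paris", "quantum chromodynamics lecture"))
```

With these parameters, two queries whose word sets have Jaccard similarity s collide with probability 1 - (1 - s^4)^8: about 0.985 for s = 0.8 (e.g. "cheap flights to paris" vs. "cheap flights to paris france") and about 0.40 for s = 0.5. More bands raise recall; more rows per band cut false matches.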
What would be the load on my box if I start using Seeks?
Normally, not much. Try it on your laptop: your CPU should barely notice it. Memory use may now and then reach a few tens of MB for a single user, and a few times more for a public node (as an example, ~16 MB on our public node and ~40 MB on my laptop). The required disk space should stay within a few MB if you are using the SOLO version of the proxy.
Which existing peer-to-peer software did you start from? And why?
We did not start from existing software: we are writing a DHT (p2p) from scratch, based on Chord. We chose Chord because it is a minimal, well-studied DHT design. We are starting from scratch because we need full, fine-grained control over the software. First, the Seeks protocol requires very fast transfer of tiny amounts of information among peers. Second, the Seeks DHT defines several communication layers, from low-level stabilization of the p2p overlay network up to load balancing and user-defined, plugin-based decentralized exchanges. To achieve this, writing yet another DHT was, we believe, the right decision.
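For readers new to Chord, its core placement rule fits in a few lines: node IDs and keys share a ring of 2^160 identifiers, and a key is stored on its successor, the first node whose ID equals or follows the key going clockwise. The sketch below assumes global knowledge of all node IDs for brevity; real Chord nodes find successors in O(log N) hops using finger tables, whose entries are computed here for illustration. Function names are hypothetical and not taken from Seeks' code.

```python
from __future__ import annotations

import bisect
import secrets

RING_BITS = 160
RING_SIZE = 2 ** RING_BITS


def random_node_id() -> int:
    """A random identifier on the 160-bit ring."""
    return secrets.randbits(RING_BITS)


def successor(node_ids: list[int], key: int) -> int:
    """Node responsible for `key`: the first node ID >= key, wrapping to the
    smallest ID past the top of the ring. Uses global knowledge of all
    nodes, a simplification of Chord's distributed lookup."""
    if not node_ids:
        raise ValueError("empty ring")
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key % RING_SIZE)
    return ring[i] if i < len(ring) else ring[0]


def finger_table(node_ids: list[int], n: int) -> list[int]:
    """Chord finger table of node n: entry i is successor(n + 2**i)."""
    return [successor(node_ids, (n + 2 ** i) % RING_SIZE) for i in range(RING_BITS)]


if __name__ == "__main__":
    nodes = [random_node_id() for _ in range(8)]
    key = random_node_id()
    print(f"key {key:#x} is stored on node {successor(nodes, key):#x}")
```

Because fingers point at exponentially growing distances, each lookup hop can roughly halve the remaining distance to the key, which is where Chord's logarithmic routing comes from.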
Why did you design Seeks around a proxy?
A proxy was not required by the architecture, but it is a flexible solution that offers many advantages and almost no drawbacks. Among the advantages:
- A proxy is transparent and makes it possible to redirect traffic from several domains to the same node. For example, some of our main nodes on www.seeks-project.info are hosted on other remote machines.
- A proxy can intercept queries to other search engines without any browser plugin.
- A proxy can passively capture user feedback (useful for collaborative filtering).
- A proxy can contact the DHT and integrate additional information into webpages (e.g. it could intercept requests for URLs and ask the DHT for information related to these URLs, such as ratings and comments).
- A proxy helps protect user data and interactions with servers on the network.
- A proxy doesn't rule out other solutions, such as using a web server like the included HTTP server plugin.
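As an illustration of the second advantage, here is how a proxy might recognize a search-engine query in a request URL and extract its terms. The host table and the use of the `q` parameter are assumptions for this sketch (they match common Google and Bing search URLs, but Seeks' own interception code may work differently); `intercept_query` is a hypothetical name.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit

# Hypothetical table: search host -> query-string parameter holding the terms.
SEARCH_PARAMS = {
    "www.google.com": "q",
    "www.bing.com": "q",
}


def intercept_query(url: str) -> str | None:
    """Return the search terms if `url` is a query to a known engine, else None."""
    parts = urlsplit(url)
    # urlsplit lower-cases the hostname, so the table lookup is case-insensitive.
    param = SEARCH_PARAMS.get(parts.hostname or "")
    if param is None:
        return None
    # parse_qs decodes percent-escapes and turns '+' into a space.
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    print(intercept_query("http://www.google.com/search?q=seeks+p2p&hl=en"))  # seeks p2p
    print(intercept_query("http://example.org/page?q=ignored"))               # None
```

Requests that match no known engine pass through untouched, which is what keeps this kind of interception transparent to the browser.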
Development and the Open Source Community
What are you guys' expectations for Seeks? Do you believe it can work?
Technically, we could find no glitch, and we have been mulling over the whole project for several years now. Both the theoretical and the technical sides have been thoroughly analyzed, so we have no doubt that it is technically feasible. However, we are aware that public habits, demand, and usage may (and very probably will) not match our vision. We are fine with this: Seeks being an open architecture, we are confident that it will fill a gap in collaborative websearch and communication, with full respect for privacy in a decentralized architecture. How, when, and what it will truly look like is what the adventure is about, and that's for you to decide!
Which building blocks do you guys need help with?
If you're keen to give us a hand, take a look at the list of tasks that need help.
I'm a C++ programmer. How can I help?
For a start, you can help with debugging or by coding up plugins. Take a look at the list of tasks that need help.
I'm good at Web-related stuff. Do you think I can help?
Definitely. The Seeks user interface is open to UI and web designers. You can start by setting up your own user interface and then reporting back to us, or by picking an open task. Depending on user interest, we should provide a system of skins, at least for the Seeks websearch plugin.
I don't know much about computer science. Can I help?
Yes, in the easiest way possible: by using the software. Beyond that, Seeks draws its strength from a philosophy of openness, sharing and collaboration in the information age, so you can help with new high-level ideas and with criticism that strengthens our skills and free access to good information.
Isn't there a danger in providing a peer-to-peer tool as open source?
Software security is always a concern. In an open-source community, bugs tend to be detected quickly. However, on a network of interacting machines, tampering with the communication protocol can partly disrupt the communication system. Here, it is enough to fall back on the good-mass vs. bad-mass argument: the more of us there are running the software as intended, the more robust the network, and the harder it is to disrupt.
