Seeks On Web


Once you have set up a Seeks proxy and have it running locally or on a remote server, you can make it available to the public directly through a website if you wish.

This lets users send their queries to Seeks from a web page, as with traditional search engines.

There are two ways to do so. The first uses a very light HTTP server, built into Seeks as a plugin since version 0.2.3. The second requires an external webserver. The plugin gives your node better performance. The external webserver is heavier but has some advantages, in particular:

  • it allows the use of SSL, whereas the libevent-based HTTP server plugin does not support encryption yet;
  • it allows the webserver to run on a different machine from Seeks itself, which the built-in plugin cannot do.

Option 1, the built-in plugin (Seeks >= 0.2.3), requires:

  • a running Seeks proxy.

Option 2, an external webserver, requires:

  • a running Seeks proxy,
  • a running webserver,
  • a script to route Web queries to the proxy (optional if you use nginx as your webserver, see below).


Option 1, built-in plugin

Again, you need Seeks >= 0.2.3. The server requires libevent to run.

When compiling Seeks, enable the HTTP server plugin:

./configure --enable-httpserv-plugin=yes --with-libevent=/your/path/to/libevent

Then compile. Before running, you must add the following to your src/proxy/config file:

activated-plugin httpserv

Then run Seeks; at startup you should see a line indicating that the webserver is running.

By default the server listens on localhost:8080. You can change this by editing

src/plugins/httpserv/httpserv-config

in the source tree.
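To check that the built-in server is actually listening after startup, a plain TCP connection attempt is enough. The following is a minimal Python 3 sketch, not part of Seeks; the host and port default to the localhost:8080 values mentioned above, and the function name `is_listening` is hypothetical.

```python
import socket


def is_listening(host="localhost", port=8080, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host or timeout.
        return False


if __name__ == "__main__":
    print("httpserv is up" if is_listening() else "nothing listening on localhost:8080")
```

If you changed the address in httpserv-config, pass the new host and port instead of the defaults.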

On public nodes, it is recommended to use a robots.txt file to keep crawlers from hitting your websearch node. The robots.txt file goes in the websearch public directory. If you are running Seeks from the source tree, add your robots.txt to

src/public/

If you have installed Seeks in your home directory or system-wide, add your robots.txt file to

<your_install_repository>/share/seeks/public/
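A robots.txt that asks every well-behaved crawler to stay away is only two lines long. The Python 3 sketch below writes such a file and uses the standard `urllib.robotparser` module to confirm that pages of the node are disallowed for any user agent. The default directory `src/public` is the source-tree location given above (relative to the top of the source tree); pass your install path instead if needed. The helper names and the example URL are hypothetical.

```python
import os
import urllib.robotparser

# Disallow everything for every crawler that honours robots.txt.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def write_robots_txt(public_dir="src/public"):
    """Write robots.txt into the Seeks public directory and return its path."""
    path = os.path.join(public_dir, "robots.txt")
    with open(path, "w") as f:
        f.write(ROBOTS_TXT)
    return path


def blocks_crawlers(robots_text, url="http://my.seeks-node.net/websearch-hp"):
    """Check with the stdlib parser that a generic crawler may not fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch("*", url)
```

Keep in mind that robots.txt is only a request: crawlers that ignore it still need to be handled with access control on the server.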

Option 2, external webserver

You must set up the webserver yourself. The routing scripts are given below, for Django and for PHP; pick the one you prefer. For beginners, we recommend the PHP script.

Django

settings.py

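SEEKS_PROXY = 'http://localhost:8118'   # address of your Seeks proxy (adjust host and port)
SEEKS_URI = 'http://s.s/'               # virtual host Seeks serves its pages on
SEEKS_PATH = 'seeks/'                   # URL prefix of the search pages on your site
ROOT_URL = 'http://my.seeks-node.net/'  # public root of your site, read by seeks/views.py (example value)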

urls.py

from django.conf import settings

[...]

     (r'^%s(?P<path>.*)$' % settings.SEEKS_PATH, 'PROJECTNAME.seeks.views.seeks'),

seeks/views.py

DEPRECATED: There is no updated Python script for versions >= 0.3 yet; if you write one, let us know.

Use the PHP script or the built-in webserver instead.

For versions of Seeks lower than Bubs-0.2-beta, use this script; for versions < 0.3, use the script below.

# Copyright Camille Harang
# 
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
# 
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see http://www.fsf.org/licensing/licenses/agpl-3.0.html.

import httplib
import urllib2
from urlparse import urljoin
from django.conf import settings
from django.http import HttpResponse, HttpResponseRedirect, HttpResponseServerError

def seeks(request, path):
    """Forward a request under SEEKS_PATH to the Seeks proxy and relay its answer."""

    # An empty path means the bare search page: send the user to the Seeks home page.
    if path == '':
        return HttpResponseRedirect('websearch-hp')

    # Public address of this view, passed to Seeks so it can build links back to it.
    public_url = urljoin(settings.ROOT_URL, settings.SEEKS_PATH)
    # Address of the requested page on the Seeks virtual host, plus the original query string.
    local_url = urljoin(settings.SEEKS_URI, path)
    if request.META.get('QUERY_STRING'):
        local_url = '%s?%s' % (local_url, request.META['QUERY_STRING'])

    # Route the request through the Seeks proxy.
    opener = urllib2.build_opener(urllib2.ProxyHandler({'http': settings.SEEKS_PROXY}))
    headers = [('Seeks-Remote-Location', public_url)]
    if 'HTTP_ACCEPT_LANGUAGE' in request.META:
        headers.append(('Accept-Language', request.META['HTTP_ACCEPT_LANGUAGE']))
    opener.addheaders = headers

    try:
        o = opener.open(urllib2.Request(local_url))
        # Keep only the MIME type, dropping parameters such as "; charset=...".
        mime = o.info().get('content-type', '').split(';')[0]
        body = o.read()
    except urllib2.HTTPError as err:
        return HttpResponseServerError('ERROR %s' % err)
    except urllib2.URLError as err:
        return HttpResponseServerError('ERROR %s' % err.reason)
    except httplib.BadStatusLine as err:
        return HttpResponseServerError('ERROR %s' % err)
    except Exception:
        return HttpResponseServerError('ERROR')

    # Fall back to Django's default content type when Seeks sent none.
    return HttpResponse(body, content_type=mime or None)
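Apart from the proxy call, the view above is mostly URL arithmetic plus one extra header. The following Python 3 sketch reproduces only that arithmetic with `urllib.parse.urljoin`, so you can check which URL Seeks will be asked for and which public location it will be told about. SEEKS_URI and SEEKS_PATH mirror settings.py; ROOT_URL is read by the view from your project settings, and `http://my.seeks-node.net/` is an assumed example value. `seeks_urls` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Values mirroring the Django settings; ROOT_URL is an assumed example.
ROOT_URL = "http://my.seeks-node.net/"
SEEKS_URI = "http://s.s/"
SEEKS_PATH = "seeks/"


def seeks_urls(path, query_string=""):
    """Return (public_url, local_url) the way the Django view computes them."""
    public_url = urljoin(ROOT_URL, SEEKS_PATH)  # sent as Seeks-Remote-Location
    local_url = urljoin(SEEKS_URI, path)        # fetched through the Seeks proxy
    if query_string:
        local_url = "%s?%s" % (local_url, query_string)
    return public_url, local_url


print(seeks_urls("search", "q=seeks"))
# ('http://my.seeks-node.net/seeks/', 'http://s.s/search?q=seeks')
```

Note that SEEKS_PATH has no leading slash: with urljoin, a leading slash would replace any path already present in ROOT_URL.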

PHP

Dependencies

Code

Use this script for versions of Seeks lower than Bubs-0.2-beta; for more recent versions, use the script below. Beware: scripts written for versions < 0.3 do not work properly with versions >= 0.3. For those versions, use the script that is part of the source distribution instead:

seeks-x.x/resources/search.php

You may need to modify the script to match your configuration.

If you installed Seeks from your distribution's package, the script may not have been included (this is outside our control). In that case, you can find its latest stable version here: search.php

Anonymization

Queries to Seeks logged on /dev/null

On *nix systems:

./seeks 2> /dev/null

Apache

 # Set the pattern below to match Seeks's location on your server
 SetEnvIf Request_URI "^/seeks/" seeks
 # Requests marked "seeks" go to /dev/null, everything else to your usual log file
 CustomLog /dev/null combined env=seeks
 CustomLog /var/www/access.log combined env=!seeks

Lighttpd

 $HTTP["host"] =~ "^your_node_address$" 
 {
 accesslog.filename = "/dev/null"
 server.errorlog = "/dev/null"
 }

nginx without a script

If you use nginx as your front webserver, you can simply use the following configuration:

       location /search/ {
               rewrite ^/search/$ /websearch-hp break;
               proxy_pass http://localhost:8250/;
               proxy_set_header Host s.s;
               proxy_set_header Seeks-Remote-Location http://my.seeks-node.net/search;
       }

With this configuration the PHP script is not needed, and the main page is accessible through http://my.seeks-node.net/search/
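Before putting nginx in front, it can help to send Seeks the same request nginx would forward, to rule out problems on the Seeks side. Below is a minimal Python 3 sketch of what the location block above does: the path mapping follows the rewrite and proxy_pass lines, and the headers follow the two proxy_set_header lines. It assumes Seeks listens on localhost:8250 as in that configuration; `upstream_path` and `build_forwarded_request` are hypothetical names, and `my.seeks-node.net` stands for your own public address.

```python
import urllib.request

SEEKS_UPSTREAM = "http://localhost:8250"          # proxy_pass target
PUBLIC_LOCATION = "http://my.seeks-node.net/search"  # Seeks-Remote-Location value


def upstream_path(public_path):
    """Map a public /search/... path to the path nginx passes to Seeks."""
    if public_path == "/search/":
        # rewrite ^/search/$ /websearch-hp break;
        return "/websearch-hp"
    if public_path.startswith("/search/"):
        # proxy_pass with a trailing "/" replaces the /search/ prefix with "/".
        return "/" + public_path[len("/search/"):]
    raise ValueError("not under /search/: %r" % public_path)


def build_forwarded_request(public_path):
    """Build (without sending) the request nginx would forward to Seeks."""
    return urllib.request.Request(
        SEEKS_UPSTREAM + upstream_path(public_path),
        headers={"Host": "s.s", "Seeks-Remote-Location": PUBLIC_LOCATION},
    )

# To actually query your local Seeks:
# urllib.request.urlopen(build_forwarded_request("/search/")).read()
```

If this request works but the public URL does not, the problem is in the nginx configuration rather than in Seeks.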

Tips

How to keep Seeks running if it crashes

  • Help us debug it.
  • Restart it automatically whenever it exits (handy inside a screen session):
while true ; 
do ./seeks ; sleep 1 ; 
done
  • Another way to do the same thing, using cron and running Seeks as a daemon:

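Add the following line to the system crontab (/etc/crontab; the root field is not valid in a per-user crontab):

*/5 * * * * root [ ! -f /var/run/seeks.pid -o -z "$(cat /var/run/seeks.pid 2>/dev/null )" -o ! -d "/proc/$(cat /var/run/seeks.pid 2>/dev/null)" ]  && cd seekpath && ./seeks --daemon --pidfile /var/run/seeks.pid

where seekpath is the directory containing your seeks binary. Every 5 minutes, cron checks whether the process recorded in /var/run/seeks.pid is still alive and restarts Seeks if it is not. The restart passes --pidfile too, so that the PID file stays up to date.

Start Seeks with these arguments, so that it writes the PID file the cron job checks: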

./seeks --daemon --pidfile /var/run/seeks.pid
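The test in the crontab line above restarts Seeks when the PID file is missing, empty, or names a process that no longer has a /proc entry. The same logic as a Python 3 sketch, in case you prefer a small watchdog script; it uses `os.kill(pid, 0)`, which checks that a process exists without sending it a signal, instead of looking in /proc. The function name `seeks_is_running` is hypothetical and the default path is the one used above.

```python
import errno
import os


def seeks_is_running(pidfile="/var/run/seeks.pid"):
    """Return True if pidfile exists, holds a PID, and that process is alive."""
    try:
        with open(pidfile) as f:
            content = f.read().strip()
    except OSError:
        return False                  # missing or unreadable PID file
    if not content.isdigit() or int(content) == 0:
        return False                  # empty or garbled PID file
    try:
        os.kill(int(content), 0)      # signal 0: existence check only
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True
```

A watchdog would call it periodically and, when it returns False, start `./seeks --daemon --pidfile /var/run/seeks.pid` from the Seeks directory, as the cron line does.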

On public nodes, it is recommended that you use a robots.txt file to block crawlers that may hit your websearch node and load it for no purpose.

Built-in http server and lighttpd as a reverse-proxy

To use lighttpd as a reverse-proxy and have faster results, you can use the built-in HTTP server plugin along with the following lighttpd configuration snippet:

$HTTP["host"] =~ "seeks.zat.im" {
    proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
    setenv.add-request-header = (
        "Seeks-Remote-Location" => "http://seeks.zat.im"
    )
}

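Replace "seeks.zat.im" with your own hostname and 8080 with the port your built-in HTTP server listens on. The proxy.server and setenv.add-request-header options come from lighttpd's mod_proxy and mod_setenv, so both modules must be enabled.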

Version with SSL support:

$HTTP["host"] == "seeks.sileht.net" {
   $HTTP["scheme"] == "https" {
       proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
       setenv.add-request-header = (
           "Seeks-Remote-Location" => "https://seeks.sileht.net"
       )
   } else  $HTTP["scheme"] == "http" {
       proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
       setenv.add-request-header = (
           "Seeks-Remote-Location" => "http://seeks.sileht.net"
       )
   }
}

Replace "seeks.sileht.net" with your own hostname and 8080 with the port your built-in HTTP server listens on.

Built-in http server and apache as a reverse-proxy + SSL

You can use Apache as a reverse proxy in front of the Seeks built-in HTTP server plugin, listening on 127.0.0.1:8080. You need to enable a few Apache modules for it to work as a reverse proxy:

 a2enmod proxy
 a2enmod proxy_http
 a2enmod headers
 a2enmod rewrite

If you want SSL support, you also need the ssl module and a valid SSL certificate:

 a2enmod ssl

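Then create a file for the Seeks virtual host (for example /etc/apache2/sites-available/seeks; with Apache 2.4 on Debian, a2ensite only accepts file names ending in .conf, so name it seeks.conf there) with the following configuration: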

 <VirtualHost *:80>
         ServerAdmin admin@domain.tld
         ServerName seeks.domain.tld
         
         # Redirect all plain HTTP requests to HTTPS (no / at the end of the server name)
         RewriteEngine on
         RewriteCond %{HTTPS} off
         RewriteRule (.*) https://seeks.domain.tld%{REQUEST_URI}
 </VirtualHost>
 
 # And now, the SSL part starts below
 <VirtualHost *:443>
         ServerAdmin admin@domain.tld
         ServerName seeks.domain.tld
         
         SSLEngine on
         # Use a valid certificate and its associated key
         SSLCertificateFile /etc/ssl/certs/seeks.domain.tld.pem
         SSLCertificateKeyFile /etc/ssl/private/seeks.domain.tld.key
         
         RequestHeader add Seeks-Remote-Location "https://seeks.domain.tld"
         
         # We do not want to act as a forward proxy
         ProxyRequests off
         ProxyPreserveHost on
         # Send the root of https://seeks.domain.tld/ to the HTTP server embedded in Seeks,
         # and rewrite the redirects it sends back
         ProxyPass / http://127.0.0.1:8080/
         ProxyPassReverse / http://127.0.0.1:8080/
 
         DocumentRoot /path/to/your/seeks/src
 
         <Location />
                 # Apache 2.2 syntax; on Apache 2.4 use "Require all granted" instead
                 Order allow,deny
                 Allow from all
         </Location>
 
         ErrorLog /var/log/apache2/error.log
         # Possible values include: debug, info, notice, warn, error, crit,
         # alert, emerg.
         LogLevel warn
 
         # Don't log queries: the access log goes to /dev/null, only errors are kept
         CustomLog /dev/null common
 </VirtualHost>

Then enable the site:

 a2ensite seeks

Finally, restart Apache.

Set up an access control list for an open proxy

If your Seeks proxy accepts connections from outside, it is recommended that you restrict who can use it.

To do so, set the permit-access and deny-access options in the proxy configuration file (src/config in the sources).

The proxy configuration file itself documents these options in detail.
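Access-list entries are typically written as an address with an optional /masklen, as in the Privoxy configuration from which the Seeks proxy derives (for example a LAN written as 192.168.1.0/24); check the comments in your config file for the exact syntax your version accepts. To see which client addresses a set of such entries covers, Python's `ipaddress` module can expand the notation. This sketch only illustrates the network notation: it does not reproduce how Seeks orders or combines permit-access and deny-access rules, and the helper name and example entries are hypothetical.

```python
import ipaddress


def matching_entries(client, entries):
    """Return the address/masklen entries whose network contains client."""
    addr = ipaddress.ip_address(client)
    # strict=False accepts entries with host bits set, e.g. 192.168.1.5/24.
    return [e for e in entries if addr in ipaddress.ip_network(e, strict=False)]


# Hypothetical entries: the local machine and a private LAN.
ENTRIES = ["127.0.0.1/32", "192.168.1.0/24"]
print(matching_entries("192.168.1.42", ENTRIES))  # ['192.168.1.0/24']
print(matching_entries("203.0.113.7", ENTRIES))   # []
```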
