This is the detailed post to go with yesterday's quick discussion about proximity search. All the code is available on GitHub.

This assumes a bit of NodeJS knowledge and a working copy of Homebrew or something similar.

Install

  • MongoDB - brew install mongodb
  • NodeJS
  • NPM (included in NodeJS installer these days)

These are included in the package.json but it can't hurt to mention them here:

  • npm install twitter (node twitter streaming API library)
  • npm install mongodb (native mongodb driver for node)
  • npm install express (for convenience with API later)

Start mongod in the background. We don't quite need it yet, but it needs to be done at some point, so we may as well do it now.

Create a Twitter App

Fill out the form, then press the button to get the single-user access token and key. I love that Twitter does this now, rather than having to create a full authentication flow for single-user applications.

ingest.js

(open the ingest.js file and read along with this bit)

Using the basic native MongoDB driver, everything must be done in the database.open callback. This might lead to a bit of Nested Callback Fury but if it bothers you or becomes a bit too furious for your particular implementation, there are a couple of alternative Node-MongoDB modules that abstract this out a bit.

// Open the proximity database
db.open(function() {
    // Open the post collection
    db.collection('posts', function(err, collection) {
        // Start listening to the global stream
        twit.stream('statuses/sample', function(stream) {
            // For each post
            stream.on('data', function(data) {
                // Only store posts that carry geo data
                if (!!data.geo) {
                    collection.insert(data);
                }
            });
        });
    });
});
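
The check on data.geo only confirms that the field exists. A slightly stricter filter, sketched below, also makes sure the coordinates are a pair of finite numbers before anything gets inserted. The helper name hasUsableGeo is made up for illustration and isn't part of the repository code.

```javascript
// Hypothetical helper (not in the original ingest.js): returns true only
// when a post carries a coordinate pair that a 2d index can store.
function hasUsableGeo(post) {
  if (!post || !post.geo || !Array.isArray(post.geo.coordinates)) {
    return false;
  }
  var coords = post.geo.coordinates;
  return coords.length === 2 &&
    coords.every(function (n) {
      return typeof n === 'number' && isFinite(n);
    });
}

// Inside the stream handler it would be used something like this:
//   stream.on('data', function (data) {
//     if (hasUsableGeo(data)) { collection.insert(data); }
//   });
```

Whether the extra strictness is worth it depends on how much you trust the incoming stream.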

Index the data

The hard work has all been done for us: Geospatial Indexing in MongoDB. That's a good thing.

Ensure the system has a Geospatial index on the tweets.

db.posts.ensureIndex({"geo.coordinates" : "2d"})

Standard Geospatial search query:

db.posts.find({"geo.coordinates": {$near: [50, 13]}}).pretty()
(find the closest points to (50,13) and return them sorted by distance)
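
To get a feel for what that query hands back, here is a rough in-memory imitation using plain arrays. It assumes flat (planar) distance between coordinate pairs, which is my understanding of how $near behaves on a 2d index; the function name sortByDistance and the sample posts are invented for illustration.

```javascript
// Illustration only: mimic the ordering of a $near query with plain arrays.
// Assumes flat (planar) distance between [x, y] pairs.
function sortByDistance(posts, target) {
  function distance(coords) {
    var dx = coords[0] - target[0];
    var dy = coords[1] - target[1];
    return Math.sqrt(dx * dx + dy * dy);
  }
  // slice() copies the array so the caller's data is left untouched
  return posts.slice().sort(function (a, b) {
    return distance(a.geo.coordinates) - distance(b.geo.coordinates);
  });
}

// Made-up sample data in the same shape as the stored tweets
var samplePosts = [
  { text: 'far', geo: { coordinates: [10, 100] } },
  { text: 'nearest', geo: { coordinates: [50.1, 13.2] } },
  { text: 'near', geo: { coordinates: [52, 13] } }
];

var sorted = sortByDistance(samplePosts, [50, 13]);
console.log(sorted.map(function (p) { return p.text; }));
// order: nearest, near, far
```

The real index does this without scanning every document, which is the whole point of having it.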

By this point, we've got a database full of geo-searchable posts and a way to do a proximity search on them. To be fair, that's more down to MongoDB than anything we've done.

Next, we extend the search on those posts to allow filtering by a search term:

db.posts.find({"geo.coordinates": {$near: [50, 13]}, text: /.*searchterm.*/}).pretty()

API

The API is super simple. We only have two main query types:

  • /proximity?latitude=55&longitude=13
  • /proximity?latitude=55&longitude=13&q=searchterm

Each of these can take an optional 'callback' parameter to enable JSONP. We're using Express, so the callback parameter and content type for returning JSON are both handled automatically.
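
If you haven't met JSONP before: the response is the same JSON, wrapped in a call to whatever function the callback parameter names. The sketch below shows roughly what that wrapping amounts to. wrapJsonp is a made-up name for illustration only; in the real API the framework takes care of this.

```javascript
// Hypothetical illustration of a JSONP body; not taken from api.js.
function wrapJsonp(callbackName, payload) {
  var json = JSON.stringify(payload);
  // No callback given: return plain JSON
  if (typeof callbackName !== 'string' || callbackName === '') {
    return json;
  }
  // Only allow simple identifiers (letters, digits, _, $ and dots) so the
  // callback name can't smuggle script into the response
  if (!/^[A-Za-z_$][\w$.]*$/.test(callbackName)) {
    throw new Error('invalid callback name');
  }
  return callbackName + '(' + json + ');';
}

console.log(wrapJsonp('handlePosts', [{ text: 'hello' }]));
// prints: handlePosts([{"text":"hello"}]);
```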

api.js

(open the api.js file and read along with this bit)

This next chunk of code contains everything so don't panic.

db.open(function() {
  db.collection('posts', function(err, collection) {
    app.get('/proximity', function(req, res) {
      var latitude, longitude, q;
      latitude = parseFloat(req.query["latitude"]);
      longitude = parseFloat(req.query["longitude"]);
      q = req.query["q"];

      if (/^(-?\d+(\.\d+)?)$/.test(latitude) && /^(-?\d+(\.\d+)?)$/.test(longitude)) {
        if (typeof q === 'undefined') {
          collection.find({
            "geo.coordinates": {
              $near: [latitude, longitude]
            }
          }, function(err, cursor) {
            cursor.toArray(function(err, items) {
              writeResponse(items, res);
            });
          });
        } else {
          var regexQuery = new RegExp(".*" + q + ".*");
          collection.find({
            "geo.coordinates": {
              $near: [latitude, longitude]
            },
            'text': regexQuery
          }, function(err, cursor) {
            cursor.toArray(function(err, items) {
              writeResponse(items, res);
            });
          });
        }
      } else {
        res.send('malformed lat/lng');
      }

    });
  });
});

If you've already implemented the ingest.js bit, the majority of this api.js will be fairly obvious. The biggest change is that instead of loading the data stream then acting upon each individual post that comes in, we're acting on URL requests.

app.get('/proximity', function(req, res) {

For every request on this path, we try and parse the query string to pull out a latitude, longitude and optional query parameter.

if (/^(-?\d+(\.\d+)?)$/.test(latitude) && /^(-?\d+(\.\d+)?)$/.test(longitude)) {
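
The validation step can also be pulled out into a small helper, sketched here. parseCoordinates and DECIMAL are names I've made up for illustration. One deliberate difference from the code above: this version tests the raw query strings before calling parseFloat, because parseFloat("55abc") quietly returns 55 and would slip past a check done afterwards.

```javascript
// Hypothetical helper: pull latitude/longitude out of a parsed query
// object (such as req.query). Returns null when either value is missing
// or is not a plain decimal number.
var DECIMAL = /^-?\d+(\.\d+)?$/;

function parseCoordinates(query) {
  var lat = query.latitude;
  var lng = query.longitude;
  if (typeof lat !== 'string' || typeof lng !== 'string') {
    return null;
  }
  if (!DECIMAL.test(lat) || !DECIMAL.test(lng)) {
    return null;
  }
  return { latitude: parseFloat(lat), longitude: parseFloat(lng) };
}

// Valid input gives back numbers; anything else gives back null
console.log(parseCoordinates({ latitude: '55', longitude: '13.5' }));
console.log(parseCoordinates({ latitude: '55abc', longitude: '13' }));
```

In the route handler, a null result would map onto the 'malformed lat/lng' response.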

If we do have valid coordinates, pass through to Mongo to do the actual search:

collection.find({
  "geo.coordinates": {
    $near: [latitude, longitude]
  }
}, function(err, cursor) {
  cursor.toArray(function(err, items) {
    writeResponse(items, res);
  });
});

To add a text search into this, we just need to add one more parameter to the collection.find call:

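var regexQuery = new RegExp(".*" + q + ".*");
collection.find({
  "geo.coordinates": {
    $near: [latitude, longitude]
  },
  'text': regexQuery
}, function(err, cursor) {
  // Same callback as the plain proximity search
  cursor.toArray(function(err, items) {
    writeResponse(items, res);
  });
});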
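
One thing to watch: q comes straight from the URL, so any regex characters in it are treated as regex syntax. A search term like c++ can even make the RegExp constructor throw. A minimal sketch of one way around that is to escape the term first. The helper names escapeRegExp and buildTextFilter are my own and aren't in api.js.

```javascript
// Escape characters that have special meaning in a regular expression
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the 'text' filter from raw user input.
// An unanchored regex already matches anywhere in the string, so the
// leading and trailing ".*" are left off here.
function buildTextFilter(q) {
  return new RegExp(escapeRegExp(q));
}

// The result would be passed as the 'text' value in collection.find
var filter = buildTextFilter('c++');
console.log(filter.test('learning c++ today'));
// prints: true
```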

MongoDB makes this so simple that it kind of feels like cheating. Somebody else did all the hard work first.

App.net Proximity

This works quite well on the App.net Global Timeline but it'll really become useful once the streaming API is switched on.

Of course, the code is all there. If you want to have a go yourself, feel free.