An API Stack

An API Server

An API server for a mobile app or a web application should provide an HTTP interface for JSON data with at least the following capabilities:

  • Storage of objects with nested structure - NoSQL
  • Fetching an object by id
  • Full-text search of objects
  • Geolocation queries
  • Filtering on time fields

Such a server is especially important in the common case of the same API serving both a mobile app and a web application (site).

We propose a full stack for an API server which is dead simple to construct.

Building the server in Node.js (server-side JavaScript) is easy: JavaScript objects map directly to JSON, so the server fits naturally with the JSON data. From the JSON point of view, natural choices for the database and the search server, respectively, are:

  • MongoDB
  • Elastic Search Server

Both systems store JSON objects whose fields (called keys) can themselves be JSON objects or JSON arrays, nested to any depth. Moreover, queries are written as JSON objects. Hence, integrating the whole stack is simple.

The API Stack

So the full stack for an API server is:

  • Server - Node.js HTTP server - logic
  • Search - Elastic Search Server - full text search, geolocation, time queries
  • Database - MongoDB - full objects, NoSQL

API Logic

The API execution flow is:

  • The server searches and filters objects using Elastic Search Server, which returns a subset of each object's fields.
  • Once this array of partial objects is returned, a particular full object is fetched from MongoDB by its id.

Let's detail the layers of the stack, describing for each layer the underlying technology and then its source code. In keeping with the asynchronous character of Node.js, the layers operate with asynchronous callbacks that receive objects.
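A minimal sketch of the search-then-fetch flow, with in-memory stand-ins in place of the two servers. The names `searchIndex`, `fetchById`, and `searchThenFetch` and the sample data are hypothetical, and `setImmediate` only imitates the asynchronous callbacks of the real clients. Unlike the layer code in this article, these callbacks follow the Node.js `(err, result)` convention.

```javascript
// Hypothetical stand-in for the search index: only a few fields per object.
const searchIndexData = [
  { id: "1", symbol: "IWM", title: "Diversified? Think Again" },
  { id: "2", symbol: "IBM", title: "Portfolio notes" },
];

// Hypothetical stand-in for the database: full objects keyed by id.
const database = {
  "1": { id: "1", symbol: "IWM", title: "Diversified? Think Again",
         author: "Chad Karnes", body: "Diversification ain't what it used to be..." },
  "2": { id: "2", symbol: "IBM", title: "Portfolio notes",
         author: "John Doe", body: "..." },
};

// Step 1: search, yielding partial objects (asynchronously, like a real client).
function searchIndex(term, callback) {
  const needle = term.toLowerCase();
  const hits = searchIndexData.filter((o) => o.title.toLowerCase().includes(needle));
  setImmediate(() => callback(null, hits));
}

// Step 2: fetch one full object by its id (null if the id is unknown).
function fetchById(id, callback) {
  setImmediate(() => callback(null, database[id] || null));
}

// Search first, then fetch the full object of the first hit (null if no hit).
function searchThenFetch(term, callback) {
  searchIndex(term, (err, hits) => {
    if (err) return callback(err);
    if (hits.length === 0) return callback(null, null);
    fetchById(hits[0].id, callback);
  });
}

searchThenFetch("think again", (err, fullObject) => {
  if (err) throw err;
  console.log(fullObject.author); // Chad Karnes
});
```

In the real stack the client would usually list the partial hits and request the full object only for the one the user picks; the sketch simply takes the first hit.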

MongoDB

MongoDB is a NoSQL database, storing objects (called documents) that have a non-fixed set of fields (called keys). The value of a field can be a deeply nested object. Objects are essentially JSON objects (stored internally in the binary BSON format).

To insert an object:

 db.portfolios.insert({ "name" : "John Doe",
     "portfolio" : [ { "symbol" : "IBM", "quantity" : 200 },
                     { "symbol" : "FB", "quantity" : 120 } ] })

To query for portfolios containing Apple:

 db.portfolios.find({ "portfolio.symbol" : "AAPL"  });
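The query works because MongoDB applies a dotted path such as `portfolio.symbol` to every element of the `portfolio` array and matches the document if any element matches. A simplified emulation of that equality matching, for illustration only: the helpers `matchesPath` and `findMatching` are hypothetical, and numeric array indexes in paths and query operators such as `$gt` are not handled.

```javascript
// Simplified emulation of MongoDB equality matching on a dotted path.
// When the walk reaches an array, the rest of the path is applied to each
// element, and the value matches if any element matches.
function matchesPath(value, segments, target) {
  if (Array.isArray(value)) {
    return value.some((element) => matchesPath(element, segments, target));
  }
  if (segments.length === 0) {
    return value === target;
  }
  if (value === null || typeof value !== "object") {
    return false; // the path continues but there is nothing to descend into
  }
  const [head, ...rest] = segments;
  return matchesPath(value[head], rest, target);
}

// Return the documents that satisfy every "path: value" pair of the query.
function findMatching(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([path, target]) =>
      matchesPath(doc, path.split("."), target)));
}

const portfolios = [
  { name: "John Doe",
    portfolio: [{ symbol: "IBM", quantity: 200 }, { symbol: "FB", quantity: 120 }] },
];
console.log(findMatching(portfolios, { "portfolio.symbol": "AAPL" }).length); // 0
console.log(findMatching(portfolios, { "portfolio.symbol": "IBM" }).length);  // 1
```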

MongoDB can be distributed over multiple servers through sharding.

Database Layer

Constructed using the Native MongoDB Node.js driver.

Operations (short list):

  1. connectToDatabase
  2. findById
  3. query - return array of objects
  4. streamQuery - call a function for each object and a function at the end
  5. overwriteDocument

Programmatically, we define a class:

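/**
 * @file mongodb_interface.js interface to MongoDB
 */

module.exports.mongoDatabase = mongoDatabase;

// include the native MongoDB driver
var mongo = require('mongodb');

function mongoDatabase()
{
    this.db = null;
    this.BSON = null;

    // connect to a database on the local MongoDB server (default port 27017)
    this.connectToDatabase = function(databaseName)
    {
        var Server = mongo.Server;
        var Db = mongo.Db;
        this.BSON = mongo.BSONPure;

        var server = new Server('localhost', 27017, {auto_reconnect: true});
        // w: 0 means unacknowledged (fire-and-forget) writes
        this.db = new Db(databaseName, server, { w : 0 });

        // open the connection before any collection is used
        this.db.open(function(err) {
            if (err) {
                console.error('cannot connect to MongoDB:', err);
            }
        });
    };

    // stream results object by object: iteration(item) is called for each
    // object, endIteration() once the stream is done
    this.streamQuery = function(collectionName, queryObj, fieldsObj, optionsObj, iteration, endIteration)
    {
        this.db.collection(collectionName, function(err, collection) {
            if (err) {
                console.error('cannot open collection ' + collectionName + ':', err);
                return endIteration();
            }

            var stream = collection.find(queryObj, fieldsObj, optionsObj).stream();

            // for each data item
            stream.on("data", function(item) {
                iteration(item);
            });

            // report stream errors instead of crashing on an unhandled 'error' event
            stream.on("error", function(error) {
                console.error('stream error:', error);
            });

            // when the stream is done
            stream.on("close", function() {
                endIteration();
            });
        });
    };
}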
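The callback contract of streamQuery (one call per object, then one call at the end) can be tried without a database. In this sketch, Node's `stream.Readable.from` (Node.js 12 or later) stands in for the driver's cursor stream; the helper `streamItems` and the sample items are hypothetical.

```javascript
const { Readable } = require("stream");

// Hypothetical helper with the same callback contract as streamQuery:
// iteration(item) is called once per object, endIteration() once at the end.
function streamItems(items, iteration, endIteration) {
  const stream = Readable.from(items); // object mode: each array element is one item
  stream.on("data", iteration);
  stream.on("end", endIteration);
}

// Example: total the quantities without collecting the objects in an array.
let total = 0;
streamItems(
  [{ symbol: "IBM", quantity: 200 }, { symbol: "FB", quantity: 120 }],
  (item) => { total += item.quantity; },
  () => { console.log("total quantity:", total); } // total quantity: 320
);
```

Streaming keeps memory use flat for large result sets, which is why it complements query, which returns the whole array at once.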

Elastic Search Server

Elastic Search Server is a search server for sophisticated full-text search (including stemming with a language analyzer, e.g. "run" matches "running"), geospatial queries, and time range queries. It is based on the well-known Apache Lucene text search engine.

Elastic Search Server works with an HTTP API, stores JSON objects, and uses JSON objects as queries.

To index (insert) an analyst report, of type analyst_reports, into the stocks index (the counterpart of a database):

 curl -XPUT 'http://localhost:9200/stocks/analyst_reports/1' -d '{

  "symbol" : "IWM",
  "author" : "Chad Karnes",
  "channel" : "Yahoo! Finance",
  "subChannel" : "etfguide",
  "title" : "Diversified? Think Again",
  "body" : "Diversification ain't what it used to be. Since 2007 there has been dramatic changes in worldwide markets ...",
  "date" : "2009-11-15T14:12:12"

 }'
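The same indexing request can be issued from Node.js with only the built-in `http` module. A minimal sketch: the helper `indexDocument` and its parameters are hypothetical, and the `Content-Type` header is included because recent versions of Elastic Search require it.

```javascript
const http = require("http");

// Hypothetical helper: PUT a JSON document to /index/type/id over HTTP,
// the Node.js counterpart of the curl command above.
function indexDocument(options, index, type, id, doc, callback) {
  const body = JSON.stringify(doc);
  const request = http.request({
    host: options.host,
    port: options.port,
    method: "PUT",
    path: "/" + encodeURIComponent(index) + "/" + encodeURIComponent(type) +
          "/" + encodeURIComponent(id),
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body),
    },
  }, (response) => {
    let data = "";
    response.setEncoding("utf8");
    response.on("data", (chunk) => { data += chunk; });
    response.on("end", () => {
      let parsed;
      try {
        parsed = JSON.parse(data);
      } catch (err) {
        return callback(err);
      }
      callback(null, response.statusCode, parsed);
    });
  });
  request.on("error", callback); // e.g. connection refused
  request.end(body);
}
```

For example, `indexDocument({ host: "localhost", port: 9200 }, "stocks", "analyst_reports", "1", report, callback)` reproduces the curl command, where `report` is the JSON object shown above.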

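To search for analyst reports referring to real estate in China, we write the following query (the value of the "query" key of a search request), which requires all three terms to appear in the body field:

 {
  "match" : {
      "body" : {
          "query" : "China real estate",
          "operator" : "and"
      }
  }
 }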

Elastic Search Server can be distributed over multiple servers using sharding.

Search Layer

Constructed with the Elastic Search Node.js client.

Operations (short list):

  1. connectToServer
  2. deleteById
  3. deleteByQuery - delete all objects matching a query
  4. search - by phrase, time range, geo location area (center + radius)
  5. index - store an object, overwriting it if it already exists
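One way to combine the three kinds of search in a single request body is sketched below. Assumptions: the `body` and `date` fields come from the indexed example, while the geo_point field `location` and the parameter names are hypothetical; the body uses the bool/filter syntax of Elastic Search 2.x and later (1.x releases used a filtered query instead).

```javascript
// Hypothetical builder for a search request body: phrase, time range, and a
// geo location area given by a center and a radius.
function buildSearchBody(params) {
  const must = [];
  const filter = [];

  if (params.phrase) {
    // full-text match on the body field, requiring all terms
    must.push({ match: { body: { query: params.phrase, operator: "and" } } });
  }
  if (params.dateFrom || params.dateTo) {
    // time range on the date field; either bound may be omitted
    const range = {};
    if (params.dateFrom) range.gte = params.dateFrom;
    if (params.dateTo) range.lte = params.dateTo;
    filter.push({ range: { date: range } });
  }
  if (params.center && params.radius) {
    // circle around the center, radius given as e.g. "50km"
    filter.push({
      geo_distance: {
        distance: params.radius,
        location: { lat: params.center.lat, lon: params.center.lon },
      },
    });
  }

  return {
    from: params.from || 0,   // offset of the first hit
    size: params.size || 10,  // number of hits in the batch
    query: { bool: { must: must, filter: filter } },
  };
}
```

Filters do not affect relevance scoring, so only the phrase contributes to the ranking of hits.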

Programmatically, we define a class:

/**
 * @file elasticsearch_interface.js  Interface to Elastic Search Server
 */

module.exports.elasticSearchServer = elasticSearchServer;

// include the Elastic Search client
var ElasticSearchClient = require('elasticsearchclient');

function elasticSearchServer()
{
    this.elasticSearchClient = null;

    this.connectToServer = function()
    {
        var serverOptions = {
            host: 'localhost',
            port: 9200,   // default HTTP port of Elastic Search Server
            secure: false
        };
        this.elasticSearchClient = new ElasticSearchClient(serverOptions);
    };

    this.search = function(params, produceResult)
    {
        var qryObj =
        {
            // batch: offset of the first hit and number of hits
            "from" : params["from"],
            "size" : params["size"],

            // project fields
            "fields" : params["fields"],

            // set search term, geo location, time range
            "query": {
                // .......
            }
        };

        var result = null;

        var mySearchCall = this.elasticSearchClient.search(qryObj);

        mySearchCall.on('data', function(data) {
            // keep the response
            result = data;
        });

        mySearchCall.on('done', function() {
            // callback
            produceResult(result);
        });

        mySearchCall.on('error', function(error) {
            // report the error instead of silently swallowing it
            console.error('search failed:', error);
        });

        mySearchCall.exec();
    };
}

HTTP Server

Web servers like Apache come out of the box with all the bells and whistles, which makes them heavyweight. By contrast, the Node.js http module constructs a server in fewer than 10 lines of code.

Such a server can handle thousands of short-lived connections of the sort demanded by an API server, an advantage over Apache measured in many experiments comparing the two.

Server Layer

An HTTP server in Node.js is just a few lines of code,

var http = require('http');

var server = http.createServer(function(request, response) {
    var data = "";

    // accumulate the data of the request coming in chunks
    request.on('data', function(chunk) {
        data += chunk;
    });

    // when all chunks of the request are received
    request.on('end', function () {

        // we will see below how to generate the result of the API call
        var result = .... ;

        // write the response headers
        response.writeHead(result.status, result.headers);

        // write the response body
        response.end(result.body);
    });
});

// listen on port 4000, can be 80 or whatever
server.listen(4000);

And voila, we have a server.

To route the API calls declaratively, we use the Node.js Journey module.

var journey = require('journey');

// include and connect to the storage and search interfaces that we need
var mongodbInterface = require('./mongodb_interface');
var elasticSearchInterface = require('./elasticsearch_interface');

var db = new mongodbInterface.mongoDatabase();
db.connectToDatabase("workspaces");
var mySearchServer = new elasticSearchInterface.elasticSearchServer();
mySearchServer.connectToServer();

// create the router
var router = createRouter();


/**
 * create a Journey router
 */
function createRouter()
{
    var router = new(journey.Router);

    // create the routing table, handling each API call
    router.map(function () {

        // search by parameters in Elastic Search Server
        this.post('/search').bind(function (req, res, data) {

            // buildElasticSearchParams (not shown) maps the request data to search parameters
            var params = buildElasticSearchParams(data);

            mySearchServer.search(params, function(result) {
                res.send(200, {"Content-Type": "application/json"}, {"code": 0, "result": result});
            });
        });

        // get full item from MongoDB
        this.post('/item').bind(function (req, res, data) {

            db.findById(data["id"], function(result) {
                res.send(200, {"Content-Type": "application/json"}, {"code": 0, "result": result});
            });
        });
    });

    return router;
}

So the HTTP server handles the end of the request with:

request.on('end', function () {

      // pass the request to the router which will branch by the API call    
      router.handle(request, data, function (result) {

        response.writeHead(result.status, result.headers);
        response.end(result.body);

      });

});

For brevity here, we omit other calls such as inserts and updates.
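The idea behind declarative routing, a table that maps each API call to its handler, can be illustrated without Journey. This is not Journey's API: `createRoutingTable` and the stub handlers are hypothetical, and the error body `{ code: 1, ... }` is an assumption. The callback receives an object with `status`, `headers`, and `body`, the same shape the HTTP server passes to `response.writeHead` and `response.end`.

```javascript
// Hypothetical minimal routing table keyed by "METHOD path".
function createRoutingTable(routes) {
  return function handle(method, path, data, callback) {
    const handler = routes[method + " " + path];
    if (!handler) {
      // unknown API call
      return callback({
        status: 404,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ code: 1, error: "unknown API call" }),
      });
    }
    // run the handler and wrap its result in the { code, result } envelope
    handler(data, (result) => {
      callback({
        status: 200,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ code: 0, result: result }),
      });
    });
  };
}

// Example table with stub handlers standing in for the search and database layers.
const handle = createRoutingTable({
  "POST /search": (data, done) => done([{ id: "1" }]),
  "POST /item": (data, done) => done({ id: data.id, name: "John Doe" }),
});
```

Adding an API call means adding one entry to the table, which is the property that makes routing declarative.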

The Power to Forget

An API server often stores transient data such as user geolocation that is continuously updated. If no location data for a user is received within a certain period, the geolocation data for the user should be deleted.

The stack supports automatic expiration of transient data in both its database and its search server.

A MongoDB TTL index expires the documents of a user location collection (table) half an hour after their lastUpdate time (the indexed field must hold a date):

 db.userLocation.ensureIndex( { "lastUpdate": 1 }, { expireAfterSeconds: 1800 } )

Elastic Search Server offers a similar functionality with the _ttl field.
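The expiration rule can be emulated in memory to see its behaviour: an entry disappears once its last update is older than the time-to-live, and each update refreshes it. A minimal sketch; the class `ExpiringStore` and its methods are hypothetical, and the clock is injected so the behaviour can be tested deterministically.

```javascript
// Hypothetical in-memory emulation of a TTL index on lastUpdate.
class ExpiringStore {
  constructor(expireAfterSeconds, now = () => Date.now()) {
    this.expireAfterMs = expireAfterSeconds * 1000;
    this.now = now;            // injectable clock, in milliseconds
    this.entries = new Map();  // userId -> { location, lastUpdate }
  }

  // insert or overwrite a user's location, refreshing its lastUpdate time
  update(userId, location) {
    this.entries.set(userId, { location: location, lastUpdate: this.now() });
  }

  // current location of a user, or undefined if absent or already swept
  get(userId) {
    const entry = this.entries.get(userId);
    return entry ? entry.location : undefined;
  }

  // remove every entry whose lastUpdate is older than the time-to-live
  sweep() {
    const cutoff = this.now() - this.expireAfterMs;
    for (const [userId, entry] of this.entries) {
      if (entry.lastUpdate < cutoff) this.entries.delete(userId);
    }
  }
}
```

In a server, sweep() would run on a timer. MongoDB's own TTL background task runs about once a minute, so expired documents may linger briefly after their deadline.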

Source Code

My plan is to release the full stack source code on GitHub sometime in the future.