Web Summit Videos

In November, I attended this year’s Web Summit in Lisbon. It is one of, if not the biggest tech conference on the planet with speakers this year including Edward Snowden, Katherine Maher (CEO of Wikipedia) and Marcelo Rebelo de Sousa (President of Portugal).

After coming home, I wanted to check the video recordings of some of the sessions that would be posted on the Web Summit Vimeo Account. It turns out that it is not really easy to find a specific video there.

So I built a little site that lists all the Websummit Videos together with the information about the corresponding sessions.

Screen Shot

The result can (for now) be found on Github Pages: Web Summit Videos. The sessions can be filtered by the Stage they took place on, the date and whether or not the video has been found yet. I periodically update the page when more Videos become available.

Behind the scenes, I am crawling the Web Summit API to get a list of all the Sessions and then call the Vimeo API to get all the Videos uploaded inside the Web Summit account. Then, I have implemented some matching logic based on the names of the videos, the presenters and some manual overrides.
Then I generate some static HTML pages that can be published on Github Pages, so I do not have to run any backend server anywhere. All filtering happens exclusively in the Frontend with JavaScript.

The code is available on Github: https://github.com/janhapke/websummit-videos – although the quality of the code can best be described as “throw-away weekend-project”.

How to print all SQL Queries made by waterline (Sails.js ORM)

The ORM used by sails.js is called waterline. I have been playing around with it for a couple of days now and noticed some subtle bugs, especially when associations are involved.

Fortunately, you can enable logging of all queries to the (server) console by setting the environment variable LOG_QUERIES to true when lifting sails:

$ LOG_QUERIES=true sails lift

This makes it much easier to figure out if strange behavior is caused by a bug in your own code or in waterline itself.

Bridging the gap between Sails.js and ng-admin

Sails.js is an MVC Framework for node.js with many features directly out of the box that make prototyping web applications a lot faster and easier. One of these features is called Blueprints and automatically generates a REST API based on your Model definition.

ng-admin is an Angular.js Application that can be connected to any REST API and, with some configuration, presents a fully-featured CRUD interface for all entities accessible through the API.

Bringing the two together allows us to add an admin interface to an API with minimal programming effort. However, the two don’t work together perfectly out of the box. Most features already work, but to get everything up and running properly, especially list views with pagination, some minor changes must be made in both projects.

Sorting and Limiting for List Views

ng-admin adds some parameters to its GET Requests that are supposed to tell Sails.js how many objects to return and how many to skip. However, Sails.js does not understand these parameters and therefore list views do not work initially. But, as described in the chapter “Customizing the API Mapping” in the ng-admin documentation, it is possible to change the way it builds up its request.

This can be done in the ng-admin.js file, where we set up your ng-admin application anyway. For Sails.js, the following code needs to be added after the call to admin.config:

admin.config(['RestangularProvider', function(RestangularProvider) {
    RestangularProvider.addFullRequestInterceptor(function(element, operation, what, url, headers, params, httpConfig) {
        if (operation == 'getList') {
            params.skip = (params._page - 1) * params._perPage;
            params.limit = params._perPage;
            params.sort = params._sortField + ' ' + params._sortDir;
            delete params._sortField;
            delete params._sortDir;
            delete params._page;
            delete params._perPage;
        }
        return { params: params };
    });
}]);

How this modifies the parameters is pretty obvious, but how when or why this is called not so much. It seems to be working with Angular.js dependency injection, but so far I haven’t worked enough with Angular.js to be sure.

Pagination in List Views

So far, ng-admin will only display a single page of results inside every list view. That happens because it has no idea how many items actually exist behind the API. It wants to read this information from a special HTTP-Header called X-Total-Count. Sails.js does not set this header, so pagination does not work.

Now, we could go and add logic to every Controller to count the total number of objects in our database and set the X-Total-Count Header. While it would work, it would be kind of dumb, because we would have to copy and maintain the same code in each and every controller that is part of our API.

Fortunately, Sails.js allows us to override parts of blueprints with custom logic. We need to override the find action with a version that counts the number of entries and sends it to ng-admin inside the X-Total-Count Header. To do so, we create a new directory inside our api directory, called blueprints. Inside this directory, we create a file called find.js with the following content:

'use strict';

/**
 * Adds the X-Total-Count Header to REST calls to support pagination in ng-admin
 */

/**
 * Module dependencies
 */
var actionUtil = require('sails/lib/hooks/blueprints/actionUtil');
var _ = require('lodash');
var async = require('sails/node_modules/async');

/**
 * Find Records
 *
 *  get   /:modelIdentity
 *   *    /:modelIdentity/find
 *
 * An API call to find and return model instances from the data adapter
 * using the specified criteria.  If an id was specified, just the instance
 * with that unique id will be returned.
 *
 * Optional:
 * @param {Object} where       - the find criteria (passed directly to the ORM)
 * @param {Integer} limit      - the maximum number of records to send back (useful for pagination)
 * @param {Integer} skip       - the number of records to skip (useful for pagination)
 * @param {String} sort        - the order of returned records, e.g. `name ASC` or `age DESC`
 * @param {String} callback - default jsonp callback param (i.e. the name of the js function returned)
 */

module.exports = function findRecords(req, res) {

    // Look up the model
    var Model = actionUtil.parseModel(req);

    // Lookup for records that match the specified criteria
    var queryData = Model.find()
        .where( actionUtil.parseCriteria(req) )
        .limit( actionUtil.parseLimit(req) )
        .skip( actionUtil.parseSkip(req) )
        .sort( actionUtil.parseSort(req) );
    queryData = actionUtil.populateRequest(queryData, req);

    var queryCount = Model.count().where(actionUtil.parseCriteria(req));

    // Expose header to the client
    res.set('Access-Control-Expose-Headers', 'X-Total-Count');

    async.parallel(
        { data: getData, count: getTotalCount },
        function (err, results) {
            res.set('X-Total-Count', results.count);
            res.ok(results.data);
        }
    );

    function getTotalCount(cb) {
        queryCount.exec(function (err, count) {
            cb(null, count);
        });
    }

    function getData(cb) {
        queryData.exec(function found(err, matchingRecords) {
            if (err) return res.serverError(err);
            // Only `.watch()` for new instances of the model if
            // `autoWatch` is enabled.
            if (req._sails.hooks.pubsub && req.isSocket) {
                Model.subscribe(req, matchingRecords);
                if (req.options.autoWatch) { Model.watch(req); }
                // Also subscribe to instances of all associated models
                _.each(matchingRecords, function (record) {
                    actionUtil.subscribeDeep(req, record);
                });
            }

            cb(null, matchingRecords);
        });
    }
};

This will now get called whenever ng-admin requests data for a list view and sends the proper header along with the data. The code is heavily based on the original implementation of find.js and some code I found in a Gist on Github. I only updated it a bit to work with a newer version of Sails.js (0.12 in this case).

Because the code above requires lodash and we cannot rely on lodash automatically shipping with Sails.js, we need to install it via npm before everything works:

npm require lodash --save

With these 2 changes in place, our ng-admin application should now be able to nicely display list views for our automatically generated Sails.js API.

Optimizing Doctrine in long running Jobs

I was recently building a long running PHP (CLI) job that exported large amounts of data from a MySQL database using the Doctrine DBAL. One problem I noticed was that the more data it exported, the more memory it consumed. Which was odd because it only reads from the database, accumulates new objects, sends them off to another service and then discards them. Analyzing memory consumption is not that easy, especially with larger programs that shuffle a lot of data around.

A simple yet effective method is to output the current memory usage at different places in the code:

echo memory_get_usage(true) . PHP_EOL;

This confirmed that indeed the memory was increasing over the runtime of the job. I suspected Doctrine was the culprit because the Entity Manager keeps all Entities it fetches from the Database in memory in case they are needed later. This is called the Unit of Work. I added some more debug output that periodically prints the class names of the entities Doctrine has under management and how many of each:

// Get Doctrine from the Symfony Dependency Injection Container
$doctrine = $this->getContainer()->get('doctrine.orm.entity_manager');

// Check what's actually inside the unit of work
$unitOfWork = $doctrine->getUnitOfWork();
echo 'Total number of entities: ' . $unitOfWork->size() . PHP_EOL;
foreach ($unitOfWork->getIdentityMap() as $entity => $map) {
    echo $entity . ' : ' . count($map) . PHP_EOL;
}

Some Entities indeed had close to a million instances in Doctrine. I knew at which point I wouldn’t need some of them any more, so I decided to remove all Entities of specific types from Doctrines unit of work:

$doctrine->clear('Path\To\Namespace\Entity');

This did decrease the memory consumption but not nearly as much as I had hoped or expected. I remembered that it is possible to get all SQL Queries that Doctrine has ever executed somehow. With lots of big entities and elaborate relationships between them, this could quickly amount to several millions of very big SQL queries, each stored as a string. It is of course possible to disable the SQL Logger:

$doctrine->getConnection()->getConfiguration()->setSQLLogger(null);

It turned out that this solved my memory problem. The SQL Logger seemed to indeed consume lots of RAM. An additional nice side-effect of this was that the entire job now ran about 2x faster than before. So disabling SQL Logging in Doctrine seems to generally be a good idea. You can still leave it enabled when the software runs in debug mode or with a verbose flag, if you need it for debugging purposes.