Optimizing Doctrine in long running Jobs

I was recently building a long running PHP (CLI) job that exported large amounts of data from a MySQL database using the Doctrine DBAL. One problem I noticed was that the more data it exported, the more memory it consumed. Which was odd because it only reads from the database, accumulates new objects, sends them off to another service and then discards them. Analyzing memory consumption is not that easy, especially with larger programs that shuffle a lot of data around.

A simple yet effective method is to output the current memory usage at different places in the code:

echo memory_get_usage(true) . PHP_EOL;

This confirmed that indeed the memory was increasing over the runtime of the job. I suspected Doctrine was the culprit because the Entity Manager keeps all Entities it fetches from the Database in memory in case they are needed later. This is called the Unit of Work. I added some more debug output that periodically prints the class names of the entities Doctrine has under management and how many of each:

// Get Doctrine from the Symfony Dependency Injection Container
$doctrine = $this->getContainer()->get('doctrine.orm.entity_manager');

// Check what's actually inside the unit of work
$unitOfWork = $doctrine->getUnitOfWork();
echo 'Total number of entities: ' . $unitOfWork->size() . PHP_EOL;
foreach ($unitOfWork->getIdentityMap() as $entity => $map) {
    echo $entity . ' : ' . count($map) . PHP_EOL;
}

Some Entities indeed had close to a million instances in Doctrine. I knew at which point I wouldn’t need some of them any more, so I decided to remove all Entities of specific types from Doctrines unit of work:

$doctrine->clear('Path\To\Namespace\Entity');

This did decrease the memory consumption but not nearly as much as I had hoped or expected. I remembered that it is possible to get all SQL Queries that Doctrine has ever executed somehow. With lots of big entities and elaborate relationships between them, this could quickly amount to several millions of very big SQL queries, each stored as a string. It is of course possible to disable the SQL Logger:

$doctrine->getConnection()->getConfiguration()->setSQLLogger(null);

It turned out that this solved my memory problem. The SQL Logger seemed to indeed consume lots of RAM. An additional nice side-effect of this was that the entire job now ran about 2x faster than before. So disabling SQL Logging in Doctrine seems to generally be a good idea. You can still leave it enabled when the software runs in debug mode or with a verbose flag, if you need it for debugging purposes.

Leave a Reply

Your email address will not be published.