For reasons we have not yet identified, we are experiencing a very high CPU load on our PostgreSQL DB server when processing workflows with Activiti. We are still investigating that problem, but along the way we discovered the following issue, which leads to a high number of aborted jobs during workflow processing:
The performOperation method of the org.activiti.engine.impl.interceptor.CommandContext class accesses the passed-in execution instance without checking it for null. That execution instance is loaded from the DB further down the stack trace, in the execute method of org.activiti.engine.impl.persistence.entity.JobEntity, via the findExecutionById method of org.activiti.engine.impl.persistence.entity.ExecutionEntityManager. If no matching entry is found in the DB, findExecutionById returns null. The result is a NullPointerException in the CommandContext class mentioned above, with no further context logged (only the NullPointerException itself).
Looking at the source code of the Activiti Engine, one can see that other locations following a similar process do check the execution instance for null, for example the org.activiti.engine.impl.cmd.SignalEventReceivedCmd class, among others.
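To illustrate the kind of guard we have in mind, here is a minimal, self-contained sketch. It does not use the Activiti classes: Execution, ExecutionStore, MissingExecutionException and runJob are hypothetical stand-ins we made up for this report, and only the shape of the check (look up, test for null, fail with a descriptive message) is the point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Activiti classes.
class NullGuardSketch {

    /** Stand-in for an execution entity loaded from the DB. */
    static final class Execution {
        final String id;
        Execution(String id) { this.id = id; }
    }

    /** Stand-in for a lookup that returns null when no row is found. */
    static final class ExecutionStore {
        private final Map<String, Execution> rows = new HashMap<>();
        void save(Execution e) { rows.put(e.id, e); }
        Execution findExecutionById(String id) { return rows.get(id); }
    }

    /** Hypothetical exception type that carries a descriptive message. */
    static final class MissingExecutionException extends RuntimeException {
        MissingExecutionException(String message) { super(message); }
    }

    /** Looks up the execution and fails with context instead of a bare NPE. */
    static String runJob(ExecutionStore store, String jobId, String executionId) {
        Execution execution = store.findExecutionById(executionId);
        if (execution == null) {
            // Name both the job and the missing execution so the log is actionable.
            throw new MissingExecutionException(
                "Job " + jobId + " references execution " + executionId
                    + ", which was not found");
        }
        return "job " + jobId + " ran on execution " + execution.id;
    }

    public static void main(String[] args) {
        ExecutionStore store = new ExecutionStore();
        store.save(new Execution("e1"));
        System.out.println(runJob(store, "j1", "e1"));
        try {
            runJob(store, "j2", "does-not-exist");
        } catch (MissingExecutionException ex) {
            System.out.println("aborted: " + ex.getMessage());
        }
    }
}
```

With a guard like this, the job would still be aborted when the execution is missing, but the log would name the job and the execution id rather than showing only a NullPointerException.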
So far we only know that the DB lookup fails to find the expected execution instance and returns null whenever the CPU load of the DB server rises above a critical level (more than 100% CPU usage). Of course, this should not happen, but we also think that no entity loaded from the DB should be accessed without a null check.
Below, you can find a stack trace of the NullPointerException:
Activiti version 5.21.0 in an OSGi (Eclipse Equinox) context on CentOS 6.8 64-Bit.
PostgreSQL 9.4 DB server.