Friday 5 December 2008

Hardware Transactional Memory I

The first multiprocessor I worked on was Nortel's XA-Core platform. This exotic platform was a replacement for the 'Computing Module' (CM) of their DMS telecoms switching platform.

Background
Previous CM generations are built on a pair of CPUs (Motorola 88k, 68k, BNR NT40) running in lockstep through a comparator for fault tolerance. The software running on these includes a multitasking OS (SOS) and a huge amount of call processing, database, hardware support and other telecoms code written in the proprietary PROTEL language, starting around 1979. SOS supports write-protectable memory, but not per-process memory protection, so the memory map resembles that of a single, heavily multithreaded process. Shared data is used widely, and the code assumes a strictly ordered memory model. A single global lock is used heavily to enforce mutual exclusion between processes, to the extent that the bulk of the computation time is spent with a single process holding the global lock in 'jumbo' timeslices of tens of milliseconds. Much of the large code base is more than 10 years old and in a 'frozen' state where changes are not possible.

The problem
How can CM computation capacity be increased beyond the incremental improvements available from successive generations of CPUs, without a huge software rewriting and revalidation effort, and while maintaining CPU and memory fault tolerance?

The solution (XA-Core patent)
Create a fault-tolerant SMP platform with replicated hardware transactional memory. Modify the OS so that a process claiming the 'single global lock' implicitly sets the boundaries of a memory transaction. Handle inter-process memory access contention by rolling back one of the contenders. Handle CPU failure by rolling back in-progress memory transactions.
The achievable level of parallelism is then limited by the memory access patterns of the concurrently running processes at the cache-line level.
Code can still be written using the 'single CPU multitasking OS with big-global-lock' approach. Incremental improvements to available parallelism can be made by changing the data access patterns of the parallel processes. Tools exist to monitor contention between competing processes and map it to stack traces and/or data structures.
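To make the mechanism concrete, here is a minimal sketch of the idea as a toy software model. It is not the XA-Core hardware or SOS code: the class and method names, the line size, the undo-log approach and the policy of always rolling back the later contender are all assumptions made for illustration. Reads and writes both claim a whole simulated cache line, which is deliberately conservative.

```python
LINE_SIZE = 8  # words per simulated cache line (hypothetical value)


class Conflict(Exception):
    """Raised in a transaction that lost a contention and was rolled back."""


class ToyTransactionalMemory:
    """Toy model: line-granularity ownership plus an undo log per transaction."""

    def __init__(self, n_words):
        self.words = [0] * n_words
        self.owner = {}     # cache line index -> id of the transaction using it
        self.undo = {}      # transaction id -> list of (address, old value)
        self.rollbacks = 0  # crude contention counter

    def begin(self, tx):
        # Stands in for a process claiming the 'single global lock'.
        self.undo[tx] = []

    def commit(self, tx):
        # Stands in for the process releasing the global lock.
        self._release(tx)

    def rollback(self, tx):
        # Undo the writes in reverse order, then drop all line ownership.
        # The same path would serve for a failed CPU's in-progress work.
        for addr, old in reversed(self.undo[tx]):
            self.words[addr] = old
        self.rollbacks += 1
        self._release(tx)

    def read(self, tx, addr):
        self._claim(tx, addr)
        return self.words[addr]

    def write(self, tx, addr, value):
        self._claim(tx, addr)
        self.undo[tx].append((addr, self.words[addr]))
        self.words[addr] = value

    def _claim(self, tx, addr):
        line = addr // LINE_SIZE
        holder = self.owner.setdefault(line, tx)
        if holder != tx:
            # Assumed policy: the later contender is the one rolled back.
            self.rollback(tx)
            raise Conflict(f"{tx} lost line {line} to {holder}")

    def _release(self, tx):
        self.owner = {l: t for l, t in self.owner.items() if t != tx}
        del self.undo[tx]


def demo():
    mem = ToyTransactionalMemory(64)
    mem.begin("A")
    mem.begin("B")
    mem.write("A", 0, 11)   # A takes line 0
    mem.write("B", 40, 22)  # B takes line 5: different line, runs in parallel
    mem.write("B", 41, 23)
    try:
        mem.write("B", 3, 99)  # address 3 is also on line 0: B is rolled back
    except Conflict:
        pass
    mem.commit("A")
    mem.begin("B")          # B retries once A has finished
    mem.write("B", 3, 99)
    mem.commit("B")
    return mem
```

In this model, addresses 0 and 3 are different variables that happen to share a line, so B is rolled back even though the two processes never touch the same word. Moving such data onto separate lines is the kind of access-pattern change that raises the available parallelism without changing the locking code.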

The interesting details and issues
These include the actual hardware used, the transaction handling in the operating system, IO handling, the application modifications required, and so on.

In the spirit of actually completing some blog entries, I'll continue this post later.

To be continued...
