## Multi-threading – or How I Learned to Stop Worrying and Love my Cores

Dual- and quad-core PCs are now ubiquitous.  While they make your operating system a better multi-tasking environment, they have had a limited effect on the code that most technical professionals write, largely because of the perceived difficulty of parallel programming.  The evolution of high-level languages with built-in multi-threading support, however, throughout the 90s and beyond, removed the need to manage threads at a low level, allowing engineers to concentrate on which parts of an algorithm can run in parallel.  Given the ever-increasing complexity of the systems that have to be simulated, multi-threaded programming can offer significant time savings for the many problems that are easily parallelized (and for which the savings of parallelization outweigh its overhead).

Many industrially relevant problems are highly parallelizable, including:

*[Figure: Maple 13’s multi-threading model reduced the time taken to generate this quaternion fractal.]*

• Monte Carlo option pricing.  Financial engineers need to make split-second trading decisions, and stochastic option pricing is highly parallelizable because each path can be simulated independently of the others.
• 2D FFTs.  A 2D FFT can be computed as an independent sequence of in-place FFTs on the rows and then the columns of the 2D data.
• Video and image processing.  Many of the filters in Adobe Photoshop, for example, are multi-threaded.

Take a simple example – the Monte Carlo integration of a single-variable function, f(x).  A single-threaded routine is easy to prototype.

1. Generate n random points (X1, Y1) … (Xn, Yn) in the solution space
2. Test whether each point Yi lies between f(Xi) and the x-axis; if it does, increment (or, where f(Xi) is negative, decrement) the corresponding entry of a global vector, monteVector, by 1
3. Add all the elements of monteVector, divide by n, and multiply by the area of the solution space.

This is what a Maple procedure for part 2 could look like (i_low and i_high are 1 and n respectively in the single-threaded version – I’ve kept them as parameters to make this procedure easier to parallelize).
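The original worksheet code isn’t reproduced here, but a minimal sketch of such a procedure might look like the following (the parameter names other than i_low, i_high and monteVector – the function f and the point arrays X and Y – are my assumptions, not the worksheet’s code):

```
# Sketch only: marks each point (X[i], Y[i]) that falls between f(X[i]) and the x-axis
monteCarlo := proc( f, i_low, i_high, X, Y, monteVector )
    local i;
    for i from i_low to i_high do
        if Y[i] >= 0 and Y[i] <= f(X[i]) then
            monteVector[i] := 1;     # point under a positive part of the curve
        elif Y[i] < 0 and Y[i] >= f(X[i]) then
            monteVector[i] := -1;    # point above a negative part of the curve
        end if;
    end do;
    NULL;
end proc:
```

Step 3 is then just `add(monteVector) / n` times the area of the solution space.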

This procedure is computationally intensive but also highly parallelizable.  (In fact, it’s embarrassingly parallel; each point (Xi, Yi) in the solution space can be considered independently of all the others.)

Using the new Task Programming Model in Maple 13, this is what a recursive procedure that queues small chunks of the solution space on separate processors would look like:

Is the chunk of the solution space larger than 1000 points?  If yes, divide it in two and spawn two instances of monteTaskDistribute() with Continue(), each with its own bounds for the solution space; Maple distributes each instance to separate work queues (one for each processor).  If no, pass that chunk of data to monteCarlo() for processing.
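In outline, a sketch of that recursion using the Threads:-Task API could read as follows (the 1000-point cutoff matches the description above; the extra parameters f, X and Y are my assumptions):

```
monteTaskDistribute := proc( f, i_low, i_high, X, Y, monteVector )
    local mid;
    if i_high - i_low + 1 <= 1000 then
        # chunk is small enough: process it directly
        monteCarlo( f, i_low, i_high, X, Y, monteVector );
    else
        # split the range in two and queue each half as a separate task;
        # Maple's task scheduler spreads the tasks across the available cores
        mid := iquo( i_low + i_high, 2 );
        Threads:-Task:-Continue( null,
            Task = [ monteTaskDistribute, f, i_low, mid, X, Y, monteVector ],
            Task = [ monteTaskDistribute, f, mid + 1, i_high, X, Y, monteVector ] );
    end if;
end proc:
```

The whole computation would then be kicked off with something like `Threads:-Task:-Start( monteTaskDistribute, f, 1, n, X, Y, monteVector )`.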

The Task Programming Model automatically scales to the number of processors you have on your machine – you use the same code whether you’re using a dual- or quad-core computer.

The complete implementation (together with a timing calculation) is in the worksheet attached to this post.  I’ve found that the multi-threaded routine is about 30% faster on my dual-core computer than the single-threaded version.

As a bonus, I’ve also attached a worksheet that calculates the value of Pi via a Monte Carlo method – again, the multi-threaded version is about 30% faster on a dual-core computer.  (There’s also a multi-threaded Mandelbrot Set Explorer on the Application Centre.)
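The idea behind the Pi worksheet is the classic one: scatter random points in the unit square and count the fraction that lands inside the quarter circle x² + y² ≤ 1, which approximates Pi/4.  A single-threaded sketch (not the attached worksheet’s code) might look like this:

```
estimatePi := proc( n )
    local i, x, y, count, r;
    r := rand( 0.0 .. 1.0 );     # uniform random number generator on [0, 1]
    count := 0;
    for i from 1 to n do
        x := r();  y := r();
        if x^2 + y^2 <= 1 then
            count := count + 1;  # point falls inside the quarter circle
        end if;
    end do;
    evalf( 4 * count / n );      # fraction inside, times 4, approximates Pi
end proc:
```

The same divide-and-conquer task structure shown above parallelizes this directly, since each point is tested independently.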

This is just the beginning of the parallelization of Maple.  Some of the new polynomial algorithms we’re developing are superlinear in their speedup – an operation on 4 cores is more than 4 times faster than the same operation on a single core. Even with our current architecture, however, Maple still delivers an environment that allows you to write efficient, scalable, multi-threaded code in an easy-to-use, high-level programming language.
