Lab 5: Parallel Matrix Multiplication
Submission deadline: 2359, Sunday, April 15, 2018.
(The deadline is preextended to Sunday :))
Setup
The skeleton code from Lab 5 is available on cs2030i
under the directory ~cs2030/lab5
. There are three files:
MatrixMultiplication.java
is the file that you need to edit to parallelize the matrix multiplication inside;LabFive.java
is the main file;Matrix.java
is the matrix class
To run LabFive
, you should:
1  java LabFive <n> 
where 2^n is the dimension of the matrices. For instance, to run with matrices of size 32x32,
1  java LabFive 5 
Background: Matrix Multiplication
Matrix multiplication is a fundamental operation on matrices with many applications in physics, engineering, mathematics, and computer science.
Given a matrix A of n \times m (n rows, m columns), and a matrix B of m \times p, the matrix produce C = AB is an n \times p matrix, where elements c_{ij} in C is given by: c_{ij} = \sum_{k=1}^m a_{ik}b_{kj}._
In this lab, we are interested in parallelizing the following divideandconquer algorithm for matrix multiplication:
Let A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}
where A_{11}, A_{12} etc. are block partitioned matrices of equal sizes.
If C = AB, then:
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}
Task
Please implement the above divideandconquer algorithm as a RecursiveTask
and submit it to ForkJoinPool
for execution. For simplicity, we only need to handle square matrices of size 2^n for n up to 11.
A skeleton file MatrixMultiplication.java
has been given to you. The class MatrixMultiplication
is mostly empty, except that it inherits from RecursiveTask<Matrix>
. You should fill in the class with necessary fields and method.
The file Matrix.java
is also given to you. It implements a matrix with double values,
and stores the values of the matrix in a 2D double
array m
. It also stores the dimension of the matrix in a field dimension
. This file includes two methods to multiply two matrices, one sequentially with triple for loops, and another (also sequentially) with the recursive divideandconquer algorithms. There is a method to compare if two matrices are equals.
A method multiplyParallely
exists in the class Matrix
but at the moment it calls the nonparallel version of multiplyRecursively
. You should change this to appropriately invoke the parallel version of MatrixMultiplication
.
You are still required to
 follows the CS2030 Coding Style
 clearly documented with
javadoc
Running on sunfire
You need to get the code and submit your solution on cs2030i
per usual. The VM cs2030i
, however, has only one processor and therefore it is not much fun to run parallel programs on it.
Fortunately, the host sunfire
, has 256 processors^{1}. We will run your submitted solution on sunfire
, and you can test your code on sunfire
as well. You can ssh
into sunfire
just like cs2030i
.
On sunfire
, you should be able to get a speed up of at least 70 (i.e., parallel version is 70x faster than the sequential version) when running with n=11.
Note that sunfire
still uses Java 8. You won't need features from Java 9 or 10 in this lab in any case.
Tips:
 Try with small matrices first. Make sure the code is correct before you go for the larger matrices.
 Try running locally on your own machine first to make sure the code is correct and does not throw exceptions. Optimize for speed up on your machine, before moving the code to sunfire to measure the final speed up.
 The method in
LabFive
that measures the time runs a given multiplication three times before taking the average. It can take around 100120 seconds to multiply sequentially. When you debug for correctness and optimize for speed up for the parallel version, there is no need to multiply sequentially three times. Feel free to change this part of the code while you are debugging/optimizing your code.  For matrices of dimension 2^{10} and 2^{11}, you need to run
java
with the argumentXmx<size>
to increase the heap memory tosize
onsunfire
.Xmx1g
increases the heap memory to up to 1GB, and should work well for both cases.  If you grow impatient while waiting and want to stop the running process, type
ControlC
in your ssh window. You may have to wait up to a few seconds (depending on how slow issunfire
) for the process to stop.
Challenges:
Challenges for this lab include:
 Finding the right fork threshold so that you gain the maximum speed up for a matrix of size 2^{11} \times 2^{11};
 Not spawning too many tasks that block, which will in turns lead to too many compensation threads being created in
ForkJoinPool
, and aRejectedExecutionException
being thrown:Caused by: java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker
. The order offork
andjoin
becomes important here.  Not creating too many unnecessary copies of the matrices (you will ran out of heap space even with
Xmx1g
 Not letting multiple tasks update the same matrix in place. Such side effects may lead to a incorrect result.
 Waiting patiently for the multiplication to complete :)
Grading
This lab contributes another 6 marks to your final grade.
We will still grade your code based on your implementation (not the resulting speed up), but as a guideline, using matrix of size 2^{11} as the benchmark on sunfire
with 1GB of heap space, you get:
 2 mark if you parallelize the code but achieve a slow down in performance.
 3 marks if your speed up is aronud 1  30x
 4 marks if your speed up is around 30x  50x
 5 marks if your speed up is around 50x  60x
 6 marks if your speed up is beyond 60x
You can get 0.5 mark deduction for a serious violation of style.
If you matrix multiplication output is incorrect or your code throws an exception, you will get deduction depending on how serious your bug is. Possible bugs, include:
 Incorrect order of fork and join (3)
 Letting multiple tasks update the same location of the matrix at the same time (2)
 Careless mistakes in referring to matrices or its indices (1)
Submission
When you are ready to submit your lab, on cs2030i
, run the script
1  ~cs2030/submit5 
which will copy all files matching *.java
(and nothing else) from your ~/lab5
directory on cs2030i
to an internal grading directory. We will test compile and test run with a tiny sample input to make sure that your submission is OK.
You can submit multiple times, but only the most recent submission will be graded.
May the fork()
s be with you.

You can find out by calling
Runtime.getRuntime().availableProcessors();
in Java. ↩