
Performance of MPI vs MPI/OpenMP hybrid code

Posted: 07 Mar 2016, 11:42
by vivanic
Dear all,

I was testing the performance of pure MPI vs MPI/OpenMP hybrid code. The test is an SCF run with DFT on an organic dye. Details can be seen in the attached output files.

MPI (20 processes)
>>> CPU Time used in LSDALTON is 4 hours 38 minutes 1 second
>>> wall Time used in LSDALTON is 4 hours 56 minutes 2 seconds
MPI/OMP (1 process / 20 threads)
>>> CPU Time used in LSDALTON is 101 hours 19 minutes 28 seconds
>>> wall Time used in LSDALTON is 7 hours 36 minutes 36 seconds
Both calculations were done on a node with 20 physical cores. The math library is ATLAS; correct installation/optimization of ATLAS is out of my control, so I used the math libraries "as is". The MPI binary was built with ./setup --mpi, and the MPI/OMP binary with ./setup --mpi --omp.
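
For reference, the two runs were launched roughly as follows (the binary name lsdalton.x and the plain mpirun invocation are just a sketch of my setup; the exact launcher syntax may differ on other systems):

# pure MPI: 20 ranks, 1 thread per rank
export OMP_NUM_THREADS=1
mpirun -np 20 ./lsdalton.x

# MPI/OMP hybrid: 1 rank, 20 OpenMP threads
export OMP_NUM_THREADS=20
mpirun -np 1 ./lsdalton.x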

As "LSDALTON is designed as a MPI/OpenMP hybrid code" I am puzzled with much larger time for MPI/OMP hybrid than for MPI. Am I doing something wrong or results are fine?

Regards,
Vedran

Re: Performance of MPI vs MPI/OpenMP hybrid code

Posted: 07 Mar 2016, 12:42
by tkjaer
Hi

While LSDalton is designed as an MPI/OpenMP code, in principle both of your runs should take approximately the same amount of time (as they use the same resources).

Personally, I am a little surprised.

As far as I can see:

1. The LAPACK calls take the same amount of time in both runs, as expected.
2. The formation of the Coulomb matrix and the exchange-correlation matrix seems to take the same time in both runs.
3. The difference comes from the construction of the exchange matrix. Here the pure MPI parallelization seems to be better than the OpenMP parallelization. I am not sure why, and I will put this on my to-do list to investigate further.

I should mention that the integral code that constructs the exchange matrix was written when the usual number of cores was 2-8, so the code has not been profiled with 20 cores. I suspect that the problem arises from an inefficiency in the exchange driver when many OpenMP threads are used.
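
In the meantime, if you want to narrow it down on your side, a simple thread-scaling test would show at which thread count the exchange construction stops scaling. Something along these lines should do (the binary name lsdalton.x and the plain mpirun call are only a sketch; adapt them to your installation):

# rerun the same SCF input with an increasing number of OpenMP threads on a single MPI rank
for t in 1 2 4 8 16 20; do
    export OMP_NUM_THREADS=$t
    mpirun -np 1 ./lsdalton.x > scf_omp_${t}.out 2>&1
    grep -i "wall time" scf_omp_${t}.out   # quick check of the total wall time; per-section timings are in the full output
done

If the exchange timings stop improving well before 20 threads, that would confirm the suspicion above.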

Best Regards
Thomas Kjærgaard