OMP and MPI tricks


ljr.argentina
Posts: 3
Joined: 15 Jul 2014, 19:27
First name(s): LEOPOLDO
Middle name(s): JOSE
Last name(s): RIOS
Affiliation: IMIT-UNNE CONICET
Country: Argentina

OMP and MPI tricks

Post by ljr.argentina » 16 Jun 2015, 20:15

What is the trick to run DALTON with the maximum memory capacity?

I have a brand-new node with 64 GB of RAM and 24 cores.

I have a 2015 Dalton compilation for OMP with 64-bit integers, and an MPI build as well.

With the OMP version I tested a script with these settings:
export DALTON_TMPDIR=/scratch/$USER
export OMP_NUM_THREADS=24

/opt/dalton15OMP/dalton/dalton -omp 24 -nb 61000 -t $scratch $INP $MOL
..................
When running the MPI version, I use the following settings:
export DALTON_TMPDIR=/scratch/$USER
export DALTON_NUM_MPI_PROCS=24

/opt/dalton13/dalton/dalton -N 1 -nb 8000 -t $scratch $INP $MOL
-----------------------------
In the end, however, it uses only the minimum amount of memory. The output file reports:
* Work memory size: 64000000 = 488.28 megabytes.
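For reference, 64,000,000 8-byte words works out to exactly the figure reported (a quick check, assuming bc is available):

# 64,000,000 words x 8 bytes per word, converted to MiB:
echo "scale=2; 64000000 * 8 / 1024 / 1024" | bc
# prints 488.28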

Thanks!

taylor
Posts: 545
Joined: 15 Oct 2013, 05:37
First name(s): Peter
Middle name(s): Robert
Last name(s): Taylor
Affiliation: Tianjin University
Country: China

Re: OMP and MPI tricks

Post by taylor » 16 Jun 2015, 20:25

A first question that will help us help you: how did you compile Dalton? In particular, did you use the (default) 32-bit integers? If so you can under no circumstances address more than 16GB of memory, because Dalton uses a single static workspace allocated at startup, and with 32-bit integers the most you can address is 2GW (where "W" means a 64-bit word, i.e. 8 bytes), which is 16GB.
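To put a number on that limit (plain shell arithmetic, nothing Dalton-specific): 2^31 words of 8 bytes each is 16 GiB.

# 2^31 words x 8 bytes per word, expressed in GiB:
echo $(( 2**31 * 8 / 1024**3 ))
# prints 16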

Compiling with 64-bit integers will sidestep the addressing issue and allow you to address much larger memory --- with current Intel hardware something like 64TB. But, many MPI releases (especially if this is not Intel's MPI but OpenMPI or some other non-proprietary implementation) are not compiled and in some cases not even designed to be compiled with 64-bit integers, so there may be complications here.
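If your MPI layer is OpenMPI, one quick way to check which Fortran integer size it was built with (assuming ompi_info is on your PATH) is:

# a 64-bit-integer build reports 8; the usual default is 4
ompi_info --all | grep -i "fort integer size"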

Please post the output of your setup phase. Then we can provide more detailed advice.

Best regards
Pete

ljr.argentina
Posts: 3
Joined: 15 Jul 2014, 19:27
First name(s): LEOPOLDO
Middle name(s): JOSE
Last name(s): RIOS
Affiliation: IMIT-UNNE CONICET
Country: Argentina

Re: OMP and MPI tricks

Post by ljr.argentina » 18 Jun 2015, 21:00

Pete, thanks for your reply.
Here is my setup output for the OMP compile:
nodo13:~/soft/DALTON-Source # ./setup --prefix=/opt/dalton15OMP/ --omp --int64 --fc=gfortran --cc=gcc --cxx=g++ --blas /usr/lib64/libblas.so.3 --lapack /usr/lib64/liblapack.so.3
FC=gfortran CC=gcc CXX=g++ cmake -DENABLE_MPI=OFF -DENABLE_SGI_MPT=OFF -DENABLE_OMP=ON -DENABLE_64BIT_INTEGERS=ON -DENABLE_OPENACC=OFF -DENABLE_COLLAPSE=OFF -DENABLE_CSR=OFF -DENABLE_SCALASCA=OFF -DENABLE_VAMPIRTRACE=OFF -DENABLE_TIMINGS=OFF -DENABLE_XCFUN=OFF -DENABLE_INTEREST=OFF -DENABLE_ICHOR=OFF -DENABLE_STATIC_LINKING=OFF -DENABLE_SCALAPACK=OFF -DEXPLICIT_BLAS_LIB=/usr/lib64/libblas.so.3 -DENABLE_AUTO_BLAS=OFF -DEXPLICIT_LAPACK_LIB=/usr/lib64/liblapack.so.3 -DENABLE_AUTO_LAPACK=OFF -DCMAKE_INSTALL_PREFIX=/opt/dalton15OMP/ -DCMAKE_BUILD_TYPE=release /root/soft/DALTON-Source

-- BLAS: using explit library (/usr/lib64/libblas.so.3)
-- LAPACK: using explit library (/usr/lib64/liblapack.so.3)
-- System : Linux
-- Processor type : x86_64
-- Fortran compiler flags: -DVAR_GFORTRAN -DGFORTRAN=445 -ffloat-store -fcray-pointer -m64 -w -fopenmp -fdefault-integer-8 -O3 -ffast-math -funroll-loops -ftree-vectorize
-- C compiler flags : -std=c99 -DRESTRICT=restrict -DFUNDERSCORE=1 -DHAVE_NO_LSEEK64 -ffloat-store -w -m64 -fopenmp -O3 -ffast-math -funroll-loops -ftree-vectorize -Wno-unused
-- Libraries : /usr/lib64/libblas.so.3;/usr/lib64/liblapack.so.3
-- Definitions : SYS_LINUX;SYS_UNIX;VAR_GFORTRAN;COMPILER_UNDERSTANDS_FORTRAN_2003;VAR_OMP;BUILD_GEN1INT;BUILD_PELIB;BUILD_QFITLIB;VAR_MFDS;_FILE_OFFSET_BITS=64;IMPLICIT_NONE;BINARY_INFO_AVAILABLE;INSTALL_BASDIR="/root/soft/DALTON-Source/build/basis";VAR_INT64;VAR_64BITS;VAR_RSP;INSTALL_WRKMEM=840000000;INSTALL_MMWORK=1
-- The Fortran compiler identification is GNU
-- The C compiler identification is GNU 4.8.1
-- The CXX compiler identification is GNU 4.8.1
-- Check for working Fortran compiler: /usr/bin/gfortran
-- Check for working Fortran compiler: /usr/bin/gfortran -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /usr/bin/gfortran supports Fortran 90
-- Checking whether /usr/bin/gfortran supports Fortran 90 -- yes
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++
-- Check for working CXX compiler: /usr/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Performing Test COMPILER_UNDERSTANDS_FORTRAN03
-- Performing Test COMPILER_UNDERSTANDS_FORTRAN03 - Success
-- Since you specified 64bit integers the math lib search order is (only) MKL;ACML
-- This is because apart from MKL and ACML default math library installations are built for 32bit integers
-- If you know that the library you want to use provides 64bit integers, you can select the library
-- with -D BLAS_TYPE=X or -D LAPACK_TYPE X (X: MKL ESSL OPENBLAS ATLAS ACML SYSTEM_NATIVE)
-- or by redefining MATH_LIB_SEARCH_ORDER
-- Configuring done
-- Generating done
-- Build files have been written to: /root/soft/DALTON-Source/build

configure step is done
now you need to compile the sources:
$ cd build
$ make

bast
Posts: 1210
Joined: 26 Aug 2013, 13:22
First name(s): Radovan
Last name(s): Bast
Affiliation: none
Country: Germany

Re: OMP and MPI tricks

Post by bast » 19 Jun 2015, 08:24

Dear Leopoldo,
To my knowledge there is no explicit OMP parallelization in the Dalton code, in contrast to the LSDalton code. Dalton can implicitly benefit from shared-memory parallelization via threaded MKL.
If you run SCF in Dalton you will probably be better off running MPI and not using all cores (thus having more memory per process).
For CC you will probably be better off running "sequential" Dalton with threaded MKL.
The picture is different for LSDalton, so it depends on what exactly you want to run.
Best wishes,
Radovan
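A minimal sketch of the two modes described above; the process and thread counts are only examples, and the MKL variable assumes Dalton is linked against a threaded MKL:

# SCF/DFT: MPI with fewer processes than cores, so each process has more memory
export DALTON_TMPDIR=/scratch/$USER
export DALTON_NUM_MPI_PROCS=12

# CC: "sequential" Dalton, letting the threaded MKL use the cores
export MKL_NUM_THREADS=24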

taylor
Posts: 545
Joined: 15 Oct 2013, 05:37
First name(s): Peter
Middle name(s): Robert
Last name(s): Taylor
Affiliation: Tianjin University
Country: China

Re: OMP and MPI tricks

Post by taylor » 19 Jun 2015, 14:43

Some further comments. Are the BLAS and LAPACK libraries compiled with 64-bit integers? Note that this is not the same as being compiled for a 64-bit architecture! Their location in lib64 confirms the latter. But if you want to use 64-bit integers it is necessary to ensure that the entire software stack can deal with this. For example, if one uses OpenMPI to provide the MPI layer for parallel Dalton and LSDalton that are built with 64-bit integers, then OpenMPI must be explicitly built to support 64-bit integers. The default is for 32-bit integers. If your BLAS and LAPACK libraries have not been built to support 64-bit integers there will be problems. These problems do not just relate to how much memory one can address or array dimensions --- there are numerous flag variables that may never get values other than -1, 0, or 1 in some cases but which still need to be defined as 64-bit integers for everything to be compatible.
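If you do want to keep 64-bit integers, the setup output you posted already hints at the mechanism: point the build at a math library that genuinely provides a 64-bit-integer (ILP64) interface. A sketch assuming MKL is available, mirroring the cmake line shown earlier (the flags may need adjusting for your installation):

FC=gfortran CC=gcc CXX=g++ cmake -DENABLE_OMP=ON -DENABLE_64BIT_INTEGERS=ON \
    -DBLAS_TYPE=MKL -DLAPACK_TYPE=MKL \
    -DCMAKE_INSTALL_PREFIX=/opt/dalton15OMP /root/soft/DALTON-Source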

I am intrigued that you need the ability to address beyond 16GB. This is already serious memory and I am not sure what sort of calculations you contemplate with Dalton that would have such memory needs? For example, for SCF/DFT calculations 16GB should accommodate something like a basis of 10,000 functions, which is far larger than makes any sense with Dalton. (With LSDalton, yes, but we have run calculations with almost 20,000 basis functions with 32-bit addressing and mixed OMP/Scalapack parallelized LSDalton so 64-bit does not seem necessary here.) Very large MCSCF properties calculations can be done using 32-bit addressing (well over 100,000,000 determinants, for example), and SCF/DFT properties calculations can be handled also, unless perhaps one is looking for transition properties between many electronic states. For very large coupled-cluster calculations addressing beyond 16GB might be needed, but as I say you do not tell us why you need 64-bit integers.

In general, unless it is absolutely necessary (and one also has basically total control over the entire software stack being used: how it is built and installed) it is preferable to avoid 64-bit integers. Radovan's advice is sound: if you are running SCF or DFT energies and properties, compile Dalton to use MPI parallelism, and run multiple tasks on a node but, depending on the memory, possibly not as many tasks as cores on a node. If you are running CC, use the threaded MKL library to exploit parallelism at the thread level in the (many) matrix multiplication calls the CC code invokes. If your aim is to run very large basis set SCF and DFT energies and properties (plus some CC functionality), by which I mean more than a couple of thousand basis functions, you should consider using LSDalton.

Best regards
Pete
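Following that advice, one possible route is the original setup line with --int64 dropped and MPI enabled instead of OMP; the --mpi switch and the OpenMPI compiler wrappers are the usual choices here, and the install prefix is only an example:

./setup --prefix=/opt/dalton15MPI --mpi --fc=mpif90 --cc=mpicc --cxx=mpicxx \
        --blas /usr/lib64/libblas.so.3 --lapack /usr/lib64/liblapack.so.3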
