Tests time out after build

Posted: 23 Jul 2020, 12:40
by dwh1d17
Installing on a RHEL6.10 x86 machine. Standard build but using openmpi.

mpicc --version
gcc (GCC) 6.1.0

Code:

git clone --recursive https://gitlab.com/dalton/dalton.git
cd dalton/
git checkout Dalton2018.0
git submodule update

Next I configured Dalton, which created the build directory:

Code:

./setup --fc=mpif90 --cc=mpicc --mpi --mkl=sequential --prefix=/local/software/dalton/2018.0
cd build/
make -j 4 

The build completed without errors. I then ran the tests, creating a scratch directory for the purpose:

Code:

mkdir /scratch/hpc/dalton
export DALTON_TMPDIR=/scratch/hpc/dalton
export DALTON_NUM_MPI_PROCS=4
make test

Only the first test passes; after that, the tests just start timing out.

1/496 Test #1: dft_ac_grac ...................... Passed 3.06 sec
Start 2: dft_b3lyp_cart
2/496 Test #2: dft_b3lyp_cart ...................***Timeout 1200.16 sec
Start 3: dft_b3lyp_magsus_nosym
3/496 Test #3: dft_b3lyp_magsus_nosym ...........***Timeout 1200.17 sec
Start 4: dft_b3lyp_molhes_nosym
4/496 Test #4: dft_b3lyp_molhes_nosym ...........***Timeout 1200.14 sec
Start 5: dft_b3lyp_nosym
5/496 Test #5: dft_b3lyp_nosym ..................***Timeout 1200.16 sec


Does anyone have any ideas? Is this a supported build method?

kind regards,
David H

Re: Tests time out after build

Posted: 23 Jul 2020, 13:17
by magnus
Yes, it is supported. Could you provide the CMake output from when you run the setup command? There should be a file called setup_cmake_output in your build directory.

Re: Tests time out after build

Posted: 27 Jul 2020, 14:22
by dwh1d17
-- The Fortran compiler identification is GNU 6.1.0
-- The C compiler identification is GNU 6.1.0
-- The CXX compiler identification is GNU 6.1.0
-- Check for working Fortran compiler: /local/software/openmpi/3.1.3/gcc/bin/mpif90
-- Check for working Fortran compiler: /local/software/openmpi/3.1.3/gcc/bin/mpif90 -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /local/software/openmpi/3.1.3/gcc/bin/mpif90 supports Fortran 90
-- Checking whether /local/software/openmpi/3.1.3/gcc/bin/mpif90 supports Fortran 90 -- yes
-- Check for working C compiler: /local/software/openmpi/3.1.3/gcc/bin/mpicc
-- Check for working C compiler: /local/software/openmpi/3.1.3/gcc/bin/mpicc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /local/software/gcc/6.1.0/bin/g++
-- Check for working CXX compiler: /local/software/gcc/6.1.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Math lib search order is MKL;ESSL;OPENBLAS;ATLAS;ACML;SYSTEM_NATIVE
-- You can select a specific type by defining for instance -D BLAS_TYPE=ATLAS or -D LAPACK_TYPE=ACML
-- or by redefining MATH_LIB_SEARCH_ORDER
-- Found MPI_C: /local/software/openmpi/3.1.3/gcc/lib/libmpi.so
-- Found MPI_CXX: /local/software/openmpi/3.1.3/gcc/lib/libmpi.so
-- Found MPI_Fortran: /local/software/openmpi/3.1.3/gcc/lib/libmpi_usempif08.so;/local/software/openmpi/3.1.3/gcc/lib/libmpi_usempi_ignore_tkr.so;/local/software/openmpi/3.1.3/gcc/lib/libmpi_mpifh.so;/local/software/openmpi/3.1.3/gcc/lib/libmpi.so
-- Performing Test MPI_COMPATIBLE
-- Performing Test MPI_COMPATIBLE - Success
-- Performing Test MPI_F90_I4
-- Performing Test MPI_F90_I4 - Success
-- Performing Test MPI_F90_I8
-- Performing Test MPI_F90_I8 - Failed
-- Performing Test ENABLE_MPI3_FEATURES
-- Performing Test ENABLE_MPI3_FEATURES - Success
-- Found Git: /usr/bin/git
-- Polarizable Continuum Model via PCMSolver DISABLED
-- Configuring done
-- Generating done
-- Build files have been written to: /home/local/software/dalton/2018/dalton/build

I should note that I didn't use the --mkl=sequential flag in the end because it caused gfortran errors during the build.

Re: Tests time out after build

Posted: 27 Jul 2020, 14:33
by magnus
Yes, --mkl is for Intel compilers only.

The CMake output looks ok, except that I would expect a message about which BLAS and LAPACK it found. Anyway it would not be able to finish the build without them.

It would be good to know whether or not this is related to MPI. Could you try running serially (by not setting DALTON_NUM_MPI_PROCS) and using "ctest -L essential --output-on-failure" instead of "make test"?
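
As a rough sketch, the serial run suggested here could look like the following when started from the build directory. The scratch path is the one from the first post; the guard around ctest is an added assumption so the snippet does nothing harmful when run elsewhere.

```shell
# Make sure the setting from the earlier parallel run is not inherited.
unset DALTON_NUM_MPI_PROCS

# Scratch directory taken from the first post in this thread.
export DALTON_TMPDIR=/scratch/hpc/dalton

# Only call ctest when it is installed and we are inside a configured build tree.
if command -v ctest >/dev/null 2>&1 && [ -f CTestTestfile.cmake ]; then
    # Run only the tests labelled "essential" and print the output of failures.
    ctest -L essential --output-on-failure
else
    echo "ctest not found or not in a build directory; nothing was run" >&2
fi
```

If the essential tests pass this way but time out with DALTON_NUM_MPI_PROCS set, that points towards the MPI installation rather than the Dalton build itself.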

Re: Tests time out after build

Posted: 27 Jul 2020, 14:39
by magnus
I just noticed that you didn't specify "--cxx=mpicxx" in the setup command. Not sure if this will cause errors but perhaps worth a try.

Re: Tests time out after build

Posted: 28 Jul 2020, 11:33
by dwh1d17
The serial tests all pass, and I tried adding "--cxx=mpicxx", but it made no difference.

We have ATLAS and BLAS installed via rpm, so I'm guessing the installer is able to find them in the default location.

Code:

|11:29:26| [dwh1d17@cyan02 lib64]$ rpm -aq | grep "blas\|atlas"
atlas-3.8.4-2.el6.x86_64
blas-3.2.1-5.el6.x86_64

I tried using OpenMPI v2.0.2 and this seems to be working much better. If I were to reinstall OpenMPI, would it be better to use v3.1.6 or the newer v4.0.4?

Re: Tests time out after build

Posted: 28 Jul 2020, 14:41
by dwh1d17
Just to update: the tests have now finished using OpenMPI 2, and only the benchmark tests failed.

98% tests passed, 10 tests failed out of 496

Label Time Summary:
aosoppa = 61.51 sec (18 tests)
benchmark = 10.55 sec (10 tests)
cc = 34.01 sec (87 tests)
cc3 = 11.96 sec (31 tests)
ccr12 = 32.51 sec (68 tests)
cholesky = 4.77 sec (10 tests)
dalton = 4082.18 sec (475 tests)
dft = 609.43 sec (45 tests)
dpt = 3.09 sec (8 tests)
energy = 284.70 sec (19 tests)
essential = 105.42 sec (117 tests)
fde = 67.25 sec (2 tests)
gen1int = 29.74 sec (2 tests)
geo = 742.53 sec (29 tests)
long = 700.49 sec (18 tests)
mcscf = 61.14 sec (3 tests)
medium = 1552.28 sec (75 tests)
mp2r12 = 9.00 sec (17 tests)
multistep = 328.40 sec (9 tests)
numder = 20.52 sec (5 tests)
pcm = 397.13 sec (17 tests)
peqm = 85.67 sec (24 tests)
prop = 583.30 sec (44 tests)
qfit = 2.04 sec (3 tests)
qm3 = 100.10 sec (25 tests)
qmmm = 118.43 sec (8 tests)
rsp = 899.58 sec (79 tests)
runtest = 2871.17 sec (204 tests)
short = 1572.58 sec (373 tests)
soppa = 89.22 sec (18 tests)
unknown = 0.75 sec (1 test)
verylong = 198.26 sec (18 tests)
walk = 176.27 sec (8 tests)
weekly = 14.80 sec (21 tests)

Total Test time (real) = 4097.49 sec

The following tests FAILED:
487 - benchmark_eri_adz (Failed)
488 - benchmark_eri_adzs (Failed)
489 - benchmark_eri_atzs (Failed)
490 - benchmark_eri_r12 (Failed)
491 - benchmark_eri_r12xl (Failed)
492 - benchmark_her_adz (Failed)
493 - benchmark_her_adzs (Failed)
494 - benchmark_her_atzs (Failed)
495 - benchmark_her_r12 (Failed)
496 - benchmark_her_r12xl (Failed)
Errors while running CTest
make: *** [test] Error 8

Re: Tests time out after build

Posted: 28 Jul 2020, 19:33
by magnus
Great! The benchmark tests are known to fail so I wouldn't worry about those failing.

I thought that we used OpenMPI v3 in our CI but it turns out that we do not. So while I'd expect it to work in general, I cannot say for sure. We use OpenMPI v1.8, v2.1, and v4.0, so those should work.
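
To compare against those versions, it may help to confirm which Open MPI release is actually picked up from PATH. The following is only a sketch: `ompi_major` is a hypothetical helper name, and it assumes the first line printed by `mpirun --version` has the form `mpirun (Open MPI) X.Y.Z`.

```shell
# Hypothetical helper: read the `mpirun --version` banner from stdin and
# print only the Open MPI major version number.
ompi_major() {
    sed -n 's/.*(Open MPI) \([0-9][0-9]*\)\..*/\1/p'
}

# Only query mpirun when it is actually on PATH.
if command -v mpirun >/dev/null 2>&1; then
    mpirun --version | ompi_major
fi
```

A result of 3 would correspond to the untested series mentioned above.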

Re: Tests time out after build

Posted: 29 Jul 2020, 10:29
by dwh1d17
Using OpenMPI v4.0 seems to be the best solution. I built it using GCC 8.2 and now get the following failures:
The following tests FAILED:
51 - energy_stex (Failed)
487 - benchmark_eri_adz (Failed)
488 - benchmark_eri_adzs (Failed)
489 - benchmark_eri_atzs (Failed)
490 - benchmark_eri_r12 (Failed)
491 - benchmark_eri_r12xl (Failed)
492 - benchmark_her_adz (Failed)
493 - benchmark_her_adzs (Failed)
494 - benchmark_her_atzs (Failed)
495 - benchmark_her_r12 (Failed)
496 - benchmark_her_r12xl (Failed)
Errors while running CTest

The only other warning is the following from the linker:
/usr/bin/ld: warning: libgfortran.so.3, needed by /usr/lib64/libblas.so.3.2.1, may conflict with libgfortran.so.5
Which version of blas and gcc do you test against?

Re: Tests time out after build

Posted: 29 Jul 2020, 11:34
by magnus
At the moment, we test GCC 5 through 10, but only one release per major version. We use Fedora Docker images for this. In all cases, we use OpenBLAS together with the system native LAPACK. Not sure which versions though.

The warning you get looks like it could be related to the fact that BLAS has been compiled with a lower version of GCC (the one that comes standard with your system), whereas you use a more recent GCC to compile Dalton.
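
One way to check this is to look at which libgfortran each shared library depends on. A minimal sketch, assuming `ldd` is available: `gfortran_deps` is a hypothetical helper name, and the BLAS path is the one quoted in the linker warning.

```shell
# Hypothetical helper: print the libgfortran sonames that a shared
# library or executable depends on, according to ldd.
gfortran_deps() {
    ldd "$1" 2>/dev/null | awk '/libgfortran/ {print $1}'
}

# System BLAS path as it appears in the linker warning.
blas=/usr/lib64/libblas.so.3.2.1

# Only inspect the file when it exists on this machine.
if [ -e "$blas" ]; then
    gfortran_deps "$blas"
fi
```

Running the same helper on the Dalton executable in the build directory would show whether it pulls in a different libgfortran than the system BLAS does.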