
MEMGET ERROR

Posted: 31 May 2019, 04:44
by wanghong
Dear all:
I encountered a problem when I used Dalton2015.0 to do a phosphorescence calculation: "Reason: MEMGET ERROR, insufficient work space in memory". I used 32 cores and 15000 MB. I don't know what went wrong or how to fix it. Does anyone have any ideas? Can you help me?
Attached are the input and output files; the image shows the contents of the run.sh file.
Looking forward to your reply. Thank you very much!

Re: MEMGET ERROR

Posted: 31 May 2019, 08:41
by hjaaj
Your output shows that the calculation only used 488 MB and not 15000 MB.

PS. I recommend upgrading to Dalton2018, but that is unrelated to your memory problem.

Re: MEMGET ERROR

Posted: 31 May 2019, 10:46
by wanghong
Thank you very much for your reply. Apart from downloading the new version of Dalton, do you know how to solve my current problem?
I also noticed that no matter how much memory I request, the calculation only uses around 480 MB. Is this a system setting? What do I need to do?

Re: MEMGET ERROR

Posted: 31 May 2019, 12:28
by hjaaj
I looked at your output file again. I notice that you have asked for 150 GB of memory with 32-bit integers. With 32-bit integers you can ask for at most 15 GB of memory, corresponding to approx. 2 gigawords in double precision; otherwise you get integer overflow in the addressing of the internal work array. So you must either reduce the request to at most 15 GB or compile with 64-bit integers. (I believe we have programmed Dalton2018 to provide more information about this problem, instead of just silently using the default because the specified work memory could not be represented with 32-bit integers.)
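As a rough sketch of those two workarounds (the input file names below are placeholders, and the --int64 setup flag is an assumption about your Dalton version's setup script):

# Option 1 (sketch): stay within the 32-bit addressing limit by requesting
# at most ~15 GB of work memory ("input.dal"/"molecule.mol" are placeholder names).
dalton -gb 15 input.dal molecule.mol

# Option 2 (sketch): rebuild Dalton with 64-bit integers, assuming your
# version's setup script accepts the --int64 flag.
./setup --int64 build_int64
cd build_int64
make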

Re: MEMGET ERROR

Posted: 03 Jun 2019, 09:22
by wanghong
Thank you very much for your reply. I encountered the following problem when updating Dalton, and I don't know how to solve it. Could you give me some advice?

[root@jiewei6 ~]# cmake --version
cmake version 3.14.4

CMake suite maintained and supported by Kitware (kitware.com/cmake).
[root@jiewei6 ~]# git clone --recursive https://gitlab.com/dalton/dalton.git
Cloning into 'dalton'...
fatal: unable to access 'https://gitlab.com/dalton/dalton.git/': Failed connect to gitlab.com:443; Connection timed out

I would appreciate it if you could give me some help.
Looking forward to your reply.

Re: MEMGET ERROR

Posted: 03 Jun 2019, 17:38
by magnus
There was a major Google Cloud outage that affected GitLab among many others, so perhaps that is why it failed. Can you try again?
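If it keeps failing, a quick sketch for checking whether the machine can reach GitLab at all before retrying the clone (assuming curl is available):

# Check basic HTTPS connectivity to gitlab.com first.
curl -I https://gitlab.com

# If that responds, retry the recursive clone.
git clone --recursive https://gitlab.com/dalton/dalton.git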

Re: MEMGET ERROR

Posted: 04 Jun 2019, 07:34
by wanghong
No, not yet. I guess it's the network. I'm trying to figure out how to solve this problem.
Thank you very much for your reply.

Re: MEMGET ERROR

Posted: 24 Jul 2019, 11:42
by bast
Were you able to clone and build the code in the meantime? Anything I can help with?

Re: MEMGET ERROR

Posted: 26 Nov 2019, 22:18
by juanjoaucar
Hi wanghong

I have recently had a problem similar to the one you describe here (by the way, I don't know if you have already solved it).
Have you tried running the calculation on fewer than 32 cores?
To clarify something I've read in the answers, I'd like to point out that the output line "* Work memory size : 64000000 = 488.28 megabytes." only refers to the work memory size assigned to each core (the 15000 MB distributed equally among them all).

Re: MEMGET ERROR

Posted: 27 Nov 2019, 10:49
by hjaaj
Some clarifying comments (I hope):
- the 488.28 MB (64 megawords) is the default for each MPI master and worker
- "dalton -mb 15000" or "dalton -gb 15" will allocate for work memory 15 GB on each MPI master and worker. That is, if you run MPI on 32 cores with shared memory, you will use 32*15 GB of the shared memory = 480 GB. So you should have 512GB to use that option
- "dalton -mb 15000 -nb 2000" will allocate for work memory 15 GB on MPI master and 2 GB on each MPI worker; for MPI on 32 cores 1*15 + 31*2 GB = 77 GB. For most application (LUCITA excluded) 2 GB is enough for each MPI worker, but the master can often benefit from more memory.

Re: MEMGET ERROR

Posted: 27 Nov 2019, 16:01
by juanjoaucar
hjaaj wrote:
27 Nov 2019, 10:49
Some clarifying comments (I hope):
- the 488.28 MB (64 megawords) is the default for each MPI master and worker
- "dalton -mb 15000" or "dalton -gb 15" will allocate for work memory 15 GB on each MPI master and worker. That is, if you run MPI on 32 cores with shared memory, you will use 32*15 GB of the shared memory = 480 GB. So you should have 512GB to use that option
- "dalton -mb 15000 -nb 2000" will allocate for work memory 15 GB on MPI master and 2 GB on each MPI worker; for MPI on 32 cores 1*15 + 31*2 GB = 77 GB. For most application (LUCITA excluded) 2 GB is enough for each MPI worker, but the master can often benefit from more memory.
Thanks for the comments, Hans Jørgen. I had gotten confused about the -mb option.

Re: MEMGET ERROR

Posted: 04 Jul 2020, 10:16
by wanghong
Hello, engineer:
I am using a 56-core server with 250 GB of memory; the contents of run.sh are as follows. However, when I calculate 30 states, I always run out of memory. Why? How can I solve this problem? Please help me.
Currently, the parameter in the rspprp.h file is MAXLBL = 100000, and the parameter in the infohso.h file is MXPHOS = 110.


#!/bin/sh
# Put the Dalton build directory on PATH and set the scratch directory.
export PATH=/home/DALTON/build:$PATH
export DALTON_TMPDIR=/tmp/DALTON
# Launch Dalton under MPI with 56 processes.
export DALTON_LAUNCHER="mpirun -np 56"

# Request 2 GB of work memory per MPI process; -noarch skips packing the results into a tar archive.
dalton -gb 2 -noarch

Re: MEMGET ERROR

Posted: 04 Jul 2020, 20:10
by hjaaj
It fails with insufficient memory because Dalton uses the default of ca. 0.5 GB per MPI process, i.e. about 26 GB in total in your case. I do not understand this if you actually used "dalton -gb 2"; did you perhaps omit the -gb 2? However, most memory is needed on the master. I would therefore suggest something like "dalton -mb 15000 -nb 2000" for 15 GB on the master and 2 GB on each worker.
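For your 56-process run that could look roughly like the sketch below; the total is 1*15 GB + 55*2 GB = 125 GB, well within your 250 GB:

export DALTON_LAUNCHER="mpirun -np 56"

# 15 GB of work memory on the MPI master, 2 GB on each of the 55 workers.
dalton -mb 15000 -nb 2000 -noarch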

Hans Jørgen Aa. Jensen, professor in computational chemistry

Re: MEMGET ERROR

Posted: 05 Jul 2020, 04:28
by wanghong
Professor:
Thank you very much for your reply.
I changed the contents of run.sh, but no matter how I set -mb and -nb, the output file still shows the default work memory size: 64000000 = 488.28 megabytes.
What's wrong with my settings? Looking forward to your reply.

Re: MEMGET ERROR

Posted: 05 Jul 2020, 13:24
by wanghong
I seem to have found out why I don't have enough work memory: the last line of the script file is not being recognized, which is why the run falls back to too little memory. Is there a problem with my script file, professor?

Re: MEMGET ERROR

Posted: 06 Jul 2020, 07:54
by hjaaj
Very strange. Maybe you could replace "dalton ..." with "bash -x -v dalton ... >& run.log"; then run.log should show what happens.
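Concretely, the last line of run.sh could be changed to something like this (a sketch; keep whatever dalton options you already use, and note the portable redirection used in place of ">& run.log" since the script starts with #!/bin/sh):

# Run the dalton wrapper script through bash with command tracing (-x) and
# verbose echo (-v); collect both stdout and stderr in run.log.
bash -x -v dalton -gb 2 -noarch > run.log 2>&1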

Re: MEMGET ERROR

Posted: 06 Jul 2020, 08:03
by taylor
How are you executing this script (i.e., do you run via a queueing system)? Many "interactive" shells, which includes using nohup, have built-in limits on the resources that can be requested (see, e.g., the ulimit command). But I admit this does not seem very likely here, because your run has a work memory that is an "exact" number of 64-bit words (64000000), which is not the sort of number you would get from a request specified in megabytes. Still, either the operating system or a queueing system, if you use one, might be causing the issue.
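For example, the limits in effect can be inspected with the standard ulimit shell built-in:

# Show all resource limits in effect for this shell/session.
ulimit -a

# The virtual-memory limit in particular (kilobytes, or "unlimited").
ulimit -v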

We run under SLURM. Because old habits die hard, I use the -mw parameter, which specifies memory in 64-bit words (not megawords), e.g.

/home/taylor/src/2018/dalton/build_new/dalton \
-mw 2000000000 -o ${SLURM_JOB_NAME}.lis -dal MKcas -mol MKstart

and in the output this gives

* Work memory size : 2000000000 = 14.901 gigabytes.

Best regards
Pete

Re: MEMGET ERROR

Posted: 11 Jul 2020, 04:14
by wanghong
I am deeply grateful for your assistance.
I have solved the difficulty. It is not clear whether the .bashrc file affected the permissions or whether there was some other issue. When I abandoned the run.sh file and instead gave the work memory settings directly with the calculation command when submitting, they were recognized and the job ran normally.
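For anyone else hitting the same thing, the fix amounted to dropping run.sh and passing the memory options directly on the command line, roughly along these lines (a sketch; the input file names are placeholders):

export PATH=/home/DALTON/build:$PATH
export DALTON_TMPDIR=/tmp/DALTON
export DALTON_LAUNCHER="mpirun -np 56"

# Memory options passed directly with the dalton command instead of via run.sh.
dalton -mb 15000 -nb 2000 -noarch input.dal molecule.mol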