MEMGET ERROR

Find answers or ask questions regarding Dalton calculations.
Please upload an output file showing the problem, if applicable.
(It is not necessary to upload input files, they can be found in the output file.)

wanghong
Posts: 25
Joined: 30 May 2019, 09:01
First name(s): hong
Last name(s): wang
Affiliation: Nanjing Tech University
Country: China

MEMGET ERROR

Post by wanghong » 31 May 2019, 04:44

Dear all:
I encountered a problem when using Dalton2015.0 for a phosphorescence calculation: "Reason: MEMGET ERROR, insufficient work space in memory". I used 32 cores and 15000 MB. I don't know what went wrong or how to fix it. Does anyone have any ideas? Can you help me?
Attached are the input and output files; the image shows the contents of the run.sh file.
Looking forward to your reply. Thank you very much!
Attachments
4-3C2.out
(208.76 KiB) Downloaded 85 times
C2.dal
(170 Bytes) Downloaded 80 times
8RACGI3I]I1HE6EZA92Q0Z6.png (5.99 KiB) Viewed 5347 times
Last edited by wanghong on 28 Jun 2019, 07:47, edited 1 time in total.

hjaaj
Posts: 360
Joined: 27 Jun 2013, 18:44
First name(s): Hans Jørgen
Middle name(s): Aagaard
Last name(s): Jensen
Affiliation: University of Southern Denmark
Country: Denmark

Re: MEMGET ERROR

Post by hjaaj » 31 May 2019, 08:41

Your output shows that the calculation only used 488 MB and not 15000 MB.

PS. I recommend upgrading to Dalton2018, but that is unrelated to your memory problem.

wanghong
Posts: 25
Joined: 30 May 2019, 09:01
First name(s): hong
Last name(s): wang
Affiliation: Nanjing Tech University
Country: China

Re: MEMGET ERROR

Post by wanghong » 31 May 2019, 10:46

Thank you very much for your reply. Apart from downloading the new version of Dalton, do you know how to solve my current problem?
I also noticed that no matter how much memory I request, the system only uses around 480 MB. Is this a system setting? What do I need to do?

hjaaj
Posts: 360
Joined: 27 Jun 2013, 18:44
First name(s): Hans Jørgen
Middle name(s): Aagaard
Last name(s): Jensen
Affiliation: University of Southern Denmark
Country: Denmark

Re: MEMGET ERROR

Post by hjaaj » 31 May 2019, 12:28

I looked at your output file again. I notice that you have asked for 150 GB of memory with 32-bit integers. With 32-bit integers you can ask for at most 15 GB of memory, corresponding to approx. 2 gigawords in double precision; otherwise you get integer overflow in the addressing of the internal work array. So either you must reduce to max 15 GB or compile with 64-bit integers. (I believe we have programmed Dalton2018 to provide more information about this problem, rather than silently using the default because the specified work memory could not be represented with 32-bit integers.)
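The 32-bit limit described above can be illustrated with a short sketch (plain Python for illustration, not Dalton code): the work array is addressed in 64-bit (8-byte) words, so the requested word count must fit in a signed 32-bit integer.

```python
# Illustration only (not from the Dalton source): why a 150 GB request
# overflows a signed 32-bit integer when memory is addressed in
# 8-byte (double precision) words.
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer, ~2.1 gigawords

def words_for_mb(mb):
    """Number of 64-bit (8-byte) words needed for `mb` megabytes."""
    return mb * 10**6 // 8

# 15 GB = 1.875 gigawords: still addressable with 32-bit integers.
assert words_for_mb(15_000) <= INT32_MAX
# 150 GB = 18.75 gigawords: overflows the 32-bit index.
assert words_for_mb(150_000) > INT32_MAX
```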

wanghong
Posts: 25
Joined: 30 May 2019, 09:01
First name(s): hong
Last name(s): wang
Affiliation: Nanjing Tech University
Country: China

Re: MEMGET ERROR

Post by wanghong » 03 Jun 2019, 09:22

Thank you very much for your reply. I encountered the following problem when updating Dalton, and I don't know how to solve it. Could you give me some advice?

[root@jiewei6 ~]# cmake --version
cmake version 3.14.4

CMake suite maintained and supported by Kitware (kitware.com/cmake).
[root@jiewei6 ~]# git clone --recursive https://gitlab.com/dalton/dalton.git
Cloning into 'dalton'...
fatal: unable to access 'https://gitlab.com/dalton/dalton.git/': Failed connect to gitlab.com:443; Connection timed out

I would appreciate it if you could give me some help.
Looking forward to your reply.

magnus
Posts: 514
Joined: 27 Jun 2013, 16:32
First name(s): Jógvan Magnus
Middle name(s): Haugaard
Last name(s): Olsen
Affiliation: Hylleraas Centre, UiT The Arctic University of Norway
Country: Norway

Re: MEMGET ERROR

Post by magnus » 03 Jun 2019, 17:38

There was a major Google Cloud outage that affected GitLab among many others, so perhaps that is why it failed. Can you try again?

wanghong
Posts: 25
Joined: 30 May 2019, 09:01
First name(s): hong
Last name(s): wang
Affiliation: Nanjing Tech University
Country: China

Re: MEMGET ERROR

Post by wanghong » 04 Jun 2019, 07:34

No, not yet. I guess it's the network. I'm trying to figure out how to solve this problem.
Thank you very much for your reply.

bast
Posts: 1210
Joined: 26 Aug 2013, 13:22
First name(s): Radovan
Last name(s): Bast
Affiliation: none
Country: Germany

Re: MEMGET ERROR

Post by bast » 24 Jul 2019, 11:42

Were you able to clone and build the code in the meantime? Anything I can help with?

juanjoaucar
Posts: 5
Joined: 24 Jul 2019, 19:51
First name(s): Juan
Middle name(s): Jose
Last name(s): Aucar
Affiliation: AFA
Country: Argentina

Re: MEMGET ERROR

Post by juanjoaucar » 26 Nov 2019, 22:18

Hi wanghong

I have recently had a problem similar to the one you describe here (by the way, I don't know if you have already solved it).
Have you tried running the calculation on fewer than 32 cores?
To clarify something I've read in the answers, I'd like to point out that the output line "* Work memory size : 64000000 = 488.28 megabytes." only refers to the work memory size assigned to each core (the 15000 MB distributed equally among them all).

hjaaj
Posts: 360
Joined: 27 Jun 2013, 18:44
First name(s): Hans Jørgen
Middle name(s): Aagaard
Last name(s): Jensen
Affiliation: University of Southern Denmark
Country: Denmark

Re: MEMGET ERROR

Post by hjaaj » 27 Nov 2019, 10:49

Some clarifying comments (I hope):
- the 488.28 MB (64 megawords) is the default for each MPI master and worker
- "dalton -mb 15000" or "dalton -gb 15" will allocate 15 GB of work memory on each MPI master and worker. That is, if you run MPI on 32 cores with shared memory, you will use 32*15 GB = 480 GB of the shared memory, so you should have 512 GB to use that option.
- "dalton -mb 15000 -nb 2000" will allocate 15 GB of work memory on the MPI master and 2 GB on each MPI worker; for MPI on 32 cores that is 1*15 + 31*2 GB = 77 GB. For most applications (LUCITA excluded) 2 GB is enough for each MPI worker, but the master can often benefit from more memory.

juanjoaucar
Posts: 5
Joined: 24 Jul 2019, 19:51
First name(s): Juan
Middle name(s): Jose
Last name(s): Aucar
Affiliation: AFA
Country: Argentina

Re: MEMGET ERROR

Post by juanjoaucar » 27 Nov 2019, 16:01

hjaaj wrote:
27 Nov 2019, 10:49
Some clarifying comments (I hope):
- the 488.28 MB (64 megawords) is the default for each MPI master and worker
- "dalton -mb 15000" or "dalton -gb 15" will allocate 15 GB of work memory on each MPI master and worker. That is, if you run MPI on 32 cores with shared memory, you will use 32*15 GB = 480 GB of the shared memory, so you should have 512 GB to use that option.
- "dalton -mb 15000 -nb 2000" will allocate 15 GB of work memory on the MPI master and 2 GB on each MPI worker; for MPI on 32 cores that is 1*15 + 31*2 GB = 77 GB. For most applications (LUCITA excluded) 2 GB is enough for each MPI worker, but the master can often benefit from more memory.
Thanks for the comments, Hans Jørgen. I got confused about the -mb option.

wanghong
Posts: 25
Joined: 30 May 2019, 09:01
First name(s): hong
Last name(s): wang
Affiliation: Nanjing Tech University
Country: China

Re: MEMGET ERROR

Post by wanghong » 04 Jul 2020, 10:16

Hello, engineer:
I use a server with 56 cores and 250 GB of memory; the contents of run.sh are as follows. However, when I calculate 30 states, I always run out of memory. Why? How can I solve this problem? Please help me.
The current parameters are MAXLBL = 100000 in the rspprp.h file and MXPHOS = 110 in the infohso.h file.


#!/bin/sh
export PATH=/home/DALTON/build:$PATH
export DALTON_TMPDIR=/tmp/DALTON
export DALTON_LAUNCHER="mpirun -np 56"

dalton -gb 2 -noarch
Attachments
FM.mol
(2.58 KiB) Downloaded 3 times
FM.dal
(209 Bytes) Downloaded 9 times
FM.out
(143.46 KiB) Downloaded 5 times

hjaaj
Posts: 360
Joined: 27 Jun 2013, 18:44
First name(s): Hans Jørgen
Middle name(s): Aagaard
Last name(s): Jensen
Affiliation: University of Southern Denmark
Country: Denmark

Re: MEMGET ERROR

Post by hjaaj » 04 Jul 2020, 20:10

It fails with insufficient memory because Dalton uses the default of ca. 0.5 GB/node, i.e. 26 GB in total in your case. I do not understand this if you used "dalton -gb 2"; did you perhaps forget the "-gb 2"? However, most memory is needed on the master. I would therefore suggest something like "dalton -mb 15000 -nb 2000" for 15 GB on the master and 2 GB on the workers.

Hans Jørgen Aa. Jensen, professor in computational chemistry

wanghong
Posts: 25
Joined: 30 May 2019, 09:01
First name(s): hong
Last name(s): wang
Affiliation: Nanjing Tech University
Country: China

Re: MEMGET ERROR

Post by wanghong » 05 Jul 2020, 04:28

Professor:
Thank you very much for your reply.
When I changed the contents of run.sh, no matter how I set -mb and -nb, the output file still shows the actual work memory size as 64000000 = 488.28 megabytes.
What's wrong with my settings? Looking forward to your reply.
Attachments
FM-1.out
(139.07 KiB) Downloaded 3 times
run.sh.png (7.75 KiB) Viewed 74 times

wanghong
Posts: 25
Joined: 30 May 2019, 09:01
First name(s): hong
Last name(s): wang
Affiliation: Nanjing Tech University
Country: China

Re: MEMGET ERROR

Post by wanghong » 05 Jul 2020, 13:24

I seem to have found out why I don't have enough work memory: the last line of the script file is not recognized, which results in insufficient work memory. Is there a problem with my script file, professor?

hjaaj
Posts: 360
Joined: 27 Jun 2013, 18:44
First name(s): Hans Jørgen
Middle name(s): Aagaard
Last name(s): Jensen
Affiliation: University of Southern Denmark
Country: Denmark

Re: MEMGET ERROR

Post by hjaaj » 06 Jul 2020, 07:54

Very strange. Maybe you could replace "dalton ..." with "bash -x -v dalton ... >& run.log", then the run.log should show what happens.

taylor
Posts: 576
Joined: 15 Oct 2013, 05:37
First name(s): Peter
Middle name(s): Robert
Last name(s): Taylor
Affiliation: Tianjin University
Country: China

Re: MEMGET ERROR

Post by taylor » 06 Jul 2020, 08:03

How are you executing this script (i.e., do you run via a queueing system)? Many "interactive" shells, including under nohup, have built-in limits on the resources that can be requested (see, e.g., the ulimit command). But I admit this does not seem very likely here, because your run has memory that is an "exact" number of 64-bit words (64000000), which is not a very convincing default number if it had been specified in megabytes. Still, either the operating system or a queueing system, if you use one, might be causing the issue.

We run under SLURM. Because old habits die hard, I use the -mw parameter, which specifies memory in 64-bit words (not megawords), e.g.

/home/taylor/src/2018/dalton/build_new/dalton \
-mw 2000000000 -o ${SLURM_JOB_NAME}.lis -dal MKcas -mol MKstart

and in the output this gives

* Work memory size : 2000000000 = 14.901 gigabytes.
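As a sanity check (plain Python, assuming 8-byte words and binary gigabytes), the conversion in that output line works out as:

```python
# 2,000,000,000 words of 8 bytes each, reported in binary gigabytes.
words = 2_000_000_000
gigabytes = words * 8 / 1024**3
assert round(gigabytes, 3) == 14.901
```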

Best regards
Pete

