General considerations on printing of results

Post here if you would like to suggest new features for Dalton
danielhfriese
Posts: 23
Joined: 09 Mar 2015, 15:28
First name(s): Daniel Henrik
Last name(s): Friese
Affiliation: Scientist
Country: Norway

General considerations on printing of results

Post by danielhfriese » 29 Jul 2015, 09:13

Hi all,

Some of you might remember my posting on a better output for TPCD. Following the suggestions made there, I wrote the output myself and everything is fine. This posting is similar in topic but more general. I am not sure it fits into this part of the forum; however, I think it is an important point that developers of new features should keep in mind.

My point is the general readability of the output. Although my problems with TPCD are solved, I will take it as an example. Properties like this often require postprocessing such as rotational averaging (as the program already does, e.g., for two-photon absorption), and therefore the elements of the relevant tensors have to be gathered and postprocessed. And this is the important point: tensor elements from Dalton (as in the case of TPCD) are difficult to gather. Before I implemented my own output for this, I could only get the tensor elements from the output by grepping in loops like this:

Code:

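# $1 is the Dalton output file; $stateblock, $relevantblock and $indexbundle are line counts set by the enclosing loop.
# cut -b 63-75 picks the value out of each "omega B" line by its fixed byte columns.
grep 'omega B' "$1" | head -n "$stateblock" | tail -n "$relevantblock" | head -n "$indexbundle" | tail -n 6  | cut -b 63-75 >> rotstr_dipvel
grep 'omega B' "$1" | head -n "$stateblock" | tail -n "$relevantblock" | head -n "$indexbundle" | tail -n 9  | head -n 3 | cut -b 63-75 >> angmom_dipvel
grep 'omega B' "$1" | head -n "$stateblock" | tail -n "$relevantblock" | head -n "$indexbundle" | tail -n 12 | head -n 3 | cut -b 63-75 >> dipvel_dipvel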
The only "signal word" for grepping the tensor elements from the output was "omega B", since an output line for a tensor element looks like this:

Code:

omega B, excitation energy, moment :    0.064106    0.128212    1.110576
This line does not contain any information about the perturbations involved in this tensor element. It would be better to have an output like this:

Code:

sym, state, pert 1, pert 2, freq 1, freq 2, 2nd order moment:  1 1, XDIPLEN XDIPLEN 0.064106 0.064106  1.110576
Then all information about the tensor element is gathered in one line, which makes postprocessing much easier.
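To illustrate why such a line helps, here is a minimal Python sketch that parses lines in exactly the layout proposed above (a suggestion, not current Dalton output) and collects them into a dictionary keyed by symmetry, state and perturbation labels, with no byte offsets or line counting. As the simplest possible rotational average it then takes the isotropic average of a rank-2 tensor, one third of its trace; the averages needed for TPCD or two-photon absorption involve more terms and are not reproduced here. The names `parse_element`, `collect` and `isotropic_average` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple

# Descriptive prefix of the proposed (hypothetical) self-describing output line.
HEADER = "sym, state, pert 1, pert 2, freq 1, freq 2, 2nd order moment:"

class TensorElement(NamedTuple):
    sym: int
    state: int
    pert1: str
    pert2: str
    freq1: float
    freq2: float
    moment: float

def parse_element(line: str) -> TensorElement:
    """Parse one line in the proposed layout; every field is labelled by position."""
    _, values = line.split(":", 1)                       # drop the descriptive header
    sym, state, p1, p2, w1, w2, mom = values.replace(",", " ").split()
    return TensorElement(int(sym), int(state), p1, p2,
                         float(w1), float(w2), float(mom))

Key = Tuple[int, int, str, str]

def collect(lines: Iterable[str]) -> Dict[Key, float]:
    """Gather all elements from an output, keyed by (sym, state, pert1, pert2)."""
    tensor: Dict[Key, float] = {}
    for line in lines:
        if line.strip().startswith(HEADER):              # ignore every other line
            el = parse_element(line)
            tensor[(el.sym, el.state, el.pert1, el.pert2)] = el.moment
    return tensor

def isotropic_average(tensor: Dict[Key, float], sym: int, state: int) -> float:
    """Simplest rotational average of a rank-2 tensor: one third of its trace."""
    labels = ("XDIPLEN", "YDIPLEN", "ZDIPLEN")
    return sum(tensor[(sym, state, a, a)] for a in labels) / 3.0

line = ("sym, state, pert 1, pert 2, freq 1, freq 2, 2nd order moment:"
        "  1 1, XDIPLEN XDIPLEN 0.064106 0.064106  1.110576")
print(collect([line]))   # {(1, 1, 'XDIPLEN', 'XDIPLEN'): 1.110576}
```

With the old "omega B" line, the same dictionary key has to be reconstructed from the position of the line in the file, which is what the head/tail pipeline does.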

I know that I can always change the code myself (and am allowed to), but I think it would benefit users in general if developers of new features kept the needs of postprocessing in mind when they write output. This includes general "greppability" (the possibility to gather results from the output file with grep) and also a reasonable order of the results.

To stress again: this is not about TPCD, as the problems there have been solved. It is a more general issue, and TPCD was a good example of it.

kennethruud
Posts: 241
Joined: 27 Aug 2013, 16:42
First name(s): Kenneth
Last name(s): Ruud
Affiliation: UiT The Arctic University of Norway
Country: Norway

Re: General considerations on printing of results

Post by kennethruud » 29 Jul 2015, 09:25

Hi!

Let me note in this context that quadratic and cubic response calculations create a file called RESULTS.RSP. I am not going to say it is perfect for grepping either, but it is used by Dalton itself to test whether a result matches a component already calculated (and is thus also used, for instance, during restarts if the file exists). Considering the limited text-parsing functionality of F77, it should cover a few of your post-processing needs.

I think it is important to distinguish between user-friendliness and computer-friendliness, and I would rather keep computer-friendly output in a separate file from the user-friendly output. Of course, it should be documented that these computer-friendly files exist :-)


Best regards,

Kenneth

danielhfriese
Posts: 23
Joined: 09 Mar 2015, 15:28
First name(s): Daniel Henrik
Last name(s): Friese
Affiliation: Scientist
Country: Norway

Re: General considerations on printing of results

Post by danielhfriese » 29 Jul 2015, 09:28

Then I would ask whether the circumstances I described are really a matter of computer-friendliness. Since they concern the postprocessing performed by the user (using a computer anyway), I would also consider them a matter of user-friendliness. But of course it would be best to have outputs that are both human-readable and computer-readable.

kennethruud
Posts: 241
Joined: 27 Aug 2013, 16:42
First name(s): Kenneth
Last name(s): Ruud
Affiliation: UiT The Arctic University of Norway
Country: Norway

Re: General considerations on printing of results

Post by kennethruud » 29 Jul 2015, 09:38

I consider this a problem of computer-friendliness: you would not really care whether the data is easy for a human to read, and you do not need explanations of what is printed (except in a manual); you need a file that is easy to process, e.g., with scripting languages. What you print in the regular output (e.g., omitting zero elements) is a different matter.
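The distinction can be sketched in a few lines of Python, assuming a hypothetical in-memory list of `(pert 1, pert 2, value)` tuples: the same data is written once to a computer-friendly file (every element, one self-describing line, full precision) and once to a human-friendly table that omits zero elements. `io.StringIO` stands in for real files; the function names, the zero threshold and the zero element are made up for illustration.

```python
import io
from typing import List, TextIO, Tuple

Element = Tuple[str, str, float]   # (pert 1, pert 2, value); hypothetical data model

def write_results(elements: List[Element], results: TextIO) -> None:
    """Computer-friendly file: every element, including zeros, at full precision."""
    for p1, p2, value in elements:
        results.write(f"{p1} {p2} {value!r}\n")

def write_output(elements: List[Element], out: TextIO, zero_tol: float = 1e-10) -> None:
    """Human-friendly table: fixed width, six decimals, zero elements omitted."""
    out.write(f"{'pert 1':<10}{'pert 2':<10}{'value':>14}\n")
    for p1, p2, value in elements:
        if abs(value) > zero_tol:
            out.write(f"{p1:<10}{p2:<10}{value:14.6f}\n")

# Illustrative data: the element quoted earlier in the thread plus a made-up zero.
elements = [("XDIPLEN", "XDIPLEN", 1.110576), ("XDIPLEN", "YDIPLEN", 0.0)]
results, out = io.StringIO(), io.StringIO()
write_results(elements, results)
write_output(elements, out)
print(results.getvalue())
print(out.getvalue())
```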

Kenneth

bast
Posts: 1197
Joined: 26 Aug 2013, 13:22
First name(s): Radovan
Last name(s): Bast
Affiliation: none
Country: Germany

Re: General considerations on printing of results

Post by bast » 29 Jul 2015, 09:40

hi,
in my understanding grep-ability is computer-friendliness. human-friendliness is grep-ability for the human eye,
and the human eye may not be happy about very long lines repeating something that could be stated
once above the block of numbers. such a block is easy to parse for the human eye but of course hard for the computer.
i support Kenneth (and Ulf Ekstrom, who has been advocating this within DIRAC for many years) in separating
computer-friendly output (IMO ideally in JSON or XML format) from human-friendly output.
human-friendly output can be generated from computer-friendly output, but not necessarily the other way around.
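A minimal Python sketch of generating human-friendly text from a computer-friendly record: the JSON layout below is an assumption made up for illustration (not an existing Dalton or DIRAC format), and `render_block` is a hypothetical name. The information shared by all elements is stated once above the block of numbers instead of being repeated on every line.

```python
import json

# Hypothetical JSON layout for one block of results; read in with a single call.
doc = json.loads("""
{"quantity": "second-order transition moment",
 "sym": 1, "state": 1, "freq": [0.064106, 0.064106],
 "elements": [{"pert": ["XDIPLEN", "XDIPLEN"], "value": 1.110576}]}
""")

def render_block(doc: dict) -> str:
    """Human-friendly text: common information once, then one row per element."""
    w1, w2 = doc["freq"]
    lines = [f"{doc['quantity']}, symmetry {doc['sym']}, state {doc['state']}",
             f"frequencies: {w1:.6f} {w2:.6f}",
             ""]
    for el in doc["elements"]:
        p1, p2 = el["pert"]
        lines.append(f"  {p1:<10}{p2:<10}{el['value']:14.6f}")
    return "\n".join(lines)

print(render_block(doc))
```

Going the other way would mean writing a parser for this particular layout, and the digits dropped by the six-decimal formatting could not be recovered.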
radovan

taylor
Posts: 525
Joined: 15 Oct 2013, 05:37
First name(s): Peter
Middle name(s): Robert
Last name(s): Taylor
Affiliation: Tianjin University
Country: China

Re: General considerations on printing of results

Post by taylor » 29 Jul 2015, 12:27

So far one of the most important aspects of post-processing output files has not been mentioned, and that is avoiding transcription errors. The more work that is done by hand, so to speak, the more chance for errors to creep in: the reductio ad absurdum here would be writing results down by hand while looking at the output file on the screen --- sooner or later... Charlie Bauschlicher and I used to press this point with some of our colleagues at NASA Ames years ago, for example. And even some computerized post-processing is not immune from issues: I recall using Excel (in about 1990, this was) to generate some higher-order polarizabilities from finite-field calculations, and discovering, by having to publish an erratum, that the number of digits Excel carried (about 11, at the time) was not enough to guarantee significance in our published values to better than a few percent.
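The precision point can be made concrete with a minimal Python sketch; it is not the original calculation. It assumes a made-up model energy E(F) = E0 - μF - αF²/2 - βF³/6 - γF⁴/24 with invented coefficients (atomic units), extracts γ from the fourth central difference of five field-dependent energies, and compares full double precision with energies rounded to 11 significant digits, the figure quoted above for the spreadsheet. All helper names are hypothetical.

```python
def field_energy(F, E0=-100.0, mu=0.5, alpha=10.0, beta=50.0, gamma=1000.0):
    """Model energy in a static field F; the coefficients are invented for illustration."""
    return E0 - mu * F - alpha * F**2 / 2 - beta * F**3 / 6 - gamma * F**4 / 24

def round_sig(x, digits):
    """Keep only `digits` significant digits, as a limited-precision tool would."""
    return float(f"{x:.{digits}g}")

def gamma_finite_field(energy, h):
    """E(2h) - 4E(h) + 6E(0) - 4E(-h) + E(-2h) = h**4 * E''''(0), exact for a quartic,
    and E''''(0) = -gamma for the expansion above."""
    d4 = energy(2 * h) - 4 * energy(h) + 6 * energy(0.0) - 4 * energy(-h) + energy(-2 * h)
    return -d4 / h**4

h = 0.005
exact = gamma_finite_field(field_energy, h)
rounded = gamma_finite_field(lambda F: round_sig(field_energy(F), 11), h)
print(f"full double precision: {exact:.4f}")
print(f"11 significant digits: {rounded:.4f}")
```

With these invented numbers the rounded energies shift γ by about 1.3%, because the fourth difference cancels almost all of the roughly 100 hartree in each energy; a smaller field step makes it worse.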

So I am a big fan of having a result file, not necessarily the actual output file, available for post-processing. As someone who is (far too) often travelling, I prefer an output file that is formatted for a modest-sized screen: for a long time we kept to a width of 80 characters, although that now seems to be violated in various places; not a problem on a big screen, but sometimes a nuisance on a laptop. But the "result" file to be post-processed could of course be much less constrained. I would not stop anyone from creating such a thing in JSON or XML, but in that case, unless there is a tool available to convert it to plain text, I would get no use from it. I don't care how modest the effort is to learn JSON or XML: there are (for instance) various aspects of quantum chemistry, or computational mathematics, or group theory that I don't know, and until I have learned those things I would not waste a microsecond learning XML and other mark-up formats. That's just my prioritization.

I emphasize plain text because this provides unmatched flexibility for the user (stressing the user, not the computer). While I think grep itself is a bit basic for this sort of thing, there are plenty of more advanced tools (my partner uses perl or, for simple tasks, awk for post-processing, for example). I myself have yet to find anything I need to do that cannot be done in Fortran90 (indeed, some musings on that might form the basis for next year's Pete's Annual Old-Fart Posting...), and so I have not bothered with anything else. People talk about the complexity of doing various things with it compared to other approaches, but this has never been a problem for me, and anyway that has led me to write a good deal of code that I reuse frequently in other programs.

So: a good result file, in plain text, to allow everyone to post-process it and to avoid transcription errors seems to me the most desirable path, although a result file in some more sophisticated format but which could be transformed into plain text with trivial effort would do as well.
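How small the effort of turning a structured result file into plain text can be is shown by this minimal Python sketch, assuming a hypothetical JSON layout and a hypothetical `to_plain_text` helper. It writes whitespace-separated lines that awk, perl or a small Fortran program can read, and uses `repr()`, which gives the shortest decimal string that reads back to the identical double, so no digits are lost on the way.

```python
import json

def to_plain_text(doc: dict) -> str:
    """Flatten a (hypothetical) JSON result record into whitespace-separated lines:
        sym state pert1 pert2 freq1 freq2 value
    Floats are written with repr() so that reading them back gives the same double."""
    w1, w2 = doc["freq"]
    rows = []
    for el in doc["elements"]:
        p1, p2 = el["pert"]
        fields = [doc["sym"], doc["state"], p1, p2, repr(w1), repr(w2), repr(el["value"])]
        rows.append(" ".join(str(f) for f in fields))
    return "\n".join(rows) + "\n"

doc = json.loads("""
{"sym": 1, "state": 1, "freq": [0.064106, 0.064106],
 "elements": [{"pert": ["XDIPLEN", "XDIPLEN"], "value": 1.110576}]}
""")
print(to_plain_text(doc), end="")   # 1 1 XDIPLEN XDIPLEN 0.064106 0.064106 1.110576
```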

Best regards
Pete

bast
Posts: 1197
Joined: 26 Aug 2013, 13:22
First name(s): Radovan
Last name(s): Bast
Affiliation: none
Country: Germany

Re: General considerations on printing of results

Post by bast » 29 Jul 2015, 12:56

dear Pete,
just a quick clarification on my point about JSON/XML: i am not suggesting that the user
should be presented with output in JSON format. i don't enjoy reading any of those formats myself
and much prefer the good old plain text.
it's just that JSON/XML can easily be mapped to data used in post-processing/visualization.
libraries exist and the data can then be read in with one-liners.
i support that the user output should be in plain text.
and i am looking forward to the F90 P.A.O.F. posting!
best greetings,
radovan
