Dalton on ARM (Raspberry Pi 3): low-hanging fruit

taylor
Posts: 526
Joined: 15 Oct 2013, 05:37
First name(s): Peter
Middle name(s): Robert
Last name(s): Taylor
Affiliation: Tianjin University
Country: China

Post by taylor » 31 Mar 2016, 11:21

I originally posted this to the Dalton developers' forum, but it may be
of use to others and I re-post it here in slightly modified form.
Partly because it was originally posted there it has something of a
different flavour (that is, even less gravitas than my usual postings
on the general forum) but I was not prepared to do a lot of editing.
So, having got Dalton running on a Raspberry Pi 3 I thought there might be
value in sharing my experiences with the larger community. If you
aren't interested just stop reading here.

The first point to be clear on is that the instructions here apply
only to the Raspberry Pi model 3, which is the latest incarnation. I
emphasize this because earlier models were 32-bit and/or software
floating-point, and I would guess that the performance, compared to
the model 3, would be rubbish. Second, the operating system on the
Pi, "Raspbian", is based on Debian Linux. This is one of my least
favourite flavours of Linux, although to the credit of the Pi mob, it
is at least not Ubuntu, which would likely have put me off buying it
in the first place. But Raspbian, like Ubuntu, thinks that somehow
the risks of the sysadmin screwing up can be reduced by hiding the
root user and letting suitably qualified users execute commands via
"sudo". I can accept the use of sudo in an environment where many
people have subsets of responsibilities for administering a system,
but if (a) there's only two of you, as is the case here, and (b) you
know what you're doing, or at least think you do, a pansy-ass solution
like sudo is just more characters to type. So at the earliest
opportunity we will do away with it here. This means that if you want
to reproduce what I have done and you haven't enabled root, you will
need to precede almost every command by sudo. And possibly type in
your user password to enable it on occasion. In addition, at one
point I wanted to enable mounting of filesystems from other machines
on the Pi using NFS, and the initial response from the system was "you
have to be root to do that", which meant it could not be done via
sudo without modifying the configuration of the latter.

Background: the model 3 Pi has a 64-bit 4-core 1.2GHz clock ARM chip
with (unfortunately) 1GB of RAM which cannot be expanded. This is
perhaps the biggest downside: it would be better off with 4GB. The
end-user supplies "external storage" in the form of an SD card. The
smallest capacity that makes sense is 8GB but a bit bigger makes more
sense. I happened to have a 128GB card lying around (as one does...)
and am using that. Since the Raspberry Pi Foundation has set and
maintained a price point of $35 for the top-of-the-line, currently the
model 3, I reflect that the Pi cost less than half of what I paid for
the 128GB SD card six months ago... For your $35 you also get 4
USB2.0 ports, a 100BaseT (i.e., not gigE) wired Ethernet port, Wi-Fi,
and Bluetooth. The maximum power drain is 12W, although if you aren't
using a lot of the peripherals it will be considerably less.

The easiest way to start is to install Raspbian either from an SD card
you buy with the Pi or by downloading it onto a formatted SD card
yourself. One tip if you go the latter course is that there is no
need to fart about with exFAT or any of that other crap designed to
extend the shelf-life of what was originally a 32-bit operating system
designed near Seattle if you have a card larger than VFAT will handle.
Instead, format a partition (up to 32GB in size) on the SD card as
VFAT and copy the Raspbian download to this partition. When you
insert the card into the Pi and install it will reformat it as ext4
and will expand that filesystem to the full size of the card by
default. The install for me goes completely seamlessly and the Pi
recognizes immediately our wired network, our wireless network, and my
USB combined micro keyboard/trackpad (Riitek: cost more than the Pi!)
without any further need for intervention.

The install is (at least as I see it) worryingly incomplete, but this
is easily fixed. First, to avoid any of that sudo crap in future
execute

Code: Select all

sudo passwd
which will let you define a root password so you can get serious. The
root account is enabled but the box ships with a scrambled random root
password, so you have to change it. (By the way, the box ships with a
user "pi" whose password is "raspberry": you might want to take a
moment at this point to make that a little more secure...). Then as
root

Code: Select all

apt update && apt upgrade
(which will probably come back with nothing since the downloaded
Raspbian is usually completely up-to-date) and then

Code: Select all

apt install emacs
which provides you now with a properly functioning system. You can
also use the graphical software installer if you log in via X,
although I thought this was a bit clumsy, unlike say Synaptic on a
regular Debian system. There are a few other things you may want to
fix at this stage, but they are not relevant to Dalton and I defer
them to a P.S. at the bottom of this posting.

You should find that gcc is installed but gfortran is not, so one
needs to install the latter. Irritatingly the package installer
installs the binary as /usr/bin/gfortran-4.9 --- you can either link this
directly to /usr/bin/gfortran or via /etc/alternatives. I chose the
latter, although I am rusty with all this because on our x86 boxes at home
we use modules to deal with versioning. Do not install the ATLAS
distribution that the package manager locates! This was built long
before the current chip was available and the performance will be
rubbish. If you are not sure of where you stand here, write a
twenty-liner code that calls DGEMM. Most likely, if you haven't
installed ATLAS, the link step will fail. But if it succeeds, run the
resulting binary. Most likely the best performance you will see will
be about 280 MFLOPS, which guarantees you have the prebuilt, useless
ATLAS.
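
For what it's worth, the "twenty-liner" mentioned above might look something like the sketch below. It only writes the Fortran source to a file. The file name, the matrix size n = 1000, and the compile line in the closing comment are assumptions for illustration and are not taken from the original post. The MFLOPS figure uses the usual count of about 2*n**3 floating-point operations for an n-by-n matrix multiply.

```shell
#!/bin/sh
# Sketch: write a small DGEMM timing program to a file.
# SRC is a hypothetical file name chosen for this example.
SRC="${SRC:-dgemm_time.f90}"

cat > "$SRC" <<'EOF'
program dgemm_time
  implicit none
  integer, parameter :: n = 1000
  double precision, allocatable :: a(:,:), b(:,:), c(:,:)
  integer(kind=8) :: t0, t1, rate
  double precision :: secs
  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  c = 0.0d0
  call system_clock(t0, rate)
  ! C := 1.0*A*B + 0.0*C, roughly 2*n**3 floating-point operations
  call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
  call system_clock(t1)
  secs = dble(t1 - t0) / dble(rate)
  print '(a,f10.1)', 'MFLOPS: ', 2.0d0 * dble(n)**3 / secs / 1.0d6
end program dgemm_time
EOF

echo "wrote $SRC"
# Assumed compile-and-run lines (library names depend on which BLAS you test):
#   gfortran -O2 dgemm_time.f90 -lf77blas -latlas -o dgemm_time
#   ./dgemm_time
```

If the link step fails with an unresolved dgemm_, no BLAS was found, which is the "good" outcome at this stage.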

While you are installing software you may as well do

Code: Select all

apt install cmake
because you'll need it for building Dalton.

Your first need is to turn off "CPU throttling", that is, idling the
CPU when it "thinks" it's not actually doing anything. Leaving
throttling enabled makes it impossible for ATLAS to do any
optimization, and in fact ATLAS won't even try to build when it
detects throttling. To disable throttling add the line

Code: Select all

force_turbo=1
to the file /boot/config.txt and reboot.
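
If you prefer to script that edit, a minimal sketch follows. The function name enable_force_turbo is made up for this example, and the demonstration runs on a scratch file so it can be tried without touching the real configuration. It does not handle a pre-existing force_turbo=0 line, which you would have to edit by hand.

```shell
#!/bin/sh
# Append force_turbo=1 to a config file unless that exact line is already there.
enable_force_turbo() {
    config="$1"
    if grep -q '^force_turbo=1$' "$config" 2>/dev/null; then
        echo "force_turbo=1 already set in $config"
    else
        echo 'force_turbo=1' >> "$config"
        echo "added force_turbo=1 to $config"
    fi
}

# Demonstration on a scratch file; the second call changes nothing.
scratch=$(mktemp)
enable_force_turbo "$scratch"
enable_force_turbo "$scratch"
rm -f "$scratch"

# On the Pi itself (as root), followed by a reboot:
#   enable_force_turbo /boot/config.txt
```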

Download ATLAS source from sourceforge and unpack it on the Pi.
Download the LAPACK source from netlib.org as a tgz and leave it
compressed --- the build knows how to deal with it. You will need to
pay attention to the following. Download also the file
http://math-atlas.sourceforge.net/fixes ... rchdef.tar and
untar it in a directory of your choice named for our purposes in what
follows as path_to_ARMHARDFPdir. Edit the file
ATLAS/CONFIG/src/atlcomp.txt and replace every occurrence of
"-mfloat-abi=softfp" with "-mfloat-abi=hard". There is no typo here,
by the way: it is not "hardfp", just "hard"! Then to build ATLAS you
should cd to the build directory and run their configure script with
at least the following options

Code: Select all

./configure -b 64 -D c -DATL_ARM_HARDFP=1 -Ss ADdir \
<path_to_ARMHARDFPdir> --prefix=<install_dir> \
--with-netlib-lapack-tarfile=<path_to_lapack.tgz>
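
The atlcomp.txt edit described before the configure command has to be done first, and it can be scripted rather than done in an editor. A sketch, assuming a hypothetical helper name patch_float_abi and using a made-up sample line for the demonstration (the real file contents are not reproduced here):

```shell
#!/bin/sh
# Replace every soft-float ABI flag with the hard-float one in the given file.
# A temporary file is used instead of "sed -i" so this works with any sed.
patch_float_abi() {
    sed 's/-mfloat-abi=softfp/-mfloat-abi=hard/g' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Demonstration on a scratch file with an invented compiler-flags line.
scratch=$(mktemp)
printf 'gcc -O2 -mfloat-abi=softfp -mfpu=vfpv3\n' > "$scratch"
patch_float_abi "$scratch"
cat "$scratch"    # prints: gcc -O2 -mfloat-abi=hard -mfpu=vfpv3
rm -f "$scratch"

# On the real source tree, before running configure:
#   patch_float_abi ATLAS/CONFIG/src/atlcomp.txt
```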
After that,

Code: Select all

make build
make check
make ptcheck
make time
make install
As usual the "make build" step takes some time, although I thought
rather less than the last time(s) I built it on x86 boxes (we
invariably use MKL these days on our x86s so I was a bit out of
practice). The "make time" step wants the clock frequency, which is
as I noted above 1200MHz. If you want to make your own performance
checks, go back to your wrapper that calls DGEMM. Unlike the anaemic
280MFLOPS obtained with ATLAS from the Raspbian repo, you should see
something around 1.4GFLOPS with the single-threaded BLAS (via
libf77blas.a) and about 5.2GFLOPS with the threaded libptf77blas.a. I
have not bothered building shared-object libraries, although you can,
but be aware that there are potential compromises then in the
optimization process during the build, and it may be that the
performance is not quite as good. As I said, I haven't tried.
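
To put the figures quoted above side by side, here is a small piece of arithmetic in shell. The numbers are the ones in the text (280 MFLOPS, 1.4 GFLOPS, 5.2 GFLOPS, 4 cores); the helper name ratio is just for this example.

```shell
#!/bin/sh
# Print a/b to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 1400 280     # self-built vs repo ATLAS, single thread: 5.00
ratio 5200 1400    # threaded vs single-threaded:             3.71
ratio 5200 5600    # fraction of ideal 4-core scaling:        0.93
```

So the self-built library is about five times faster than the packaged one, and threading across the four cores gives a little over 90% of ideal scaling on DGEMM.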

We are now ready for the final lap. I freely admit I still have not
come to terms with cmake (as I've been saying for several years, I
bought a book, but I haven't read it yet...) and I spent a certain
amount of time initially trying to hack things directly at Makefile
level (boy, those autogenerated Makefiles are something --- 400KB and
more...!). But in fact cmake (despite the implication in the
aforesaid book that it will not pick up the machine type correctly
because of the output of the uname command) works fine straight off,
provided you add "-mfloat-abi=hard -mcpu=cortex-a8 -mfpu=vfpv3" to the
setup step as both --extra-fc-flags and --extra-cc-flags. If you're
proposing also to have a go at LSDalton (which I haven't done, so this is
just one step on that road) you will need also to specify
those same flags in --extra-cxx-flags. You can then set about the
build in the usual way with

Code: Select all

cd build
make
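
The setup step referred to above, which comes before the cd build, might be written as in the sketch below. It assumes the setup script is called ./setup and sits at the top of the Dalton source tree, and it passes only the two flag options named in the text. Any other options your site needs are not shown.

```shell
#!/bin/sh
# ARM flags quoted in the text, passed to both the Fortran and C compilers.
ARM_FLAGS="-mfloat-abi=hard -mcpu=cortex-a8 -mfpu=vfpv3"

if [ -x ./setup ]; then
    ./setup --extra-fc-flags="$ARM_FLAGS" --extra-cc-flags="$ARM_FLAGS"
else
    # Assumption: the script lives at the top of the source tree.
    echo "no ./setup here; run this from the top of the Dalton source tree" >&2
fi
```

For LSDalton you would add --extra-cxx-flags="$ARM_FLAGS" in the same way.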
Obviously you need to make sure you are eventually going to link
against your sparkling new ATLAS libraries (see below), but before
that if your software is the same version as mine you will find
repeated problems trying to compile the C routines in the dft
directory, at least with optimization at -O3. Some produce an
internal compiler error, some seem to result in the compiler going into
an infinite loop (even on a 1.2 GHz ARM chip it seems unlikely that
compiling a few thousand lines of C should require more than 30
minutes, which was when my patience ran out). I ended up building all
of the DFT C routines "by hand" with -O1 instead of -O3 in the gcc
invocation, but Rob De Remigio posted a much more elegant way to do
this by defining additional variables to cmake and if you want that let
me know.

I referred above to the link step. Even if setup finds your built
ATLAS (as it did mine) it produces a link command that specifies
libraries in the "wrong" order (recall libraries are only searched
when encountered on the link command and are not revisited
subsequently). As a result you get a shitload of unsatisfied
externals which are easily resolved by reentering the link line (which
I did by hand) but ensuring that libraries are visited in the order
(I had set MATH_ROOT)

Code: Select all

$MATH_ROOT/liblapack.a $MATH_ROOT/libf77blas.a $MATH_ROOT/libatlas.a -lpthread -lm
or replace libf77blas.a with libptf77blas.a if you want the threaded
version.
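
Since the order matters, it can help to generate the library list rather than retype it. A small sketch: the function name atlas_link_libs and the fallback directory /opt/atlas/lib are hypothetical, and the library order is the one given above.

```shell
#!/bin/sh
# Print the ATLAS libraries in the order the linker needs them.
# $1 = directory holding the libraries, $2 = "threaded" for libptf77blas.a
atlas_link_libs() {
    root="$1"
    if [ "${2:-}" = "threaded" ]; then
        blas="libptf77blas.a"
    else
        blas="libf77blas.a"
    fi
    echo "$root/liblapack.a $root/$blas $root/libatlas.a -lpthread -lm"
}

# /opt/atlas/lib is only a placeholder used when MATH_ROOT is not set.
atlas_link_libs "${MATH_ROOT:-/opt/atlas/lib}"
atlas_link_libs "${MATH_ROOT:-/opt/atlas/lib}" threaded
```

The output can then be pasted onto the end of the link line after the object files.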
You should end up with a functioning dalton.x. Run the test
suite with

Code: Select all

make test
and wait. For a while... But for me virtually everything passed
except for a few timeouts. Only one test (choact_mp2) actually
failed.

For this expenditure of effort (and it took me a couple of days but
hopefully with this handy cut-out guide it will only take you a few
hours, excluding the time ATLAS is building, and that you can always
spend in the pub) you get a $35 computer that runs at about 17-25% of
a typical desktop on Dalton, on a core-to-core comparison
(apples-to-apples --- see what I did there?), but I note that x86 MKL
performance, especially threaded, is well ahead of even threaded ATLAS
on the Pi. The Dalton performance varies significantly: the entire
test suite takes about six times as long as on one of our x86s, but on more
"realistic" tests, that is, larger than the test cases, I have seen
20-25% of the x86 performance. So no speed demon, but, nevertheless,
$35. It costs us more than that to get take-out from one of our local
Thai restaurants, and that's just food, not drink cost, and we're not
big eaters! But more importantly, it's not just the cost. This is
one path (recall I said above that the total power draw with all ports
in use was only 12W) that we may have to go down to solve the
FLOPS/watt problem that makes it highly unlikely that exascale can be
reached via x86. Another path is of course Xeon Phi/GPUs, but at this
point I don't think anyone can call a decision on this. And if the
above works for ARM, something analogous, and, given the common
software platform, likely easier, will work for Intel Atom. So even
if you think this is all irrelevant, well, in a few years' time you
may be facing it directly...

Best regards
Pete

P.S. I mentioned up at the top one or two other things you might want
to do with your Pi. It comes prepared to "decide" whether to route
audio through the HDMI port or the minijack automagically. This is
probably (certainly, in my case) not what you want, and can be changed
using the command raspi-config, which looks a lot like the old-school
Red Hat installer. The audio is under the tab Advanced Options. I
also dislike XDM-style logins and this is easily disabled in the usual
systemctl way by changing the default target. As in modern distros
(sigh) invoking an X session actually sits on top of the virtual
console you were on instead of the one six locations further up, so if
you are on VT1 and do startx it opens on VT1, not VT7 as God intended.
This is not a major issue if you are using a regular keyboard, but it
caused some complications for me with the micro keyboard, because
unfortunately the "Alt" key on that produces a right-Alt, not
left-Alt, keycode, and so Ctrl-Alt-Fn does not work. I have to use
chvt to switch virtual consoles and then you better know that your X
session is not on VT7, because if you chvt to VT7, well you're dead...
Rebooting by powering off and on seems the only option in that case.

Also, when I bought my Pi I did not pay enough attention to the video
side of things. The board has an HDMI port and I simply went with the
default, which is an HDMI cable both ends. I guess this is OK if you
have an HDMI socket on a TV and want to use it as a display, but we don't.
I had to nip round the corner to the nearest computer store and buy an
HDMI/DVI adapter to use it with a regular computer-style display:
the adapter cost me more than half the cost of the Pi!
If you want to use it with a monitor, get an HDMI to DVI or an HDMI to
VGA cable in the first place. Most Pi vendors have all these things and
they are cheaper that way.
