Python for Astrostatistics - Linux Installation

Note: These instructions presume you are logged in as "root" on your machine (e.g., via "su -"). It is possible to install this software as a non-root user, in your home directory, and adjust your environment variables so it will run from there. I haven't followed this procedure myself and cannot help with it here.
See the Python Statistical Computing Essentials page if you'd like to learn more about this software before or after installation.

Install Python

Most linux distributions come with a version of Python installed; many use it as part of their package management systems. The distributed version usually lags the current version by a minor version number. The current Python version is 2.4.2 (i.e., minor version "4"). You need at least 2.3 to run the software we'll use. Even if you have a 2.3.x version, upgrading to a 2.4.x version is usually so painless that you might as well do it. You can probably find a binary package for your platform that you can install very simply with a package manager like rpm, yum, or apt. The download page may help you locate a package. It also hosts the source distribution. On a fast machine, it builds very quickly (10 min), via the standard "./configure - make - make install" process; the distribution comes with detailed instructions (including how to run a test suite before the install step).
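To see whether the Python you already have meets the 2.3 minimum mentioned above, something like the following could be run with the interpreter in question. This is only a sketch: the helper name meets_minimum is made up for illustration, and the threshold simply repeats the figure given in the text.

```python
import sys

# Minimum version named in the text above; raise it if your packages need more.
MINIMUM = (2, 3)


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.version_info describes the interpreter running this script.
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Meets minimum: %s" % meets_minimum(sys.version_info))
```

Run it once with each "python" you have (for example /usr/bin/python and /usr/local/bin/python) to see which ones qualify.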

If your linux has an installed Python, do not overwrite it with a new version; the OS may depend on that particular installation. The OS typically installs Python in /usr/bin (e.g., /usr/bin/python for Red Hat and Fedora Core). The source distribution's installer puts the new version in /usr/local/bin by default, so it will not overwrite the OS version. This is typically true of binary packages as well. To make sure you actually use the new version, check that /usr/local/bin comes before /usr/bin in your shell's PATH variable.
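The PATH ordering can be checked by eye with "echo $PATH", or with a short script like the one below. It is a sketch written for a current Python 3 (shutil.which does not exist in the 2.x versions discussed here), and the function names are hypothetical.

```python
import os
import shutil


def first_index(path_value, directory):
    """Position of `directory` in a PATH-style string, or None if absent."""
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    target = os.path.normpath(directory)
    return entries.index(target) if target in entries else None


def local_before_system(path_value, local="/usr/local/bin", system="/usr/bin"):
    """True if `local` is on the path and is searched before `system`."""
    i_local = first_index(path_value, local)
    i_system = first_index(path_value, system)
    if i_local is None:
        return False
    return i_system is None or i_local < i_system


if __name__ == "__main__":
    print("local first: %s" % local_before_system(os.environ.get("PATH", "")))
    # The "python" your shell would actually start (None if none is found).
    print("python found at: %s" % shutil.which("python"))
```

If the first line prints False, put /usr/local/bin ahead of /usr/bin in your shell startup file and open a new shell.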

An option worth considering is ActiveState's ActivePython, a commercially-supported but free Python binary installation available for many platforms. Besides Python and the standard library, it also includes some extra 3rd-party packages, and a collection of documentation. It is very easy to install. Unfortunately, due to license issues, it does not include readline, a module facilitating editing at the interactive Python prompt via the GNU readline package. You will really miss this if you don't install it; you'll have to locate a copy of the binary library for your platform and add it to ActivePython to get this useful capability.

Prepare for Package Installation

You will need to be able to install Python packages from source; that is, you will run (simple!) commands that compile C, C++, and Fortran source code. The latest versions of some linux distributions have made a major upgrade from gcc-3.x to gcc-4. The earliest versions of gcc-4 had bugs that prevent proper operation of some Python software (and much other software). If you have the latest gcc-4 version you may be fine, but you've been forewarned. Such distributions usually also include gcc-3.x, so if you want to play it safe, make 3.x (for whatever "x" you have) the default gcc. Exactly how to do this varies with the distribution; in many cases you can select the earlier version by defining an environment variable. Search the web for info for your platform. For example, here is a web page addressing gcc selection for Fedora Core 4 and SuSE 10.0.
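To find out which gcc your shell uses by default, you can run "gcc -dumpversion" by hand, or wrap it in a small script as sketched below. The function names are illustrative only, and the script assumes a current Python 3 with the compiler reachable as "gcc" on your PATH.

```python
import subprocess


def parse_dumpversion(text):
    """Turn `gcc -dumpversion` output such as '4.0.2' into (major, minor)."""
    parts = text.strip().split(".")
    try:
        major = int(parts[0])
        # Some gcc builds print only the major number; treat minor as 0 then.
        minor = int(parts[1]) if len(parts) > 1 else 0
    except ValueError:
        return None
    return major, minor


def default_gcc_version(command="gcc"):
    """Version of the compiler found on PATH, or None if it cannot be run."""
    try:
        output = subprocess.check_output([command, "-dumpversion"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_dumpversion(output.decode("ascii", "replace"))


if __name__ == "__main__":
    version = default_gcc_version()
    if version is None:
        print("No usable gcc found on PATH")
    else:
        print("Default gcc is %d.%d" % version)
```

A result of (4, 0) would be the case the paragraph above warns about.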

For the best performance, consider installing the ATLAS linear algebra library and the FFTW Fourier transform library. ATLAS is large and complicated to build from source. If you can find a binary for your platform, install it. Install it from source only if you feel a bit courageous (it's not difficult, but it may require some nontrivial interaction with the install process). FFTW is easy to install; use version 2, not the new version 3. NumPy/SciPy will work without these libraries, but they can speed some calculations up by significant factors if present.

You will soon be editing files containing Python source code. It will be handy to use an editor with a Python mode. There are many choices; some that I have used are xemacs, gedit (both included with many linux distributions), jEdit, and Eclipse with its pydev plugin (huge but feature-filled; good for organizing multi-file projects). If you are an emacs/xemacs fan, there is a useful Python mode for emacs. The IPython page hosts a widely-used, self-contained version: python-mode.el. You might also consider a newer, multi-file version that offers more functionality, just released by the python-mode project.

The matplotlib plotting package needs a GUI tool set to manage its plotting window. Two popular choices on linux are Tcl/Tk and GTK. Make sure you have recent installs of Tcl/Tk and GTK2+ on your machine (they needn't be the very latest versions). If you'd like to use matplotlib's GTK backend, you'll also need to install PyGTK, Python's GTK interface. A recent version of PyGTK is included in the tarball mentioned below; see the instructions below for installation.

The matplotlib package also requires the freetype (v. 2.1.7 or later), libpng, and zlib libraries. Most linux distributions ship with these. Make sure you installed the developer version of each with your package manager (e.g., freetype-devel). More info on matplotlib's requirements is at the matplotlib installation page.
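As a rough first check that these libraries are present, the standard library's ctypes.util.find_library can be asked to locate them. This sketch assumes a current Python (ctypes arrived in Python 2.5), and it only detects the shared libraries; it cannot tell you whether the developer headers (the "-devel" packages) are installed, so treat a hit as necessary rather than sufficient.

```python
from ctypes.util import find_library

# Short names as a linker would use them: libfreetype, libpng, libz.
REQUIRED = ("freetype", "png", "z")


def locate_libraries(names=REQUIRED):
    """Map each library name to what find_library reports, or None if not found."""
    return dict((name, find_library(name)) for name in names)


if __name__ == "__main__":
    for name, found in sorted(locate_libraries().items()):
        print("%-10s %s" % (name, found if found else "NOT FOUND"))
```

Your package manager remains the authoritative source for what is installed.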

Install Packages for Scientific Computing

For some packages, you should be using a recent developer snapshot from the developer SVN or CVS archive, rather than the last official release, due to many recent innovations and bugfixes. I've packaged all the releases in a single gzipped tarball. Download it and unpack it ("tar zxf samsi-dist.tar.gz") somewhere on your hard drive where you have substantial free space (300 MB or so). This will create a directory "samsi-dist" containing all the software. Then, in the following order, in a Terminal window, "cd" into the following directories and run the following install commands. The numbers in brackets indicate the [minutes:seconds] the command took on my dual 2 GHz processor workstation, to give you some idea of what to expect. Commands with no brackets run very quickly. You may delete each directory after the installation if you'd like to free the space (if you keep them, you can update the contents later via SVN or CVS and rebuild). You can safely ignore the many innocuous compiler warnings that will appear during some install procedures (they're warnings, not errors; real errors will abort the install process).
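The "tar zxf" command above is all you need, but the same unpacking step, with a check for the roughly 300 MB of free space, could also be scripted as sketched here. The function names are hypothetical and the code assumes a current Python 3 (shutil.disk_usage is not available in 2.x).

```python
import os
import shutil
import tarfile

NEEDED_BYTES = 300 * 1024 * 1024  # "300 MB or so", per the text


def has_room(directory, needed=NEEDED_BYTES):
    """True if the filesystem holding `directory` has at least `needed` bytes free."""
    return shutil.disk_usage(directory).free >= needed


def unpack(tarball, destination):
    """Extract a gzipped tarball into `destination`; return what it now contains."""
    with tarfile.open(tarball, "r:gz") as archive:
        archive.extractall(destination)
    return sorted(os.listdir(destination))


if __name__ == "__main__":
    here = os.getcwd()
    if not has_room(here):
        print("Less than 300 MB free in %s" % here)
    elif os.path.exists("samsi-dist.tar.gz"):
        print(unpack("samsi-dist.tar.gz", here))
    else:
        print("samsi-dist.tar.gz is not in %s" % here)
```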

If you have trouble with the installation, you can ask for help on one of the following email support lists. Please contact me for help only as a last resort, or if you think there is a mistake in these instructions; I will be in very limited email contact until the CASt/SAMSI School is over.

Some packages come with test suites that you can run to verify the installation. I describe some below. I recommend you keep a separate shell open in which to run the tests. This will ensure that Python is finding the package installed in its site-packages directory, and not in the distribution directory. (Most distributions can't confuse Python this way, but better safe than sorry.)
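One way to confirm which copy of a package Python is picking up is to look at where the module file lives. The sketch below assumes a current Python 3 (importlib.util is not in 2.x, where printing a module's __file__ attribute serves the same purpose). The helper names are made up, and "dist-packages" is included only because some Debian-style systems use that name.

```python
import importlib.util
import os


def module_origin(name):
    """Path of the file Python would import for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def from_site_packages(path):
    """True if `path` lies under a site-packages (or dist-packages) directory."""
    if not path:
        return False
    parts = os.path.normpath(path).split(os.sep)
    return "site-packages" in parts or "dist-packages" in parts


if __name__ == "__main__":
    for name in ("numpy", "scipy"):
        origin = module_origin(name)
        print("%-6s %s (installed copy: %s)"
              % (name, origin, from_site_packages(origin)))
```

Run it from your home directory, in the separate shell, so the current directory cannot shadow the installed package.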

Now to the package installations, by directory name under the unpacked distribution:


The NumPy build is quite fast and will get you the latest bugfixes. If you dread building from source, RPM and EXE packages of the last official release are available at the SourceForge NumPy site.

To build from source (in the numpy directory):

python setup.py install [0:49]

If you'd like to verify the installation, try this in your home directory (">>>" is the Python prompt; hit Ctrl-D to quit Python at the end):

>>> import numpy
>>> numpy.__version__
>>> numpy.test(1,1)

It will print out a lot of information, including several lines of periods. Each "." indicates a passed test. There may be a few warnings, errors or failures among the ~200 tests (they are being sorted out and won't affect our basic use). On my FC3 machine all tests pass with this numpy release.


Work on binary packages of SciPy is in progress; you might find one for your platform at the SciPy download page. But if you installed NumPy from source, it's safest to install SciPy from source, as follows:

python setup.py install [4:44]

Test with:

>>> import scipy
>>> scipy.__version__
>>> scipy.test()

For a lengthier suite of tests, you may also try "scipy.test(10)". Around 1000 tests will be run; as above, you can ignore a small number of failures for now (all tests pass on my machine). Many of these test the random number generators. By the very nature of randomness, one of them will occasionally fail at the 0.1% level, given the large number performed. So if any failures concern you, run the tests again to see if the failure recurs.
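To get a feel for how often such a chance failure should turn up, here is a back-of-the-envelope calculation. It assumes the random-number tests are independent and that each one fails by chance with probability 0.001. The test counts in the loop are hypothetical, since the actual number of such tests is not stated here.

```python
def chance_of_any_failure(n_tests, level=0.001):
    """Probability that at least one of n independent tests fails by chance."""
    return 1.0 - (1.0 - level) ** n_tests


if __name__ == "__main__":
    # Illustrative counts only; the real number of random-number tests may differ.
    for n in (10, 100, 500):
        print("%4d tests: %.1f%% chance of a spurious failure"
              % (n, 100.0 * chance_of_any_failure(n)))
```

With 100 such tests the chance of seeing at least one spurious failure in a run is a little under 10%, which is why rerunning the suite is a reasonable response.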


python setup.py install --gencode [1:46]

Test with:

>>> import numarray.testall as testall
>>> testall.test()


The included PyGTK is a few versions behind the current version, to suit users who don't have the latest version of GTK2+. If you keep your GTK up-to-date and want the latest support, visit the PyGTK web site for the latest version of PyGTK. This is the biggest package in this series of installations, and you do not need it if you'll be happy with Tcl/Tk as your plotting GUI. Since it is optional, you'll find it as a gzipped tar archive that you'll have to unpack. Then in the pygtk-2.4.1 directory:

./configure --prefix=/usr/local  [10:08]
make [50m]
make install [3m]


python setup.py build [2:21]
python setup.py install

Test matplotlib by going into its "examples" directory and plotting one of the examples. Try:

cd examples

In the course of making your first plot, matplotlib will create a ".matplotlib" directory in your home directory where it can access a startup profile and cache various information. You will likely see a bunch of warnings referring to obscure fonts it cannot find as it tries to build a font cache; you can safely ignore these! You should soon see a plot; when you close it, the Python command will terminate.

At this point, return to the install directory, where you'll find a default startup profile, "matplotlibrc". Copy this to your .matplotlib directory:

cd ..
cp matplotlibrc ~/.matplotlib

Use your favorite text editor to edit the copy in your .matplotlib directory (not the original!). About 30 lines down from the top you should find two lines that look like this:

backend      : TkAgg  
numerix      : numpy  # numpy, Numeric or numarray

Make sure the first line reflects your desired backend (Tcl/Tk with Agg antialiasing here; use GTKAgg if you installed PyGTK), and that the second line says ": numpy" as shown above, telling matplotlib to use numpy arrays.
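After editing, you may want to double-check what the file actually says. The sketch below reads "key : value" lines of the kind shown above and reports the two settings of interest. It is not matplotlib's own parser: the function name is made up, and stripping everything after "#" is a simplification that suits these two lines but not necessarily every setting in the file.

```python
import os


def read_rc_settings(text):
    """Parse 'key : value  # comment' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    rc_path = os.path.expanduser("~/.matplotlib/matplotlibrc")
    if os.path.exists(rc_path):
        with open(rc_path) as handle:
            settings = read_rc_settings(handle.read())
        print("backend: %s" % settings.get("backend"))
        print("numerix: %s" % settings.get("numerix"))
    else:
        print("No matplotlibrc at %s" % rc_path)
```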


If you'd rather install IPython via a standard package, packages for many platforms are available at the IPython web site. Building from source is easy enough:

python setup.py install

The first time you run IPython (just type "ipython" at the shell), it will create a ".ipython" directory in your home directory (and warn you about it). It installs a set of default startup profiles there that you may edit to customize how IPython behaves.

One of IPython's profiles is intended for use with SciPy and is invoked with "ipython -p scipy". However, it tries to do some "magic" tailored to the last official release and won't work with the latest version. So just use plain "ipython" for now, or write your own profile that automatically imports scipy.


python setup.py install

If you'd like to work through the STScI Data Analysis With Python Tutorial, you should also install the DS9 image viewer (binaries for all major platforms are at the DS9 web site) and the following Python package from STScI, included in the tarball:


python setup.py install

SAMSI extras

Finally, copy the "samsi-for-home.tar.gz" file to your home directory and unpack it there (not as root!). It will create a "samsi" directory that includes a copy of the matplotlib examples directory (a good source of plotting recipes), and copies of some basic documentation: the official Python Tutorial and Standard Library docs, the IPython manual, the matplotlib tutorial, and the STScI Python Data Analysis Tutorial.


The numpy, scipy, and numarray installations are from current SVN or CVS developer archive revisions. There will be solid official releases of all of these packages quite soon, including binary installers for several platforms; check the web sites (linked on the Essentials page) for news about official releases. If you are ambitious (courageous?) and wish to stay on the "bleeding edge," do not delete the unpacked directories after installing the software. As you wish, update them via SVN or CVS as follows (execute these in your shell in the directory right above the package directory), and re-install the packages.

NOTE: Before you reinstall a developer snapshot, which may have significant internal reorganization, you should manually remove all results of an earlier installation. To do this, delete the "build" folder within the package's source directory (the installer creates this to store files during the installation process). Also, delete the installed Python package in your Python's "site-packages" directory. This is a directory with the same name as the package, located here on my machine:


10 January 2006 — Tom Loredo