StatPy: Statistical Computing with Python

Welcome to StatPy, a collection of resources to help you do statistical computing with Python, with a special emphasis on astrostatistics (statistics in astronomy). This web site is brand-spanking-new, and still very much under construction; please be patient with our "dust" and check back again frequently as building continues.

Python is an interpreted, interactive, object-oriented programming language. It is a "very high level language" or "scripting language," often compared to Perl, Tcl, Scheme, and Java. It combines remarkable power with a very clear and simple syntax that resembles the "pseudocode" one might use to describe the essentials of an algorithm. Implementations are available for free on all major (and many minor) computing platforms. Python is an "open source" project with a very liberal license.

Why Consider Python?
  • What is Python?
  • Python is a good general purpose language
  • Python is a good language for numerical computing
  • Python is a good language for education
  • Python's limitations
Getting Started
Going Further
  • Resources for numerical computing
    • SciPy.org (Hosted by Enthought.com; Travis Oliphant is the lead maintainer)

      A recent project, SciPy is a unified collection of open source libraries adding scientific computing capability to Python. "SciPy supplements the popular Numeric module, gathering a variety of high level science and engineering modules together as a single package. Within SciPy are modules for graphics and plotting, optimization, integration, special functions, signal and image processing, genetic algorithms, ODE solvers, and others. There is also an experimental "compiler" that takes a Numeric array expression in Python and compiles it to C++ code on the fly. SciPy is developed concurrently on both Linux and Windows. It has also been compiled successfully on Sun, and should port to most other platforms where Python is available." Some of the separate packages listed below have been incorporated into SciPy.

    • Modules to enhance numerical Python (Travis Oliphant)

      This site offers a variety of extremely useful Python modules and extensions for scientific computing. Most are Python interfaces to widely-used and well-tested C or FORTRAN libraries. Problem areas addressed include: sparse matrices (interfaces to SPARSEKIT2 and SuperLU); special functions (an interface to CEPHES); FFTs (an interface to FFTW, complementing the FFT capability already in NumPy); and signal processing (convolutions and filters). There is also a "Multipack" module with interfaces to selected algorithms from the ODEPACK, QUADPACK, and MINPACK libraries, and pure Python modules with selected algorithms for optimization, Gaussian quadrature, and orthogonal polynomials. The extensions at this site are distributed as source for UNIX-like platforms, and RPMs for Linux platforms. No Windows or MacOS ports are currently available.

    • ScientificPython (Konrad Hinsen)

      This package contains modules that implement basic geometry (vectors, tensors, transformations, vector and tensor fields), quaternions, automatic derivatives, (linear) interpolation, polynomials, elementary statistics, nonlinear least-squares fits, unit calculations, Fortran-compatible text formatting, 3D visualization via VRML, and two Tk widgets for simple line plots and 3D wireframe models.

    • MatPy (Huaiyu Zhu, admin)

      MatPy is a Python package for numerical computation and plotting using a MatLab-like interface to the NumPy package, Oliphant's CEPHES interface (for special functions), and the Python GnuPlot interface.

    • Matfunc (Raymond Hettinger)

      Matfunc provides pure Python modules (no C extensions) for elementwise operations, matrix operations, and various types of curve fitting (polynomial, rational functions, etc.).

    • Global Arrays Python Interface (Robert Harrison)

      The pyGA module provides a Python interface to the C Global Arrays (GA) library. GA is a public-domain numerically-oriented, portable, parallel programming environment including distributed shared-memory with both one-sided and collective operations, message passing, and interfaces to parallel BLAS and linear algebra packages. It runs on a wide variety of parallel processing platforms.

    • Sparse matrix packages:
    • Cassowary (Greg Badros)

      C++ constraint-solving toolkit (linear systems with equalities and inequalities), with a Python interface.

    • Simple Recipes in Python (William Park)

      Pure Python translations of selected algorithms from Numerical Recipes by Press et al. Includes elementary functions, polynomial operations, 1-D zero finding, Simpson's rule quadrature, vector operations, and FFT-based operations.
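
To give a flavor of what a pure Python numerical routine of this kind looks like, here is a minimal sketch of composite Simpson's rule quadrature. It is not taken from any of the packages listed above; the function name `simpson` and its parameters are assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with composite Simpson's rule.

    n is the number of subintervals; it must be a positive even integer.
    (Hypothetical helper for illustration, not part of any listed package.)
    """
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    # The endpoints carry weight 1; interior points alternate weights 4, 2, 4, ...
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * f(a + i * h)
    return total * h / 3
```

For example, `simpson(math.sin, 0.0, math.pi)` should come out very close to 2. For large problems, the array-based routines in the packages above are likely to be much faster than a pure Python loop like this one.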

  • Resources for statistical computing
    • stats.py (Gary Strangman)

      A collection of statistical functions, ranging from descriptive statistics (mean, median, histograms, variance, skew, kurtosis, etc.) to inferential statistics (t-tests, F-tests, chi-square, etc.). The functions are defined for operation on lists and, if Numeric is installed, also defined for array arguments.

    • ScientificPython (Konrad Hinsen)

      Hinsen's package includes modules implementing elementary statistical procedures (calculation of moments, correlation, and median; histograms) and nonlinear least-squares fitting.

    • odr.py -- Orthogonal Distance Regression (Robert Kern)

      This package wraps the Fortran-77 ODRPACK library containing routines for performing a large variety of least-squares regressions with an efficient trust-region algorithm.

    • R/SPlus-Python Interface (Omegahat Project)

      R and SPlus are data analysis packages that are very popular among statisticians; there are many R/SPlus packages written by statisticians implementing sophisticated methods. R is an open source package based on the commercial SPlus package. This interface allows Python code to call R functions, and R code to create Python objects and call Python functions and methods. "This allows Python programmers unfamiliar with the syntax of R to easily use its functionality and vice versa. It also allows data to be manipulated using Pythons tools and then passed to R's rich graphical and statistical tools."
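
As a rough illustration of the kind of descriptive statistics these packages provide, the sketch below computes a few summary quantities for a plain Python list. It does not reproduce the interface of stats.py or ScientificPython; the function name `describe` and the dictionary it returns are assumptions made for this example.

```python
import math


def describe(data):
    """Return basic descriptive statistics for a sequence of numbers.

    Hypothetical helper for illustration only. The variance is the unbiased
    sample variance (divisor n - 1); the skew is the moment-based estimate
    m3 / m2**1.5 built from central moments with divisor n.
    """
    values = sorted(float(x) for x in data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    mid = n // 2
    median = values[mid] if n % 2 else 0.5 * (values[mid - 1] + values[mid])
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    variance = m2 * n / (n - 1)
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "variance": variance,
        "stddev": math.sqrt(variance),
        "skew": skew,
    }
```

Calling `describe([1, 2, 3, 4, 5])` gives a mean and median of 3, a sample variance of 2.5, and zero skew. The packages listed above cover far more ground, including the inferential tests mentioned in their descriptions.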

  • Storing and retrieving numerical data
    • FITS (Flexible Image Transport System) format (the standard for astronomical data)
      • PyFITS (Paul Barrett/STScI)

        An object-oriented, easy-to-use interface to the FITS file format. Under development, but currently has good handling of headers and binary tables.

      • pCFITSIO (Norbert Pirzkal/ESO)

        Python access to most of the functions in the CFITSIO FITS file format library, produced using SWIG. Includes a FITSio module with a simplified Python interface to the functions.

      • Qfits (ESO/Nicolas Devillard)

        qfits is a stand-alone library written in ANSI C, that takes care of the most usual stuff you want to do with FITS files. It offers very fast keyword queries in FITS headers through the use of the mmap() system call on Unix. It can be compiled as a Python extension, providing full functionality via Python.

      • FITS (Andrew Williams)

        A FITS file class for Python. Currently only supports reading and parsing FITS header cards.
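
For readers new to FITS, the sketch below shows what "reading and parsing header cards" involves. A FITS header is a sequence of 80-character cards, with the keyword in columns 1-8 and, for valued keywords, the characters "= " in columns 9-10. The code is a simplified illustration, not the interface of any package listed above; `parse_card` and `_convert` are hypothetical names, and features such as CONTINUE cards and complex values are ignored.

```python
def _convert(text):
    """Convert an unquoted FITS value field to a Python object (simplified)."""
    if text == "T":
        return True
    if text == "F":
        return False
    if text == "":
        return None
    try:
        return int(text)
    except ValueError:
        pass
    try:
        # FITS allows a Fortran-style 'D' exponent, e.g. 1.5D+2.
        return float(text.replace("D", "E"))
    except ValueError:
        return text


def parse_card(card):
    """Split one FITS header card into (keyword, value, comment)."""
    card = card.ljust(80)[:80]
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, blank) and END carry no value.
        return keyword, None, card[8:].strip()
    body = card[10:].lstrip()
    if body.startswith("'"):
        # Quoted string: a doubled quote ('') stands for a literal quote.
        chars = []
        i = 1
        while i < len(body):
            if body[i] == "'":
                if body[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(body[i])
            i += 1
        value = "".join(chars).rstrip()
        remainder = body[i + 1:]
    else:
        raw, slash, after = body.partition("/")
        value = _convert(raw.strip())
        remainder = slash + after
    # Anything after the first remaining slash is the comment.
    comment = remainder.partition("/")[2].strip()
    return keyword, value, comment
```

With this sketch, a card such as `NAXIS1  =                  512` parses to the keyword `NAXIS1` with the integer value 512 and an empty comment. The packages above handle the full standard and should be preferred for real data.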

    • netCDF format
      • nc (Bill Noon)

        A Python module for accessing data in netCDF format.

      • ScientificPython (Konrad Hinsen)

        Among its many features (see above), ScientificPython includes a module for accessing data files in netCDF format.

    • Miscellaneous formats
      • NumpyIO (Travis Oliphant)

        NumpyIO contains methods designed for reading and writing large blocks of binary data into Numerical Python arrays. The author also has a class that uses NumpyIO to access volume data in ANALYZE format.

      • fortranio (Konrad Hinsen)

        An experimental module that reads and writes FORTRAN binary files on Unix platforms.

      • io (Gary Strangman)

        A collection of input/output routines for flat space/tab delimited text files and "flat" binary files, including some special file handlers for MRI files.
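
FORTRAN binary (sequential unformatted) files, of the kind the fortranio entry above refers to, are commonly laid out as records framed by a byte count before and after the data. The sketch below reads and writes that layout with the standard `struct` module. It assumes 4-byte record markers in native byte order, which is typical of many compilers but not guaranteed; the function names are hypothetical and are not the fortranio API.

```python
import io
import struct


def write_record(stream, payload, marker="i"):
    """Write one record: length marker, payload bytes, length marker.

    Assumes a 4-byte marker in native byte order (compiler dependent).
    """
    header = struct.pack("=" + marker, len(payload))
    stream.write(header + payload + header)


def read_records(stream, marker="i"):
    """Yield the payload bytes of each record in a binary stream."""
    size = struct.calcsize("=" + marker)
    while True:
        head = stream.read(size)
        if not head:
            return  # clean end of file
        if len(head) < size:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("=" + marker, head)
        payload = stream.read(length)
        tail = stream.read(size)
        if len(payload) != length or len(tail) != size:
            raise ValueError("truncated record")
        if tail != head:
            raise ValueError("leading and trailing record markers disagree")
        yield payload
```

A quick round trip can be done in memory: write a record with `write_record(buf, struct.pack("=3d", 1.0, 2.0, 3.0))` on an `io.BytesIO` buffer, rewind it, and unpack the payload returned by `read_records(buf)` with the same format string. Files written on a machine with a different byte order or marker size would need a different `struct` format.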

  • Plotting and graphing with Python
    • Plotting with Python (Janko Hauser)

      A useful annotated list of Python interfaces to popular plotting packages on various platforms, current as of the end of 1998. The material below reflects developments since 1998.

    • ppgplot/Pgplot (Nick Patavalis/Scott Ransom)

      Ransom's modification of Patavalis's Python interface to the pgplot plotting library.

    • The MayaVi Data Visualizer (Prabhu Ramachandran)

      "MayaVi is a free, easy to use scientific data visualizer. It is written in Python and uses the amazing Visualization Toolkit (VTK) for the graphics. It provides a GUI written using Tkinter. MayaVi is free and distributed under the GNU GPL. It is also cross platform and should run on any platform where both Python and VTK are available (which is almost any *nix, Mac OSX or Windows)." It is available as a stand-alone application and as a Python package.

    • gracePlot.py (Nathan Gray)

      This module provides a Python interface to the Grace package that implements 2-D interactive plotting. Grace provides GUI access to plot properties, allowing changes to the plot on-the-fly. This interface allows one to use Grace from a Python prompt, and integrates the Numeric package with Grace.

    • Py-OpenDX (Randall Hopper)

      This package provides a Python interface to Open-DX, the open source version of the IBM Data Explorer (DX) "industrial-strength" scientific visualization package developed at IBM.

  • Extending Python with C/C++/FORTRAN code
Application Examples
  • PyRAF (Perry Greenfield & Rick White)

    "PyRAF is a new command language for IRAF based on the Python scripting language. It is useful both for interactive data analysis and for writing analysis scripts. PyRAF coexists with the current IRAF CL; no changes need be (or should be) made to your installed IRAF system to use it. PyRAF has been developed by the Science Software Group at the Space Telescope Science Institute."

  • PyEphem (Brandon Rhodes)

    PyEphem is a module for performing astronomical computations from the Python scripting language. Its primary purpose is to compute, for an arbitrary date and location on earth, the position of the sun, moon, a planet, or any asteroid or comet whose orbital elements are available. It also includes facilities to compute the angular separation between two objects in the sky, to determine the constellation in which an object lies, and to find the times at which any object rises, transits, and sets on a particular day. It uses procedures from Elwood Downey's XEphem planetarium program.

  • XAssist (Andy Ptak/CMU)

    XAssist is a package under development for automating the analysis of extra-galactic X-ray data. Its goal is to be capable of reducing and performing initial spectral, spatial, and temporal analysis of extra-galactic X-ray data from ROSAT, ASCA, XMM and Chandra. It consists of low-level C++ code and a high-level Python interface. The XIMGFIT module for 2-D image fitting to FITS data using PSFs is currently available.

  • PyAstro (Pavlos Christoforou)

    An all-Python module that implements most of the algorithms in Peter Duffett-Smith's book Practical Astronomy with Your Calculator. Good for casual observations of planets.

  • Eclipse-Python (Nicolas Devillard)

    "Eclipse is a general-purpose image processing library written in ANSI C for portability and performance. It has been successfully used as a basis for a number of VLT pipeline developments and has been reported to be used extensively for other projects outside ESO (without ESO support!). As a C library, eclipse is meant to be used as a basis for specific instrument developments (pipelines, or data reduction recipes). For convenience, an interface to Python has been produced using SWIG, that allows the programming of data reduction recipes in a high-level language. This interface is offered today in two parts: a dynamic library (c_eclipse.so) and a Python module (eclipse.py) which should shield Python programmers from changes happening in the library. The Python module offers a number of classes to deal with FITS images and cubes (tables are there but not yet interfaced). The idea is not to offer a new data analysis environment but an easy way of prototyping recipes before they are later frozen into C code for deployment."

  • Astrolabe (Bill McClain)

    "Astrolabe is a collection of subroutines and applications for calculating the positions of the sun, moon, planets and other celestial objects. The emphasis is on high accuracy over a several thousand year time span. The subroutine library attempts to (someday) implement all the techniques described in Astronomical Algorithms, second edition 1998, by Jean Meeus."