This guide will outline how to install the following Python software stack:
- NumPy (1.6.2)
- SciPy (0.10.1)
- MatPlotLib (1.2.0)
- IPython (0.13.2)
- Scikit-Learn (0.13)
- RPy2 (2.3.2)
Preparatory Setup
It is assumed you are using distribute and pip to install Python packages. This means you need to have the following setup done already:
curl -O http://python-distribute.org/distribute_setup.py
sudo python distribute_setup.py
curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
sudo python get-pip.py
Furthermore, we are assuming a 64 bit build of Python 2.7 as the target environment. If you only need/want to use a stack compiled for a 32 bit architecture, simpler paths than the one laid out here might work as well.
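Since the rest of this guide assumes a 64 bit Python 2.7, it can help to confirm what the interpreter actually reports before installing anything. The following is a minimal sketch using only the standard library; the function name describe_interpreter is made up for illustration:

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return the version and pointer width of the running interpreter."""
    # The size of a C pointer in bytes, times 8, gives the build's bit width.
    bits = struct.calcsize("P") * 8
    return {
        "version": platform.python_version(),
        "bits": bits,
        "is_64bit": sys.maxsize > 2 ** 32,
    }


info = describe_interpreter()
print("Python %(version)s, %(bits)d bit" % info)
```

If this reports 32 bit, the packages discussed below may not match your interpreter.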
NumPy
Download and install a version of NumPy that is not too new (at the time of writing, 1.6 worked, while numpy-dev is at 1.8), built for the latest version of OSX and the oldest still supported version of Python (currently 2.7), to have any chance of success. The newest NumPy package found to work was numpy-1.6.2 for Python 2.7; with any newer package, the post-installation check in the Python interpreter:
>>> import numpy; numpy.test('full')
did not pass without errors. What you never want to see are errors directly related to the Fortran compiler. These probably mean that you have your own version of Fortran installed; the best remedy in that case is to remove it, after which the tests should pass.
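If the full test suite is too slow to rerun after every change, a quick smoke test can at least show whether NumPy imports and its linear algebra routines give sane answers. This is only a sketch and no substitute for numpy.test('full'); the function name smoke_test_numpy and the 2x2 system are invented for illustration:

```python
def smoke_test_numpy():
    """Return True/False for a tiny linear solve, or None if NumPy is missing."""
    try:
        import numpy as np
    except ImportError:
        return None  # NumPy is not installed (or could not be imported)

    # Solve 3x + y = 9 and x + 2y = 8; the exact solution is x = 2, y = 3.
    a = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([9.0, 8.0])
    x = np.linalg.solve(a, b)
    return bool(np.allclose(x, [2.0, 3.0]))


print("NumPy smoke test: %s" % smoke_test_numpy())
```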
SciPy
Again, fetch a version where not too many tests fail (even on Ubuntu LTS 12.04, the tests "test_io.test_imread" and "test_expon" are known to fail and are considered a non-issue). On OSX 10.7 with Python 2.7, it is possible to install the 0.10.1 package, and the final
>>> import scipy; scipy.test()
check passes with "only" 9 failures. If you use newer versions, more tests will fail. In general, these two core libraries are the hardest part, and it is essential to get NumPy in particular installed correctly for everything else to work.
MatPlotLib
The next step is the installation of matplotlib. There are pre-compiled OSX packages for Python 2.7 available, and the latest version (1.2.0 at the time of this writing) should work without any trouble. To ensure the installation worked, try this in the Python interpreter:
>>> from pylab import *; plot([1,2,3]); show()
and you should see a plot with a straight diagonal. To ensure you have the right library, also check:
>>> import matplotlib; matplotlib.__version__
And you should see the desired version number you were trying to install.
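The check above opens a window, which is not always possible (for example over SSH). As a hedged alternative, the same plot can be rendered to a file with matplotlib's non-interactive Agg backend. The function name render_test_plot and the output file name are assumptions for this sketch:

```python
import os
import tempfile


def render_test_plot():
    """Save a small line plot as PNG; return its path, or None without matplotlib."""
    try:
        import matplotlib
        matplotlib.use("Agg")  # non-interactive backend, no window needed
        import matplotlib.pyplot as plt
    except ImportError:
        return None

    path = os.path.join(tempfile.mkdtemp(), "diagonal.png")
    plt.figure()
    plt.plot([1, 2, 3])
    plt.savefig(path)
    plt.close()
    return path


print("Plot written to: %s" % render_test_plot())
```

Opening the resulting PNG should show the same straight diagonal line.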
IPython
First of all, a different readline installation is necessary:
sudo easy_install-2.7 readline
Note that readline has to be installed using easy_install, not pip! Now the default installation route should work and we can simply do:
sudo pip install ipython[zmq,qtconsole,notebook,test]
To make sure the installation worked, execute the newly installed iptest script.
Scikit-Learn
This again is pretty straightforward; do:
sudo pip install scikit-learn
nosetests sklearn --exe
This nosetest will produce one (and only one) error: "Split arrays or matrices into random train and test subsets". But according to the developer, this is a non-issue and can be ignored.
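Beyond the nosetests run, fitting a small model is a quick way to see that the compiled parts of scikit-learn actually work. The sketch below trains a k-nearest-neighbours classifier on the bundled iris data and scores it on the same data, so the number is a sanity check and not an evaluation. The function name smoke_test_sklearn is hypothetical:

```python
def smoke_test_sklearn():
    """Return training accuracy of k-NN on iris, or None without scikit-learn."""
    try:
        from sklearn.datasets import load_iris
        from sklearn.neighbors import KNeighborsClassifier
    except ImportError:
        return None

    iris = load_iris()
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(iris.data, iris.target)
    # Accuracy on the training data itself, only to confirm fit/score run.
    return float(model.score(iris.data, iris.target))


print("scikit-learn smoke test: %s" % smoke_test_sklearn())
```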
Last, if you are interested in using two more experimental and novel Python libraries that attempt to remove the need for R (and/or rpy), you might want to install Pandas and StatsModels. If you prefer non-experimental, production-stable libraries, you are probably better advised to use R and RPy2, as RPy ("version 1") often tends to have issues.
Pandas
(Python Data Analysis Library) Again, the default installation route should work:
sudo pip install pandas
To ensure the library is operational, run (should not produce any errors):
nosetests pandas
StatsModels
As with Pandas, we can use the "default installation pathway", but need to first install an undocumented dependency for this module (patsy):
sudo pip install patsy
sudo pip install statsmodels
To check the installation worked, open a Python interpreter session and do:
>>> import statsmodels.api as sm
>>> sm.test()
Here, several tests seem to be failing, and it is not clear at all whether this is expected. StatsModels has several hundred open issues and should probably be considered very experimental at this stage.
RPy2
Again, the standard installation works (assuming you have R itself installed already, at least!):
sudo pip install rpy2
To ensure the install worked, run the tests as:
python -m 'rpy2.tests'
You should not be seeing any problems.
E voilà - you now have a fully functioning environment for running all kinds and sorts of statistical data analyses and machine learning algorithms!
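As a final check, the versions listed at the top of this guide can be compared against what is actually importable. The sketch below assumes that each package exposes a __version__ attribute under the import names shown (for example sklearn for Scikit-Learn); the names EXPECTED and check_stack are made up for illustration:

```python
import importlib

# Versions named at the top of this guide; keys are the assumed import names.
EXPECTED = {
    "numpy": "1.6.2",
    "scipy": "0.10.1",
    "matplotlib": "1.2.0",
    "IPython": "0.13.2",
    "sklearn": "0.13",
    "rpy2": "2.3.2",
}


def check_stack(expected=EXPECTED):
    """Map each module name to (installed_version_or_None, matches_expected)."""
    report = {}
    for name, wanted in sorted(expected.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (None, False)
            continue
        # Fall back to "unknown" if the package has no __version__ attribute.
        found = str(getattr(module, "__version__", "unknown"))
        report[name] = (found, found.startswith(wanted))
    return report


for name, (found, ok) in sorted(check_stack().items()):
    print("%-12s %-10s %s" % (name, found or "missing", "OK" if ok else "CHECK"))
```

A "CHECK" line only means the installed version differs from the one used in this guide, or that the package is missing. It does not by itself indicate a broken installation.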