This guide will outline how to install the following Python software stack:
- NumPy (1.6.2)
- SciPy (0.10.1)
- MatPlotLib (1.2.0)
- IPython (0.13.2)
- Scikit-Learn (0.13)
- RPy2 (2.3.2)
Preparatory Setup
It is assumed you are using distribute and pip to install Python packages. This means you need to have completed the following setup already:
curl -O http://python-distribute.org/distribute_setup.py
sudo python distribute_setup.py
curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
sudo python get-pip.py
Furthermore, we are assuming a 64 bit build of Python 2.7 as the target environment. If you only need/want to use a stack compiled for a 32 bit architecture, simpler paths than the one laid out here might work as well.
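Before going further, it can help to confirm which kind of interpreter you are actually running. The following is a minimal sketch using only the standard library; the helper name python_build_info is made up for illustration and is not part of any package in this guide.

```python
from __future__ import print_function

import platform
import struct
import sys


def python_build_info():
    """Return the interpreter version and its pointer width in bits.

    Hypothetical helper for illustration only.
    """
    # A C pointer is 4 bytes on a 32 bit build and 8 bytes on a 64 bit build.
    bits = struct.calcsize("P") * 8
    return {
        "version": platform.python_version(),
        "bits": bits,
        # Second, independent check: sys.maxsize exceeds 2**32 only on 64 bit builds.
        "is_64bit": sys.maxsize > 2 ** 32,
    }


info = python_build_info()
print("Python %(version)s, %(bits)d bit build" % info)
```

If this reports a 32 bit build or a Python version other than 2.7, the steps below may behave differently from what is described here.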
Download and install a version of NumPy that is not too new (at the time of writing, 1.6 worked, while numpy-dev was at 1.8), built for the latest version of OSX and the oldest still-supported version of Python (currently 2.7), to have any chance of success.
The newest NumPy package found to work was numpy-1.6.2 for Python 2.7; with any newer package, the post-installation check in the Python interpreter:
>>> import numpy; numpy.test('full')
did not pass without errors. What you never want to see are errors directly related to the Fortran compiler. These probably mean that you have your own version of Fortran installed; the best remedy in that case is to remove it, after which the tests should pass.
For SciPy, fetch a version where not too many tests fail (even on Ubuntu LTS 12.04, the tests "test_io.test_imread" and "test_expon" are known to fail and are considered a non-issue). On OSX 10.7 with Python 2.7, it is possible to install the 0.10.1 package and the final
>>> import scipy; scipy.test()
check passes with "only" 9 failures. If you use newer versions, more tests will fail. In general, these two core libraries are the hardest part, and it is essential to get NumPy in particular installed correctly for everything else to work.
For MatPlotLib, there are pre-compiled OSX packages for Python 2.7 available, and the latest version (1.2.0 at the time of this writing) should work without any trouble. To ensure the installation worked, try this in the Python interpreter:
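Since the advice for both NumPy and SciPy is to avoid versions newer than the ones found to work, a small comparison helper can make that check explicit. This is only a sketch: the function names version_tuple and is_not_newer are hypothetical, and the comparison looks only at the leading numeric part of a version string, ignoring suffixes such as ".dev".

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric version in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(parts, width):
    """Pad a version tuple with zeros so that 0.13 compares equal to 0.13.0."""
    return parts + (0,) * (width - len(parts))


def is_not_newer(installed, known_good):
    """True if `installed` is at most the version that was found to work."""
    a = version_tuple(installed)
    b = version_tuple(known_good)
    width = max(len(a), len(b))
    return _pad(a, width) <= _pad(b, width)
```

For example, is_not_newer(numpy.__version__, "1.6.2") would tell you whether the installed NumPy is no newer than the version this guide found to work.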
>>> from pylab import *; plot([1,2,3]); show()
and you should see a plot with a straight diagonal line. To ensure you have the right library, also check:
>>> import matplotlib; matplotlib.__version__
And you should see the desired version number you were trying to install.
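The same version check can be applied to every package in this guide at once. The sketch below assumes each package exposes a __version__ attribute (the fallback "unknown" covers the case where it does not); the helper name installed_versions and the EXPECTED table are made up for illustration, with the version numbers taken from the list at the top of this guide.

```python
from __future__ import print_function

import importlib

# Import name -> version named at the top of this guide.
EXPECTED = {
    "numpy": "1.6.2",
    "scipy": "0.10.1",
    "matplotlib": "1.2.0",
    "IPython": "0.13.2",
    "sklearn": "0.13",
    "rpy2": "2.3.2",
}


def installed_versions(names):
    """Map each import name to its version string, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken install can raise more than ImportError, so catch broadly.
            found[name] = None
            continue
        found[name] = getattr(module, "__version__", "unknown")
    return found


for name, version in sorted(installed_versions(EXPECTED).items()):
    print("%-12s installed: %-10s guide: %s" % (name, version, EXPECTED[name]))
```

Packages you have not installed yet simply show up as None, so the script can be rerun after each step.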
sudo easy_install-2.7 readline
Note that readline has to be installed using easy_install, not pip! Now the default installation method should work, and we can simply run:
sudo pip install ipython[zmq,qtconsole,notebook,test]
To make sure the installation worked, execute the newly installed iptest script.
sudo pip install scikit-learn
nosetests sklearn --exe
This nosetests run will produce one (and only one) error: "Split arrays or matrices into random train and test subsets". According to the developer, however, this is a non-issue and can be ignored.
Last, if you are interested in two more experimental, newer Python libraries that attempt to remove the need for R (and/or rpy), you might want to install Pandas and StatsModels. If you prefer production-stable libraries, you are probably better advised to use R and RPy2; the older RPy ("version 1") often tends to have issues.
sudo pip install pandas
To ensure the library is operational, run (should not produce any errors):
sudo pip install patsy
sudo pip install statsmodels
To check the installation worked, open a Python interpreter session and do:
>>> import statsmodels.api as sm
Here, several tests seem to fail, and it is not at all clear whether this is expected. StatsModels has several hundred open issues and should probably be considered very experimental at this stage.
sudo pip install rpy2
To ensure the install worked, run the tests as:
python -m 'rpy2.tests'
You should not see any problems.
Et voilà, you now have a fully functioning environment for running all kinds of statistical data analyses and machine learning algorithms!