Monte - gradient-based learning in Python

Monte (python) is a Python framework for building gradient-based learning machines, like neural networks, conditional random fields, logistic regression, etc. Monte contains modules (that hold parameters, a cost-function and a gradient-function) and trainers (that can adapt a module's parameters by minimizing its cost-function on training data).

Modules are usually composed of other modules, which can in turn contain other modules, etc. Gradients of decomposable systems like these can be computed with back-propagation.
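The module/trainer split described above can be sketched in a few lines of numpy. Everything here (class names, method names, the toy linear model) is illustrative only, not Monte's actual API:

```python
import numpy as np

class LinearModule(object):
    """Toy 'module': holds parameters, a cost-function and a
    gradient-function. (Illustrative only, not Monte's actual API.)"""
    def __init__(self, numdims):
        self.w = np.zeros(numdims)           #the module's parameters

    def cost(self, inputs, targets):
        #mean squared error of the linear model on the given data
        return np.mean((inputs.dot(self.w) - targets)**2)

    def grad(self, inputs, targets):
        #gradient of the cost with respect to the parameters
        return 2.0 * inputs.T.dot(inputs.dot(self.w) - targets) / len(targets)

class GradientDescent(object):
    """Toy 'trainer': adapts a module's parameters by minimizing its
    cost-function on training data."""
    def __init__(self, module, stepsize):
        self.module = module
        self.stepsize = stepsize

    def step(self, inputs, targets):
        self.module.w -= self.stepsize * self.module.grad(inputs, targets)

inputs  = np.array([[1.0,0.0],[0.0,1.0],[1.0,1.0],[2.0,1.0]])
targets = inputs.dot(np.array([2.0,-1.0]))   #data generated with weights [2,-1]
model   = LinearModule(2)
trainer = GradientDescent(model, 0.1)
for i in range(200):
    trainer.step(inputs, targets)            #model.w approaches [2,-1]
```

In Monte itself a module would typically be a composition of sub-modules, with the gradient assembled by back-propagation through the parts, but the module/trainer contract is the same.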


News

  • (Jun 29, 2010) Version 0.2.0 released. New directory layout that is significantly simpler and flatter. NOTE: the change in directory layout introduces incompatibilities with previous versions; read the documentation or the README file for more information.
  • (Apr 23, 2008) Version 0.1.0 constitutes a major overhaul, several new models, slight changes in directory structure, bug-fixes.
  • (May 23, 2007) Version 0.0.11 has been released. Bug fixes, minor changes, improved compatibility with newest numpy release.
  • (May 21, 2007) Version 0.0.10 has been released. Bug fix in gbm.
  • (May 21, 2007) Version 0.0.9 has been released. Bug fixes, minor changes.
  • (May 11, 2007) Version 0.0.8 has been released. Main changes are new-style classes and bugfixes.
  • (April 22, 2007) Version 0.0.7 has been released. This is a major upgrade to improve PEP8-compliance.
  • (April 1, 2007) Version 0.0.6 has been released to address an annoying bug due to a wrong import.
  • (March 30, 2007) Version 0.0.5 has been released:
    We're slowly getting closer to a nice and stable release 1...
    Monte now also includes sparse regression (Tibshirani's LASSO), neural networks, nonlinear CRFs based on neural networks, nearest neighbors, and more...
  • Version 0.0.4 has been released.
  • An initial version of the documentation (pdf) is now available.


Requirements

Monte requires Python (2.4 or later) and the packages numpy (Version 1.0 or later), scipy (Version 0.5.1 or later) and matplotlib (Version 0.87 or later) to be fully functional. (Installing those packages is advisable in any case if you are ever planning to do any serious data analysis, machine learning, or numerical processing with Python.)

Example: Training a neural network

from numpy import arange, newaxis, sin
from pylab import randn, plot, scatter, hold
from monte.models.neuralnet import NNIsl
import monte.train 

mynn = NNIsl(1,10,1)          #neural network with one input-, one output-,
                              #and one hidden layer with 10 sigmoid units
mytrainer = monte.train.Conjugategradients(mynn,10)

inputs = arange(-10.0,10.0,0.1)[newaxis,:]        #produce some inputs
outputs = sin(inputs) + randn(1,inputs.shape[1])  #produce some outputs
testinputs  = arange(-10.5,10.5,0.05)[newaxis,:]  #produce some test-data
testoutputs = sin(testinputs)

for i in range(50):
    mytrainer.step((inputs,outputs),0.0001)   #perform one training pass
    print mynn.cost((inputs,outputs),0.0001)  #print the current training cost

testoutputsnn = mynn.apply(testinputs)  #apply the trained net to the test-inputs
scatter(inputs[0],outputs[0])           #plot the noisy training data ...
hold(True)
plot(testinputs[0],testoutputsnn[0])    #... and the network's predictions


Documentation

Monte documentation is being written incrementally. The latest pdf of the documentation can be found here.


Contact

If you want to contribute or have questions, send mail to: monte[at]
Monte was written by Roland Memisevic.


About Monte

While Monte contains some simple implementations of a variety of learning methods (such as neural networks, k-nearest neighbors, logistic regression, k-means, and others), Monte's focus is on gradient-based learning of parametric models. The simple reason for this is that these methods keep turning out to be the most useful in applications -- especially when dealing with large datasets.

Monte is not yet another wrapper around some C++ SVM library. Python comes with powerful numerical processing facilities itself these days, and a lot of interesting machine learning is possible in pure Python.
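To illustrate that point, here is a complete logistic-regression trainer in a handful of lines of numpy. It is a self-contained sketch, not code taken from Monte:

```python
import numpy as np

#Toy two-class problem: two well-separated Gaussian blobs.
np.random.seed(0)
X = np.vstack([np.random.randn(50,2) + 3.0,
               np.random.randn(50,2) - 3.0])
y = np.concatenate([np.ones(50), np.zeros(50)])

#Logistic regression, trained with plain batch gradient descent.
w = np.zeros(2)
b = 0.0
for i in range(500):
    p = 1.0/(1.0 + np.exp(-(X.dot(w) + b)))  #predicted class-probabilities
    w -= 0.1 * X.T.dot(p - y)/len(y)         #gradient of the average log-loss
    b -= 0.1 * np.mean(p - y)

accuracy = np.mean((p > 0.5) == (y == 1))    #training accuracy
```

No compiled extension is involved; numpy's vectorized array operations do all the numerical work.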

Monte's design philosophy is inspired mainly by the amazing gradient-based learning library available for the Lush language, written by Yann LeCun and Leon Bottou. The idea of that library is to use trainable components (objects) that have 'fprop'- and 'bprop'-methods, and to build complicated architectures by simply sticking these together. Error derivatives can then be computed easily using back-propagation. Modular design based on back-propagation relies on the observation that back-propagation (despite common misguided claims to the contrary) is not just "an application of the chain-rule".
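The component idea can be sketched as follows. The method names mirror the fprop/bprop convention, but the classes below are illustrative and not the actual Lush or Monte interfaces:

```python
import numpy as np

class Linear(object):
    """Trainable component: fprop computes outputs, bprop computes error
    derivatives for the parameters and for the layer below."""
    def __init__(self, nin, nout):
        self.W = 0.1 * np.random.randn(nout, nin)
    def fprop(self, x):
        self.x = x                          #remember the input for bprop
        return self.W.dot(x)
    def bprop(self, dout):
        self.dW = np.outer(dout, self.x)    #derivative w.r.t. the parameters
        return self.W.T.dot(dout)           #derivative passed to the layer below

class Tanh(object):
    def fprop(self, x):
        self.y = np.tanh(x)
        return self.y
    def bprop(self, dout):
        return dout * (1.0 - self.y**2)     #chain rule through the nonlinearity

#Stick the components together: fprop in order, bprop in reverse order.
np.random.seed(42)
layers = [Linear(3,4), Tanh(), Linear(4,1)]
x = np.array([0.5, -1.0, 2.0])
out = x
for layer in layers:
    out = layer.fprop(out)
grad = np.ones_like(out)        #derivative of the dummy cost sum(out)
for layer in reversed(layers):
    grad = layer.bprop(grad)    #grad now holds d cost / d input
```

Each component only knows how to propagate values forward and derivatives backward through itself; arbitrary architectures then come for free by composition.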