MNIST Database
The MNIST database is a database of handwritten digits, which consists of a training set of 60,000 examples, and a test set of 10,000 examples. It was made available by Yann Le Cun and Corinna Cortes (MNIST database). The data was originally extracted from a larger set made available by NIST, before being size-normalized and centered in a fixed-size image (28x28 pixels).
The actual raw data for the database should be downloaded from the original website. This package only contains the Bob accessor methods to use this database directly from Python, with our certified protocols.
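Before you point the accessor at your copy of the raw data, you can run a quick check. The sketch below lists which of the four gzip-compressed files from the original MNIST website are missing from a directory. The file names are the ones used on that website. It is an assumption that this package expects exactly these names in the directory you pass to it. `MNIST_RAW_FILES` and `missing_raw_files` are hypothetical helpers and are not part of the package.

```python
from pathlib import Path

# File names as published on the original MNIST website.
# Assumption: the accessor expects these exact names in the data directory.
MNIST_RAW_FILES = (
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
)


def missing_raw_files(data_dir):
    """Return the expected MNIST raw file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in MNIST_RAW_FILES if not (root / name).is_file()]
```

For example, `missing_raw_files('PATH_TO_DATA_FROM_YANN_LECUN_WEBSITE')` should return an empty list once all four files are in place.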
You would normally not install this package unless you are maintaining it. Instead, you would declare it as a dependency of the package that needs it. There are a few ways to achieve this:
- You can add this package as a requirement in the setup.py of your own satellite package, or in your buildout.cfg file if you prefer it that way. With this method, this package is automatically downloaded and installed in your working environment, or
- You can manually download and install this package using commands like easy_install or pip.
The package is available in two different distribution formats:
- You can download it from PyPI, or
- You can download it in its source form from its git repository.
In addition, the raw database files must be downloaded from the original website and stored somewhere in your environment.
You can mix and match the installation methods and distribution formats above based on your requirements. Here are some examples:
Modify your setup.py and download from PyPI
This is the easiest option. Edit the setup.py in your satellite package and add the following entry to the install_requires section (note: ... stands for whatever other entries you may already have there; do not put it in your script):
install_requires=[
    ...
    "xbob.db.mnist",
],
Proceed normally with your bootstrap/buildout steps and you should be all set. You can then import the namespace xbob.db.mnist into your scripts.
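To confirm that your environment actually picked up the dependency, you can run a standard-library check like the one below inside that environment. `is_importable` is a hypothetical helper and is not part of this package. Note that `importlib.util.find_spec` imports the parent packages of a dotted name (here `xbob` and `xbob.db`) as a side effect.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be located in the current environment.

    For dotted names, find_spec first imports the parent packages. If a parent
    is missing it raises ModuleNotFoundError, which is treated as "not found".
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Post-buildout sanity check for the dependency declared in setup.py.
    print("xbob.db.mnist importable:", is_importable("xbob.db.mnist"))
```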
Modify your buildout.cfg and download from git
You will need to add a dependency on mr.developer to be able to install from our git repositories. Your buildout.cfg file should contain the following lines:
[buildout]
...
extensions = mr.developer
auto-checkout = *
eggs = bob
       ...
       xbob.db.mnist

[sources]
xbob.db.mnist = git https://github.com/bioidiap/xbob.db.mnist.git
...
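To verify such a file programmatically, you could read it with Python's `configparser`, as in the sketch below. This is only an approximation. zc.buildout uses its own parser with extra features such as `${...}` substitutions and `+=`. `configparser` is a stand-in that handles the plain INI subset shown above, and it only works after the `...` placeholder lines have been replaced with real content. `check_buildout` is a hypothetical helper.

```python
import configparser


def check_buildout(text, package="xbob.db.mnist"):
    """Return a list of problems found in a buildout.cfg-style string.

    Assumption: configparser stands in for zc.buildout's own parser and only
    covers the plain INI subset (sections, options, indented continuations).
    """
    parser = configparser.ConfigParser(interpolation=None)  # leave ${...} untouched
    parser.read_string(text)
    problems = []
    buildout = parser["buildout"] if parser.has_section("buildout") else {}
    # Multi-line values (indented continuation lines) are split on whitespace.
    if "mr.developer" not in buildout.get("extensions", "").split():
        problems.append("mr.developer is not listed in [buildout] extensions")
    if package not in buildout.get("eggs", "").split():
        problems.append(f"{package} is not listed in [buildout] eggs")
    if not parser.has_option("sources", package):
        problems.append(f"no [sources] entry for {package}")
    return problems
```

An empty list means the three pieces above (the extension, the egg and the source entry) are all present.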
How to use this database API
After launching the python interpreter (assuming that the environment is properly set up), you could get the training set as follows:
>>> import xbob.db.mnist
>>> db = xbob.db.mnist.Database('PATH_TO_DATA_FROM_YANN_LECUN_WEBSITE') # 4 binary .gz compressed files
>>> images, labels = db.data(groups='train', labels=[0,1,2,3,4,5,6,7,8,9])
In this case, this should return two NumPy arrays:
- images contains the raw data (60,000 samples of dimension 784, i.e. 28x28-pixel images)
- labels contains the corresponding class (a digit from 0 to 9) for each of the 60,000 samples
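The sketch below shows what those arrays correspond to in the raw files. It reads the gzip-compressed IDX files directly with `gzip`, `struct` and NumPy. It is an independent sketch, not this package's implementation. `read_idx_gz`, `load_split` and `keep_labels` are hypothetical names, and `keep_labels` is only meant to mimic the effect of the `labels` argument of `db.data`.

The sketch assumes the IDX layout described on the original website:
- two zero bytes
- a type code (0x08 for unsigned bytes)
- the number of dimensions
- one big-endian 32-bit size per dimension, followed by the raw bytes

```python
import gzip
import struct

import numpy as np


def read_idx_gz(path):
    """Read a gzip-compressed IDX file holding unsigned bytes into a NumPy array.

    Header: two zero bytes, a type code (0x08 = unsigned byte), the number of
    dimensions, then one big-endian 32-bit size per dimension.
    """
    with gzip.open(path, "rb") as f:
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError(f"{path}: not an unsigned-byte IDX file")
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)


def load_split(images_path, labels_path, keep_labels=range(10)):
    """Return (images, labels) with each image flattened to a 784-value row,
    keeping only the samples whose label is in keep_labels."""
    images = read_idx_gz(images_path)              # shape (N, 28, 28)
    labels = read_idx_gz(labels_path)              # shape (N,)
    images = images.reshape(images.shape[0], -1)   # shape (N, 784)
    mask = np.isin(labels, list(keep_labels))      # boolean selection by class
    return images[mask], labels[mask]
```

With the training files from the original website, `load_split('train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz')` should produce arrays of shapes (60000, 784) and (60000,). This is consistent with the description above.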