New in the toolbox
- Using GPU in backpropagation
- Revision of some demo scripts
- Function approximation with multiple outputs
- Feature extraction with GRBM in the first layer
Features
- An object-oriented toolbox with the most important abilities needed to implement DBNs.
- Following object-oriented design principles, DeeBNet is very modular, extensible and reusable, and can easily be modified and extended.
- Runs on both MATLAB and Octave and is platform independent (Windows and Linux).
- Different sampling methods, including Gibbs, CD, PCD and our new FEPCD method, are implemented in our toolbox.
- Different sparsity methods, including quadratic, rate distortion and our normal sparsity method, are included in DeeBNet.
- DeeBNet supports different RBM types (including generative and discriminative).
- Efficient use of GPU power (high GPU load).
- DeeBNet can be used in many different tasks such as classification, feature extraction, data reconstruction, noise reduction, generating new data, etc.
- Data management in the DataStore class and optimized code for handling big data in some functions.
Downloading the Toolbox V3.2
- Download it from here in zip format (.zip).
- The toolbox has been tested with MATLAB R2015a and Octave 4.0.0 on Windows and Linux.
- To install the toolbox, simply unpack the archive.
- Download the tested datasets (or use the prepareMNIST_Small function in the MNIST demos):
  - MNIST
  - ISOLET
  - MATLAB version of the 20 Newsgroups data set
- Run the demo scripts to see what the toolbox can do (change the dataset path in the scripts); see the documentation for more details. A minimal usage sketch follows this list.
  - test_classificationMNIST.m
  - test_classificationISOLET.m
  - test_classification20Newsgroups.m
  - test_generateDataMNIST.m
  - test_getFeatureMNIST.m
  - test_getFeatureMNIST_usingGPU.m
  - test_plotDataMNIST.m
  - test_reconstructDataMNIST.m
  - test_sparsityAndBasesFunctionMNIST.m
  - …
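The following is a minimal sketch of installing the toolbox and running a demo from the MATLAB or Octave prompt; the folder name DeeBNet below is an assumption about how the archive unpacks, so adjust it to your setup.

% Minimal sketch: put the unpacked toolbox on the path and run a demo.
addpath(genpath('DeeBNet'));    % folder created by unpacking the archive (name assumed)
% Edit the dataset path inside the demo script first, then run it:
test_classificationMNIST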
Documentation
- Full description [89 pages, in Persian]
- Technical report [27 pages, in English]
History
- DeeBNet V1.0 (.zip)
  - Release Date: 12/9/2014
  - Documentation: in English, in Persian
- DeeBNet V2.0 (.zip)
  - Release Date: 7/10/2015
  - Documentation: in English, in Persian
  - New features:
    - Sparsity in RBM with three different methods
    - Plotting basis functions
    - Classification and feature extraction on the 20 Newsgroups dataset
    - Code correction in using backpropagation
    - Runtime and memory code optimization
    - Normalization and shuffling
- DeeBNet V2.1 (.zip)
  - Release Date: 7/23/2015
  - Documentation: in English, in Persian
  - New features:
    - GPU support (about 5 times faster than CPU; GPU: NVIDIA GeForce GTX 780, CPU: AMD FX 8150 eight-core 3.6 GHz)
    - Casting DBN parameters to single and double data types
    - Runtime and memory code optimization
    - Normalization and shuffling
- DeeBNet V2.2 (.zip)
  - Release Date: 9/7/2015
  - Documentation: in English, in Persian
  - New features:
    - Fixed a bug in the computeBatchSize function on Linux
    - Revision of some demo scripts
- DeeBNet V3.0 (.zip)
  - Release Date: 1/9/2016
  - Documentation: in English, in Persian
  - New features:
    - Edited the toolbox for use in Octave
- DeeBNet V3.1 (.zip)
  - Release Date: 1/19/2016
  - Documentation: in English, in Persian
  - New features:
    - Bug fix in changing the learning rate
    - Extended the generateData function for use after backpropagation
    - Extended the reconstructData function for use after backpropagation
Related publications
- If you like the toolbox and want to cite it, please reference it as:
  M. A. Keyvanrad and M. M. Homayounpour, “A brief survey on deep belief networks and introducing a new object oriented toolbox (DeeBNet),” arXiv:1408.3264 [cs], Aug. 2014.
Introduction
Artificial neural networks have been used in artificial intelligence applications for many years. Pattern recognition, voice and speech analysis, and natural language processing are some of the applications that use artificial neural networks. For theoretical and biological reasons, deep models and architectures with many nonlinear processing layers have been suggested.
These deep models have many layers and parameters that must be learned. Artificial neural networks were rarely used in this setting, because with so many layers training is time consuming and becomes trapped in local minima, so acceptable results cannot be achieved. One important tool for dealing with this problem is the DBN (Deep Belief Network), which can create neural networks with many hidden layers (Liu et al., 2011).
Deep Belief Networks can be used for classification and feature learning. Data representation is very important in machine learning, so much work has been done on feature preprocessing, feature extraction and feature learning. In feature learning, we can create a feature extraction system and then use the extracted features in classification and other applications. Using unlabeled data for high-level feature extraction (Lee et al., 2008) and increasing the discrimination between extracted features are the benefits of DBNs for feature learning (Hinton and Salakhutdinov, 2006).
The layers of a DBN are built from Restricted Boltzmann Machines (RBMs), which are generative, undirected probabilistic models. An RBM uses a hidden layer to model the probability distribution of the visible variables. Indeed, a DBN for hierarchical processing can be created by stacking RBMs, so most improvements in DBNs come from improvements in RBMs. This paper studies different developed RBM models and introduces a new MATLAB toolbox with many DBN abilities.
Hinton presented DBNs and used them for digit recognition on the MNIST data set (Hinton et al., 2006). He used a DBN with a 784-500-500-2000-10 structure, where the first layer takes 784 features from the 28*28 MNIST digit images. The last layer corresponds to the 10 digit labels, and the other three layers are hidden layers with stochastic binary neurons. This paper achieved a 1.25% classification error rate on the MNIST test set.
In another paper by this author (Hinton and Salakhutdinov, 2006), a DBN is used as a nonlinear model for feature extraction and dimensionality reduction. Indeed, the DBN may be considered as a model that generates features in its last layer, with the ability to reconstruct the visible data from the generated features. When a conventional neural network with many layers is used, it becomes trapped in local minima and performance decreases; therefore determining good initial values for the network weights is critical.
Another paper proposed the DDBN (Discriminative Deep Belief Network), a new classifier based on DBN (Liu et al., 2011). This paper showed the power of DBNs in using unlabeled data and also the performance improvement obtained by adding layers (even up to 50 hidden layers).
DBN applications are not limited to image processing; they can also be used in voice processing (Hamel and Eck, 2010; Lee et al., 2009; Mohamed et al., 2009; Vinyals and Ravuri, 2011) with significant efficiency. Some toolkits have been developed to facilitate the use of DBNs in different applications. The implemented toolboxes can be used for many different tasks including classification, feature extraction, data reconstruction, noise reduction, generating new data, etc. Some of these toolboxes are listed and compared in Table 1. The comparison is based on features and characteristics including programming language, open-source availability, object-oriented programming, learning method, discriminative ability, type of visible nodes, fine tuning, GPU support, and documentation. As Table 1 shows, in comparison to other DBN toolboxes, our toolbox possesses all the main features as well as different types of classes. It is also designed to be very modular, extensible and reusable.
Table 1: A brief comparison with other implemented toolboxes.

Toolkit name | Progr. lang. | Open source | OOP | Learning method | DRBM | Sparse RBM | Visible node types | Fine-tuning | GPU | User manual
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
deepLearn, 2014 | MATLAB, Octave | ✔ | ✘ | CD1 | ✘ | ✔ | probability | ✔ | ✘ | Incomplete
deep autoencoder, 2006 | MATLAB | ✔ | ✘ | CD1 | ✘ | ✘ | probability | ✔ | ✘ | Incomplete
matrbm, 2010 | MATLAB | ✔ | ✘ | CD1, PCD | ✔ | ✘ | probability | ✘ | ✘ | Incomplete
deepmat, 2014 | MATLAB | ✔ | ✘ | CDk, PCD, FPCD | ✔ | ✔ | probability, Gaussian | ✔ | ✔ | Incomplete
DigitDemo, 2010 | MATLAB | ✘ | ✘ | CDk, PCD, RM, PL | ✘ | ✘ | probability | ✔ | ✘ | Incomplete
DBN Toolbox, 2010 | MATLAB | ✔ | ✔ | CDk | ✘ | ✘ | probability, Gaussian | ✔ | ✘ | Incomplete
DeeBNet (our toolbox) | MATLAB, Octave | ✔ | ✔ | Gibbs, CDk, PCD, FEPCD | ✔ | ✔ | binary, probability, Gaussian | ✔ | ✔ | Complete (in English), perfect (in Persian)
The DeeBNet is an object-oriented MATLAB toolbox that provides tools for conducting research using Deep Belief Networks. The toolbox has two packages with classes and functions for managing data and sampling methods, and also has classes defining different RBMs and the DBN. The following sections describe these packages and classes in more detail. Figure 1 shows the relationships between the implemented classes.
Figure 1: Relationships between the implemented classes in the DeeBNet toolbox.
Base classes
In this section, the basic classes are defined. These classes are used by the RBM and DBN classes. The first class is ValueType, which is an enumeration. It defines the different types of units in a DBN: binary (with 0 or 1 values), probability (with values in the [0, 1] interval) and Gaussian (with any real values, with zero mean and unit variance).
RbmType is also an enumeration. This class defines the different types of RBMs: generative (uses data without labels) and discriminative (needs labeled data and can classify data).
Another important class is RbmParameters, which includes all the parameters of an RBM, such as the weight matrix, biases, learning rate, etc. Most of these parameters are defined in (Hinton, 2010).
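As a rough illustration of how these basic classes fit together, the fragment below configures an RbmParameters object; the constructor signature and the property names shown are assumptions for illustration, not the definitive API.

% Illustrative sketch only: constructor signature and property names are assumptions.
rbmParams = RbmParameters(500, ValueType.binary);   % 500 hidden units, binary units (assumed signature)
rbmParams.rbmType = RbmType.generative;             % generative (unlabeled data) vs. discriminative
rbmParams.learningRate = 0.05;                      % assumed property name
rbmParams.maxEpoch = 50;                            % assumed property name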
The DataClasses package has one class for managing training, test and validation data. The DataStore class has useful functions such as normalize and shuffle for normalizing and shuffling data. It also provides the cut function to select a portion of the training data. Finally, the plotData function can be used to plot parts of the data; this is useful for comparing data before and after some processing stages (see Figure 2 and the sketch below).
Figure 2: Plotting 100 samples with the plotData function of the DataStore class. The first image shows 100 samples from the MNIST dataset and the second shows the samples reconstructed with a DBN model. The related code is in the “test_plotData.m” file.
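A minimal sketch of the DataStore workflow described above; the property names and argument forms below are assumptions for illustration, not guaranteed by the toolbox.

% Sketch only: property names and argument forms are assumed.
data = DataClasses.DataStore();            % container for training/validation/test data
data.trainData   = trainX;                 % e.g. an N-by-784 matrix of MNIST images (rows are samples)
data.trainLabels = trainY;                 % class labels for the training samples
data.normalize();                          % normalize the data (argument form assumed)
data.shuffle();                            % shuffle the training samples
data.cut(0.1);                             % keep a portion of the training data (assumed semantics)
data.plotData(100);                        % plot e.g. 100 samples (assumed argument)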
The second package includes implementations of different sampling methods: Gibbs, CD, PCD and FEPCD. With the Gibbs class, we can generate samples from an RBM model starting from randomly initialized samples. This class is also the parent class of the other sampling classes. With the CD (Contrastive Divergence) class, we can generate samples from an RBM model initialized with training samples. This class inherits from the Gibbs class.
With the PCD (Persistent Contrastive Divergence) class, samples can be generated from an RBM model. Unlike the CD method, which uses training data as the initial values of the visible units, the PCD method starts each chain from its state at the last update step. This class also inherits from the Gibbs class. Many persistent chains can be run in parallel, and we refer to the current state of each chain as a new sample or “fantasy” particle.
The FEPCD (Free Energy in Persistent Contrastive Divergence) class defines a criterion for the goodness of a chain, so that the generated samples and the gradient computation become more accurate. The proposed criterion for selecting the best chain is the free energy of the visible sample (Mohammad Ali Keyvanrad and Homayounpour, 2015). This class inherits from the PCD class.
Finally, the Sampling class is an interface for the implemented sampling classes; other classes use sampling classes such as CD or PCD through it. It relies on the SamplingMethodType class, an enumeration containing the types of sampling methods that can be used in an RBM.
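Continuing the RbmParameters sketch above, the sampling method would typically be selected through this enumeration; the property name samplingMethodType is an assumption for illustration.

% Sketch: choose the sampling method used by an RBM (property name assumed).
rbmParams.samplingMethodType = SamplingClasses.SamplingMethodType.FEPCD;   % Gibbs, CDk, PCD or FEPCD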
RBM classes
The toolbox has six types of RBM classes. The first one, the RBM class, is an abstract class that defines all the functions (such as the training method) and features (like the sampler object) common to all types of RBMs; therefore an object cannot be created from it. The other RBM classes inherit from this abstract class.
The second one is the GenerativeRBM class. This class is used as a generative model and can model many different types of data. Its most important use is as a learning module that is stacked with others to form DBNs. The GenerativeRBM class has methods such as train, getFeature, generateData, reconstructData, etc. The train method takes a DataStore object (holding training, validation and test data) and modifies the RBM parameters; the termination condition is the number of training epochs. The getFeature method extracts features (the activities of the hidden layer) from data; in other words, it samples the hidden units from the visible units with the chosen sampling method.
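A hedged sketch of the GenerativeRBM workflow just described, continuing the examples above; the constructor and exact argument forms are assumptions.

% Sketch: train a generative RBM and extract features (constructor and arguments assumed).
rbm = GenerativeRBM(rbmParams);              % wraps the RbmParameters object (assumed)
rbm.train(data);                             % data is a DataStore with train/validation/test sets
features = rbm.getFeature(data.testData);    % hidden-layer activities for the test data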
The generateData method generates values of the visible units from given hidden values (or extracted features). Similar to the getFeature method, generateData samples the visible units from the hidden units with the chosen sampling method. Figure 3 shows some outputs of this method, obtained from an RBM with 250 hidden units trained on the MNIST dataset. In this experiment, after extracting 250 features from 9 MNIST images (28*28 pixels), new images were generated from the extracted features. As Figure 3 shows, increasing the number of sampling iterations makes the generated images more natural and more similar to the data distribution.
Figure 3: Results from an RBM with 250 hidden units trained on the MNIST dataset. After extracting 250 features from 9 MNIST images (28*28 pixels), new images were generated from the extracted features. (a) The 9 MNIST images. (b)-(d) Images generated from the extracted features with an increasing number of sampling iterations. The related code is in “test_generateData.m”.
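The experiment in Figure 3 can be reproduced in spirit with a fragment like the one below, continuing the sketches above; the data layout and call forms are assumptions.

% Sketch: regenerate images from extracted features (call forms assumed).
someImages = data.testData(1:9, :);        % 9 MNIST images, 28*28 = 784 pixels each (row layout assumed)
features   = rbm.getFeature(someImages);   % hidden activities for these images
generated  = rbm.generateData(features);   % visible units sampled back from the hidden values
% More sampling iterations generally bring the generated images closer to the data distribution (Figure 3).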
The last useful method is reconstructData, which reconstructs input data by extracting features from it and then generating data from the extracted features. In Figure 4 this method has been used to reduce noise in images: the Gaussian noise is reduced after reconstructing the corrupted images.
Figure 4: Reducing noise in corrupted images using the reconstructData method. (a) 9 MNIST images. (b) Data corrupted with Gaussian noise with zero mean and 0.02 variance. (c) Images reconstructed from the corrupted images; the Gaussian noise has been reduced. The related code is in “test_reconstructData2.m”.
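A sketch of the denoising experiment of Figure 4; the noise is added with standard MATLAB calls, and reconstructData is assumed to take the corrupted images directly.

% Sketch: denoise corrupted images with reconstructData (call form assumed).
someImages = data.testData(1:9, :);                           % 9 MNIST images (layout assumed)
noisy = someImages + sqrt(0.02) .* randn(size(someImages));   % Gaussian noise, zero mean, 0.02 variance
noisy = min(max(noisy, 0), 1);                                % keep pixel values in [0, 1]
clean = rbm.reconstructData(noisy);                           % extract features, then regenerate the data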
The third RBM class is DiscriminativeRBM. With some changes, a generative RBM can be converted into a discriminative RBM that can classify data. This class includes methods similar to those in the GenerativeRBM class; the two new ones are generateClass and predictClass. The generateClass method generates data for a specified class number (or label). As Figure 5 shows, the model can generate different images by activating only the corresponding label unit; note that it cannot generate images for two of the digits (2 and 8) in this way.
Figure 5: Images synthesized with the generateClass method. The model generates different images by activating only the corresponding label unit; note that it cannot generate images for two of the digits (2 and 8) in this way. The related code is in “test_generateClass.m”.
The other new method is predictClass, which predicts the class number (or label) of input data. The first mode (byFreeEnergy) relies on a joint density model trained with a single RBM that has two sets of visible units: in addition to the units that represent a data vector, there is a “softmax” label unit that represents the class. After training, each possible label is tried in turn with a test vector, and the one giving the lowest free energy is chosen as the most likely class (Hinton, 2010). The second mode (bySampling) reconstructs the data and returns the most activated softmax unit (which corresponds to a label). Usually byFreeEnergy is more accurate but also more time consuming.
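A sketch of using the two DiscriminativeRBM methods just described; the constructor, training call and the way the prediction mode is selected are assumptions.

% Sketch: discriminative RBM usage (constructor and mode selection assumed).
drbmParams = RbmParameters(500, ValueType.binary);            % assumed constructor
drbmParams.rbmType = RbmType.discriminative;                  % labels are needed for training
drbm = DiscriminativeRBM(drbmParams);
drbm.train(data);                                             % DataStore with labeled data
digits = drbm.generateClass(3);                               % generate samples for label 3 (assumed form)
labels = drbm.predictClass(data.testData, 'byFreeEnergy');    % or 'bySampling' (assumed form)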
The other three RBM classes are SparseRBM, SparseGenerativeRBM and SparseDiscriminativeRBM. The first one, SparseRBM, is an abstract class that defines the gradient of the regularization term for different sparsity methods, such as quadratic sparse RBM, rate distortion sparse RBM and normal sparse RBM (M.A. Keyvanrad and Homayounpour, 2015). The SparseGenerativeRBM and SparseDiscriminativeRBM classes combine the generative or discriminative RBM with the sparse RBM features in separate classes, so that GenerativeRBM and DiscriminativeRBM can also be sparse.
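Selecting one of the sparsity methods presumably happens through the RBM parameters; the sketch below uses a hypothetical sparsityMethod property and hypothetical value names purely for illustration.

% Sketch only: the property name 'sparsityMethod' and its values are hypothetical.
sparseParams = RbmParameters(500, ValueType.binary);   % assumed constructor
sparseParams.sparsityMethod = 'normal';                % 'quadratic', 'rateDistortion' or 'normal' (hypothetical names)
srbm = SparseGenerativeRBM(sparseParams);              % generative RBM with a sparsity regularization term
srbm.train(data);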
DBN class
A DBN is a generative model composed of multiple layers of RBMs. The class architecture allows different RBM classes to be used to create an arbitrary DBN, and supports back-propagation after DBN training if needed. A DBN can be used as an autoEncoder or as a classifier.
An autoEncoder DBN creates a generative model and can be used in many applications such as feature extraction. Figure 6 shows an autoEncoder DBN with two RBM layers; the hidden units of the last layer can be used as a feature vector extracted from the visible input data.
Figure 6:
An autoEncoder DBN with two RBM layers.
A DBN can also be used as a classifier. The goal of a classifier DBN is to obtain labels from input data. In this type of DBN, a discriminative RBM is needed in the last layer as the classifier RBM. Figure 7 shows a classifier DBN with two RBM layers, where the last RBM is a discriminative RBM.
Figure 7:
A classifier DBN with two RBM layers. The last RBM is a discriminative
RBM.
The DBN class has some useful methods such as addRBM, train, getFeature, backpropagation, getOutput, plotBases, etc. The addRBM method is used to stack RBMs; it adds each defined RBM (given an RbmParameters object) to the DBN.
The train method trains the DBN layer by layer. In other words, it trains the RBMs one after another and uses the features extracted by each RBM to train the next one.
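Putting the pieces together, the sketch below stacks two RBMs into a DBN and trains it layer by layer as described above; the dbnType property and constructor forms are assumptions.

% Sketch: build and train a classifier DBN (property names and constructor forms assumed).
dbn = DBN();
dbn.dbnType = 'classifier';                          % or 'autoEncoder' (assumed property)
rbm1Params = RbmParameters(500, ValueType.binary);   % first hidden layer (assumed constructor)
rbm2Params = RbmParameters(500, ValueType.binary);   % second hidden layer
rbm2Params.rbmType = RbmType.discriminative;         % the last layer classifies in a classifier DBN
dbn.addRBM(rbm1Params);                              % stack RBMs in order
dbn.addRBM(rbm2Params);
dbn.train(data);                                     % greedy layer-by-layer RBM training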
The getFeature method is used to extract features from input data. It extracts features layer by layer and returns the activation values of the units in the last hidden layer as the extracted features (see Figure 6).
Figure 8 shows features extracted by a DBN on the MNIST dataset. The features were produced by a 784-1000-500-250-3 autoEncoder DBN that maps the input images (784 pixels) to 3 features.
Figure 8: Features extracted by a DBN on the MNIST dataset. The features were produced by a 784-1000-500-250-3 autoEncoder DBN that maps the input images (784 pixels) to 3 features. The related code can be found in the “test_getFeatureMNIST.m” file.
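The low-dimensional features of Figure 8 can be obtained and visualized with a call like the one below, assuming getFeature accepts a data matrix and that a testLabels property is available for coloring.

% Sketch: 3-dimensional features from a trained 784-1000-500-250-3 autoEncoder DBN (call forms assumed).
feat3d = dbn.getFeature(data.testData);                                % activities of the last (3-unit) hidden layer
scatter3(feat3d(:,1), feat3d(:,2), feat3d(:,3), 10, data.testLabels);  % color by class (testLabels property assumed)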
In another test, the ISOLET dataset is used (Fanty and Cole, 1991). In the ISOLET data set, 150 subjects spoke the name of each letter of the alphabet twice. There are 7797 examples in total, referred to as isolet1-isolet5 (6238 training examples and 1559 test examples). Figure 9 shows features extracted by a DBN on the ISOLET dataset. The features were produced by a 617-2000-1000-500-250-2 and a 617-2000-1000-500-250-3 autoEncoder DBN that map the input data (617 features) to 2 or 3 features.
Figure 9: Features extracted by a DBN on the ISOLET dataset, which has 617 features and 26 different classes (26 different spoken letters). Ten randomly selected letters are shown. Left: features produced by a 617-2000-1000-500-250-2 autoEncoder DBN. Right: features produced by a 617-2000-1000-500-250-3 autoEncoder DBN. The related code is in “test_getFeatureISOLET.m”.
In yet another test, the 20 Newsgroups dataset is used. This dataset is organized into 20 different newsgroups, each corresponding to a different topic, and has become a popular data set for experiments in text applications of machine learning, such as text classification and text clustering. Figure 10 shows features extracted by a DBN on the 20 Newsgroups dataset. The features were produced by a 5000-500-500-250-3 autoEncoder DBN that maps the input data (5000 features) to 3 features.
According to Figure 8, Figure 9 and Figure 10, the DBN obtains good features with acceptable discrimination between classes. Note that these features have been learned without using the labels.
Figure 10: Features extracted by a DBN on the 20 Newsgroups dataset, which has 5000 features and 20 different classes (20 different newsgroups). Five selected newsgroups are shown. The features were produced by a 5000-500-500-250-3 autoEncoder DBN. The related code is in “test_getFeature20Newsgroups.m”.
The next useful method is the backpropagation method, which uses the back-propagation algorithm to fine-tune the pre-trained parameters. Our toolbox uses the MATLAB neural network toolbox: the method first converts the DBN to a MATLAB neural network object (according to the DBN type) and then uses its back-propagation algorithm.
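After the greedy pre-training, fine-tuning then reduces to a single call, as sketched below; the argument list of backpropagation is an assumption.

% Sketch: fine-tune the pre-trained DBN (argument form assumed).
dbn.backpropagation(data);   % converts the DBN to a MATLAB neural network object and fine-tunes it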
Figure 11 shows how a DBN with a discriminative RBM in the last layer is converted to a MATLAB neural network structure. In this conversion, the softmax units of the discriminative RBM and their corresponding weights become the output layer of the neural network.
Figure 11: Conversion of a classifier DBN to a MATLAB neural network structure. Left: a DBN with a discriminative RBM in the last layer. Right: a neural network structure with the softmax units and their weights from the DBN as the output layer.
For an autoEncoder DBN, the conversion to a neural network structure is done differently. Figure 12 shows how an upside-down copy of the DBN is added to reconstruct the input data (Hinton and Salakhutdinov, 2006). This neural network structure can then be fine-tuned with the back-propagation algorithm.
Figure 12: Conversion of an autoEncoder DBN to a MATLAB neural network structure. Left: a DBN with generative RBMs. Right: a neural network structure with an upside-down copy of the DBN added to reconstruct the input data.
The getOutput method is used to get the DBN outputs. It returns results according to the type of the DBN: for an autoEncoder DBN the results are the extracted features, and for a classifier DBN they are the predicted labels.
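For a classifier DBN, a classification error such as the ones in Table 2 can be computed from getOutput as sketched below; the output format (predicted labels) and the testLabels property are assumptions.

% Sketch: classification error of a classifier DBN (output format assumed to be predicted labels).
predicted = dbn.getOutput(data.testData);
err = sum(predicted(:) ~= data.testLabels(:)) / numel(data.testLabels);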
The last method is plotBases, which plots the basis functions that have been learned by the DBN.
Figure 13: Basis functions plotted with the plotBases method, from a two-layer sparse normal DBN trained on the MNIST dataset. Left: basis functions in the first layer. Right: basis functions in the second layer.
Table 2 shows a classification experiment using this toolbox on the MNIST, ISOLET and 20 Newsgroups datasets. The table compares the different sampling methods implemented in our toolbox, before and after back-propagation.
Table 2: Classification error on the MNIST dataset for a DBN (784-500-500-2000), on the ISOLET dataset for a DBN (617-1000-1000-2000) and on the 20 Newsgroups dataset for a DBN (5000-500-500-2000), using different sampling methods. After training each RBM, the DBN was fine-tuned for 200 epochs with back-propagation.

Method | MNIST (before BP) | MNIST (after BP) | ISOLET (before BP) | ISOLET (after BP) | 20 Newsgroups (before BP) | 20 Newsgroups (after BP)
--- | --- | --- | --- | --- | --- | ---
CD | 0.0636 | 0.0124 | 0.0552 | 0.0372 | 0.3087 | 0.2686
PCD | 0.0307 | 0.0122 | 0.0500 | 0.0385 | 0.3183 | 0.2642
FEPCD | 0.0248 | 0.0099 | 0.0449 | 0.0353 | 0.3161 | 0.2678
References
Fanty,
M.A., Cole, R.A., 1991. Spoken Letter Recognition. Presented at the Advances
in Neural Information Processing Systems, pp. 220–226.
Hamel, P., Eck, D., 2010. Learning features from
music audio with deep belief networks. In: 11th International Society for
Music Information Retrieval Conference (ISMIR 2010).
Hinton, G., 2010. A practical guide to training Restricted Boltzmann Machines (Technical report 2010-003). Machine Learning Group, University of Toronto.
Hinton, G.E., Osindero, S., Teh, Y.-W., 2006. A Fast
Learning Algorithm for Deep Belief Nets. Neural Computation 18, 1527–1554.
Hinton, G.E., Salakhutdinov, R.R., 2006. Reducing
the dimensionality of data with neural networks. Science 313, 504–507.
Keyvanrad, M.A., Homayounpour, M.M., 2015. Deep
Belief Network Training Improvement Using Elite Samples Minimizing Free
Energy. Int. J. Patt. Recogn. Artif. Intell.
Keyvanrad, M.A., Homayounpour, M.M., 2015. Effective
Sparsity Control in Deep Belief Networks using Normal Regularization Term.
submitted to Neural Networks.
Lee, H., Ekanadham, C., Ng, A., 2008. Sparse deep
belief net model for visual area V2. Advances in neural information
processing systems 20, 873–880.
Lee, H., Largman, Y., Pham, P., Ng, A.Y., 2009.
Unsupervised feature learning for audio classification using convolutional
deep belief networks. Advances in neural information processing systems 22,
1096–1104.
Liu, Y., Zhou, S., Chen, Q., 2011. Discriminative
deep belief networks for visual data classification. Pattern Recognition 44,
2287–2296.
Mohamed, A., Dahl, G., Hinton, G., 2009. Deep belief
networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech
Recognition and Related Applications. Canada, pp. 1–9.
Vinyals, O., Ravuri, S.V., 2011. Comparing
multilayer perceptron to Deep Belief Network Tandem features for robust ASR.
In: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International
Conference on. pp. 4596–4599.