New in the toolbox
- Using GPU in backpropagation
- Revision of some demo scripts
- Function approximation with multiple outputs
- Feature extraction with GRBM in the first layer
Features
- An object-oriented toolbox with the most important abilities needed to implement DBNs.
- Following object-oriented design principles, DeeBNet is very modular, extensible and reusable, and can easily be modified and extended.
- Runs on both MATLAB and Octave and is platform independent (Windows and Linux).
- Different sampling methods, including Gibbs, CD, PCD and our new FEPCD method, are implemented in our toolbox.
- Different sparsity methods, including quadratic, rate distortion and our normal sparsity method, are included in DeeBNet.
- DeeBNet supports different RBM types (including generative and discriminative).
- Efficient use of GPU power (high GPU load).
- DeeBNet can be used in many different tasks such as classification, feature extraction, data reconstruction, noise reduction, generating new data, etc.
- Data management in the DataStore class and optimized code for handling big data in some functions.
Downloading the Toolbox V3.2
- Download it from here in zip format (.zip).
- The toolbox has been tested with MATLAB R2015a and Octave 4.0.0 on Windows and Linux.
- To install the toolbox, simply unpack the archive.
- Download the tested datasets (or use the prepareMNIST_Small function in the MNIST demos):
  - MNIST
  - ISOLET
  - MATLAB version of the 20 Newsgroups data set
- Run the demo scripts to see what the toolbox can do (change the dataset path in the scripts); see the documentation for more details. A minimal usage sketch follows this list.
  - test_classificationMNIST.m
  - test_classificationISOLET.m
  - test_classification20Newsgroups.m
  - test_generateDataMNIST.m
  - test_getFeatureMNIST.m
  - test_getFeatureMNIST_usingGPU.m
  - test_plotDataMNIST.m
  - test_reconstructDataMNIST.m
  - test_sparsityAndBasesFunctionMNIST.m
  - …
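The following is a minimal sketch of installing the toolbox and running a demo from the MATLAB or Octave prompt; the folder name DeeBNet below is an assumption about how the archive unpacks, so adjust it to your setup.

% Minimal sketch: put the unpacked toolbox on the path and run a demo.
addpath(genpath('DeeBNet'));    % folder created by unpacking the archive (name assumed)
% Edit the dataset path inside the demo script first, then run it:
test_classificationMNIST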
Documentation
- Full description [89 pages, in Persian]
- Technical report [27 pages, in English]
History
- DeeBNet V1.0 (.zip)
  - Release Date: 12/9/2014
  - Documentation: in English, in Persian
- DeeBNet V2.0 (.zip)
  - Release Date: 7/10/2015
  - Documentation: in English, in Persian
  - New features:
    - Sparsity in RBM with three different methods
    - Plotting basis functions
    - Classification and feature extraction on the 20 Newsgroups dataset
    - Code correction in using backpropagation
    - Runtime and memory code optimization
    - Normalization and shuffling
- DeeBNet V2.1 (.zip)
  - Release Date: 7/23/2015
  - Documentation: in English, in Persian
  - New features:
    - GPU support (about 5 times faster than CPU; GPU: NVIDIA GeForce GTX 780, CPU: AMD FX 8150 eight-core 3.6 GHz)
    - Casting DBN parameters to single and double data types
    - Runtime and memory code optimization
    - Normalization and shuffling
- DeeBNet V2.2 (.zip)
  - Release Date: 9/7/2015
  - Documentation: in English, in Persian
  - New features:
    - Fixed a bug in the computeBatchSize function on Linux
    - Revision of some demo scripts
- DeeBNet V3.0 (.zip)
  - Release Date: 1/9/2016
  - Documentation: in English, in Persian
  - New features:
    - Edited the toolbox for use in Octave
- DeeBNet V3.1 (.zip)
  - Release Date: 1/19/2016
  - Documentation: in English, in Persian
  - New features:
    - Bug fix in changing the learning rate
    - Extended the generateData function for use after backpropagation
    - Extended the reconstructData function for use after backpropagation
Related publications
- If you like the toolbox and want to cite it, please reference it as:
  M. A. Keyvanrad and M. M. Homayounpour, “A brief survey on deep belief networks and introducing a new object oriented toolbox (DeeBNet),” arXiv:1408.3264 [cs], Aug. 2014.
Introduction
Artificial neural networks have been used in artificial intelligence applications for many years. Pattern recognition, voice and speech analysis, and natural language processing are some of the applications that use artificial neural networks. For theoretical and biological reasons, deep models and architectures with many nonlinear processing layers have been suggested.
These deep models have many layers and parameters that must be learned. Artificial neural networks were rarely used in this setting, because with so many layers training is time consuming and becomes trapped in local minima, so acceptable results cannot be achieved. One important tool for dealing with this problem is the DBN (Deep Belief Network), which can create neural networks with many hidden layers (Liu et al., 2011).
Deep Belief Networks can be used for classification and feature learning. Data representation is very important in machine learning, so much work has been done on feature preprocessing, feature extraction and feature learning. In feature learning, we can create a feature extraction system and then use the extracted features in classification and other applications. Using unlabeled data for high-level feature extraction (Lee et al., 2008) and increasing the discrimination between extracted features are the benefits of DBNs for feature learning (Hinton and Salakhutdinov, 2006).
The layers of a DBN are built from Restricted Boltzmann Machines (RBMs), which are generative, undirected probabilistic models. An RBM uses a hidden layer to model the probability distribution of the visible variables. Indeed, a DBN for hierarchical processing can be created by stacking RBMs, so most improvements in DBNs come from improvements in RBMs. This paper studies different developed RBM models and introduces a new MATLAB toolbox with many DBN abilities.
Hinton presented DBNs and used them for digit recognition on the MNIST data set (Hinton et al., 2006). He used a DBN with a 784-500-500-2000-10 structure, where the first layer takes 784 features from the 28*28 MNIST digit images. The last layer corresponds to the 10 digit labels, and the other three layers are hidden layers with stochastic binary neurons. This paper achieved a 1.25% classification error rate on the MNIST test set.
In another paper by this author (Hinton and Salakhutdinov, 2006), a DBN is used as a nonlinear model for feature extraction and dimensionality reduction. Indeed, the DBN may be considered as a model that generates features in its last layer, with the ability to reconstruct the visible data from the generated features. When a conventional neural network with many layers is used, it becomes trapped in local minima and performance decreases; therefore determining good initial values for the network weights is critical.
Another paper proposed the DDBN (Discriminative Deep Belief Network), a new classifier based on DBN (Liu et al., 2011). This paper showed the power of DBNs in using unlabeled data and also the performance improvement obtained by adding layers (even up to 50 hidden layers).
DBN applications are not limited to image processing; they can also be used in voice processing (Hamel and Eck, 2010; Lee et al., 2009; Mohamed et al., 2009; Vinyals and Ravuri, 2011) with significant efficiency. Some toolkits have been developed to facilitate the use of DBNs in different applications. The implemented toolboxes can be used for many different tasks including classification, feature extraction, data reconstruction, noise reduction, generating new data, etc. Some of these toolboxes are listed and compared in Table 1. The comparison is based on features and characteristics including programming language, open-source availability, object-oriented programming, learning method, discriminative ability, type of visible nodes, fine tuning, GPU support, and documentation. As Table 1 shows, in comparison to other DBN toolboxes, our toolbox possesses all the main features as well as different types of classes. It is also designed to be very modular, extensible and reusable.
Table 1: A brief comparison with other implemented toolboxes.

Toolkit name | Progr. lang. | Open source | OOP | Learning method | DRBM | Sparse RBM | Visible node types | Fine-tuning | GPU | User manual
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
deepLearn, 2014 | MATLAB, Octave | ✔ | ✘ | CD1 | ✘ | ✔ | probability | ✔ | ✘ | Incomplete
deep autoencoder, 2006 | MATLAB | ✔ | ✘ | CD1 | ✘ | ✘ | probability | ✔ | ✘ | Incomplete
matrbm, 2010 | MATLAB | ✔ | ✘ | CD1, PCD | ✔ | ✘ | probability | ✘ | ✘ | Incomplete
deepmat, 2014 | MATLAB | ✔ | ✘ | CDk, PCD, FPCD | ✔ | ✔ | probability, Gaussian | ✔ | ✔ | Incomplete
DigitDemo, 2010 | MATLAB | ✘ | ✘ | CDk, PCD, RM, PL | ✘ | ✘ | probability | ✔ | ✘ | Incomplete
DBN Toolbox, 2010 | MATLAB | ✔ | ✔ | CDk | ✘ | ✘ | probability, Gaussian | ✔ | ✘ | Incomplete
DeeBNet (our toolbox) | MATLAB, Octave | ✔ | ✔ | Gibbs, CDk, PCD, FEPCD | ✔ | ✔ | binary, probability, Gaussian | ✔ | ✔ | Complete (in English), perfect (in Persian)
The DeeBNet is an object-oriented MATLAB toolbox that provides tools for conducting research using Deep Belief Networks. The toolbox has two packages with classes and functions for managing data and sampling methods, and also has classes defining different RBMs and the DBN. The following sections describe these packages and classes in more detail. Figure 1 shows the relationships between the implemented classes.
Figure 1: Relationships between the implemented classes in the DeeBNet toolbox.
Base classes
In this section, the basic classes are defined. These classes are used by the RBM and DBN classes. The first class is ValueType, which is an enumeration. It defines the different types of units in a DBN: binary (with 0 or 1 values), probability (with values in the [0, 1] interval) and Gaussian (with any real values, with zero mean and unit variance).
RbmType is also an enumeration. This class defines the different types of RBMs: generative (uses data without labels) and discriminative (needs labeled data and can classify data).
Another important class is RbmParameters, which includes all the parameters of an RBM, such as the weight matrix, biases, learning rate, etc. Most of these parameters are defined in (Hinton, 2010).
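As a rough illustration of how these basic classes fit together, the fragment below configures an RbmParameters object; the constructor signature and the property names shown are assumptions for illustration, not the definitive API.

% Illustrative sketch only: constructor signature and property names are assumptions.
rbmParams = RbmParameters(500, ValueType.binary);   % 500 hidden units, binary units (assumed signature)
rbmParams.rbmType = RbmType.generative;             % generative (unlabeled data) vs. discriminative
rbmParams.learningRate = 0.05;                      % assumed property name
rbmParams.maxEpoch = 50;                            % assumed property name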
The DataClasses package has one class for managing training, test and validation data. The DataStore class has useful functions such as normalize and shuffle for normalizing and shuffling data. It also provides the cut function to select a portion of the training data. Finally, the plotData function can be used to plot parts of the data; this is useful for comparing data before and after some processing stages (see Figure 2 and the sketch below).
Figure 2: Plotting 100 samples with the plotData function of the DataStore class. The first image shows 100 samples from the MNIST dataset and the second shows the samples reconstructed with a DBN model. The related code is in the “test_plotData.m” file.
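A minimal sketch of the DataStore workflow described above; the property names and argument forms below are assumptions for illustration, not guaranteed by the toolbox.

% Sketch only: property names and argument forms are assumed.
data = DataClasses.DataStore();            % container for training/validation/test data
data.trainData   = trainX;                 % e.g. an N-by-784 matrix of MNIST images (rows are samples)
data.trainLabels = trainY;                 % class labels for the training samples
data.normalize();                          % normalize the data (argument form assumed)
data.shuffle();                            % shuffle the training samples
data.cut(0.1);                             % keep a portion of the training data (assumed semantics)
data.plotData(100);                        % plot e.g. 100 samples (assumed argument)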
The second package includes implementations of different sampling methods: Gibbs, CD, PCD and FEPCD. With the Gibbs class, we can generate samples from an RBM model starting from randomly initialized samples. This class is also the parent class of the other sampling classes. With the CD (Contrastive Divergence) class, we can generate samples from an RBM model initialized with training samples. This class inherits from the Gibbs class.
With the PCD (Persistent Contrastive Divergence) class, samples can be generated from an RBM model. Unlike the CD method, which uses training data as the initial values of the visible units, the PCD method starts each chain from its state at the last update step. This class also inherits from the Gibbs class. Many persistent chains can be run in parallel, and we refer to the current state of each chain as a new sample or “fantasy” particle.
The FEPCD (Free Energy in Persistent Contrastive Divergence) class defines a criterion for the goodness of a chain, so that the generated samples and the gradient computation become more accurate. The proposed criterion for selecting the best chain is the free energy of the visible sample (Mohammad Ali Keyvanrad and Homayounpour, 2015). This class inherits from the PCD class.
Finally, the Sampling class is an interface for the implemented sampling classes; other classes use sampling classes such as CD or PCD through it. It relies on the SamplingMethodType class, an enumeration containing the types of sampling methods that can be used in an RBM.
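Continuing the RbmParameters sketch above, the sampling method would typically be selected through this enumeration; the property name samplingMethodType is an assumption for illustration.

% Sketch: choose the sampling method used by an RBM (property name assumed).
rbmParams.samplingMethodType = SamplingClasses.SamplingMethodType.FEPCD;   % Gibbs, CDk, PCD or FEPCD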
RBM classes
The toolbox has six types of RBM classes. The first one, the RBM class, is an abstract class that defines all the functions (such as the training method) and features (like the sampler object) common to all types of RBMs; therefore an object cannot be created from it. The other RBM classes inherit from this abstract class.
The second one is the GenerativeRBM class. This class is used as a generative model and can model many different types of data. Its most important use is as a learning module that is stacked with others to form DBNs. The GenerativeRBM class has methods such as train, getFeature, generateData, reconstructData, etc. The train method takes a DataStore object (holding training, validation and test data) and modifies the RBM parameters; the termination condition is the number of training epochs. The getFeature method extracts features (the activities of the hidden layer) from data; in other words, it samples the hidden units from the visible units with the chosen sampling method.
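A hedged sketch of the GenerativeRBM workflow just described, continuing the examples above; the constructor and exact argument forms are assumptions.

% Sketch: train a generative RBM and extract features (constructor and arguments assumed).
rbm = GenerativeRBM(rbmParams);              % wraps the RbmParameters object (assumed)
rbm.train(data);                             % data is a DataStore with train/validation/test sets
features = rbm.getFeature(data.testData);    % hidden-layer activities for the test data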
The generateData method generates values of the visible units from given hidden values (or extracted features). Similar to the getFeature method, generateData samples the visible units from the hidden units with the chosen sampling method. Figure 3 shows some outputs of this method, obtained from an RBM with 250 hidden units trained on the MNIST dataset. In this experiment, after extracting 250 features from 9 MNIST images (28*28 pixels), new images were generated from the extracted features. As Figure 3 shows, increasing the number of sampling iterations makes the generated images more natural and more similar to the data distribution.
Figure 3: Results from an RBM with 250 hidden units trained on the MNIST dataset. After extracting 250 features from 9 MNIST images (28*28 pixels), new images were generated from the extracted features. (a) The 9 MNIST images. (b)-(d) Images generated from the extracted features with an increasing number of sampling iterations. The related code is in “test_generateData.m”.
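The experiment in Figure 3 can be reproduced in spirit with a fragment like the one below, continuing the sketches above; the data layout and call forms are assumptions.

% Sketch: regenerate images from extracted features (call forms assumed).
someImages = data.testData(1:9, :);        % 9 MNIST images, 28*28 = 784 pixels each (row layout assumed)
features   = rbm.getFeature(someImages);   % hidden activities for these images
generated  = rbm.generateData(features);   % visible units sampled back from the hidden values
% More sampling iterations generally bring the generated images closer to the data distribution (Figure 3).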
The last useful method is reconstructData, which reconstructs input data by extracting features from it and then generating data from the extracted features. In Figure 4 this method has been used to reduce noise in images: the Gaussian noise is reduced after reconstructing the corrupted images.
Figure 4: Reducing noise in corrupted images using the reconstructData method. (a) 9 MNIST images. (b) Data corrupted with Gaussian noise with zero mean and 0.02 variance. (c) Images reconstructed from the corrupted images; the Gaussian noise has been reduced. The related code is in “test_reconstructData2.m”.
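A sketch of the denoising experiment of Figure 4; the noise is added with standard MATLAB calls, and reconstructData is assumed to take the corrupted images directly.

% Sketch: denoise corrupted images with reconstructData (call form assumed).
someImages = data.testData(1:9, :);                           % 9 MNIST images (layout assumed)
noisy = someImages + sqrt(0.02) .* randn(size(someImages));   % Gaussian noise, zero mean, 0.02 variance
noisy = min(max(noisy, 0), 1);                                % keep pixel values in [0, 1]
clean = rbm.reconstructData(noisy);                           % extract features, then regenerate the data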
The third RBM class is DiscriminativeRBM. With some changes, a generative RBM can be converted into a discriminative RBM that can classify data. This class includes methods similar to those in the GenerativeRBM class; the two new ones are generateClass and predictClass. The generateClass method generates data for a specified class number (or label). As Figure 5 shows, the model can generate different images by activating only the corresponding label unit; note that it cannot generate images for two of the digits (2 and 8) in this way.
Figure 5: Images synthesized with the generateClass method. The model generates different images by activating only the corresponding label unit; note that it cannot generate images for two of the digits (2 and 8) in this way. The related code is in “test_generateClass.m”.
The other new method is predictClass, which predicts the class number (or label) of input data. The first mode (byFreeEnergy) relies on a joint density model trained with a single RBM that has two sets of visible units: in addition to the units that represent a data vector, there is a “softmax” label unit that represents the class. After training, each possible label is tried in turn with a test vector, and the one giving the lowest free energy is chosen as the most likely class (Hinton, 2010). The second mode (bySampling) reconstructs the data and returns the most activated softmax unit (which corresponds to a label). Usually byFreeEnergy is more accurate but also more time consuming.
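A sketch of using the two DiscriminativeRBM methods just described; the constructor, training call and the way the prediction mode is selected are assumptions.

% Sketch: discriminative RBM usage (constructor and mode selection assumed).
drbmParams = RbmParameters(500, ValueType.binary);            % assumed constructor
drbmParams.rbmType = RbmType.discriminative;                  % labels are needed for training
drbm = DiscriminativeRBM(drbmParams);
drbm.train(data);                                             % DataStore with labeled data
digits = drbm.generateClass(3);                               % generate samples for label 3 (assumed form)
labels = drbm.predictClass(data.testData, 'byFreeEnergy');    % or 'bySampling' (assumed form)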
The other three RBM classes are SparseRBM, SparseGenerativeRBM and SparseDiscriminativeRBM. The first one, SparseRBM, is an abstract class that defines the gradient of the regularization term for different sparsity methods, such as quadratic sparse RBM, rate distortion sparse RBM and normal sparse RBM (M.A. Keyvanrad and Homayounpour, 2015). The SparseGenerativeRBM and SparseDiscriminativeRBM classes combine the generative or discriminative RBM with the sparse RBM features in separate classes, so that GenerativeRBM and DiscriminativeRBM can also be sparse.
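Selecting one of the sparsity methods presumably happens through the RBM parameters; the sketch below uses a hypothetical sparsityMethod property and hypothetical value names purely for illustration.

% Sketch only: the property name 'sparsityMethod' and its values are hypothetical.
sparseParams = RbmParameters(500, ValueType.binary);   % assumed constructor
sparseParams.sparsityMethod = 'normal';                % 'quadratic', 'rateDistortion' or 'normal' (hypothetical names)
srbm = SparseGenerativeRBM(sparseParams);              % generative RBM with a sparsity regularization term
srbm.train(data);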
DBN class
A DBN is a generative model composed of multiple layers of RBMs. The class architecture allows different RBM classes to be used to create an arbitrary DBN, and supports back-propagation after DBN training if needed. A DBN can be used as an autoEncoder or as a classifier.
An autoEncoder DBN creates a generative model and can be used in many applications such as feature extraction. Figure 6 shows an autoEncoder DBN with two RBM layers; the hidden units of the last layer can be used as a feature vector extracted from the visible input data.
Figure 6:
An autoEncoder DBN with two RBM layers.
A DBN can also be used as a classifier. The goal of a classifier DBN is to obtain labels from input data. In this type of DBN, a discriminative RBM is needed in the last layer as the classifier RBM. Figure 7 shows a classifier DBN with two RBM layers, where the last RBM is a discriminative RBM.
Figure 7:
A classifier DBN with two RBM layers. The last RBM is a discriminative
RBM.
The DBN class has some useful methods such as addRBM, train, getFeature, backpropagation, getOutput, plotBases, etc. The addRBM method is used to stack RBMs; it adds each defined RBM (given an RbmParameters object) to the DBN.
The train method trains the DBN layer by layer. In other words, it trains the RBMs one after another and uses the features extracted by each RBM to train the next one.
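Putting the pieces together, the sketch below stacks two RBMs into a DBN and trains it layer by layer as described above; the dbnType property and constructor forms are assumptions.

% Sketch: build and train a classifier DBN (property names and constructor forms assumed).
dbn = DBN();
dbn.dbnType = 'classifier';                          % or 'autoEncoder' (assumed property)
rbm1Params = RbmParameters(500, ValueType.binary);   % first hidden layer (assumed constructor)
rbm2Params = RbmParameters(500, ValueType.binary);   % second hidden layer
rbm2Params.rbmType = RbmType.discriminative;         % the last layer classifies in a classifier DBN
dbn.addRBM(rbm1Params);                              % stack RBMs in order
dbn.addRBM(rbm2Params);
dbn.train(data);                                     % greedy layer-by-layer RBM training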
The getFeature method is used to extract features from input data. It extracts features layer by layer and returns the activation values of the units in the last hidden layer as the extracted features (see Figure 6).
Figure 8 shows features extracted by a DBN on the MNIST dataset. The features were produced by a 784-1000-500-250-3 autoEncoder DBN that maps the input images (784 pixels) to 3 features.
Figure 8: Features extracted by a DBN on the MNIST dataset. The features were produced by a 784-1000-500-250-3 autoEncoder DBN that maps the input images (784 pixels) to 3 features. The related code can be found in the “test_getFeatureMNIST.m” file.
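The low-dimensional features of Figure 8 can be obtained and visualized with a call like the one below, assuming getFeature accepts a data matrix and that a testLabels property is available for coloring.

% Sketch: 3-dimensional features from a trained 784-1000-500-250-3 autoEncoder DBN (call forms assumed).
feat3d = dbn.getFeature(data.testData);                                % activities of the last (3-unit) hidden layer
scatter3(feat3d(:,1), feat3d(:,2), feat3d(:,3), 10, data.testLabels);  % color by class (testLabels property assumed)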
In another test, the ISOLET dataset is used (Fanty and Cole, 1991). In the ISOLET data set, 150 subjects spoke the name of each letter of the alphabet twice. There are 7797 examples in total, referred to as isolet1-isolet5 (6238 training examples and 1559 test examples). Figure 9 shows features extracted by a DBN on the ISOLET dataset. The features were produced by a 617-2000-1000-500-250-2 and a 617-2000-1000-500-250-3 autoEncoder DBN that map the input data (617 features) to 2 or 3 features.
Figure 9: Features extracted by a DBN on the ISOLET dataset, which has 617 features and 26 different classes (26 different spoken letters). Ten randomly selected letters are shown. Left: features produced by a 617-2000-1000-500-250-2 autoEncoder DBN. Right: features produced by a 617-2000-1000-500-250-3 autoEncoder DBN. The related code is in “test_getFeatureISOLET.m”.
In yet another test, the 20 Newsgroups dataset is used. This dataset is organized into 20 different newsgroups, each corresponding to a different topic, and has become a popular data set for experiments in text applications of machine learning, such as text classification and text clustering. Figure 10 shows features extracted by a DBN on the 20 Newsgroups dataset. The features were produced by a 5000-500-500-250-3 autoEncoder DBN that maps the input data (5000 features) to 3 features.
According to Figure 8, Figure 9 and Figure 10, the DBN obtains good features with acceptable discrimination between classes. Note that these features have been learned without using the labels.
Figure 10: Features extracted by a DBN on the 20 Newsgroups dataset, which has 5000 features and 20 different classes (20 different newsgroups). Five selected newsgroups are shown. The features were produced by a 5000-500-500-250-3 autoEncoder DBN. The related code is in “test_getFeature20Newsgroups.m”.
The next useful method is the backpropagation method, which uses the back-propagation algorithm to fine-tune the pre-trained parameters. Our toolbox uses the MATLAB neural network toolbox: the method first converts the DBN to a MATLAB neural network object (according to the DBN type) and then uses its back-propagation algorithm.
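After the greedy pre-training, fine-tuning then reduces to a single call, as sketched below; the argument list of backpropagation is an assumption.

% Sketch: fine-tune the pre-trained DBN (argument form assumed).
dbn.backpropagation(data);   % converts the DBN to a MATLAB neural network object and fine-tunes it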
Figure 11 shows how a DBN with a discriminative RBM in the last layer is converted to a MATLAB neural network structure. In this conversion, the softmax units of the discriminative RBM and their corresponding weights become the output layer of the neural network.
Figure 11: Conversion of a classifier DBN to a MATLAB neural network structure. Left: a DBN with a discriminative RBM in the last layer. Right: a neural network structure with the softmax units and their weights from the DBN as the output layer.
For an autoEncoder DBN, the conversion to a neural network structure is done differently. Figure 12 shows how an upside-down copy of the DBN is added to reconstruct the input data (Hinton and Salakhutdinov, 2006). This neural network structure can then be fine-tuned with the back-propagation algorithm.
Figure 12: Conversion of an autoEncoder DBN to a MATLAB neural network structure. Left: a DBN with generative RBMs. Right: a neural network structure with an upside-down copy of the DBN added to reconstruct the input data.
The getOutput method is used to get the DBN outputs. It returns results according to the type of the DBN: for an autoEncoder DBN the results are the extracted features, and for a classifier DBN they are the predicted labels.
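For a classifier DBN, a classification error such as the ones in Table 2 can be computed from getOutput as sketched below; the output format (predicted labels) and the testLabels property are assumptions.

% Sketch: classification error of a classifier DBN (output format assumed to be predicted labels).
predicted = dbn.getOutput(data.testData);
err = sum(predicted(:) ~= data.testLabels(:)) / numel(data.testLabels);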
The last method is plotBases, which plots the basis functions that have been learned by the DBN.
Figure 13: Basis functions plotted with the plotBases method, from a two-layer sparse normal DBN trained on the MNIST dataset. Left: basis functions in the first layer. Right: basis functions in the second layer.
Table 2 shows a classification experiment using this toolbox on the MNIST, ISOLET and 20 Newsgroups datasets. The table compares the different sampling methods implemented in our toolbox, before and after back-propagation.
Table 2: Classification error on the MNIST dataset for a DBN (784-500-500-2000), on the ISOLET dataset for a DBN (617-1000-1000-2000) and on the 20 Newsgroups dataset for a DBN (5000-500-500-2000), using different sampling methods. After training each RBM, the DBN was fine-tuned for 200 epochs with back-propagation.

Method | MNIST (before BP) | MNIST (after BP) | ISOLET (before BP) | ISOLET (after BP) | 20 Newsgroups (before BP) | 20 Newsgroups (after BP)
--- | --- | --- | --- | --- | --- | ---
CD | 0.0636 | 0.0124 | 0.0552 | 0.0372 | 0.3087 | 0.2686
PCD | 0.0307 | 0.0122 | 0.0500 | 0.0385 | 0.3183 | 0.2642
FEPCD | 0.0248 | 0.0099 | 0.0449 | 0.0353 | 0.3161 | 0.2678
References
Fanty,
M.A., Cole, R.A., 1991. Spoken Letter Recognition. Presented at the Advances
in Neural Information Processing Systems, pp. 220–226.
Hamel, P., Eck, D., 2010. Learning features from
music audio with deep belief networks. In: 11th International Society for
Music Information Retrieval Conference (ISMIR 2010).
Hinton, G., 2010. A practical guide to training Restricted Boltzmann Machines (Technical report 2010-003). Machine Learning Group, University of Toronto.
Hinton, G.E., Osindero, S., Teh, Y.-W., 2006. A Fast
Learning Algorithm for Deep Belief Nets. Neural Computation 18, 1527–1554.
Hinton, G.E., Salakhutdinov, R.R., 2006. Reducing
the dimensionality of data with neural networks. Science 313, 504–507.
Keyvanrad, M.A., Homayounpour, M.M., 2015. Deep
Belief Network Training Improvement Using Elite Samples Minimizing Free
Energy. Int. J. Patt. Recogn. Artif. Intell.
Keyvanrad, M.A., Homayounpour, M.M., 2015. Effective
Sparsity Control in Deep Belief Networks using Normal Regularization Term.
submitted to Neural Networks.
Lee, H., Ekanadham, C., Ng, A., 2008. Sparse deep
belief net model for visual area V2. Advances in neural information
processing systems 20, 873–880.
Lee, H., Largman, Y., Pham, P., Ng, A.Y., 2009.
Unsupervised feature learning for audio classification using convolutional
deep belief networks. Advances in neural information processing systems 22,
1096–1104.
Liu, Y., Zhou, S., Chen, Q., 2011. Discriminative
deep belief networks for visual data classification. Pattern Recognition 44,
2287–2296.
Mohamed, A., Dahl, G., Hinton, G., 2009. Deep belief
networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech
Recognition and Related Applications. Canada, pp. 1–9.
Vinyals, O., Ravuri, S.V., 2011. Comparing
multilayer perceptron to Deep Belief Network Tandem features for robust ASR.
In: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International
Conference on. pp. 4596–4599.