Structural Optimization of Deep Belief Network by Evolutionary Computation Methods including Tabu Search
DOI: https://doi.org/10.14738/tmlai.61.4048

Keywords: Structural optimization, Deep Belief Network, Tabu search, Modularization, Evolutionary Computation

Abstract
This paper proposes a structural optimization method for a Deep Belief Network (DBN), which consists of multiple Restricted Boltzmann Machines (RBMs) and a single Feedforward Neural Network (FNN), using several kinds of evolutionary computation methods together with modularization. The performance of a DBN, that is, its accuracy in data classification or data prediction, depends strongly on the structure of the network, concretely on the number of RBMs and the number of nodes in the hidden layer of each RBM. Experiments on several benchmark image classification problems compare a DBN optimized by the proposed method with a DBN without structural optimization and with several other data classification methods; the results indicate that the proposed method outperforms the existing classification methods.
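To make the search space concrete, the following is a minimal Python sketch of tabu search over DBN structures, under the assumptions suggested by the abstract: a candidate structure is a tuple of RBM hidden-layer sizes, so the number of RBMs and each layer's width are searched together. It is an illustration, not the paper's algorithm; in particular, evaluate is a hypothetical placeholder for the expensive step of pre-training the RBM stack, fine-tuning the FNN, and returning validation accuracy.

def evaluate(structure):
    # Hypothetical placeholder objective (assumption): in practice this
    # would train the DBN and return validation accuracy. Here it simply
    # rewards stacks near three layers of 64 hidden nodes.
    return -sum(abs(h - 64) for h in structure) - 10 * abs(len(structure) - 3)

def neighbors(structure, step=16, min_nodes=16, max_nodes=256):
    # Neighboring structures: resize one hidden layer up or down by `step`,
    # remove the last RBM, or append a copy of the last RBM.
    out = []
    for i, h in enumerate(structure):
        for nh in (h - step, h + step):
            if min_nodes <= nh <= max_nodes:
                out.append(structure[:i] + (nh,) + structure[i + 1:])
    if len(structure) > 1:
        out.append(structure[:-1])
    out.append(structure + (structure[-1],))
    return out

def tabu_search(initial, iterations=50, tabu_size=10):
    current = best = initial
    best_score = evaluate(best)
    tabu = [initial]  # recently visited structures are forbidden moves
    for _ in range(iterations):
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        current = max(candidates, key=evaluate)  # best admissible neighbor
        tabu.append(current)
        if len(tabu) > tabu_size:
            tabu.pop(0)  # expire the oldest tabu entry
        if evaluate(current) > best_score:
            best, best_score = current, evaluate(current)
    return best

print(tabu_search(initial=(128,)))  # converges toward (64, 64, 64)

Because each evaluation of a candidate structure would require training a DBN, the tabu list's role here is to stop the search from retraining recently visited structures while still allowing moves that temporarily worsen the objective.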