
Learning Coefficient in Bayesian Estimation of Restricted Boltzmann Machine

Abstract:
We consider the real log canonical threshold of the learning model in Bayesian estimation. This threshold corresponds to the learning coefficient of the generalization error in Bayesian estimation, which measures learning efficiency in hierarchical learning models [30, 31, 33]. In this paper, we clarify the ideal that gives the real log canonical threshold of the restricted Boltzmann machine and consider the learning coefficients of this model.
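For readers outside singular learning theory, the role of the learning coefficient can be summarized by the standard asymptotics established in [30, 31, 33] (a background sketch, not a result of this paper; here λ is the real log canonical threshold, m its multiplicity, and n the sample size):

```latex
% Asymptotic expansion of the Bayes stochastic complexity (free energy):
F(n) = \lambda \log n - (m - 1) \log \log n + O_p(1).
% Corresponding expected Bayes generalization error:
\mathbb{E}[G(n)] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right).
```

For a regular statistical model, λ equals d/2, where d is the parameter dimension; for singular models such as the restricted Boltzmann machine, λ is typically smaller, which is why identifying it is the key to evaluating learning efficiency.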

REFERENCES

[1] H. Akaike. A new look at the statistical model identification. IEEE Trans. on Automatic
Control, 19:716–723, 1974.
[2] M. Aoyagi. The zeta function of learning theory and generalization error of three
layered neural perceptron. RIMS Kokyuroku, Recent Topics on Real and Complex
Singularities, 1501:153–167, 2006.
[3] M. Aoyagi. Log canonical threshold of Vandermonde matrix type singularities and
generalization error of a three layered neural network. International Journal of Pure
and Applied Mathematics, 52(2):177–204, 2009.
[4] M. Aoyagi. A Bayesian learning coefficient of generalization error and Vandermonde
matrix-type singularities. Communications in Statistics - Theory and Methods,
39(15):2667–2687, 2010.
[5] M. Aoyagi. Stochastic complexity and generalization error of a restricted Boltzmann
machine in Bayesian estimation. Journal of Machine Learning Research,
11(Apr):1243–1272, 2010.
[6] M. Aoyagi and K. Nagata. Learning coefficient of generalization error in Bayesian estimation
and Vandermonde matrix type singularity. Neural Computation, 24(6):1569–
1610, 2012.
[7] M. Aoyagi and S. Watanabe. Resolution of singularities and the generalization error
with Bayesian estimation for layered neural network. IEICE Trans. J88-D-II,
10:2112–2124, 2005a.
[8] M. Aoyagi and S. Watanabe. Stochastic complexities of reduced rank regression in
Bayesian estimation. Neural Networks, 18:924–933, 2005b.
[9] I. N. Bernstein. The analytic continuation of generalized functions with respect to a
parameter. Functional Anal. Appl., 6:26–40, 1972.
[10] J.-E. Björk. Rings of differential operators. Amsterdam: North-Holland, 1979.
[11] M. A. Cueto, J. Morton, and B. Sturmfels. Geometry of the restricted Boltzmann
machine. Contemporary Mathematics: Algebraic Methods in Statistics and Probability
II, 516:135–153, 2010.
[12] M. Drton. Conference lecture: Reduced rank regression. Workshop on Singular
Learning Theory, AIM 2011, http://math.berkeley.edu/~critch/slt2011/, 2011.
[13] M. Drton. Conference lecture: Bayesian information criterion for singular models.
Algebraic Statistics 2012 in the Alleghenies at The Pennsylvania State University,
http://jasonmorton.com/aspsu2012/, 2012.
[14] W. Fulton. Introduction to toric varieties, Annals of Mathematics Studies. Princeton
University Press, 1993.
[15] E. J. Hannan and B. G. Quinn. The determination of the order of an autoregression.
Journal of the Royal Statistical Society, Series B, 41:190–195, 1979.
[16] H. Hironaka. Resolution of singularities of an algebraic variety over a field of characteristic
zero. Annals of Math, 79:109–326, 1964.
[17] M. Kashiwara. B-functions and holonomic systems. Inventiones Math., 38:33–53, 1976.
[18] J. Kollár. Singularities of pairs. Algebraic geometry-Santa Cruz 1995, Proc. Sympos.
Pure Math., Amer. Math. Soc., Providence, RI, 62:221–287, 1997.
[19] S. Lin. Asymptotic approximation of marginal likelihood integrals. (preprint), 2010.
[20] G. Montufar and N. Ay. Refinements of universal approximation results for deep belief
networks and restricted Boltzmann machines. Neural Computation, 23(5):1306–1319, 2011.
[21] N. J. Murata, S. G. Yoshizawa, and S. Amari. Network information criterion - determining
the number of hidden units for an artificial neural network model. IEEE
Trans. on Neural Networks, 5(6):865–872, 1994.
[22] M. Mustata. Singularities of pairs via jet schemes. J. Amer. Math. Soc., 15:599–615,
2002.
[23] K. Nagata and S. Watanabe. Exchange Monte Carlo sampling from Bayesian posterior
for singular learning machines. IEEE Transactions on Neural Networks, 19(7):1253–
1266, 2008a.
[24] K. Nagata and S. Watanabe. Asymptotic behavior of exchange ratio in exchange
Monte Carlo method. Neural Networks, 21(7):980–988, 2008b.
[25] J. Rissanen. Universal coding, information, prediction, and estimation. IEEE Trans.
on Information Theory, 30(4):629–636, 1984.
[26] D. Rusakov and D. Geiger. Asymptotic model selection for naive Bayesian networks.
Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence,
pages 438–445, 2002.
[27] D. Rusakov and D. Geiger. Asymptotic model selection for naive Bayesian networks.
Journal of Machine Learning Research, 6:1–35, 2005.
[28] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464,
1978.
[29] K. Takeuchi. Distribution of an information statistic and the criterion for the optimal
model. Mathematical Science, 153:12–18, 1976.
[30] S. Watanabe. Algebraic analysis for nonidentifiable learning machines. Neural Computation,
13(4):899–933, 2001a.
[31] S. Watanabe. Algebraic geometrical methods for hierarchical learning machines. Neural
Networks, 14(8):1049–1060, 2001b.
[32] S. Watanabe. Algebraic geometry of learning machines with singularities and their
prior distributions. Journal of Japanese Society of Artificial Intelligence, 16(2):308–
315, 2001c.
[33] S. Watanabe. Algebraic Geometry and Statistical Learning Theory, volume 25. Cambridge
University Press, 2009.
[34] S. Watanabe. Equations of states in singular statistical estimation. Neural Networks,
23(1):20–34, 2010.
[35] P. Zwiernik. An asymptotic behavior of the marginal likelihood for general Markov
models. Journal of Machine Learning Research, 12:3283–3310, 2011.
