Neural networks for conditional probability estimation : forecasting beyond point predictions

Dirk Husmeier

This volume presents a neural network architecture for the prediction of conditional probability densities, which is essential when the variables to be forecast are strongly skewed or multimodal and a single point prediction is therefore insufficient. Two alternative approaches are discussed: the GM (Gaussian mixture) network, in which all parameters are adapted during training, and the GM-RVFL model, which draws on the random vector functional link (RVFL) network approach. Points of particular interest are:

  • it examines the modifications to standard approaches needed for conditional probability prediction;
  • it provides the first real-world test results for recent theoretical findings on the relationship between the generalisation performance of committees and the over-flexibility of their members.

This volume will be of interest to researchers, practitioners and postgraduate or advanced undergraduate students working on applications of neural networks, especially in finance and pattern recognition.
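As a rough illustration of the kind of model the blurb describes, the sketch below shows a toy network whose outputs parameterize a one-dimensional Gaussian mixture, so that evaluating the mixture at a query value y gives a conditional density p(y|x). This is a minimal sketch under assumed names; the function and variable names are illustrative, the parameters are left untrained, and it does not reproduce the book's GM or GM-RVFL architectures.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gm_conditional_density(x, y, params, n_kernels=3):
    """Evaluate p(y|x) for a toy Gaussian-mixture conditional density network.

    A single hidden layer maps the input x to the mixture weights,
    kernel centres and kernel widths of a 1-D Gaussian mixture.
    (Illustrative sketch only, not the book's implementation.)
    """
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)                 # hidden-layer activations
    out = W2 @ h + b2                        # raw network outputs
    alpha = softmax(out[:n_kernels])         # mixture weights, sum to 1
    mu = out[n_kernels:2 * n_kernels]        # kernel centres
    sigma = np.exp(out[2 * n_kernels:])      # kernel widths, kept positive
    dens = alpha / (np.sqrt(2 * np.pi) * sigma) * np.exp(-0.5 * ((y - mu) / sigma) ** 2)
    return dens.sum()

# Toy usage with random (untrained) parameters: 2-D input, 3 kernels.
rng = np.random.default_rng(0)
d_in, n_hidden, n_kernels = 2, 8, 3
params = (rng.normal(size=(n_hidden, d_in)), rng.normal(size=n_hidden),
          rng.normal(size=(3 * n_kernels, n_hidden)), rng.normal(size=3 * n_kernels))
x = np.array([0.4, -1.2])
print(gm_conditional_density(x, y=0.5, params=params, n_kernels=n_kernels))
```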

From "Nielsen BookData"

[Table of Contents]

  • 1. Introduction: 1.1 Conventional forecasting and Takens' embedding theorem; 1.2 Implications of observational noise; 1.3 Implications of dynamic noise; 1.4 Example; 1.5 Conclusion; 1.6 Objective of this book
  • 2. A Universal Approximator Network for Predicting Conditional Probability Densities: 2.1 Introduction; 2.2 A single-hidden-layer network; 2.3 An additional hidden layer; 2.4 Regaining the conditional probability density; 2.5 Moments of the conditional probability density; 2.6 Interpretation of the network parameters; 2.7 Gaussian mixture model; 2.8 Derivative-of-sigmoid versus Gaussian mixture model; 2.9 Comparison with other approaches (2.9.1 Predicting local error bars; 2.9.2 Indirect method; 2.9.3 Complete kernel expansion: Conditional Density Estimation Network (CDEN) and Mixture Density Network (MDN); 2.9.4 Distorted Probability Mixture Network (DPMN); 2.9.5 Mixture of Experts (ME) and Hierarchical Mixture of Experts (HME); 2.9.6 Soft histogram); 2.10 Summary; 2.11 Appendix: The moment generating function for the DSM network
  • 3. A Maximum Likelihood Training Scheme: 3.1 The cost function; 3.2 A gradient-descent training scheme (3.2.1 Output weights; 3.2.2 Kernel widths; 3.2.3 Remaining weights; 3.2.4 Interpretation of the parameter adaptation rules; 3.2.5 Deficiencies of gradient descent and their remedy); 3.3 Summary; 3.4 Appendix
  • 4. Benchmark Problems: 4.1 Logistic map with intrinsic noise; 4.2 Stochastic combination of two stochastic dynamical systems; 4.3 Brownian motion in a double-well potential; 4.4 Summary
  • 5. Demonstration of the Model Performance on the Benchmark Problems: 5.1 Introduction; 5.2 Logistic map with intrinsic noise (5.2.1 Method; 5.2.2 Results); 5.3 Stochastic coupling between two stochastic dynamical systems (5.3.1 Method; 5.3.2 Results; 5.3.3 Auto-pruning); 5.4 Brownian motion in a double-well potential (5.4.1 Method; 5.4.2 Results; 5.4.3 Comparison with other approaches); 5.5 Conclusions; 5.6 Discussion
  • 6. Random Vector Functional Link (RVFL) Networks: 6.1 The RVFL theorem; 6.2 Proof of the RVFL theorem; 6.3 Comparison with the multilayer perceptron; 6.4 A simple illustration; 6.5 Summary
  • 7. Improved Training Scheme Combining the Expectation Maximisation (EM) Algorithm with the RVFL Approach: 7.1 Review of the Expectation Maximisation (EM) algorithm; 7.2 Simulation: Application of the GM network trained with the EM algorithm (7.2.1 Method; 7.2.2 Results; 7.2.3 Discussion); 7.3 Combining EM and RVFL; 7.4 Preventing numerical instability; 7.5 Regularisation; 7.6 Summary; 7.7 Appendix
  • 8. Empirical Demonstration: Combining EM and RVFL: 8.1 Method; 8.2 Application of the GM-RVFL network to predicting the stochastic logistic-kappa map (8.2.1 Training a single model; 8.2.2 Training an ensemble of models); 8.3 Application of the GM-RVFL network to the double-well problem (8.3.1 Committee selection; 8.3.2 Prediction; 8.3.3 Comparison with other approaches); 8.4 Discussion
  • 9. A Simple Bayesian Regularisation Scheme: 9.1 A Bayesian approach to regularisation; 9.2 A simple example: repeated coin flips; 9.3 A conjugate prior; 9.4 EM algorithm with regularisation; 9.5 The posterior mode; 9.6 Discussion
  • 10. The Bayesian Evidence Scheme for Regularisation: 10.1 Introduction; 10.2 A simple illustration of the evidence idea; 10.3 Overview of the evidence scheme (10.3.1 First step: Gaussian approximation to the probability in parameter space; 10.3.2 Second step: Optimising the hyperparameters; 10.3.3 A self-consistent iteration scheme); 10.4 Implementation of the evidence scheme (10.4.1 First step: Gaussian approximation to the probability in parameter space; 10.4.2 Second step: Optimising the hyperparameters; 10.4.3 Algorithm); 10.5 Discussion (10.5.1 Improvement over the maximum likelihood estimate; 10.5.2 Justification of the approximations; 10.5.3 Final remark)
  • 11. The Bayesian Evidence Scheme for Model Selection: 11.1 The evidence for the model; 11.2 An uninformative prior; 11.3 Comparison with MacKay's work; 11.4 Interpretation of the model evidence (11.4.1 Ockham factors for the weight groups; 11.4.2 Ockham factors for the kernel widths; 11.4.3 Ockham factor for the priors); 11.5 Discussion
  • 12. Demonstration of the Bayesian Evidence Scheme for Regularisation: 12.1 Method and objective (12.1.1 Initialisation; 12.1.2 Different training and regularisation schemes; 12.1.3 Pruning); 12.2 Large Data Set; 12.3 Small Data Set; 12.4 Number of well-determined parameters and pruning (12.4.1 Automatic self-pruning; 12.4.2 Mathematical elucidation of the pruning scheme); 12.5 Summary and Conclusion
  • 13. Network Committees and Weighting Schemes: 13.1 Network committees for interpolation; 13.2 Network committees for modelling conditional probability densities; 13.3 Weighting schemes for predictors (13.3.1 Introduction; 13.3.2 A Bayesian approach; 13.3.3 Numerical problems with the model evidence; 13.3.4 A weighting scheme based on the cross-validation performance)
  • 14. Demonstration: Committees of Networks Trained with Different Regularisation Schemes: 14.1 Method and objective; 14.2 Single-model prediction; 14.3 Committee prediction (14.3.1 Best and average single-model performance; 14.3.2 Improvement over the average single-model performance; 14.3.3 Improvement over the best single-model performance; 14.3.4 Robustness of the committee performance; 14.3.5 Dependence on the temperature; 14.3.6 Dependence on the temperature when including biased models; 14.3.7 Optimal temperature; 14.3.8 Model selection and evidence; 14.3.9 Advantage of under-regularisation and over-fitting); 14.4 Conclusions
  • 15. Automatic Relevance Determination (ARD): 15.1 Introduction; 15.2 Two alternative ARD schemes; 15.3 Mathematical implementation; 15.4 Empirical demonstration
  • 16. A Real-World Application: The Boston Housing Data: 16.1 A real-world regression problem: The Boston house-price data; 16.2 Prediction with a single model (16.2.1 Methodology; 16.2.2 Results); 16.3 Test of the ARD scheme (16.3.1 Methodology; 16.3.2 Results); 16.4 Prediction with network committees (16.4.1 Objective; 16.4.2 Methodology; 16.4.3 Weighting scheme and temperature; 16.4.4 ARD parameters; 16.4.5 Comparison between the two ARD schemes; 16.4.6 Number of kernels; 16.4.7 Bayesian regularisation; 16.4.8 Network complexity; 16.4.9 Cross-validation); 16.5 Discussion: How overfitting can be useful; 16.6 Increasing diversity (16.6.1 Bagging; 16.6.2 Nonlinear preprocessing); 16.7 Comparison with Neal's results; 16.8 Conclusions
  • 17. Summary
  • 18. Appendix: Derivation of the Hessian for the Bayesian Evidence Scheme: 18.1 Introduction and notation; 18.2 A decomposition of the Hessian using EM; 18.3 Explicit calculation of the Hessian; 18.4 Discussion
  • References

From "Nielsen BookData"
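Chapters 6-8 of the contents above build on the random vector functional link (RVFL) idea, in which the input-to-hidden weights are drawn at random and kept fixed, so that only the output-layer parameters have to be estimated. The sketch below illustrates that idea for plain least-squares regression; it is a hedged illustration under assumed names and does not reproduce the book's GM-RVFL density model or its EM training scheme.

```python
import numpy as np

def fit_rvfl(X, y, n_hidden=50, ridge=1e-6, seed=0):
    """Fit a toy random vector functional link (RVFL) regressor.

    The input-to-hidden weights are drawn at random and kept fixed;
    only the linear output weights are estimated, here by ridge-
    regularised least squares on [hidden features, direct input, bias].
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                            # fixed random hidden features
    F = np.hstack([H, X, np.ones((X.shape[0], 1))])   # direct input-output links + bias
    beta = np.linalg.solve(F.T @ F + ridge * np.eye(F.shape[1]), F.T @ y)
    return (W, b, beta)

def predict_rvfl(model, X):
    W, b, beta = model
    F = np.hstack([np.tanh(X @ W + b), X, np.ones((X.shape[0], 1))])
    return F @ beta

# Toy usage: learn a noisy 1-D sine curve.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
model = fit_rvfl(X, y)
print(predict_rvfl(model, np.array([[0.0], [1.5]])))
```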

Book information

Title: Neural networks for conditional probability estimation : forecasting beyond point predictions
Author: Husmeier, Dirk
Series: Perspectives in neural computing
Publisher: Springer-Verlag
Year of publication: c1999
Pages: xxiii, 275 p.
Size: 24 cm
ISBN: 1852330953
NCID: BA41502229
Language: English
Country of publication: United Kingdom