## Gaussian Processes for Machine Learning

Carl Edward Rasmussen and Christopher K. I. Williams

MIT Press, 2006. ISBN-10 0-262-18253-X, ISBN-13 978-0-262-18253-9.

**Book description**

Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics.

The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is treated from both Bayesian and classical perspectives. Many connections to other well-known techniques from machine learning and statistics are drawn, including support vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are covered, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
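To give a flavor of the detailed algorithms mentioned above, here is a minimal sketch of GP regression with a squared-exponential covariance, following the standard Cholesky-based formulation (predictive mean $\bar{f}_* = k_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{y}$ and variance $k(x_*,x_*) - k_*^\top (K + \sigma_n^2 I)^{-1} k_*$). Function names and the choice of hyperparameter values are illustrative, not taken from the book's code:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of inputs."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_regression(X, y, Xstar, noise=0.1):
    """GP regression via the Cholesky factorization of K + sigma_n^2 I.

    Returns the predictive mean and variance of the latent function
    at the test inputs Xstar.
    """
    K = rbf_kernel(X, X)
    L = np.linalg.cholesky(K + noise**2 * np.eye(len(X)))
    # alpha = (K + sigma_n^2 I)^{-1} y, via two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(X, Xstar)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xstar, Xstar)) - np.sum(v**2, axis=0)
    return mean, var

# Example: fit noisy samples of sin(x) and predict at the training inputs.
X = np.linspace(-3.0, 3.0, 20).reshape(-1, 1)
y = np.sin(X).ravel()
mean, var = gp_regression(X, y, X, noise=0.1)
```

The Cholesky factorization is preferred over direct matrix inversion for numerical stability, at a one-time cost of $O(n^3)$; each subsequent prediction then costs $O(n)$ for the mean and $O(n^2)$ for the variance.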

### About the Author

Christopher K. I. Williams is Professor of Machine Learning and Director of the Institute for Adaptive and Neural Computation in the School of Informatics, University of Edinburgh.

**The whole book is available as a single PDF file.**

### List of contents and individual chapters in PDF format

- Table of Contents
- Series Foreword
- Preface
- Symbols and Notation
- 1 Introduction
- 1.1 A Pictorial Introduction to Bayesian Modelling
- 1.2 Roadmap
- 2 Regression
- 2.1 Weight-space View
- 2.2 Function-space View
- 2.3 Varying the Hyperparameters
- 2.4 Decision Theory for Regression
- 2.5 An Example Application
- 2.6 Smoothing, Weight Functions and Equivalent Kernels
- 2.7 History and Related Work
- 2.8 Appendix: Infinite Radial Basis Function Networks
- 2.9 Exercises
- 3 Classification
- 3.1 Classification Problems
- 3.2 Linear Models for Classification
- 3.3 Gaussian Process Classification
- 3.4 The Laplace Approximation for the Binary GP Classifier
- 3.5 Multi-class Laplace Approximation
- 3.6 Expectation Propagation
- 3.7 Experiments
- 3.8 Discussion
- 3.9 Appendix: Moment Derivations
- 3.10 Exercises
- 4 Covariance Functions
- 4.1 Preliminaries
- 4.2 Examples of Covariance Functions
- 4.3 Eigenfunction Analysis of Kernels
- 4.4 Kernels for Non-vectorial Inputs
- 4.5 Exercises
- 5 Model Selection and Adaptation of Hyperparameters
- 5.1 The Model Selection Problem
- 5.2 Bayesian Model Selection
- 5.3 Cross-validation
- 5.4 Model Selection for GP Regression
- 5.5 Model Selection for GP Classification
- 5.6 Exercises
- 6 Relationships between GPs and Other Models
- 6.1 Reproducing Kernel Hilbert Spaces
- 6.2 Regularization
- 6.3 Spline Models
- 6.4 Support Vector Machines
- 6.5 Least-Squares Classification
- 6.6 Relevance Vector Machines
- 6.7 Exercises
- 7 Theoretical Perspectives
- 7.1 The Equivalent Kernel
- 7.2 Asymptotic Analysis
- 7.3 Average-case Learning Curves
- 7.4 PAC-Bayesian Analysis
- 7.5 Comparison with Other Supervised Learning Methods
- 7.6 Appendix: Learning Curve for the Ornstein-Uhlenbeck Process
- 7.7 Exercises
- 8 Approximation Methods for Large Datasets
- 8.1 Reduced-rank Approximations of the Gram Matrix
- 8.2 Greedy Approximation
- 8.3 Approximations for GPR with Fixed Hyperparameters
- 8.4 Approximations for GPC with Fixed Hyperparameters
- 8.5 Approximating the Marginal Likelihood and its Derivatives
- 8.6 Appendix: Equivalence of SR and GPR using the Nyström Approximate Kernel
- 8.7 Exercises
- 9 Further Issues and Conclusions
- 9.1 Multiple Outputs
- 9.2 Noise Models with Dependencies
- 9.3 Non-Gaussian Likelihoods
- 9.4 Derivative Observations
- 9.5 Prediction with Uncertain Inputs
- 9.6 Mixtures of Gaussian Processes
- 9.7 Global Optimization
- 9.8 Evaluation of Integrals
- 9.9 Student’s t Process
- 9.10 Invariances
- 9.11 Latent Variable Models
- 9.12 Conclusions and Future Directions
- A Mathematical Background
- B Gaussian Markov Processes
- C Datasets and Code
- Bibliography
- Author Index
- Subject Index
