*************************************
Gaussian process regression tutorial
*************************************

This tutorial covers the basics of building one-dimensional and two-dimensional Gaussian process regression models, also known as kriging models.

We first import the libraries we will need: ::

    import pylab as pb
    pb.ion()
    import numpy as np
    import GPy

1 dimensional model
===================

For this toy example, we assume we have the following inputs and outputs::

    X = np.random.uniform(-3.,3.,(20,1))
    Y = np.sin(X) + np.random.randn(20,1)*0.05

Note that the observations Y include some noise.

The first step is to define the covariance kernel we want to use for the model. Here we choose a Gaussian kernel (also known as RBF or squared exponential) plus some white noise::

    Gaussian = GPy.kern.rbf(D=1)
    noise = GPy.kern.white(D=1)
    kernel = Gaussian + noise

The parameter ``D`` stands for the dimension of the input space. Many other kernels are implemented, such as:

* linear (``GPy.kern.linear``)
* exponential kernel (``GPy.kern.exponential``)
* Matern 3/2 (``GPy.kern.Matern32``)
* Matern 5/2 (``GPy.kern.Matern52``)
* spline (``GPy.kern.spline``)
* and many others...
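
To make the kernel definition more concrete, the following NumPy sketch builds the covariance matrix that a Gaussian (RBF) kernel plus white noise defines on a set of inputs. The helpers ``rbf_cov`` and ``white_cov`` are hypothetical names used for illustration only, not GPy functions, and the variance/lengthscale parametrization is the usual textbook form, assumed here rather than taken from the GPy source.

```python
import numpy as np

def rbf_cov(X, X2, variance=1.0, lengthscale=1.0):
    # Hypothetical helper: squared exponential covariance between rows of X and X2.
    sq_dist = (np.sum(X ** 2, axis=1)[:, None]
               + np.sum(X2 ** 2, axis=1)[None, :]
               - 2.0 * X.dot(X2.T))
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def white_cov(X, variance=1.0):
    # Hypothetical helper: white noise only contributes on the diagonal.
    return variance * np.eye(X.shape[0])

X = np.random.uniform(-3., 3., (20, 1))
# Summing two kernels corresponds to summing their covariance matrices.
K = rbf_cov(X, X) + white_cov(X)
```

The resulting matrix is symmetric and positive definite, which is what makes it a valid covariance structure.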

The inputs required for building the model are the observations and the kernel::

    m = GPy.models.GP_regression(X,Y,kernel)

The functions ``print`` and ``plot`` give an insight into the model we have just built. The code::

    print(m)
    m.plot()

gives the following output: ::

    Marginal log-likelihood: -2.281e+01
           Name        |  Value   |  Constraints  |  Ties  |  Prior  
    -----------------------------------------------------------------
       rbf_variance    |  1.0000  |               |        |         
      rbf_lengthscale  |  1.0000  |               |        |         
      white_variance   |  1.0000  |               |        |         

.. figure::  Figures/tuto_GP_regression_m1.png
    :align:   center
    :height: 350px

    GP regression model before optimization of the parameters. The shaded region corresponds to 95% confidence intervals (i.e. +/- 2 standard deviations).

The default values of the kernel parameters may not be relevant for the current data (for example, the confidence intervals seem too wide in the previous figure). A common approach is to find the values of the parameters that maximize the likelihood of the data. There are two steps for doing that with GPy:

* Constrain the parameters of the kernel to ensure the kernel will always be a valid covariance structure (for example, we don't want variances to be negative!).
* Run the optimization
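
The quantity being maximized can be sketched in plain NumPy. The code below assumes a zero-mean GP with Gaussian noise and uses the standard closed-form log marginal likelihood; ``rbf_plus_white`` and ``log_marginal_likelihood`` are hypothetical helpers written for this illustration and are not part of GPy.

```python
import numpy as np

def rbf_plus_white(X, variance, lengthscale, noise_variance):
    # Hypothetical helper, valid for 1-D inputs of shape (n, 1).
    sq_dist = (X - X.T) ** 2
    K = variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)
    return K + noise_variance * np.eye(X.shape[0])

def log_marginal_likelihood(K, Y):
    # log p(Y|X) = -1/2 Y^T K^{-1} Y - 1/2 log|K| - n/2 log(2 pi)
    n = Y.shape[0]
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    data_fit = -0.5 * float(np.sum(Y * alpha))
    complexity = -np.sum(np.log(np.diag(L)))             # -1/2 log|K|
    constant = -0.5 * n * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

rng = np.random.RandomState(0)
X = rng.uniform(-3., 3., (20, 1))
Y = np.sin(X) + rng.randn(20, 1) * 0.05

# Default parameters (all equal to 1) versus a noise variance matching the data.
ll_default = log_marginal_likelihood(rbf_plus_white(X, 1.0, 1.0, 1.0), Y)
ll_small_noise = log_marginal_likelihood(rbf_plus_white(X, 1.0, 1.0, 0.0025), Y)
```

Comparing ``ll_default`` with ``ll_small_noise`` shows how much the likelihood depends on the parameter values, which is why the optimization step matters.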

There are various ways to constrain the parameters of the kernel. The most basic is to constrain all the parameters to be positive::

    m.constrain_positive('')

It is also possible to restrict a parameter to a range or to fix it to a given value. The argument of ``m.constrain_positive`` is a regular expression that matches the names of the parameters to be constrained (as shown by ``print(m)``). For example, if we want the variance to be positive, the lengthscale to be in [1,10] and the noise variance to be fixed, we can write::

    m.unconstrain('')                            # Required to remove the previous constraints
    m.constrain_positive('rbf_variance')
    m.constrain_bounded('lengthscale',1.,10. )
    m.constrain_fixed('white',0.0025)
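
The way these patterns select parameters can be illustrated with the standard ``re`` module. This sketch assumes search semantics (a pattern may match anywhere in the name), which is what the ``'lengthscale'`` example above suggests; ``matching`` is a hypothetical helper, not a GPy function.

```python
import re

# Parameter names as printed by the model above.
names = ['rbf_variance', 'rbf_lengthscale', 'white_variance']

def matching(pattern, names):
    # Hypothetical helper: keep the names in which the pattern is found.
    return [n for n in names if re.search(pattern, n)]

all_params = matching('', names)              # the empty pattern matches every name
lengthscales = matching('lengthscale', names)
white = matching('white', names)
only_rbf_variance = matching('rbf_variance', names)
```

Note that a pattern such as ``'variance'`` would select both ``rbf_variance`` and ``white_variance``, so it is worth being as specific as the constraint requires.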

Once the constraints have been imposed, the model can be optimized::

    m.optimize()

If we want to perform some restarts to try to improve the result of the optimization, we can use the ``optimize_restarts`` function::

    m.optimize_restarts(Nrestarts = 10)
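
The idea behind restarts can be sketched independently of GPy: run a local optimizer from several random starting points and keep the best result. The example below is an illustration under that assumption, not a description of GPy internals; the objective is a toy multimodal function standing in for a negative log-likelihood, and ``local_minimize`` is a hypothetical fixed-step gradient descent.

```python
import numpy as np

def objective(x):
    # Toy multimodal function with several local minima.
    return np.sin(3.0 * x) + 0.1 * x ** 2

def gradient(x):
    return 3.0 * np.cos(3.0 * x) + 0.2 * x

def local_minimize(x0, step=0.05, n_iter=500):
    # Hypothetical local optimizer: plain gradient descent with a fixed step.
    x = x0
    for _ in range(n_iter):
        x = x - step * gradient(x)
    return x, objective(x)

rng = np.random.RandomState(0)
starts = rng.uniform(-3., 3., 10)              # 10 random initial values
runs = [local_minimize(x0) for x0 in starts]   # one local optimum per start
best_x, best_f = min(runs, key=lambda run: run[1])
```

Each run may end in a different local optimum; keeping the lowest objective value makes the final result less dependent on the initial parameter values.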

Once again, we can use ``print(m)`` and ``m.plot()`` to look at the resulting model::

    Marginal log-likelihood: 2.001e+01
           Name        |  Value   |  Constraints  |  Ties  |  Prior  
    -----------------------------------------------------------------
       rbf_variance    |  0.8033  |     (+ve)     |        |         
      rbf_lengthscale  |  1.8033  |  (1.0, 10.0)  |        |         
      white_variance   |  0.0025  |     Fixed     |        |               

.. figure::  Figures/tuto_GP_regression_m2.png
    :align:   center
    :height: 350px

    GP regression model after optimization of the parameters.


2 dimensional example
=====================

Here is a 2 dimensional example::

    import pylab as pb
    pb.ion()
    import numpy as np
    import GPy

    # sample inputs and outputs
    X = np.random.uniform(-3.,3.,(50,2))
    Y = np.sin(X[:,0:1]) * np.sin(X[:,1:2])+np.random.randn(50,1)*0.05

    # define kernel
    ker = GPy.kern.Matern52(2,ARD=True) + GPy.kern.white(2)

    # create simple GP model
    m = GPy.models.GP_regression(X,Y,ker)

    # constrain all parameters to be positive
    m.constrain_positive('')

    # optimize and plot
    pb.figure()
    m.optimize('tnc', max_f_eval = 1000)

    m.plot()
    print(m)

The flag ``ARD=True`` in the definition of the Matern kernel specifies that we want one lengthscale parameter per dimension (i.e. the GP is not isotropic). The output of the last 2 lines is::

    Marginal log-likelihood: 2.893e+01
               Name            |  Value   |  Constraints  |  Ties  |  Prior  
    -------------------------------------------------------------------------
        Mat52_ARD_variance     |  0.4094  |     (+ve)     |        |         
      Mat52_ARD_lengthscale_0  |  2.1060  |     (+ve)     |        |         
      Mat52_ARD_lengthscale_1  |  2.0546  |     (+ve)     |        |         
          white_variance       |  0.0012  |     (+ve)     |        |         

.. figure::  Figures/tuto_GP_regression_m3.png
    :align:   center
    :height: 350px

    Contour plot of the best predictor (posterior mean).
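
The effect of one lengthscale per dimension can be sketched in NumPy. The function ``matern52_ard`` below is a hypothetical helper that implements the standard Matern 5/2 formula with per-dimension lengthscales; it is meant as an illustration of ARD and is not taken from the GPy source.

```python
import numpy as np

def matern52_ard(X, X2, variance, lengthscales):
    # Hypothetical helper: each input dimension is rescaled by its own lengthscale.
    ls = np.asarray(lengthscales, dtype=float)
    diff = X[:, None, :] / ls - X2[None, :, :] / ls
    r = np.sqrt(np.sum(diff ** 2, axis=2))      # scaled distance between points
    s = np.sqrt(5.0) * r
    return variance * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

X = np.random.uniform(-3., 3., (50, 2))
# Parameter values taken from the optimized model printed above.
K_ard = matern52_ard(X, X, 0.4094, [2.1060, 2.0546])

# A very large lengthscale makes the kernel almost insensitive to that dimension.
a = np.array([[0., 0.]])
b = np.array([[0., 5.]])
k_insensitive = matern52_ard(a, b, 1.0, [1.0, 1e6])[0, 0]
k_sensitive = matern52_ard(a, b, 1.0, [1.0, 1.0])[0, 0]
```

In the model above the two optimized lengthscales are close to each other, which is consistent with the test function being symmetric in its two inputs.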