GPy/doc/source/tuto_creating_new_kernels.rst

********************
Creating new kernels
********************
In this tutorial we will see how to create new kernels in GPy. We will also give details on how to implement each function of the kernel, illustrating them with a running example: the rational quadratic kernel.
Structure of a kernel in GPy
============================
In GPy a kernel object is made of a list of kernpart objects, which correspond to symmetric positive definite functions. More precisely, the kernel should be understood as the sum of its kernparts. In order to implement a new covariance, the following steps must be followed:

1. implement the new covariance as a :py:class:`GPy.kern.src.kern.Kern` object
2. update the :py:mod:`GPy.kern.src` module

These steps are detailed below.
Implementing a Kern object
==============================
We advise the reader to start by copy-pasting an existing kernel and
modifying the new file. We will now give a description of the various
functions that can be found in a Kern object, some of which are
mandatory for the new kernel to work.
Header
~~~~~~
The header is similar for all kernels::

    from .kern import Kern
    import numpy as np

    class RationalQuadratic(Kern):
:py:func:`GPy.kern.src.kern.Kern.__init__` ``(self, input_dim, param1, param2, *args)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The implementation of this function is mandatory.
For all Kerns the first parameter ``input_dim`` corresponds to the
dimension of the input space, and the following parameters stand for
the parameterization of the kernel.
You have to call ``super(<class_name>, self).__init__(input_dim, active_dims,
name)`` to make sure the input dimension (and possible dimension restrictions using active_dims) and name of the kernel are
stored in the right place. These attributes are available as
``self.input_dim`` and ``self.name`` at runtime. Parameterization is
done by adding :py:class:`~GPy.core.parameterization.param.Param`
objects to ``self`` and using them as normal numpy ``array-like`` s in
your code. The parameters have to be added by calling
:py:func:`~GPy.core.parameterization.parameterized.Parameterized.link_parameters`
``(*parameters)`` with the
:py:class:`~GPy.core.parameterization.param.Param` objects as
arguments::
    from .core.parameterization import Param

    def __init__(self, input_dim, variance=1., lengthscale=1., power=1., active_dims=None):
        super(RationalQuadratic, self).__init__(input_dim, active_dims, 'rat_quad')
        assert input_dim == 1, "For this kernel we assume input_dim=1"
        self.variance = Param('variance', variance)
        self.lengthscale = Param('lengthscale', lengthscale)
        self.power = Param('power', power)
        self.link_parameters(self.variance, self.lengthscale, self.power)
From now on you can use the parameters ``self.variance,
self.lengthscale, self.power`` as normal numpy ``array-like`` s in your
code. Updates from the optimization routine will be done
automatically.
:py:func:`~GPy.core.parameterization.parameter_core.Parameterizable.parameters_changed` ``(self)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The implementation of this function is optional.
This function is called as a callback upon each successful change to the parameters. If
an optimization step was successful and the parameters (linked by
:py:func:`~GPy.core.parameterization.parameterized.Parameterized.link_parameters`
``(*parameters)``) have changed, this callback function will be called. It may be used to
update precomputations for the kernel. Do not implement the
gradient updates here, as gradient updates are performed by the model enclosing
the kernel. In this example, we issue a no-op::

    def parameters_changed(self):
        # nothing to do here
        pass
:py:func:`~GPy.kern.src.kern.Kern.K` ``(self,X,X2)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The implementation of this function is mandatory.
This function is used to compute the covariance matrix associated with
the inputs ``X``, ``X2``: np.arrays with arbitrary numbers of rows,
:math:`n_1` and :math:`n_2` (the numbers of samples over which to compute the covariance),
and ``self.input_dim`` columns. ::
    def K(self, X, X2):
        if X2 is None: X2 = X
        dist2 = np.square((X - X2.T) / self.lengthscale)
        return self.variance * (1 + dist2 / 2.) ** (-self.power)
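As a quick sanity check, the formula above can be evaluated with plain numpy, outside GPy, to confirm that it yields a valid covariance matrix (symmetric, positive semi-definite). The parameter values below are arbitrary, for illustration only:

```python
import numpy as np

# Illustrative parameter values (any positive values work).
variance, lengthscale, power = 1.0, 2.0, 1.5

def rat_quad_K(X, X2=None):
    """Rational quadratic covariance for 1-d inputs, mirroring K() above."""
    if X2 is None:
        X2 = X
    dist2 = np.square((X - X2.T) / lengthscale)
    return variance * (1 + dist2 / 2.0) ** (-power)

X = np.linspace(0.0, 1.0, 5)[:, None]  # n x 1 input matrix
K = rat_quad_K(X)

# A valid covariance matrix is symmetric with non-negative eigenvalues.
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) > -1e-10)
```

Checks like this are worth running on any new ``K`` before wiring the kernel into a model, since a non-PSD covariance will surface later as a Cholesky failure.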
:py:func:`~GPy.kern.src.kern.Kern.Kdiag` ``(self,X)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The implementation of this function is mandatory.
This function is similar to ``K`` but it computes only the values of
the kernel on the diagonal. The result is a 1-dimensional
np.array of length :math:`n`. ::
    def Kdiag(self, X):
        return self.variance * np.ones(X.shape[0])
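Since the distance of a point to itself is zero, the diagonal of the full covariance is constant here, and ``Kdiag`` can skip the pairwise computation entirely. A small numpy check of this shortcut, with illustrative parameter values:

```python
import numpy as np

variance, lengthscale, power = 1.0, 2.0, 1.5
X = np.random.default_rng(0).normal(size=(6, 1))

# Full covariance, as computed by K() above.
dist2 = np.square((X - X.T) / lengthscale)
K = variance * (1 + dist2 / 2.0) ** (-power)

# Kdiag() shortcut: the diagonal is constant and equal to the variance.
Kdiag = variance * np.ones(X.shape[0])

assert np.allclose(np.diag(K), Kdiag)
```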
:py:func:`~GPy.kern.src.kern.Kern.update_gradients_full` ``(self, dL_dK, X, X2=None)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function is required for the optimization of the parameters.
It computes the gradients and sets them on the parameters of this model.
For example, if the kernel is parameterized by
:math:`\sigma^2, \theta`, then

.. math::

    \frac{\partial L}{\partial\sigma^2}
    = \frac{\partial L}{\partial K} \frac{\partial K}{\partial\sigma^2}

is added to the gradient of :math:`\sigma^2` (``self.variance.gradient = <gradient>``)
and

.. math::

    \frac{\partial L}{\partial\theta}
    = \frac{\partial L}{\partial K} \frac{\partial K}{\partial\theta}

to that of :math:`\theta`. ::
    def update_gradients_full(self, dL_dK, X, X2):
        if X2 is None: X2 = X
        dist2 = np.square((X - X2.T) / self.lengthscale)
        dvar = (1 + dist2 / 2.) ** (-self.power)
        dl = self.power * self.variance * dist2 / self.lengthscale * (1 + dist2 / 2.) ** (-self.power - 1)
        dp = -self.variance * np.log(1 + dist2 / 2.) * (1 + dist2 / 2.) ** (-self.power)
        self.variance.gradient = np.sum(dvar * dL_dK)
        self.lengthscale.gradient = np.sum(dl * dL_dK)
        self.power.gradient = np.sum(dp * dL_dK)
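Analytic gradients like these are easy to get wrong, so it is worth validating them against central finite differences of the objective :math:`L = \sum \partial L/\partial K \cdot K` for an arbitrary ``dL_dK``. A standalone numpy sketch of such a check (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 1))
dL_dK = rng.normal(size=(4, 4))  # an arbitrary "upstream" gradient

def K_of(variance, lengthscale, power):
    dist2 = np.square((X - X.T) / lengthscale)
    return variance * (1 + dist2 / 2.0) ** (-power)

def L_of(params):
    return np.sum(dL_dK * K_of(*params))

params = np.array([1.0, 2.0, 1.5])  # variance, lengthscale, power
variance, lengthscale, power = params
dist2 = np.square((X - X.T) / lengthscale)

# Analytic gradients, matching update_gradients_full above.
dvar = (1 + dist2 / 2.0) ** (-power)
dl = power * variance * dist2 / lengthscale * (1 + dist2 / 2.0) ** (-power - 1)
dp = -variance * np.log(1 + dist2 / 2.0) * (1 + dist2 / 2.0) ** (-power)
analytic = np.array([np.sum(g * dL_dK) for g in (dvar, dl, dp)])

# Central finite differences over each parameter.
eps = 1e-6
numeric = np.empty(3)
for i in range(3):
    p_plus, p_minus = params.copy(), params.copy()
    p_plus[i] += eps
    p_minus[i] -= eps
    numeric[i] = (L_of(p_plus) - L_of(p_minus)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```

GPy also provides ``model.checkgrad()`` for this purpose once the kernel is plugged into a model; the sketch above is merely the same idea in isolation.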
:py:func:`~GPy.kern.src.kern.Kern.update_gradients_diag` ``(self,dL_dKdiag,X,target)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function is required for BGPLVM, sparse models and uncertain inputs.
As previously, for each parameter ``param`` the quantity

.. math::

    \frac{\partial L}{\partial K_{diag}}
    \frac{\partial K_{diag}}{\partial param}

is set as its gradient. ::
    def update_gradients_diag(self, dL_dKdiag, X):
        self.variance.gradient = np.sum(dL_dKdiag)
        # self.lengthscale and self.power have no influence on Kdiag,
        # so their gradients are zero
        self.lengthscale.gradient = 0.
        self.power.gradient = 0.
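Because ``Kdiag`` here is just ``variance * ones(n)``, the chain rule for the variance collapses to a plain sum over ``dL_dKdiag``, and the other parameters receive zero gradient. A minimal numpy check of that identity, with illustrative values:

```python
import numpy as np

variance = 1.0
dL_dKdiag = np.arange(5, dtype=float)  # an arbitrary upstream gradient

# Kdiag = variance * ones(n), so dKdiag/dvariance = ones(n) and the
# chain rule reduces to a plain sum.
variance_gradient = np.sum(dL_dKdiag)

# Central finite difference on L = sum(dL_dKdiag * Kdiag(variance)).
eps = 1e-6
numeric = (np.sum(dL_dKdiag * (variance + eps) * np.ones(5))
           - np.sum(dL_dKdiag * (variance - eps) * np.ones(5))) / (2 * eps)
assert np.isclose(variance_gradient, numeric)
```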
:py:func:`~GPy.kern.src.kern.Kern.gradients_X` ``(self,dL_dK, X, X2)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function is required for GPLVM, BGPLVM, sparse models and uncertain inputs.
It computes the derivative of the likelihood with respect to the inputs
``X`` (a :math:`n \times q` np.array), that is, it calculates the quantity

.. math::

    \frac{\partial L}{\partial K} \frac{\partial K}{\partial X}

The partial derivative matrix, in this case, comes out as an :math:`n \times q` np.array. ::
    def gradients_X(self, dL_dK, X, X2):
        """Derivative of the likelihood with respect to X, calculated using dL_dK*dK_dX."""
        if X2 is None: X2 = X
        dist2 = np.square((X - X2.T) / self.lengthscale)
        dK_dX = -self.variance * self.power * (X - X2.T) / self.lengthscale**2 * (1 + dist2 / 2.) ** (-self.power - 1)
        return np.sum(dL_dK * dK_dX, 1)[:, None]
Were the number of parameters or the number of input dimensions larger
than 1, the calculated partial derivative would be a 3- or 4-tensor.
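The input gradients can be validated the same way as the parameter gradients: perturb each input :math:`X_n` and compare the analytic expression against central finite differences. A standalone numpy sketch (parameter values are illustrative):

```python
import numpy as np

variance, lengthscale, power = 1.0, 2.0, 1.5
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 1))
X2 = rng.normal(size=(3, 1))
dL_dK = rng.normal(size=(4, 3))  # an arbitrary upstream gradient

def K_of(X):
    dist2 = np.square((X - X2.T) / lengthscale)
    return variance * (1 + dist2 / 2.0) ** (-power)

# Analytic gradient, matching gradients_X above.
dist2 = np.square((X - X2.T) / lengthscale)
dK_dX = (-variance * power * (X - X2.T) / lengthscale**2
         * (1 + dist2 / 2.0) ** (-power - 1))
analytic = np.sum(dL_dK * dK_dX, 1)[:, None]

# Numeric: perturb each input X_n separately.
eps = 1e-6
numeric = np.empty_like(X)
for n in range(X.shape[0]):
    Xp, Xm = X.copy(), X.copy()
    Xp[n, 0] += eps
    Xm[n, 0] -= eps
    numeric[n, 0] = np.sum(dL_dK * (K_of(Xp) - K_of(Xm))) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```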
:py:func:`~GPy.kern.src.kern.Kern.gradients_X_diag` ``(self,dL_dKdiag,X)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function is required for BGPLVM, sparse models and uncertain
inputs. As for ``update_gradients_diag``,

.. math::

    \frac{\partial L}{\partial K_{diag}} \frac{\partial K_{diag}}{\partial X}

is added to each element of the gradient. ::
    def gradients_X_diag(self, dL_dKdiag, X):
        # no diagonal gradients
        pass
**Second order derivatives**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These functions are required for the magnification factor. They are analogous to the
first order gradients with respect to X, but compute the second order derivatives

.. math::

    \frac{\partial^2 K}{\partial X \partial X2}

- :py:func:`GPy.kern.src.kern.Kern.gradients_XX` ``(self,dL_dK, X, X2)``
- :py:func:`GPy.kern.src.kern.Kern.gradients_XX_diag` ``(self,dL_dKdiag, X)``
**Psi statistics**
~~~~~~~~~~~~~~~~~~~~
The psi statistics and their derivatives are required only for BGPLVM and
GPs with uncertain inputs. The expressions are as follows:
- ``psi0(self, Z, variational_posterior)``

  .. math::

      \psi_0 = \sum_{i=0}^{n} E_{q(X)}[k(X_i, X_i)]

- ``psi1(self, Z, variational_posterior)``

  .. math::

      \psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]

- ``psi2(self, Z, variational_posterior)``

  .. math::

      \psi_2^{m,m'} = \sum_{i=0}^{n} E_{q(X)}[k(Z_m, X_i) k(X_i, Z_{m'})]

- ``psi2n(self, Z, variational_posterior)``

  .. math::

      \psi_2^{n,m,m'} = E_{q(X)}[k(Z_m, X_n) k(X_n, Z_{m'})]
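These expectations have closed forms for some kernels, but for a new kernel a Monte Carlo estimate under a Gaussian :math:`q(X)` is a handy way to prototype or test a psi-statistics implementation. A numpy sketch estimating :math:`\psi_1` for the rational quadratic kernel (all parameter values, means, variances and inducing inputs below are illustrative only):

```python
import numpy as np

variance, lengthscale, power = 1.0, 2.0, 1.5
rng = np.random.default_rng(3)

def k(x, z):
    """Rational quadratic kernel for scalar inputs (broadcasts over arrays)."""
    dist2 = np.square((x - z) / lengthscale)
    return variance * (1 + dist2 / 2.0) ** (-power)

# Variational posterior q(X): one Gaussian per (1-d) latent point.
mu = np.array([0.0, 0.5])          # means of q(X_n)
s2 = np.array([0.1, 0.2])          # variances of q(X_n)
Z = np.array([-0.5, 0.0, 0.5])     # inducing inputs

# psi1[n, m] = E_{q(X_n)}[k(X_n, Z_m)], estimated by Monte Carlo.
S = 100_000
samples = mu[:, None] + np.sqrt(s2)[:, None] * rng.normal(size=(2, S))
psi1 = np.stack([k(samples, z).mean(axis=1) for z in Z], axis=1)

# Each entry lies in (0, variance], since k is bounded by the variance.
assert psi1.shape == (2, 3)
assert np.all(psi1 > 0) and np.all(psi1 <= variance)
```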