From c122c268431407f08902c3b1f947d58c364b7f09 Mon Sep 17 00:00:00 2001
From: Eric Kalosa-Kenyon <ekalosak@gmail.com>
Date: Wed, 8 Jan 2020 15:24:59 -0800
Subject: [PATCH 1/6] added missing import

---
 doc/source/tuto_creating_new_kernels.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/source/tuto_creating_new_kernels.rst b/doc/source/tuto_creating_new_kernels.rst
index 9db6adc4..20107026 100644
--- a/doc/source/tuto_creating_new_kernels.rst
+++ b/doc/source/tuto_creating_new_kernels.rst
@@ -53,6 +53,8 @@ your code. The parameters have to be added by calling
 :py:class:`~GPy.core.parameterization.param.Param` objects as
 arguments::
 
+    from GPy.core.parameterization import Param
+
     def __init__(self,input_dim,variance=1.,lengthscale=1.,power=1.,active_dims=None):
         super(RationalQuadratic, self).__init__(input_dim, active_dims, 'rat_quad')
 	assert input_dim == 1, "For this kernel we assume input_dim=1"

From 3c753bb1a07ef7038b6916efad52c01846dda616 Mon Sep 17 00:00:00 2001
From: Eric Kalosa-Kenyon <ekalosak@gmail.com>
Date: Wed, 8 Jan 2020 15:32:46 -0800
Subject: [PATCH 2/6] corrected typo in function name

---
 doc/source/tuto_creating_new_kernels.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/tuto_creating_new_kernels.rst b/doc/source/tuto_creating_new_kernels.rst
index 20107026..84077f72 100644
--- a/doc/source/tuto_creating_new_kernels.rst
+++ b/doc/source/tuto_creating_new_kernels.rst
@@ -61,7 +61,7 @@ arguments::
         self.variance = Param('variance', variance)
         self.lengthscale = Param('lengtscale', lengthscale)
         self.power = Param('power', power)
-	self.add_parameters(self.variance, self.lengthscale, self.power)
+	self.link_parameters(self.variance, self.lengthscale, self.power)
 
 From now on you can use the parameters ``self.variance,
 self.lengthscale, self.power`` as normal numpy ``array-like`` s in your

From c58104c943be19f57d5bcff2c1520ae3f0cc968a Mon Sep 17 00:00:00 2001
From: Eric Kalosa-Kenyon <ekalosak@gmail.com>
Date: Wed, 8 Jan 2020 16:09:59 -0800
Subject: [PATCH 3/6] fixed docstring and added more explanation

---
 doc/source/tuto_creating_new_kernels.rst | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
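
The chain rule that gradients_X() implements can be sanity-checked against
finite differences. Below is a minimal standalone sketch, not GPy code. The
helper names rat_quad_K and rat_quad_gradients_X are hypothetical. It assumes
1-D inputs and distinct X and X2, and it uses (1 + dist2/2) inside the power so
that the gradient is consistent with the kernel defined in the sketch itself.

```python
import numpy as np


def rat_quad_K(X, X2, variance=1.0, lengthscale=1.0, power=1.0):
    """Rational quadratic covariance; X is (n, 1), X2 is (m, 1), result is (n, m)."""
    dist2 = np.square((X - X2.T) / lengthscale)
    return variance * (1.0 + dist2 / 2.0) ** (-power)


def rat_quad_gradients_X(dL_dK, X, X2, variance=1.0, lengthscale=1.0, power=1.0):
    """dL/dX[i] = sum_j dL_dK[i, j] * dK[i, j]/dX[i]; result is (n, 1)."""
    diff = X - X2.T
    dist2 = np.square(diff / lengthscale)
    dK_dX = (-variance * power * diff / lengthscale**2
             * (1.0 + dist2 / 2.0) ** (-power - 1.0))
    return np.sum(dL_dK * dK_dX, axis=1)[:, None]


# Arbitrary test data (assumed values, chosen only for the check).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1))
X2 = rng.normal(size=(4, 1))
dL_dK = rng.normal(size=(5, 4))
params = dict(variance=1.5, lengthscale=0.7, power=2.0)

analytic = rat_quad_gradients_X(dL_dK, X, X2, **params)

# Central finite differences of L = sum(dL_dK * K) with respect to each X[i].
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[i, 0] += eps
    X_minus[i, 0] -= eps
    L_plus = np.sum(dL_dK * rat_quad_K(X_plus, X2, **params))
    L_minus = np.sum(dL_dK * rat_quad_K(X_minus, X2, **params))
    numeric[i, 0] = (L_plus - L_minus) / (2.0 * eps)

max_err = float(np.max(np.abs(analytic - numeric)))
```

Treating L as sum(dL_dK * K) with dL_dK held fixed is enough for the check,
because only the product dL_dK * dK_dX enters the result.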

diff --git a/doc/source/tuto_creating_new_kernels.rst b/doc/source/tuto_creating_new_kernels.rst
index 84077f72..426ef95e 100644
--- a/doc/source/tuto_creating_new_kernels.rst
+++ b/doc/source/tuto_creating_new_kernels.rst
@@ -173,16 +173,23 @@ is set to each ``param``. ::
 This function is required for GPLVM, BGPLVM, sparse models and uncertain inputs.
 
 Computes the derivative of the likelihood with respect to the inputs
-``X`` (a :math:`n \times q` np.array). The result is returned by the
-function which is a :math:`n \times q` np.array. ::
+``X`` (a :math:`n \times q` np.array), that is, it calculates the quantity:
+
+.. math::
+
+   \frac{\partial L}{\partial K} \frac{\partial K}{\partial X}
+
+The partial derivative matrix is, in this case, comes out as an :math:`n \times q` np.array.
+Were the number of parameters to be larger than 1 or the number of dimensions likewise any larger
+than 1, the calculated partial derivitive would be a 3- or 4-tensor.  ::
 
     def gradients_X(self,dL_dK,X,X2):
-        """derivative of the covariance matrix with respect to X."""
+        """derivative of the likelihood matrix with respect to X, calculated using dK_dX"""
         if X2 is None: X2 = X
         dist2 = np.square((X-X2.T)/self.lengthscale)
 
-        dX = -self.variance*self.power * (X-X2.T)/self.lengthscale**2 *  (1 + dist2/2./self.lengthscale)**(-self.power-1)
-        return np.sum(dL_dK*dX,1)[:,None]
+        dK_dX = -self.variance*self.power * (X-X2.T)/self.lengthscale**2 *  (1 + dist2/2./self.lengthscale)**(-self.power-1)
+        return np.sum(dL_dK*dK_dX,1)[:,None]
 
 :py:func:`~GPy.kern.src.kern.Kern.gradients_X_diag` ``(self,dL_dKdiag,X)``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 585d9cc92be550d1ff309ffa3e22adb97172c52a Mon Sep 17 00:00:00 2001
From: Eric Kalosa-Kenyon <ekalosak@gmail.com>
Date: Wed, 8 Jan 2020 16:10:58 -0800
Subject: [PATCH 4/6] changed ordering of explanation to get to the point fast
 and provide additional details after

---
 doc/source/tuto_creating_new_kernels.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/source/tuto_creating_new_kernels.rst b/doc/source/tuto_creating_new_kernels.rst
index 426ef95e..05e32af0 100644
--- a/doc/source/tuto_creating_new_kernels.rst
+++ b/doc/source/tuto_creating_new_kernels.rst
@@ -179,9 +179,7 @@ Computes the derivative of the likelihood with respect to the inputs
 
    \frac{\partial L}{\partial K} \frac{\partial K}{\partial X}
 
-The partial derivative matrix is, in this case, comes out as an :math:`n \times q` np.array.
-Were the number of parameters to be larger than 1 or the number of dimensions likewise any larger
-than 1, the calculated partial derivitive would be a 3- or 4-tensor.  ::
+The partial derivative matrix is, in this case, comes out as an :math:`n \times q` np.array.  ::
 
     def gradients_X(self,dL_dK,X,X2):
         """derivative of the likelihood matrix with respect to X, calculated using dK_dX"""
@@ -191,6 +189,9 @@ than 1, the calculated partial derivitive would be a 3- or 4-tensor.  ::
         dK_dX = -self.variance*self.power * (X-X2.T)/self.lengthscale**2 *  (1 + dist2/2./self.lengthscale)**(-self.power-1)
         return np.sum(dL_dK*dK_dX,1)[:,None]
 
+Were the number of parameters or the number of input dimensions larger than 1,
+the calculated partial derivative would be a 3- or 4-tensor.
+
 :py:func:`~GPy.kern.src.kern.Kern.gradients_X_diag` ``(self,dL_dKdiag,X)``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     

From 3c80c6e30f5db8c05547f0ffd186e257074103a9 Mon Sep 17 00:00:00 2001
From: Eric Kalosa-Kenyon <ekalosak@gmail.com>
Date: Tue, 14 Jan 2020 11:52:03 -0800
Subject: [PATCH 5/6] fixed technical description of gradients_X()

---
 doc/source/tuto_creating_new_kernels.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/tuto_creating_new_kernels.rst b/doc/source/tuto_creating_new_kernels.rst
index 05e32af0..386c2991 100644
--- a/doc/source/tuto_creating_new_kernels.rst
+++ b/doc/source/tuto_creating_new_kernels.rst
@@ -182,7 +182,7 @@ Computes the derivative of the likelihood with respect to the inputs
 The partial derivative matrix is, in this case, comes out as an :math:`n \times q` np.array.  ::
 
     def gradients_X(self,dL_dK,X,X2):
-        """derivative of the likelihood matrix with respect to X, calculated using dK_dX"""
+        """derivative of the likelihood with respect to X, calculated using dL_dK*dK_dX"""
         if X2 is None: X2 = X
         dist2 = np.square((X-X2.T)/self.lengthscale)
 

From 1d9bbaf7512695c16f09466a4679a5d4de3ff585 Mon Sep 17 00:00:00 2001
From: Eric Kalosa-Kenyon <ekalosak@gmail.com>
Date: Tue, 14 Jan 2020 11:57:41 -0800
Subject: [PATCH 6/6] brushed up wording

---
 doc/source/tuto_creating_new_kernels.rst | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
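
The reworded parameters_changed() paragraph describes a callback that refreshes
precomputations and leaves gradient updates to the enclosing model. A minimal
sketch of that pattern follows, assuming a plain Python stand-in: the class
CachedKernelSketch and its set_params() method are hypothetical and do not use
GPy's Parameterized machinery, which triggers the callback itself.

```python
class CachedKernelSketch:
    """Stand-in kernel that caches a value derived from its parameters."""

    def __init__(self, variance=1.0, lengthscale=1.0, power=1.0):
        self.variance = variance
        self.lengthscale = lengthscale
        self.power = power
        self.parameters_changed()  # fill the cache once at construction

    def set_params(self, variance, lengthscale, power):
        # Hypothetical setter: here we call the callback by hand after a change.
        self.variance = variance
        self.lengthscale = lengthscale
        self.power = power
        self.parameters_changed()

    def parameters_changed(self):
        # Refresh precomputations only; no gradient updates belong here.
        self.inv_lengthscale2 = 1.0 / self.lengthscale ** 2

    def k(self, x, x2):
        # Rational quadratic covariance for two scalars, using the cached value.
        dist2 = (x - x2) ** 2 * self.inv_lengthscale2
        return self.variance * (1.0 + dist2 / 2.0) ** (-self.power)


kern = CachedKernelSketch(variance=2.0, lengthscale=2.0, power=1.0)
cached_before = kern.inv_lengthscale2
kern.set_params(variance=2.0, lengthscale=0.5, power=1.0)
cached_after = kern.inv_lengthscale2
```

A kernel with nothing to precompute can keep the callback as a no-op, as the
tutorial example does.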

diff --git a/doc/source/tuto_creating_new_kernels.rst b/doc/source/tuto_creating_new_kernels.rst
index 386c2991..ec46aedc 100644
--- a/doc/source/tuto_creating_new_kernels.rst
+++ b/doc/source/tuto_creating_new_kernels.rst
@@ -73,13 +73,13 @@ automatically.
 
 The implementation of this function is optional.
 
-This functions deals as a callback for each optimization iteration. If
-one optimization step was successfull and the parameters (added by
+This function is called as a callback upon each successful change to the parameters. If
+one optimization step was successful and the parameters (linked by
 :py:func:`~GPy.core.parameterization.parameterized.Parameterized.link_parameters`
-``(*parameters)``) this callback function will be called to be able to
-update any precomputations for the kernel. Do not implement the
-gradient updates here, as those are being done by the model enclosing
-the kernel::
+``(*parameters)``) are changed, this callback function will be called. This callback may be used to
+update precomputations for the kernel. Do not implement the
+gradient updates here, as gradient updates are performed by the model enclosing
+the kernel. In this example, we issue a no-op::
 
     def parameters_changed(self):
         # nothing todo here
@@ -92,8 +92,9 @@ the kernel::
 The implementation of this function in mandatory.
 
 This function is used to compute the covariance matrix associated with
-the inputs X, X2 (np.arrays with arbitrary number of line (say
-:math:`n_1`, :math:`n_2`) and ``self.input_dim`` columns). ::
+the inputs X, X2 (np.arrays with an arbitrary number of rows,
+:math:`n_1` and :math:`n_2`, the numbers of samples over which to calculate the covariance,
+and ``self.input_dim`` columns). ::
 
     def K(self,X,X2):
         if X2 is None: X2 = X