start of psi2 crossterms

2013-02-26 14:49:00 +00:00 · 4d79c3c97d
commit 4d79c3c97d
parent d9b03044ac
2 changed files with 87 additions and 10 deletions
--- a/GPy/kern/kern.py
+++ b/GPy/kern/kern.py
@@ -378,6 +378,26 @@ class kern(parameterised):
        slices1, slices2 = self._process_slices(slices1,slices2)
        [p.psi2(Z[s2,i_s],mu[s1,i_s],S[s1,i_s],target[s1,s2,s2]) for p,i_s,s1,s2 in zip(self.parts,self.input_slices,slices1,slices2)]

+        #compute the "cross" terms
+        for p1, p2 in itertools.combinations(self.parts,2):
+            #white doesn't combine with anything
+            if p1.name=='white' or p2.name=='white':
+                pass
+            #rbf X bias
+            elif p1.name=='bias' and p2.name=='rbf':
+                target += p1.variance*(p2._psi1[:,:,None]+p2._psi1[:,None,:])
+            elif p2.name=='bias' and p1.name=='rbf':
+                target += p2.variance*(p1._psi1[:,:,None]+p1._psi1[:,None,:])
+            #rbf X linear
+            elif p1.name=='linear' and p2.name=='rbf':
+                raise NotImplementedError #TODO
+            elif p2.name=='linear' and p1.name=='rbf':
+                raise NotImplementedError #TODO
+            else:
+                raise NotImplementedError, "psi2 cannot be computed for this kernel"
+
+
+


        # "crossterms". Here we are recomputing psi1 for white (we don't need to), but it's
@@ -402,6 +422,31 @@ class kern(parameterised):
        target = np.zeros(self.Nparam)
        [p.dpsi2_dtheta(partial[s1,s2,s2],Z[s2,i_s],mu[s1,i_s],S[s1,i_s],target[ps]) for p,i_s,s1,s2,ps in zip(self.parts,self.input_slices,slices1,slices2,self.param_slices)]

+        #compute the "cross" terms
+        #TODO: better looping
+        for i1, i2 in itertools.combinations(range(len(self.parts)),2):
+            p1,p2 = self.parts[i1], self.parts[i2]
+            ipsl1, ipsl2 = self.input_slices[i1], self.input_slices[i2]
+            ps1, ps2 = self.param_slices[i1], self.param_slices[i2]
+
+            #white doesn't combine with anything
+            if p1.name=='white' or p2.name=='white':
+                pass
+            #rbf X bias
+            elif p1.name=='bias' and p2.name=='rbf':
+                p2.dpsi1_dtheta(partial.sum(1)*p1.variance,Z,mu,S,target[ps2])
+                p1.dpsi1_dtheta(partial.sum(1)*p2._psi1,Z,mu,S,target[ps1])
+            elif p2.name=='bias' and p1.name=='rbf':
+                p1.dpsi1_dtheta(partial.sum(1)*p2.variance,Z,mu,S,target[ps1])
+                p2.dpsi1_dtheta(partial.sum(1)*p1._psi1,Z,mu,S,target[ps2])
+            #rbf X linear
+            elif p1.name=='linear' and p2.name=='rbf':
+                raise NotImplementedError #TODO
+            elif p2.name=='linear' and p1.name=='rbf':
+                raise NotImplementedError #TODO
+            else:
+                raise NotImplementedError, "psi2 cannot be computed for this kernel"
+
        # # "crossterms"
        # # 1. get all the psi1 statistics
        # psi1_matrices = [np.zeros((mu.shape[0], Z.shape[0])) for p in self.parts]
@@ -429,6 +474,26 @@ class kern(parameterised):
        target = np.zeros_like(Z)
        [p.dpsi2_dZ(partial[s1,s2,s2],Z[s2,i_s],mu[s1,i_s],S[s1,i_s],target[s2,i_s]) for p,i_s,s1,s2 in zip(self.parts,self.input_slices,slices1,slices2)]

+        #compute the "cross" terms
+        #TODO: slices (need to iterate around the input slices also...)
+        for p1, p2 in itertools.combinations(self.parts,2):
+            #white doesn't combine with anything
+            if p1.name=='white' or p2.name=='white':
+                pass
+            #rbf X bias
+            elif p1.name=='bias' and p2.name=='rbf':
+                target += p2.dpsi1_dX(partial.sum(1)*p1.variance,Z,mu,S)
+            elif p2.name=='bias' and p1.name=='rbf':
+                target += p1.dpsi1_dZ(partial.sum(2)*p2.variance,Z,mu,S)
+            #rbf X linear
+            elif p1.name=='linear' and p2.name=='rbf':
+                raise NotImplementedError #TODO
+            elif p2.name=='linear' and p1.name=='rbf':
+                raise NotImplementedError #TODO
+            else:
+                raise NotImplementedError, "psi2 cannot be computed for this kernel"
+
+
        return target

    def dpsi2_dmuS(self,partial,Z,mu,S,slices1=None,slices2=None):
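Note on the rbf X bias branch above: for a sum of two kernel parts, psi2 picks up a cross term in addition to each part's own psi2. Because the bias kernel is a constant, that cross term reduces to the bias variance times psi1 of the rbf part, broadcast over both inducing-point axes. The following is a minimal standalone sketch of that broadcasting, not the GPy implementation; `bias_rbf_psi2_cross`, `bias_variance` and `psi1_rbf` are hypothetical names, and `psi1_rbf` is assumed to have shape (N, M).

```python
import numpy as np


def bias_rbf_psi2_cross(bias_variance, psi1_rbf):
    """Cross term of psi2 for a bias part combined with an rbf part.

    Assumes psi1_rbf has shape (N, M). The result has shape (N, M, M) with
    entry [n, m, m'] = bias_variance * (psi1_rbf[n, m] + psi1_rbf[n, m']).
    """
    psi1_rbf = np.asarray(psi1_rbf, dtype=float)
    # [:, :, None] varies along the first inducing axis,
    # [:, None, :] varies along the second; broadcasting adds the two.
    return bias_variance * (psi1_rbf[:, :, None] + psi1_rbf[:, None, :])


rng = np.random.default_rng(0)
psi1 = rng.random((4, 3))          # stand-in values, not a real psi1 statistic
cross = bias_rbf_psi2_cross(0.5, psi1)
print(cross.shape)
```

The result is symmetric in its last two axes, which is a cheap sanity check when wiring such a term into a larger psi2 computation.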
--- a/GPy/notes.txt
+++ b/GPy/notes.txt
@@ -1,12 +1,11 @@
-Fails in weird ways if you pass an integer instead of a double as the input to the kernel.
-
-The Matern kernels (at least the 52) are still working in the ARD manner, which means they wouldn't run for very large input dimensions. They need to be fixed to match the RBF.
-
 Implementing new covariances is too complicated at the moment. We need a barebones example of what to implement and where. Commenting in the covariance matrices needs to be improved. It's not clear to a user what all the psi parts are for. Maybe we need a cut down and simplified example to help with this (perhaps a cut down version of the RBF?). And then we should provide a simple list of what you need to do to get a new kernel going.
+TODO

 Missing kernels: polynomial, rational quadratic.
+TODO

 Kernel implementations are far too obscure. They need to be easily readable for a first-time user.
+Duplicate. 

 Need an implementation of scaled conjugate gradients for the optimizers.

@@ -15,21 +14,30 @@ Need an implementation of gradient descent for the optimizers (works well with G
 Need Carl Rasmussen's permission to add his conjugate gradients algorithm. In fact, we can just provide a hook for it, and post a separate python implementation of his algorithm.

 Change get_param and set_param to get_params and set_params
+FIXED

 Get constrain param by default inside model creation.

-Randomize doesn't seem to cover a wide enough range for restarts ... try it for a model where inputs are widely spaced apart and length scale is too short. Sampling from N(0,1) is too conservative. Dangerous for people who naively use restarts. Since we have the model we could maybe come up with some sensible heuristics for setting these things. Maybe we should also consider having '.initialize()'. If we can't do this well we should disable the restart method.
-
-
-Tolerances for optimizers, do we need to introduce some standardization? At the moment does each have its own defaults?

 Do all optimizers work only in terms of function evaluations? Do we need to check for one that uses iterations?
+Upstream: Waiting for the new scipy, where the optimisers have been unified. 
+
+Tolerances for optimizers, do we need to introduce some standardization? At the moment does each have its own defaults?
+Upstream, as above

 Change Youter to YYT (Youter doesn't mean anything for matrices).
+FIXED

 Bug when running classification.crescent_data()

 A dictionary for parameter storage? So we can go through names easily?
+Wontfix. Dictionaries bring up all kinds of problems since they're not ordered. 
+
+When computing kernel.K for kernels like rbf, you can't compute a version with rbf.K(X) you have to do rbf.K(X, X)
+FIXED
+
+The predict method for GP_regression returns a covariance matrix, which is a bad idea because it is expensive to compute and confusing for first-time users. It should only be returned if the user explicitly requests it.
+FIXED

 A flag on covariance functions that indicates when they are not associated with an underlying function (like white noise or a coregionalization matrix).

@@ -37,6 +45,10 @@ Diagonal noise covariance function

 Long term: automatic Lagrange multiplier calculation for optimizers: constrain two parameters in an unusual way and the model automatically does the Lagrangian. Also augment the parameters with new ones, so define data variance to be white noise plus RBF variance and optimize over that and signal to noise ratio ... for example constrain the sum of variances to equal the known variance of the data.

-When computing kernel.K for kernels like rbf, you can't compute a version with rbf.K(X) you have to do rbf.K(X, X)
+Randomize doesn't seem to cover a wide enough range for restarts ... try it for a model where inputs are widely spaced apart and length scale is too short. Sampling from N(0,1) is too conservative. Dangerous for people who naively use restarts. Since we have the model we could maybe come up with some sensible heuristics for setting these things. Maybe we should also consider having '.initialize()'. If we can't do this well we should disable the restart method.

-The predict method for GP_regression returns a covariance matrix, which is a bad idea because it is expensive to compute and confusing for first-time users. It should only be returned if the user explicitly requests it.
+Fails in weird ways if you pass an integer instead of a double as the input to the kernel.
+FIXED
+
+The Matern kernels (at least the 52) are still working in the ARD manner, which means they wouldn't run for very large input dimensions. They need to be fixed to match the RBF.
+FIXED
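
Note on the gradient hunks in kern.py (dpsi2_dtheta, dpsi2_dZ): the cross-term gradients follow from the chain rule through the bias-times-psi1 cross term. The sketch below is an illustration under stated assumptions, not a description of what the committed `dpsi1_dtheta` / `dpsi1_dZ` calls do: `partial` is assumed to hold the derivative of the objective with respect to psi2, with shape (N, M, M), and `cross_term`, `cross_term_grads`, `c` and `psi1` are hypothetical names. Under those assumptions, the derivative with respect to psi1 contracts `partial` over each inducing axis in turn, and the derivative with respect to the bias variance is a full contraction.

```python
import numpy as np


def cross_term(c, psi1):
    """Bias x rbf cross term: c * (psi1[n, m] + psi1[n, m']), shape (N, M, M)."""
    return c * (psi1[:, :, None] + psi1[:, None, :])


def cross_term_grads(partial, c, psi1):
    """Chain rule through cross_term for L = sum(partial * cross_term(c, psi1)).

    Returns (dL/dpsi1 with shape (N, M), dL/dc as a scalar).
    """
    # psi1[n, m] appears once via the first inducing axis and once via the second.
    dL_dpsi1 = c * (partial.sum(axis=2) + partial.sum(axis=1))
    dL_dc = np.sum(partial * (psi1[:, :, None] + psi1[:, None, :]))
    return dL_dpsi1, dL_dc


rng = np.random.default_rng(1)
partial = rng.standard_normal((4, 3, 3))   # stand-in for dL/dpsi2
psi1 = rng.random((4, 3))                  # stand-in values, not a real psi1
c = 0.7
g_psi1, g_c = cross_term_grads(partial, c, psi1)
```

If `partial` happens to be symmetric in its last two axes, the two sums coincide and the psi1 gradient simplifies to `2 * c * partial.sum(axis=1)`. A finite-difference check on `cross_term` is an easy way to validate any such contraction before plugging it into the kernel code.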