The Easiest Gaussian Process Regression and Classification [Gaussian process] by 바죠


A Gaussian process (GP) is a supervised learning method for solving regression and classification problems.
It produces probabilistic predictions: each predicted value comes with an uncertainty estimate.
Predictions are made through a similarity measure between data points, the kernel (similarity) function.
A kernel is a way of computing the dot product of two vectors x and y in some (possibly very high-dimensional) feature space, which is why kernel functions are sometimes called a "generalized dot product".
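
As an added illustration (not part of the original post), the widely used RBF (squared-exponential) kernel computes this similarity directly from the squared distance between two points; the length scale below is an arbitrary example value:

import numpy as np

def rbf_kernel(x, y, length_scale=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)), a similarity between 0 and 1
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * length_scale**2))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))   # 1.0: identical points are maximally similar
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # ~3.7e-06: distant points are nearly uncorrelated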

In very high-dimensional input spaces the method can lose its characteristic efficiency.
It makes active use of the training data, and new training points can be added incrementally.
It is therefore a regression method best suited to cases where the number of training points is not too large (training requires factorizing an n x n kernel matrix).

The method also has several well-known weaknesses:

‣ Fragility and poor default choices. Getting the function model wrong can be catastrophic.
‣ There hasn’t been standard software available. It’s a bit tricky to build such a system from scratch.
‣ Experiments are run sequentially. We want to take advantage of cluster computing.
‣ Limited scalability in dimensions and evaluations. We want to solve big problems.

Strengths: it can be applied when the data set is not too large, when the input dimensionality is modest, and when the observations contain noise. It can also be used when gradients of the target function are unavailable.


Consider the function cos(x) when the number of data points is limited, for example 4 points.
The cases of 8 and 16 points can be compared in the same way (the GPR class demo near the end of the post does exactly this).



A Gaussian process assumes a multivariate Gaussian distribution over function values (in the limit, an infinite-dimensional one).
From the training data one constructs a mean function and a covariance function.
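
Concretely, with training inputs X, targets y, test inputs X*, and kernel matrices K = k(X, X), Ks = k(X, X*), Kss = k(X*, X*), the posterior mean is Ks^T K^-1 y and the posterior covariance is Kss - Ks^T K^-1 Ks. A minimal numpy sketch of these two formulas (the RBF kernel, the noise level, and the cos(x) usage line are illustrative choices added here, mirroring the GPR class later in the post):

import numpy as np

def rbf(a, b, length_scale=1.0):
    # pairwise squared-exponential kernel matrix between 1-D input arrays a and b
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * d**2 / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-6, length_scale=1.0):
    K = rbf(X_train, X_train, length_scale) + noise * np.eye(len(X_train))   # k(X, X) + noise
    Ks = rbf(X_train, X_test, length_scale)                                  # k(X, X*)
    Kss = rbf(X_test, X_test, length_scale)                                  # k(X*, X*)
    mu = Ks.T @ np.linalg.solve(K, y_train)           # posterior mean
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)         # posterior covariance
    return mu, cov

# example: posterior of cos(x) from 4 training points, evaluated on a dense grid
X_train = np.array([0.0, 0.5, 1.0, 1.5])
mu, cov = gp_posterior(X_train, np.cos(X_train), np.linspace(0.0, 1.5, 50))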

import sklearn.gaussian_process as gp
#     X_tr <-- training observations [# points, # features]
#     y_tr <-- training labels [# points]
#     X_te <-- test observations [# points, # features]
#     y_te <-- test labels [# points]

#     ConstantKernel * RBF: an overall amplitude times a squared-exponential kernel;
#     the tuples are search bounds for the kernel hyperparameters optimized during fitting
kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))
model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)
model.fit(X_tr, y_tr)
params = model.kernel_.get_params()
y_pred, std = model.predict(X_te, return_std=True)
MSE = ((y_pred-y_te)**2).mean()
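
The title also covers classification. For that, scikit-learn provides GaussianProcessClassifier, which gives soft, probabilistic class predictions (it relies on a Laplace approximation, as one of the comments below notes). A minimal sketch on assumed synthetic two-class data:

import numpy as np
import sklearn.gaussian_process as gp

# assumed toy data: two noisy 1-D clusters labelled 0 and 1
rng = np.random.RandomState(0)
X_cls = np.vstack([rng.normal(-2.0, 1.0, (50, 1)), rng.normal(2.0, 1.0, (50, 1))])
y_cls = np.hstack([np.zeros(50), np.ones(50)])

clf = gp.GaussianProcessClassifier(kernel=1.0 * gp.kernels.RBF(1.0))
clf.fit(X_cls, y_cls)
print(clf.predict_proba(np.array([[0.0], [3.0]])))   # class probabilities, not hard labels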

---------------------------------------------------------------------------------------------------------------------
#     Author: Vincent Dubourg <vincent.dubourg@gmail.com>
#     Jake Vanderplas <vanderplas@astro.washington.edu>
#     Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
#     License: BSD 3 clause

import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
np.random.seed(1)
def f(x):
    """The function to predict."""
    return x * np.sin(x)

# -------------------------------------------------------------------------------------------------------------------
#     First the noiseless case
X = np.atleast_2d([1., 3., 5., 6., 7., 8.]).T
#     Observations
y = f(X).ravel()
#     Mesh the input space for evaluations of the real function, the prediction and its MSE
x = np.atleast_2d(np.linspace(0, 10, 1000)).T
#     Instantiate a Gaussian Process model
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
#     Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)
#     Make the prediction on the meshed x-axis (ask for MSE as well)
y_pred, sigma = gp.predict(x, return_std=True)
#     Plot the function, the prediction and the 95% confidence interval based on the MSE
plt.figure()
plt.plot(x, f(x), 'r:', label=r'$f(x) = x\,\sin(x)$')
plt.plot(X, y, 'r.', markersize=10, label='Observations')
plt.plot(x, y_pred, 'b-', label='Prediction')
plt.fill(np.concatenate([x, x[::-1]]), np.concatenate([y_pred - 1.9600 * sigma,
                        (y_pred + 1.9600 * sigma)[::-1]]), alpha=.5, fc='b', ec='None', label='95% confidence interval')
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.ylim(-10, 20)
plt.legend(loc='upper left')

# -------------------------------------------------------------------------------------------------------------------
#     Now the noisy case
X = np.linspace(0.1, 9.9, 20)
X = np.atleast_2d(X).T
#     Observations and noise
y = f(X).ravel()
dy = 0.5 + 1.0 * np.random.random(y.shape)
noise = np.random.normal(0, dy)
y += noise
#     Instantiate a Gaussian Process model
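#     alpha = dy**2 places each observation's noise variance on the diagonal of the kernel matrix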
gp = GaussianProcessRegressor(kernel=kernel, alpha=dy ** 2, n_restarts_optimizer=10)
#     Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)
#     Make the prediction on the meshed x-axis (ask for MSE as well)
y_pred, sigma = gp.predict(x, return_std=True)
#     Plot the function, the prediction and the 95% confidence interval based on the MSE
plt.figure()
plt.plot(x, f(x), 'r:', label=r'$f(x) = x\,\sin(x)$')
plt.errorbar(X.ravel(), y, dy, fmt='r.', markersize=10, label='Observations')
plt.plot(x, y_pred, 'b-', label='Prediction')
plt.fill(np.concatenate([x, x[::-1]]), np.concatenate([y_pred - 1.9600 * sigma, (y_pred + 1.9600 * sigma)[::-1]]),
         alpha=.5, fc='b', ec='None', label='95% confidence interval')
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.ylim(-10, 20)
plt.legend(loc='upper left')
plt.show()

---------------------------------------------------------------------------------------------------------------------
import numpy as np
from scipy.stats import norm
def expected_improvement(X,X_sample,Y_sample,gpr,xi=0.01):
    '''
    Computes the EI at points X based on existing samples X_sample
    and Y_sample using a Gaussian process surrogate model.
    Args:
        X: Points at which EI shall be computed (m x d).
        X_sample: Sample locations (n x d).
        Y_sample: Sample values (n x 1).
        gpr: A GaussianProcessRegressor fitted to samples.
        xi: Exploitation-exploration trade-off parameter.
    Returns:
        Expected improvements at points X.
    '''
    mu, sigma = gpr.predict(X, return_std=True)
    mu_sample = gpr.predict(X_sample)
    # Floor sigma to avoid division by zero; keep mu and sigma 1-D so they broadcast elementwise
    sigma = sigma + 1e-16
#   Needed for noise-based model, otherwise use np.max(Y_sample).
#   See also section 2.4 in [...]
    mu_sample_opt = np.max(mu_sample)
    with np.errstate(divide='warn'):
        imp = mu - mu_sample_opt - xi
        Z = imp / sigma
        ei = imp * norm.cdf(Z) + sigma * norm.pdf(Z)
        ei[sigma == 0.0] = 0.0
    return ei
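#   For reference, the closed form implemented above is the standard expected-improvement formula:
#       EI(x) = (mu(x) - mu_opt - xi) * Phi(Z) + sigma(x) * phi(Z),   Z = (mu(x) - mu_opt - xi) / sigma(x)
#   with Phi/phi the standard normal CDF/PDF, mu_opt = np.max(mu_sample), and EI(x) = 0 where sigma(x) = 0.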
from scipy.optimize import minimize
def propose_location(acquisition,X_sample,Y_sample,gpr,bounds, n_restarts=25):
    '''
    Proposes the next sampling point by optimizing the acquisition function.
    Args:
        acquisition: Acquisition function.
        X_sample: Sample locations (n x d).
        Y_sample: Sample values (n x 1).
        gpr: A GaussianProcessRegressor fitted to samples.
        bounds: Box bounds of the search space (d x 2).
        n_restarts: Number of random restarts for the inner optimizer.
    Returns:
        Location of the acquisition function maximum.
    '''
    ndim = X_sample.shape[1]
    min_val = 1      # any value > 0 works as the initial best: the negative-EI objective is always <= 0
    min_x = None
    def min_obj(X):
#   Minimization objective is the negative acquisition function
        return -acquisition(X.reshape(-1, ndim),X_sample,Y_sample,gpr)
#   Find the best optimum by starting from n_restart different random points.
    for x0 in np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_restarts, ndim)):
        res = minimize(min_obj, x0=x0, bounds=bounds, method='L-BFGS-B')
        if res.fun < min_val:
            min_val = res.fun
            min_x = res.x
    return min_x.reshape(-1, ndim)


ndim = 3
ninit = 10 * ndim
# search box: each coordinate restricted to [-5.12, 5.12] (the usual Rastrigin domain)
bounds = np.array([[-5.12, 5.12]] * ndim)
noise = 0.001
def f(X, noise=noise):
    '''Negative Rastrigin function (maximum value 0 at the origin) with additive Gaussian noise.'''
    y = np.zeros((X.shape[0],))
    ndim = X.shape[1]
    for i in range(X.shape[0]):
        y[i] = -10. * ndim
        for j in range(ndim):
            y[i] = y[i] - X[i, j]**2 + 10. * np.cos(2. * np.pi * X[i, j])
    return y + noise * np.random.randn(X.shape[0])
# Initial design: ninit points sampled uniformly inside the bounds
X_init = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(ninit, ndim))
Y_init=f(X_init)
for i in range(ninit):
    print(X_init[i,:],Y_init[i])
best_obj=Y_init[0]
best_sol=X_init[0,:]
for i in range(ninit):
    if best_obj < Y_init[i]:
       best_obj=Y_init[i]
       best_sol=X_init[i,:]
       print(best_obj)
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel,Matern
#   Gaussian process with Matern kernel as surrogate model
m52=ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5)
gpr=GaussianProcessRegressor(kernel=m52, alpha=noise**2)
#   Initialize samples
X_sample=X_init
Y_sample=Y_init
#   Number of iterations
n_iter = 50*ndim
for i in range(n_iter):
#   Update Gaussian process with existing samples
    gpr.fit(X_sample,Y_sample)
#   Obtain next sampling point from the acquisition function (expected_improvement)
    X_next=propose_location(expected_improvement,X_sample,Y_sample,gpr,bounds, n_restarts=30*ndim)
#   Obtain next noisy sample from the objective function
    Y_next = f(X_next, noise)
    if Y_next[0] > best_obj:
        best_obj = Y_next[0]
        best_sol = X_next[0, :]
        print(X_next, Y_next)
#   Add the new sample to the data used to refit the surrogate
    X_sample = np.vstack((X_sample, X_next))
    Y_sample = np.hstack((Y_sample, Y_next))
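#   Report the best point found (added line; best_obj and best_sol are tracked inside the loop above)
print('best objective found:', best_obj, 'at', best_sol)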

import matplotlib.pyplot as plt
# objective value of every sample (initial design plus proposed points), in evaluation order
x = np.arange(1, Y_sample.shape[0] + 1)
plt.xlabel('Iteration')
plt.ylabel('Objective function')
plt.plot(x, Y_sample, 'y-', lw=2)
plt.show()


---------------------------------------------------------------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

# Surface plot of a radially symmetric bivariate Gaussian
f = plt.figure(figsize=(15, 10))
ax = f.add_subplot(1, 1, 1, projection='3d')
x, y = np.meshgrid(np.linspace(-4, 4, 64), np.linspace(-4, 4, 64))
r = np.sqrt(x**2 + y**2)
z = 0.5 * np.exp(-r * r / 4)
ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap=cm.RdBu, linewidth=0)
plt.axis('off')
plt.title('Bivariate Gaussian', fontsize=20)

class GPR:
    def __init__(self, nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu, sd):
        self.nS   =  nS           # number of test points
        self.nT   =  nT           # number of training points
        self.iP   =  iP           # inverse squared length scale of the kernel
        self.nP   =  nP           # number of prior/posterior sample paths to draw
        self.Lx   =  Lx           # size of the domain
        self.xS   =  xS           # test inputs
        self.xT   =  xT           # training inputs
        self.yS   =  yS           # posterior samples at the test inputs
        self.yT   =  yT           # training targets
        self.yP   =  yP           # prior samples at the test inputs
        self.K    =  K            # kernel matrix k(xT, xT)
        self.Ks   =  Ks           # kernel matrix k(xT, xS)
        self.Kss  =  Kss          # kernel matrix k(xS, xS)
        self.mu   =  mu           # posterior mean
        self.sd   =  sd           # posterior standard deviation

    def calcDistM(self, r, s): 
        '''Calculate distance matrix between 2 1D arrays'''
        return r[..., np.newaxis] - s[np.newaxis, ...]
    
    def calcKernels(self): 
        '''Calculate the kernel'''
        cc = self.iP*0.5
        self.K   = np.exp(-cc*self.calcDistM(self.xT, self.xT)**2)
        self.Ks  = np.exp(-cc*self.calcDistM(self.xT, self.xS)**2)
        self.Kss = np.exp(-cc*self.calcDistM(self.xS, self.xS)**2)
        return 

    def calcPrior(self): 
        '''Draw prior sample paths at the test inputs'''
        L  = np.linalg.cholesky(self.Kss + 1e-6*np.eye(self.nS))
        G  = np.random.normal(size=(self.nS, self.nP))
        self.yP = np.dot(L, G)
        return 
    
    def calcMuSigma(self): 
        '''Calculate the posterior mean, standard deviation, and posterior samples'''
        self.mu =  np.dot(self.Ks.T, np.linalg.solve(self.K, self.yT))
        
        vv = self.Kss - np.dot(self.Ks.T, np.linalg.solve(self.K, self.Ks))
        self.sd = np.sqrt(np.abs(np.diag(vv)))
        
        # Posterior
        L  = np.linalg.cholesky(vv + 1e-6*np.eye(self.nS))
        self.yS = self.mu.reshape(-1,1)   + np.dot(L, np.random.normal(size=(self.nS, self.nP)))
        return 
    
    def plotResults(self): 
        plt.plot(self.xT, self.yT, 'o', ms=10, mfc='#348ABD', mec='none', label='training set' )
        plt.plot(self.xS, self.yS, '#dddddd', lw=1.5, label='posterior')
        plt.plot(self.xS, self.mu, '#A60628', lw=2, label='mean')

        # fill 95% confidence interval (2*sd about the mean)
        plt.fill_between(self.xS.flat, self.mu-2*self.sd, self.mu+2*self.sd, color="#348ABD", alpha=0.4, label='2 sigma')
        plt.axis('tight'); plt.legend(fontsize=15); plt.rcParams.update({'font.size':18})
 
    def runGPR(self):
        self.calcKernels()
        self.calcPrior()
        self.calcMuSigma()
        self.plotResults()


nS, Lx = 64, 0.5*np.pi
iP, nP  = 10.0 , 1  
xS = np.linspace(0, Lx, nS)
yS = xS*0;  yP = xS*0 
K  = 0; Ks=0; Kss=0; mu=0;sd=0; 
f = plt.figure(figsize=(20, 14));   


## Training set #1
nT = 4
xT = np.array([0, .2, .3, .9])
yT = np.cos(xT)

sp =  f.add_subplot(2,2,1);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

## Training set #2
nT = 4
xT = np.linspace(0, Lx, nT); 
yT = np.cos(xT)

sp =  f.add_subplot(2,2,2);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

## Training set #3
nT = 8
xT = np.linspace(0, Lx, nT); 
yT = np.cos(xT)

sp =  f.add_subplot(2,2,3);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

## Training set #4
nT = 16
xT = np.linspace(0, Lx, nT); 
yT = np.cos(xT)

sp =  f.add_subplot(2,2,4);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

nS, Lx = 256, 4.8*np.pi
iP, nP  = 10.0 , 1  
xS = np.linspace(-Lx, Lx, nS)
yS = xS*0;  yP = xS*0 
K  = 0; Ks=0; Kss=0; mu=0;sd=0; 
f = plt.figure(figsize=(20, 14));  

## Training set #1
nT = 16
xT = np.linspace(-Lx, Lx, nT)
yT = np.sin(xT)/xT

sp =  f.add_subplot(2,2,1);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

## Training set #2
nT = 32
xT = np.linspace(-Lx, Lx, nT)
yT = np.sin(xT)/xT

sp =  f.add_subplot(2,2,2);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

## Training set #3
nT = 64
xT = np.linspace(-Lx, Lx, nT)
yT = np.sin(xT)/xT

sp =  f.add_subplot(2,2,3);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

## Training set #4
nT = 128
xT = np.linspace(-Lx, Lx, nT)
yT = np.sin(xT)/xT

sp =  f.add_subplot(2,2,4);
plt.title('%s training points'%nT, fontsize=20)
gp = GPR(nS, nT, iP, nP, Lx, xS, xT, yS, yT, yP, K, Ks, Kss, mu,sd)
gp.runGPR()

---------------------------------------------------------------------------------------------------------------------

Comments

  • 바죠 2020/01/26 14:20 # Reply

    https://blog.dominodatalab.com/fitting-gaussian-process-models-python/
    GaussianProcessRegressor
    Choose a covariance function, or kernel.
    Kernel hyperparameters are fitted by maximizing the log of the marginal likelihood.

    GaussianProcessClassifier
    a soft, probabilistic classification
    a Laplace approximation is used to obtain a solution, rather than maximizing the marginal likelihood
  • 바죠 2020/02/12 10:50 # Reply

    http://krasserm.github.io/2018/03/19/gaussian-processes/
    https://github.com/krasserm/bayesian-machine-learning/blob/master/gaussian_processes.ipynb
  • 바죠 2021/02/04 10:13 # Reply

    ls -ltra POSCAR_tmq_* |awk '/Feb 3/{print $NF}' |xargs -i{} cp {} ../e_tot/




    n = 2 # for 2 random indices
    index = np.random.choice(X.shape[0], n, replace=False)
    Then you just need to index your arrays with the result:

    x_random = X[index]
    y_random = Y[index]