The easiest super learner [ML-Ensemble (mlens) library] by 바죠

The easiest super learner

It makes use of nine well-known learning models at once, and using it can be made exactly the same as using any one of those methods individually.
In the end, it is no different from working with a single learning model.
There is no reason not to use this approach.
Going further, this is machine learning applied on top of machine learning, and at that point it seems reasonable to stack yet another layer of machine learning on top. [Optimizing the hyperparameters with Bayesian optimization is one example; a minimal sketch follows below.]
In other words, we can double down.
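
To make the Bayesian-optimization example concrete, here is a minimal sketch of tuning two random forest hyperparameters. It assumes the scikit-optimize package (BayesSearchCV), which is not used in the original post; it is an illustration of the idea, not part of the super learner code below.

--------------------------------------------------------------------------------------------------------------------
# illustrative sketch: Bayesian optimization of hyperparameters
# assumes scikit-optimize (pip install scikit-optimize); not part of the original post
from skopt import BayesSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=20, noise=0.5)

# integer search ranges for two RandomForest hyperparameters
opt = BayesSearchCV(
    RandomForestRegressor(),
    {'n_estimators': (10, 200), 'max_depth': (2, 20)},
    n_iter=20, cv=5)
opt.fit(X, y)
print(opt.best_params_)
--------------------------------------------------------------------------------------------------------------------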

The super learner has been shown to enhance the robustness and performance of machine learning models by stacking the predictions of base learners.

The ML-Ensemble (mlens) library provides a convenient implementation that allows the super learner to be fit and used in just a few lines of code. It is specifically designed to work with scikit-learn models.
ML-Ensemble combines a scikit-learn high-level API with a low-level computational graph framework to build memory-efficient, maximally parallelized ensemble networks in as few lines of code as possible.
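
To make concrete what the super learner does internally, here is a minimal stacking sketch in plain scikit-learn: each base model produces out-of-fold predictions on the training data, and a meta model is fit on those predictions. This is only an illustration of the idea under simplified assumptions, not the mlens implementation.

--------------------------------------------------------------------------------------------------------------------
# minimal stacking sketch in plain scikit-learn (illustrative only;
# mlens implements this far more efficiently)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=0.5)
base_models = [DecisionTreeRegressor(), KNeighborsRegressor()]

# out-of-fold predictions from each base model become the meta features
meta_X = np.column_stack(
    [cross_val_predict(m, X, y, cv=10) for m in base_models])

# the meta model learns how to combine the base predictions
meta_model = LinearRegression().fit(meta_X, y)

# the base models are refit on all training data for prediction time
for m in base_models:
    m.fit(X, y)

# at prediction time: stack the base predictions, then apply the meta model
X_new = X[:5]
yhat = meta_model.predict(
    np.column_stack([m.predict(X_new) for m in base_models]))
print(yhat)
--------------------------------------------------------------------------------------------------------------------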

The easiest voting ensemble:

The easiest regression:
The easiest LightGBM regression, classification:

--------------------------------------------------------------------------------------------------------------------
# example of a super learner for regression using the mlens library
from math import sqrt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from mlens.ensemble import SuperLearner

# create a list of base models
def get_models():
    models = list()
    models.append(LinearRegression())
    models.append(ElasticNet())
    models.append(SVR(gamma='scale'))
    models.append(DecisionTreeRegressor())
    models.append(KNeighborsRegressor())
    models.append(AdaBoostRegressor())
    models.append(BaggingRegressor(n_estimators=10))
    models.append(RandomForestRegressor(n_estimators=10))
    models.append(ExtraTreesRegressor(n_estimators=10))
    return models

# rmse metric used to score the base learners
def rmse(yreal, yhat):
    return sqrt(mean_squared_error(yreal, yhat))

# create the super learner
def get_super_learner(X):
    ensemble = SuperLearner(scorer=rmse, folds=10, shuffle=True, sample_size=len(X))
    # add the base models
    models = get_models()
    ensemble.add(models)
    # add the meta model
    ensemble.add_meta(LinearRegression())
    return ensemble

# create the inputs and outputs
X, y = make_regression(n_samples=1000, n_features=100, noise=0.5)
# split into train and hold-out sets
X, X_val, y, y_val = train_test_split(X, y, test_size=0.50)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)
# create the super learner
ensemble = get_super_learner(X)
# fit the super learner
ensemble.fit(X, y)
# summarize the base learners
print(ensemble.data)
# evaluate the super learner on the hold-out set
yhat = ensemble.predict(X_val)
print('Super Learner: RMSE %.3f' % (rmse(y_val, yhat)))
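
# --- illustrative add-on, not in the original listing ---
# for comparison, fit and score each base model on the same hold-out set
for model in get_models():
    model.fit(X, y)
    yhat = model.predict(X_val)
    print('%s: RMSE %.3f' % (model.__class__.__name__, rmse(y_val, yhat)))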



--------------------------------------------------------------------------------------------------------------------
# example of a super learner for classification using the mlens library
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from mlens.ensemble import SuperLearner

# create a list of base models
def get_models():
    models = list()
    models.append(LogisticRegression(solver='liblinear'))
    models.append(DecisionTreeClassifier())
    models.append(SVC(gamma='scale', probability=True))
    models.append(GaussianNB())
    models.append(KNeighborsClassifier())
    models.append(AdaBoostClassifier())
    models.append(BaggingClassifier(n_estimators=10))
    models.append(RandomForestClassifier(n_estimators=10))
    models.append(ExtraTreesClassifier(n_estimators=10))
    return models

# create the super learner
def get_super_learner(X):
    ensemble = SuperLearner(scorer=accuracy_score, folds=10, shuffle=True, sample_size=len(X))
    # add the base models
    models = get_models()
    ensemble.add(models)
    # add the meta model
    ensemble.add_meta(LogisticRegression(solver='lbfgs'))
    return ensemble

# create the inputs and outputs
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# split into train and hold-out sets
X, X_val, y, y_val = train_test_split(X, y, test_size=0.50)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)
# create the super learner
ensemble = get_super_learner(X)
# fit the super learner
ensemble.fit(X, y)
# summarize the base learners
print(ensemble.data)
# evaluate the super learner on the hold-out set
yhat = ensemble.predict(X_val)
print('Super Learner: accuracy %.3f%%' % (accuracy_score(y_val, yhat) * 100))
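
# --- illustrative add-on, not in the original listing ---
# for comparison, fit and score each base classifier on the same hold-out set
for model in get_models():
    model.fit(X, y)
    yhat = model.predict(X_val)
    print('%s: accuracy %.3f%%' % (model.__class__.__name__,
                                   accuracy_score(y_val, yhat) * 100))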


--------------------------------------------------------------------------------------------------------------------

