Title: | Construction and Evaluation of Metamodels |
---|---|
Description: | Estimation, validation and prediction of models of different types: linear models, additive models, MARS, PolyMARS and Kriging. |
Authors: | D. Dupuy and C. Helbert |
Maintainer: | C. Helbert <[email protected]> |
License: | GPL-3 |
Version: | 1.6.1 |
Built: | 2025-02-06 03:55:36 UTC |
Source: | https://github.com/cran/DiceEval |
Construction and evaluation of metamodels.
Package: | DiceEval |
Type: | Package |
Version: | 1.4 |
Date: | 2015-06-15 |
License: | GPL-3 |
This package is dedicated to the construction of metamodels. A validation procedure is also proposed, based on usual criteria (RMSE, MAE, etc.) and on cross-validation. Moreover, graphical tools help to choose the best value of the penalty parameter of a stepwise or a PolyMARS model. Another routine is dedicated to the comparison of metamodels.
This work was conducted within the frame of the DICE (Deep Inside Computer Experiments) Consortium between ARMINES, Renault, EDF, IRSN, ONERA and TOTAL S.A. (http://emse.dice.fr/).
The functions gam, mars and polymars are required for the construction of metamodels; km provides Kriging models.
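As a pointer, the packages loaded below are those used in the examples of this documentation to supply the underlying fitting routines (not an exhaustive list):

# Packages providing the underlying fitting routines (as used in the examples)
library(gam)         # gam()      -> "Additive" models
library(mda)         # mars()     -> "MARS" models
library(polspline)   # polymars() -> "PolyMARS" models
library(DiceKriging) # km()       -> "Kriging" models
library(DiceEval)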
D. Dupuy & C. Helbert
Dupuy D., Helbert C., Franco J. (2015), DiceDesign and DiceEval: Two R-Packages for Design and Analysis of Computer Experiments, Journal of Statistical Software, 65(11), 1–38, https://www.jstatsoft.org/v65/i11/.
Friedman J. (1991), Multivariate Adaptive Regression Splines (invited paper), Annals of Statistics, 19/1, 1-141.
Hastie T. and Tibshirani R. (1990), Generalized Additive Models, Chapman and Hall, London.
Hastie T., Tibshirani R. and Friedman J. (2001), The Elements of Statistical Learning : Data Mining, Inference and Prediction, Springer.
Helbert C. and Dupuy D. (2007-09-26), Retour d'expériences sur métamodèles : partie théorique, Livrable rédigé dans le cadre du Consortium DICE.
Kooperberg C., Bose S. and Stone C.J. (1997), Polychotomous Regression, Journal of the American Statistical Association, 92 Issue 437, 117-127.
Rasmussen C.E. and Williams C.K.I. (2006), Gaussian Processes for Machine Learning, the MIT Press, www.GaussianProcess.org/gpml.
Stone C.J., Hansen M.H., Kooperberg C. and Truong Y.K. (1997), Polynomial Splines and their Tensor Products in Extended Linear Modeling, Annals of Statistics, 25/4, 1371-1470.
modelFit, modelPredict, crossValidation and modelComparison
Different space-filling designs can be found in the DiceDesign package, and we refer to the DiceKriging package for the construction of Kriging models. This package is part of a toolbox implemented during the DICE consortium.
## Not run: 
rm(list=ls())

# A 2D example
Branin <- function(x1,x2) {
  x1 <- 1/2*(15*x1+5)
  x2 <- 15/2*(x2+1)
  (x2 - 5.1/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# A 2D uniform design with n points in [-1,1]^2
n <- 50
X <- matrix(runif(n*2,-1,1),ncol=2,nrow=n)
Y <- Branin(X[,1],X[,2])
Z <- (Y-mean(Y))/sd(Y)

# Construction of a PolyMARS model with a penalty parameter equal to 2.2
library(polspline)
modPolyMARS <- modelFit(X,Z,type = "PolyMARS",gcv=2.2)

# Prediction and comparison between the exact function and the predicted one
xtest <- seq(-1, 1, length= 21)
ytest <- seq(-1, 1, length= 21)
Zreal <- outer(xtest, ytest, Branin)
Zreal <- (Zreal-mean(Y))/sd(Y)
Zpredict <- modelPredict(modPolyMARS,expand.grid(xtest,ytest))

m <- min(floor(Zreal),floor(Zpredict))
M <- max(ceiling(Zreal),ceiling(Zpredict))
persp(xtest, ytest, Zreal, theta = 30, phi = 30, expand = 0.5,
      col = "lightblue", main="Branin function", zlim=c(m,M),
      ticktype = "detailed")
persp(xtest, ytest, matrix(Zpredict,nrow=length(xtest),ncol=length(ytest)),
      theta = 30, phi = 30, expand = 0.5, col = "lightblue",
      main="PolyMARS Model", zlab="Ypredict", zlim=c(m,M),
      ticktype = "detailed")

# Comparison of models
modelComparison(X,Y,type=c("Linear", "StepLinear","PolyMARS","Kriging"),
                formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2),
                penalty=log(dim(X)[1]), gcv=4)

# see also the demonstration example in dimension 5 (source: IRSN)
demo(IRSN5D)
## End(Not run)
This function computes the predicted values at each point of the design and gives an estimation of the quality criteria using K-fold cross-validation.
crossValidation(model, K)
model | an output of the modelFit function
K | the number of groups into which the data should be split to apply cross-validation
A list with the following components:
Ypred | a vector of predicted values obtained by K-fold cross-validation at the points of the design
Q2 | a real which is the estimation of the Q2 criterion
folds | a list which indicates the partitioning of the data into the folds
RMSE_CV | the estimation of the RMSE criterion obtained by K-fold cross-validation
MAE_CV | the estimation of the MAE criterion obtained by K-fold cross-validation
In the case of a Kriging model, other components are provided to test the robustness of the procedure:
theta | the range parameter theta estimated for each fold
trend | the trend parameter estimated for each fold
shape | the estimated shape parameter if the covariance structure is of type powexp
The principle of cross-validation is to split the data into K folds A_1, \dots, A_K of approximately equal size. For k = 1 to K, a model \hat{Y}^{(-k)} is fitted from the data deprived of the fold A_k and this model is validated on the fold A_k. Given a criterion of quality L (here, L could be the RMSE or the MAE criterion), the "evaluation" of the model consists in computing:

L_k = \frac{1}{|A_k|} \sum_{i \in A_k} L\left( Y_i, \hat{Y}^{(-k)}(x_i) \right).

The cross-validation criterion is the mean of the criterion over the folds:

L_{CV} = \frac{1}{K} \sum_{k=1}^{K} L_k.

The Q2 criterion is defined as:

Q2 = 1 - \frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2},

with Y the response value, \hat{Y} the value fitted by cross-validation (Ypred) and \bar{Y} the mean of the observations.

When K is equal to the number of observations, leave-one-out cross-validation is performed.
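To make the formulas above concrete, here is a small illustrative sketch, not the package's internal code, that computes cross-validated predictions, a fold-averaged RMSE and the Q2 criterion by hand for a plain linear model; all object names are chosen for the example and the exact conventions used by crossValidation may differ slightly.

# Illustrative K-fold cross-validation "by hand" on a simple linear model
set.seed(1)
n <- 50; K <- 5
X <- data.frame(X1 = runif(n), X2 = runif(n))
Y <- 3*X$X1 - 2*X$X2 + rnorm(n, 0, 0.2)

folds <- split(sample(1:n), rep(1:K, length.out = n))  # K folds of roughly equal size
Ypred <- numeric(n)
Lk <- numeric(K)                                       # per-fold RMSE
dat <- data.frame(X, Y = Y)
for (k in 1:K) {
  idx <- folds[[k]]
  fit <- lm(Y ~ X1 + X2, data = dat[-idx, ])           # model fitted without fold k
  Ypred[idx] <- predict(fit, newdata = X[idx, ])       # validated on fold k
  Lk[k] <- sqrt(mean((Y[idx] - Ypred[idx])^2))
}
RMSE_CV <- mean(Lk)                                    # mean of the per-fold criterion
Q2 <- 1 - sum((Y - Ypred)^2)/sum((Y - mean(Y))^2)      # Q2 from cross-validated predictions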
D. Dupuy
R2, modelFit, MAE, RMSE, foldsComposition and testCrossValidation
## Not run: 
rm(list=ls())

# A 2D example
Branin <- function(x1,x2) {
  x1 <- x1*15-5
  x2 <- x2*15
  (x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# Linear model on 50 points
n <- 50
X <- matrix(runif(n*2),ncol=2,nrow=n)
Y <- Branin(X[,1],X[,2])
modLm <- modelFit(X,Y,type = "Linear",formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2))
R2(Y,modLm$model$fitted.values)
crossValidation(modLm,K=10)$Q2

# Kriging model: power-exponential covariance structure, constant trend (~1),
# no nugget effect, on 16 points
n <- 16
X <- data.frame(x1=runif(n),x2=runif(n))
Y <- Branin(X[,1],X[,2])
mKm <- modelFit(X,Y,type="Kriging",formula=~1, covtype="powexp")

K <- 10
out <- crossValidation(mKm, K)
par(mfrow=c(2,2))
plot(c(0,1:K),c(mKm$model@covariance@range.val[1],out$theta[,1]),
     xlab='',ylab='Theta1')
plot(c(0,1:K),c(mKm$model@covariance@range.val[2],out$theta[,2]),
     xlab='',ylab='Theta2')
plot(c(0,1:K),c(mKm$model@covariance@shape.val[1],out$shape[,1]),
     xlab='',ylab='p1',ylim=c(0,2))
plot(c(0,1:K),c(mKm$model@covariance@shape.val[2],out$shape[,2]),
     xlab='',ylab='p2',ylim=c(0,2))
par(mfrow=c(1,1))
## End(Not run)
Nuclear criticality safety assessments are based on an optimization process to search for safety-penalizing physical conditions in a given range of parameters of a system involving fissile materials.
In the following examples, the criticality coefficient (namely k-effective or keff) models the nuclear chain reaction trend:
- keff > 1 means an increasing neutron production leading to an uncontrolled chain reaction, potentially with serious consequences on safety,
- keff = 1 means a stable neutron population, as required in nuclear reactors,
- keff < 1 is the safety state required for all unused fissile materials, for example in fuel storage.
Besides its fissile material geometry and composition, the criticality of a system is highly sensitive to physical parameters like water density, geometrical perturbations or the characteristics of structural materials (like concrete). Thereby, a typical criticality safety assessment has to verify that k-effective cannot reach the critical value of 1.0 (in practice the limit value used is 0.95) under given hypotheses on these parameters.
The benchmark system is an assembly of four fuel rods contained in a reflecting hull. Regarding criticality safety hypothesis, the main parameters are the uranium enrichment of fuel (namely "e", U235 enrichment, varying in [0.03, 0.07]), the rods assembly geometrical characteristics (namely "p", the pitch between rods, varying in [1.0, 2.0] cm and "l", the length of fuel rods, varying in [10, 60] cm), the water density inside the assembly (namely "b", varying in [0.1, 0.9]) , and the hull reflection characteristics (namely "r", reflection coefficient, varying in [0.75, 0.95]).
In this criticality assessment, the MORET (Fernex et al., 2005) Monte Carlo simulator is used to estimate the criticality coefficient of the fuel storage system, using these parameters (among others) as numerical inputs. The output k-effective is returned as a Gaussian density whose standard deviation is set up to be negligible with respect to the sensitivity to the input parameters.
data(dataIRSN5D)
a data frame with 50 observations (rows) and 6 columns. Columns 1 to 5 correspond to the design of experiments for the input variables ("b","e","p","r" and "l") and the last column contains the value of the output "keff".
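A short exploratory sketch for loading and inspecting the data set, using only base R functions:

library(DiceEval)
data(dataIRSN5D)
str(dataIRSN5D)            # 50 observations of the 5 inputs and the output
summary(dataIRSN5D[, 6])   # distribution of the output (keff)
pairs(dataIRSN5D)          # scatterplots of inputs and output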
Y. Richet
IRSN (Institut de Radioprotection et de Sûreté Nucléaire)
Fernex F., Heulers L, Jacquet O., Miss J. and Richet Y. (2005) The MORET 4B Monte Carlo code - New features to treat complex criticality systems, M&C International Conference on Mathematics and Computation Supercomputing, Reactor Physics and Nuclear and Biological Application, Avignon, 12/09/2005
The mean of absolute errors between real values and predictions.
MAE(Y, Ypred)
Y | a real vector with the values of the output
Ypred | a real vector with the predicted values at the same inputs
a real which represents the mean of the absolute errors between the real and the predicted values:

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Y(x_i) - \hat{Y}(x_i) \right|,

where x_i denotes the points of the experimental design, Y the output of the computer code and \hat{Y} the fitted model.
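For reference, the criterion should coincide with the direct computation below (a minimal sketch with made-up numbers):

Y     <- c(1.0, 2.5, 0.3, 1.7)
Ypred <- c(0.8, 2.9, 0.1, 1.5)
mean(abs(Y - Ypred))   # same value as MAE(Y, Ypred)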
D. Dupuy
Other quality criteria such as RMSE and RMA.
X <- seq(-1,1,0.1)
Y <- 3*X + rnorm(length(X),0,0.5)
Ypred <- 3*X
MAE(Y,Ypred)
modelComparison fits different metamodels and returns the R2 and RMSE criteria relating to each.
modelComparison(X,Y, type="all",K=10,test=NULL,...)
X | a data.frame containing the design of experiments
Y | a vector containing the response variable
type | a vector containing the type of models to compare. The default value "all" corresponds to all the available types of metamodels
K | the number of folds for cross-validation (default value is set at 10)
test | a data.frame containing the design and the response of a test set when available; the prediction criteria will be evaluated on the test design (default corresponds to no test set)
... | additional parameters according to the type of the models (see modelFit)
A list containing two fields if the argument test equals NULL and three fields otherwise:
Learning | the criteria evaluated on the learning set
CV | the criteria estimated by K-fold cross-validation
Test | the criteria evaluated on the test set (when provided)
A graphical tool to compare the value of the criteria is proposed.
D. Dupuy
## Not run: 
data(dataIRSN5D)
X <- dataIRSN5D[,1:5]
Y <- dataIRSN5D[,6]
data(testIRSN5D)

library(gam)
library(mda)
library(polspline)

crit <- modelComparison(X,Y, type="all",test=testIRSN5D)
crit2 <- modelComparison(X,Y, type=rep("StepLinear",5),test=testIRSN5D,
                         penalty=c(1,2,5,10,20),formula=Y~.^2)
## End(Not run)
modelFit is used to fit a metamodel of class lm, gam, mars, polymars or km.
modelFit(X, Y, type, ...)
X | a data.frame containing the design of experiments
Y | a vector containing the response variable
type | represents the method used to fit the model: "Linear", "StepLinear", "Additive", "MARS", "PolyMARS" or "Kriging"
... | corresponds to the parameter(s) of the model. As illustrated in the examples below: a Linear or Additive model needs a formula; a StepLinear model needs a formula and a penalty; a MARS model needs a degree; a PolyMARS model needs a gcv value; a Kriging model accepts a trend formula and a covtype (see the DiceKriging package)
A list with the following components:
X | a data frame representing the design of experiments
Y | a vector representing the response
type | the type of metamodel
model | a fitted model of the specified class
and the value of the parameter(s) depending on the fitted model.
D. Dupuy
# A 2D example
Branin <- function(x1,x2) {
  x1 <- x1*15-5
  x2 <- x2*15
  (x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# a 2D uniform design and the value of the response at these points
X <- matrix(runif(24),ncol=2,nrow=12)
Z <- Branin(X[,1],X[,2])
Y <- (Z-mean(Z))/sd(Z)

# construction of a linear model
modLm <- modelFit(X,Y,type = "Linear",formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2))
summary(modLm$model)

## Not run: 
# construction of a stepwise-selected model
modStep <- modelFit(X,Y,type = "StepLinear",penalty=log(dim(X)[1]),
                    formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2))
summary(modStep$model)

# construction of an additive model
library(gam)
modAm <- modelFit(X,Y,type = "Additive",formula=Y~s(X1)+s(X2))
summary(modAm$model)

# construction of a MARS model of degree 2
library(mda)
modMARS <- modelFit(X,Y,type = "MARS",degree=2)
print(modMARS$model)

# construction of a PolyMARS model with a penalty parameter equal to 1
library(polspline)
modPolyMARS <- modelFit(X,Y,type = "PolyMARS",gcv=1)
summary(modPolyMARS$model)

# construction of a Kriging model
modKm <- modelFit(X,Y,type = "Kriging")
str(modKm$model)
## End(Not run)
modelPredict computes predicted values based on the model given as argument.
modelPredict(model,newdata)
model | a fitted model obtained from modelFit
newdata | a matrix (or a data frame) which represents the predictor values at which the fitted values will be computed
a vector of predicted values, obtained by evaluating the model at newdata.
D. Dupuy
X <- seq(-1,1,l=21)
Y <- 3*X + rnorm(21,0,0.5)

# construction of a linear model
modLm <- modelFit(X,Y,type = "Linear",formula="Y~.")
print(modLm$model$coefficient)

## Not run: 
# illustration on a 2-dimensional example
Branin <- function(x1,x2) {
  x1 <- 1/2*(15*x1+5)
  x2 <- 15/2*(x2+1)
  (x2 - 5.1/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# A 2D uniform design with 20 points in [-1,1]^2
n <- 20
X <- matrix(runif(n*2,-1,1),ncol=2,nrow=n)
Y <- Branin(X[,1],X[,2])
Z <- (Y-mean(Y))/sd(Y)

# Construction of a Kriging model
mKm <- modelFit(X,Z,type = "Kriging")

# Prediction and comparison between the exact function and the predicted one
xtest <- seq(-1, 1, length= 21)
ytest <- seq(-1, 1, length= 21)
Zreal <- outer(xtest, ytest, Branin)
Zreal <- (Zreal-mean(Y))/sd(Y)
Zpredict <- modelPredict(mKm,expand.grid(xtest,ytest))

z <- abs(Zreal-matrix(Zpredict,nrow=length(xtest),ncol=length(ytest)))
contour(xtest, xtest, z,30)
points(X,pch=19)
## End(Not run)
This function fits a PolyMARS model for different values of the penalty parameter and computes the associated criteria.
penaltyPolyMARS(X,Y,test=NULL,graphic=FALSE,K=10, Penalty=seq(0,5,by=0.2))
X | a data.frame containing the design of experiments
Y | a vector containing the response variable
test | a data.frame containing the design and the response of a test set when available; the prediction criteria will be computed for the test data (default corresponds to no test set)
graphic | if TRUE the values of the criteria are represented
K | the number of folds for cross-validation (by default, K=10)
Penalty | a vector containing the values of the penalty parameter
A data frame containing:
a | the values of the penalty parameter
R2 | the R2 criterion evaluated on the learning set
m | the size of the selected model
If a test set is available, the data frame also contains:
R2test | the R2 criterion computed on the test set
If no test set is available, criteria computed by K-fold cross-validation are provided:
Q2 | the Q2 criterion estimated by cross-validation
RMSE CV | the RMSE criterion estimated by K-fold cross-validation
Note that the penalty parameter could be chosen by minimizing the value of the RMSE obtained by cross-validation.
D. Dupuy
modelFit, R2 and crossValidation
data(dataIRSN5D)
X <- dataIRSN5D[,1:5]
Y <- dataIRSN5D[,6]
data(testIRSN5D)

library(polspline)
Crit <- penaltyPolyMARS(X,Y,test=testIRSN5D[,-7],graphic=TRUE)
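As a hedged follow-up, one may then pick the penalty value minimizing the cross-validated RMSE; the column names used below ("a" and "RMSE CV") are assumptions, so inspect the returned data frame first.

# Criteria estimated by cross-validation (no test set supplied here)
CritCV <- penaltyPolyMARS(X, Y, K = 10)
names(CritCV)   # check the actual column names before extracting a criterion
# bestPenalty <- CritCV$a[which.min(CritCV[["RMSE CV"]])]   # hypothetical column names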
Coefficient of determination
R2(Y, Ypred)
Y | a real vector with the values of the output
Ypred | a real vector with the predicted values at the same inputs
The coefficient of determination is computed as

R2 = 1 - \frac{SS_{res}}{SS_{tot}},

where SS_{res} = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 is the residual sum of squares and SS_{tot} = \sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2 is the total sum of squares.
Note that the order of the input arguments is important.
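Equivalently, the coefficient of determination can be computed directly (a minimal sketch with made-up numbers):

Y     <- c(1.0, 2.5, 0.3, 1.7)
Ypred <- c(0.8, 2.9, 0.1, 1.5)
1 - sum((Y - Ypred)^2)/sum((Y - mean(Y))^2)   # should match R2(Y, Ypred)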
D. Dupuy
X <- seq(-1,1,0.1)
Y <- 3*X + rnorm(length(X),0,0.5)
Ypred <- 3*X
print(R2(Y,Ypred))
residualsStudy analyzes the residuals of a model: a plot of the residuals against the index, a plot of the residuals against the fitted values, the representation of the density and a normal Q-Q plot.
residualsStudy(model)
model | a fitted model obtained from modelFit
D. Dupuy
modelFit and modelPredict
data(dataIRSN5D)
X <- dataIRSN5D[,1:5]
Y <- dataIRSN5D[,6]

library(gam)
modAm <- modelFit(X,Y,type = "Additive",formula=formulaAm(X,Y))
residualsStudy(modAm)
Relative Maximal Absolute Error
RMA(Y, Ypred)
Y | a real vector with the values of the output
Ypred | a real vector with the predicted values at the same inputs
The RMA criterion represents the maximum of the errors between exact and predicted values, relative to the dispersion of the output:

RMA = \max_{1 \le i \le n} \frac{\left| Y_i - \hat{Y}_i \right|}{\sigma_Y},

where Y is the output variable, \hat{Y} is the fitted model and \sigma_Y denotes the standard deviation of Y.
The output of this function is a list with the following components:
max.value | the value of the RMA criterion
max.data | an integer giving the observation at which the maximum of the errors is reached
index | a vector containing the indices of the data sorted according to the value of the errors
error | a vector containing the corresponding values of the errors
D. Dupuy
Other validation criteria such as MAE or RMSE.
X <- seq(-1,1,0.1)
Y <- 3*X + rnorm(length(X),0,0.5)
Ypred <- 3*X
print(RMA(Y,Ypred))

# Illustration on Branin function
Branin <- function(x1,x2) {
  x1 <- x1*15-5
  x2 <- x2*15
  (x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}
X <- matrix(runif(24),ncol=2,nrow=12)
Z <- Branin(X[,1],X[,2])
Y <- (Z-mean(Z))/sd(Z)

# Fitting of a Linear model on the data (X,Y)
modLm <- modelFit(X,Y,type = "Linear",formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2))

# Prediction on a grid
u <- seq(0,1,0.1)
Y_test_real <- Branin(expand.grid(u,u)[,1],expand.grid(u,u)[,2])
Y_test_pred <- modelPredict(modLm,expand.grid(u,u))
Y_error <- matrix(abs(Y_test_pred-(Y_test_real-mean(Z))/sd(Z)),length(u),length(u))
contour(u, u, Y_error,45)

Y_pred <- modelPredict(modLm,X)
out <- RMA(Y,Y_pred)
for (i in 1:dim(X)[1]){
  points(X[out$index[i],1],X[out$index[i],2],pch=19,col='red',cex=out$error[i]*10)
}
The root of the Mean Squared Error between the exact value and the predicted one.
RMSE(Y, Ypred)
Y | a real vector with the values of the output
Ypred | a real vector with the predicted values
a real which represents the root of the mean squared error between the target response Y and the fitted values \hat{Y}:

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 }.
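Equivalently (a minimal sketch with made-up numbers):

Y     <- c(1.0, 2.5, 0.3, 1.7)
Ypred <- c(0.8, 2.9, 0.1, 1.5)
sqrt(mean((Y - Ypred)^2))   # should match RMSE(Y, Ypred)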
D. Dupuy
Other validation criteria such as MAE or RMA.
X <- seq(-1,1,0.1)
Y <- 3*X + rnorm(length(X),0,0.5)
Ypred <- 3*X
print(RMSE(Y,Ypred))
Graphical representation of the terms selected by a stepwise procedure for different values of the penalty parameter.
stepEvolution(X,Y,formula,P=1:7,K=10,test=NULL,graphic=TRUE)
X | a data.frame containing the design of experiments
Y | a vector containing the response variable
formula | a formula for the initial model
P | a vector containing different values of the penalty parameter for which a stepwise selected model is fitted
K | the number of folds for the cross-validation procedure
test | an additional data set on which the prediction criteria are evaluated (default corresponds to no test data set)
graphic | if TRUE the values of the criteria are represented graphically
a list with the different criteria for different values of the penalty parameter. This list contains:
penalty | the values of the penalty parameter
m | the size of the selected model for each value of the penalty parameter
R2 | the value of the R2 criterion for each selected model
According to the value of the test argument, other criteria are calculated:
a. | If a test set is available, R2test contains the value of the R2 criterion on the test set.
b. | If no test set is available, the Q2 and the RMSE computed by cross-validation are provided.
Plots are also available. A table represents the selected terms for each value in P. The evolution of the R2 criterion, of the size m of the selected model, and of the criteria on the test set or computed by K-fold cross-validation are represented. These graphical tools can be used to select the best value of the penalty parameter.
D. Dupuy
step procedure for linear models.
## Not run: 
data(dataIRSN5D)
design <- dataIRSN5D[,1:5]
Y <- dataIRSN5D[,6]
out <- stepEvolution(design,Y,formulaLm(design,Y),P=c(1,2,5,10,20,30))
## End(Not run)
This function computes the K-fold cross-validation criteria for different values of K.
testCrossValidation(model,Kfold=c(2,5,10,20,30,40,dim(model$data$X)[1]),N=10)
model | a fitted model from modelFit
Kfold | a vector containing the values of K to test (default corresponds to 2, 5, 10, 20, 30, 40 and the number of observations for the leave-one-out procedure)
N | an integer giving the number of times the K-fold cross-validation is performed for each value of K
a matrix of all the values obtained by K-fold cross-validation
For each value of K, the cross-validation procedure is repeated N times in order to get an idea of the dispersion of the Q2 criterion and of the RMSE obtained by K-fold cross-validation.
D. Dupuy
## Not run: 
rm(list=ls())

# A 2D example
Branin <- function(x1,x2) {
  x1 <- x1*15-5
  x2 <- x2*15
  (x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# a 2D uniform design and the value of the response at these points
n <- 50
X <- matrix(runif(n*2),ncol=2,nrow=n)
Y <- Branin(X[,1],X[,2])

mod <- modelFit(X,Y,type="Linear",formula=formulaLm(X,Y))
out <- testCrossValidation(mod,N=20)
## End(Not run)
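A hedged sketch for visualising the dispersion of the results; the layout of the returned matrix (which dimension indexes the tested values of Kfold) is an assumption here, so inspect it first.

## Not run: 
str(out)       # check the layout of the matrix returned above
boxplot(out)   # one box per column, if columns index the tested Kfold values
## End(Not run)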
These test data correspond to the five-dimensional case provided by the IRSN and detailed in dataIRSN5D.
data(testIRSN5D)
A data frame with 324 rows (observations) and 6 columns: the first five correspond to the input variables ("b","e","p","r" and "l") and the last one to the response Keff.
IRSN (Institut de Radioprotection et de Sûreté Nucléaire)