CoxBoost              package:CoxBoost              R Documentation

_F_i_t _a _C_o_x _s_u_r_v_i_v_a_l _m_o_d_e_l _b_y _l_i_k_e_l_i_h_o_o_d _b_a_s_e_d _b_o_o_s_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     'CoxBoost' is used to fit a Cox proportional hazards model by
     componentwise likelihood based boosting. It is especially suited
     for models with a large number of predictors and allows for
     mandatory covariates with unpenalized parameter estimates.

_U_s_a_g_e:

     CoxBoost(time,status,x,unpen.index=NULL,standardize=TRUE,stepno=100,
              penalty=100,trace=FALSE) 

_A_r_g_u_m_e_n_t_s:

    time: vector of length 'n' specifying the observed times.

  status: censoring indicator, i.e., vector of length 'n' with entries
          '0' for censored observations and '1' for uncensored
          observations.

       x: 'n * p' matrix of covariates.

unpen.index: vector of length 'p.unpen' with indices of mandatory
          covariates, where parameter estimation should be performed
          unpenalized.

standardize: logical value indicating whether covariates should be
          standardized for estimation. This does not apply to
          mandatory covariates, i.e., these are not standardized.

 penalty: penalty value for the update of an individual element of the
          parameter vector in each boosting step.

  stepno: number of boosting steps ('m').

   trace: logical value indicating whether progress in estimation
          should be indicated by printing the name of the covariate
          updated.
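
     The effect of 'standardize' together with 'unpen.index' can be
     pictured with a small base R sketch. It is only an illustration:
     the toy matrix, the name 'pen.index', and the use of 'sd' (with
     its n-1 denominator) are assumptions, not taken from the
     'CoxBoost' source.

```r
# Toy covariate matrix; columns 1 and 2 play the role of mandatory covariates.
set.seed(1)
x <- matrix(rnorm(20 * 5, mean = 3, sd = 2), nrow = 20, ncol = 5)
unpen.index <- c(1, 2)

# Indices of the optional (penalized) covariates (hypothetical helper name).
pen.index <- setdiff(seq_len(ncol(x)), unpen.index)

# Column means and standard deviations of the optional covariates only.
meanx <- colMeans(x[, pen.index, drop = FALSE])
sdx <- apply(x[, pen.index, drop = FALSE], 2, sd)

# Center and scale the optional columns; mandatory columns stay untouched.
x.std <- x
x.std[, pen.index] <- scale(x[, pen.index, drop = FALSE],
                            center = meanx, scale = sdx)
```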

_D_e_t_a_i_l_s:

     In contrast to gradient boosting (implemented e.g. in the
     'glmboost' routine in the R package 'mboost', using the 'CoxPH'
     loss function), 'CoxBoost' is not based on gradients of loss
     functions, but adapts the offset-based boosting approach from Tutz
     and Binder (2007) for estimating Cox proportional hazards models.
     In each boosting step the previous boosting steps are incorporated
     as an offset in penalized partial likelihood estimation, which is
     employed to obtain an update for one single parameter, i.e., one
     covariate, in every boosting step. This results in sparse fits
     similar to Lasso-like approaches, with many estimated coefficients
     being zero. The main model complexity parameter, which has to be
     selected (e.g. by cross-validation using 'cv.CoxBoost'), is the
     number of boosting steps 'stepno'. The penalty parameter 'penalty'
     can be chosen rather coarsely, either by hand or using
     'optimCoxBoostPenalty'.
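
     The following base R sketch illustrates one way such a
     componentwise step could look. It is not the package's
     implementation: the function name 'boost_step', the choice of
     the candidate by a penalized score statistic, and the single
     Newton step per candidate are assumptions made for illustration,
     and mandatory covariates are not handled.

```r
# Toy data: covariate 1 is informative, the others are noise.
set.seed(42)
n <- 50; p <- 5
x <- scale(matrix(rnorm(n * p), n, p))
real.time <- rexp(n, rate = exp(0.8 * x[, 1]))
cens.time <- rexp(n, rate = 0.2)
time <- pmin(real.time, cens.time)
status <- as.numeric(real.time <= cens.time)

# One componentwise step (hypothetical helper): 'eta' is the offset,
# i.e. the linear predictor accumulated in the previous steps.
boost_step <- function(time, status, x, eta, penalty) {
  w <- exp(eta)
  score <- numeric(ncol(x))
  info <- numeric(ncol(x))
  for (i in which(status == 1)) {
    risk <- time >= time[i]                    # risk set at this event time
    wr <- w[risk] / sum(w[risk])               # normalized risk weights
    xbar <- colSums(x[risk, , drop = FALSE] * wr)
    x2bar <- colSums(x[risk, , drop = FALSE]^2 * wr)
    score <- score + (x[i, ] - xbar)           # partial likelihood score
    info <- info + (x2bar - xbar^2)            # observed information
  }
  # Pick the covariate with the largest penalized score statistic and
  # take one penalized Newton step for that single coefficient.
  j <- which.max(score^2 / (info + penalty))
  list(index = j, update = score[j] / (info[j] + penalty))
}

# Run a few steps; only one coefficient changes per step.
beta <- numeric(p)
eta <- numeric(n)
for (m in 1:10) {
  step <- boost_step(time, status, x, eta, penalty = 100)
  beta[step$index] <- beta[step$index] + step$update
  eta <- drop(x %*% beta)
}
```

     Because each step changes a single coefficient, at most 'stepno'
     coefficients can be non-zero after 'stepno' steps, which is the
     source of the sparseness described above.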

     The advantage of the offset-based approach compared to gradient
     boosting is that the penalty structure is very flexible. In the
     present implementation this is used for allowing for unpenalized
     mandatory covariates, which receive a very fast coefficient
     build-up in the course of the boosting steps, while the other
     (optional) covariates are subjected to penalization. For example
     in a microarray setting, the (many) microarray features would be
     taken to be optional covariates, and the (few) potential clinical
     covariates would be taken to be mandatory, by including their
     indices in 'unpen.index'.
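
     As a sketch of the microarray setting, the covariate matrix and
     'unpen.index' might be assembled as follows. The data and the
     column names ('age', 'stage', 'gene1', ...) are made up for
     illustration; only the 'CoxBoost' arguments shown in the Usage
     section are relied upon.

```r
set.seed(1)
n <- 30

# A few clinical covariates (to be treated as mandatory) ...
clinical <- cbind(age = rnorm(n, 60, 10), stage = rbinom(n, 1, 0.4))

# ... and many microarray features (to be treated as optional).
microarray <- matrix(rnorm(n * 200), n, 200,
                     dimnames = list(NULL, paste0("gene", 1:200)))

# Combine into one covariate matrix and look up the clinical columns by name.
x <- cbind(clinical, microarray)
unpen.index <- match(colnames(clinical), colnames(x))

# With 'obs.time' and 'status' as in the Examples section, the fit would be:
# cbfit <- CoxBoost(time = obs.time, status = status, x = x,
#                   unpen.index = unpen.index)
```

     Looking the indices up by name keeps 'unpen.index' correct if
     the column order of 'x' changes.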

_V_a_l_u_e:

     'CoxBoost' returns an object of class 'CoxBoost' with the following
     components:

    n, p: number of observations and number of covariates.

  stepno: number of boosting steps.

  xnames: vector of length 'p' containing the names of the covariates.
          The names are taken from 'x' or, failing that, follow the
          scheme 'V1, V2, ...'.

coefficients: 'stepno * p' matrix containing the coefficient estimates
          for the (standardized) optional covariates for every boosting
          step.

meanx, sdx: vector of mean values and standard deviations used for
          standardizing the covariates.

unpen.index: indices of the mandatory covariates in the original
          covariate matrix 'x'.

    time: observed times given in the 'CoxBoost' call.

  status: censoring indicator given in the 'CoxBoost' call.

event.times: vector with event times from the data given in the
          'CoxBoost' call.

linear.predictors: 'stepno * n' matrix giving the linear predictor for
          every boosting step and every observation.

  Lambda: matrix with the Breslow estimate for the cumulative baseline
          hazard in every boosting step for every event time.

 logplik: partial log-likelihood of the fitted model in the final
          boosting step.
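
     To make 'Lambda' and 'logplik' more concrete, the sketch below
     computes a Breslow estimate and a partial log-likelihood in base
     R for a single linear predictor. The toy data and the helper
     names 'risk.sum' and 'n.events' are assumptions; the package may
     organize the computation differently.

```r
# Toy data; 'lp' stands in for the linear predictor of one boosting step.
set.seed(7)
n <- 40
lp <- rnorm(n, sd = 0.5)
real.time <- rexp(n, rate = exp(lp))
cens.time <- rexp(n, rate = 0.3)
time <- pmin(real.time, cens.time)
status <- as.numeric(real.time <= cens.time)

# Distinct event times in increasing order.
event.times <- sort(unique(time[status == 1]))

# Sum of exp(lp) over the risk set at each event time.
risk.sum <- sapply(event.times, function(t) sum(exp(lp[time >= t])))

# Number of events at each event time.
n.events <- sapply(event.times, function(t) sum(status == 1 & time == t))

# Breslow estimate of the cumulative baseline hazard at the event times.
Lambda <- cumsum(n.events / risk.sum)

# Partial log-likelihood (Breslow form for tied event times).
logplik <- sum(lp[status == 1]) - sum(n.events * log(risk.sum))
```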

_A_u_t_h_o_r(_s):

     Written by Harald Binder <binderh@fdm.uni-freiburg.de>.

_R_e_f_e_r_e_n_c_e_s:

     Binder, H. and Schumacher, M. (2008). Allowing for mandatory
     covariates in boosting estimation of sparse high-dimensional
     survival models. BMC Bioinformatics, 9:14.

     Tutz, G. and Binder, H. (2007). Boosting ridge regression.
     Computational Statistics & Data Analysis, 51(12):6044-6059.

_S_e_e _A_l_s_o:

     'predict.CoxBoost', 'cv.CoxBoost'.

_E_x_a_m_p_l_e_s:

     #   Generate some survival data with 10 informative covariates 
     n <- 200; p <- 100
     beta <- c(rep(1,10),rep(0,p-10))
     x <- matrix(rnorm(n*p),n,p)
     real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
     cens.time <- rexp(n,rate=1/10)
     status <- ifelse(real.time <= cens.time,1,0)
     obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)

     #   Fit a Cox proportional hazards model by CoxBoost

     cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100,penalty=100) 
     summary(cbfit)

     #   ... with covariates 1 and 2 being mandatory

     cbfit.mand <- CoxBoost(time=obs.time,status=status,x=x,unpen.index=c(1,2),
                            stepno=100,penalty=100) 
     summary(cbfit.mand)

