3.8 Performance metrics
3.8.1 Concordance index (C-index)
The C-index is a goodness-of-fit measure for models that produce risk scores. It is commonly used to evaluate risk models in survival analysis, where data may be censored.
Consider the observed and predicted values for two instances, \((y_1; \hat{y}_1)\) and \((y_2; \hat{y}_2)\), where \(y_i\) and \(\hat{y}_i\) denote the actual observation time and the predicted time respectively. Mathematically, the C-index is defined as the probability that the model correctly predicts the order of the event times for any pair of instances.
\[\begin{equation} c = \mathbb{P}\big(\hat{y}_1 > \hat{y}_2 | y_1 > y_2\big) \tag{3.22} \end{equation}\]
Another way to write the C-index is as the ratio between the number of concordant pairs and the total number of comparable pairs. For individual \(i\), let \(T_i\) be the time-to-event variable and \(\eta_i\) the risk score assigned to \(i\) by the model. The pair \((i, j)\) is concordant if \(\eta_i > \eta_j\) and \(T_i < T_j\), and discordant if \(\eta_i > \eta_j\) and \(T_i > T_j\). If both \(T_i\) and \(T_j\) are censored, the pair is excluded from the computation. If only \(T_j\) is censored, then:
- If \(T_j < T_i\), the pair \((i, j)\) is excluded from the computation since the order cannot be determined.
- If \(T_j > T_i\), the order can be determined and \((i, j)\) is concordant if \(\eta_i > \eta_j\), discordant otherwise.
Equation (3.22) can then be rewritten as follows:
\[\begin{equation} \begin{aligned} c & = \frac{\# \text{concordant pairs}}{\# \text{concordant pairs} + \# \text{discordant pairs}} \\\\ c & = \frac{\sum_{i \neq j} \pmb{1}_{\eta_i < \eta_j} \pmb{1}_{T_i > T_j}d_j}{\sum_{i \neq j} \pmb{1}_{T_i > T_j}d_j} \end{aligned} \tag{3.23} \end{equation}\]
with \(d_j\) the event indicator variable.
The concordance index ranges between 0 and 1. A C-index of 0.5 corresponds to a non-informative model that makes random predictions, and a C-index of 1 to a model that orders all comparable pairs perfectly. A C-index below 0.5 indicates a model that performs worse than random, i.e. its risk ordering tends to be reversed. As a rule of thumb, a C-index above 0.7 indicates good performance.
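As an illustration, equation (3.23) can be computed directly with NumPy. The sketch below is a minimal, quadratic-memory implementation; the function name `concordance_index` and the toy data are hypothetical and only serve to show the counting of comparable and concordant pairs. As in equation (3.23), tied risk scores are counted as not concordant.

```python
import numpy as np


def concordance_index(times, events, risk_scores):
    """C-index following equation (3.23).

    A pair (i, j) is comparable when T_i > T_j and d_j = 1, i.e. subject j
    has an observed event strictly before the observation time of subject i.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk_scores = np.asarray(risk_scores, dtype=float)

    # comparable[i, j] is True when T_i > T_j and the event of j is observed.
    comparable = (times[:, None] > times[None, :]) & events[None, :]

    # A comparable pair is concordant when the subject failing first (j)
    # received the higher risk score, i.e. eta_i < eta_j.
    concordant = comparable & (risk_scores[:, None] < risk_scores[None, :])

    n_comparable = comparable.sum()
    if n_comparable == 0:
        raise ValueError("no comparable pairs: the C-index is undefined")
    return concordant.sum() / n_comparable


# Hypothetical toy data: the second subject is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
print(concordance_index(times, events, [0.9, 0.3, 0.6, 0.1]))  # risk decreasing with time
print(concordance_index(times, events, [0.9, 0.3, 0.1, 0.6]))  # one discordant pair
```

In the toy data there are four comparable pairs: the censored subject never plays the role of \(j\), but it still appears as \(i\) in the pair formed with the subject who failed at time 2.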
3.8.2 Brier score
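The Brier score is another statistical metric for evaluating the performance of duration models. For \(N\) individuals with event times \(t_i\) and covariates \(\mathrm{x}_i\), it is defined as the mean squared error between the estimated survival probability \(\hat{S}(t|\mathrm{x}_i)\) and the observed survival status at time \(t\), so that lower values indicate better predictions: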
\[\begin{equation} BS(t) = \frac{1}{N} \sum_{i=1}^{N} \Big(\pmb{1}_{\{t_i>t\}} - \hat{S}(t|\mathrm{x}_i) \Big)^2 \tag{3.24} \end{equation}\]
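Equation (3.24) translates almost line for line into code. The sketch below assumes that every event time is observed, since the formula as written does not account for censoring; implementations that handle censored data typically reweight the terms, which is not shown here. The function name `brier_score` and the sample values are hypothetical.

```python
import numpy as np


def brier_score(event_times, surv_probs, t):
    """Brier score at time t following equation (3.24), without censoring.

    event_times: observed event times t_i
    surv_probs:  predicted survival probabilities S_hat(t | x_i) at time t
    """
    event_times = np.asarray(event_times, dtype=float)
    surv_probs = np.asarray(surv_probs, dtype=float)

    # Indicator 1{t_i > t}: 1 if individual i is still event-free at time t.
    still_surviving = (event_times > t).astype(float)

    return np.mean((still_surviving - surv_probs) ** 2)


# Hypothetical example: four individuals, score evaluated at t = 6.
event_times = [2, 5, 7, 10]
surv_probs = [0.1, 0.4, 0.8, 0.9]
print(brier_score(event_times, surv_probs, t=6))
```

A model that always predicts a survival probability of 0.5 obtains a score of 0.25 under this definition, which gives a simple reference point for reading the values.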
The Cox proportional hazards (PH) model is the most popular model for the analysis of duration data. It is said to be semi-parametric because it makes no assumption about the form of the baseline hazard function \(\lambda_0(t)\). The parametric part lies only in modelling the effect of the covariates on the hazard function \(\lambda(t)\). The relationship between the vector of covariates and the log hazard is linear, and the parameters can be estimated by maximizing the partial likelihood function. The only structural assumption of the Cox PH model is that the predictors act multiplicatively on the hazard function. The model is formulated as in equation (3.15), with the exponential function as the link between the hazard and the covariates, i.e. \(\lambda(t|\pmb{\mathrm{x}}) = \lambda_0 (t,\alpha) \text{e}^{\pmb{\mathrm{x'}} \beta}\).
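To make the estimation step more concrete, the sketch below evaluates the Cox partial log-likelihood for a given \(\beta\). It assumes the standard form without tied event times, in which each observed event contributes \(\pmb{\mathrm{x}}_i'\beta\) minus the log of the sum of \(\text{e}^{\pmb{\mathrm{x}}_j'\beta}\) over the individuals still at risk at \(T_i\). This formula is not stated in the text above, and the function name and toy data are hypothetical. The baseline hazard cancels out of each term, which is why it need not be specified.

```python
import numpy as np


def cox_partial_log_likelihood(beta, X, times, events):
    """Cox partial log-likelihood, assuming no tied event times.

    beta:   coefficient vector, shape (p,)
    X:      covariate matrix, shape (n, p)
    times:  observed times T_i, shape (n,)
    events: event indicators d_i (1 = event observed), shape (n,)
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    eta = X @ beta  # linear predictor x_i' beta

    # at_risk[i, j] is True when individual j is still at risk at time T_i.
    at_risk = times[None, :] >= times[:, None]

    # Log of the sum of exp(eta_j) over the risk set of each individual i.
    log_risk_sum = np.log((at_risk * np.exp(eta)[None, :]).sum(axis=1))

    # Only individuals with an observed event contribute a term.
    return np.sum((eta - log_risk_sum)[events])


# Hypothetical toy data: one covariate, third individual censored.
X = [[1.0], [0.0], [0.5]]
times = [1, 2, 3]
events = [1, 1, 0]
print(cox_partial_log_likelihood([0.0], X, times, events))
```

Maximizing this function over \(\beta\), for instance by passing its negative to a general-purpose numerical optimizer, gives the partial likelihood estimate of the coefficients.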