3.6 Semi-parametric estimation
3.6.1 Proportional Hazards models
Parametric models assume that the baseline (or raw) hazard follows a specific distribution. This assumption can sometimes be too restrictive, and semi-parametric models may be better suited to describing duration data.
In proportional hazards (PH) models, the instantaneous risk function is proportional to the baseline hazard \(\lambda_0 (t,\alpha)\), with a scaling factor \(\phi(\pmb{\mathrm{x}}, \beta)\) that depends on the covariates. These models generalize the basic survival models into survival regression models that take individuals' heterogeneity into account (Harrell 1984). The general formulation is:
\[\begin{equation} \lambda(t|\pmb{\mathrm{x}}) = \lambda_0 (t,\alpha) \phi(\pmb{\mathrm{x}}, \beta) \tag{3.15} \end{equation}\]
Note that when the functional form of \(\lambda_0 (t,\alpha)\) is known, the model is estimated parametrically. For instance, the exponential, Weibull and Gompertz models are PH models, since their hazards can be written as a baseline hazard in \(t\) multiplied by a function of the covariates.
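To make this factorization explicit, the display below writes each model in the form of equation (3.15) with the exponential link \(\phi(\pmb{\mathrm{x}}, \beta) = \text{e}^{\pmb{\mathrm{x'}} \beta}\). The parameterizations (constant \(\lambda\), shape \(p\), rate \(\gamma\)) are one common choice assumed here for illustration; other parameterizations are in use, and \(\alpha\) collects the baseline parameters in each case:
\[\begin{aligned}
\text{Exponential:}\quad & \lambda(t|\pmb{\mathrm{x}}) = \underbrace{\lambda}_{\lambda_0(t,\alpha)} \, \text{e}^{\pmb{\mathrm{x'}} \beta} \\
\text{Weibull:}\quad & \lambda(t|\pmb{\mathrm{x}}) = \underbrace{p \, t^{p-1}}_{\lambda_0(t,\alpha)} \, \text{e}^{\pmb{\mathrm{x'}} \beta} \\
\text{Gompertz:}\quad & \lambda(t|\pmb{\mathrm{x}}) = \underbrace{\text{e}^{\gamma t}}_{\lambda_0(t,\alpha)} \, \text{e}^{\pmb{\mathrm{x'}} \beta}
\end{aligned}\]
In each case the time-dependent part and the covariate-dependent part separate multiplicatively, which is exactly the PH structure.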
What does proportional hazards mean?
PH models are called proportional because the hazard ratio between two individuals \(i\) and \(k\) does not vary over time:
\[\begin{equation} \frac{\lambda(t|\mathrm{x_i})}{\lambda(t|\mathrm{x_k})} = \frac{\phi(\mathrm{x_i}, \beta) }{\phi(\mathrm{x_k}, \beta)} \tag{3.16} \end{equation}\]
The proportionality property stated in equation (3.16) is an assumption that must be checked when fitting a PH model to real-life data, and it only holds for time-constant covariates.
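A quick numerical check of (3.16) follows. It is a minimal sketch assuming a Weibull-type baseline \(\lambda_0(t) = a\,p\,t^{p-1}\) and the exponential link \(\phi(\pmb{\mathrm{x}}, \beta) = \text{e}^{\pmb{\mathrm{x'}} \beta}\); the function names and all parameter values are hypothetical. Whatever the time \(t\), the ratio of the two hazards is the same and equals \(\phi(\mathrm{x_i}, \beta) / \phi(\mathrm{x_k}, \beta)\), because the baseline cancels.

```python
import math

# Hypothetical Weibull-type baseline lambda_0(t) = a * p * t**(p - 1)
def baseline_hazard(t, a=0.5, p=1.5):
    return a * p * t ** (p - 1)

# Assumed exponential link phi(x, beta) = exp(x'beta)
def phi(x, beta):
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

def hazard(t, x, beta):
    # PH structure of equation (3.15): baseline in t times covariate scaling
    return baseline_hazard(t) * phi(x, beta)

beta = [0.8, -0.3]
x_i, x_k = [1.0, 2.0], [0.0, 1.0]

# Hazard ratio evaluated at several times: the baseline cancels each time
ratios = [hazard(t, x_i, beta) / hazard(t, x_k, beta) for t in (0.5, 1.0, 2.0, 5.0)]
expected = phi(x_i, beta) / phi(x_k, beta)  # right-hand side of (3.16)
print(ratios, expected)
```

If the covariate effect were allowed to change with \(t\) (for instance a time-varying covariate), the ratio would no longer be constant, which is why (3.16) is tied to time-constant covariates.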
Marginal effects
In proportional hazards models, the marginal effect of covariate \(x_p\) on the hazard function is easily derived, since it only requires knowledge of \(\beta\). As shown in Cameron and Trivedi (2005), a marginal change in the \(p^{\text{th}}\) covariate, ceteris paribus, changes the hazard function by:
\[\begin{equation} \frac{\partial \lambda(t|\pmb{\mathrm{x}}, \beta)}{\partial x_p} = \lambda(t|\pmb{\mathrm{x}}, \beta) \frac{\partial \phi(\pmb{\mathrm{x}}, \beta) / \partial x_p}{\phi(\pmb{\mathrm{x}}, \beta) } \tag{3.17} \end{equation}\]
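Thus the marginal effect of \(x_p\) equals the original hazard multiplied by the relative change in the regression part \(\phi(\pmb{\mathrm{x}}, \beta)\) induced by \(x_p\). With the exponential link \(\phi(\pmb{\mathrm{x}}, \beta) = \text{e}^{\pmb{\mathrm{x'}} \beta}\), this relative change is simply \(\beta_p\), so that \(\partial \lambda(t|\pmb{\mathrm{x}}, \beta) / \partial x_p = \beta_p \, \lambda(t|\pmb{\mathrm{x}}, \beta)\), and a one-unit increase in \(x_p\) multiplies the hazard by \(\text{e}^{\beta_p}\).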
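The sketch below checks equation (3.17) numerically, assuming the exponential link and a hypothetical Weibull-type baseline; the parameter values and the helper `hazard` are illustrative only. It compares the analytic marginal effect \(\beta_p \lambda\) with a central finite difference, and verifies that a one-unit increase in \(x_p\) scales the hazard by \(\text{e}^{\beta_p}\).

```python
import math

# Hypothetical hazard: Weibull-type baseline times exp(x'beta)
def hazard(t, x, beta, a=0.5, p=1.5):
    lin = sum(xi * bi for xi, bi in zip(x, beta))
    return a * p * t ** (p - 1) * math.exp(lin)

t, x, beta = 2.0, [1.0, 0.5], [0.4, -0.7]
pidx = 1  # index of the covariate x_p whose effect is computed

# Equation (3.17) with exponential link: (dphi/dx_p) / phi = beta_p
analytic = hazard(t, x, beta) * beta[pidx]

# Central finite difference as a numerical check of the derivative
h = 1e-6
x_up = list(x)
x_up[pidx] += h
x_dn = list(x)
x_dn[pidx] -= h
numeric = (hazard(t, x_up, beta) - hazard(t, x_dn, beta)) / (2 * h)

# A one-unit increase in x_p multiplies the hazard by exp(beta_p)
x_plus1 = list(x)
x_plus1[pidx] += 1
factor = hazard(t, x_plus1, beta) / hazard(t, x, beta)
print(analytic, numeric, factor, math.exp(beta[pidx]))
```

Note that the multiplicative factor does not depend on \(t\) or on the other covariates, which is another way of reading the proportionality property.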
Partial likelihood estimation
The vector of parameters \(\beta\) of the regression part of the PH model is estimated by maximizing the partial likelihood. The idea is to estimate only the regression parameters \(\beta\), treating the baseline hazard \(\lambda_0\) as a nuisance. If desired, an estimate of the baseline hazard can be recovered after estimating \(\beta\), for instance with the Nelson-Aalen estimator (see part 3.4). Cox's intuition is that the intervals during which no event occurs carry no information about \(\beta\), since \(\lambda_0\) could conceivably be zero over these intervals. Hence, only the times at which an event occurs enter the estimation.
To derive the partial likelihood function, let \(t_j\) denote the \(j^{\text{th}}\) distinct failure time in a sample of size \(N\), with \(j \in [\![1; k]\!]\), such that:
- \(t_1 < t_2 < \dots < t_k\),
- \(D(t_j) = \{l: t_l = t_j\}\) is the set of spells completed at \(t_j\) with \(\#D(t_j) = d_j\),
- \(R(t_j) = \{l: t_l \geq t_j\}\) is the set of spells at risk at \(t_j\).
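The sets defined above can be built directly from raw spell data. The sketch below is a minimal illustration on hypothetical toy data; it assumes each spell carries a completion flag (1 if the spell ends in the observation window, 0 if it is right-censored), so that censored spells enter the risk sets \(R(t_j)\) but never the sets \(D(t_j)\). The function name `failure_sets` is illustrative.

```python
# Hypothetical toy data: spell durations and completion flags
# (1 = spell completed, 0 = right-censored).
durations = [2, 3, 3, 5, 6, 6, 8]
completed = [1, 1, 1, 0, 1, 0, 1]

def failure_sets(durations, completed):
    """Return (t_j, D(t_j), d_j, R(t_j)) for the distinct failure times t_1 < ... < t_k."""
    times = sorted({t for t, c in zip(durations, completed) if c == 1})
    out = []
    for tj in times:
        # Spells completed exactly at t_j
        D = [l for l, (t, c) in enumerate(zip(durations, completed)) if t == tj and c == 1]
        # Spells still at risk at t_j (completed or censored at or after t_j)
        R = [l for l, t in enumerate(durations) if t >= tj]
        out.append((tj, D, len(D), R))
    return out

sets = failure_sets(durations, completed)
for tj, D, dj, R in sets:
    print(tj, D, dj, R)
```

The censored spell at duration 5 does not create a failure time, but it still belongs to the risk sets of the earlier failure times 2 and 3.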
The contribution of a spell in \(D(t_j)\) to the likelihood function equals the conditional probability that this particular spell, rather than any other spell in the risk set \(R(t_j)\), ends at \(t_j\), given that a spell in \(R(t_j)\) ends at that time. It can be written as (see Cameron and Trivedi (2005) and proof (6.5) for more details):
\[\begin{equation} \mathbb{P}\big[T_j = t_j | R(t_j) \big] = \frac{\phi(\mathrm{x_j}, \beta)}{\sum_{l \in R(t_j)} \phi(\mathrm{x_l}, \beta)} \tag{3.18} \end{equation}\]
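The following sketch evaluates equation (3.18) for every spell of a single hypothetical risk set, assuming one covariate and the exponential link \(\phi(x, \beta) = \text{e}^{x \beta}\); the covariate values, \(\beta\) and the baseline value `lam0` are made up for illustration. It shows that the probabilities sum to one over the risk set and that multiplying every term by the same baseline hazard \(\lambda_0(t_j)\) leaves them unchanged, which is why \(\lambda_0\) can be treated as a nuisance.

```python
import math

# Hypothetical covariates of the spells in R(t_j) and coefficient beta
x = [0.5, -1.0, 2.0, 0.0]
beta = 0.7

def phi(xl, beta):
    # Assumed exponential link
    return math.exp(xl * beta)

# Equation (3.18): probability that spell l is the one ending at t_j
denom = sum(phi(xl, beta) for xl in x)
probs = [phi(xl, beta) / denom for xl in x]

# Including a common baseline hazard value lambda_0(t_j) changes nothing
lam0 = 3.2  # hypothetical; any positive value gives the same result
denom_b = sum(lam0 * phi(xl, beta) for xl in x)
probs_with_baseline = [lam0 * phi(xl, beta) / denom_b for xl in x]
print(probs, sum(probs))
```

With \(\beta > 0\), the spell with the largest covariate value is the most likely to be the one that ends at \(t_j\).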
Since there are \(k\) distinct failure times, each with its set \(D(t_j)\) of completed spells, Cox defines the partial likelihood function as the product of the probabilities (3.18) over all completed spells and all failure times:
\[\begin{equation} \mathcal{L}_p = \prod_{j=1}^{k} \ \frac{\prod_{m \in D(t_j)} \ \phi(\mathrm{x_m}, \beta)}{\Big[\sum_{l \in R(t_j)} \phi(\mathrm{x_l}, \beta)\Big]^{d_j}} \tag{3.19} \end{equation}\]
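The sketch below maximizes the logarithm of (3.19) for a single covariate with the exponential link, \(\log \mathcal{L}_p = \sum_{j} \big[\sum_{m \in D(t_j)} x_m \beta - d_j \log \sum_{l \in R(t_j)} \text{e}^{x_l \beta}\big]\). The toy data are hypothetical, and the optimizer is plain Newton-Raphson, a standard choice for this concave objective but not necessarily the one used elsewhere in this document; the function names are illustrative.

```python
import math

# Hypothetical toy data: durations, completion flags (1 = completed), one covariate
durations = [2, 3, 3, 5, 6, 6, 8, 9]
completed = [1, 1, 1, 0, 1, 0, 1, 1]
x = [1.0, 0.0, 1.5, 0.5, -0.5, 1.0, -1.0, 0.0]

def risk_structure(durations, completed):
    """Sets D(t_j) and R(t_j) for the distinct failure times."""
    times = sorted({t for t, c in zip(durations, completed) if c == 1})
    D = [[l for l, (t, c) in enumerate(zip(durations, completed)) if t == tj and c == 1]
         for tj in times]
    R = [[l for l, t in enumerate(durations) if t >= tj] for tj in times]
    return D, R

def log_partial_likelihood(beta, x, D, R):
    """Logarithm of (3.19) with phi(x, beta) = exp(x * beta)."""
    ll = 0.0
    for Dj, Rj in zip(D, R):
        ll += sum(x[m] * beta for m in Dj)
        ll -= len(Dj) * math.log(sum(math.exp(x[l] * beta) for l in Rj))
    return ll

def fit_beta(x, D, R, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson on the log partial likelihood (single covariate)."""
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for Dj, Rj in zip(D, R):
            w = [math.exp(x[l] * beta) for l in Rj]
            s0 = sum(w)
            s1 = sum(wl * x[l] for wl, l in zip(w, Rj))
            s2 = sum(wl * x[l] ** 2 for wl, l in zip(w, Rj))
            dj = len(Dj)
            score += sum(x[m] for m in Dj) - dj * s1 / s0   # first derivative
            info += dj * (s2 / s0 - (s1 / s0) ** 2)         # minus second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

D, R = risk_structure(durations, completed)
beta_hat = fit_beta(x, D, R)
print(beta_hat, log_partial_likelihood(beta_hat, x, D, R))
```

Only covariates, completion flags and the ordering of durations enter the computation: no baseline hazard is ever specified, in line with the partial likelihood principle.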
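This formulation of the partial likelihood function is derived in more detail in proofs (6.6) and (6.7) in the appendix. When several spells end at the same time (\(d_j > 1\)), raising the denominator to the power \(d_j\) corresponds to Breslow's approximation for tied failure times.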
3.6.2 Cox PH model
The Cox proportional hazards model is the most popular model for the analysis of duration data. It is called semi-parametric because it makes no assumption about the form of the baseline hazard function \(\lambda_0(t)\). The parametric part lies only in modelling the effect of the covariates on the hazard function \(\lambda(t)\): the log hazard is linear in the covariates, and the parameters are estimated by maximizing the partial likelihood function. The Cox PH model thus only assumes that the predictors act multiplicatively on the hazard function. It is formulated as in equation (3.15), with the exponential function linking the hazard and the covariates, i.e. \(\lambda(t|\pmb{\mathrm{x}}) = \lambda_0 (t) \text{e}^{\pmb{\mathrm{x'}} \beta}\).
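Once \(\hat\beta\) is available, the baseline can be recovered as mentioned in the partial likelihood discussion. A common choice is the Breslow estimator of the cumulative baseline hazard, \(\hat\Lambda_0(t) = \sum_{t_j \le t} d_j / \sum_{l \in R(t_j)} \text{e}^{\mathrm{x_l}' \hat\beta}\), which reduces to the Nelson-Aalen estimator when \(\hat\beta = 0\). The sketch below assumes a single covariate, hypothetical toy data and a made-up value of \(\hat\beta\); the function names are illustrative.

```python
import math

# Hypothetical toy data (single covariate) and an assumed estimate beta_hat
durations = [2, 3, 3, 5, 6, 6, 8, 9]
completed = [1, 1, 1, 0, 1, 0, 1, 1]
x = [1.0, 0.0, 1.5, 0.5, -0.5, 1.0, -1.0, 0.0]

def breslow_cumulative_baseline(durations, completed, x, beta_hat):
    """Steps (t_j, Lambda_0(t_j)) with increments d_j / sum_{l in R(t_j)} exp(x_l * beta_hat)."""
    times = sorted({t for t, c in zip(durations, completed) if c == 1})
    cum, steps = 0.0, []
    for tj in times:
        dj = sum(1 for t, c in zip(durations, completed) if t == tj and c == 1)
        s0 = sum(math.exp(xl * beta_hat) for t, xl in zip(durations, x) if t >= tj)
        cum += dj / s0
        steps.append((tj, cum))
    return steps

def cox_cumulative_hazard(steps, t, x_new, beta_hat):
    """Cumulative hazard Lambda_0(t) * exp(x_new * beta_hat) of a new individual."""
    base = 0.0
    for tj, cum in steps:
        if tj <= t:
            base = cum  # right-continuous step function
    return base * math.exp(x_new * beta_hat)

steps = breslow_cumulative_baseline(durations, completed, x, beta_hat=0.4)
print(steps)
print(cox_cumulative_hazard(steps, 6.5, 1.0, 0.4))
```

The individual cumulative hazard is the baseline scaled by \(\text{e}^{x \hat\beta}\), the cumulative counterpart of the multiplicative structure in the formula above.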