3.2 Censoring and Truncation

When dealing with survival data, some observations are usually censored, meaning that they relate to spells that are not completely observed. Duration data can also suffer from a selection bias, called truncation.

3.2.1 Censoring mechanisms

Left-censoring occurs when the event of interest occurs before the beginning of the observation period. For example, an individual is included in a study of unemployment duration at $t_0$. At that time, the individual has already been unemployed for some time but cannot recall exactly for how long. If we observe that the individual finds a job again at $t_1$, we can only deduce that the duration of unemployment is greater than $t_1 - t_0$; this spell is consequently left-censored. Observation 2 in figure 3.1 corresponds to a left-censored spell (Liu 2019).

A spell is considered right-censored when it is observed from time $t_0$ until a censoring time $t_c$, as illustrated by observation 4 in figure 3.1. For instance, the lifetime of a customer who has not churned by the end of the observation period is right-censored. Let $X_i$ denote the duration of a complete spell and $C_i$ the duration of a right-censored spell, that is, the time until censoring. Let $T_i$ denote the duration actually observed and $\delta_i$ the censoring indicator, with $\delta_i = 1$ if the spell is censored and $\delta_i = 0$ otherwise. Then $(t_1, \delta_1), \dots, (t_N, \delta_N)$ are the realizations of the following random variables:

$$T_i = \min(X_i, C_i), \qquad \delta_i = \mathbb{1}_{\{X_i > C_i\}}$$
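As an illustration, the construction of the observed pairs can be sketched in Python: the observed duration is the smaller of the complete duration and the censoring time, and the indicator flags which of the two was recorded. The function name `observe` and the toy durations below are hypothetical. The indicator follows the convention of this section (1 for a censored spell), which is the reverse of the event-indicator convention used by many survival libraries.

```python
def observe(x, c):
    """Return (t, delta) for a complete duration x and a censoring time c.

    Convention assumed from this section: delta = 1 when the spell is censored.
    """
    t = min(x, c)              # duration actually observed
    delta = 1 if x > c else 0  # censored when the event would occur after c
    return t, delta


# Hypothetical complete durations X_i and censoring times C_i (in months).
complete = [5.0, 12.0, 3.5, 20.0]
censoring = [10.0, 10.0, 10.0, 10.0]

observed = [observe(x, c) for x, c in zip(complete, censoring)]
print(observed)  # [(5.0, 0), (10.0, 1), (3.5, 0), (10.0, 1)]
```

When passing such data to a library that expects an event indicator, the flag would need to be flipped (`1 - delta`).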

3.2.2 Selection bias

Survival data suffer from a selection bias (or truncation) when only a sub-sample of the population of interest is studied. A customer entering the firm’s portfolio after the end of the study is said to be right-truncated, whereas a client who has left the portfolio before the beginning of the study is considered left-truncated. Mathematically, a random variable $X$ is truncated by a subset $A \subset \mathbb{R}^+$ if, instead of $\Omega(X)$, we solely observe $\Omega(X) \cap A$. In figure 3.1, the first and fifth observations suffer from a selection bias.
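The difference from censoring can be made concrete with a small Python sketch: a truncated observation does not appear in the sample at all, whereas a censored one appears with partial information. The helper `truncate`, the interval bounds, and the durations are hypothetical and chosen only for illustration, with the subset taken to be a closed interval.

```python
def truncate(durations, lower, upper):
    """Keep only the durations falling in A = [lower, upper].

    Durations outside A are never recorded, so they leave no trace in the sample.
    """
    return [x for x in durations if lower <= x <= upper]


# Hypothetical population of durations (in months).
population = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
sample = truncate(population, lower=2.0, upper=16.0)

mean_population = sum(population) / len(population)
mean_sample = sum(sample) / len(sample)

print(sample)           # [2.0, 4.0, 8.0, 16.0]
print(mean_population)  # 10.5
print(mean_sample)      # 7.5
```

In this toy case the mean of the truncated sample differs from the population mean, which is the selection bias described above.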

Figure 3.1: Censored and truncated data


References

Liu, Weinan. 2019. “Inclusive Underwriting: The Case of Cardiovascular Risk Calculator.” PhD thesis, ENSAE ParisTech.