5.1 General Overview
The data set used in this study contains 29 variables and 7,032 customers from a telecom firm. For each client, the data includes:
Demographic information: CustomerID, City, Zip_Code, Latitude, Longitude, Gender, Senior_Citizen, Partner and Dependents.
Customer account information: Tenure_Months, Contract, Paperless_Billing, Payment_Method, Monthly_Charges, Total_Charges, Churn_Label, Churn_Value, Churn_Score, CLTV and Churn_Reason.
Services information: Phone_Service, Multiple_Lines, Internet_Service, Online_Security, Online_Backup, Device_Protection, Tech_Support, Streaming_TV and Streaming_Movies.
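The three groups of variables can be written down as a small lookup structure, which makes it easy to check that they add up to the 29 variables of the data set. The following is only a minimal sketch: the column names come from the data set, but the dictionary keys (demographic, account, services) and the helper all_columns are hypothetical names chosen for illustration.

```python
# Column names as they appear in the data set; the group labels used as
# dictionary keys are illustrative, not part of the data.
COLUMN_GROUPS = {
    "demographic": [
        "CustomerID", "City", "Zip_Code", "Latitude", "Longitude",
        "Gender", "Senior_Citizen", "Partner", "Dependents",
    ],
    "account": [
        "Tenure_Months", "Contract", "Paperless_Billing", "Payment_Method",
        "Monthly_Charges", "Total_Charges", "Churn_Label", "Churn_Value",
        "Churn_Score", "CLTV", "Churn_Reason",
    ],
    "services": [
        "Phone_Service", "Multiple_Lines", "Internet_Service",
        "Online_Security", "Online_Backup", "Device_Protection",
        "Tech_Support", "Streaming_TV", "Streaming_Movies",
    ],
}


def all_columns(groups):
    """Flatten the groups into a single list of column names."""
    return [name for columns in groups.values() for name in columns]


if __name__ == "__main__":
    for group, columns in COLUMN_GROUPS.items():
        print(f"{group}: {len(columns)} variables")
    print("total:", len(all_columns(COLUMN_GROUPS)))
```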
CustomerID | City | Zip_Code | Latitude | Longitude | Gender | Senior_Citizen | Partner | Dependents | Tenure_Months | Phone_Service | Multiple_Lines | Internet_Service | Online_Security | Online_Backup | Device_Protection | Tech_Support | Streaming_TV | Streaming_Movies | Contract | Paperless_Billing | Payment_Method | Monthly_Charges | Total_Charges | Churn_Label | Churn_Value | Churn_Score | CLTV | Churn_Reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Los Angeles | 90003 | 33.96 | -118.27 | Male | No | No | No | 2 | Yes | Yes | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes | 1 | 86 | 3239 | Competitor made better offer |
2 | Los Angeles | 90005 | 34.06 | -118.31 | Female | No | No | Yes | 2 | Yes | Yes | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes | 1 | 67 | 2701 | Moved |
3 | Los Angeles | 90006 | 34.05 | -118.29 | Female | No | No | Yes | 8 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.65 | 820.50 | Yes | 1 | 86 | 5372 | Moved |
4 | Los Angeles | 90010 | 34.06 | -118.32 | Female | No | Yes | Yes | 28 | Yes | Yes | Fiber optic | No | No | Yes | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 104.80 | 3046.05 | Yes | 1 | 84 | 5003 | Moved |
5 | Los Angeles | 90015 | 34.04 | -118.27 | Male | No | No | Yes | 49 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Bank transfer | 103.70 | 5036.30 | Yes | 1 | 89 | 5340 | Competitor had better devices |
6 | Los Angeles | 90020 | 34.07 | -118.31 | Female | No | Yes | No | 10 | Yes | Yes | DSL | No | No | Yes | Yes | No | No | Month-to-month | No | Credit card | 55.20 | 528.35 | Yes | 1 | 78 | 5925 | Competitor offered higher download speeds |
As shown in Table 5.1, the status variable Churn_Value indicates whether the customer left the firm’s portfolio within the last month, and Tenure_Months is the duration variable.
Since the purpose of our study is to estimate the overall value of this fictional firm’s portfolio, two groups of target variables can be considered. On the one hand, Churn_Value and Tenure_Months make it possible to determine whether a customer is still active in the portfolio; they are used as response variables in the survival models. On the other hand, the Monthly_Charges variable indicates the price paid by each customer every month and may be used to derive a raw customer value. Although the CLTV variable represents each customer’s value as a measure of customer lifetime value, we have no information on how it is calculated. It is therefore not used in the model developed in the next chapter.
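To make the two groups of target variables concrete, the sketch below pairs Tenure_Months with Churn_Value to form the (duration, event) response that survival models typically expect, using only the six rows shown in Table 5.1. The Customer class and both helper functions are hypothetical names. The raw value computed here (monthly charge multiplied by the months observed so far) is an assumption made purely for illustration; it is not the valuation method developed in the next chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    tenure_months: int      # duration variable
    churn_value: int        # status variable: 1 = churned, 0 = still active
    monthly_charges: float  # price paid each month


# The six rows shown in Table 5.1, restricted to the columns used here.
SAMPLE = [
    Customer(1, 2, 1, 53.85),
    Customer(2, 2, 1, 70.70),
    Customer(3, 8, 1, 99.65),
    Customer(4, 28, 1, 104.80),
    Customer(5, 49, 1, 103.70),
    Customer(6, 10, 1, 55.20),
]


def survival_response(customers):
    """Return (duration, event) pairs built from tenure and churn status."""
    return [(c.tenure_months, c.churn_value) for c in customers]


def observed_raw_value(customer):
    """Illustrative assumption: monthly charge times months observed so far."""
    return round(customer.monthly_charges * customer.tenure_months, 2)


if __name__ == "__main__":
    for (duration, event), c in zip(survival_response(SAMPLE), SAMPLE):
        print(c.customer_id, duration, event, observed_raw_value(c))
```

All six sample rows are churned customers, so every event indicator equals 1; in the full data set, customers with Churn_Value equal to 0 would be treated as censored observations.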