# Propensity Modelling


## Latest revision as of 18:07, 14 February 2019

A propensity model is a statistical scorecard used to predict the behaviour of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.

Propensity modelling encompasses several related approaches and techniques, including:

- Propensity Score Matching (PSM)
- Propensity Score Stratification (PSS)
- Propensity Score Weighting (PSW)

## Algorithm for Propensity Scoring

The usual algorithm for propensity score computation includes the following steps:

- Select variables to use as features (e.g. gender, income, neighbourhood, age, nationality). These variables should be chosen based on your understanding of which independent factors might affect the customer behaviour of interest (e.g. buying vs. not buying).
- Prepare the data and build a model. Next, create a probabilistic model, typically a logistic regression over the prepared features, that predicts whether a given subject exhibits the behaviour. The model is trained on a dataset of people for whom both the covariates and the behaviour of interest are known.
- Calculate propensity scores for new data. Once the logistic regression is fitted, the trained model can be used to calculate propensity scores for new data (e.g. potential consumers).
- Use the model for causal inference. For example, to understand the differences and similarities between users and non-users of a product, we can create several buckets that group subjects with similar propensity scores: one bucket for subjects with scores in 0.0-0.1, a second for scores in 0.1-0.2, and so on. Because the propensity score is a balancing score, users and non-users within each bucket can be compared directly; since they share roughly the same propensity score, this controls for confounding variables and allows actual causal relationships to be inferred.
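
The steps above can be sketched in code. The following is a minimal illustration on synthetic data: the covariates, the data-generating process, and all variable names are invented for the example, and the logistic regression is fitted with a simple gradient-descent loop rather than a dedicated statistics library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariates (e.g. standardized age and income) for 200 subjects.
n = 200
X = rng.normal(size=(n, 2))

# Synthetic "behaviour" (e.g. bought vs. did not buy) that depends on the covariates.
true_logits = 0.8 * X[:, 0] - 0.5 * X[:, 1]
behaviour = rng.random(n) < 1 / (1 + np.exp(-true_logits))

# Step 2: fit logistic regression P(behaviour | X) = sigmoid(X @ w + b)
# by gradient descent on the log-loss.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - behaviour) / n)
    b -= 0.1 * np.mean(p - behaviour)

# Step 3: the propensity score is simply the model's predicted probability;
# new subjects would be scored with the same expression.
scores = 1 / (1 + np.exp(-(X @ w + b)))

# Step 4: stratify into ten equal-width buckets [0.0, 0.1), [0.1, 0.2), ...
# and compare users vs. non-users within each bucket.
buckets = np.minimum((scores * 10).astype(int), 9)
for k in range(10):
    mask = buckets == k
    if mask.any():
        print(f"[{k / 10:.1f}, {(k + 1) / 10:.1f}): "
              f"{int(behaviour[mask].sum())} users, "
              f"{int((~behaviour[mask]).sum())} non-users")
```

Within each printed bucket, users and non-users have similar propensity scores, so differences between them are less likely to be explained by the covariates used in the model.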