This article explores two additional aspects that data scientists and data science teams often need to address during the scorecard development process: segmentation and reject inference (RI).
How many scorecards? What are the criteria? What is best practice? These are common questions data scientists try to answer early in the scorecard development process, beginning with identifying and justifying the number of scorecards, a process known as segmentation.
Figure 1. Scorecard segmentation
The initial segmentation pre-assessment is carried out during the business insights analysis. At this stage, data scientists should inform the business of any heterogeneous population segments they have identified whose characteristics differ too much for them to be treated as a single group. This allows the business to decide early whether to accept multiple scorecards.
The business drivers for segmentation are:
- Marketing: Such as product offerings or new markets
- Different treatment of disparate customer groups: Grouping customers based on demographic information, for example
- Data availability: Meaning different data might be available through different marketing channels, or some customer groups might not have credit history, for example
The statistical drivers for segmentation assume that each segment has a sufficient number of observations, including both “good” and “bad” accounts, and that there are interaction effects, meaning that predictive patterns vary across segments.
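The first statistical driver can be checked before any modeling by counting goods and bads per candidate segment. The sketch below is a minimal illustration, assuming outcomes are coded 1 for “bad” and 0 for “good”; the function name `check_segment_counts` and its minimum counts are hypothetical placeholders, not thresholds recommended in this series.

```python
from collections import Counter


def check_segment_counts(segments, outcomes, min_goods=1000, min_bads=100):
    """Count goods and bads per segment and flag segments meeting both minimums.

    segments: iterable of segment labels, one per account.
    outcomes: iterable of 1 ("bad") or 0 ("good"), aligned with segments.
    The default minimums are hypothetical placeholders.
    """
    goods, bads = Counter(), Counter()
    for seg, bad in zip(segments, outcomes):
        if bad:
            bads[seg] += 1
        else:
            goods[seg] += 1
    report = {}
    for seg in set(goods) | set(bads):
        g, b = goods[seg], bads[seg]
        report[seg] = (g, b, g >= min_goods and b >= min_bads)
    return report


# Toy data: segment "B" has too few bads to support a scorecard of its own.
segs = ["A"] * 5 + ["B"] * 3
outs = [0, 0, 0, 1, 1, 0, 0, 1]
print(check_segment_counts(segs, outs, min_goods=2, min_bads=2))
```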
Typically, the segmentation process includes the following steps:
- Identify a simple segmentation schema using supervised or unsupervised segmentation:
  - For supervised segmentation, data scientists often use a decision tree to identify potential segments and capture interaction effects. Alternatively, teams can use residuals from an ensemble model to detect data interactions.
  - Unsupervised segmentation, such as clustering, can also be used to create the segments, but it doesn’t necessarily capture interaction effects.
- Identify a set of candidate predictors for each segment.
- Build a separate model for each segment.
- Test the following aspects:
  - Whether the segmented models have different predictive patterns. If no new predictive characteristics emerge across segments, the data scientist should search for a better segmentation split or build a single model.
  - Whether the segmented models have similar predictive patterns, but with significantly different magnitudes or opposing effects across segments.
  - Whether the segmented models produce a superior lift in predictive power compared with a single model built on the entire population.
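The testing steps above can be prototyped on synthetic data. The sketch below assumes the segments have already been identified (for example, from a decision tree split) and arrive as one label per account. It fits logistic regression with a small hand-written Newton-Raphson routine in NumPy instead of a statistics package, then compares per-segment coefficients and AUC against a single model. The helper names, the simulated opposing effect, and the use of AUC as the lift measure are illustrative assumptions.

```python
import numpy as np


def fit_logit(X, y, w=None, n_iter=25, ridge=1e-4):
    """(Weighted) logistic regression via Newton-Raphson; returns [intercept, coefs...].

    A tiny ridge penalty keeps the Hessian invertible; this is a sketch, not a production fitter.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (w * (y - p)) - ridge * beta
        hess = (X1 * (w * p * (1 - p))[:, None]).T @ X1 + ridge * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict(beta, X):
    """Probability of "bad" from fitted coefficients."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))


def auc(y, score):
    """Rank-based AUC (Mann-Whitney U); assumes continuous scores without ties."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_bad = y.sum()
    n_good = len(y) - n_bad
    return (ranks[y == 1].sum() - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)


def compare_segmentation(X, y, seg):
    """Fit one model on everyone and one per segment; return both AUCs and segment coefficients.

    In-sample AUC keeps the sketch short; a real test would score a hold-out sample.
    """
    single_auc = auc(y, predict(fit_logit(X, y), X))
    seg_scores = np.empty(len(y))
    seg_coefs = {}
    for s in np.unique(seg):
        m = seg == s
        seg_coefs[s] = fit_logit(X[m], y[m])
        seg_scores[m] = predict(seg_coefs[s], X[m])
    return single_auc, auc(y, seg_scores), seg_coefs


# Synthetic data: the same characteristic x raises risk in segment 0 and lowers it in segment 1.
rng = np.random.default_rng(0)
n = 4000
seg = rng.integers(0, 2, n)
x = rng.normal(size=(n, 1))
true_logit = -1.0 + np.where(seg == 0, 1.5, -1.5) * x[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

single_auc, seg_auc, coefs = compare_segmentation(x, y, seg)
print(f"single model AUC: {single_auc:.3f}, segmented AUC: {seg_auc:.3f}")
for s, beta in coefs.items():
    print(f"segment {s}: slope on x = {beta[1]:.2f}")
```

In this simulated case the single model's AUC should sit near 0.5 because the opposing slopes cancel out, while each segment model recovers its own slope.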
Segmentation is an iterative process that requires constant judgement about whether to use single or multiple segments. Generally speaking, segmentation rarely results in a significant lift, and data scientists should make every effort to produce a single scorecard. Common methods for avoiding segmentation include adding variables to the logistic regression to capture interaction effects and/or identifying the most predictive variables per segment and combining them into a single model.
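As an alternative to segmenting, an opposing effect can be captured inside one logistic regression by adding a segment flag and its interaction with the characteristic. The sketch below uses a hand-written NumPy Newton-Raphson fit (an illustrative stand-in for a statistics package) on hypothetical synthetic data; `add_interactions` is a made-up helper name.

```python
import numpy as np


def fit_logit(X, y, w=None, n_iter=25, ridge=1e-4):
    """(Weighted) logistic regression via Newton-Raphson; returns [intercept, coefs...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (w * (y - p)) - ridge * beta
        hess = (X1 * (w * p * (1 - p))[:, None]).T @ X1 + ridge * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def add_interactions(X, seg_flag):
    """Append a 0/1 segment flag and its products with every column of X."""
    flag = np.asarray(seg_flag, dtype=float)[:, None]
    return np.hstack([X, flag, X * flag])


# Synthetic data: x raises risk in segment 0 and lowers it in segment 1.
rng = np.random.default_rng(0)
n = 4000
seg = rng.integers(0, 2, n)
x = rng.normal(size=(n, 1))
true_logit = -1.0 + np.where(seg == 0, 1.5, -1.5) * x[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

# One model with columns [intercept, x, segment flag, x * segment flag].
beta = fit_logit(add_interactions(x, seg), y)
print("slope of x in segment 0:", round(beta[1], 2))
print("slope of x in segment 1:", round(beta[1] + beta[3], 2))
```

The slope for segment 1 is read off as `beta[1] + beta[3]`, so a single scorecard carries both patterns.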
Separate scorecards are usually built independently. However, if the reliability of model factors is an issue, a parent/child model may offer an alternative approach. In this approach, data scientists develop a “parent” model on the common characteristics and use its output as a predictor in the “child” models, which supplement it with the characteristics unique to each child segment.
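A parent/child structure can be sketched as follows, assuming the parent output is passed to each child as log-odds (a probability or score would work similarly). The synthetic data, the helper `fit_parent_child`, and the NumPy Newton-Raphson fit are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np


def fit_logit(X, y, w=None, n_iter=25, ridge=1e-4):
    """(Weighted) logistic regression via Newton-Raphson; returns [intercept, coefs...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (w * (y - p)) - ridge * beta
        hess = (X1 * (w * p * (1 - p))[:, None]).T @ X1 + ridge * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_parent_child(x_common, child_features, y, seg):
    """Parent/child sketch.

    x_common: (n, k) characteristics shared by all segments (parent model inputs).
    child_features: dict {segment label: (n, m) array of that segment's unique characteristics}.
    The parent's log-odds output becomes the first predictor of every child model.
    """
    parent = fit_logit(x_common, y)
    parent_logodds = parent[0] + x_common @ parent[1:]
    children = {}
    for s, feats in child_features.items():
        m = seg == s
        children[s] = fit_logit(np.column_stack([parent_logodds[m], feats[m]]), y[m])
    return parent, children


# Synthetic data: one shared characteristic plus one that matters in only one segment.
rng = np.random.default_rng(1)
n = 6000
seg = rng.integers(0, 2, n)
x_common = rng.normal(size=(n, 1))
x_a = rng.normal(size=n)  # hypothetical characteristic used only by child segment 0
x_b = rng.normal(size=n)  # hypothetical characteristic used only by child segment 1
true_logit = -1.0 + x_common[:, 0] + np.where(seg == 0, 1.2 * x_a, 1.2 * x_b)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

parent, children = fit_parent_child(x_common, {0: x_a[:, None], 1: x_b[:, None]}, y, seg)
for s, beta in children.items():
    print(f"child {s}: [intercept, parent score, unique characteristic] = {np.round(beta, 2)}")
```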
The primary aim of multiple scorecards is to improve the quality of risk assessment compared to that of a single scorecard. Segmented scorecards should only be used if they offer significant value to the business that outweighs the drawbacks involved, including higher development and implementation cost, the complexity inherent in the decision management process, the need to manage additional scorecards, and the greater strain on IT resources.
Reject Inference (RI)
Application scorecards have a naturally occurring selection bias if the modeling is based solely on the accepted population with known performance. A significant group of rejected customers is excluded from the modeling process because their performance is unknown. To address this selection bias, application scorecard models should include both populations, which means the unknown performance of the rejects must be inferred. This is done using reject inference (RI).
Figure 2. Accepted and rejected populations
The real question is: with or without reject inference? There are two schools of thought in this debate:
- Without: Those who think RI is a vicious circle, because the inferred performance of the rejects would be based on the approved (but biased) population, leading to unreliable reject inference.
- With: Those who think RI methodology is a valuable approach that benefits model performance.
There are a few extra steps required during the scorecard development when using RI:
- Build a logistic regression model on the accepts – this is the base_logit_model
- Infer the rejections using a reject inference technique
- Combine the accepts and the inferred rejects into a single dataset (complete_population)
- Build a new logistic regression model on complete_population – this is the final_logit_model
- Validate the final_logit_model
- Create a scorecard model based on the final_logit_model
Figure 3. Scorecard development using reject inference
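The first four steps of this workflow can be sketched in NumPy as below. The inference step is passed in as a function; as a placeholder it uses the simplest option from Table 1, labeling every reject “bad”. The simulated acceptance policy, the helper names, and the Newton-Raphson logistic fit are assumptions made for illustration; validation and scorecard scaling are omitted.

```python
import numpy as np


def fit_logit(X, y, w=None, n_iter=25, ridge=1e-4):
    """(Weighted) logistic regression via Newton-Raphson; returns [intercept, coefs...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (w * (y - p)) - ridge * beta
        hess = (X1 * (w * p * (1 - p))[:, None]).T @ X1 + ridge * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict(beta, X):
    """Probability of "bad" from fitted coefficients."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))


def assign_all_bad(base_beta, X_rej):
    """Placeholder inference step ("assign bad status to all rejects").

    Any technique can be plugged in as long as it returns (X, labels, weights).
    """
    n = len(X_rej)
    return X_rej, np.ones(n, dtype=int), np.ones(n)


def reject_inference_pipeline(X_acc, y_acc, X_rej, infer=assign_all_bad):
    base = fit_logit(X_acc, y_acc)                        # 1. base_logit_model on the accepts
    X_inf, y_inf, w_inf = infer(base, X_rej)              # 2. infer the rejects' performance
    X_all = np.vstack([X_acc, X_inf])                     # 3. complete_population
    y_all = np.concatenate([y_acc, y_inf])
    w_all = np.concatenate([np.ones(len(y_acc)), w_inf])
    final = fit_logit(X_all, y_all, w_all)                # 4. final_logit_model
    # Steps 5 and 6 (validation and scaling into scorecard points) are not shown.
    return base, final, (X_all, y_all, w_all)


# Simulated history: a hypothetical old policy rejected applicants with high (noisy) X[:, 0].
rng = np.random.default_rng(7)
n = 5000
X = rng.normal(size=(n, 2))
true_logit = -2.0 + X[:, 0] + 0.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
accepted = X[:, 0] + 0.5 * rng.normal(size=n) < 0.8
base, final, (X_all, y_all, w_all) = reject_inference_pipeline(X[accepted], y[accepted], X[~accepted])
print("mean p(bad) on rejects, base vs final:",
      predict(base, X[~accepted]).mean().round(3), predict(final, X[~accepted]).mean().round(3))
```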
Reject inference is a form of missing-value treatment where the outcomes are “missing not at random” (MNAR), and the accepted and rejected populations differ significantly. There are two broad approaches to inferring the missing performance, assignment and augmentation, each with its own set of techniques. The most popular techniques within the two approaches are proportional assignment, simple and fuzzy augmentation, and parceling.
|Assignment techniques|Augmentation techniques|
|---|---|
|Ignore rejects (do not use RI)|Simple augmentation|
|Assign “bad” status to all rejects|Fuzzy augmentation|
|Proportional assignment|Case-based inference|
Table 1. Reject inference techniques
Proportional assignment: The random partitioning of the rejects into “good” and “bad” accounts with a “bad” rate two to five times greater than in the accepted population.
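Proportional assignment needs only the accepts' bad rate and a multiplier. The sketch below assumes outcomes coded 1 = bad, uses a factor of 3 as an arbitrary point inside the two-to-five range, and fixes the random seed so the assignment is reproducible; the function name is hypothetical.

```python
import numpy as np


def proportional_assignment(n_rejects, accept_bad_rate, factor=3.0, seed=0):
    """Randomly label rejects as bad (1) or good (0).

    The number of bads is chosen so the rejects' bad rate is `factor` times the
    accepts' bad rate (capped at 100%). The default factor of 3 is arbitrary.
    """
    target_rate = min(1.0, factor * accept_bad_rate)
    n_bad = int(round(target_rate * n_rejects))
    rng = np.random.default_rng(seed)
    labels = np.zeros(n_rejects, dtype=int)
    labels[rng.choice(n_rejects, size=n_bad, replace=False)] = 1
    return labels


# Accepts have an 8% bad rate, so with factor 3 about 24% of 1,000 rejects are labeled bad.
labels = proportional_assignment(1000, accept_bad_rate=0.08, factor=3.0)
print(labels.sum(), labels.mean())
```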
Simple augmentation: Scores the rejects using the base_logit_model and partitions them into “good” and “bad” accounts based on a cut-off value. The cut-off is selected so that the “bad” rate in the rejects is two to five times greater than in the accepts.
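A minimal sketch of simple augmentation, assuming the rejects have already been scored with the base_logit_model as probabilities of “bad”. Rather than searching for a cut-off value directly, it ranks the rejects and labels the riskiest ones bad until the target rate is reached, which is equivalent to choosing the cut-off. The factor of 3, the stand-in scores, and the function name are illustrative.

```python
import numpy as np


def simple_augmentation(p_bad_rejects, accept_bad_rate, factor=3.0):
    """Label the riskiest rejects as bad so their bad rate is `factor` x the accepts'.

    p_bad_rejects: the rejects' probabilities of "bad" from the base_logit_model.
    Returns (labels, cutoff): rejects scoring above `cutoff` are bad (ties at the cut-off aside).
    """
    p = np.asarray(p_bad_rejects, dtype=float)
    target_rate = min(1.0, factor * accept_bad_rate)
    n_bad = int(round(target_rate * len(p)))
    order = np.argsort(-p)  # riskiest rejects first
    labels = np.zeros(len(p), dtype=int)
    labels[order[:n_bad]] = 1
    cutoff = p[order[n_bad - 1]] if n_bad > 0 else np.inf
    return labels, cutoff


# Stand-in scores; in practice these come from scoring the rejects with base_logit_model.
scores = np.random.default_rng(5).beta(2, 5, size=2000)
labels, cutoff = simple_augmentation(scores, accept_bad_rate=0.06, factor=3.0)
print(f"cut-off = {cutoff:.3f}, reject bad rate = {labels.mean():.3f}")
```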
Fuzzy augmentation: Scores the rejects using the base_logit_model. Each reject record is effectively duplicated into a weighted “bad” component and a weighted “good” component, with both weights derived from the reject’s score. These weights, together with a weight of 1 for every accept, are used in the final_logit_model. The recommended strategy is again a “bad” rate in the rejects two to five times greater than in the accepts.
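The record duplication behind fuzzy augmentation can be sketched as below, assuming each reject's weight for the “bad” copy is its base_logit_model probability of bad and the “good” copy gets the complement. The helper names are hypothetical; the ratio helper simply reports where the rejects' weighted bad rate lands relative to the accepts, for comparison with the two-to-five range.

```python
import numpy as np


def fuzzy_augmentation(X_acc, y_acc, X_rej, p_bad_rej):
    """Build the weighted complete_population for fuzzy augmentation.

    Each reject appears twice: once labeled bad (1) with weight p, once labeled
    good (0) with weight 1 - p, where p is its base_logit_model probability of bad.
    Every accept keeps its observed label with weight 1.
    """
    p = np.asarray(p_bad_rej, dtype=float)
    X_all = np.vstack([X_acc, X_rej, X_rej])
    y_all = np.concatenate([
        np.asarray(y_acc, dtype=int),
        np.ones(len(p), dtype=int),
        np.zeros(len(p), dtype=int),
    ])
    w_all = np.concatenate([np.ones(len(y_acc)), p, 1.0 - p])
    return X_all, y_all, w_all


def reject_to_accept_bad_ratio(y_acc, p_bad_rej):
    """Weighted bad rate of the rejects divided by the observed bad rate of the accepts."""
    return float(np.mean(p_bad_rej) / np.mean(y_acc))


# Toy example: 4 accepts (1 bad) and 2 rejects scored at 0.6 and 0.4.
X_acc = np.zeros((4, 1))
y_acc = np.array([0, 0, 0, 1])
X_rej = np.ones((2, 1))
X_all, y_all, w_all = fuzzy_augmentation(X_acc, y_acc, X_rej, [0.6, 0.4])
print(X_all.shape, y_all, w_all)
print("reject/accept bad-rate ratio:", reject_to_accept_bad_ratio(y_acc, [0.6, 0.4]))
```

The returned arrays can be fed to any logistic regression that accepts per-observation weights, such as the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`.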
Parceling: A hybrid method encompassing simple augmentation and proportional assignment. Parcels are created by binning the rejects’ scores, generated using the base_logit_model, into score bands. Proportional assignment is applied to each parcel with a “bad” rate two to five times greater than the “bad” rate in the equivalent score band of the accepted population.
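Parceling combines the two previous ideas: score bands from the base_logit_model, then proportional assignment within each band. The sketch below assumes equal-frequency bands built from the accepts' scores, a factor of 3, and outcomes coded 1 = bad; the band count, seed, synthetic data, and function name are illustrative choices.

```python
import numpy as np


def parceling(p_acc, y_acc, p_rej, n_bands=5, factor=3.0, seed=0):
    """Parceling sketch.

    p_acc, p_rej: base_logit_model probabilities of "bad" for accepts and rejects.
    y_acc: observed outcomes of the accepts (1 = bad, 0 = good), as a NumPy array.
    Returns reject labels, each reject's score band, and the target bad rate per band.
    """
    rng = np.random.default_rng(seed)
    # Equal-frequency score bands on the accepts' scores (an assumption; fixed
    # score ranges would work the same way). Every band then contains accepts.
    edges = np.quantile(p_acc, np.linspace(0.0, 1.0, n_bands + 1))

    def band_of(p):
        # Band index 0..n_bands-1; scores outside the accepts' range go to the end bands.
        return np.clip(np.searchsorted(edges, p, side="right") - 1, 0, n_bands - 1)

    acc_band, rej_band = band_of(p_acc), band_of(p_rej)
    labels = np.zeros(len(p_rej), dtype=int)
    targets = {}
    for b in range(n_bands):
        idx = np.flatnonzero(rej_band == b)
        if idx.size == 0:
            continue
        # Proportional assignment inside the parcel: factor x the accepts' bad rate in this band.
        targets[b] = min(1.0, factor * y_acc[acc_band == b].mean())
        n_bad = int(round(targets[b] * idx.size))
        labels[rng.choice(idx, size=n_bad, replace=False)] = 1
    return labels, rej_band, targets


# Synthetic example: rejects concentrate in the riskier half of the score range.
rng = np.random.default_rng(3)
p_acc = rng.random(1000)
y_acc = (rng.random(1000) < 0.05 + 0.1 * p_acc).astype(int)
p_rej = 0.5 + 0.5 * rng.random(500)
labels, bands, targets = parceling(p_acc, y_acc, p_rej)
print({b: round(t, 3) for b, t in targets.items()})
```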
Figure 4. Proportional assignment
Figure 5. Simple augmentation
Figure 6. Fuzzy augmentation
Figure 7. Parceling
Credit scoring is a dynamic, flexible, and powerful tool for lenders, but there are plenty of ins and outs that are worth covering in detail. To learn more about credit scoring and credit risk mitigation techniques, read the next installment of our credit scoring series, Part Seven: Additional Credit Risk Modeling Considerations.
Learn more about Altair’s credit scoring, credit risk, and financial services solutions.