The customer, a major UK-based telecom provider, accepts orders (mobile handsets, Wi-Fi routers, SIM cards, switches, etc.) through multiple channels, including its website, app, phone and SMS.
Fraudsters often hack into customer accounts by various means and place orders on the account holders' behalf.
When the account holders receive an unexpectedly high bill at the end of the month ("bill shock"), they refuse to pay for these orders, and the cost becomes a loss for the telecom provider.
A fraud identification team randomly samples orders from the list of all orders placed and checks them for possible fraud, but because of limited bandwidth it can review only 1-1.5% of the orders placed each day.
Of the orders the team reviews, the historical success rate (i.e. the fraud identification rate) is close to 24%, which means that 76% of reviews find no fraud.
Because the team lacks the bandwidth to review more orders, many fraudulent orders get through, and the customer was looking for a solution that would increase the number of frauds identified per day.
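The bottleneck can be made concrete with a little arithmetic. With a fixed review capacity, the frauds found per day equal the number of orders reviewed multiplied by the hit rate. The sketch below assumes a hypothetical volume of 20,000 orders per day (the case study does not give the real volume); only the 1-1.5% review share and the 24% hit rate come from the text.

```python
def frauds_found_per_day(orders_per_day: int, review_share: float, hit_rate: float) -> float:
    """Expected frauds confirmed per day = orders reviewed x share of reviews that find fraud."""
    orders_reviewed = orders_per_day * review_share
    return orders_reviewed * hit_rate


# Hypothetical daily order volume; the case study does not state the real figure.
ORDERS_PER_DAY = 20_000

for share in (0.01, 0.015):
    found = frauds_found_per_day(ORDERS_PER_DAY, share, 0.24)
    print(f"review {share:.1%} of orders -> {ORDERS_PER_DAY * share:.0f} reviewed, ~{found:.0f} frauds found")
```

Because review capacity is the binding constraint, raising the hit rate is the only lever in this formula that finds more frauds without adding reviewers.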
Objective
To improve the fraud identification team's throughput rate (the share of reviewed orders confirmed as fraudulent) from the existing 24% to at least 50%
To identify more frauds per day, thereby improving the customer experience
To replace the rule-based approach for selecting the orders reviewed by the fraud identification team with a data-analytics approach
Data
The data made available covered customer demographics, usage behavior, service subscriptions, credit ratings, previously identified fraudulent orders and IP information.
The team ensured that all personally identifiable information was fully masked before working with the data.
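The case study does not say how the masking was done. One common approach that still lets records be joined across tables (for example, linking an order to its account's fraud history) is to replace each identifier with a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library; the field names and the secret key are hypothetical, and keyed hashing is pseudonymization rather than full anonymization.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept in a key vault, never alongside the data.
MASKING_KEY = b"replace-with-a-secret-key"

# Hypothetical set of directly identifying fields in an order record.
PII_FIELDS = {"customer_name", "email", "phone_number", "billing_address"}


def mask_value(value: str) -> str:
    """Deterministic keyed hash: the same input always maps to the same 64-character token."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict) -> dict:
    """Replace PII fields with tokens and leave every other field untouched."""
    return {
        key: mask_value(str(val)) if key in PII_FIELDS else val
        for key, val in record.items()
    }


order = {"order_id": "A1", "email": "jane@example.com", "credit_band": "B", "channel": "app"}
masked = mask_record(order)
print(masked)
```

Because the hash is deterministic, two orders from the same email address still share a token, so account-level features can be computed on the masked data.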
The Solution
The team changed the process as follows: every order placed would pass through a machine learning model that estimates its propensity to be fraudulent.
The orders most likely to be fraudulent would then be passed to the fraud identification team for manual review.
The team would follow its standard operating procedure (SOP) to determine whether these orders were actually fraudulent.
Confirmed fraudulent orders would then be cancelled or otherwise dealt with, while legitimate orders would be let through.
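The routing step can be sketched as follows: score every order, send the highest-propensity orders to the team up to its review capacity, and let the rest through. The review capacity, the optional minimum-score threshold and the order IDs are hypothetical; the source does not say whether a fixed cut-off or a fixed review quota was used.

```python
import heapq
from typing import Dict, List, Tuple


def route_orders(
    scores: Dict[str, float], review_capacity: int, min_score: float = 0.0
) -> Tuple[List[str], List[str]]:
    """Split scored orders into (sent_for_review, let_through).

    scores maps order_id -> fraud propensity from the model. At most
    review_capacity orders go to manual review, highest propensity first,
    and only if their score is at least min_score.
    """
    candidates = [(score, order_id) for order_id, score in scores.items() if score >= min_score]
    top = heapq.nlargest(review_capacity, candidates)
    for_review = [order_id for _, order_id in top]
    selected = set(for_review)
    let_through = [order_id for order_id in scores if order_id not in selected]
    return for_review, let_through


scores = {"A1": 0.02, "B2": 0.91, "C3": 0.40, "D4": 0.75, "E5": 0.10}
review, passed = route_orders(scores, review_capacity=2)
print(review)  # highest-propensity orders first
print(passed)
```

Capping by capacity keeps the team's workload constant while making sure the reviews it can afford go to the riskiest orders, which is what raises the hit rate.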
After multiple rounds of improvement, the best-performing model was found to be an ensemble of logistic regression, a decision tree, XGBoost, a random forest and a support vector machine (SVM).
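The source names the five model families but not how they were combined. The sketch below assumes soft voting (averaging predicted probabilities) with scikit-learn's VotingClassifier, trains on synthetic imbalanced data from make_classification in place of the real order data, and uses xgboost's XGBClassifier if that package is installed, falling back to scikit-learn's GradientBoostingClassifier otherwise. All hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

try:
    from xgboost import XGBClassifier
    boosted = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
except ImportError:
    # Stand-in gradient-boosted trees if xgboost is not installed.
    boosted = GradientBoostingClassifier(random_state=0)

# Synthetic stand-in for the order data: roughly 5% positives (fraud).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        # Linear and kernel models are sensitive to feature scale, so scale first.
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, class_weight="balanced"))),
        ("tree", DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=0)),
        ("xgb", boosted),
        ("forest", RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, class_weight="balanced", random_state=0))),
    ],
    voting="soft",  # average predict_proba across the five models
)
ensemble.fit(X_train, y_train)

# Fraud propensity per held-out order: the averaged probability of class 1.
propensity = ensemble.predict_proba(X_test)[:, 1]
print(propensity[:5])
```

Soft voting requires every member to expose predict_proba, which is why SVC is created with probability=True. Using class_weight="balanced" is one way to handle how rare fraud is, not necessarily what the team did.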
The model was hosted on a server as a pickle (.pkl) file, and input data was fed to it as JSON.
The model's output was returned to the fraud identification team, also as JSON.
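A minimal sketch of the serving setup described above, assuming the model file is a Python pickle: the model is serialized with pickle, loaded back, fed a batch of orders as JSON and returns fraud propensities as JSON. StandInModel, the feature names and the payload layout are hypothetical placeholders (the real ensemble and schema are not given); any object with a scikit-learn-style predict_proba method could take its place. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import json
import pickle

# Hypothetical feature order; the real model's input schema is not given in the case study.
FEATURES = ["credit_score", "orders_last_24h", "ip_country_mismatch"]


class StandInModel:
    """Hypothetical placeholder for the trained ensemble, exposing predict_proba."""

    def predict_proba(self, rows):
        probs = []
        for credit_score, orders_last_24h, ip_country_mismatch in rows:
            # Toy linear score clipped to [0, 1]; not the real model.
            p = 0.05 + 0.1 * orders_last_24h + 0.4 * ip_country_mismatch - 0.0002 * (credit_score - 500)
            p = min(1.0, max(0.0, p))
            probs.append([1.0 - p, p])
        return probs


def score_request(model, request_json: str) -> str:
    """Parse a JSON batch of orders, score it and return a JSON response."""
    orders = json.loads(request_json)["orders"]
    rows = [[order[name] for name in FEATURES] for order in orders]
    probs = model.predict_proba(rows)
    results = [
        {"order_id": order["order_id"], "fraud_propensity": round(p[1], 4)}
        for order, p in zip(orders, probs)
    ]
    return json.dumps({"results": results})


# Round-trip through pickle, as a server would when loading the model file at start-up.
model = pickle.loads(pickle.dumps(StandInModel()))

request = json.dumps({"orders": [
    {"order_id": "A1", "credit_score": 700, "orders_last_24h": 0, "ip_country_mismatch": 0},
    {"order_id": "B2", "credit_score": 450, "orders_last_24h": 3, "ip_country_mismatch": 1},
]})
response = json.loads(score_request(model, request))
print(response)
```

Fixing the feature order in one place (FEATURES) keeps the JSON keys and the model's column order from drifting apart between training and serving.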
The team would manually review the model's hourly fraud reports to identify the orders that were actually fraudulent.
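The hourly reports can be pictured as a simple batching step: scored orders are grouped by the hour in which they were placed, and each hour's report lists the orders above a review threshold, highest propensity first. The timestamps, scores and the 0.5 threshold are hypothetical; the source does not describe the report format.

```python
from collections import defaultdict
from datetime import datetime


def hourly_reports(scored_orders, threshold=0.5):
    """Group (order_id, placed_at, propensity) tuples into per-hour review lists.

    Returns {hour_start: [(order_id, propensity), ...]} with each list sorted
    by propensity, highest first; orders below the threshold are left out.
    """
    reports = defaultdict(list)
    for order_id, placed_at, propensity in scored_orders:
        if propensity < threshold:
            continue
        hour_start = placed_at.replace(minute=0, second=0, microsecond=0)
        reports[hour_start].append((order_id, propensity))
    return {
        hour: sorted(rows, key=lambda row: row[1], reverse=True)
        for hour, rows in sorted(reports.items())
    }


scored = [
    ("A1", datetime(2024, 1, 8, 9, 5), 0.12),
    ("B2", datetime(2024, 1, 8, 9, 40), 0.91),
    ("C3", datetime(2024, 1, 8, 9, 55), 0.67),
    ("D4", datetime(2024, 1, 8, 10, 15), 0.83),
]
reports = hourly_reports(scored)
for hour, rows in reports.items():
    print(hour.strftime("%H:00"), rows)
```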
KPIs Impacted
Customer satisfaction score / customer experience
Throughput rate for the fraud identification team
Number of frauds identified per day
Revenue leakage reduced
Human effort reduced
The Benefits
Improvement in throughput rate from 24% to 76%
More than 200 additional fraudulent cases identified per week, resulting in annual cost avoidance of at least GBP 450,000
Improvement in Net Promoter Score (NPS) of more than 60 basis points
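The headline figures can be cross-checked with simple arithmetic. With review capacity unchanged, raising the hit rate from 24% to 76% multiplies the frauds confirmed per reviewed order by about 3.2. Spreading the minimum annual saving over 200 extra cases per week for 52 weeks gives an implied average value per additional case; because both inputs are stated as lower bounds ("200+" and "a minimum of"), this is only a rough indication, not a figure reported in the case study.

```python
def hit_rate_uplift(before: float, after: float) -> float:
    """Factor by which frauds found per reviewed order increase at fixed capacity."""
    return after / before


def implied_value_per_case(annual_saving_gbp: float, extra_cases_per_week: float, weeks: int = 52) -> float:
    """Average cost avoided per additional fraud case caught over a year."""
    return annual_saving_gbp / (extra_cases_per_week * weeks)


uplift = hit_rate_uplift(0.24, 0.76)
per_case = implied_value_per_case(450_000, 200)
print(f"uplift: {uplift:.2f}x, implied value per extra case: GBP {per_case:,.0f}")
```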