If you’re a credit risk manager or a data scientist responsible for modeling consumer credit risk at a lender, a fintech, a telecommunications company or even a utility company, you’re certainly exploring how machine learning (ML) can make you even more successful with predictive analytics. You know your competition is looking beyond the algorithms that have long been used to predict consumer payment behavior: algorithms with names like regression, decision trees and cluster analysis. Perhaps you’re experimenting with or even building a few models with artificial intelligence (AI) algorithms that may be less familiar to your business: neural networks, support vector machines, gradient boosting machines or random forests. One recent survey found that 25 percent of financial services companies are ahead of the industry; they’re already implementing or scaling up adoption of advanced analytics and ML.

My alma mater, the Virginia Cavaliers, recently won the 2019 NCAA national championship in nail-biting overtime. With the utmost respect to Coach Tony Bennett, this victory got me thinking more about John Wooden, perhaps the greatest college coach ever. In his book Coach Wooden and Me, Kareem Abdul-Jabbar recalled starting at UCLA in 1965 with what was probably the greatest freshman team in the history of basketball. What was their new coach’s secret as he transformed UCLA into the best college basketball program in the country? I can only imagine their surprise at the first practice when the coach told them, “Today we are going to learn how to put on our sneakers and socks correctly. … Wrinkles cause blisters. Blisters force players to sit on the sideline. And players sitting on the sideline lose games.”

What’s that got to do with machine learning? Simply put, the financial services companies ready to move beyond the exploration stage with AI are those that have mastered the tasks that come before and after modeling with the new algorithms. Any ML library — whether it’s TensorFlow, PyTorch, extreme gradient boosting or your company’s in-house library — simply enables a computer to spot patterns in training data that can be generalized for new customers. To win in the ML game, the team and the process are more important than the algorithm. If you’ve assembled the wrong stakeholders, if your project is poorly defined or if you’ve got the wrong training data, you may as well be sitting on the sideline.

Consider these important best practices before modeling:

Careful project planning is a prerequisite — Assemble all the key project stakeholders, and insist they reach a consensus on specific and measurable project objectives. When during the project life cycle will the model be used? A wealth of new data sources is available; which data sources and attributes are appropriate candidates for use in the modeling project? Does the final model need to be explainable, or is a black box good enough? If the model will be used to make real-time decisions, what data will be available at runtime? Good ML consultants (like those at Experian) use their experience to help their clients carefully define the model development parameters.

Data collection and data preparation are incredibly important — Explore the data to determine not only how important and appropriate each candidate attribute is for your project, but also how you’ll handle missing or corrupt data during training and implementation. Carefully select the training and validation data samples and the performance definition.
Any biases in the training data will be reflected in the patterns the algorithm learns and therefore in your future business decisions. When ML is used to build a credit scoring model for loan originations, a common source of bias is the difference between the application population and the population of booked accounts. ML experts from outside the credit risk industry may need to work with specialists to appreciate the variety of reject inference techniques available.

Segmentation analysis — In most cases, more than one ML model needs to be built, because different segments of your population perform differently. The segmentation needs to be done in a way that makes sense — both statistically and from a business perspective. Intriguingly, some credit modeling experts have had success using an AI library to inform segmentation and then a more tried-and-true method, such as regression, to develop the actual models.

During modeling:

With a good plan and well-designed data sets, the modeling project has a very good chance of succeeding. But no automated tool can make the tough decisions that can make or break whether the model is suitable for use in your business — such as trade-offs between the ML model’s accuracy and its simplicity and transparency. Engaged leadership is important.

After modeling:

Model validation — Your project team should be sure the analysts and consultants appreciate and mitigate the risk of overfitting the model parameters to the training data set. Validate that any ML model is stable. Test it with samples from a different group of customers, preferably drawn from a different time period than the training sample (a simple sketch of this kind of out-of-time check follows at the end of this post).

Documentation — AI models can have important impacts on people’s lives. In our industry, they determine whether someone gets a loan, a credit line increase or an unpleasant loss mitigation experience. Good model governance practice insists that a lender won’t make decisions based on an unexplained black box. In a globally transparent model, good documentation thoroughly explains the data sources and attributes and how the model considers those inputs. With a locally transparent model, you can further explain how a decision is reached for any specific individual — for example, by providing FCRA-compliant adverse action reasons.

Model implementation — Plan ahead. How will your ML model be put into production? Will it be recoded into a new computer language, or can it be imported into one of your systems using a format such as the Predictive Model Markup Language (PMML)? How will you test that it works as designed?

Post-implementation — Just as with an old-fashioned regression model, it’s important to monitor both the usage and the performance of the ML model. Your governance team should check periodically that the model is being used as it was intended. Audit the model periodically to know whether changing internal and external factors — which might range from a change in data definition to a new customer population to a shift in the economic environment — might affect the model’s strength and predictive power.

Coach Wooden used to say, “It isn’t what you do. It’s how you do it.” Just like his players, the most successful ML practitioners understand that a process based on best practices is as important as the “game” itself.
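To make the out-of-time validation point concrete, here is a minimal sketch of that kind of stability check. It assumes pandas and scikit-learn, and every file name, column name and attribute in it is a hypothetical stand-in (development_sample.csv, app_date, bad_flag); the logistic regression is just an illustrative benchmark, not a prescribed choice of algorithm.

```python
# A minimal sketch of an out-of-time stability check (hypothetical file and
# column names; the attribute list and logistic regression are illustrative).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

ATTRIBUTES = ["num_delinquencies", "utilization", "months_on_file"]  # stand-ins

def ks_statistic(y_true, y_score):
    """Kolmogorov-Smirnov separation between the goods and the bads."""
    ranked = pd.DataFrame({"bad": y_true.values, "score": y_score}).sort_values("score")
    cum_bad = ranked["bad"].cumsum() / ranked["bad"].sum()
    cum_good = (1 - ranked["bad"]).cumsum() / (1 - ranked["bad"]).sum()
    return (cum_bad - cum_good).abs().max()

apps = pd.read_csv("development_sample.csv", parse_dates=["app_date"])

# In-time development window vs. a later out-of-time window.
dev = apps[apps["app_date"] < "2018-07-01"]
oot = apps[apps["app_date"] >= "2018-07-01"]

model = LogisticRegression(max_iter=1000)
model.fit(dev[ATTRIBUTES], dev["bad_flag"])

for name, sample in [("development", dev), ("out-of-time", oot)]:
    scores = model.predict_proba(sample[ATTRIBUTES])[:, 1]
    print(f"{name}: AUC = {roc_auc_score(sample['bad_flag'], scores):.3f}, "
          f"KS = {ks_statistic(sample['bad_flag'], scores):.3f}")
```

The same comparison can be run for any candidate ML model; a large drop in AUC or KS on the out-of-time sample is an early warning that the model has been overfit to the development window.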
For most businesses, building the best online experience for consumers requires a balance between security and convenience. The challenge has always been finding a happy medium between the two: offering enough security that it doesn’t get in the way of convenience, and vice versa. In the past, the assumption was that one would always come at the expense of the other. But technology and innovation are changing how businesses approach security, allowing them to deliver more of both.

Consumers want security AND convenience

Consumers consider security and convenience the foundation of their online experience. Findings from our 2019 Global Identity and Fraud Report revealed that approximately 74 percent of consumers ranked security as the most important part of their online experience, followed by convenience. In other words, they expect businesses to provide them with both. We see this in how consumers typically reuse the same security information each time they open a new digital account, out of convenience. But if one account is compromised, the consumer becomes vulnerable to possible fraudulent activity. With today’s technology, businesses can give consumers an easier and more secure way to access their digital accounts.

Creating the optimal online experience

More security has usually meant creating more passwords, answering more security questions, completing CAPTCHA tests and so on. While consumers are willing to work through these friction-inducing methods to complete a transaction or access an account, it’s not always the most convenient process. Advanced data and technology have opened doors for new authentication methods, such as physical and behavioral biometrics, digital tokenization, device intelligence and machine learning, that maximize businesses’ potential to provide the best online experience possible. In fact, consumers have expressed greater confidence in businesses that implement these advanced security methods: consumer confidence in passwords was only 44 percent, compared with 74 percent confidence in physical biometrics. Consumers are willing to embrace the latest security technology because it provides the security and convenience they want from businesses. While traditional forms of security were once sufficient, advanced authentication methods have proven to be more reliable forms of security that consumers trust and that can improve their online experience.

The optimal online experience is a balance between security and convenience. Innovative technologies and data are helping businesses protect people’s identities and provide consumers with an improved online experience.
Be warned. I’m a Philadelphia sports fan, and even after 13 months, I still revel in the only Super Bowl victory I’ve ever known as a fan. Having spent more than two decades in fraud prevention, I find that Super Bowl LII is coalescing in my mind with fraud prevention and lessons in defense more and more. Let me explain:

It’s fourth-and-goal from the one-yard line. With less than a minute on the clock in the first half, the Eagles lead, 15 to 12. The easy option is to kick the field goal, take the three points and come back with a six-point advantage. Instead of sending out the kicking squad, the Eagles offense stays on the field to go for a touchdown. Broadcaster Cris Collinsworth memorably says, “Are they really going to go for this? You have to take the three!” On the other side are the New England Patriots, winners of two of the last three Super Bowls. Love them or hate them, the Patriots under coach Bill Belichick are more likely than any team in league history to prevent the Eagles from scoring at this moment.

After the offense sets up, quarterback Nick Foles walks away from his position in the backfield to shout instructions to his offensive line. The Patriots are licking their chops. The play starts, and the ball is snapped — not to Foles as everyone expects, but to running back Corey Clement. Clement takes two steps to his left and tosses the ball to tight end Trey Burton, who’s running in the opposite direction. Meanwhile, Foles pauses as if he’s not part of the play, then trots lazily toward the end zone. Burton lobs a pass over pursuing defenders into Foles’ outstretched hands. This is the “Philly Special” — touchdown!

Let me break this down: A third-string rookie running back takes the snap and makes a perfect toss — on the run — to an undrafted tight end. The tight end, who hasn’t thrown a pass in a game since college, then throws a touchdown pass to a backup quarterback who hasn’t caught a ball in any athletic event since he played basketball in high school. A play that had never been run by the Eagles, led by a coach who was criticized as the worst in pro football just a year before, is perfectly executed under the biggest spotlight against the most dominant team in NFL history.

So what does this have to do with fraud? There’s currently an outbreak of breach-fueled credential stuffing. In the past couple of months, billions of usernames and passwords stolen in various high-profile data breaches have been compiled and made available to criminals in data sets described as “Collections 1 through 5.” Criminals acquire credentials in large numbers and attack websites by attempting to log in with each set — effectively “stuffing” the server with login requests. Because consumers so often reuse login credentials, the criminals succeed and gain access to a customer account somewhere between 1 in 1,000 and 1 in 50 attempts. Using readily available tools, basic information like IP address and browser version is easy enough to alter or conceal, making the attack harder to detect.

Credential stuffing is like the Philly Special:

Credential stuffing doesn’t require a group of elite all-stars. Like the Eagles’ players with relatively little experience executing their roles in the Philly Special, criminals with some computer skills, some initiative and the guts to try credential stuffing can score.

The best-prepared defense isn’t always enough. The Patriots surely did their homework. They set up their defense to stop what they expected the Eagles to do based on extensive research.
They knew the threats posed by every Eagle on the field. They knew what the Eagles’ coaches had done in similar circumstances throughout their careers. The defense wasn’t guessing. They were as prepared as they could have been. It’s the second point that worries me when I think of credential stuffing. Consumers reuse online credentials with alarming frequency, so a stolen set of credentials is likely to work across multiple organizations, possibly even yours. On top of that, traditional device recognition like cookies can’t identify and stop today’s sophisticated fraudsters. The best-prepared organizations feel great about their ability to stop the threats they’re aware of. Once they’ve seen a scheme, they make investments, improve their defenses, and position their players to recognize a risk and stop it. Sometimes past expertise won’t stop the play you can’t see coming.
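As a purely illustrative aside (this is not something the article prescribes, and real defenses layer device intelligence and behavioral signals on top of it), one common first line of defense against credential stuffing is a velocity rule: count how many distinct usernames a single source IP attempts within a short window. The window, threshold and field names below are hypothetical.

```python
# Purely illustrative velocity rule for spotting possible credential stuffing:
# count distinct usernames attempted from one source IP in a short window.
# The window, threshold and field names are hypothetical.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300        # look at the last five minutes of attempts
MAX_USERS_PER_IP = 20       # distinct accounts tried from one IP before flagging

attempts_by_ip = defaultdict(deque)  # ip -> deque of (timestamp, username)

def record_login_attempt(ip, username, now=None):
    """Record an attempt; return True if the IP looks like it's stuffing credentials."""
    now = time.time() if now is None else now
    history = attempts_by_ip[ip]
    history.append((now, username))
    # Drop attempts that have aged out of the sliding window.
    while history and now - history[0][0] > WINDOW_SECONDS:
        history.popleft()
    distinct_users = {user for _, user in history}
    return len(distinct_users) > MAX_USERS_PER_IP
```

As the article notes, attackers can rotate IP addresses and spoof browser versions, so a rule like this catches only the clumsiest attacks; it is a starting point, not a substitute for stronger device recognition.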
Whenever someone checks in for a flight, airport security needs to establish their identity. Prior to boarding the plane, passengers are required to show a government-issued ID. Agents check IDs for validity and compare the ID picture to the face of the person standing in front of them. This identity proofing is about making sure that would-be flyers really are who they claim to be. But what about online identity proofing? That’s much more challenging. Online banks certainly want to make sure they know a person’s identity before giving them access to their account. But for other online services, it’s fine to remain anonymous. The amount of risk involved in the engagement directly ties to the amount of verification and assurance needed for the individual. Government agencies care very much about identity. They won’t — and shouldn’t — issue a tax refund, provide a driver’s license or allow someone to sign up for Social Security benefits before they’re certain that the claimant’s identity is verified. Since we increasingly expect the same online user experience from government service providers as from online banks, hotel websites and retailers, this poses a challenge. How do government agencies establish a sufficient level of assurance for an online identity without sending their customers to a government office for face-to-face identity verification? To answer this challenge, the National Institute of Standards and Technology (NIST) has developed Digital Identity Guidelines. In its latest publication, SP 800-63-3, NIST helps government agencies implement their digital services while still mitigating the identity risks that come with online service provision. The ability to safely sign up, transact and interact with a government agency online has many benefits. Applying for something like unemployment insurance online is faster, cheaper and more convenient than using paper and waiting in line at a government field office. And for government agencies themselves, providing online services means that they can improve customer satisfaction levels while reducing their costs and subsequent bureaucracy. CrossCore®, was recently recognized by the independent Kantara Initiative for its conformance with NIST’s Digital Identity Guidelines for Identity Assurance (IAL2). Our document verification solution combines authoritative sources, machine learning and facial recognition technology to identify people accurately using photo-based government identification like a driver’s license or passport. The best part? Users can verify their identity in about 60 seconds, at whatever location they prefer, using their personal smartphone.
How can fintech companies ensure they’re one step ahead of fraudsters? Kathleen Peters discusses how fintechs can prepare for success in fraud prevention.
With scarce resources and limited experience available in the data science field, a majority of organizations are partnering with outside firms to fill gaps within their teams. A report compiled by Hexa Research found that the data analytics outsourcing market is set to expand at a compound annual growth rate of 30 percent between 2016 and 2024, reaching annual revenues of more than $6 billion. With data science becoming a necessity for success, outsourcing these specific skills will be the way of the future. When working with outside firms, you may be given the option between offshore and onshore resources. But how do you decide? Let’s discuss a few things you can consider.

Offshore

A well-known benefit of using offshore resources is lower cost. Offshore resources provide a larger pool of talent, which includes those who have specific analytical skills that are becoming rare in North America. By partnering with outside firms, you also expose your organization to global best practices by learning from external resources who have worked in different industries and locations. If a partner is investing research and development dollars into specific data science technology or new analytics innovations, you can use this knowledge and apply it to your business. With every benefit, however, there are challenges. Time zone differences and language barriers are things to consider if you’re working on a project that requires a large amount of collaboration with your existing team. Security issues need to be addressed differently when using offshore resources. Lastly, reputational risk also can be a concern for your organization. In certain cases, there may be a negative perception — both internally and externally — of moving jobs offshore, so it’s important to consider this before deciding.

Onshore

While offshore resources can save your organization money, there are many benefits to hiring onshore analytical resources. Many large projects require cross-functional collaboration. If collaboration is key to the projects you’re managing, onshore resources can more easily blend with your existing resources because of time zone similarities, reduced communication barriers and stronger cultural fit with your organization. In the financial services industry, there also are regulatory guidelines to consider. Offshore resources often may have the skills you’re looking for but not a complete understanding of the regulatory landscape, which can lead to larger problems in the future. Hiring resources with this type of knowledge will help you conduct the analysis in a compliant manner and reduce your overall risk.

All of the above

Many of our clients — and we ourselves — find that an all-of-the-above approach is both effective and efficient. In certain situations, timeline reductions can be made by having both onshore and offshore resources working on a project. Teams can include up to three different groups:

Local resources who are closest to the client and the problem

Resources in a nearby foreign country whose time zone overlaps with that of the local resources

More analytical team members around the world whose tasks are accomplished somewhat more independently

Carefully focusing on how the partnership works and how the external resources are managed is even more important than where the resources are located. Read 5 Secrets to Outsourcing Data Science Successfully to help you manage your relationship with your external partner.
If your next project calls for experienced data scientists, Experian® can help. Our Analytics on Demand™ service provides senior-level analysts, either offshore or onshore, who can help with analytical data science and modeling work for your organization.
It’s the holiday season — time for jingle bells, lighting candles, shopping sprees and credit card fraud. But we’re prepared. Our risk analyst team constantly monitors our FraudNet solution performance to identify anomalies our clients experience as millions of transactions occur this month. At its core, FraudNet analyzes incoming events to determine the risk level and to allow legitimate events to process without causing frustrating friction for legitimate customers. That ensures our clients can recognize good customers across digital devices and channels while reducing fraud attacks and the need for internal manual reviews.

But what happens when things don’t go as planned? Here’s a recent example. One of our banking clients noticed an abnormally high investigation queue after a routine risk engine tuning. Our risk analyst team looked further into the attacks to determine the cause and assess whether it was a tuning issue or a true fraud attack. After an initial analysis, the team learned that the events shared many of the same characteristics:

They came from the same geolocation that had been seen in previous attacks on clients

They showed suspicious device and browser characteristics that were recognized by Experian’s device identification technology

They exhibited suspicious patterns that had been observed in other recent attacks on banks

The conclusion was that it wasn’t a mistake. FraudNet had correctly identified these transactions as suspicious. Experian® then worked with our client and recommended a strategy to ensure the attack was appropriately managed. This example highlights the power of device identification technology as a mechanism to detect emerging fraud threats, as well as the value of link analysis tools and the expertise of a highly trained fraud analyst in uncovering suspicious events that might otherwise go unnoticed.

In addition to proprietary device intelligence capabilities, our clients take advantage of a suite of capabilities that can further enhance a seamless authentication experience for legitimate customers while increasing fraud detection for bad actors. Using advanced analytics, we can detect patterns and anomalies that may indicate a fraudulent identity is being used. Additionally, through our CrossCore® platform, businesses can leverage advanced innovation, such as physical and behavioral biometrics (facial recognition, how a person holds a phone, mouse movements, data entry style), email verification (email tenure, reported fraud on email identities), document verification (autofill, liveness detection) and digital behavior risk indicators (transaction behavior, transaction velocity), to further advance their existing risk mitigation strategies and efficacy. With expanding partnerships and capabilities offered via Experian’s CrossCore platform, in conjunction with consultative industry expertise, businesses can be more confident during the authentication process and ensure a superb, frictionless customer experience without compromising security.
Your model is only as good as your data, right? Actually, there are many considerations in developing a sound model, one of which is data. Yet if your data is bad or dirty or doesn’t represent the full population, can it be used? This is where sampling can help. When done right, sampling can lower your cost to obtain the data needed for model development. When done well, sampling can turn a tainted and underrepresented data set into a sound and viable model development sample.

First, define the population to which the model will be applied once it’s finalized and implemented. Determine what data is available and what population segments must be represented within the sampled data. The more variability there is in internal factors — such as changes in marketing campaigns, risk strategies and product launches — and external factors — such as economic conditions or competitor presence in the marketplace — the larger the sample size needed. A model developer often will need to sample over time to incorporate seasonal fluctuations in the development sample. The most robust samples are pulled from data that best represents the full population to which the model will be applied. It’s important to ensure your data sample includes customers or prospects declined by the prior model and strategy, as well as approved but nonactivated accounts. This ensures full representation of the population to which your model will be applied. Also, consider the number of predictors or independent variables that will be evaluated during model development, and increase your sample size accordingly.

When it comes to spotting dirty or unacceptable data, the golden rule is: know your data and know your target population. Spend time evaluating your intended population and group profiles across several important business metrics. Don’t underestimate the time needed to complete a thorough evaluation.

Next, select the data from the population so that it aptly represents the population within the sampled data. Determine the best sampling methodology that will support the model development and business objectives. Sampling generates a smaller data set for use in model development, allowing the developer to build models more quickly. Reducing the data set’s size decreases the time needed for model computation and saves storage space without losing predictive performance. Once the data is selected, weights are applied so that each record appropriately represents the full population to which the model will be applied.

Several traditional techniques can be used to sample data:

Simple random sampling — Each record is chosen by chance, and each record in the population has an equal chance of being selected.

Random sampling with replacement — Each record chosen by chance is included in the subsequent selection.

Random sampling without replacement — Each record chosen by chance is removed from subsequent selections.

Cluster sampling — Records from the population are sampled in groups, such as region, over different time periods.

Stratified random sampling — This technique allows you to sample different segments of the population at different proportions. In some situations, stratified random sampling is helpful in selecting segments of the population that aren’t as prevalent as other segments but are equally vital within the model development sample. (A short sketch of this technique follows at the end of this post.)

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
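For readers who want to see the stratified approach and the weighting step in code, here is a minimal sketch. It assumes pandas; the file name, segment labels and sampling fractions are hypothetical, and the weights are simply the inverse of each stratum’s sampling fraction so that the weighted sample re-represents the full population.

```python
# A minimal sketch of stratified random sampling with weights (pandas).
# The file name, segment labels and sampling fractions are hypothetical.
import pandas as pd

population = pd.read_csv("through_the_door_population.csv")

# Sample the scarcer but vital segments at higher rates than the dominant one.
sampling_fractions = {
    "approved_active": 0.10,
    "approved_nonactivated": 0.50,
    "declined": 0.50,
}

samples = []
for segment, fraction in sampling_fractions.items():
    stratum = population[population["segment"] == segment]
    drawn = stratum.sample(frac=fraction, random_state=42)  # without replacement
    # Weight each record by the inverse of its sampling fraction so the
    # weighted sample re-represents the full population.
    samples.append(drawn.assign(weight=1.0 / fraction))

development_sample = pd.concat(samples, ignore_index=True)
print(development_sample.groupby("segment")["weight"].agg(["count", "mean"]))
```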
As our society becomes ever more dependent on everything mobile, criminals are continually searching for and exploiting weaknesses in the digital ecosystem, causing significant harm to consumers, businesses and the economy. In fact, according to our 2018 Global Fraud & Identity Report, 72 percent of business executives are more concerned than ever about the impact of fraud. Yet, despite the awareness and concern, 54 percent of businesses are only “somewhat confident” in their ability to detect fraud. That needs to change, and it needs to change right away. Our industry has thrived by providing products and services that root out bad transactions and detect fraud with minimal consumer friction. We continue to innovate new ways to authenticate consumers, apply new cloud technologies, machine learning, self-service portals and biometrics. Yet, the fraud issue still exists. It hasn’t gone away. How do we provide effective means to prevent fraud without inconveniencing everyone in the process? That’s the conundrum. Unfortunately, a silver bullet doesn’t exist. As much as we would like to build a system that can detect all fraud, eliminate all consumer friction, we can’t. We’re not there yet. As long as money has changed hands, as long as there are opportunities to steal, criminals will find the weak points – the soft spots. That said, we are making significant progress. Advances in technology and innovation help us bring new solutions to market more quickly, with more predictive power than ever, and the ability to help clients to turn these services on in days and weeks. So, what is Experian doing? We’ve been in the business of fraud detection and identity verification for more than 30 years. We’ve seen fraud patterns evolve over time, and our product portfolio evolves in lock-step to counter the newest fraud vectors. Synthetic identity fraud, loan stacking, counterfeit, identity theft; the specific fraud attacks may change but our solution stack counters each of those threats. We are on a continuous innovation path, and we need to be. Our consumer and small business databases are unmatched in the industry for quality and coverage, and that is an invaluable asset in the fight against fraud. It used to be that knowing something about a person was the same as authenticating that same person. That’s just not the case today. But, just because I may not be the only person who knows where I live, doesn’t mean that identity information is obsolete. It is incredibly valuable, just in different ways today. And that’s where our scientists come into their own, providing complex predictive solutions that utilize a plethora of data and insight to create the ultimate in predictive performance. We go beyond traditional fraud detection methods, such as knowledge-based authentication, to offer a custom mix of passive and active authentication solutions that improve security and the customer experience. You want the latest deep learning techniques? We have them. You want custom models scored in milliseconds alongside your existing data requests. We can do that. You want a mix of cloud deployment, dedicated hosted services and on-premise? We can do that too. We have more than 20 partners across the globe, creating the most comprehensive identity management network anywhere. We also have teams of experts across the world with the know how to combine Experian and partner expertise to craft a bespoke solution that is unrivaled in detection performance. 
The results speak for themselves: Experian analyzes more than a billion credit applications per year for fraud and identity, and we’ve helped our clients save more than $2 billion in annual fraud losses globally. CrossCore™, our fraud prevention and identity management platform, leverages the full breadth of Experian data as well as the data assets of our partners. We execute machine learning models on every decision to help improve the accuracy and speed with which decisions are made. We’ve seen CrossCore machine learning result in a more than 40 percent improvement in fraud detection compared to rules-based systems. Our certified partner community for CrossCore includes only the most reputable leaders in the fraud industry. We also understand the need to expand our data to cover those who may not be credit active. We have the largest and most unique sets of alternative credit data among the credit bureaus, that includes our Clarity Services and RentBureau divisions. This rich data helps our clients verify an individual’s identity, even if they have a thin credit file. The data also helps us determine a credit applicant’s ability to pay, so that consumers are empowered to pursue the opportunities that are right for them. And in the background, our models are constantly checking for signs of fraud, so that consumers and clients feel protected. Fraud prevention and identity management are built upon a foundation of trust, innovation and keeping the consumer at the heart of every decision. This is where I’m proud to say that Experian stands apart. We realize that criminals will continue to look for new ways to commit fraud, and we are continually striving to stay one step ahead of them. Through our unparalleled scale of data, partnerships and commitment to innovation, we will help businesses become more confident in their ability to recognize good people and transactions, provide great experiences, and protect against fraud.
Synthetic identities come from accounts held not by actual individuals, but by fabricated identities created to perpetrate fraud. It often starts with stealing a child’s Social Security number (SSN) and then blending fictitious and factual data, such as a name, a mailing address and a telephone number. What’s interesting is the increase in consumer awareness about synthetic identities. Previously, synthetic identity was a lender concern, often showing itself in delinquent accounts since the individual was fabricated. Consumers are becoming aware of synthetic ID fraud because of who the victims are — children. Based on findings from a recent Experian survey, the average age of child victims is only 12 years old. Children are attractive victims since fraud that uses their personal identifying information can go for years before being detected. I recently was interviewed by Forbes about the increase of synthetic identities being used to open auto loans and how your child’s SSN could be used to get a phony auto loan. The article provides a good overview of this growing concern for parents and lenders. A recent Javelin study found that more than 1 million children were victims of fraud. Most upsetting is that children are often betrayed by people close to them -- while only 7 percent of adults are victimized by someone they know, 60 percent of victims under 18 know the fraudster. Unfortunately, when families are in a tight spot financially they often resort to using their child’s SSN to create a clean credit record. Fraud is an issue we all must deal with — lenders, consumers and even minors — and the best course of action is to protect ourselves and our organizations.
In 2011, data scientists and credit risk managers finally found an appropriate analogy to explain what we do for a living. “You know Moneyball? What Paul DePodesta and Billy Beane did for the Oakland A’s, I do for XYZ Bank.” You probably remember the story: Oakland had to squeeze the most value out of its limited budget for hiring free agents, so it used analytics — the new baseball “sabermetrics” created by Bill James — to make data-driven decisions that were counterintuitive to the experienced scouts. Michael Lewis told the story in a book that was an incredible bestseller and led to a hit movie. The year after the movie was made, Harvard Business Review declared that data science was “the sexiest job of the 21st century.” Coincidence? The importance of data Moneyball emphasized the recognition, through sabermetrics, that certain players’ abilities had been undervalued. In Travis Sawchik’s bestseller Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak, he notes that the analysis would not have been possible without the data. Early visionaries, including John Dewan, began collecting baseball data at games all over the country in a volunteer program called Project Scoresheet. Eventually they were collecting a million data points per season. In a similar fashion, credit data pioneers, such as TRW’s Simon Ramo, began systematically compiling basic credit information into credit files in the 1960s. Recognizing that data quality is the key to insights and decision-making and responding to the demand for objective data, Dewan formed two companies — Sports Team Analysis and Tracking Systems (STATS) and Baseball Info Solutions (BIS). It seems quaint now, but those companies collected and cleaned data using a small army of video scouts with stopwatches. Now data is collected in real time using systems from Pitch F/X and the radar tracking system Statcast to provide insights that were never possible before. It’s hard to find a news article about Game 1 of this year’s World Series that doesn’t discuss the launch angle or exit velocity of Eduardo Núñez’s home run, but just a couple of years ago, neither statistic was even measured. Teams use proprietary biometric data to keep players healthy for games. Even neurological monitoring promises to provide new insights and may lead to changes in the game. Similarly, lenders are finding that so-called “nontraditional data” can open up credit to consumers who might have been unable to borrow money in the past. This includes nontraditional Fair Credit Reporting Act (FCRA)–compliant data on recurring payments such as rent and utilities, checking and savings transactions, and payments to alternative lenders like payday and short-term loans. Newer fintech lenders are innovating constantly — using permissioned, behavioral and social data to make it easier for their customers to open accounts and borrow money. Similarly, some modern banks use techniques that go far beyond passwords and even multifactor authentication to verify their customers’ identities online. For example, identifying consumers through their mobile device can improve the user experience greatly. Some lenders are even using behavioral biometrics to improve their online and mobile customer service practices. Continuously improving analytics Bill James and his colleagues developed a statistic called wins above replacement (WAR) that summarized the value of a player as a single number. 
WAR was never intended to be a perfect summary of a player’s value, but it’s very convenient to have a single number to rank players. Using the same mindset, early credit risk managers developed credit scores that summarized applicants’ risk based on their credit history at a single point in time. Just as WAR is only one measure of a player’s abilities, good credit managers understand that a traditional credit score is an imperfect summary of a borrower’s credit history. Newer scores, such as VantageScore® credit scores, are based on a broader view of applicants’ credit history, including credit attributes that reflect how their financial situation has changed over time. More sophisticated financial institutions, though, don’t rely on a single score. They use a variety of data attributes and scores in their lending strategies.

Just a few years ago, simply using data to choose players was a novel idea. Now new measures such as defense-independent pitching statistics drive changes on the field. Sabermetrics, once defined as the application of statistical analysis to evaluate and compare the performance of individual players, has evolved to be much more comprehensive. It now encompasses the statistical study of nearly all in-game baseball activities.

A wide variety of data-driven decisions

Sabermetrics began being used for recruiting players in the 1980s. Today it’s used on the field as well as in the back office. Big Data Baseball gives the example of the “Ted Williams shift,” a defensive technique that was seldom used between 1950 and 2010. In the world after Moneyball, it has become ubiquitous. Likewise, pitchers alter their arm positions and velocity based on data — not only to throw more strikes, but also to prevent injuries. Similarly, when credit scores were first introduced, they were used only in originations. Lenders established a credit score cutoff that was appropriate for their risk appetite and used it for approving and declining applications. Now lenders are using Experian’s advanced analytics in a variety of ways that the credit scoring pioneers might never have imagined:

Improving the account opening experience — for example, by reducing friction online

Detecting identity theft and synthetic identities

Anticipating bust-out activity and other first-party fraud

Issuing the right offer to each prescreened customer

Optimizing interest rates

Reviewing and adjusting credit lines

Optimizing collections

Analytics is no substitute for wisdom

Data scientists like those at Experian remind me that in banking, as in baseball, predictive analytics is never perfect. What keeps finance so interesting is the inherent unpredictability of the economy and human behavior. Likewise, the play on the field determines who wins each ball game: anything can happen. Rob Neyer’s book Power Ball: Anatomy of a Modern Baseball Game quotes the Houston Astros director of decision sciences: “Sometimes it’s just about reminding yourself that you’re not so smart.”
This is an exciting time to work in big data analytics. Here at Experian, we have more than 2 petabytes of data in the United States alone. In the past few years, because of high data volume, more computing power and the availability of open-source algorithms, my colleagues and I have watched excitedly as more and more companies have gotten into machine learning. We’ve observed the growth of competition sites like Kaggle, open-source code sharing sites like GitHub and various machine learning (ML) data repositories. We’ve noticed that on Kaggle, two algorithms win over and over at supervised learning competitions:

If the data is well-structured, teams that use Gradient Boosting Machines (GBM) seem to win.

For unstructured data, teams that use neural networks win pretty often.

Modeling is both an art and a science. Those winning teams tend to be good at what the machine learning people call feature generation and what we credit scoring people call attribute generation. We have nearly 1,000 expert data scientists in more than 12 countries, many of whom are experts in traditional consumer risk models — techniques such as linear regression, logistic regression, survival analysis, CART (classification and regression trees) and CHAID analysis. So naturally I’ve thought about how GBM could apply in our world.

Credit scoring is not quite like a machine learning contest. We have to be sure our decisions are fair and explainable and that any scoring algorithm will generalize to new customer populations and stay stable over time. Increasingly, clients are sending us their data to see what we could do with newer machine learning techniques. We combine their data with our bureau data and even third-party data, we use our world-class attributes and develop custom attributes, and we see what comes out. It’s fun — like getting paid to enter a Kaggle competition! For one financial institution, GBM armed with our patented attributes found a nearly 5 percent lift in the KS statistic when compared with traditional statistical techniques.

At Experian, we use the Extreme Gradient Boosting (XGBoost) implementation of GBM, which, out of the box, has regularization features we use to prevent overfitting. But it’s missing some features that we and our clients count on in risk scoring. Our Experian DataLabs team worked with our Decision Analytics team to figure out how to make it work in the real world. We found answers for a couple of important issues:

Monotonicity — Risk managers count on the ability to impose what we call monotonicity. In application scoring, applications with better attribute values should score as lower risk than applications with worse values. For example, if consumer Adrienne has fewer delinquent accounts on her credit report than consumer Bill, all other things being equal, Adrienne’s machine learning score should indicate lower risk than Bill’s score. (A brief sketch of how this can be imposed in open-source XGBoost appears below.)

Explainability — We were able to adapt a fairly standard “adverse action” methodology from logistic regression to work with GBM.

There has been enough enthusiasm around our results that we’ve just turned it into a standard benchmarking service. We help clients appreciate the potential for these new machine learning algorithms by evaluating them on their own data. Over time, the acceptance and use of machine learning techniques will become commonplace among model developers as well as internal validation groups and regulators.
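The post doesn’t publish Experian’s implementation, but monotonic constraints are a standard feature of the open-source XGBoost library it mentions. Below is a minimal, hypothetical sketch of how a modeler might impose them; the feature names, file name and hyperparameters are illustrative only.

```python
# Minimal, hypothetical sketch (not Experian's production code): imposing
# monotonicity with the open-source XGBoost library. Feature names, file name
# and hyperparameters are illustrative only.
import pandas as pd
import xgboost as xgb

features = ["num_delinquent_accounts", "months_since_oldest_trade"]
data = pd.read_csv("training_sample.csv")

# "(1,-1)" follows the order of `features`: predicted risk must be non-decreasing
# in the number of delinquent accounts (+1) and non-increasing in the age of the
# oldest trade (-1), all other things being equal.
model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    reg_lambda=1.0,                  # out-of-the-box regularization to curb overfitting
    monotone_constraints="(1,-1)",
)
model.fit(data[features], data["bad_flag"])
```

The constraint is exactly the Adrienne-versus-Bill property described above: holding everything else fixed, a worse value on a constrained attribute can never produce a better score.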
Whether you’re a data scientist looking for a cool place to work or a risk manager who wants help evaluating the latest techniques, check out our weekly data science video chats and podcasts.
How a business prices its products is a dynamic process that drives customer satisfaction and loyalty, as well as business success. In the digital age, pricing is becoming even more complex. For example, companies like Amazon may revise the price of a hot item several times per day. Dynamic pricing models for consumer financial products can be especially difficult for at least four reasons:

A complex regulatory environment.

Fair lending concerns.

The potential for adverse selection by risky consumers and fraudsters.

The direct impact the affordability of a loan may have on both the consumer’s ability to pay it and the likelihood that it will be prepaid.

If a lender offered the same interest rate and terms to every customer for the same loan product, low-risk customers would secure better rates elsewhere, and high-risk customers would not. The end result? Only the higher-risk customers would select the product, which would increase losses and reduce profitability. For this reason, the lending industry has established risk-based pricing. This pricing method addresses the issue above, since customers with different risk profiles are offered different rates. But it’s limited.

More advanced lenders also understand the price elasticity of customer demand, because there are diverse reasons why customers decide to take up differently priced loans. Customers have different needs and risk profiles, so they react to a loan offer in different ways. Many factors determine a customer’s propensity to take up an offer — for example, the competitive environment and availability of other lenders, how time-critical the decision is, and the loan terms offered. Understanding the customer’s price elasticity allows a business to offer the ideal price to each customer to maximize profitability.

Pricing optimization is the superior method, assuming the lender has a scientific, data-driven approach to predicting how different customers will respond to different prices. Optimization allows an organization to determine the best offer for each customer to meet business objectives while adhering to financial and operational constraints such as volume, margin and credit risk. The business can assess trade-offs between competing objectives, such as maximizing revenue and maximizing volume, and determine the optimal decision to be made for each individual customer to best meet both objectives. Lenders realize five key benefits when they improve their pricing segmentation with an optimization strategy. (A simplified numerical sketch of the core idea appears below.) Interested in learning more about pricing optimization? Click here to download our full white paper, Price optimization in retail consumer lending.
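To illustrate the mechanics, here is a toy example, not Experian’s optimization methodology; every number, function and field name in it is hypothetical. The core idea is to score each candidate price by expected profit, combining a take-up probability from an elasticity model with a risk-adjusted margin, and then offer the price that maximizes it.

```python
# Toy illustration of price optimization (hypothetical numbers throughout):
# score each candidate APR by expected profit = take-up probability x margin,
# then offer the APR that maximizes it for each customer.
import numpy as np

candidate_aprs = np.arange(0.08, 0.24, 0.01)  # candidate offers: 8% to 23%

def take_up_probability(apr):
    """Made-up elasticity curve: take-up falls as the offered APR rises."""
    return 1.0 / (1.0 + np.exp(40 * (apr - 0.15)))

def expected_profit(apr, balance, expected_loss_rate, cost_of_funds=0.05):
    margin = (apr - cost_of_funds - expected_loss_rate) * balance
    return take_up_probability(apr) * margin

customers = [
    {"id": "low risk", "balance": 10_000, "expected_loss_rate": 0.02},
    {"id": "high risk", "balance": 10_000, "expected_loss_rate": 0.08},
]

for c in customers:
    profits = [expected_profit(a, c["balance"], c["expected_loss_rate"]) for a in candidate_aprs]
    best = candidate_aprs[int(np.argmax(profits))]
    print(f"{c['id']}: offer APR {best:.1%}, expected profit {max(profits):.0f}")
```

A production system would replace the made-up elasticity curve with a model estimated from historical offer and take-up data, and would solve across all customers jointly so that portfolio-level constraints on volume, margin and credit risk are respected.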
Machine learning (ML), the newest buzzword, has swept into the lexicon and captured the interest of us all. Its recent, widespread popularity has stemmed mainly from the consumer perspective. Whether it’s virtual assistants, self-driving cars or romantic matchmaking, ML has rapidly positioned itself in the mainstream. Though ML may appear to be a new technology, its use in commercial applications has been around for some time. In fact, many of the data scientists and statisticians at Experian are considered pioneers in the field of ML, going back decades. Our team has developed numerous products and processes leveraging ML, from our world-class consumer fraud and ID protection to credit data products like our Trended 3D™ attributes. In fact, we were just highlighted in the Wall Street Journal for how we’re using machine learning to improve our internal IT performance.

ML’s ability to consume vast amounts of data to uncover patterns and deliver results that are not otherwise humanly possible is what makes it unique and applicable to so many fields. This predictive power has now sparked interest in the credit risk industry. Unlike fraud detection, where ML is well-established and used extensively, credit risk modeling has until recently taken a cautionary approach to adopting newer ML algorithms. Because of regulatory scrutiny and a perceived lack of transparency, ML hasn’t enjoyed the broad acceptance of credit risk modeling’s more commonly used techniques.

When it comes to credit risk models, delivering the most predictive score is not the only consideration for a model’s viability. Modelers must be able to explain and detail the model’s logic, or its “thought process,” for calculating the final score. This means taking steps to ensure the model’s compliance with the Equal Credit Opportunity Act, which forbids discriminatory lending practices. Federal laws also require adverse action responses to be sent by the lender if a consumer’s credit application has been declined, so the model must be able to highlight the top reasons for a less than optimal score. And so, while ML may be able to deliver the best predictive accuracy, its ability to explain how the results are generated has always been a concern. ML has been stigmatized as a “black box,” where data mysteriously gets transformed into the final predictions without a clear explanation of how.

However, this is changing. Depending on the ML algorithm applied to credit risk modeling, we’ve found risk models can offer the same transparency as more traditional methods such as logistic regression. For example, gradient boosting machines (GBMs) are designed as predictive models built from a sequence of several decision tree submodels. The very nature of GBMs’ decision tree design allows statisticians to explain the logic behind the model’s predictive behavior. We believe model governance teams and regulators in the United States may become comfortable with this approach more quickly than with deep learning or neural network algorithms, since GBMs are represented as sets of decision trees that can be explained, while neural networks are represented as long sets of cryptic numbers that are much harder to document, manage and understand.

In future blog posts, we’ll discuss the GBM algorithm in more detail and how we’re using its predictability and transparency to maximize credit risk decisioning for our clients.
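In the meantime, one way to see the transparency argument concretely is to decompose a GBM score into per-feature contributions, which can then be ranked into candidate reason codes. The sketch below uses the open-source XGBoost library’s prediction-contribution output; it illustrates the general idea rather than Experian’s adverse action methodology, and the feature names and file name are hypothetical.

```python
# Illustration of the transparency argument (not Experian's adverse action
# methodology): decompose a GBM score into per-feature contributions with
# XGBoost, then rank them into candidate reason codes. Names are hypothetical.
import pandas as pd
import xgboost as xgb

features = ["utilization", "num_delinquencies", "inquiries_6mo", "months_on_file"]
data = pd.read_csv("training_sample.csv")

model = xgb.XGBClassifier(n_estimators=200, max_depth=3)
model.fit(data[features], data["bad_flag"])

# Per-feature contributions to one applicant's score; the last column returned
# by pred_contribs is the bias (base value), so it is dropped below.
applicant = data[features].iloc[[0]]
contribs = model.get_booster().predict(xgb.DMatrix(applicant), pred_contribs=True)[0]

# Features pushing this applicant's risk up the most become candidate reasons.
ranked = sorted(zip(features, contribs[:-1]), key=lambda kv: kv[1], reverse=True)
for name, value in ranked[:4]:
    print(f"{name}: contribution {value:+.3f}")
```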
The August 2018 LinkedIn Workforce Report states some interesting facts about data science and the current workforce in the United States. Demand for data scientists is off the charts, but there is a data science skills shortage in almost every U.S. city — particularly in the New York, San Francisco and Los Angeles areas. Nationally, there is a shortage of more than 150,000 people with data science skills.

One way companies in financial services and other industries have coped with the skills gap in analytics is by using outside vendors. A 2017 Dun & Bradstreet and Forbes survey reported that 27 percent of respondents cited a skills gap as a major obstacle to their data and analytics efforts. Outsourcing data science work makes it easier to scale up and scale down as needs arise. But surprisingly, more than half of respondents said the third-party work was superior to their in-house analytics. At Experian, we have participated in quite a few outsourced analytics projects. Here are a few of the lessons we’ve learned along the way:

Manage expectations: Everyone has their own management style, but to be successful, you must be proactively involved in managing the partnership with your provider. Doing so will keep the provider aligned with your objectives and prevent quality degradation or cost increases as you become more tied to them.

Communication: Creating open and honest communication between executive management and your resource partner is key. You need to be able to discuss what is working well and what isn’t. This will help ensure your partner has a thorough understanding of your goals and objectives and will properly manage any bumps in the road.

Help external resources feel like part of the team: When you’re working with external resources, either offshore or onshore, they are typically in another location. This can make them feel like they aren’t part of the team and therefore not directly tied to the business goals of the project. To help bridge the gap, hold regular status meetings via video conference so everyone feels like part of the team. Within these meetings, share the goals and objectives of the project so external team members hear the message directly from you; this will make them feel more involved and give them a clear understanding of what they need to do to be successful. Being able to put faces to names, as well as having direct communication with you, will help external employees feel included.

Drive engagement through recognition programs: Research has shown that employees are more engaged in their work when they receive recognition for their efforts. While you may not be able to provide a monetary award, recognition is still a big driver of engagement. It can be as simple as recognizing a job well done during your video conference meetings, providing certificates of excellence or sending a simple thank-you card to those who are performing well. Taking the extra time to make your external workforce feel appreciated will produce engaged resources that help drive your business goals forward.

Industry training: Your external resources may have the skills needed to perform the job successfully, but they may not have specific industry knowledge geared toward your business. Work with your partner to determine where they have expertise and where you can work together to provide training. Ensure your external workforce has a solid understanding of the business line it will be supporting.
If you’ve decided to augment your staff for your next big project, Experian® can help. Our Analytics on Demand™ service provides senior-level analysts, either onshore or offshore, who can help with analytical data science and modeling work for your organization.