Data & Analytics


Your model is only as good as your data, right? In truth, there are many considerations in developing a sound model, and data is only one of them. But if your data is dirty or doesn’t represent the full population, can it be used at all? This is where sampling can help. Done right, sampling lowers the cost of obtaining the data needed for model development. Done well, it can turn a tainted, underrepresented data set into a sound and viable model development sample.

First, define the population to which the model will be applied once it’s finalized and implemented. Determine what data is available and which population segments must be represented within the sampled data. The more variability in internal factors (such as changes in marketing campaigns, risk strategies and product launches) and external factors (such as economic conditions or competitor presence in the marketplace), the larger the sample size needed. A model developer often will need to sample over time to incorporate seasonal fluctuations in the development sample.

The most robust samples are pulled from data that best represents the full population to which the model will be applied. It’s important to ensure your data sample includes customers or prospects declined by the prior model and strategy, as well as approved but nonactivated accounts. This ensures full representation of the population to which your model will be applied. Also consider the number of predictors, or independent variables, that will be evaluated during model development, and increase your sample size accordingly.

When it comes to spotting dirty or unacceptable data, the golden rule is to know your data and know your target population. Spend time evaluating your intended population and group profiles across several important business metrics, and don’t underestimate the time needed to complete a thorough evaluation.

Next, select the data so that it aptly represents the population within the sampled data. Determine the sampling methodology that best supports the model development and business objectives. Sampling generates a smaller data set for use in model development, allowing the developer to build models more quickly. Reducing the data set’s size decreases the time needed for model computation and saves storage space without sacrificing predictive performance. Once the data is selected, weights are applied so that each record appropriately represents the full population to which the model will be applied.

Several traditional techniques can be used to sample data:

Simple random sampling — Each record is chosen by chance, and each record in the population has an equal chance of being selected.
Random sampling with replacement — Each record chosen by chance remains eligible for subsequent selections.
Random sampling without replacement — Each record chosen by chance is removed from subsequent selections.
Cluster sampling — Records from the population are sampled in groups, such as region, over different time periods.
Stratified random sampling — Different segments of the population are sampled at different proportions. In some situations, stratified random sampling is helpful for selecting segments of the population that aren’t as prevalent as other segments but are equally vital within the model development sample. A minimal sketch of stratified sampling and weighting appears after this list.

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
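As a rough illustration of the stratified sampling and weighting ideas above (not a prescription for any particular development), the sketch below oversamples a rare segment with pandas and attaches a sampling weight so each record still represents its share of the full population. The segment names, sizes and sampling fractions are hypothetical.

```python
import pandas as pd

# Hypothetical full population: a rare segment and a common one.
population = pd.DataFrame({
    "record_id": range(100_000),
    "segment": ["declined_by_prior_model"] * 5_000 + ["approved_activated"] * 95_000,
})

# Sample the rare segment at a higher rate so it is well represented in development.
sampling_fractions = {"declined_by_prior_model": 0.50, "approved_activated": 0.05}

parts = []
for segment, fraction in sampling_fractions.items():
    segment_rows = population[population["segment"] == segment]
    parts.append(segment_rows.sample(frac=fraction, random_state=42))
sample = pd.concat(parts)

# Weight = population count / sampled count, so weighted totals match the full population.
pop_counts = population["segment"].value_counts()
sample_counts = sample["segment"].value_counts()
sample["sample_weight"] = sample["segment"].map(pop_counts / sample_counts)

print(sample["segment"].value_counts())                      # records per segment in the sample
print(sample.groupby("segment")["sample_weight"].first())    # weight applied to each segment
```

In an actual development sample, these weights would be carried through attribute generation and model fitting so that performance statistics reflect the population rather than the sampled mix.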

Published: November 7, 2018 by Guest Contributor

Every morning, I wake up and walk bleary-eyed to the bathroom, pop in my contacts and start my usual routine. Did I always have contacts? No. But putting in my contacts and seeing clearly has become part of my routine. After getting used to contacts, wearing glasses pales in comparison. This is how I view alternative credit data in lending. Are you having qualms about using this new data set? I get it. It’s like sticking a contact into your eye for the first time: painful and frustrating because you’re not sure what to do. To relieve you of the guesswork, we’ve compiled the top four myths related to this new data set to provide an in-depth view of why this data is an essential supplement to your traditional credit file.

Myth 1: Alternative credit data is not relevant. As consumers shift to new ways of gaining credit, it’s important for the industry to keep up. These data types are being captured by specialty credit bureaus. Gone are the days when alternative financing meant only the payday store on the street corner. Alternative financing now extends to loans such as online installment, rent-to-own, point-of-sale financing and auto-title loans. Consumers default to the financing source familiar to them, which doesn’t necessarily mean traditional financial institutions. For example, some consumers may never walk into a bank branch to get a loan; instead, they may search online for the best rates, find a completely digital experience and get approved without ever leaving their couches. Alternative credit data gives you a lens into this activity.

Myth 2: Borrowers with little to no traditional credit history are high risk. A common misconception about thin-file borrowers is that they are high risk. According to the CFPB, roughly 45 million Americans have little to no credit history. This group may include minority consumers or those from low-income neighborhoods, but it also may include recent immigrants or young consumers who haven’t had exposure to traditional credit products. According to recent findings, one in five U.S. consumers has an alternative financial services data hit, and some of these consumers are even in the exceptional or very good credit segments.

Myth 3: Alternative credit data is inaccurate and has poor data quality. On the contrary, this data set is collected, aggregated and verified in the same way as traditional credit data. Some sources of data, such as rental payments, are reported monthly and create a consistent look at a consumer’s financial behaviors. Experian’s Clarity Services, the leading source of alternative finance data, reports its consumer information, which includes application information and bank account data, as 99.9% accurate.

Myth 4: Using alternative credit data might be harmful to the consumer. This data enables a more complete view of a consumer’s credit behavior for lenders and provides consumers the opportunity to establish and maintain a credit profile. As with all information, consumers will be assessed based on what the data shows about their creditworthiness. Alternative credit data provides a better risk lens to the lender, and consumers may gain more access to, and approval for, products they want and deserve. In fact, a recent Experian survey found 71% of lenders believe alternative credit data will help consumers who would previously have been declined.
Like putting in a new pair of contact lenses for the first time, it may be uncomfortable to figure out how alternative credit data best fits into your daily rhythm. But once it’s added, the difference it makes in your day-to-day decisions is undeniable, and suddenly you wonder how you survived without it for so long. See your consumers clearly today with alternative credit data. Learn More About Alternative Credit Data

Published: November 6, 2018 by Guest Contributor

Picking up where we left off, online fintech lenders face the same challenges as other financial institutions; however, they continue to push the speed of evolution and are early adopters across the board. Here’s a continuation of my conversation with Gavin Harding, Senior Business Consultant at Experian. (Be sure to read part 1.) Part two of a two-part series:

As with many new innovations, fintechs are early adopters of alternative data. How are these firms using alternative data, and what results are being achieved?

In a competitive market, alternative data can be the key to helping fintechs lend deeper and better reach underserved consumers. By augmenting traditional credit data, a lender has access to greater insights on how a thin-file consumer will perform over time and can then make a credit decision based on the identified risk. This is an important point. While alternative data often helps lenders expand their universe, it can also provide quantitative risk measures that traditional data doesn’t necessarily provide. For example, alternative data can recognize that a consumer who changes residences more than once every two years presents a higher credit risk.

Another way fintechs are using alternative data is to screen for fraud. Fraudsters are digitally savvy and are using technology to initiate fraud attacks on a broader array of lenders, in bigger volumes than ever before. If I am a consumer who wants to get a loan through an online fintech lender, the first thing the lender wants to know is that I am who I say I am. The lender will ask me a series of questions and use traditional data to validate my answers. Alternative data takes authentication a step further and allows lenders not only to identify what device I am using to complete the application, but also whether the device is connected to my personal account records, giving them greater confidence in validating my identity. A second example of using alternative data to screen for fraud has to do with the way an application is actually completed. Most individuals who complete an online application do so in a logical, sequential order. Fraudsters fall outside of these norms, and identifying these patterns can help lenders increase fraud detection.

Lastly, alternative data can help fintech lenders with servicing and collections by way of behavioral analytics. If a consumer has a history of making payments on time, a lender may be apt to approve more credit, at better terms. As the consumer begins to pay back the credit advance, the lender can see the internal repayment history and recommend incremental line increases.

From your perspective, what is the future of data, and what should fintechs consider as they evolve their products?

The most sophisticated, most successful “think tanks” have two things that are evolving rapidly together:

Data: Fintechs want all possible data, from a quality source, as close to real time as possible. The industry has moved from “data sets” to “data lakes” to “data oceans,” and now to “data universes.”
Analytics: Fintechs are creating ever-more sophisticated analytics and are incorporating machine learning and artificial intelligence into their strategies.

Fintechs will continue to look for data assets that will help them reach the consumer. And to the degree that there is a return on the data investment, they will continue to capitalize on innovative solutions such as alternative data. In the competitive financial marketplace, insight is everything.
Aite Group recently published a new report about alternative data that dives into new qualitative research collected by the firm. Join us to hear Aite Group’s findings about fintechs, banks and credit unions at their webinar on December 4. Register today! Register for the Webinar. Click here for more information about Experian’s Alternative Data solutions. Don’t forget to check out part one of this series here.

About Gavin Harding
With more than 20 years in banking and finance, Gavin leverages his expertise to develop sophisticated data and analytical solutions that solve problems and define strategies across the customer lifecycle for banking and fintech clients. For more than half of his career, Gavin held senior leadership positions with a large regional bank, gaining experience in commercial and small business strategy, SBA lending, credit and risk management and sales. Gavin has guided organizations through strategic change initiatives and regulatory and supervisory oversight issues. Previously, Gavin worked in the business leasing, agricultural and construction equipment sectors in sales and credit management roles.

Published: November 1, 2018 by Brittany Peterson

In 2011, data scientists and credit risk managers finally found an apt analogy to explain what we do for a living: “You know Moneyball? What Paul DePodesta and Billy Beane did for the Oakland A’s, I do for XYZ Bank.” You probably remember the story: Oakland had to squeeze the most value out of its limited budget for hiring free agents, so it used analytics — the new baseball “sabermetrics” created by Bill James — to make data-driven decisions that were counterintuitive to the experienced scouts. Michael Lewis told the story in a book that was an incredible bestseller and led to a hit movie. The year after the movie was made, Harvard Business Review declared that data science was “the sexiest job of the 21st century.” Coincidence?

The importance of data
Moneyball emphasized the recognition, through sabermetrics, that certain players’ abilities had been undervalued. In Travis Sawchik’s bestseller Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak, he notes that the analysis would not have been possible without the data. Early visionaries, including John Dewan, began collecting baseball data at games all over the country in a volunteer program called Project Scoresheet. Eventually they were collecting a million data points per season. In a similar fashion, credit data pioneers, such as TRW’s Simon Ramo, began systematically compiling basic credit information into credit files in the 1960s.

Recognizing that data quality is the key to insights and decision-making, and responding to the demand for objective data, Dewan formed two companies — Sports Team Analysis and Tracking Systems (STATS) and Baseball Info Solutions (BIS). It seems quaint now, but those companies collected and cleaned data using a small army of video scouts with stopwatches. Now data is collected in real time using systems from Pitch F/X and the radar tracking system Statcast to provide insights that were never possible before. It’s hard to find a news article about Game 1 of this year’s World Series that doesn’t discuss the launch angle or exit velocity of Eduardo Núñez’s home run, but just a couple of years ago, neither statistic was even measured. Teams use proprietary biometric data to keep players healthy for games. Even neurological monitoring promises to provide new insights and may lead to changes in the game.

Similarly, lenders are finding that so-called “nontraditional data” can open up credit to consumers who might have been unable to borrow money in the past. This includes nontraditional Fair Credit Reporting Act (FCRA)–compliant data on recurring payments such as rent and utilities, checking and savings transactions, and payments to alternative lenders like payday and short-term loans. Newer fintech lenders are innovating constantly — using permissioned, behavioral and social data to make it easier for their customers to open accounts and borrow money. Similarly, some modern banks use techniques that go far beyond passwords and even multifactor authentication to verify their customers’ identities online. For example, identifying consumers through their mobile device can greatly improve the user experience. Some lenders are even using behavioral biometrics to improve their online and mobile customer service practices.

Continuously improving analytics
Bill James and his colleagues developed a statistic called wins above replacement (WAR) that summarized the value of a player as a single number. WAR was never intended to be a perfect summary of a player’s value, but it’s very convenient to have a single number to rank players. Using the same mindset, early credit risk managers developed credit scores that summarized applicants’ risk based on their credit history at a single point in time. Just as WAR is only one measure of a player’s abilities, good credit managers understand that a traditional credit score is an imperfect summary of a borrower’s credit history. Newer scores, such as VantageScore® credit scores, are based on a broader view of applicants’ credit history, including credit attributes that reflect how their financial situation has changed over time. More sophisticated financial institutions, though, don’t rely on a single score. They use a variety of data attributes and scores in their lending strategies.

Just a few years ago, simply using data to choose players was a novel idea. Now new measures such as defense-independent pitching statistics drive changes on the field. Sabermetrics, once defined as the application of statistical analysis to evaluate and compare the performance of individual players, has evolved to be much more comprehensive. It now encompasses the statistical study of nearly all in-game baseball activities.

A wide variety of data-driven decisions
Sabermetrics began being used for recruiting players in the 1980s. Today it’s used on the field as well as in the back office. Big Data Baseball gives the example of the “Ted Williams shift,” a defensive technique that was seldom used between 1950 and 2010. In the world after Moneyball, it has become ubiquitous. Likewise, pitchers alter their arm positions and velocity based on data — not only to throw more strikes, but also to prevent injuries. Similarly, when credit scores were first introduced, they were used only in originations. Lenders established a credit score cutoff that was appropriate for their risk appetite and used it for approving and declining applications. Now lenders are using Experian’s advanced analytics in a variety of ways that the credit scoring pioneers might never have imagined:

Improving the account opening experience — for example, by reducing friction online
Detecting identity theft and synthetic identities
Anticipating bust-out activity and other first-party fraud
Issuing the right offer to each prescreened customer
Optimizing interest rates
Reviewing and adjusting credit lines
Optimizing collections

Analytics is no substitute for wisdom
Data scientists like those at Experian remind me that in banking, as in baseball, predictive analytics is never perfect. What keeps finance so interesting is the inherent unpredictability of the economy and human behavior. Likewise, the play on the field determines who wins each ball game: anything can happen. Rob Neyer’s book Power Ball: Anatomy of a Modern Baseball Game quotes the Houston Astros’ director of decision sciences: “Sometimes it’s just about reminding yourself that you’re not so smart.”

Published: October 26, 2018 by Jim Bander

While electric vehicles remain a relatively niche part of the market, with only 0.9 percent of total vehicle registrations through June 2018, consumer demand has grown quite significantly over the past few years. As I mentioned in a previous blog post, electric vehicles held just 0.5 percent in 2016. Undoubtedly, manufacturers and retailers will look to capitalize on this growing segment of the population. But it’s not enough to just dig into the sales numbers. If the automotive industry really wants to position itself for success, it’s important to understand the consumers most interested in electric vehicles. This level of data can help manufacturers and retailers make the right decisions and improve the bottom line. Based on our vehicle registration data, below is a detailed look at the electric vehicle consumer.

Home Value
Somewhat unsurprisingly, the people most likely to purchase an electric vehicle tend to own more expensive homes. Consumers with homes valued between $450,000 and $749,000 made up 25 percent of electric vehicle market share. And as home values increase, these consumers still make up a significant portion of the electric vehicle market. More than 15 percent of electric vehicle market share was made up by those with homes valued between $750,000 and $999,000, and 22.5 percent of the share was made up by those with home values of more than $1 million. In fact, consumers with home values of more than $1 million are 5.9 times more likely to purchase an electric vehicle than the general population.

Education Level
Breaking down consumers by education level shows another distinct pattern. Individuals with a graduate degree are two times more likely to own an electric vehicle. Those with graduate degrees made up 28 percent of electric vehicle market share, compared with those with no college education, who made up just 11 percent.

Consumer Lifestyle Segmentation
Diving deeper into the lifestyles of individuals, we leveraged our Mosaic® USA consumer lifestyle segmentation system, which classifies every household and neighborhood in the U.S. into 71 unique types and 19 overarching groups. Findings show American Royalty, described as wealthy, influential couples and families living in prestigious suburbs, led the way with a 17.8 percent share. Following them were Silver Sophisticates at 11.9 percent; those in this category are described as mature couples and singles living an upscale lifestyle in suburban homes. Rounding out the top three were Cosmopolitan Achiever, described as affluent middle-aged and established couples and families who enjoy a dynamic lifestyle in metro areas. Their share was 10.1 percent.

If manufacturers and retailers go beyond just the sales figures, a clearer picture of the electric vehicle market begins to form. They have an opportunity to understand that wealthier, more established individuals with higher levels of education and home values are much more likely to purchase electric vehicles. While these characteristics are consistent, the different segments represent a dynamic group of people who share similarities but are at different stages in life, lead different lifestyles and have different needs. As time wears on, the electric vehicle segment is poised for growth. If the industry wants to maximize its potential, it needs to leverage data and insights to help make the right decisions and adapt to the evolving marketplace.

Published: October 26, 2018 by Brad Smith

I believe it was George Bernard Shaw who once said something along the lines of, “If economists were laid end to end, they’d never come to a conclusion, at least not the same conclusion.” It often feels the same way when it comes to big data analytics around customer behavior. As you look at new tools to put your customer insights to work for your enterprise, you likely have questions coming from across your organization. Models always seem to take forever to develop; how sure are we that the results are still accurate? What data did we use in this analysis, and do we need to worry about compliance or security?

To answer these questions, and in an effort to make the best use of customer data, the most forward-thinking financial institutions are turning to analytical environments, or sandboxes, to solve their big data problems. But what functionality is right for your financial institution? In your search for a sandbox solution to solve the business problem of big data, keep these top four features in mind.

Efficiency: Building an internal data archive with effective business intelligence tools is expensive, time-consuming and resource-intensive. That’s why investing in a sandbox makes the most sense when it comes to drawing value out of your customer data. By providing immediate access to the data environment at all times, the best systems can reduce the time from data input to decision by at least 30%. Another way the right sandbox can help you achieve operational efficiencies is by direct integration with your production environment. Pretty charts and graphs are great and can be very insightful, but the best sandbox goes beyond business intelligence and should allow you to immediately put models into action.

Scalability and Flexibility: In implementing any new software system, scalability and flexibility are key when it comes to integration with your native systems and the system’s capabilities. This is even more imperative when implementing an enterprise-wide tool like an analytical sandbox. Look for systems that offer a hosted, cloud-based environment, like Amazon Web Services, that ensures operational redundancy, as well as browser-based access and system availability. The right sandbox will leverage a scalable software framework for efficient processing. It should also be programming-language agnostic, allowing for use of all industry-standard programming languages and analytics tools like SAS, R Studio, H2O, Python, Hue and Tableau. Moreover, you shouldn’t have to pay for software suites that your analytics teams aren’t going to use.

Support: Whether you have an entire analytics department at your disposal or a lean, startup-style team, you’re going to want the highest level of support when it comes to onboarding, implementation and operational success. The best sandbox solution for your company will have a robust support model in place to ensure client success. Look for solutions that offer hands-on instruction, flexible online or in-person training and analytical support. Look for solutions and data partners that also offer the consultative help of industry experts when your company needs it.

Data, Data and More Data: Any analytical environment is only as good as the data you put into it. It should, of course, include your own client data. However, relying exclusively on your own data can lead to incomplete analysis, missed opportunities and reduced impact. When choosing a sandbox solution, pick a system that includes the most local, regional and national credit data, in addition to alternative data and commercial data assets, on top of your own data. The optimum solutions will have years of full-file, archived tradeline data, along with attributes and models, for the most robust results. Be sure your data partner has accounted for opt-outs, excludes data precluded by legal or regulatory restrictions and anonymizes data files when linking your customer data. Data accuracy is also imperative here. Choose a big data partner who is constantly monitoring and correcting discrepancies in customer files across all bureaus. The best partners will have data accuracy rates at or above 99.9%.

Solving the business problem around your big data can be a daunting task. However, investing in an analytical environment, or sandbox, can offer a solution. Finding the right solution and data partner is critical to your success. As you begin your search for the best sandbox for you, look for solutions that offer the right combination of operational efficiency, flexibility and support, combined with the most robust national data alongside your own customer data. Are you interested in learning how companies are using sandboxes to make it easier, faster and more cost-effective to drive actionable insights from their data? Join us for this upcoming webinar. Register for the Webinar

Published: October 24, 2018 by Jesse Hoggard

This is an exciting time to work in big data analytics. Here at Experian, we have more than 2 petabytes of data in the United States alone. In the past few years, because of high data volume, more computing power and the availability of open-source code algorithms, my colleagues and I have watched excitedly as more and more companies get into machine learning. We’ve observed the growth of competition sites like Kaggle, open-source code-sharing sites like GitHub and various machine learning (ML) data repositories. We’ve noticed that on Kaggle, two algorithms win over and over at supervised learning competitions:

If the data is well-structured, teams that use Gradient Boosting Machines (GBM) seem to win.
For unstructured data, teams that use neural networks win pretty often.

Modeling is both an art and a science. Those winning teams tend to be good at what the machine learning people call feature generation and what we credit scoring people call attribute generation. We have nearly 1,000 expert data scientists in more than 12 countries, many of whom are experts in traditional consumer risk models — techniques such as linear regression, logistic regression, survival analysis, CART (classification and regression trees) and CHAID analysis. So naturally I’ve thought about how GBM could apply in our world.

Credit scoring is not quite like a machine learning contest. We have to be sure our decisions are fair and explainable and that any scoring algorithm will generalize to new customer populations and stay stable over time. Increasingly, clients are sending us their data to see what we can do with newer machine learning techniques. We combine their data with our bureau data and even third-party data, we use our world-class attributes and develop custom attributes, and we see what comes out. It’s fun — like getting paid to enter a Kaggle competition! For one financial institution, GBM armed with our patented attributes found a nearly 5 percent lift in KS when compared with traditional statistics.

At Experian, we use the Extreme Gradient Boosting (XGBoost) implementation of GBM, which, out of the box, has regularization features we use to prevent overfitting. But it’s missing some features that we and our clients count on in risk scoring. Our Experian DataLabs team worked with our Decision Analytics team to figure out how to make it work in the real world. We found answers for a couple of important issues:

Monotonicity — Risk managers count on the ability to impose what we call monotonicity. In application scoring, applications with better attribute values should score as lower risk than applications with worse values. For example, if consumer Adrienne has fewer delinquent accounts on her credit report than consumer Bill, all other things being equal, Adrienne’s machine learning score should indicate lower risk than Bill’s score. (A minimal sketch of this idea appears below.)
Explainability — We were able to adapt a fairly standard “Adverse Action” methodology from logistic regression to work with GBM.

There has been enough enthusiasm around our results that we’ve just turned this into a standard benchmarking service. We help clients appreciate the potential of these new machine learning algorithms by evaluating them on their own data. Over time, the acceptance and use of machine learning techniques will become commonplace among model developers as well as internal validation groups and regulators.
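As a rough illustration of the monotonicity constraint described above (not Experian’s production implementation), XGBoost exposes a monotone_constraints parameter that forces the score to move in only one direction as a given attribute increases. The attribute names and synthetic data below are made up for the sketch.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical attributes: more delinquencies should never lower predicted risk,
# and higher income should never raise it.
delinquent_accounts = rng.poisson(1.0, n)
income_thousands = rng.normal(60, 15, n)
logit = -2.0 + 0.8 * delinquent_accounts - 0.03 * income_thousands
bad = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([delinquent_accounts, income_thousands])
dtrain = xgb.DMatrix(X, label=bad, feature_names=["delinquent_accounts", "income_thousands"])

params = {
    "objective": "binary:logistic",
    "max_depth": 3,
    "eta": 0.1,
    # +1: risk may only increase with delinquencies; -1: risk may only decrease with income.
    "monotone_constraints": "(1,-1)",
}
model = xgb.train(params, dtrain, num_boost_round=200)

# Spot check: holding income fixed, more delinquencies should never reduce the score.
check = xgb.DMatrix(np.array([[0, 60], [1, 60], [2, 60], [3, 60]]),
                    feature_names=["delinquent_accounts", "income_thousands"])
print(model.predict(check))  # predicted bad rates should be non-decreasing
```

Other gradient boosting libraries offer similar options, for example monotonic_cst in scikit-learn’s HistGradientBoostingClassifier and monotone_constraints in LightGBM.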
Whether you’re a data scientist looking for a cool place to work or a risk manager who wants help evaluating the latest techniques, check out our weekly data science video chats and podcasts.

Published: October 24, 2018 by Guest Contributor

Electric vehicles are here to stay – and will likely gain market share as costs come down, travel ranges increase and charging infrastructure grows.

Published: October 24, 2018 by Brad Smith

If your company is like many financial institutions, the discussion around big data and financial analytics has likely been an ongoing conversation. For many financial institutions, data isn’t the problem; it’s what could or should be done with it. Research has shown that only about 30% of financial institutions are successfully leveraging their data to generate actionable insights, and customers are noticing. According to a recent study from Capgemini, only 30% of US customers and 26% of UK customers feel their financial institutions understand their needs. No matter how much data you have, it’s essentially just ones and zeroes if you’re not using it.

So how do banks, credit unions and other financial institutions that capture and consume vast amounts of data use that data to innovate, improve the customer experience and stay competitive? The answer, you could say, is written in the sand. The most forward-thinking financial institutions are turning to analytical environments, also known as sandboxes, to solve the business problem of big data.

As the name suggests, a sandbox is an environment that contains all the materials and tools one might need to create, build and collaborate around data. A sandbox gives data-savvy banks, credit unions and FinTechs access to depersonalized credit data from across the country. Using custom dashboards and data visualization tools, they can manipulate the data with predictive models for different micro- and macro-level scenarios. The added value of a sandbox is that it becomes a one-stop shop data tool for the entire enterprise, saving the time normally spent going back and forth to acquire data for a specific project or particular data sets. The best systems utilize the latest open-source technology in artificial intelligence and machine learning to deliver intelligence that can inform regional trends, consumer insights and market opportunities: industry benchmarking, market entry and expansion research, campaign performance, vintage analysis, reject inferencing and much more.

An analytical sandbox gives you the data to create actionable analytics and insights across the enterprise right when you need them, not months later. The result is the ability to empower your customers to make financial decisions when, where and how they want. Keeping them happy keeps your financial institution relevant and competitive. Isn’t it time to put your data to work for you? Learn more about how Experian can solve your big data problems. >> Interested to see a live demo of the Ascend Sandbox? Register today for our webinar “Big Data Can Lead to Even Bigger ROI with the Ascend Sandbox.”

Published: October 4, 2018 by Jesse Hoggard

Big data is no longer a new concept. Once thought to be an overhyped buzzword, it now underpins and drives billions of dollars in revenue across nearly every industry. But some companies still aren’t fully leveraging the value of their big data, and that’s a big problem. In a recent study, Experian and Forrester surveyed nearly 600 business executives in charge of enterprise risk, analytics, customer data and fraud management. The results were surprising: while 78% of organizations said they have made recent investments in advanced analytics, like the proverbial strategic plan sitting in a binder on a shelf, only 29% felt they were successfully using these investments to combine data sources and gather more insights. Moreover, 40% of respondents said they still rely on instinct and subjectivity when making decisions. While gut feeling and industry experience should be part of your decision-making process, without data and models to verify or challenge your assumptions, you’re taking a big risk with bigger operations budgets and revenue targets.

Meanwhile, customer habits and demands are evolving quickly and at a fundamental level. The proliferation of mobile and online environments is driving a paradigm shift to omnichannel banking in the financial sector and, with it, an expectation for a customized yet digitized customer experience. Financial institutions have to be ready to respond to and anticipate these changes to not only gain new customers but also retain current ones. You can bet that your competition is already thinking about how to respond to this shift and better leverage data and analytics for increased customer acquisition and engagement, share of wallet and overall reach. According to a recent Accenture study, 79% of enterprise executives agree that companies that fail to embrace big data will lose their competitive position and could face extinction. What are you doing to help solve the business problem around big data and stay competitive?

Published: September 27, 2018 by Jesse Hoggard

Machine learning (ML), the newest buzzword, has swept into the lexicon and captured the interest of us all. Its recent, widespread popularity has stemmed mainly from the consumer perspective. Whether it’s virtual assistants, self-driving cars or romantic matchmaking, ML has rapidly positioned itself in the mainstream. Though ML may appear to be a new technology, its use in commercial applications has been around for some time. In fact, many of the data scientists and statisticians at Experian are considered pioneers in the field of ML, going back decades. Our team has developed numerous products and processes leveraging ML, from our world-class consumer fraud and ID protection to credit data products like our Trended 3D™ attributes. In fact, we were just highlighted in the Wall Street Journal for how we’re using machine learning to improve our internal IT performance.

ML’s ability to consume vast amounts of data to uncover patterns and deliver results that aren’t otherwise humanly possible is what makes it unique and applicable to so many fields. This predictive power has now sparked interest in the credit risk industry. Unlike fraud detection, where ML is well-established and used extensively, credit risk modeling has until recently taken a cautious approach to adopting newer ML algorithms. Because of regulatory scrutiny and a perceived lack of transparency, ML hasn’t experienced the broad acceptance enjoyed by some of credit risk modeling’s more utilized applications.

When it comes to credit risk models, delivering the most predictive score is not the only consideration for a model’s viability. Modelers must be able to explain and detail the model’s logic, or its “thought process,” for calculating the final score. This means taking steps to ensure the model’s compliance with the Equal Credit Opportunity Act, which forbids discriminatory lending practices. Federal laws also require adverse action responses to be sent by the lender if a consumer’s credit application has been declined, which means the model must be able to highlight the top reasons for a less than optimal score. And so, while ML may be able to deliver the best predictive accuracy, its ability to explain how the results are generated has always been a concern. ML has been stigmatized as a “black box,” where data mysteriously gets transformed into the final predictions without a clear explanation of how.

However, this is changing. Depending on the ML algorithm applied to credit risk modeling, we’ve found risk models can offer the same transparency as more traditional methods such as logistic regression. For example, gradient boosting machines (GBMs) are designed as a predictive model built from a sequence of several decision tree submodels. The very nature of GBMs’ decision tree design allows statisticians to explain the logic behind the model’s predictive behavior. We believe model governance teams and regulators in the United States may become comfortable with this approach more quickly than with deep learning or neural network algorithms, since GBMs are represented as sets of decision trees that can be explained, while neural networks are represented as long sets of cryptic numbers that are much harder to document, manage and understand. In future blog posts, we’ll discuss the GBM algorithm in more detail and how we’re using its predictability and transparency to maximize credit risk decisioning for our clients.
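As a rough sketch of how a tree-based model can support reason codes (not the specific adverse action methodology Experian uses), XGBoost can return per-feature contributions for each scored applicant via pred_contribs=True; ranking the contributions that push a score toward “bad” yields candidate decline reasons. The attribute names and data here are hypothetical.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical credit attributes.
X = np.column_stack([
    rng.poisson(1.0, n),          # delinquent_accounts
    rng.uniform(0, 1, n),         # utilization
    rng.integers(0, 300, n),      # months_on_file
])
feature_names = ["delinquent_accounts", "utilization", "months_on_file"]
logit = -2.5 + 0.9 * X[:, 0] + 2.0 * X[:, 1] - 0.01 * X[:, 2]
bad = rng.binomial(1, 1 / (1 + np.exp(-logit)))

dtrain = xgb.DMatrix(X, label=bad, feature_names=feature_names)
model = xgb.train({"objective": "binary:logistic", "max_depth": 3, "eta": 0.1},
                  dtrain, num_boost_round=150)

# Per-feature contributions on the log-odds scale; the last column is the bias term.
applicant = xgb.DMatrix(np.array([[3, 0.95, 12]]), feature_names=feature_names)
contribs = model.predict(applicant, pred_contribs=True)[0][:-1]

# Features contributing most toward a "bad" prediction are candidate decline reasons.
top_reasons = sorted(zip(feature_names, contribs), key=lambda fc: fc[1], reverse=True)
for name, value in top_reasons:
    print(f"{name}: {value:+.3f}")
```

In practice, raw contributions like these would still need to be mapped to approved reason-code language and validated against fair lending requirements; the sketch only shows the ranking mechanics.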

Published: September 12, 2018 by Alan Ikemura

The August 2018 LinkedIn Workforce Report states some interesting facts about data science and the current workforce in the United States. Demand for data scientists is off the charts, but there is a data science skills shortage in almost every U.S. city — particularly in the New York, San Francisco and Los Angeles areas. Nationally, there is a shortage of more than 150,000 people with data science skills.

One way companies in financial services and other industries have coped with the skills gap in analytics is by using outside vendors. A 2017 Dun & Bradstreet and Forbes survey reported that 27 percent of respondents cited a skills gap as a major obstacle to their data and analytics efforts. Outsourcing data science work makes it easier to scale up and scale down as needs arise. But surprisingly, more than half of respondents said the third-party work was superior to their in-house analytics. At Experian, we have participated in quite a few outsourced analytics projects. Here are a few of the lessons we’ve learned along the way:

Manage expectations: Everyone has their own management style, but to be successful, you must be proactively involved in managing the partnership with your provider. Doing so will keep them aligned with your objectives and prevent quality degradation or cost increases as you become more tied to them.

Communication: Creating open and honest communication between executive management and your resource partner is key. You need to be able to discuss what is working well and what isn’t. This will help ensure your partner has a thorough understanding of your goals and objectives and will properly manage any bumps in the road.

Help external resources feel like part of the team: When you’re working with external resources, either offshore or onshore, they are typically in another location. This can make them feel like they aren’t part of the team and therefore not directly tied to the business goals of the project. To help bridge the gap, regular status meetings via video conference can help everyone feel included. Within these meetings, share the goals and objectives of the project so your external resources hear the message directly from you; this makes them feel more involved and gives them a clear understanding of what they need to do to be successful. Being able to put faces to names, as well as having direct communication with you, helps external employees feel included.

Drive engagement through recognition programs: Research has shown that employees are more engaged in their work when they receive recognition for their efforts. While you may not be able to provide a monetary award, recognition is still a big driver of engagement. It can be as simple as recognizing a job well done during your video conference meetings, providing certificates of excellence or sending a simple thank-you card to those who are performing well. Taking the extra time to make your external workforce feel appreciated will produce engaged resources who help drive your business goals forward.

Industry training: Your external resources may have the skills needed to perform the job successfully, but they may not have specific industry knowledge geared toward your business. Work with your partner to determine where they have expertise and where you can work together to provide training. Ensure your external workforce has a solid understanding of the business line they will be supporting.
If you’ve decided to augment your staff for your next big project, Experian® can help. Our Analytics on Demand™ service provides senior-level analysts, either onshore or offshore, who can help with analytical data science and modeling work for your organization.

Published: September 5, 2018 by Guest Contributor

As more financial institutions express interest in leveraging alternative credit data sources to decision and assess consumers, lenders want to be assured of how they can best utilize this data source and maintain compliance. Experian recently interviewed Philip Bohi, Vice President for Compliance Education at the American Financial Services Association (AFSA), to learn more about his perspective on this topic and to gain insights on what lenders should consider as they dive into the world of alternative credit data.

Alternative data continues to be a hot topic in the financial services space. How have you seen it evolve over the past few years?

It’s hard to pinpoint where it began, but it has been interesting to observe how technology firms and people have changed our perceptions of the value and use of data in recent years. Earlier, a company’s data was just the information needed to conduct business. It seems like people are waking up to the realization that their business data can be useful internally, as well as to others. And we have come to understand how previously disregarded data can be profoundly valuable. These insights provide a lot of new opportunities, but also new questions. I would also say that the scope of alternative credit data use has changed. A few years ago, alternative credit data was a tool to largely address the thin- and no-file consumer. More recently, we’ve seen it can provide a lift across the credit spectrum.

We recently conducted a survey with lenders, and 23% of respondents cited “complying with laws and regulations” as the top barrier to utilizing alternative data. Why do you think this is the case? What are the top concerns you hear from lenders as it relates to compliance on this topic?

The consumer finance industry is very focused on compliance, because failure to maintain compliance can kill a business, either directly through fines and expenses or through reputation damage. Concerns about alternative data come from a lack of familiarity. There is uncertainty about acquiring the data, using the data, safeguarding the data, selling the data, etc. Companies want to feel confident that they know where the limits are in creating, acquiring, using, storing and selling data.

Alternative data is a broad term. When it comes to utilizing it for making a credit decision, what types of alternative data can actually be used?

Currently the scope is somewhat limited. I would describe the alternative data elements as being analogous to traditional credit data. Alternative data includes rent payments, utility payments, cell phone payments, bank deposits and similar records. These provide important insights into whether a given consumer is keeping up with financial obligations. And most importantly, we are seeing that the particular types of obligations reflected in alternative data capture the spending habits of people whose traditional credit files are thin or nonexistent. This is a good thing, as alternative data captures consumers who are paying their bills consistently earlier than traditional data does. Serving those customers is a great opportunity.

If a lender wants to begin utilizing alternative credit data, what must they know from a compliance standpoint?

I would begin with considering what the lender’s goal is and letting that guide how it will explore using alternative data. For some companies, accessing credit scores that include some degree of alternative data along with traditional data elements is enough. Just doing that provides a good business benefit without introducing a lot of additional risk compared to using traditional credit score information. If the company wants to start leveraging its own customer data for its own purposes, or making it available to third parties, that becomes complex very quickly. A company can find itself subject to all the regulatory burdens of a credit reporting agency very quickly. In any case, the entire lifecycle of the data has to be considered, along with how the data will be protected when it is at rest, in use or in transit. Alternative data used for credit assessment should additionally be FCRA-compliant.

How do you see alternative credit data evolving in the future?

I cannot predict where it will go, but the unfettered potential is dizzying. Think about how DNA-based genealogy has taken off, telling folks they have family members they did not know about and providing information to solve old crimes. I think we need to carefully balance personal privacy and prudent uses of customer data. There is also another issue with wide-ranging uses of new data. I contend it takes time to discern whether an element of data is accurately predictive. Consider for a moment a person’s utility bills. If electricity usage in a household goes down when the bills in the neighborhood are going up, what does that tell us? Does it mean the family is under some financial strain and using the air conditioning less? Or does it tell us they had solar panels installed? Or that they’ve been on vacation? Figuring out what a particular piece of data means about someone’s circumstances can be difficult.

About Philip Bohi
Philip joined AFSA in 2017 as Vice President, Compliance Education. He is responsible for providing strategic direction and leadership for the Association’s compliance activities, including AFSA University, and is the staff liaison to the Operations and Regulatory Compliance Committee and Technology Task Forces. He brings significant consumer finance legal and compliance experience to AFSA, having served as in-house counsel at Toyota Motor Credit Corporation and Fannie Mae. At those companies, Philip worked closely with compliance staff supporting technology projects, legislative tracking and vendor management. His private practice included work on manufactured housing, residential mortgage compliance and consumer finance matters at McGlinchey Stafford, PLLC and Lotstein Buckman, LLP. He is a member of the Virginia State Bar and the District of Columbia Bar.

Learn more about the array of alternative credit data sources available to financial institutions.

Published: July 18, 2018 by Kerry Rivera

As I mentioned in my previous blog, model validation is an essential step in evaluating a recently developed predictive model’s performance before finalizing and proceeding with implementation. An in-time validation sample is created by setting aside a portion of the total model development sample so predictive accuracy can be measured on data not used to develop the model. However, if few records in the target performance group are available, splitting the total model development sample into development and in-time validation samples will leave too few records in the target group for use during model development. An alternative approach to generating a validation sample is to use a resampling technique. There are many different types and variations of resampling methods. This blog addresses a few common techniques.

Jackknife technique — An iterative process whereby an observation is removed from each subsequent sample generation. So if there are N observations in the data, jackknifing calculates the model estimates on N - 1 different samples, with each sample having N - 1 observations. The model is then applied to each sample, and an average of the model predictions across all samples is derived to generate an overall measure of model performance and prediction accuracy. The jackknife technique can be broadened to remove a group of observations from each subsequent sample generation while giving each observation in the data set equal opportunity for inclusion and exclusion.

K-fold cross-validation — Generates multiple validation data sets from the holdout sample created for the model validation exercise, i.e., the holdout data is split into K subsets. The model is then applied to the K validation subsets, with each subset held out during the iterative process as the validation set while the model scores the remaining K - 1 subsets. Again, an average of the predictions across the multiple validation samples is used to create an overall measure of model performance and prediction accuracy.

Bootstrap technique — Generates subsets from the full model development data sample, with replacement, producing multiple samples generally of equal size. Thus, with a total sample size of N, this technique generates N random samples such that a single observation can be present in multiple subsets while another observation may not be present in any of the generated subsets. The generated samples are combined into a simulated larger data sample that can then be split into a development and an in-time, or holdout, validation sample.

Before selecting a resampling technique, it’s important to check and verify the data assumptions for each technique against the data sample selected for your model development, as some resampling techniques are more sensitive than others to violations of data assumptions. A brief sketch of k-fold cross-validation and the bootstrap appears below. Learn more about how Experian Decision Analytics can help you with your custom model development.
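As a rough illustration of two of the resampling techniques described above (not a specific Experian implementation), the sketch below uses scikit-learn’s KFold splitter and a simple bootstrap loop with NumPy. The logistic regression model and synthetic data are stand-ins, and the bootstrap loop scores each model on the out-of-bag records, a common variant of the bootstrap validation idea.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

# Synthetic stand-in for a development sample with a roughly 10% target rate.
X, y = make_classification(n_samples=2_000, n_features=10, weights=[0.9, 0.1], random_state=0)

# K-fold cross-validation: average performance across K held-out folds.
kfold_scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    kfold_scores.append(roc_auc_score(y[val_idx], model.predict_proba(X[val_idx])[:, 1]))
print(f"5-fold mean AUC: {np.mean(kfold_scores):.3f}")

# Bootstrap: draw N records with replacement, validate on the records left out.
rng = np.random.default_rng(0)
bootstrap_scores = []
for _ in range(20):
    boot_idx = rng.choice(len(X), size=len(X), replace=True)
    oob_idx = np.setdiff1d(np.arange(len(X)), boot_idx)   # out-of-bag records
    model = LogisticRegression(max_iter=1000).fit(X[boot_idx], y[boot_idx])
    bootstrap_scores.append(roc_auc_score(y[oob_idx], model.predict_proba(X[oob_idx])[:, 1]))
print(f"Bootstrap mean AUC: {np.mean(bootstrap_scores):.3f}")
```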

Published: July 5, 2018 by Guest Contributor

An introduction to the different types of validation samples

Model validation is an essential step in evaluating and verifying a model’s performance during development before finalizing the design and proceeding with implementation. More specifically, during a predictive model’s development, the objective of a model validation is to measure the model’s accuracy in predicting the expected outcome. For a credit risk model, this may be predicting the likelihood of good or bad payment behavior, depending on the predefined outcome. Two general types of data samples can be used to complete a model validation. The first is known as the in-time, or holdout, validation sample and the second is known as the out-of-time validation sample. So, what’s the difference between an in-time and an out-of-time validation sample?

An in-time validation sample sets aside part of the total sample made available for the model development. Random partitioning of the total sample is completed upfront, generally separating the data into a portion used for development and the remaining portion used for validation. For instance, the data may be randomly split, with 70 percent used for development and the other 30 percent used for validation. Other common data subset schemes include an 80/20, a 60/40 or even a 50/50 partitioning of the data, depending on the quantity of records available within each segment of your performance definition. Before selecting a data subset scheme to be used for model development, you should evaluate the number of records available in your target performance group, such as number of bad accounts. If you have too few records in your target performance group, a 50/50 split can leave you with insufficient performance data for use during model development. A separate blog post will present a few common options for creating alternative validation samples through a technique known as resampling.

Once the data has been partitioned, the model is created using the development sample. The model is then applied to the holdout validation sample to determine the model’s predictive accuracy on data that wasn’t used to develop the model. The model’s predictive strength and accuracy can be measured in various ways by comparing the known and predefined performance outcome to the model’s predicted performance outcome.

The out-of-time validation sample contains data from an entirely different time period or customer campaign than what was used for model development. Validating model performance on a different time period is beneficial to further evaluate the model’s robustness. Selecting a data sample from a more recent time period having a fully mature set of performance data allows the modeler to evaluate model performance on a data set that may more closely align with the current environment in which the model will be used. In this case, a more recent time period can be used to establish expectations and set baseline parameters for model performance, such as population stability indices and performance monitoring.

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
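As a rough sketch of the in-time holdout idea above (the split percentages and field names are illustrative), scikit-learn’s train_test_split can carve out the validation portion while preserving the share of the target performance group in both partitions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical development file: attributes plus a bad/good performance flag.
attributes = rng.normal(size=(n, 8))
bad_flag = rng.binomial(1, 0.04, n)   # roughly 4% bad rate

# 70/30 in-time split; stratify keeps the bad rate consistent across both samples.
X_dev, X_val, y_dev, y_val = train_test_split(
    attributes, bad_flag, test_size=0.30, stratify=bad_flag, random_state=42
)

print(f"Development bads: {int(y_dev.sum()):,} of {len(y_dev):,} ({y_dev.mean():.2%})")
print(f"Validation bads:  {int(y_val.sum()):,} of {len(y_val):,} ({y_val.mean():.2%})")
# An out-of-time sample would instead be drawn from a later, fully matured time period.
```

If the counts printed for the target group look too thin for stable model development, that is the signal to consider the resampling techniques covered in the companion post.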

Published: June 18, 2018 by Guest Contributor
