In this Business Chat, we sat down with Experian’s Director of Data Management, Ben Bargoil, and Cloudera Chief Customer Officer, Anupam Singh, to talk about the data management investments Experian is making in machine learning and A.I., and how these investments are helping our clients.
Ben, can you share a little bit about some of the data challenges we are having here at Experian?
[Ben]: Most of the challenges we’ve been facing, and working closely with Cloudera to solve over the last two years or so, have been in response to growing demands from the marketplace, right? What was considered acceptable even two years ago, in terms of data coverage and data accuracy, is no longer meeting our customers’ needs in the marketplace. Add to that the legacy processes and legacy environment that we operated in, and we’re challenged with keeping up with the variety, the volume, and the velocity of data that Experian has. That was coupled with our previous approaches to data management, which weren’t necessarily flexible enough to empower these new approaches. Right? Again, most of what we wanted to solve was related to how we can continue to do what we’re doing, but do it in a much more efficient, much more impactful way for our customers.
Anupam, turning to you: what solution did Cloudera prescribe to Experian to help us address some of these challenges?
[Anupam]: Machine Learning is only as good as the data, and so the solution that we provided is a comprehensive solution called Cloudera Data Science Workbench. Yes, the more charismatic part of the product is that you can do Machine Learning apps and build neural networks. But in reality, where we saw Experian needed the product was in things like data de-duplication and classification. So it is almost a prologue to the machine learning problem. So that’s the solution that we provided.
Ben, can you explain how these investments help Experian clients?
[Ben]: When we were creating these new approaches and this new team, one thing I was determined about was that I needed a centralized hub, right? I wanted a central hub in which we could build an entirely new ecosystem, and as we worked with the Cloudera team, it became obvious to us that CDSW was going to be our best choice.
[Ben]: So while we were investing in CDSW, Experian had also been investing in our new technology environment, and putting those two together was the key to our success. Each one of these challenges has a direct line of sight to our clients, and most of them are based on direct feedback we’ve received from clients over the previous years. One of the great things that we’ve done inside of CDSW, inside of the applications, is measuring the impact to Experian customers. So we can state confidently that millions and millions of customer interactions with our data have been improved thanks to the solutions we’ve built inside of CDSW.
Anupam, did Experian have any unique challenges that stood out to Cloudera when we engaged with you?
[Anupam]: Of course, with Experian, you know, I tell the team internally at Cloudera that we are all Experian customers indirectly, right? Anytime I’m going to buy something, Experian is in the workflow. So that always stands out for us. But then there is the sheer scale of Experian: when you have almost a billion unique users that you’re serving, you guys are one of the biggest Internet properties on the planet that nobody has heard of. When we were looking at the nomination for the Data Impact Awards, any small gain, say 10%, for Experian is actually a hundred million human beings on the planet, and so that stood out for us. That’s what got us excited about working with Experian on this problem, the sheer scale of it.
How long did it take before you saw measurable results in working with the Cloudera solution?
[Ben]: Let’s go back two years, to when we were first creating this new ecosystem and first started our engagement with CDSW. There were the normal growing pains associated with a new environment, a new toolset, and a new team that we were onboarding into Experian as well. So from the time we started working with Cloudera, it was many months until we had created a team, launched our first application, and started to make improvements to the database. Fast forward to today, and we have many applications that have been created and launched in production. As is very typical of most machine learning applications, you spend most of your time with the data: exploring and cleansing it, and creating the features you need to use within your machine learning application.
[Ben]: But what we see is that, once the large bulk of the work is done and we get to the point where we’re ready to move into production, through the combined power of CDSW and our new environment we can make significant changes in a very short amount of time. I’m talking millions of improvements in a month or two. A good example is industry classification coverage. One of the challenges we wanted to solve was that our customers wanted us to create more industry codes for businesses. So we spent many months building the application, doing all kinds of feature engineering. Within about two to three months after we launched that application, I think we added somewhere in the range of 20 million new industry codes to our database. Again, lots of work on the front end, but as soon as we get into production, huge improvements in a short amount of time.
What can you share about the Cloudera Data Impact Awards?
[Anupam]: Our Data Impact Awards look for impact. We are all in the enterprise software business. Sometimes we forget what impact we have; if you have great fraud detection, you and I can safely shop on the Internet. We are talking today through some internet provider that runs its network management and its network reliability analytics on top of Cloudera. For us, the Data Impact Awards are not just about our technology, but about what impact we’ve had on the healthcare, banking, telco, and government systems of the world. That’s how we measure, and it’s fairly competitive every year.
Can you share any criteria on what the judges looked for when they were choosing a winner?
[Anupam]: What we look for is: is this real, in terms of does it have an impact, or was it just a technology experiment? We found with Velcro, for example, that de-duplication of records is one of the biggest problems in machine learning, and the scale at which Experian de-duplicated records, meaning, knowing which Ben is the right Ben when I’m looking it up, is a very real problem. All of us face it as consumers. It’s the same thing with establishing a corporate identity. As somebody who runs a very large business for Cloudera, sometimes you don’t even know what the actual name of the customer is. So the idea that you can resolve the name of a customer is a real problem. So taking these two or three real problems, we saw the level of impact that Experian was having on its customers, but more importantly, on its indirect customers, all the consumers in the world, and that stood out for us.
Ben, what has the response been from Experian clients since deploying Cloudera?
[Ben]: Some of the problems and challenges we’ve been addressing are more behind the scenes, under the covers, like what Anupam just mentioned: improved entity resolution, improved structural integrity. So those may not necessarily be as overt as some of the other challenges we saw, like the industry classification example we mentioned, right? When we add all of those millions and tens of millions of codes to our database, our customers have a direct line of sight to that. We’ve received very positive feedback from the marketplace in terms of embracing these new approaches and being able to solve those challenges. I always like to say I’m not quite declaring victory on any of these challenges yet, but, you know, the end of the war is in sight on some of them. We’ve almost completely removed this as an area of opportunity, and we are meeting the needs of the marketplace.