Two factors determine the quality of your predictive scoring: the predictive model and the underlying data. The number of data points and their accuracy are crucial for driving great results.
Modeling experts have an expression: “garbage in, garbage out.” What they mean is that if you feed inaccurate data into even a state-of-the-art predictive model, the results will be of dubious quality at best. Therefore, every marketer involved in predictive marketing should pay close attention to data quality and quantity.
CRM data: Turning lead into gold
The first source of potentially inaccurate data is the company CRM. CRMs typically contain data from multiple sources, of varying quality, that were updated manually by multiple sales reps, business development reps and sales operations managers. The result may be a total mess.
Furthermore, CRM data is typically not standardized. One example we found at Mintigo is the same state recorded by its full name, an abbreviation and a postal code: California, Calif. and CA. While humans understand that these all refer to the same state, a model will treat them as three different states. The problem is even worse with job titles, where Director of Marketing, Marketing Director and Dir. Marketing are only a few of the many permutations that can be found.
Therefore, to make CRM data usable for predictive marketing, it has to go through a cleansing and standardization process. The result of this process is that multiple variations of the same value are treated as one.
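To make the idea concrete, here is a minimal sketch of what such standardization could look like for the state and job title examples above. It is an illustration only, not Mintigo's actual cleansing process: the lookup tables and the function names `normalize_state` and `normalize_title` are hypothetical, and a real CRM would need far larger dictionaries.

```python
import re

# Hypothetical lookup tables, kept tiny for illustration.
STATE_ALIASES = {"california": "CA", "calif": "CA", "ca": "CA"}
TITLE_ABBREVIATIONS = {"dir": "director", "mktg": "marketing"}
TITLE_STOPWORDS = {"of", "the"}


def normalize_state(raw: str) -> str:
    """Map any known spelling of a state to its two-letter code."""
    key = re.sub(r"[^a-z]", "", raw.lower())
    # Unknown values are passed through (trimmed) rather than guessed.
    return STATE_ALIASES.get(key, raw.strip())


def normalize_title(raw: str) -> str:
    """Reduce a job title to a sorted list of expanded keywords."""
    words = re.findall(r"[a-z]+", raw.lower())
    expanded = [TITLE_ABBREVIATIONS.get(w, w) for w in words]
    # Sorting makes word order irrelevant: "Marketing Director" and
    # "Director of Marketing" end up with the same key.
    return " ".join(sorted(w for w in expanded if w not in TITLE_STOPWORDS))


if __name__ == "__main__":
    for state in ["California", "Calif.", "CA"]:
        print(state, "->", normalize_state(state))
    for title in ["Director of Marketing", "Marketing Director", "Dir. Marketing"]:
        print(title, "->", normalize_title(title))
```

With these tables, all three state spellings collapse to one code and all three job titles collapse to one key, so a model sees a single value instead of three.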
However, while we can alleviate the problem of CRM data quality, what we cannot solve is quantity. The average company CRM contains 10 data points on each lead. These typically include name, location and company demographics such as revenue, industry and number of employees. Our experience shows that it is nearly impossible to get any predictive power from CRM data alone.
Web data: Mining the gold nuggets
To increase the predictive power of the model, CRM data needs to be augmented. What makes more sense than sourcing additional data from the thousands of variables that can be obtained by mining the Web? But unfortunately, unlike CRM data, Web data is not organized in a big table. There are two challenges in leveraging Web data to improve the predictive power of a model:
- Discovering Marketing Indicators
Up until now, the only way to find companies that use Microsoft’s SQL Server was to have an army of telemarketers cold-call companies and ask them, hoping to reach someone who could answer. Data mining now offers a more robust approach.
For example, to discover users of Microsoft’s SQL Server, Mintigo mines billions of webpages and looks for relevant clues, such as a Microsoft Partner indication on the company website, job openings, or current employees who specialize in SQL Server. In addition, Mintigo looks at news feeds and press releases for any clues that suggest the company uses SQL Server. But the secret sauce is the data-mining algorithms that combine all of this data and calculate the probability that a company is using Microsoft’s SQL Server.
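The algorithms themselves are not described here, so the following is only a generic sketch of how several weak clues can be combined into one probability. It uses a noisy-OR rule as a stand-in technique, and it assumes the clues are independent. The signal names and reliability values are invented for illustration and are not Mintigo's.

```python
from math import prod

# Hypothetical reliabilities: the assumed probability that a company uses
# the product when this signal alone is observed. Values are made up.
SIGNAL_RELIABILITY = {
    "partner_badge": 0.6,   # Microsoft Partner indication on the website
    "job_opening": 0.5,     # job posting that mentions the product
    "employee_skill": 0.4,  # current employee who lists the skill
    "press_mention": 0.3,   # news feed or press release mention
}


def usage_probability(observed_signals):
    """Noisy-OR: the company is a non-user only if every signal misleads."""
    # set() ensures a signal seen twice is not counted twice.
    miss = prod(1.0 - SIGNAL_RELIABILITY[s] for s in set(observed_signals))
    return 1.0 - miss


if __name__ == "__main__":
    print(usage_probability([]))
    print(usage_probability(["partner_badge"]))
    print(usage_probability(["partner_badge", "job_opening"]))
```

The point of the sketch is that each additional clue raises the estimate, while no single clue has to be conclusive on its own. In practice the reliabilities would be learned from labeled data rather than set by hand.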
- Matching Web Data with CRM Records
Finding the right data is the first challenge. The second challenge is to match it with the existing CRM records to create the robust database needed for effective prediction. How would you match IBM with International Business Machines Inc? The accuracy of this process is crucial for the overall accuracy of the data.
Matching is done through a multi-stage algorithm that tries to match records on a series of keys. Unfortunately, we cannot reveal the details of our matching engine here, as it is part of our competitive advantage.
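Without giving anything away, the general shape of multi-stage matching can still be sketched. The example below is a simplified illustration, not Mintigo's engine: it tries an exact domain match first, then a normalized company name, then an acronym. The record fields (`name`, `domain`), the list of legal suffixes and the function names are all assumptions made for this example.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "company"}


def name_key(name):
    """Lowercase the name, drop punctuation and legal suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def acronym_key(name):
    """Initials of a multi-word name; single-word names stay as they are."""
    key = name_key(name)
    words = key.split()
    return "".join(w[0] for w in words) if len(words) > 1 else key


def match_record(web_company, crm_records):
    """Return (crm_record, stage) for the first stage that matches."""
    # Stages run from most to least reliable key.
    stages = [
        ("domain", lambda r: r.get("domain", "").lower()),
        ("name", lambda r: name_key(r.get("name", ""))),
        ("acronym", lambda r: acronym_key(r.get("name", ""))),
    ]
    for stage, key_fn in stages:
        target = key_fn(web_company)
        if not target:
            continue  # this key is missing, so fall through to the next stage
        for record in crm_records:
            if key_fn(record) == target:
                return record, stage
    return None, None


if __name__ == "__main__":
    crm = [{"name": "IBM"}, {"name": "Acme Corp", "domain": "acme.com"}]
    web = {"name": "International Business Machines Inc"}
    print(match_record(web, crm))
```

In this sketch, International Business Machines Inc is matched to the IBM record at the acronym stage. Acronym matching is loose and will produce false positives, which is why it comes last, and why a production system would also report the matching stage so that weaker matches can be reviewed or scored lower.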
Data quantity and accuracy lead to prediction quality. To achieve both, it is important to continuously clean CRM data and to augment it with data from across the Web. By using data mining to ensure accuracy, and by increasing the number of data points from a dozen to thousands, it is possible to drastically improve the quality of predictive lead scoring models.