Understanding which accounts are likely to progress through the sales pipeline is a crucial part of helping B2B Sales and Marketing teams make data-driven decisions and allocate their time and resources efficiently. At Demandbase, we use artificial intelligence and machine learning (AI/ML) to predict which of our customers’ accounts are likely to reach key points in the sales pipeline. This post will focus on one such application, Pipeline Predict, which scores accounts based on how likely they are to open a sales opportunity in the near future.
Predicting pipeline progression in complex B2B go-to-markets is challenging due to a variety of factors:
AI/ML allows us to create a more accurate ranking of accounts because it is:
Pipeline Predict ingests data from a variety of sources, including 1st party data such as Customer Relationship Management (CRM) systems, marketing automation platforms (MAPs), 3rd party data providers, and Demandbase’s own proprietary datasets. Moreover, the team is actively working on incorporating additional internal and external data sources with the goal of further enhancing Pipeline Predict. The main data sources are:
Demandbase’s account identification capabilities allow us to link intent, page visit, and advertising data to a specific account, pulling these datasets together to understand a single account’s behavior across the web.
For all of the activity, site visit, intent, and other data described above, the Pipeline Predict model looks at the following:
Here are some examples of important features from Demandbase’s internal Pipeline Predict models supporting Demandbase’s own marketing and sales teams:
Firmographics | The account is in the software industry |
Intent | The account is showing intent for important keywords such as “ABM platform” and “GTM strategy” |
Site Analytics | The account visited a relevant article at support.demandbase.com |
CRM Activities (Salesforce) | A VP responded to a product education campaign |
We train separate models for each of our customers, for a few important reasons:
We also give our customers the option to create multiple Pipeline Predict models, so they can generate different scores for different product lines. This means the model is able to learn which activities tend to generate pipeline for different products, where the specific weights can be different for different product lines. Here’s an example from one of Demandbase’s customers:
Product Line | Important Activities |
---|---|
Product A | CXO responded to a Salesforce campaign promoting an Analyst Report |
Product B | Director visited a relevant web page |
Product C | Manager, VP, or CXO responded to a Salesforce campaign promoting an eBook |
Product D | An employee attended a webinar |
Most Demandbase customers have hundreds of thousands or even millions of accounts in their CRMs. Only a small percentage of these accounts will open an opportunity during a given time period. To achieve good model performance, we need to choose a more balanced mix of accounts that have and have not recently opened an opportunity as our training data.
Additionally, when we are choosing accounts without a recent opportunity for training, we encounter further imbalances in the activity and intent features for these accounts, because the large majority of accounts in any CRM are unaware and unengaged. Since choosing accounts at random will primarily return accounts with zero activity or intent, we employ a stratified sampling approach so that we get examples of accounts without an opportunity that have a variety of different intent and activity features.
We have two primary goals when selecting a model architecture for Pipeline Predict:
This has led us to take an ensemble model approach:
To be useful, Pipeline Predict models need to adapt to changes in marketing and sales strategies, as well as the behavior of target accounts. Every model is therefore regularly retrained, and our customers can also retrain their models on-demand through the UI. This ensures the model has an up-to-date understanding of the specific actions that are leading to opportunity creation in the recent past.
Frequently updating the predictions for each account is critical to detecting high-priority accounts as early as possible, so we pull the latest data and make fresh predictions for every account every day.
In the past, Pipeline Predict has been focused solely on predicting new business opportunities. In January 2025, we rolled out Pipeline Predict for Growth, which adds the capability to predict which customer accounts are likely to open growth opportunities (upsells, cross-sells, renewals).
Customer accounts are very different from non-customer accounts, both in terms of the outcome we’re trying to predict (opportunity creation) and the features we’re using to make a prediction (activities and intent). The charts below show an example of the proportion of accounts with a currently open opportunity:
Customer accounts have fundamentally different behaviors compared to non-customer accounts, since they are already aware and likely very engaged. This means the model needs to learn different patterns of behavior that occur prior to opening a growth opportunity, and specifically, the customer accounts without an opportunity will tend to have a much larger volume of activity than the non-customer accounts without an opportunity.
Average activity counts for accounts in Demandbase’s CRM over 12 months:
Non-Customer Accounts | Customer Accounts | |
---|---|---|
Page Visits | 0.7 | 292.0 |
CRM & MAP Activities | 0.7 | 351.9 |
Weeks of High-strength Intent | 0.3 | 5.8 |
Intent Surge activities | 1.2 | 39.9 |
The goal of Pipeline Predict is to focus sales and marketing efforts onto the accounts most likely to open an opportunity, so the most important way we measure success is by checking whether the accounts that received a high Pipeline Predict score actually opened an opportunity soon afterward. We call this metric true precision.
True Precision = Accounts with a High Score 30 Days Ago and Opened an Opp within Last 30 Days /
Accounts with a High Score 30 Days Ago
We can also look at a benchmark to determine how good a model’s true precision is. Specifically, we want to compare the model’s true precision to what a sales rep might achieve without the model (more on this below).
Relative Lift = Model’s True Precision /
Sales Rep’s True Precision
A healthy Pipeline Predict model should meet the following criteria:
Pipeline Predict Label | # Accounts with Label | True Precision (TP) | Relative Lift |
---|---|---|---|
Highly Likely | 2,125 | 31.5% * | 2.9X |
Likely | 13,904 | 7.6% | 1.9X |
Unlikely | 1,031,194 | 0.0005% | 0.37X |
Sales Rep’s Account Ranking | Sales Rep’s True Precision (TP) | Relative Lift: Model TP vs Sales Rep’s TP |
---|---|---|
Top 2,125 accounts | 10.7% | Model TP is 2.9X Sales Rep’s TP |
Middle 13,904 accounts | 4.1% | Model TP is 1.9X Sales Rep’s TP |
Bottom 1,031,194 accounts | 0.1% | Model TP is 0.37X Sales Rep’s TP |
Sales and marketing users always want to know why a high-scoring account got a high score, so explainability is a critical piece of Pipeline Predict models.
We display the important features for each account in the Demandbase UI, using an LLM to convert model feature names into human-readable explanations. These explanations tell us why a particular account is hot right now.
We also aggregate the account-level explainability information across all accounts in the CRM, which tells our users which features are important overall. These explanations tell us what makes a hot account in general. There are examples from 2 models below:
Pipeline Predict provides a powerful AI/ML solution to understanding which accounts are likely to open an opportunity. By leveraging Demandbase’s large ecosystem of first-party and third-party data in addition to Demandbase’s proprietary datasets, we are able to construct a multi-faceted picture of each account’s journey toward pipeline. Advanced machine learning techniques uncover complex patterns of activity across multiple stakeholders at each account, and produce explainable rankings of accounts that allow marketing and sales users to focus their attention on the accounts most likely to convert. We continue to adapt and improve Pipeline Predict through additional data sources and improved modeling techniques, ensuring that it remains a valuable asset for driving pipeline for our customers.
Many people have developed and supported the Pipeline Predict project over many years. The Data Science team has continuously refined the model to improve accuracy and explainability across thousands of customers (Carmen Easterwood, Erin Oshinsky, Sukanya Tiwatne, Alex Veen, Sana Ghazi). Our Engineering team makes it possible to train hundreds of models and make predictions for well over half a billion accounts every single day (Manisha Sharma, Josh Cason, Adam Kramer). Our AI/ML product team’s strategic vision for Pipeline Predict has helped us continuously evolve this model to be more useful to our customers, including our recent rollout of Pipeline Predict for Growth (Cullen Wong, Marc Perramond). The UX team’s designs have made it possible for our users to interact with complex models and their predictions with ease (Sarah Neyaz). Finally, we would like to give special thanks to our CEO Gabe Rogol for providing valuable feedback on this post.
[1] Marketing and sales tend to be high-recall environments. That is, marketing and sales professionals cast a wide net because they don’t want to miss out on any potential deals, which leads to low precision (in a Machine Learning sense). This is why Pipeline Predict can bring so much value, because it makes the sales rep much more effective when they focus on the highest-ranked accounts.