Mon. Dec 23rd, 2024
Optimizing Treatment Strategies Using Dual Machine Learning And Linear Programming
Photo provided jordan mcdonald upon unsplash

Welcome to my series on causal AI. In this series, we discuss integrating causal inference into machine learning models. You will be expected to consider numerous practical applications across a variety of business contexts.

I looked into it in the previous article Bias removal treatment effect using double machine learning. This time, we will dig deeper into the possibilities of DML coverage. Optimize treatment strategies using double machine learning and linear programming.

If you missed the previous article on Double Machine Learning, check it out here.

In this article, we demonstrate how to optimize treatment strategies using dual machine learning and linear programming.

You are expected to gain a broad understanding of:

  • Why companies want to optimize their treatment strategies.
  • How conditional average treatment effects (CATE) can help personalize treatment strategies (also known as uplift modeling).
  • How to use linear programming to optimize treatment allocation given budget constraints.
  • A real case study in Python shows how to estimate CATE using dual machine learning and optimize treatment strategies using linear programming.

The complete notebook can be found here.

There are common questions that arise in most companies. “What is the best response for the customer to minimize costs and maximize future sales?”

Let’s explore this idea in more detail with a simple example.

Your business sells socks online. Since you’re not selling a must-have product, you need to encourage your existing customers to make repeat purchases. The main means for this is to send discounts. Therefore, the treatment strategy in this case is to send a discount.

  • 10% discount
  • 20% discount
  • 50% discount

Each discount has a different return on investment. If you recall from my previous article on average treatment effectiveness, you can see how he calculates the ATE for each of these discounts and chooses the most profitable discount.

But what if treatment effects are uneven? Treatment effects vary among different subgroups of the population.

This is when you need to start considering conditional average treatment effects (CATE).

kate

CATE is the average impact of a treatment or intervention on different subgroups of a population. ATE focused on “Is this treatment effective?” CATE, on the other hand, allows you to change the question to “Who should we treat?”

We “condition” the control function so that the treatment effect differs depending on the customer’s characteristics.

Recall the example of sending a discount. If customers with more previous orders respond better to discounts, you can condition on this customer characteristic.

I would like to point out that in marketing, CATE estimation is often referred to as uplift modeling.

CATE estimation using double machine learning

We talked about DML in a previous article, but just in case you’re wondering, here’s a little refresher.

“First stage:

  • Treatment model (bias removal): A machine learning model used to estimate the probability of treatment assignment (often called a propensity score). The residuals of the treatment model are then calculated.
  • Resulting model (denoised): A machine learning model used to estimate results using only control functions. The residuals of the resulting model are then calculated.

Second stage:

  • The treatment model residuals are used to predict the outcome model residuals. ”

Double Machine Learning can be used to estimate CATE by interacting the control feature (X) with the treatment effect in the second stage model.

User generated image

This is extremely powerful because it allows for customer-level therapeutic outcomes.

what is that?

Linear programming is an optimization technique that can be used to find an optimal solution to a linear function given some constraints. It is often used to solve transportation, scheduling, and resource allocation problems. A more general term is sometimes used: operations research.

Let’s take a closer look at linear programming with a simple example.

  • Decision variables: These are the unknown quantities you want to estimate the optimal value for: marketing spend on social media, TV, and paid search.
  • Objective function: It’s the linear equation we’re trying to minimize or maximize: marketing return on investment (ROI).
  • Constraints: Some restrictions on decision variables. It is usually expressed as a linear inequality. Total marketing costs range between £100,000 and £500,000.

The intersection of all constraints forms the feasible region. This is the set of all possible solutions that satisfy the specified constraints. The goal of linear programming is to find a point within the feasibility region that optimizes the objective function.

Assignment problem

An assignment problem is a particular type of linear programming problem where the objective is to assign a set of “tasks” to a set of “agents”. Let’s explain it concretely using an example.

Run an experiment in which you send different discounts to four random groups of existing customers (and you don’t actually send any discounts to the fourth group). Build two her CATE models. (1) Estimate how offer value affects order value, and (2) Estimate how offer value affects cost.

  • Agent: Existing customer base
  • Task: Whether to send a 10%, 20%, or 50% discount
  • Decision variable: Binary decision variable
  • Objective function: Order amount minus costs
  • Constraint 1: Each agent is assigned at most one task
  • Constraint 2: Cost ≥ 10,000
  • Constraint 3: Cost ≤ £100,000
User generated image

We basically want to find the best treatment for each customer, given the overall cost constraints. Linear programming can help you do this.

Note that this problem is “NP-hard,” a classification of problems that is at least as hard as the hardest problems in NP (nondeterministic polynomial time).

Linear programming is a very difficult but rewarding topic. Here are some ideas to get you started. If you would like to learn more, we recommend the following resources:

OR tool

The OR tool is an open source package developed by Google that can solve a variety of linear programming problems, including assignment problems. I’ll explain how it actually works later in this article.

background

Continuing with the assignment problem example, we’ll show you how to solve it in Python.

Data generation process

Set up a data generation process with the following characteristics:

  • Difficult nuisance parameters (b)
  • Heterogeneity of treatment effects (tau)

Characteristics X are customer characteristics taken before treatment.

User generated image

T is a binary flag indicating whether the customer received the offer. Create three different treatment interactions so that you can simulate different treatment effects.

User generated image
def data_generator(tau_weight, interaction_num):

# Set number of observations
n=10000

# Set number of features
p=10

# Create features
X = np.random.uniform(size=n * p).reshape((n, -1))

# Nuisance parameters
b = (
np.sin(np.pi * X[:, 0] * X[:, 1])
+ 2 * (X[:, 2] - 0.5) ** 2
+ X[:, 3]
+ 0.5 * X[:, 4]
+ X[:, 5] * X[:, 6]
+ X[:, 7] ** 3
+ np.sin(np.pi * X[:, 8] * X[:, 9])
)

# Create binary treatment
T = np.random.binomial(1, expit(b))

# treatment interactions
interaction_1 = X[:, 0] * X[:, 1] + X[:, 2]
interaction_2 = X[:, 3] * X[:, 4] + X[:, 5]
interaction_3 = X[:, 6] * X[:, 7] + X[:, 9]

# Set treatment effect
if interaction_num==1:
tau = tau_weight * interaction_1
elif interaction_num==2:
tau = tau_weight * interaction_2
elif interaction_num==3:
tau = tau_weight * interaction_3

# Calculate outcome
y = b + T * tau + np.random.normal(size=n)

return X, T, tau, y

You can use the data generator to simulate three treatments, each with a different treatment effect.

User generated image
np.random.seed(123)

# Generate samples for 3 different treatments
X1, T1, tau1, y1 = data_generator(0.75, 1)
X2, T2, tau2, y2 = data_generator(0.50, 2)
X3, T3, tau3, y3 = data_generator(0.90, 3)

As in the previous article, the Python code for the data generation process is based on the synthetic data creator from the Ubers Causal ML package.

Estimating CATE using DML

Next, you’ll train three DML models using LightGBM as a flexible first-stage model. This allows difficult nuisance parameters to be captured while accurately calculating treatment effects.

Note how we pass the X functionality through X instead of W (unlike in the previous article where we passed the X functionality through W). Features that pass through X are used in both the first and second stage models. In the second stage model, the features are used to create an interaction term with the treatment residuals.

np.random.seed(123)

# Train DML model using flexible stage 1 models
dml1 = LinearDML(model_y=LGBMRegressor(), model_t=LGBMClassifier(), discrete_treatment=True)
dml1.fit(y1, T=T1, X=X1, W=None)

# Train DML model using flexible stage 1 models
dml2 = LinearDML(model_y=LGBMRegressor(), model_t=LGBMClassifier(), discrete_treatment=True)
dml2.fit(y2, T=T2, X=X2, W=None)

# Train DML model using flexible stage 1 models
dml3 = LinearDML(model_y=LGBMRegressor(), model_t=LGBMClassifier(), discrete_treatment=True)
dml3.fit(y3, T=T3, X=X3, W=None)

Plotting the actual CATE versus the estimated CATE shows that the model is doing a good job.

# Create a figure and subplots
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot scatter plots on each subplot
sns.scatterplot(x=dml1.effect(X1), y=tau1, ax=axes[0])
axes[0].set_title('Treatment 1')
axes[0].set_xlabel('Estimated CATE')
axes[0].set_ylabel('Actual CATE')

sns.scatterplot(x=dml2.effect(X2), y=tau2, ax=axes[1])
axes[1].set_title('Treatment 2')
axes[1].set_xlabel('Estimated CATE')
axes[1].set_ylabel('Actual CATE')

sns.scatterplot(x=dml3.effect(X3), y=tau3, ax=axes[2])
axes[2].set_title('Treatment 3')
axes[2].set_xlabel('Estimated CATE')
axes[2].set_ylabel('Actual CATE')

# Add labels to the entire figure
fig.suptitle('Actual vs Estimated')

# Show plots
plt.show()

User generated image

simple optimization

We start by considering this as an optimization problem. We have three treatments available for our customers. Below, we create a cost mapping for each treatment and set an overall cost constraint.

# Create mapping for cost of each treatment
cost_dict = {'T1': 0.1, 'T2': 0.2, 'T3': 0.3}

# Set constraints
max_cost = 3000

It then estimates each customer’s CATE and first selects the most appropriate treatment for each customer. However, selecting the optimal treatment method does not always fit within the maximum cost constraint. Therefore, we select the customers with the highest CATE until we reach the maximum cost constraint.

# Concatenate features
X = np.concatenate((X1, X2, X3), axis=0)

# Estimate CATE for each treatment using DML models
Treatment_1 = dml1.effect(X)
Treatment_2 = dml2.effect(X)
Treatment_3 = dml3.effect(X)
cate = pd.DataFrame({"T1": Treatment_1, "T2": Treatment_2, "T3": Treatment_3})

# Select the best treatment for each customer
best_treatment = cate.idxmax(axis=1)
best_value = cate.max(axis=1)

# Map cost for each treatment
best_cost = pd.Series([cost_dict[value] for value in best_treatment])

# Create dataframe with each customers best treatment and associated cost
best_df = pd.concat([best_value, best_cost], axis=1)
best_df.columns = ["value", "cost"]
best_df = best_df.sort_values(by=['value'], ascending=False).reset_index(drop=True)

# Naive optimisation
best_df_cum = best_df.cumsum()
opt_index = best_df_cum['cost'].searchsorted(max_cost)
naive_order_value = round(best_df_cum.iloc[opt_index]['value'], 0)
naive_cost_check = round(best_df_cum.iloc[opt_index]['cost'], 0)

print(f'The total order value from the naive treatment strategy is {naive_order_value} with a cost of {naive_cost_check}')

User generated image

Optimization of treatment strategies using linear programming

First, create a dataframe containing the cost of each treatment for each customer.

# Cost mapping for all treatments
cost_mapping = {'T1': [cost_dict["T1"]] * 30000,
'T2': [cost_dict["T2"]] * 30000,
'T3': [cost_dict["T3"]] * 30000}

# Create DataFrame
df_costs = pd.DataFrame(cost_mapping)

Now we will use the OR tool package to solve this assignment problem. The code takes the following input:

  • cost constraints
  • An array containing the cost of each treatment for each customer
  • An array containing the estimated order amount for each treatment for each customer

This code outputs a dataframe containing potential treatments for each customer, and a column indicating which is the optimal allocation.

solver = pywraplp.Solver.CreateSolver('SCIP')

# Set constraints
max_cost = 3000
min_cost = 3000

# Create input arrays
costs = df_costs.to_numpy()
order_value = cate.to_numpy()

num_custs = len(costs)
num_treatments = len(costs[0])

# x[i, j] is an array of 0-1 variables, which will be 1 if customer i is assigned to treatment j.
x = {}
for i in range(num_custs):
for j in range(num_treatments):
x[i, j] = solver.IntVar(0, 1, '')

# Each customer is assigned to at most 1 treatment.
for i in range(num_custs):
solver.Add(solver.Sum([x[i, j] for j in range(num_treatments)]) <= 1)

# Cost constraints
solver.Add(sum([costs[i][j] * x[i, j] for j in range(num_treatments) for i in range(num_custs)]) <= max_cost)
solver.Add(sum([costs[i][j] * x[i, j] for j in range(num_treatments) for i in range(num_custs)]) >= min_cost)

# Objective
objective_terms = []
for i in range(num_custs):
for j in range(num_treatments):
objective_terms.append((order_value[i][j] * x[i, j] - costs[i][j] * x[i, j] ))
solver.Maximize(solver.Sum(objective_terms))

# Solve
status = solver.Solve()

assignments = []
values = []

if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE:
for i in range(num_custs):
for j in range(num_treatments):
# Test if x[i,j] is 1 (with tolerance for floating point arithmetic).
if x[i, j].solution_value() > -0.5:
assignments.append([i, j])
values.append([x[i, j].solution_value(), costs[i][j] * x[i, j].solution_value(), order_value[i][j]])

# Create a DataFrame from the collected data
df = pd.DataFrame(assignments, columns=['customer', 'treatment'])
df['assigned'] = [x[0] for x in values]
df['cost'] = [x[1] for x in values]
df['order_value'] = [x[2] for x in values]

df

User generated image

Using an optimized treatment strategy, an order value of £18,000 can be generated while maintaining a cost constraint of £3,000. This is 36% higher than the naive approach.

opt_order_value = round(df['order_value'][df['assigned'] == 1].sum(), 0)
opt_cost_check = round(df['cost'][df['assigned'] == 1].sum(), 0)

print(f'The total order value from the optimised treatment strategy is {opt_order_value} with a cost of {opt_cost_check}')

User generated image

Today we talked about how to use dual machine learning and linear programming to optimize treatment strategies. Finally, here are some thoughts:

  • Although we have discussed linear DML, you can also consider other approaches that are better suited to handle complex interaction effects in your second-stage model.
  • However, please also note that you don’t have to use DML, you can use other methods such as T-Learner or DR-Learner.
  • To make this article easier to read, I did not adjust any hyperparameters. As the complexity of the problem and the approach used increases, more attention needs to be paid to this part.
  • Linear programming/assignment problems are NP-hard, so if you have a large customer base or multiple processes, this part of your code can take a long time to run.
  • Operating daily pipelines with linear programming and assignment problems can be difficult. Alternatively, you can run regular optimizations and learn the best policy based on the results to create segmentations for use in your daily pipeline.