
Unleash the Power of Probability to Predict the Future of Your Business 🚀

A Practical Guide to Applying Probability Concepts with Python in Real-World Contexts


Introduction

Tired of Guessing About The Future? 🤔

You’re in the right place! My name is Sabrine. I’m an applied mathematics engineer who has worked in AI for 10 years, and early in my career I struggled to bring the power of probability into the real world. Managers like charts, graphs, and KPIs, but they rarely ask for probability to answer business questions.

I’ve always found that the probability we learn in school is underused, or used only indirectly through ML concepts. Yet probability can be a powerful tool for answering business questions.

You don’t believe me? Let me show you how 🔐.

Today’s Menu 🍔

  • 🍛 Fundamental Probability Concepts: Events, intersections, unions, independent events, conditional probabilities, Bayes’ Theorem, the famous Normal distribution.
  • 🥤 Practical Python Implementations: Hands-on examples using Python to simulate real-world business scenarios.
  • 🍰 Industry Applications: Explore how probability models drive decisions in risk management, supply chain, and marketing.

Before jumping into probability, you may want to review some important statistical concepts. Two of my earlier articles can help: one on basic statistics concepts and one on confidence intervals.

1. Understanding Probability in Business 🎯

What Is Probability Theory?

Probability theory helps us make sense of uncertainty. It lets businesses figure out how likely different outcomes are, so they can make smarter decisions.

We can’t erase uncertainty, but we can predict it.

Some vocabulary:

  • Events: Occurrences that we can measure.
  • Probability Distribution: A function that describes the likelihood of different occurrences.
  • Random Variable: A variable whose values depend on the outcomes of a random phenomenon.
  • Bayesian Probability: A method of statistical inference that updates the probability estimate as more evidence becomes available.

Why Probability Matters for Businesses

  • Risk Assessment: Quantify risks and their impacts to make decisions.
  • Forecasting: Predict future trends and manage the uncertainty around the prediction.
  • Strategic Planning: Develop robust strategies to handle the variability of a specific market.

2. Fundamental Probability Concepts 🧠

2.1. Defining Probabilities 😎

Probability quantifies the likelihood of an event occurring. It ranges from 0 (impossible event) to 1 (certain event). Understanding probabilities helps you anticipate outcomes and make decisions that account for various levels of uncertainty.

Formula:

P(Event) = Number of favorable outcomes / Total number of outcomes

Scenario:

Imagine you want to determine the probability that a new product launch will be successful based on historical data.

# Calculate the probability of a successful product launch
total_launches = 50
successful_launches = 30

P_success = successful_launches / total_launches
print(f"Probability of a successful product launch: {P_success:.2%}")

Output:

Probability of a successful product launch: 60.00%

Explanation:

With a 60% probability of success based on past launches, you can get an idea of the risk and potential return of investing in a similar new product. What do you think? Let’s keep going!

2.2. The Counting Principle ✌️

The Counting Principle helps determine the number of possible outcomes in a sequence of events. It’s essential when you need to calculate probabilities for multi-step experiments.

Formula:

Total number of outcomes = n1 × n2 × … × nk

where n1, n2, …, nk are the numbers of possible outcomes for each event.

Scenario:

Imagine that you work for a marketing team, and your manager wants you to determine how many different promotional bundles are possible when each bundle contains our three distinct products and each product comes in two variants.

# Calculate the number of possible promotional bundles
products = 3
variants_per_product = 2

total_bundles = variants_per_product ** products
print(f"Total number of promotional bundles: {total_bundles}")

Output:

Total number of promotional bundles: 8

Explanation:

There are 8 possible promotional bundles. We can now evaluate which combinations might appeal most to our customers. Imagine you have thousands of products and variants. This analysis is crucial for allocating the budget among the different bundles 💸
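To make the counting principle concrete, here is a minimal sketch that lists every bundle instead of only counting them. The product and variant names are hypothetical, chosen purely for illustration; the only assumption carried over from the scenario is that a bundle contains one variant of each of the three products.

```python
from itertools import product

# Hypothetical product and variant names, for illustration only
variants = {
    "Shampoo": ["Small", "Large"],
    "Conditioner": ["Small", "Large"],
    "Soap": ["Classic", "Scented"],
}

# One bundle = one variant chosen for each product
bundles = list(product(*variants.values()))

print(f"Number of bundles: {len(bundles)}")
for bundle in bundles:
    print(dict(zip(variants.keys(), bundle)))
```

Enumerating is only practical for small catalogs. With thousands of products the count grows too fast to list, which is where the formula becomes the useful tool.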

3. Understanding Events

3.1. Introduction to Events 🏟️

In probability, an event is a set of one or more outcomes of an experiment. Events can be:

  • Simple Events: Consisting of a single outcome.
  • Compound Events: Consisting of multiple outcomes.

To calculate probabilities accurately, it’s essential to understand what events are. Let’s dive in!

3.2. Intersection of Events ❎

The intersection of two events A and B (denoted as A∩B) refers to the occurrence of both events simultaneously.

Formula:

P(A∩B) = Number of outcomes where both A and B occur / Total number of outcomes

Scenario:

The e-commerce platform we work for wants us to find the probability that a customer makes a purchase and uses a discount code in the same transaction.

# Calculate the intersection probability of making a purchase and using a discount code
total_transactions = 10000
purchase_and_discount = 1500

P_purchase_and_discount = purchase_and_discount / total_transactions
print(f"Probability of making a purchase and using a discount code: {P_purchase_and_discount:.2%}")

Output:

Probability of making a purchase and using a discount code: 15.00%

Explanation:

We can see that 15% of transactions involve both purchasing and using a discount code. Interesting, right? How can we use this to boost sales?

3.3. Union of Events 🤝🏿

The union of two events A and B (denoted as A∪B) refers to the occurrence of at least one of the events.

Formula:

P(A∪B) = P(A) + P(B) − P(A∩B)

Scenario:

Our software company needs to know the probability that a customer either renews their subscription or upgrades to a premium plan.

# Calculate the union probability of renewing subscription or upgrading to premium
P_renew = 0.60  # 60% renew subscription
P_upgrade = 0.30  # 30% upgrade to premium
P_renew_and_upgrade = 0.20  # 20% do both
P_renew_or_upgrade = P_renew + P_upgrade - P_renew_and_upgrade
print(f"Probability of renewing or upgrading: {P_renew_or_upgrade:.2%}")

Output:

Probability of renewing or upgrading: 70.00%

Explanation:

There is a 70% probability that our customers will either renew their subscription or upgrade to a premium plan. This signifies a strong retention and upselling strategy. Maybe it could be interesting to compare this number to our competitors?
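Why subtract the intersection? Customers who both renew and upgrade would otherwise be counted twice. The sketch below checks this on a tiny set of hypothetical customer IDs (invented for illustration) whose proportions match the scenario: the union counted directly agrees with the formula.

```python
# Hypothetical customer IDs, for illustration only
all_customers = set(range(1, 11))   # 10 customers in total
renewed = {1, 2, 3, 4, 5, 6}        # 60% renew
upgraded = {5, 6, 7}                # 30% upgrade, customers 5 and 6 do both

n = len(all_customers)
p_renew = len(renewed) / n
p_upgrade = len(upgraded) / n
p_both = len(renewed & upgraded) / n     # intersection
p_either = len(renewed | upgraded) / n   # union, counted directly

# The union formula should match the direct count
p_either_formula = p_renew + p_upgrade - p_both

print(f"Direct count: {p_either:.2%}")
print(f"Formula:      {p_either_formula:.2%}")
```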

4. Independent and Dependent Events

4.1. Independent Events 🔗

Independent events are events where the occurrence of one doesn’t affect the probability of the other occurring.

Formula:

Assuming these events are independent.

P(A∩B) = P(A) × P(B)

Where:

  • P(A∩B) is the probability of both events A and B occurring.
  • P(A) is the probability of event A occurring.
  • P(B) is the probability of event B occurring.

Scenario:

A telecom company wants to calculate the probability that a customer renews their contract and also purchases an additional service, assuming these events are independent.

Given:

  • P(Renew)=0.40 (40% of customers renew their contract)
  • P(Additional Service)=0.50 (50% of customers purchase an additional service)

Calculations:

P(Renew AND Additional Service) = P(Renew) × P(Additional Service) = 0.40 × 0.50 = 0.20

So, there’s a 20% probability that a customer will both renew their contract and purchase an additional service.

# Given probabilities
P_renew = 0.40  # 40% renew contract
P_additional_service = 0.50  # 50% purchase additional service

# Calculate the probability of both events occurring
P_renew_and_additional = P_renew * P_additional_service

print(f"Probability of renewing and purchasing additional service: {P_renew_and_additional:.2%}")

# Since the events are assumed to be independent
are_independent = True
print("Are the two events independent?", are_independent)

Output:

Probability of renewing and purchasing additional service: 20.00%
Are the two events independent? True

Explanation:

A 20% probability that a customer will both renew their contract and purchase an additional service suggests that these two behaviors occur together at that rate, assuming they are independent. The company can develop separate marketing campaigns for contract renewals and for promoting additional services, as one does not affect the other.
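In practice, independence is an assumption worth checking against data rather than taking for granted. Here is a minimal sketch, using hypothetical customer counts invented for illustration, that compares the observed joint probability with what independence would predict. The ratio of the two is often called lift.

```python
# Hypothetical customer counts, for illustration only
total_customers = 1000
renewed = 400
bought_service = 500
renewed_and_bought = 260

p_renew = renewed / total_customers
p_service = bought_service / total_customers
p_both_observed = renewed_and_bought / total_customers
p_both_if_independent = p_renew * p_service

# Lift > 1: the two behaviors occur together more often than independence predicts
lift = p_both_observed / p_both_if_independent

print(f"Observed P(Renew AND Service):  {p_both_observed:.2%}")
print(f"Expected under independence:    {p_both_if_independent:.2%}")
print(f"Lift: {lift:.2f}")
```

With these made-up counts the lift is above 1, so separate campaigns would miss a link between the two behaviors. On real data, a formal test such as a chi-square test of independence would be needed before drawing conclusions, since small differences can come from sampling noise.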

4.2. Independent vs. Mutually Exclusive Events 🤼‍♂️

  • Independent Events: The occurrence of one event does not affect the probability of the other.
  • Mutually Exclusive Events: Two events cannot occur at the same time; the occurrence of one event excludes the possibility of the other.

It’s important to differentiate between these two to avoid confusion in probability calculations.

Formula:

For mutually exclusive events A and B:

P(A∩B) = 0

And for their union:

P(A∪B) = P(A) + P(B)

since P(A∩B) = 0 for mutually exclusive events.

Scenario:

Suppose a customer can choose to receive either a 10% discount coupon or a free gift with their purchase, but not both. These two offers are mutually exclusive.

Given:

  • Probability of receiving a discount coupon: P(Discount)=0.60
  • Probability of receiving a free gift: P(Gift)=0.30

Calculations:

Since the offers are mutually exclusive:

  • P(Discount AND Gift) = 0
  • P(Discount OR Gift) = P(Discount) + P(Gift) = 0.60 + 0.30 = 0.90

There’s a 90% probability that a customer will receive either a discount coupon or a free gift.

# Calculate the probability of receiving either a discount or a gift
P_discount = 0.60   # 60% chance of receiving a discount coupon
P_gift = 0.30       # 30% chance of receiving a free gift

# Since the events are mutually exclusive
P_either_offer = P_discount + P_gift
print(f"Probability of receiving either a discount or a gift: {P_either_offer:.2%}")

Output:

Probability of receiving either a discount or a gift: 90.00%

Explanation:

A 90% probability means most customers receive an incentive, which can boost sales. Understanding these probabilities helps in planning inventory and managing promotion costs.
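A common source of confusion is treating mutually exclusive events as independent. The short check below, reusing the scenario's probabilities, shows they are not: if the two offers were independent, some customers would receive both, but mutual exclusivity forces that probability to zero.

```python
from math import isclose

P_discount = 0.60
P_gift = 0.30

# Mutually exclusive: both offers can never occur together
P_both_actual = 0.0

# What independence would require
P_both_if_independent = P_discount * P_gift

are_independent = isclose(P_both_actual, P_both_if_independent)

print(f"P(Discount AND Gift), actual:         {P_both_actual:.2%}")
print(f"P(Discount AND Gift), if independent: {P_both_if_independent:.2%}")
print("Are the two events independent?", are_independent)
```

Knowing that a customer received the coupon tells you with certainty that they did not receive the gift, which is about as dependent as two events can be.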

5. Key Probability Theorems

5.1. Conditional Probabilities 🥕

Conditional probability is the probability of an event occurring given that another event has already occurred.

Formula:

P(A∣B) = P(A∩B) / P(B)

where P(A∣B) is the probability of event A occurring given event B has occurred.

Scenario:

Suppose you’re a retailer wanting to determine the probability that a customer will purchase a warranty plan given that they have already purchased a laptop.

# Calculate conditional probability of purchasing warranty given laptop purchase
P_laptop = 0.30  # 30% purchase a laptop
P_warranty_and_laptop = 0.10  # 10% purchase both laptop and warranty
P_warranty_given_laptop = P_warranty_and_laptop / P_laptop
print(f"Probability of purchasing warranty given laptop purchase: {P_warranty_given_laptop:.2%}")

Output:

Probability of purchasing warranty given laptop purchase: 33.33%

Explanation:

One-third of customers who purchase a laptop also opt for a warranty plan. Maybe we have to think about a strategy to increase warranty uptake.
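One point worth illustrating is that conditional probability is not symmetric: P(Warranty∣Laptop) and P(Laptop∣Warranty) answer different questions. The sketch below reuses the scenario's numbers and adds one hypothetical value, the overall share of customers who buy a warranty (`P_warranty`), which is an assumption made for illustration.

```python
P_laptop = 0.30               # from the scenario
P_warranty_and_laptop = 0.10  # from the scenario
P_warranty = 0.15             # hypothetical: share of all customers buying a warranty

# "Among laptop buyers, how many take a warranty?"
P_warranty_given_laptop = P_warranty_and_laptop / P_laptop

# "Among warranty buyers, how many bought a laptop?"
P_laptop_given_warranty = P_warranty_and_laptop / P_warranty

print(f"P(Warranty | Laptop): {P_warranty_given_laptop:.2%}")
print(f"P(Laptop | Warranty): {P_laptop_given_warranty:.2%}")
```

The numerator is the same in both cases; only the group you condition on changes. Bayes’ Theorem is the tool that links the two directions.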

5.2. Bayes’ Theorem 🍍

Bayes’ Theorem allows you to update prior probabilities based on new evidence. It’s particularly useful when dealing with conditional probabilities.

Formula:

P(A∣B) = P(B∣A) × P(A) / P(B)

where:

  • P(A∣B) is the posterior probability of event A given event B.
  • P(B∣A) is the likelihood of event B given event A.
  • P(A) is the prior probability of event A.
  • P(B) is the marginal probability of event B.

Scenario:

A bank wants to determine the probability that a loan applicant is a defaulter given that they missed a payment.

# Define known probabilities
P_default = 0.05  # 5% of applicants default
P_missed_payment_given_default = 0.80  # 80% of defaulters miss a payment
P_missed_payment_given_non_default = 0.10  # 10% of non-defaulters miss a payment
P_non_default = 1 - P_default

# Calculate P(missed_payment)
P_missed_payment = (P_missed_payment_given_default * P_default) + (P_missed_payment_given_non_default * P_non_default)
# Apply Bayes' Theorem
P_default_given_missed_payment = (P_missed_payment_given_default * P_default) / P_missed_payment
print(f"Probability of default given missed payment: {P_default_given_missed_payment:.2%}") 

Output:

Probability of default given missed payment: 29.63%

Explanation:

A single missed payment raises the estimated default risk from the 5% prior to roughly 30%. The bank can adjust its risk assessment strategies accordingly, which allows for more informed lending decisions.
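If the formula feels abstract, the same result can be reached by counting people. The sketch below applies the scenario's rates to a hypothetical pool of 10,000 applicants (the pool size is an arbitrary choice for illustration) and asks what share of those who missed a payment are defaulters.

```python
applicants = 10_000  # hypothetical pool size, for illustration only

defaulters = round(applicants * 0.05)                     # 500 applicants default
non_defaulters = applicants - defaulters                  # 9,500 do not

defaulters_who_missed = round(defaulters * 0.80)          # 400 defaulters miss a payment
non_defaulters_who_missed = round(non_defaulters * 0.10)  # 950 non-defaulters miss a payment

all_who_missed = defaulters_who_missed + non_defaulters_who_missed

# Share of defaulters among everyone who missed a payment
share_default = defaulters_who_missed / all_who_missed

print(f"Applicants who missed a payment: {all_who_missed}")
print(f"Of whom defaulters: {defaulters_who_missed}")
print(f"P(default | missed payment): {share_default:.2%}")
```

Most people who miss a payment are not defaulters, simply because non-defaulters are far more numerous. That is why the posterior stays well below the 80% figure.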

6. Concepts to Understand Distributions

6.1 Equiprobable vs. Non-Equiprobable Outcomes 🙉

  • Equiprobable Outcomes: All possible outcomes have the same probability of occurring (e.g., fair dice roll).
  • Non-Equiprobable Outcomes: Outcomes have different probabilities based on biases or external factors.

Formula:

  • Equiprobable: P(E) = 1/n, where n is the number of possible outcomes.
  • Non-Equiprobable: Each outcome i has its own probability P(Ei).

Scenario:

Your marketing team wants to analyze customer responses across different advertising channels, each with a different effectiveness rate.

# Define probabilities for marketing channels
marketing_channels = ['Email', 'Social Media', 'TV', 'Radio']
probabilities = [0.40, 0.30, 0.20, 0.10]

# Display probabilities
for channel, prob in zip(marketing_channels, probabilities):
    print(f"Probability of response from {channel}: {prob:.2%}")

Output:

Probability of response from Email: 40.00%
Probability of response from Social Media: 30.00%
Probability of response from TV: 20.00%
Probability of response from Radio: 10.00%

Explanation:

Knowing that Email campaigns have a 40% response rate compared to Radio’s 10% helps you allocate budgets more effectively. Let’s focus on the channels with higher probabilities to maximize ROI.

6.2. Discrete vs. Continuous Variables 🌋

  • Discrete Variables: Can take on a countable number of distinct values (e.g., number of purchases).
  • Continuous Variables: Can take on an infinite number of values within a given range (e.g., revenue amounts).

Identifying whether a variable is discrete or continuous helps in selecting appropriate probability distributions.
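The practical difference shows up in the questions you can ask. For a discrete variable, "what is the probability of exactly 3?" has a meaningful answer; for a continuous one, you ask about ranges. Here is a minimal sketch using only the Python standard library. All parameters (10 visitors, a 20% purchase probability, revenue following a normal distribution with mean 200 and standard deviation 50) are assumptions chosen for illustration.

```python
from math import comb
from statistics import NormalDist

# Discrete: number of purchases among 10 visitors, each buying with
# probability 0.20 (hypothetical values), modeled as a binomial variable
n_visitors, p_buy = 10, 0.20
p_exactly_3 = comb(n_visitors, 3) * p_buy**3 * (1 - p_buy) ** (n_visitors - 3)

# Continuous: revenue per order, modeled as Normal(mean=200, sd=50)
revenue = NormalDist(mu=200, sigma=50)
p_between = revenue.cdf(250) - revenue.cdf(150)

print(f"P(exactly 3 purchases): {p_exactly_3:.2%}")
print(f"P(revenue between $150 and $250): {p_between:.2%}")
```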

6.3. Normal Distribution ⚖️

The Normal Distribution is a continuous probability distribution characterized by its symmetric, bell-shaped curve. It’s defined by its mean (μ) and standard deviation (σ). Many natural phenomena and measurement errors tend to follow a normal distribution, making it widely applicable in business analytics.

Formula:

Probability Density Function of Normal Distribution:

f(x) = 1 / (σ√(2π)) × exp(−(x − μ)² / (2σ²))

Scenario:

Your company wants to analyze the distribution of customer spending to identify typical spending ranges and detect anomalies.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm

# Parameters for normal distribution
mu, sigma = 200, 50  # mean and standard deviation
# Generate data
data = np.random.normal(mu, sigma, 1000)
# Plot the distribution
sns.histplot(data, bins=30, kde=True, stat="density", color='skyblue')
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, sigma)
plt.plot(x, p, 'r', linewidth=2)
plt.title('Normal Distribution of Customer Spending')
plt.xlabel('Amount Spent ($)')
plt.ylabel('Density')
plt.show()

Output:

[Figure: histogram of the simulated customer spending with the normal density curve overlaid. By the author]

Explanation:

We can see that on average, our customers spend $200. With a mean of $200 and a standard deviation of $50, about 95% of customers spend between $100 and $300 (mean ± 2σ).
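The mean ± 2σ rule of thumb can be checked directly from the distribution, and the same object gives a cutoff for flagging unusually high spenders. This is a minimal sketch with the standard library's `statistics.NormalDist`, assuming spending really is normal with the parameters above. The 99th percentile used as an anomaly cutoff is an arbitrary choice for illustration.

```python
from statistics import NormalDist

spending = NormalDist(mu=200, sigma=50)

# Probability of spending between $100 and $300 (mean ± 2σ)
p_typical = spending.cdf(300) - spending.cdf(100)

# Spending level exceeded by only 1% of customers (hypothetical anomaly cutoff)
anomaly_threshold = spending.inv_cdf(0.99)

print(f"P($100 < spend < $300): {p_typical:.2%}")
print(f"99th percentile of spending: ${anomaly_threshold:.2f}")
```

The exact coverage of ± 2σ is slightly above 95%; an interval covering exactly 95% uses about ± 1.96σ.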

7. Practical Applications of Probability in Business

7.1. Risk Assessment and Management

Risk assessment involves identifying, analyzing, and prioritizing potential risks that could negatively impact business operations.

Formula:

Expected Risk = Probability of the risk × Impact of the risk

Scenario:

A project manager assesses the risk of delays in a project based on historical data.

# Define risks with their probabilities and impacts
risks = {
    'Supplier Delay': {'probability': 0.2, 'impact': 10000},
    'Labor Shortage': {'probability': 0.1, 'impact': 15000},
    'Technical Glitch': {'probability': 0.15, 'impact': 12000},
}

# Calculate expected risk for each
for risk, details in risks.items():
    expected_risk = details['probability'] * details['impact']
    print(f"Expected risk from {risk}: ${expected_risk:.2f}")

Output:

Expected risk from Supplier Delay: $2000.00
Expected risk from Labor Shortage: $1500.00
Expected risk from Technical Glitch: $1800.00

Explanation:

By calculating the expected risks, we can prioritize which risks to address first based on their potential financial impact. This probabilistic approach ensures efficient resource allocation.
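To turn the individual figures into a priority list, the risks can be sorted by expected loss and summed into a total exposure. The sketch below reuses the scenario's risk table; the variable names are illustrative.

```python
risks = {
    'Supplier Delay': {'probability': 0.2, 'impact': 10000},
    'Labor Shortage': {'probability': 0.1, 'impact': 15000},
    'Technical Glitch': {'probability': 0.15, 'impact': 12000},
}

# Expected loss per risk = probability × impact
expected_losses = {
    name: details['probability'] * details['impact']
    for name, details in risks.items()
}

# Highest expected loss first
ranked = sorted(expected_losses.items(), key=lambda item: item[1], reverse=True)
total_expected_loss = sum(expected_losses.values())

for rank, (name, loss) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: ${loss:,.2f}")
print(f"Total expected loss: ${total_expected_loss:,.2f}")
```

Note that Labor Shortage has the largest impact but the smallest expected loss, because it is the least likely. The total is valid whether or not the risks are independent, since expected values always add up.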

7.2. Inventory Management and Supply Chain Optimization

Effective inventory management ensures that businesses maintain optimal stock levels to meet customer demand without overstocking or understocking.

Formula:

Reorder Point = (Average Demand × Lead Time) + Safety Stock

where:

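Safety Stock = Z × σ

Z is the Z-score corresponding to the desired service level and σ is the standard deviation of demand during lead time. When the standard deviation is known per period, as in the code below, σ = (standard deviation per period) × √(lead time), which assumes demand is independent from one period to the next.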

Scenario:

A retailer wants to determine the reorder point for a popular product.

import numpy as np
from scipy.stats import norm

# Parameters
average_demand = 50  # units per week
lead_time = 2  # weeks
standard_deviation_demand = 10  # units per week
service_level = 0.95  # 95% service level
# Calculate safety stock
z_score = norm.ppf(service_level)
safety_stock = z_score * np.sqrt(lead_time) * standard_deviation_demand
# Calculate reorder point
reorder_point = (average_demand * lead_time) + safety_stock
print(f"Safety Stock: {safety_stock:.2f} units")
print(f"Reorder Point: {reorder_point:.2f} units")

Output:

Safety Stock: 23.26 units
Reorder Point: 123.26 units

Explanation:

With a reorder point of approximately 123 units, you ensure that new stock is ordered before existing inventory runs out, maintaining a 95% service level. Let’s keep those customers happy by ensuring product availability!
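A quick way to sanity-check a reorder point is to ask the reverse question: given that stock level, what is the probability that demand during the lead time does not exceed it? The sketch below recomputes the reorder point with the standard library and verifies that it delivers the target service level. It assumes, as the formula does, that weekly demand is normally distributed and independent from week to week.

```python
from math import sqrt
from statistics import NormalDist

average_demand = 50             # units per week
lead_time = 2                   # weeks
standard_deviation_demand = 10  # units per week
service_level = 0.95

# Same calculation as above, using the standard library
z_score = NormalDist().inv_cdf(service_level)
sigma_lead_time = standard_deviation_demand * sqrt(lead_time)
safety_stock = z_score * sigma_lead_time
reorder_point = average_demand * lead_time + safety_stock

# Demand over the whole lead time, under the normality assumption
lead_time_demand = NormalDist(mu=average_demand * lead_time, sigma=sigma_lead_time)
p_no_stockout = lead_time_demand.cdf(reorder_point)

print(f"Safety stock: {safety_stock:.2f} units")
print(f"Reorder point: {reorder_point:.2f} units")
print(f"P(demand during lead time <= reorder point): {p_no_stockout:.2%}")
```

If real demand is skewed or correlated across weeks, the achieved service level will differ from the target, so it is worth comparing against historical stockouts.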

7.3. Marketing and Customer Behavior Analysis

Understanding customer behavior is crucial for effective marketing strategies. Probability models help analyze customer preferences and predict purchasing patterns.

Scenario:

A subscription-based service wants to calculate the expected Customer Lifetime Value (CLV) to inform marketing spend.

Formula:

CLV = Σ (t = 1 to T) [ Pt × Gt / (1 + d)^t ]

where:

  • Pt​ is the probability of the customer being active at time t.
  • Gt​ is the gross profit at time t.
  • d is the discount rate.
  • T is the time horizon.

# Calculate Customer Lifetime Value (CLV)
import numpy as np
# Parameters
prob_active = 0.90  # 90% probability of staying active each month
gross_profit = 50  # $50 profit per active customer per month
discount_rate = 0.01  # 1% monthly discount rate
time_horizon = 24  # 2 years
# Calculate CLV
clv = sum([(prob_active ** t) * gross_profit / ((1 + discount_rate) ** t) for t in range(1, time_horizon + 1)])
print(f"Customer Lifetime Value (CLV): ${clv:.2f}")

Output:

Customer Lifetime Value (CLV): $383.39

Explanation:

An estimated CLV of $383 helps you determine how much you can afford to spend on acquiring a new customer while remaining profitable. We can use this insight to optimize our marketing budget!
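Because the retention probability, profit, and discount rate are constant in this scenario, the sum is a geometric series and has a closed form. The sketch below, which assumes those constant parameters, checks the closed form against the explicit sum and also shows the upper bound reached if the time horizon were unlimited.

```python
prob_active = 0.90    # probability of staying active each month
gross_profit = 50     # $ profit per active customer per month
discount_rate = 0.01  # monthly discount rate
time_horizon = 24     # months

# Each month's term is gross_profit × r^t, with r combining retention and discounting
r = prob_active / (1 + discount_rate)

clv_loop = sum(gross_profit * r**t for t in range(1, time_horizon + 1))
clv_closed_form = gross_profit * r * (1 - r**time_horizon) / (1 - r)

# Limit as the time horizon grows without bound
clv_infinite = gross_profit * r / (1 - r)

print(f"CLV (explicit sum):      ${clv_loop:.2f}")
print(f"CLV (closed form):       ${clv_closed_form:.2f}")
print(f"CLV (unlimited horizon): ${clv_infinite:.2f}")
```

The gap between the 24-month value and the unlimited-horizon value is small, which tells you that most of a customer's value is realized in the first two years at this retention rate.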


References and Resources if you want to go further 🍀

You made it to the end – congrats! 🎉 I hope you enjoyed this article. If you found it helpful, please consider leaving a like and following me on Medium | LinkedIn. I will regularly write about demystifying machine learning algorithms, clarifying statistics concepts, and sharing insights on deploying ML projects into production.

See you soon!

Note: Some parts of this article were initially written in French and translated into English with the assistance of ChatGPT.

