3.1 Revenue Realization Per State
Summary¶
Summary¶
3.3 High-Revenue States: Volume vs. Average Order Value (AOV)
Summary¶
3.4 Regional Growth (Revenue & Order) Over Time
Summary¶
import pandas as pd
import sqlite3
from sqlalchemy import create_engine
import os
import numpy as np
import sys
from IPython.display import display, Markdown, Image, HTML
from pathlib import Path
import json
import requests
import math
import warnings
warnings.filterwarnings("ignore")
import importlib
import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objects as go
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()
# Add the project root to the system path:
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
sys.path.append(project_root)
# Import custom modules:
from utils import helper
from src.regional_performance import regional_performance_metrics, regional_performance_summary
from plots import regional_performance_plots
# Create a connection to the ecommerce database which is required for RFM metric function calls:
conn = sqlite3.connect("../data/ecommerce.db")
# A helper function to reload the package to pick up necessary changes:
def reload_package():
importlib.reload(regional_performance_metrics)
importlib.reload(regional_performance_summary)
importlib.reload(helper)
print("Packages reloaded!!")
# Create a global path for all SQL files that is related to the regional performance analysis:
SQL_BASE_PATH = Path("../sql/04_Regional_Performance_with_Growth_Trends")
helper.extract_table_names(conn=conn)
| name | |
|---|---|
| 0 | customers |
| 1 | geolocation |
| 2 | orders |
| 3 | order_items |
| 4 | order_payments |
| 5 | order_reviews |
| 6 | products |
| 7 | sellers |
| 8 | product_category_translation |
To analyze regional sales trends over time by revenue, order volume and average order value (AOV), enabling the identification of growing and declining regions and helping distinguish between demand-driven and value-driven performance changes.
For the Regional Performance analysis, the tables below will be used:
customer_unique_id feature to identify the unique customers.geolocation_zip_code_prefix related to specific customer location.order_status, where delivered status is the key for the analysis.reload_package()
Packages reloaded!!
get_rev_by_state_at_customer_level() function from regional_performance_metrics module to get the information about the total amount spent by rev_by_state_w_customer_info = regional_performance_metrics.get_rev_by_state_at_customer_level(conn=conn,
sql_base_path=SQL_BASE_PATH)
The table hold these information as displayed below:
rev_by_state_w_customer_info.head()
| customer_unique_id | total_spending | zip_code | city | state | |
|---|---|---|---|---|---|
| 0 | 0000366f3b9a7992bf8c76cfdf3221e2 | 141.90 | 7787 | cajamar | SP |
| 1 | 0000b849f77a49e4a4ce2b2a4ca5be3f | 27.19 | 6053 | osasco | SP |
| 2 | 0000f46a3911fa3c0805444483337064 | 86.22 | 88115 | sao jose | SC |
| 3 | 0000f6ccb0745a6a4b88665a16c9f078 | 43.62 | 66812 | belem | PA |
| 4 | 0004aac84e0df4da2b147fca70cf8255 | 196.89 | 18040 | sorocaba | SP |
state and sum up the total_spendings:¶total_rev_per_state = (rev_by_state_w_customer_info.groupby(by=['state'])
.agg(total_revenue = ('total_spending', 'sum'))
.reset_index())
# Add an extra column to represent the revenue contribution of each state in percentage:
total_rev_per_state['revenue_contribution (%)'] = (np.round(total_rev_per_state['total_revenue']
/
total_rev_per_state['total_revenue'].sum(), 4) * 100)
The aggregated table total_rev_per_state holds these information as displayed below:
total_rev_per_state.head()
| state | total_revenue | revenue_contribution (%) | |
|---|---|---|---|
| 0 | AC | 19575.33 | 0.13 |
| 1 | AL | 94172.49 | 0.61 |
| 2 | AM | 27585.47 | 0.18 |
| 3 | AP | 16141.81 | 0.10 |
| 4 | BA | 590861.20 | 3.83 |
Get the mapping data as json:
url = "https://gist.githubusercontent.com/henriquejensen/1032c47a44d2cddaa2ef47fc531025db/raw"
state_names = requests.get(url).json()
Create a dictionary out of the key & values pairs:
state_mapping = {state['sigla']: state['nome'] for state in state_names['UF']}
Add an additional column called state_names mapped with the abbreviated names for the full state names:
total_rev_per_state['state_names'] = total_rev_per_state['state'].map(state_mapping)
The highest revenue generating state:
highest_rev_generating_state = (total_rev_per_state[total_rev_per_state['total_revenue']
==
total_rev_per_state['total_revenue'].max()])
highest_rev_generating_state
| state | total_revenue | revenue_contribution (%) | state_names | |
|---|---|---|---|---|
| 25 | SP | 5773869.02 | 37.44 | São Paulo |
The lowest revenue generating state:
lowest_rev_generating_state = (total_rev_per_state[total_rev_per_state['total_revenue']
==
total_rev_per_state['total_revenue'].min()])
lowest_rev_generating_state
| state | total_revenue | revenue_contribution (%) | state_names | |
|---|---|---|---|---|
| 21 | RR | 9039.52 | 0.06 | Roraima |
plot_total_rev_per_state() function from regional_performance_plots module toreload_package()
Packages reloaded!!
plt_total_rev_by_state = regional_performance_plots.plot_total_rev_per_state(df=total_rev_per_state)
get_revenue_real_per_state_summary() function from regional_performance_summary Revenue Realization Per State section:total_states = total_rev_per_state['state_names'].nunique()
regional_performance_summary.get_revenue_real_per_state_summary(highest_rev_generating_state=highest_rev_generating_state,
lowest_rev_generating_state=lowest_rev_generating_state,
plt_total_rev_by_state=plt_total_rev_by_state,
total_states=total_states)
Analyze the total realized revenue across 27 Brazilian states (based on delivered orders only) and visualize the geographic revenue distribution using a choropleth map, where color intensity represents revenue magnitude.
São Paulo is the highest revenue-generating state, contributing R$ 5,773,869.02 (37.44% of total revenue).
Roraima is the lowest revenue-generating state, contributing R$ 9,039.52 (0.06% of total revenue.
The choropleth visualization highlights strong geographic revenue concentration, enabling quick identification of high-value and underperforming regions.
total_states
27
get_order_vol_by_state_at_customer_level() function from regional_performance_metrics module to get the order counts of each customer for different states:order_count_at_customer_lvl = regional_performance_metrics.get_order_vol_by_state_at_customer_level(conn=conn,
sql_base_path=SQL_BASE_PATH)
print(f"The `order_count_at_customer_lvl` table has a total of {order_count_at_customer_lvl.shape[0]} rows"
f" and a {order_count_at_customer_lvl.shape[1]} features.")
The `order_count_at_customer_lvl` table has a total of 93358 rows and a 3 features.
The table order_count_at_customer_lvl holds the information displayed below:
order_count_at_customer_lvl.head()
| customer_unique_id | total_orders | state | |
|---|---|---|---|
| 0 | 0000366f3b9a7992bf8c76cfdf3221e2 | 1 | SP |
| 1 | 0000b849f77a49e4a4ce2b2a4ca5be3f | 1 | SP |
| 2 | 0000f46a3911fa3c0805444483337064 | 1 | SC |
| 3 | 0000f6ccb0745a6a4b88665a16c9f078 | 1 | PA |
| 4 | 0004aac84e0df4da2b147fca70cf8255 | 1 | SP |
state level and summing up the total_orders:¶order_volume_per_state = (order_count_at_customer_lvl.groupby('state')
.agg(order_volume = ('total_orders', 'sum'))
.reset_index())
The table order_volume_per_state holds the information below:
order_volume_per_state.head()
| state | order_volume | |
|---|---|---|
| 0 | AC | 80 |
| 1 | AL | 397 |
| 2 | AM | 145 |
| 3 | AP | 67 |
| 4 | BA | 3254 |
state_mapping from earlier section:¶order_volume_per_state['state_names'] = order_volume_per_state['state'].map(state_mapping)
order_volume_per_state['order_volume_contribution (%)'] = (np.round(order_volume_per_state['order_volume']
/
order_volume_per_state['order_volume'].sum(), 4) * 100)
order_volume_per_state.head()
| state | order_volume | state_names | order_volume_contribution (%) | |
|---|---|---|---|---|
| 0 | AC | 80 | Acre | 0.08 |
| 1 | AL | 397 | Alagoas | 0.41 |
| 2 | AM | 145 | Amazonas | 0.15 |
| 3 | AP | 67 | Amapá | 0.07 |
| 4 | BA | 3254 | Bahia | 3.37 |
The state with the highest order volume:
highest_order_volume = (order_volume_per_state[order_volume_per_state['order_volume']
==
order_volume_per_state['order_volume'].max()])
highest_order_volume
| state | order_volume | state_names | order_volume_contribution (%) | |
|---|---|---|---|---|
| 25 | SP | 40519 | São Paulo | 42.0 |
The state with the lowest order volume:
lowest_order_volume = (order_volume_per_state[order_volume_per_state['order_volume']
==
order_volume_per_state['order_volume'].min()])
lowest_order_volume
| state | order_volume | state_names | order_volume_contribution (%) | |
|---|---|---|---|---|
| 21 | RR | 41 | Roraima | 0.04 |
plot_total_order_vol_per_state() function from regional_performance_plots module Total Orders Per Statereload_package()
Packages reloaded!!
plt_total_orders_per_state = regional_performance_plots.plot_total_order_vol_per_state(df=order_volume_per_state)
get_order_volume_per_state_summary() function from regional_performance_summary Order-Volume Per State section:total_states = order_volume_per_state['state_names'].nunique()
regional_performance_summary.get_order_volume_per_state_summary(total_states = total_states,
highest_order_volume=highest_order_volume,
lowest_order_volume=lowest_order_volume,
plot=plt_total_orders_per_state)
Analyze the total volume of orders across the 27 Brazilian states based on delivered orders and visualize the
geographic order volume distribution via choropleth map, where the color intensity represents the magnitude of
the volume of orders made.
São Paulo has the highest volume for orders and is also the highest
revenue generating state.
Roraima has the lowest volume for orders and is also the lowest
revenue generating state.
Volume of Order and Revenue Generation has a linear relationship for these two states, this relationship
will be analyzed for other states in the later stage.The choropleth visualization below highlights strong geographic order-volume concentration, enabling quick identification of states where the most of the revenue for the company lies in:
reload_package()
Packages reloaded!!
total_rev_per_state and order_volume_per_state:¶performance_by_state = pd.merge(total_rev_per_state,
order_volume_per_state,
on=['state', 'state_names'],
how='inner')
AOV (Average Order Value):¶performance_by_state['aov_index'] = (
performance_by_state['revenue_contribution (%)'] /
performance_by_state['order_volume_contribution (%)']
)
Re-ordering the columns:
performance_by_state = performance_by_state.loc[:, ['state', 'state_names','total_revenue',
'order_volume', 'revenue_contribution (%)',
'order_volume_contribution (%)', 'aov_index']]
performance_by_state.head()
| state | state_names | total_revenue | order_volume | revenue_contribution (%) | order_volume_contribution (%) | aov_index | |
|---|---|---|---|---|---|---|---|
| 0 | AC | Acre | 19575.33 | 80 | 0.13 | 0.08 | 1.625000 |
| 1 | AL | Alagoas | 94172.49 | 397 | 0.61 | 0.41 | 1.487805 |
| 2 | AM | Amazonas | 27585.47 | 145 | 0.18 | 0.15 | 1.200000 |
| 3 | AP | Amapá | 16141.81 | 67 | 0.10 | 0.07 | 1.428571 |
| 4 | BA | Bahia | 590861.20 | 3254 | 3.83 | 3.37 | 1.136499 |
max_val = performance_by_state[performance_by_state['aov_index'] == performance_by_state['aov_index'].max()]
max_val
| state | state_names | total_revenue | order_volume | revenue_contribution (%) | order_volume_contribution (%) | aov_index | |
|---|---|---|---|---|---|---|---|
| 14 | PB | Paraíba | 137529.57 | 515 | 0.89 | 0.53 | 1.679245 |
min_val = performance_by_state[performance_by_state['aov_index'] == performance_by_state['aov_index'].min()]
min_val
| state | state_names | total_revenue | order_volume | revenue_contribution (%) | order_volume_contribution (%) | aov_index | |
|---|---|---|---|---|---|---|---|
| 25 | SP | São Paulo | 5773869.02 | 40519 | 37.44 | 42.0 | 0.891429 |
plot_aov_wrt_revenue_and_order_vol() function from regional_performance_plots module:reload_package()
Packages reloaded!!
plt_aov_wrt_reve_and_order_vol = regional_performance_plots.plot_aov_wrt_revenue_and_order_vol(df=performance_by_state)
get_order_aov_relation_w_revenue_summary() function from regional_performance_summary High-Revenue States: Volume vs. Average Order Value (AOV) regional_performance_summary.get_order_aov_relation_w_revenue_summary(max_val=max_val,
min_val=min_val,
plot=plt_aov_wrt_reve_and_order_vol)
Determine whether high-revenue states are driven by higher order volume or higher average order value (AOV).
Paraíba state has the highest AOV of 1.68x
but is the lowest revenue generating state, contributing 0.89% in the
total revenue, with the lowest number of orders contribution 0.53%
in the total order-volume.
São Paulo state has the lowest AOV of 0.89x
but is the highest revenue generating state, with a contribution of 37.44%,
in the total revenue, with the highest number of orders contribution
42.00%, in the total order-volume.
AOV and Order volume shows an inverse relationship, several states exhibit high AOV but low order volume,
indicating premium but niche markets.
Business Implication:¶
States with high volume orders benefit more from scale optimization, while states with high AOV can derive better results with targeted offerings to their VIP customers.
A chart below represents the relation of the revenue with respect to the order volume and AOV:
get_orders_and_revenue_by_time_period_per_state() function regional_performance_metrics module:total_ordrs_and_revn_by_time_period = (regional_performance_metrics.get_orders_and_revenue_by_time_period_per_state
(conn=conn, sql_base_path=SQL_BASE_PATH))
Mapping the full-state name:
total_ordrs_and_revn_by_time_period['state_name'] = total_ordrs_and_revn_by_time_period['state'].map(state_mapping)
n states by revenue, where n is the number total_revenue:Call the plot_growth_rev_over_time() function from regional_performance_plots module which returns
defined n number of top states and the growth revenue over time for n of states:
n=3
top_n_states_by_rev_data_agg, rev_fig = (regional_performance_plots
.plot_growth_rev_over_time(df=total_ordrs_and_revn_by_time_period, n=n))
top_n_states_by_rev_data_agg
| state_name | total_revenue | |
|---|---|---|
| 25 | São Paulo | 5769703.15 |
| 20 | Rio de Janeiro | 2055401.57 |
| 12 | Minas Gerais | 1818891.67 |
n states by by orders, where n is the number total_orders:Call the plot_growth_order_vol_over_time() function from regional_performance_plots module which returns
defined n number of top states and the growth in order volume over time for n of states:
reload_package()
Packages reloaded!!
n = 3
top_n_states_by_orders_data_agg, order_fig = (regional_performance_plots
.plot_growth_order_vol_over_time(df=total_ordrs_and_revn_by_time_period, n=n))
reload_package()
Packages reloaded!!
get_regional_growth_revenue_and_order_over_time_summary() regional_performance_summary module to load the summary for Regional Growth (Revenue & Order) Over Time section:(regional_performance_summary.
get_regional_growth_revenue_and_order_over_time_summary(top_n_states_by_orders_data_agg=top_n_states_by_orders_data_agg,
top_n_states_by_rev_data_agg=top_n_states_by_rev_data_agg,
order_fig=order_fig,
revenue_fig=rev_fig))
Analyze regional growth trends over time to understand how revenue and order volume evolve together and identify states driving sustained business growth.
São Paulo consistently leads across time with the highest order
volume of 46,448.00 and shows the steepest growth trajectory,
significantly outperforming other states.São Paulo is also the highest revenue-generating state
of $R5,769,703.15, reinforcing its role as the primary
growth engine for the business.Business Implicaton:¶
São Paulorepresents the company’s most critical market from both volume and revenue perspectives. The strong and consistent performance of southeastern states suggests that operational efficiency, logistics maturity, and customer density may be key growth drivers.
The company could experiment with replicating successful operational and marketing strategies from these top-performing states in other high-potential regions through controlledA/B testingto assess scalability.
This analysis examined regional performance across 27 Brazilian states using four complementary lenses: total revenue distribution, order volume patterns, the relationship between volume and average order value (AOV), and temporal growth trends. The goal is to assess geographic performance differences and identify the main drivers of regional revenue contribution.
1. Geographic Concentration Defines the Business:
Revenue is highly concentrated geographically, with São Paulo accounting for 37.44% of total revenue (R\$ 5.77M) and 42% of total order volume. This pattern is consistent across the time period analyzed, with São Paulo also exhibiting the strongest growth trend. In contrast, the bottom 10 states together contribute less than 2% of total revenue, with Roraima contributing only 0.06% (R$ 9,039).
2. Revenue Is Primarily Driven by Order Volume:
Across high-revenue states, revenue differences are largely explained by order volume rather than higher average order values (AOV). São Paulo, despite leading in total revenue, has one of the lowest AOVs at 0.89× the national average. Conversely, Paraíba shows the highest AOV at 1.68×, but contributes only 0.89% (R$ 137K) of total revenue due to low order volume (0.53%, 515 orders). This pattern suggests that high-volume states follow a mass-market model, while high-AOV states represent smaller, niche segments.
3. Revenue and Order Volume Move Together Over Time
Time-series analysis shows that revenue and order volume follow very similar growth patterns across states. States experiencing increases in order volume also see proportional increases in revenue, with limited variation driven by changes in AOV. This indicates that regional revenue growth is more closely linked to transaction volume than to changes in order value over time.
Revenue concentration in São Paulo creates a strong dependency on a single region.
Regional revenue growth is driven mainly by increases in order volume rather than higher spending per order.
Some smaller states exhibit higher AOVs, but their overall business impact remains limited due to low transaction volumes.