Regional Performance With Growth Trends¶

Author: Nishan Karki¶


Table of Contents¶

1. Setup & Configuration¶

  • 1.1 Import Libraries
  • 1.2 Verify Tables

2. Business Context¶

  • 2.1 Objective
  • 2.2 Data Scope Definition

3. Analysis & Metrics¶

  • 3.1 Revenue Realization Per State

    Summary¶
  • 3.2 Order-Volume Per State

    Summary¶
  • 3.3 High-Revenue States: Volume vs. Average Order Value (AOV)

    Summary¶
  • 3.4 Regional Growth (Revenue & Order) Over Time

    Summary¶

4. Key Findings & Business Interpretation¶

1. Setup & Configuration:¶

1.1 Import Libraries:¶

In [1]:
import pandas as pd
import sqlite3
from sqlalchemy import create_engine
import os
import numpy as np
import sys
from IPython.display import display, Markdown, Image, HTML
from pathlib import Path
import json
import requests
import math
import warnings
warnings.filterwarnings("ignore")
import importlib
import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objects as go
import cufflinks as cf


from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()


# Add the project root to the system path:
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))

if project_root not in sys.path:
    sys.path.append(project_root)


# Import custom modules:
from utils import helper
from src.regional_performance import regional_performance_metrics, regional_performance_summary
from plots import regional_performance_plots
In [2]:
# Create a connection to the ecommerce database which is required for RFM metric function calls:
conn = sqlite3.connect("../data/ecommerce.db")
In [3]:
# A helper function to reload the package to pick up necessary changes:
def reload_package():
    importlib.reload(regional_performance_metrics)
    importlib.reload(regional_performance_summary)
    importlib.reload(helper)
    print("Packages reloaded!!")

1.2 Verify Tables:¶

In [4]:
# Create a global path for all SQL files that is related to the regional performance analysis:
SQL_BASE_PATH = Path("../sql/04_Regional_Performance_with_Growth_Trends")
In [5]:
helper.extract_table_names(conn=conn)
Out[5]:
name
0 customers
1 geolocation
2 orders
3 order_items
4 order_payments
5 order_reviews
6 products
7 sellers
8 product_category_translation
In [ ]:
 

2. Business Context:¶

2.1 Objective:¶

To analyze regional sales trends over time by revenue, order volume and average order value (AOV), enabling the identification of growing and declining regions and helping distinguish between demand-driven and value-driven performance changes.

2.2 Data Scope Definition:¶

For the Regional Performance analysis, the tables below will be used:

  1. customers: Holds information about the customer and also customer_unique_id feature to identify the unique customers.
  2. geolocation: Longitude and latitude information along with the geolocation_zip_code_prefix related to specific customer location.
  3. orders: Information holding about the orders and the order_status, where delivered status is the key for the analysis.
  4. order_items: Item level orders from the which the company realized the profit.
In [6]:
reload_package()
Packages reloaded!!

3. Analysis & Metrics:¶

3.1 Revenue Realization Per State:¶

3.1.1 Call the get_rev_by_state_at_customer_level() function from

regional_performance_metrics module to get the information about the total amount spent by

each customer along with their address information

In [7]:
rev_by_state_w_customer_info = regional_performance_metrics.get_rev_by_state_at_customer_level(conn=conn,
                                                                    sql_base_path=SQL_BASE_PATH)

The table hold these information as displayed below:

In [8]:
rev_by_state_w_customer_info.head()
Out[8]:
customer_unique_id total_spending zip_code city state
0 0000366f3b9a7992bf8c76cfdf3221e2 141.90 7787 cajamar SP
1 0000b849f77a49e4a4ce2b2a4ca5be3f 27.19 6053 osasco SP
2 0000f46a3911fa3c0805444483337064 86.22 88115 sao jose SC
3 0000f6ccb0745a6a4b88665a16c9f078 43.62 66812 belem PA
4 0004aac84e0df4da2b147fca70cf8255 196.89 18040 sorocaba SP

3.1.2 Aggregate the table by state and sum up the total_spendings:¶

In [9]:
total_rev_per_state = (rev_by_state_w_customer_info.groupby(by=['state'])
                       .agg(total_revenue = ('total_spending', 'sum'))
                       .reset_index())

# Add an extra column to represent the revenue contribution of each state in percentage:
total_rev_per_state['revenue_contribution (%)'] = (np.round(total_rev_per_state['total_revenue'] 
                                                           / 
                                                    total_rev_per_state['total_revenue'].sum(), 4) * 100)

The aggregated table total_rev_per_state holds these information as displayed below:

In [10]:
total_rev_per_state.head()
Out[10]:
state total_revenue revenue_contribution (%)
0 AC 19575.33 0.13
1 AL 94172.49 0.61
2 AM 27585.47 0.18
3 AP 16141.81 0.10
4 BA 590861.20 3.83

3.1.3 Map the full state name with the abbreviated form:¶

Get the mapping data as json:

In [11]:
url = "https://gist.githubusercontent.com/henriquejensen/1032c47a44d2cddaa2ef47fc531025db/raw"
state_names = requests.get(url).json()

Create a dictionary out of the key & values pairs:

In [12]:
state_mapping = {state['sigla']: state['nome'] for state in state_names['UF']}

Add an additional column called state_names mapped with the abbreviated names for the full state names:

In [13]:
total_rev_per_state['state_names'] = total_rev_per_state['state'].map(state_mapping)

The highest revenue generating state:

In [14]:
highest_rev_generating_state = (total_rev_per_state[total_rev_per_state['total_revenue'] 
                                                == 
                                                total_rev_per_state['total_revenue'].max()])

highest_rev_generating_state
Out[14]:
state total_revenue revenue_contribution (%) state_names
25 SP 5773869.02 37.44 São Paulo

The lowest revenue generating state:

In [15]:
lowest_rev_generating_state = (total_rev_per_state[total_rev_per_state['total_revenue'] 
                                                   == 
                                                   total_rev_per_state['total_revenue'].min()])

lowest_rev_generating_state
Out[15]:
state total_revenue revenue_contribution (%) state_names
21 RR 9039.52 0.06 Roraima

3.1.4 Call the plot_total_rev_per_state() function from regional_performance_plots module to

plot a choropleth plot that represents the total revenue by state:

In [16]:
reload_package()
Packages reloaded!!
In [17]:
plt_total_rev_by_state = regional_performance_plots.plot_total_rev_per_state(df=total_rev_per_state)

Summary:¶

Call the get_revenue_real_per_state_summary() function from regional_performance_summary

module to load the summary for Revenue Realization Per State section:

In [18]:
total_states = total_rev_per_state['state_names'].nunique()
regional_performance_summary.get_revenue_real_per_state_summary(highest_rev_generating_state=highest_rev_generating_state,
                                                               lowest_rev_generating_state=lowest_rev_generating_state,
                                                               plt_total_rev_by_state=plt_total_rev_by_state,
                                                               total_states=total_states)

Summary: Revenue Realization Per State:¶


Objective:

Analyze the total realized revenue across 27 Brazilian states (based on delivered orders only) and visualize the geographic revenue distribution using a choropleth map, where color intensity represents revenue magnitude.

Key Insights:

  • São Paulo is the highest revenue-generating state, contributing R$ 5,773,869.02 (37.44% of total revenue).

  • Roraima is the lowest revenue-generating state, contributing R$ 9,039.52 (0.06% of total revenue.


The choropleth visualization highlights strong geographic revenue concentration, enabling quick identification of high-value and underperforming regions.

In [19]:
total_states
Out[19]:
27
In [ ]:
 
In [ ]:
 

3.2 Order-Volume Per State:¶

3.2.1 Call the get_order_vol_by_state_at_customer_level() function from

regional_performance_metrics module to get the order counts of each customer for different states:

In [20]:
order_count_at_customer_lvl = regional_performance_metrics.get_order_vol_by_state_at_customer_level(conn=conn,
                                                          sql_base_path=SQL_BASE_PATH)
In [21]:
print(f"The `order_count_at_customer_lvl` table has a total of {order_count_at_customer_lvl.shape[0]} rows" 
      f" and a {order_count_at_customer_lvl.shape[1]} features.")
The `order_count_at_customer_lvl` table has a total of 93358 rows and a 3 features.

The table order_count_at_customer_lvl holds the information displayed below:

In [22]:
order_count_at_customer_lvl.head()
Out[22]:
customer_unique_id total_orders state
0 0000366f3b9a7992bf8c76cfdf3221e2 1 SP
1 0000b849f77a49e4a4ce2b2a4ca5be3f 1 SP
2 0000f46a3911fa3c0805444483337064 1 SC
3 0000f6ccb0745a6a4b88665a16c9f078 1 PA
4 0004aac84e0df4da2b147fca70cf8255 1 SP

3.2.2 Aggregating at state level and summing up the total_orders:¶

In [23]:
order_volume_per_state = (order_count_at_customer_lvl.groupby('state')
                         .agg(order_volume = ('total_orders', 'sum'))
                         .reset_index())

The table order_volume_per_state holds the information below:

In [24]:
order_volume_per_state.head()
Out[24]:
state order_volume
0 AC 80
1 AL 397
2 AM 145
3 AP 67
4 BA 3254

3.2.3 Mapping the abbreviated state with the full state name using state_mapping from earlier section:¶

In [25]:
order_volume_per_state['state_names'] = order_volume_per_state['state'].map(state_mapping)

3.2.4 Adding order contribution in % by state:¶

In [26]:
order_volume_per_state['order_volume_contribution (%)'] = (np.round(order_volume_per_state['order_volume'] 
                                                    / 
                                                    order_volume_per_state['order_volume'].sum(), 4) * 100) 
In [27]:
order_volume_per_state.head()
Out[27]:
state order_volume state_names order_volume_contribution (%)
0 AC 80 Acre 0.08
1 AL 397 Alagoas 0.41
2 AM 145 Amazonas 0.15
3 AP 67 Amapá 0.07
4 BA 3254 Bahia 3.37

The state with the highest order volume:

In [28]:
highest_order_volume = (order_volume_per_state[order_volume_per_state['order_volume'] 
                                               == 
                                               order_volume_per_state['order_volume'].max()])


highest_order_volume
Out[28]:
state order_volume state_names order_volume_contribution (%)
25 SP 40519 São Paulo 42.0

The state with the lowest order volume:

In [29]:
lowest_order_volume = (order_volume_per_state[order_volume_per_state['order_volume'] 
                                               == 
                                               order_volume_per_state['order_volume'].min()])


lowest_order_volume
Out[29]:
state order_volume state_names order_volume_contribution (%)
21 RR 41 Roraima 0.04

3.2.5 Call plot_total_order_vol_per_state() function from regional_performance_plots module

to plot Total Orders Per State

In [30]:
reload_package()
Packages reloaded!!
In [31]:
plt_total_orders_per_state = regional_performance_plots.plot_total_order_vol_per_state(df=order_volume_per_state)

Summary:¶

Call the get_order_volume_per_state_summary() function from regional_performance_summary

module to load the summary for Order-Volume Per State section:

In [32]:
total_states = order_volume_per_state['state_names'].nunique()

regional_performance_summary.get_order_volume_per_state_summary(total_states = total_states, 
                                                               highest_order_volume=highest_order_volume,
                                                               lowest_order_volume=lowest_order_volume, 
                                                               plot=plt_total_orders_per_state)

Summary: Order Volume Per State:¶


Objective:

Analyze the total volume of orders across the 27 Brazilian states based on delivered orders and visualize the geographic order volume distribution via choropleth map, where the color intensity represents the magnitude of the volume of orders made.

Key Insights:¶

  • São Paulo has the highest volume for orders and is also the highest revenue generating state.

  • Roraima has the lowest volume for orders and is also the lowest revenue generating state.

  • The Volume of Order and Revenue Generation has a linear relationship for these two states, this relationship will be analyzed for other states in the later stage.

The choropleth visualization below highlights strong geographic order-volume concentration, enabling quick identification of states where the most of the revenue for the company lies in:

In [ ]:
 

3.3 High-Revenue States: Volume vs. Average Order Value (AOV):¶

In [33]:
reload_package()
Packages reloaded!!

3.3.1 Merging the total_rev_per_state and order_volume_per_state:¶

In [34]:
performance_by_state = pd.merge(total_rev_per_state, 
                               order_volume_per_state,
                               on=['state', 'state_names'],
                               how='inner')

3.3.2 Adding AOV (Average Order Value):¶

In [35]:
performance_by_state['aov_index'] = (
    performance_by_state['revenue_contribution (%)'] / 
    performance_by_state['order_volume_contribution (%)']
)

Re-ordering the columns:

In [36]:
performance_by_state = performance_by_state.loc[:, ['state', 'state_names','total_revenue',
                                                    'order_volume', 'revenue_contribution (%)',
                                                    'order_volume_contribution (%)', 'aov_index']]
In [37]:
performance_by_state.head()
Out[37]:
state state_names total_revenue order_volume revenue_contribution (%) order_volume_contribution (%) aov_index
0 AC Acre 19575.33 80 0.13 0.08 1.625000
1 AL Alagoas 94172.49 397 0.61 0.41 1.487805
2 AM Amazonas 27585.47 145 0.18 0.15 1.200000
3 AP Amapá 16141.81 67 0.10 0.07 1.428571
4 BA Bahia 590861.20 3254 3.83 3.37 1.136499
In [38]:
max_val = performance_by_state[performance_by_state['aov_index'] == performance_by_state['aov_index'].max()]
max_val
Out[38]:
state state_names total_revenue order_volume revenue_contribution (%) order_volume_contribution (%) aov_index
14 PB Paraíba 137529.57 515 0.89 0.53 1.679245
In [39]:
min_val  = performance_by_state[performance_by_state['aov_index'] == performance_by_state['aov_index'].min()]
min_val
Out[39]:
state state_names total_revenue order_volume revenue_contribution (%) order_volume_contribution (%) aov_index
25 SP São Paulo 5773869.02 40519 37.44 42.0 0.891429

3.3.3 Call the plot_aov_wrt_revenue_and_order_vol() function from

from regional_performance_plots module:

In [40]:
reload_package()
Packages reloaded!!
In [41]:
plt_aov_wrt_reve_and_order_vol = regional_performance_plots.plot_aov_wrt_revenue_and_order_vol(df=performance_by_state)

Summary:¶

Call the get_order_aov_relation_w_revenue_summary() function from regional_performance_summary

module to load the summary for High-Revenue States: Volume vs. Average Order Value (AOV)

section:

In [42]:
regional_performance_summary.get_order_aov_relation_w_revenue_summary(max_val=max_val,
                                                             min_val=min_val,
                                                             plot=plt_aov_wrt_reve_and_order_vol)

Summary: High-Revenue States: Volume vs. Average Order Value (AOV)¶


Objective:¶

Determine whether high-revenue states are driven by higher order volume or higher average order value (AOV).

Key Insights:¶

  • Paraíba state has the highest AOV of 1.68x but is the lowest revenue generating state, contributing 0.89% in the total revenue, with the lowest number of orders contribution 0.53% in the total order-volume.

  • São Paulo state has the lowest AOV of 0.89x but is the highest revenue generating state, with a contribution of 37.44%, in the total revenue, with the highest number of orders contribution 42.00%, in the total order-volume.

  • AOV and Order volume shows an inverse relationship, several states exhibit high AOV but low order volume, indicating premium but niche markets.

  • Revenue for the company is driven by the volume of orders.

Business Implication:¶

States with high volume orders benefit more from scale optimization, while states with high AOV can derive better results with targeted offerings to their VIP customers.


A chart below represents the relation of the revenue with respect to the order volume and AOV:

In [ ]:
 

3.4 Regional Growth (Revenue & Order) Over Time:¶

3.4.1 Call the get_orders_and_revenue_by_time_period_per_state() function

from the regional_performance_metrics module:

In [43]:
total_ordrs_and_revn_by_time_period = (regional_performance_metrics.get_orders_and_revenue_by_time_period_per_state
                                       (conn=conn, sql_base_path=SQL_BASE_PATH))

Mapping the full-state name:

In [44]:
total_ordrs_and_revn_by_time_period['state_name'] = total_ordrs_and_revn_by_time_period['state'].map(state_mapping)

3.4.2 Identifying top n states by revenue, where n is the number

of desired top states by total_revenue:

Call the plot_growth_rev_over_time() function from regional_performance_plots module which returns defined n number of top states and the growth revenue over time for n of states:

In [45]:
n=3

top_n_states_by_rev_data_agg, rev_fig = (regional_performance_plots
                                 .plot_growth_rev_over_time(df=total_ordrs_and_revn_by_time_period, n=n))
In [46]:
top_n_states_by_rev_data_agg
Out[46]:
state_name total_revenue
25 São Paulo 5769703.15
20 Rio de Janeiro 2055401.57
12 Minas Gerais 1818891.67
In [ ]:
 

3.4.3 Identifying top n states by by orders, where n is the number

of desired top states by total_orders:

Call the plot_growth_order_vol_over_time() function from regional_performance_plots module which returns defined n number of top states and the growth in order volume over time for n of states:

In [47]:
reload_package()
Packages reloaded!!
In [48]:
n = 3


top_n_states_by_orders_data_agg, order_fig = (regional_performance_plots
                                    .plot_growth_order_vol_over_time(df=total_ordrs_and_revn_by_time_period, n=n))
In [49]:
reload_package()
Packages reloaded!!

Summary:¶

Call the get_regional_growth_revenue_and_order_over_time_summary()

function from regional_performance_summary module to load the summary for

Regional Growth (Revenue & Order) Over Time section:

In [50]:
(regional_performance_summary.
 get_regional_growth_revenue_and_order_over_time_summary(top_n_states_by_orders_data_agg=top_n_states_by_orders_data_agg, 
                                                         top_n_states_by_rev_data_agg=top_n_states_by_rev_data_agg,
                                                         order_fig=order_fig, 
                                                         revenue_fig=rev_fig))

Summary: Regional Growth (Revenue & Order) Over Time¶


Objective:¶

Analyze regional growth trends over time to understand how revenue and order volume evolve together and identify states driving sustained business growth.

Key Insights:¶

  • Revenue and order volume exhibit nearly identical growth patterns, indicating that revenue growth is primarily driven by increasing order volume rather than price effects.
  • São Paulo consistently leads across time with the highest order volume of 46,448.00 and shows the steepest growth trajectory, significantly outperforming other states.
  • São Paulo is also the highest revenue-generating state of $R5,769,703.15, reinforcing its role as the primary growth engine for the business.

    Business Implicaton:¶

    São Paulo represents the company’s most critical market from both volume and revenue perspectives. The strong and consistent performance of southeastern states suggests that operational efficiency, logistics maturity, and customer density may be key growth drivers.
    The company could experiment with replicating successful operational and marketing strategies from these top-performing states in other high-potential regions through controlled A/B testing to assess scalability.


4. Key Findings & Business Interpretation:¶

Analysis Overview:¶

This analysis examined regional performance across 27 Brazilian states using four complementary lenses: total revenue distribution, order volume patterns, the relationship between volume and average order value (AOV), and temporal growth trends. The goal is to assess geographic performance differences and identify the main drivers of regional revenue contribution.


Key Findings:¶

1. Geographic Concentration Defines the Business:

Revenue is highly concentrated geographically, with São Paulo accounting for 37.44% of total revenue (R\$ 5.77M) and 42% of total order volume. This pattern is consistent across the time period analyzed, with São Paulo also exhibiting the strongest growth trend. In contrast, the bottom 10 states together contribute less than 2% of total revenue, with Roraima contributing only 0.06% (R$ 9,039).

2. Revenue Is Primarily Driven by Order Volume:

Across high-revenue states, revenue differences are largely explained by order volume rather than higher average order values (AOV). São Paulo, despite leading in total revenue, has one of the lowest AOVs at 0.89× the national average. Conversely, Paraíba shows the highest AOV at 1.68×, but contributes only 0.89% (R$ 137K) of total revenue due to low order volume (0.53%, 515 orders). This pattern suggests that high-volume states follow a mass-market model, while high-AOV states represent smaller, niche segments.

3. Revenue and Order Volume Move Together Over Time

Time-series analysis shows that revenue and order volume follow very similar growth patterns across states. States experiencing increases in order volume also see proportional increases in revenue, with limited variation driven by changes in AOV. This indicates that regional revenue growth is more closely linked to transaction volume than to changes in order value over time.


Business Interpretation:¶

  • Revenue concentration in São Paulo creates a strong dependency on a single region.

  • Regional revenue growth is driven mainly by increases in order volume rather than higher spending per order.

  • Some smaller states exhibit higher AOVs, but their overall business impact remains limited due to low transaction volumes.

In [ ]: