The plot and data are inspired by the link below:
https://pythondata.com/visualizing-data-overlaying-charts/
The data preparation steps and comments from the original post have been retained, as they are very informative.
However, the way the visualisation is created has changed considerably.
I will be showcasing two versions of the plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
#%matplotlib inline # needed for jupyter notebooks
plt.rcParams['figure.figsize']=(20,10) # set the figure size
plt.rcdefaults()
path1 = "https://raw.githubusercontent.com/Bhaskar-JR/Matplotlib_Overlaying_Charts/main/Sales_Data.csv"
#path = '/Users/bhaskarroy/Files/Data Science/PYTHON/Visualisation/Matplotlib/Overlaying charts/sales.csv'
sales = pd.read_csv(path1) # Read the data in
sales.Date = pd.to_datetime(sales.Date,format='%Y-%m-%d') #set the date column to datetime
sales.set_index('Date', inplace=True) #set the index to the date column
# now the hack for the multi-colored bar chart:
# create fiscal year dataframes covering the timeframes you are looking for. In this case,
# the fiscal year covered October - September.
# --------------------------------------------------------------------------------
# Note: This should be set up as a function, but for this small amount of data,
# I just manually built each fiscal year. This is not very pythonic and would
# suck to do if you have many years of data, but it isn't bad for a few years of data.
# --------------------------------------------------------------------------------
fy10_all = sales[(sales.index >= '2009-10-01') & (sales.index < '2010-10-01')]
fy11_all = sales[(sales.index >= '2010-10-01') & (sales.index < '2011-10-01')]
fy12_all = sales[(sales.index >= '2011-10-01') & (sales.index < '2012-10-01')]
fy13_all = sales[(sales.index >= '2012-10-01') & (sales.index < '2013-10-01')]
fy14_all = sales[(sales.index >= '2013-10-01') & (sales.index < '2014-10-01')]
fy15_all = sales[(sales.index >= '2014-10-01') & (sales.index < '2015-10-01')]
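The note above suggests wrapping the fiscal-year split in a function. Below is a minimal sketch of one way to do that, assuming a DataFrame with a DatetimeIndex and an October to September fiscal year. The helper name `split_by_fiscal_year` and the toy data are hypothetical and are not part of the original post.

```python
import pandas as pd


def split_by_fiscal_year(df, start_month=10):
    """Return {fiscal_year: sub-frame} for a frame indexed by dates.

    Hypothetical helper: a fiscal year is labelled by the calendar year
    in which it ends, so Oct 2009 - Sep 2010 is fiscal year 2010
    (matching the fy10_all naming used above).
    """
    # Months on or after start_month belong to the next fiscal year.
    fiscal_year = df.index.year + (df.index.month >= start_month).astype(int)
    return {int(fy): frame for fy, frame in df.groupby(fiscal_year)}


# Toy monthly data standing in for the sales frame (values are made up).
idx = pd.date_range("2009-10-01", periods=24, freq="MS")
toy = pd.DataFrame({"Quantity": range(24)}, index=idx)

frames = split_by_fiscal_year(toy)
print(sorted(frames))       # fiscal years found in the toy data
print(len(frames[2010]))    # number of months in fiscal year 2010
```

With such a helper, the list of per-year frames used in the plotting loop could be built as `list(frames.values())` instead of being written out by hand.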
sales.index
sales.describe()
sales.dtypes
sales.columns
Objective: We want the color of the bars to alternate across fiscal years. If the bars of one fiscal year are orange, the bars of the next fiscal year should be grey, followed by orange again for the year after that.
Actionable
We shall use a cycle object together with islice from Python's itertools module.
islice lets us start cycling from any chosen index position.
https://stackoverflow.com/questions/8940737/cycle-through-list-starting-at-a-certain-element/8940984#8940984
https://docs.python.org/3/library/itertools.html
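As a small illustration of the idea, the sketch below shows how `islice` changes the starting point of an endless `cycle`. The variable names are chosen for this example only.

```python
from itertools import cycle, islice

colors = ['orange', 'grey']

# cycle() repeats the list forever; islice(..., start, None) skips the
# first `start` items and then keeps yielding without an upper bound.
from_first = islice(cycle(colors), 0, None)
from_second = islice(cycle(colors), 1, None)

first_four = [next(from_first) for _ in range(4)]
shifted_four = [next(from_second) for _ in range(4)]
print(first_four)    # ['orange', 'grey', 'orange', 'grey']
print(shifted_four)  # ['grey', 'orange', 'grey', 'orange']
```

With a start index of 0, as used in this notebook, the first fiscal year gets orange; a start index of 1 would begin with grey instead.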
from itertools import cycle, islice
from datetime import datetime
import matplotlib.patches as mpatches
color_list = islice(cycle(['orange','grey']), 0, None)
We shall use a for loop to plot each fiscal year's data in turn as a bar plot.
The color of each fiscal year is taken from the cycle object.
# Let's build our plot
plt.rcdefaults()
fig, ax1 = plt.subplots()
# set up the 2nd axis/secondary axis
ax2 = ax1.twinx()
# making a copy of original dataframe
df1 = sales.copy()
ax1.plot(df1.Sales_Dollars) #plot the Revenue on axis #1
#incase we want the lower limit of yaxis to be zero
#ax1.set_ylim(0, ax1.get_ylim()[1])
# Using a for loop to plot the fiscal year data sequentially as bar plots
# Assign color from the cycle object
kwargs = dict(width=20, alpha=0.2) #dict object to be unpacked as ax.bar arguments
for fy in [fy10_all, fy11_all, fy12_all, fy13_all, fy14_all, fy15_all]:
    fyr = fy.copy()
    fyr.index = mdates.date2num(fyr.index)
    print(type(fyr.index))
    ax2.bar(fyr.index, fyr.Quantity, **kwargs, color = next(color_list))
ax2.grid(visible=False) # turn off grid #2
ax1.set_title('Monthly Sales Revenue vs Number of Items Sold Per Month')
ax1.set_ylabel('Monthly Sales Revenue')
ax2.set_ylabel('Number of Items Sold')
for tkl in ax1.xaxis.get_ticklabels():
    tkl.set(ha = 'right',rotation = 15, rotation_mode = "anchor")
plt.show()
# Inspecting data frame
df1.head()
# Loading the raw data again (path1 points to the CSV hosted on GitHub, not a local file)
pd.read_csv(path1)
# Preprocessing the data for visualisation
dfx = pd.read_csv(path1) # Read the data in
dfx.Date = pd.to_datetime(dfx.Date) #set the date column to datetime
dfx.set_index('Date', inplace=True) #set the index to the date column
dfx
# However, we will be working with the df1 copy
df1.index
While the previous plot was basic, we will now refine it further with the changes below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%matplotlib inline # needed for jupyter notebooks
plt.rcParams['figure.figsize']=(20,10) # set the figure size
plt.style.use('fivethirtyeight') # using the fivethirtyeight matplotlib theme
path1 = "https://raw.githubusercontent.com/Bhaskar-JR/Matplotlib_Overlaying_Charts/main/Sales_Data.csv"
#path = '/Users/bhaskarroy/Files/Data Science/PYTHON/Visualisation/Matplotlib/Overlaying charts/sales.csv'
sales = pd.read_csv(path1) # Read the data in
sales.Date = pd.to_datetime(sales.Date, format = '%Y-%m-%d') #set the date column to datetime
sales.set_index('Date', inplace=True) #set the index to the date column
# now the hack for the multi-colored bar chart:
# create fiscal year dataframes covering the timeframes you are looking for. In this case,
# the fiscal year covered October - September.
# --------------------------------------------------------------------------------
# Note: This should be set up as a function, but for this small amount of data,
# I just manually built each fiscal year. This is not very pythonic and would
# suck to do if you have many years of data, but it isn't bad for a few years of data.
# --------------------------------------------------------------------------------
fy10_all = sales[(sales.index >= '2009-10-01') & (sales.index < '2010-10-01')]
fy11_all = sales[(sales.index >= '2010-10-01') & (sales.index < '2011-10-01')]
fy12_all = sales[(sales.index >= '2011-10-01') & (sales.index < '2012-10-01')]
fy13_all = sales[(sales.index >= '2012-10-01') & (sales.index < '2013-10-01')]
fy14_all = sales[(sales.index >= '2013-10-01') & (sales.index < '2014-10-01')]
fy15_all = sales[(sales.index >= '2014-10-01') & (sales.index < '2015-10-01')]
Objective: We want the color of the bars to alternate across fiscal years. If the bars of one fiscal year are orange, the bars of the next fiscal year should be grey, followed by orange again for the year after that.
Actionable
We shall use a cycle object together with islice from Python's itertools module.
islice lets us start cycling from any chosen index position.
https://stackoverflow.com/questions/8940737/cycle-through-list-starting-at-a-certain-element/8940984#8940984
https://docs.python.org/3/library/itertools.html
from itertools import cycle, islice
from datetime import datetime
import matplotlib.patches as mpatches
color_list = islice(cycle(['orange','grey']), 0, None)
We shall now create labels for the start of each financial year in the format 'FY 2009'.
We will use a tick locator/formatter pair to control the tick positions and their string representation.
https://matplotlib.org/stable/api/ticker_api.html?highlight=locator%20formatter
https://quantdare.com/how-to-create-calendars-in-finance/
https://stackoverflow.com/questions/3898572/what-is-the-standard-python-docstring-format
https://stackoverflow.com/questions/14822184/is-there-a-ceiling-equivalent-of-operator-in-python
https://dateutil.readthedocs.io/en/stable/rrule.html
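To make the locator/formatter pairing concrete, here is a minimal, self-contained sketch. It assumes three hand-written tick dates rather than the dates derived from the sales data, and it selects the non-interactive `Agg` backend only so that it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
from datetime import datetime

# Hypothetical tick dates: the first day of three fiscal years.
tick_dates = [datetime(2009, 10, 1), datetime(2010, 10, 1), datetime(2011, 10, 1)]
tick_positions = mdates.date2num(tick_dates)     # dates -> Matplotlib date numbers
tick_labels = [d.strftime("FY %Y") for d in tick_dates]

locator = ticker.FixedLocator(tick_positions)    # where the ticks go
formatter = ticker.FixedFormatter(tick_labels)   # what each tick says

fig, ax = plt.subplots()
ax.plot(tick_positions, [1, 2, 3])
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)

# The formatter pairs the i-th label with the i-th tick position.
print(formatter.format_ticks(locator()))
```

Because `FixedFormatter` assigns labels purely by position, it should be paired with a `FixedLocator` that holds the same number of ticks in the same order.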
#Using rrule from dateutil to get list of dates corresponding to start of financial years
from dateutil.rrule import rrule, MONTHLY,YEARLY
from datetime import datetime, timedelta
start_date = datetime(2009,10,1)
#Let's do ceiling division to get the relevant number of financial years
count = -(-(sales.index[-1]-start_date)//timedelta(days = 365.25))
date_list = list(rrule(freq=YEARLY, count=count, dtstart=start_date))
date_list
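The double negation used for `count` is a ceiling-division idiom: floor division rounds towards negative infinity, so negating before and after the division rounds up instead of down. A small sketch, assuming a last date of 1 September 2015 purely for illustration:

```python
from datetime import datetime, timedelta

start_date = datetime(2009, 10, 1)
last_date = datetime(2015, 9, 1)    # assumed last date, for illustration only
year = timedelta(days=365.25)

span = last_date - start_date
floor_years = span // year          # rounds down
ceil_years = -(-span // year)       # negate, floor-divide, negate: rounds up
print(floor_years, ceil_years)      # 5 6
```

Rounding up ensures that a partially covered final fiscal year still gets its own tick.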
#Converting datetime object to matplotlibdates
import matplotlib.dates as mdates
date_list1 = mdates.date2num(date_list)
date_list1
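For reference, `date2num` and `num2date` are inverse conversions. The sketch below round-trips a single date; the variable names are chosen for this example only.

```python
from datetime import datetime
import matplotlib.dates as mdates

d = datetime(2009, 10, 1)
num = mdates.date2num(d)       # float: days since Matplotlib's date epoch
back = mdates.num2date(num)    # timezone-aware datetime (UTC by default)

print(type(num), back)
```

Note that `num2date` returns timezone-aware datetimes, so `back.replace(tzinfo=None)` is needed before comparing with a naive datetime such as `d`.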
#We shall now create the labels corresponding to the start of the financial years
#Using strftime to format a datetime object to string format.
#strftime stands for "string format time"
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
locator = ticker.FixedLocator(date_list1)
date_labels = [datetime.strftime(x, "FY %Y") for x in mdates.num2date(date_list1)]
formatter = ticker.FixedFormatter(date_labels)
date_labels
# Let's build our plot
# using the fivethirtyeight matplotlib theme
plt.style.use('fivethirtyeight')
fig, ax1 = plt.subplots()
# set up the 2nd axis/secondary axis
ax2 = ax1.twinx()
df2 = sales.copy()
df2.index = mdates.date2num(df2.index)
#plot the Revenue on axis #1
ax1.plot(df2.Sales_Dollars, label = 'Monthly Revenue')
# Using a for loop to plot the fiscal year data sequentially as bar plots
# Assign color from the cycle object
#dict object to be unpacked as ax.bar arguments
kwargs = dict(width=20, alpha=0.2)
for fy in [fy10_all, fy11_all, fy12_all, fy13_all, fy14_all, fy15_all]:
    fyr = fy.copy()
    fyr.index = mdates.date2num(fyr.index)
    ax2.bar(fyr.index, fyr.Quantity, **kwargs, color = next(color_list))
# turn off grid #2
ax2.grid(visible =False)
# Setting the title and axis labels
ax1.set_title('Monthly Sales Revenue vs Number of Items Sold Per Month')
ax1.set_ylabel('Monthly Sales Revenue')
ax2.set_ylabel('Number of Items Sold')
for tkl in ax1.xaxis.get_ticklabels():
    tkl.set(ha = 'right',rotation = 15, rotation_mode = "anchor")
# Locator formatter pairs
ax1.xaxis.set_major_locator(locator)
ax1.xaxis.set_major_formatter(formatter)
ax1.xaxis.grid(visible = True, color = 'grey', lw = 1,ls = 'dashdot' ,alpha = 0.5)
ax1.xaxis.set_tick_params(pad = 15)
for tkl in ax1.xaxis.get_ticklabels():
    tkl.set(ha = 'left',rotation = 0, rotation_mode = "anchor")
bboxargs = dict(alpha=0.9, ec = None,
lw = 0, boxstyle = 'rarrow', pad = 0.8)
# Generating the Legend
handle, label = ax1.get_legend_handles_labels()
## Creating patches to pass in legend function
patch1 = mpatches.Patch(color='orange',alpha = 0.5,
label='Monthly Sales Qty',zorder = 100)
patch2 = mpatches.Patch(color='grey',alpha = 0.5,
label='Monthly Sales Qty', zorder = 100)
handle.extend([patch1, patch2])
ax1.legend(handles = handle,bbox_to_anchor = (1,1),
loc = 'upper right', bbox_transform = fig.transFigure)
plt.subplots_adjust(bottom = 0.2)
Objective: Currently the tops of the bars sit directly below the line plot. We want to make the graph clearer by rescaling the height of the bars, so that the peaks of the bars lie well below the line plot.
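The rescaling works by stretching the upper limit of the secondary y-axis while the bar values stay the same. The arithmetic can be sketched as follows; the helper `rescaled_top` and the starting limit of 300 are hypothetical and only mirror the `top * 10 // 3` expression used in the plotting code.

```python
def rescaled_top(current_top, factor_num=10, factor_den=3):
    """Hypothetical helper mirroring `top * 10 // 3` from the plotting code."""
    return current_top * factor_num // factor_den


current_top = 300.0                  # assumed upper y-limit, made up for illustration
new_top = rescaled_top(current_top)
share = current_top / new_top        # fraction of the axis the old range now fills
print(new_top, share)                # 1000.0 0.3
```

In other words, the bars that used to fill the full height of the secondary axis end up in roughly the bottom 30% of it, leaving the upper part of the figure to the line plot.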
# Let's build our plot
fig, ax1 = plt.subplots()
# set up the 2nd axis/secondary axis
ax2 = ax1.twinx()
df3 = sales.copy()
df3.index = mdates.date2num(df3.index)
#plot the Revenue on axis #1
ax1.plot(df3.Sales_Dollars, label = 'Monthly Revenue')
# Using a for loop to plot the fiscal year data sequentially as bar plots
# Assign color from the cycle object
kwargs = dict(width=20, alpha=0.2)
for fy in [fy10_all, fy11_all, fy12_all, fy13_all, fy14_all, fy15_all]:
    fyr = fy.copy()
    fyr.index = mdates.date2num(fyr.index)
    ax2.bar(fyr.index, fyr.Quantity, **kwargs, color = next(color_list))
ax2.grid(visible = False) # turn off grid #2
# Monthly Sales Revenue in form of line plot
ax1.set_title('Monthly Sales Revenue vs Number of Items Sold Per Month',
fontsize = 25, y = 1.05)
ax1.set_ylabel('Monthly Sales Revenue')
ax2.set_ylabel('Number of Items Sold')
# Locator formatter pairs
ax1.xaxis.set_major_locator(locator)
ax1.xaxis.set_major_formatter(formatter)
ax1.xaxis.grid(visible = True, color = 'grey',
lw = 1,ls = 'dashdot', alpha = 0.5)
ax1.xaxis.set_tick_params(pad = 15)
for tkl in ax1.xaxis.get_ticklabels():
    tkl.set(ha = 'left',rotation = 0, rotation_mode = "anchor")
# Generating the Legend
handle, label = ax1.get_legend_handles_labels()
## Creating patches to pass in legend function
patch1 = mpatches.Patch(color='orange',alpha = 0.5,
label='Monthly Sales Qty',zorder = 100)
patch2 = mpatches.Patch(color='grey',alpha = 0.5,
label='Monthly Sales Qty', zorder = 100)
handle.extend([patch1, patch2])
ax1.legend(handles = handle,bbox_to_anchor = (1,1),
loc = 'upper right', bbox_transform = fig.transFigure)
# Rescaling the secondary axis to reduce the bar heights
ax2.set_ylim(0,ax2.get_ylim()[1]*10//3)
plt.subplots_adjust(bottom = 0.2)
plt.show()
#fig.savefig('Timeseries_overlaying charts.png', dpi = 300, pad_inches = 1)
# inspecting the dataframe
fy10_all
# checking the plot of monthly revenue sales
sales['Sales_Dollars'].plot();
plt.rcParams['figure.figsize']=(20,10) # set the figure size
plt.style.use('fivethirtyeight') # using the fivethirtyeight matplotlib theme
path1 = "https://raw.githubusercontent.com/Bhaskar-JR/Matplotlib_Overlaying_Charts/main/Sales_Data.csv"
#path = '/Users/bhaskarroy/Files/Data Science/PYTHON/Visualisation/Matplotlib/Overlaying charts/sales.csv'
sales = pd.read_csv(path1) # Read the data in
sales.Date = pd.to_datetime(sales.Date) #set the date column to datetime
sales.set_index('Date', inplace=True) #set the index to the date column
# now the hack for the multi-colored bar chart:
# create fiscal year dataframes covering the timeframes you are looking for. In this case,
# the fiscal year covered October - September.
# --------------------------------------------------------------------------------
# Note: This should be set up as a function, but for this small amount of data,
# I just manually built each fiscal year. This is not very pythonic and would
# suck to do if you have many years of data, but it isn't bad for a few years of data.
# --------------------------------------------------------------------------------
fy10_all = sales[(sales.index >= '2009-10-01') & (sales.index < '2010-10-01')]
fy11_all = sales[(sales.index >= '2010-10-01') & (sales.index < '2011-10-01')]
fy12_all = sales[(sales.index >= '2011-10-01') & (sales.index < '2012-10-01')]
fy13_all = sales[(sales.index >= '2012-10-01') & (sales.index < '2013-10-01')]
fy14_all = sales[(sales.index >= '2013-10-01') & (sales.index < '2014-10-01')]
fy15_all = sales[(sales.index >= '2014-10-01') & (sales.index < '2015-10-01')]
# Let's build our plot
fig, ax1 = plt.subplots()
# set up the 2nd axis/secondary axis
ax2 = ax1.twinx()
df = sales.copy()
df.index = mdates.date2num(df.index)
ax1.plot(df.Sales_Dollars, label = 'Monthly Revenue') #plot the Revenue on axis #1
# Creating a cycle object with islice to set the bar plot colors to orange and grey alternately
from itertools import cycle, islice
from datetime import datetime
import matplotlib.patches as mpatches
color_list = islice(cycle(['orange','grey']), 0, None)
# Using a for loop to plot the fiscal year data sequentially as bar plots
# Assign color from the cycle object
kwargs = dict(width=20, alpha=0.2) #kwargs dictionary for bar plot arguments
for fy in [fy10_all, fy11_all, fy12_all, fy13_all, fy14_all, fy15_all]:
    fyr = fy.copy()
    fyr.index = mdates.date2num(fyr.index)
    ax2.bar(fyr.index, fyr.Quantity, **kwargs, color = next(color_list), label = 'Monthly Quantity')
ax2.grid(visible =False) # turn off grid #2
ax1.set_title('Monthly Sales Revenue vs Number of Items Sold Per Month', fontsize = 25, y = 1.05)
ax1.set_ylabel('Monthly Sales Revenue')
ax2.set_ylabel('Number of Items Sold')
#Creating locator/formatter
from dateutil.rrule import rrule, MONTHLY,YEARLY
from datetime import datetime, timedelta
start_date = datetime(2009,10,1)
import matplotlib.dates as mdates
#Let's do ceiling division to get the relevant number of financial years
count = -(-(sales.index[-1]-start_date)//timedelta(days = 365.25))
date_list = list(rrule(freq=YEARLY, count=count, dtstart=start_date))
date_list = mdates.date2num(date_list)
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
locator = ticker.FixedLocator(date_list)
date_labels = [datetime.strftime(x, "FY %Y") for x in mdates.num2date(date_list)]
formatter = ticker.FixedFormatter(date_labels)
# Set the x-axis labels to be more meaningful than just some random dates.
ax1.xaxis.set_major_locator(locator)
ax1.xaxis.set_major_formatter(formatter)
ax1.xaxis.grid(visible = True, color = 'grey', lw = 1,alpha = 0.5)
ax1.xaxis.set_tick_params(pad = 15)
# Tweak the x-axis tick labels setting
for tkl in ax1.xaxis.get_ticklabels():
    tkl.set(ha = 'left',rotation = 0, rotation_mode = "anchor")
# Creating the legend
handle, label = ax1.get_legend_handles_labels()
patch1 = mpatches.Patch(color='orange',alpha = 0.5, label='Monthly Qty',zorder = 100)
patch2 = mpatches.Patch(color='grey',alpha = 0.5, label='Monthly Qty', zorder = 100)
handle.extend([patch1, patch2])
ax1.legend(handles = handle,bbox_to_anchor = (1,1),
loc = 'upper right', bbox_transform = fig.transFigure)
# Rescaling the secondary axis to reduce the bar heights
ax2.set_ylim(0,ax2.get_ylim()[1]*10//3)
plt.show()
We have created a combo chart with monthly sales revenue plotted as a line plot overlaid on the monthly quantity sold as bar plots. To achieve that, we have utilised a number of features available in matplotlib.