Tips & Tricks for Data Science

Can Context Analysis Help Extract Meaningful Insights?

Three strategies to extract meaningful insights from data through Context Analysis

Angelica Lo Duca
Feb 08, 2022
Photo by Aaron Jones on Unsplash

I am currently reading a very interesting book by James Gate, entitled Storytelling with Data: The New Visualization Data Guide to Reaching Your Business Aim in The Fastest Way. One chapter of the book describes the importance of context when extracting insights from data.

I have elaborated on the author’s thoughts, and here I describe what I have learned about the importance of data context. I use a practical example to illustrate the concepts.

Context Analysis is the analysis of everything that surrounds a dataset, which may include many different aspects. For example, if you are measuring the temperature of the sea surface over time, the context may include the weather conditions, the presence of ships, and so on.

Three elements contribute to defining Context Analysis:

  • Events

  • Environment

  • Time

I consider the three aspects separately, and I use a practical example to explain them.

1 Setup of the Scenario

As an example, I consider the number of passengers carried by Italian air transport from 1970 to 2020. The dataset is released by the World Bank under the CC-BY 4.0 license.

The objective of this section is to convert the raw dataset into a time series that contains the selected indicator for Italy.

First, I load the dataset as a Pandas DataFrame:

import pandas as pd
df = pd.read_csv('API_ITA_DS2_en_csv_v2_3472313.csv')

The original dataset contains many indicators, so I select only the one of interest:

indicator = 'Air transport, passengers carried'
df_ind = df[df['Indicator Name'] == indicator]

Then, I drop the unused columns:

# Reassign instead of using inplace=True on a filtered slice (avoids SettingWithCopyWarning)
df_ind = df_ind.drop(columns=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', 'Unnamed: 65'])

I rename the index label of the remaining row (511 in this file) to value:

df_ind.rename(index={511 : 'value'}, inplace=True)

Now, I build the time series:

df_ind = df_ind.transpose()
# Select the column and drop missing years in one step, without modifying df_ind in place
ts = df_ind['value'].dropna()
ts.index = ts.index.astype(int)
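
The row label 511 is specific to this CSV file. As a hedged alternative, the sketch below builds the same kind of series without a hard-coded label. The tiny DataFrame `df_demo` and all its values are made up for illustration; it only mimics the wide layout of the World Bank file (one row per indicator, one column per year).

```python
import pandas as pd

# Hypothetical miniature of the wide layout (values are invented)
df_demo = pd.DataFrame({
    'Country Name': ['Italy', 'Italy'],
    'Indicator Name': ['Air transport, passengers carried', 'Another indicator'],
    '1970': [None, 1.0],
    '1971': [100.0, 2.0],
    '1972': [120.0, 3.0],
})

indicator = 'Air transport, passengers carried'
row = df_demo[df_demo['Indicator Name'] == indicator]

# Keep only the year columns, then take the single matching row by position
year_cols = [c for c in row.columns if c.isdigit()]
ts_demo = row[year_cols].iloc[0].dropna().astype(float)
ts_demo.index = ts_demo.index.astype(int)
ts_demo.name = 'value'

print(ts_demo)
```

Selecting the row by position with `iloc[0]` assumes that the filter matches exactly one row, which is worth checking on the real data.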

Finally, I plot the time series:

import matplotlib.pyplot as plt
import numpy as np
xmin = np.min(ts.index)
xmax = np.max(ts.index)
plt.figure(figsize=(15,6))
plt.plot(ts)
plt.title(indicator)
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.StrMethodFormatter('{x:,.0f}'))
plt.xticks(np.arange(xmin, xmax+1, 1),rotation=45)
plt.xlim(xmin,xmax)
plt.grid()
plt.savefig('air_transport_basics.png',bbox_inches="tight")
plt.show()

Now, I am ready to analyze the context around the time series.

2 Events

An event is something that happens and influences, in some way, the trend or behavior of the data.

In our dataset, we can clearly identify at least three events that produce sharp drops:

  • The fall of the Twin Towers in 2001, with the drop visible in 2002,

  • The economic crisis, which began in 2008 in Italy,

  • The beginning of the Covid-19 pandemic in 2020.
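
One possible way to check events like these against the data is to compute the year-over-year percentage change and look at the steepest declines. The following is a minimal sketch. The numbers in `ts_demo` are invented for illustration and are not the World Bank values; with the real data, the series `ts` built earlier would take the place of `ts_demo`.

```python
import pandas as pd

# Made-up passenger counts, NOT the real World Bank figures
ts_demo = pd.Series(
    [100.0, 110.0, 99.0, 105.0, 115.0, 46.0],
    index=[2000, 2001, 2002, 2003, 2004, 2005],
)

# Relative change with respect to the previous year
yoy = ts_demo.pct_change()

# Years with a decline, ordered from the steepest drop
drops = yoy[yoy < 0].sort_values()
print(drops)
```

A drop found this way only tells us when something happened. Linking it to a specific event still requires knowledge of the context.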

3 Environment

The environment includes all the external or internal constraints that influence a trend in a dataset. For example, a very strict professor might give students a maximum score of 28/30, while another might assign the full 30/30. Therefore, it is not possible to compare the grades of students evaluated by the two professors unless the data has first been correctly normalized.
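
As a minimal sketch of such a normalization, one option is to divide each grade by the maximum score that its professor actually assigns. The professor labels, student names, and individual grades below are hypothetical; only the maximums of 28 and 30 come from the example above. Other normalizations, such as z-scores computed per professor, could be equally reasonable.

```python
# Maximum score each professor actually assigns (28 and 30 from the example above)
max_score = {'strict': 28, 'lenient': 30}

# Hypothetical grades: (student, professor, raw grade)
grades = [
    ('Anna', 'strict', 28),
    ('Bruno', 'strict', 21),
    ('Carla', 'lenient', 30),
    ('Dario', 'lenient', 24),
]

# Rescale every grade to the 0-1 range of its own professor
normalized = {
    student: grade / max_score[professor]
    for student, professor, grade in grades
}

for student, value in normalized.items():
    print(f'{student}: {value:.2f}')
```

After rescaling, the top grade of each professor maps to 1.0, so students evaluated by different professors can be placed on the same scale.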
