4.2 Making Quick Exploratory Plots

(Prepared by Prashaad)

The power of pandas plotting is that you can move from subsetting and exploring data to plotting it in a few lines of code, without first building arrays of values to pass to matplotlib.

Let’s use the famous Iris dataset to demonstrate some of the ways of plotting directly from pandas.

Hello Iris

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
df.columns
# Index(['sepal.length', 'sepal.width', 'petal.length', 'petal.width',
#        'variety'],
#       dtype='object')

Example 1 : Bar Chart

The basic plotting method is DataFrame.plot (also available on a Series as Series.plot), which makes it easy to switch plot styles through the kind keyword argument.

df['variety'].value_counts().plot(kind='bar')
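As a minimal sketch of how the kind argument swaps plot styles, the block below draws the same counts three ways. The small variety Series is a hypothetical stand-in for the column in iris.csv, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the 'variety' column of iris.csv
variety = pd.Series(["Setosa"] * 3 + ["Versicolor"] * 2 + ["Virginica"])
counts = variety.value_counts()

# Same data, three plot styles: only the `kind` argument changes
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, kind in zip(axes, ["bar", "barh", "pie"]):
    counts.plot(kind=kind, ax=ax, title=f"kind='{kind}'")
fig.tight_layout()
```

Passing ax= places each plot on an existing matplotlib Axes, which is one way to compare styles side by side.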

Example 2 : Stacked Bar Chart

Since pandas uses matplotlib under the hood, you can customise the plot by talking to matplotlib directly. The plot call returns a matplotlib Axes object, whose methods can then be used as usual.

ax = df[['sepal.width','sepal.length']].plot(kind='barh', stacked=True)
ax.yaxis.set_visible(False)                                   # Don't show the y-axis
plt.show()
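The object returned by DataFrame.plot is a matplotlib Axes, so further matplotlib calls can be applied to it. The following is a small sketch of that idea. The four rows of data, the axis label (including the assumed unit of cm) and the title are illustrative values, not taken from iris.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements standing in for two columns of iris.csv
df = pd.DataFrame({
    "sepal.width": [3.5, 3.0, 3.2, 2.8],
    "sepal.length": [5.1, 4.9, 6.4, 6.1],
})

# DataFrame.plot returns a matplotlib Axes, so Axes methods are available
ax = df.plot(kind="barh", stacked=True)
ax.set_xlabel("Width + length (cm)")        # unit is an assumption
ax.set_title("Stacked sepal measurements")  # illustrative title
ax.legend(loc="lower right")
ax.figure.set_size_inches(6, 3)             # the parent Figure is reachable via ax.figure
```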

Example 3 : Histogram

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
df['sepal.length'].plot(kind='hist')

Example 4 : Scatter Plot

df.plot(x='petal.width', y='petal.length', kind='scatter')

Example 5 : Pie

df['variety'].value_counts().plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.legend(bbox_to_anchor=(1,0.5))

Example 6 : Box & Whiskers Plots

The power of pandas shows in the following example, where the data is subset in several steps before plotting. The requirement: among flowers whose petal length is greater than their sepal width, take the 80 largest petal lengths and show their distribution as a box plot, grouped by the variety of the flower.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")

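# Keep only flowers whose petal length exceeds their sepal width
df_1 = df.loc[df['petal.length'] > df['sepal.width']]

# From those, take the 80 rows with the largest petal length
df_2 = df_1.nlargest(80, "petal.length")

# One box per variety
ax = df_2.boxplot('petal.length', by='variety')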

ax.set_title("")
ax.set_ylabel("Petal Length")
ax.grid(False)
plt.show()
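The two subsetting steps can also be written as one chain and checked before plotting. The sketch below assumes a hypothetical five-row sample with the same column names as iris.csv, and keeps the 2 largest petal lengths rather than 80 so the result is easy to verify by eye.

```python
import pandas as pd

# Hypothetical mini-sample with the same column names as iris.csv
df = pd.DataFrame({
    "sepal.width":  [3.5, 3.0, 2.8, 3.3, 2.5],
    "petal.length": [1.4, 1.3, 4.7, 6.0, 5.8],
    "variety": ["Setosa", "Setosa", "Versicolor", "Virginica", "Virginica"],
})

subset = (
    # Step 1: keep rows where the petal is longer than the sepal is wide
    df.loc[lambda d: d["petal.length"] > d["sepal.width"]]
    # Step 2: of those, keep the 2 rows with the largest petal length
      .nlargest(2, "petal.length")
)

# Inspect which varieties survive the filtering
print(subset[["petal.length", "variety"]])
```

Here only Virginica rows remain, which illustrates why it can be worth inspecting the subset first: a variety that is filtered out entirely will simply have no box in the plot.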

Example 7 : Scatter Matrix

Among the other interesting plots available in pd.plotting is scatter_matrix, which draws a grid of pairwise scatter plots of the numeric columns.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")

# scatter_matrix returns an array of Axes; assigning it to _ avoids
# printing that array as text output in a notebook
_ = pd.plotting.scatter_matrix(df, alpha=0.4, diagonal='kde', figsize=(12,12))

plt.show()
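Because scatter_matrix returns the grid of Axes, the result can be kept and inspected instead of discarded. The sketch below also colours the points by variety by passing c= through to the underlying scatter call. The six-row sample and the use of category codes as colours are assumptions made for illustration, and diagonal='hist' is used here to avoid the SciPy dependency that 'kde' needs.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mini-sample with column names matching iris.csv
df = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.8],
    "petal.width":  [0.2, 0.2, 1.4, 1.5, 2.5, 2.2],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# Map each variety to an integer code so it can be used as a colour value
codes = df["variety"].astype("category").cat.codes

# Extra keyword arguments such as c= are forwarded to the scatter plots
axes = pd.plotting.scatter_matrix(
    df[["petal.length", "petal.width"]],
    c=codes, alpha=0.8, diagonal="hist", figsize=(6, 6),
)

# axes is a 2-D array: one Axes per pair of columns
print(axes.shape)
```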