4.2 Making Quick Exploratory Plots
(Prepared by Prashaad)
The power of pandas plotting is that one can move from data subsetting and exploration to plotting within a few lines of code, instead of first building arrays of values to pass to matplotlib.
Let’s use the famous Iris dataset to demonstrate some of the ways of plotting directly from pandas.
Hello Iris
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
df.columns
# Index(['sepal.length', 'sepal.width', 'petal.length', 'petal.width',
#        'variety'],
#       dtype='object')
Example 1 : Bar Chart
The basic plotting method is DataFrame.plot (also available on a Series as Series.plot), which makes it easy to switch plot styles through the kind keyword argument.
df['variety'].value_counts().plot(kind='bar')
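To illustrate how the kind keyword swaps plot styles, here is a minimal sketch that draws the same counts twice. The variety Series below is a hypothetical stand-in with made-up labels (not the real Iris rows), and the two-panel layout is only an assumption chosen for the comparison.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df['variety'] (made-up labels, not the real data)
variety = pd.Series(
    ["Setosa", "Setosa", "Versicolor", "Virginica", "Virginica", "Virginica"]
)
counts = variety.value_counts()

# One figure, two panels, so the two styles can be compared side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
counts.plot(kind="bar", ax=ax_left)    # vertical bars
counts.plot(kind="barh", ax=ax_right)  # same data, horizontal bars
fig.tight_layout()
```

The accessor form counts.plot.bar() is equivalent to counts.plot(kind="bar"); which one to use is a matter of taste.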
Example 2 : Stacked Bar Chart
Since pandas uses matplotlib under the hood, you can customise the plot by speaking to matplotlib directly.
# DataFrame.plot returns a matplotlib Axes object
ax = df[['sepal.width','sepal.length']].plot(kind='barh', stacked=True)
ax.yaxis.set_visible(False) # Don't show the y-axis
plt.show()
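As a sketch of what "speaking to matplotlib directly" can look like, the example below creates the Axes with matplotlib first and hands it to pandas through the ax argument. The sample frame holds hypothetical values standing in for two Iris columns, and the axis label is an illustrative choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements standing in for two Iris columns
sample = pd.DataFrame({
    "sepal.width": [3.5, 3.0, 3.2, 2.8],
    "sepal.length": [5.1, 4.9, 6.4, 6.1],
})

# Create the figure and Axes ourselves, then let pandas draw onto that Axes
fig, ax = plt.subplots(figsize=(6, 3))
returned = sample.plot(kind="barh", stacked=True, ax=ax)

# From here on, everything is plain matplotlib
ax.set_xlabel("sepal width + sepal length")
ax.yaxis.set_visible(False)
ax.legend(loc="lower right")
```

Passing ax explicitly is handy when several pandas plots need to share one figure, because the layout stays under matplotlib's control.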
Example 3 : Histogram
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
df['sepal.length'].plot(kind='hist')
Example 4 : Scatter Plot
df.plot("petal.width", 'petal.length', kind='scatter')
Example 5 : Pie
df['variety'].value_counts().plot(kind='pie',autopct='%1.1f%%', startangle = 90)
plt.legend(bbox_to_anchor=(1,0.5))
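The percentages that autopct prints on the pie can be cross-checked numerically. A minimal sketch, assuming a hypothetical variety Series with made-up class sizes:

```python
import pandas as pd

# Hypothetical labels standing in for df['variety'] (class sizes are made up)
variety = pd.Series(["Setosa"] * 2 + ["Versicolor"] * 3 + ["Virginica"] * 5)

counts = variety.value_counts()                      # absolute count per class
shares = variety.value_counts(normalize=True) * 100  # percentage per class

print(shares.round(1))
```

The shares computed this way should match the labels drawn on the pie wedges, up to rounding.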
Example 6 : Box & Whiskers Plots
The power of pandas shows in the following example, where the data is subset in several steps before plotting. The requirement: keep only the flowers whose petal length is greater than their sepal width, take the 80 largest petal lengths among them, and show their distribution as a boxplot grouped by the variety of the flower.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
# Keep flowers whose petal length exceeds their sepal width
df_1 = df.loc[df['petal.length'] > df['sepal.width']]
# Of those, keep the 80 rows with the largest petal length
df_2 = df_1.nlargest(80,"petal.length")
ax = df_2.boxplot('petal.length',by='variety')
ax.set_title("")
ax.set_ylabel("Petal Length")
ax.grid(False)
plt.show()
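Before drawing the boxplot, it can help to check the subsetting steps on a handful of rows. The sketch below repeats the filter and nlargest steps on a hypothetical six-row frame (the values are made up, not real Iris measurements) and summarises the result per variety.

```python
import pandas as pd

# Hypothetical rows (not real Iris measurements)
toy = pd.DataFrame({
    "sepal.width": [3.5, 3.0, 2.8, 3.3, 3.0, 2.5],
    "petal.length": [1.4, 1.3, 4.7, 6.0, 5.1, 3.9],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Virginica", "Virginica", "Versicolor"],
})

# Step 1: boolean mask keeps rows where petal length exceeds sepal width
longer_petal = toy.loc[toy["petal.length"] > toy["sepal.width"]]

# Step 2: keep the 3 rows with the largest petal length (80 in the example above)
top = longer_petal.nlargest(3, "petal.length")

# Step 3: the statistics a boxplot draws, as numbers per variety
summary = top.groupby("variety")["petal.length"].describe()
print(summary)
```

The describe() table reports the quartiles and extremes per group, which are roughly the quantities the boxplot visualises.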
Example 7 : Scatter matrix!
Among the other interesting plots in pd.plotting is scatter_matrix.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
# We can avoid seeing all the lines of text output by assigning to _
_ = pd.plotting.scatter_matrix(df, alpha=0.4, diagonal='kde', figsize=(12,12))
plt.show()
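The value returned by scatter_matrix is itself useful: it is a grid of matplotlib Axes, one per pair of plotted columns. A minimal sketch on a hypothetical five-row frame follows; it selects the numeric columns explicitly so it is obvious which ones are plotted, and it uses diagonal="hist" on the assumption that SciPy (needed for the kde option) may not be installed.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows (not real Iris measurements)
toy = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 6.4, 6.1, 5.8],
    "petal.length": [1.4, 1.3, 4.7, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor", "Virginica", "Virginica"],
})

# Drop the text column so only numeric columns reach the plot
numeric = toy.select_dtypes(include="number")

# One row and one column of panels per numeric variable
axes = pd.plotting.scatter_matrix(numeric, diagonal="hist", figsize=(6, 6))
print(axes.shape)
```

Individual panels can then be adjusted with ordinary matplotlib calls, for example axes[0, 0].set_ylabel(...).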