Prepping Data
Let’s download, import, and clean our primary Canadian immigration dataset using the pandas read_excel() method before creating any visualizations.
df_can = pd.read_excel('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Canada.xlsx',
Waffle Charts
A waffle chart is a visualization that is normally created to display progress toward goals. It is often an effective option when you want to add visual interest to a display that consists mainly of cells, such as an Excel dashboard.
import matplotlib as mpl
Unfortunately, unlike in R, waffle charts are not built into the mainstream Python visualization libraries. Therefore, we will learn how to create them from scratch.
Let’s revisit the previous case study about Denmark, Norway, and Sweden.
# let's create a new dataframe for these three countries
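The selection can be sketched as follows, assuming the cleaned df_can is indexed by country name and has a Total column. The small frame below is a made-up stand-in so that the snippet runs on its own; its numbers are illustrative and are not the real immigration totals.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned df_can: indexed by country name,
# with a 'Total' column. All numbers are invented for illustration.
df_can = pd.DataFrame(
    {
        "Continent": ["Europe", "Europe", "Europe", "Asia"],
        "Total": [50, 30, 20, 100],
    },
    index=["Denmark", "Norway", "Sweden", "Japan"],
)
df_can.index.name = "Country"

# Select the three Scandinavian countries by index label, keeping all columns.
df_dsn = df_can.loc[["Denmark", "Norway", "Sweden"], :]
print(df_dsn)
```

Selecting with a list of labels keeps the rows in the order given, which later determines the order of the categories in the chart.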
Step 1. The first step in creating a waffle chart is determining the proportion of each category with respect to the total.
# compute the proportion of each category with respect to the total
Denmark: 0.32255663965602777
Norway: 0.1924094592359848
Sweden: 0.48503390110798744
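The computation behind step 1 is a division of each category total by the grand total. A minimal sketch, using a hypothetical helper compute_proportions and illustrative totals (in the tutorial the real values would come from the Total column of the three-country dataframe):

```python
def compute_proportions(values):
    """Return each value divided by the sum of all values."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return [value / total for value in values]


# Illustrative totals only, not the real immigration figures.
categories = ["Denmark", "Norway", "Sweden"]
totals = [50, 30, 20]

proportions = compute_proportions(totals)
for category, proportion in zip(categories, proportions):
    print(f"{category}: {proportion}")
```

Whatever the inputs, the proportions should add up to 1 (up to floating-point error), which is a quick sanity check before moving on.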
Step 2. The second step is defining the overall size of the waffle chart.
width = 40 # width of chart
Total number of tiles is 400
Step 3. The third step is using the proportion of each category to determine its respective number of tiles.
# compute the number of tiles for each category
Denmark: 129
Norway: 77
Sweden: 194
Based on the calculated proportions, Denmark will occupy 129 tiles of the waffle chart, Norway will occupy 77 tiles, and Sweden will occupy 194 tiles.
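Steps 2 and 3 can be sketched in a few lines. The proportions are copied from the output of step 1, and the height of 10 is inferred from the 400-tile total and the 10-row matrix shown in this post. The variable names are illustrative.

```python
width = 40   # width of chart, in tiles
height = 10  # height of chart, in tiles (inferred from the 400-tile total)
total_num_tiles = width * height
print(f"Total number of tiles is {total_num_tiles}")

# Proportions copied from the output of step 1.
proportions = {
    "Denmark": 0.32255663965602777,
    "Norway": 0.1924094592359848,
    "Sweden": 0.48503390110798744,
}

# Round each category's share of the grid to a whole number of tiles.
tiles_per_category = {
    name: round(proportion * total_num_tiles)
    for name, proportion in proportions.items()
}

for name, tiles in tiles_per_category.items():
    print(f"{name}: {tiles}")
```

Rounding each category independently does not guarantee that the counts add up to the grid size in general. Here they do: 129 + 77 + 194 = 400.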
Step 4. The fourth step is creating a matrix that resembles the waffle chart and populating it.
# initialize the waffle chart as an empty matrix
Waffle chart populated!
Let’s take a peek at what the matrix looks like.
waffle_chart
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2.,
2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2.,
2., 2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3., 3.]])
As expected, the matrix consists of three categories and the total number of each category’s instances matches the total number of tiles allocated to each category.
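The population step and the check described above can be sketched as follows. This is one way to reproduce the column-by-column fill visible in the printed matrix, not necessarily the exact code behind it. The tile counts are the ones computed in step 3.

```python
import numpy as np

width, height = 40, 10
categories = ["Denmark", "Norway", "Sweden"]
tiles_per_category = [129, 77, 194]

# Initialize the waffle chart as an empty (all-zero) matrix.
waffle_chart = np.zeros((height, width))

category_index = 0  # position of the category currently being drawn
tile_index = 0      # running count of tiles filled so far

# Fill the matrix column by column, top to bottom.
for col in range(width):
    for row in range(height):
        tile_index += 1
        # Move on once the current category has used up its tiles.
        if tile_index > sum(tiles_per_category[: category_index + 1]):
            category_index += 1
        # Store 1 for the first category, 2 for the second, and so on.
        waffle_chart[row, col] = category_index + 1

# Check that each category received exactly its allotted number of tiles.
labels, counts = np.unique(waffle_chart, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

The counts come back as 129, 77, and 194. Because the fill runs down each column first, the boundary between two categories falls partway down a column, which is why the last rows of the printed matrix switch category one column earlier than the first rows.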
Step 5. Map the waffle chart matrix into a visual.
# instantiate a new figure object
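A minimal sketch of step 5, assuming Matplotlib's matshow is used to render the matrix. The coolwarm colormap is an arbitrary choice here. To keep the snippet self-contained, the matrix is rebuilt in one line with np.repeat and a column-major reshape, which gives the same column-by-column layout as the loop in step 4.

```python
import matplotlib.pyplot as plt
import numpy as np

# Rebuild the 10 x 40 matrix: 129 ones, 77 twos, 194 threes, filled column by column.
waffle_chart = np.repeat([1, 2, 3], [129, 77, 194]).reshape((10, 40), order="F")

# Instantiate a new figure and draw the matrix as a grid of colored cells.
fig, ax = plt.subplots()
image = ax.matshow(waffle_chart, cmap=plt.cm.coolwarm)

# The colorbar shows which color corresponds to which category code.
fig.colorbar(image, ax=ax)
```

In a notebook the figure is displayed automatically; in a script, call plt.show() or fig.savefig(...) afterwards.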
Step 6. Prettify the chart.
# instantiate a new figure object
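One way to prettify the chart is to draw white grid lines on the cell boundaries and hide the axis ticks. The sketch below assumes that approach: minor ticks are placed at the half-integer positions between cells, and the grid is drawn only for those minor ticks.

```python
import matplotlib.pyplot as plt
import numpy as np

width, height = 40, 10
waffle_chart = np.repeat([1, 2, 3], [129, 77, 194]).reshape((height, width), order="F")

fig, ax = plt.subplots()
ax.matshow(waffle_chart, cmap=plt.cm.coolwarm)

# Cell (row, col) is centered at integer coordinates, so cell edges
# sit at -0.5, 0.5, 1.5, ... Put minor ticks on those edges.
ax.set_xticks(np.arange(-0.5, width, 1), minor=True)
ax.set_yticks(np.arange(-0.5, height, 1), minor=True)

# Draw white grid lines along the minor ticks to separate the tiles.
ax.grid(which="minor", color="w", linestyle="-", linewidth=2)

# Remove the major ticks and hide the minor tick marks themselves.
ax.set_xticks([])
ax.set_yticks([])
ax.tick_params(which="minor", length=0)
```

The grid lines are what turn the colored bands into recognizable individual tiles.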
Step 7. Create a legend and add it to the chart.
# instantiate a new figure object
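A legend for a matshow image has to be built by hand, since the image itself carries no labels. The sketch below assumes one colored patch per category, with the color looked up through the image's own normalization so that it matches the tiles. The labels show tile counts because the underlying totals are not repeated here; this is a choice made for the sketch.

```python
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np

categories = ["Denmark", "Norway", "Sweden"]
tiles_per_category = [129, 77, 194]
waffle_chart = np.repeat([1, 2, 3], tiles_per_category).reshape((10, 40), order="F")

fig, ax = plt.subplots()
image = ax.matshow(waffle_chart, cmap=plt.cm.coolwarm)

# One legend entry per category. image.norm maps the category code (1, 2, 3)
# to the 0..1 range, and image.cmap turns that into the color used for the tiles.
legend_handles = [
    mpatches.Patch(color=image.cmap(image.norm(code)), label=f"{name} ({count})")
    for code, (name, count) in enumerate(zip(categories, tiles_per_category), start=1)
]

# Place the legend below the chart, with all entries on one row.
legend = ax.legend(
    handles=legend_handles,
    loc="lower center",
    ncol=len(categories),
    bbox_to_anchor=(0.0, -0.2, 0.95, 0.1),
)
```

Looking the colors up through the image keeps the legend in sync with the tiles if the colormap is changed later.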
And there you go! What a good-looking, delicious waffle chart, don’t you think?
Packing It into a Function
Now, it would be very inefficient to repeat these seven steps every time we wish to create a waffle chart. So let’s combine all seven steps into one function called create_waffle_chart. This function takes the following parameters as input:
- categories: Unique categories or classes in dataframe.
- values: Values corresponding to categories or classes.
- height: Defined height of waffle chart.
- width: Defined width of waffle chart.
- colormap: Matplotlib colormap used to color the tiles (for example, plt.cm.coolwarm).
- value_sign: In order to make our function more generalizable, we will add this parameter to address signs that could be associated with a value such as %, $, and so on. value_sign has a default value of empty string.
def create_waffle_chart(categories, values, height, width, colormap, value_sign=''):
Now, to create a waffle chart, all we have to do is call the function create_waffle_chart. Let’s define the input parameters:
width = 40 # width of chart
And now let’s call our function to create a waffle chart.
create_waffle_chart(categories, values, height, width, colormap)
Total number of tiles is 400
Denmark: 129
Norway: 77
Sweden: 194
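For reference, a function with this signature could look roughly like the sketch below. It is a compact reconstruction of the seven steps under a few assumptions, not the original implementation: the matrix is built with np.repeat instead of a loop, a rounding shortfall or excess is absorbed by padding with or trimming the last category, a value_sign of % is appended after the value while any other sign is placed before it, and the function returns the figure and tile counts for convenience. The values in the usage example are the tile counts from above, standing in for the real totals.

```python
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np


def create_waffle_chart(categories, values, height, width, colormap, value_sign=""):
    """Draw a waffle chart; return (figure, tiles_per_category)."""
    # Steps 1-3: proportions, chart size, and tiles per category.
    total_values = sum(values)
    total_num_tiles = width * height
    print(f"Total number of tiles is {total_num_tiles}")
    tiles_per_category = [
        int(round(value / total_values * total_num_tiles)) for value in values
    ]
    for category, tiles in zip(categories, tiles_per_category):
        print(f"{category}: {tiles}")

    # Step 4: category codes 1..n, laid out column by column.
    codes = np.repeat(np.arange(1, len(categories) + 1), tiles_per_category)
    codes = codes[:total_num_tiles]  # trim if rounding produced too many tiles
    # Pad with the last code if rounding produced too few tiles.
    codes = np.pad(codes, (0, total_num_tiles - codes.size), mode="edge")
    waffle_chart = codes.reshape((height, width), order="F")

    # Step 5: map the matrix into a visual.
    fig, ax = plt.subplots()
    image = ax.matshow(waffle_chart, cmap=colormap)
    fig.colorbar(image, ax=ax)

    # Step 6: white grid lines on the cell edges, no axis ticks.
    ax.set_xticks(np.arange(-0.5, width, 1), minor=True)
    ax.set_yticks(np.arange(-0.5, height, 1), minor=True)
    ax.grid(which="minor", color="w", linestyle="-", linewidth=2)
    ax.set_xticks([])
    ax.set_yticks([])

    # Step 7: one legend patch per category, labeled with its value.
    legend_handles = []
    for code, (category, value) in enumerate(zip(categories, values), start=1):
        if value_sign == "%":
            label = f"{category} ({value}{value_sign})"
        else:
            label = f"{category} ({value_sign}{value})"
        color = image.cmap(image.norm(code))
        legend_handles.append(mpatches.Patch(color=color, label=label))
    ax.legend(
        handles=legend_handles,
        loc="lower center",
        ncol=len(categories),
        bbox_to_anchor=(0.0, -0.2, 0.95, 0.1),
    )
    return fig, tiles_per_category


width = 40   # width of chart
height = 10  # height of chart
categories = ["Denmark", "Norway", "Sweden"]
values = [129, 77, 194]  # placeholder values, not the real totals
colormap = plt.cm.coolwarm

fig, tiles = create_waffle_chart(categories, values, height, width, colormap)
```

With these placeholder values the call prints the same tile counts as shown above.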
There seems to be a new Python package for generating waffle charts called PyWaffle, but it looks like the repository is still being built. Feel free to check it out and play with it.