altair is an interactive visualization library. It offers a more consistent API. This is how the authors describe the library:
Declarative statistical visualization library for Python
What this means is that it focuses on what to plot instead of how to plot, and you can easily combine different components, much like ggplot. Before I jump into the library, I want to quickly introduce two great tools that are useful for data exploration: ipywidgets and qgrid. With them you can easily add more interactivity to libraries like seaborn or matplotlib. Check out some simple examples on GitHub; I have included most of the examples in the repository.
ipywidgets lets you interact with Jupyter Notebook/Lab using your mouse, which makes a notebook look almost like a little app of its own. You can add sliders, buttons, and checkboxes inside a notebook. Here are some nice projects that use ipywidgets.
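As a rough sketch of what those controls look like in code, the snippet below builds a slider, a checkbox, and a button, then wires the button to a callback. The widget labels and the `clicked_hours` list are made up for illustration and are not part of the A&E example.

```python
import ipywidgets as widgets

# Three basic controls; the descriptions are illustrative only.
slider = widgets.IntSlider(value=5, min=0, max=23, description='Hour')
checkbox = widgets.Checkbox(value=False, description='Show mean')
button = widgets.Button(description='Refresh')

clicked_hours = []  # hypothetical log of the slider value at each click


def on_refresh(_button):
    # Read the current slider value whenever the button is pressed.
    clicked_hours.append(slider.value)


button.on_click(on_refresh)

# In a notebook, evaluating the container as the last expression of a cell renders it.
controls = widgets.VBox([slider, checkbox, button])
controls
```

Each widget keeps its state in a `value` attribute, so a callback can read the slider or checkbox at any time without extra plumbing.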
In response to overcrowding in Hong Kong hospitals, I have made an example with Accident & Emergency (A&E) service waiting times. The plot shows the average waiting time by hospital, and ipywidgets simply adds a dropdown menu that lets you change the color of the plot interactively.
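import seaborn as sns
import ipywidgets

# df_mean is assumed to be a DataFrame with one row per hospital
# (columns 'hospNameEn' and 'topWaitTime'), prepared earlier in the notebook.
@ipywidgets.interact
def plot(color=['red', 'steelblue']):
    # A list default makes ipywidgets render a dropdown for `color`.
    (sns.barplot(y='hospNameEn', x='topWaitTime', data=df_mean,
                 orient='h', color=color)
     .set_title('Average Accident & Emergency Waiting Time in HK Hospitals'))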
qgrid lets you have an Excel-like table inside Jupyter Notebook/Lab; under the hood, it makes use of ipywidgets. This is extremely useful when you are trying to understand your data: instead of typing a lot of code, you can sort your data with a click or temporarily filter it with another. It's just awesome!
altair has an example gallery which demonstrates the wide range of visualizations that you can make.
altair offers a lot of choices for interactive plotting. The above gif gives you a sense of how easy it can be. With just 1 line of code, you can change the behavior of the chart. Interactivity is crucial for data exploration, as you often want to drill down on a certain subset of the data. Features like cross-filtering are very common in Excel or Tableau.
The syntax is a bit like ggplot, where you create a Chart object and add encodings, colors, and scales to it.
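import altair as alt

# df_sub is assumed to be a DataFrame with 'hospTime', 'hospNameEn'
# and 'topWaitTime' columns, prepared earlier in the notebook.
(alt.Chart(df_sub)
 .mark_rect()
 .encode(x='yearmonthdate(hospTime):O',
         y='hospNameEn:N',
         color=alt.Color('mean(topWaitTime):Q',
                         scale=alt.Scale(scheme='orangered'))))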
If you have used matplotlib, you have probably had to look up the docs many times. altair makes great use of operators like | and +, which are very intuitive when you want to combine different charts. For example, a|b places Chart a and Chart b side by side, and a+b overlays Chart b on Chart a.
The ability to overlay two charts is powerful: it allows us to plot two charts on the same graph without actually joining their data.
Here is an example. I have two graphs created from 2 different datasets, and both are connected to a selection filter on the hospital name. You can find my example here. Chart A shows the historical average waiting time by hour, while Chart B shows the average waiting time by hour over the last 30 days.
Often you just want to share a graph. You just need to call Chart.save() and share the HTML directly with your colleagues. The fact that it is just JSON and HTML also means that you can easily integrate it with your web front end.
If you want a sophisticated dashboard with a lot of data, Tableau or tools like Dash (Plotly) and Bokeh are still better at this point. I haven't found a great solution for dealing with large data in altair. I find it most useful for interactive plotting with no more than 4 charts, where you have some cross-filtering, dropdowns, or highlights.
These are the libraries that I have tried to include in my workflow recently. Let me know what you think and share the tools that you use. I think the interactive control of altair is the most interesting part, but it still has to catch up with other libraries in terms of functionality and supported chart types.
I will try to experiment with plotly + Dash to see if it is better. I have liked the API of altair so far, but it may be less production-ready. Running the altair data server may be a solution, but I have not tried it yet.