Data Visualization
January 21, 2024
The Maker Take
Data visualization is an important tool in the data practitioner's toolkit. Proper data visualization communicates information more efficiently and accurately. As someone who is primarily a data engineer, I often get a bit of a free pass on so-so visualizations compared to my data scientist and data analyst counterparts. That doesn't mean I haven't found one of my visualizations on a PowerPoint slide presented to a much larger or higher-profile audience.
My Don't Do's
Omit units
Use double y-axis graphs
Use colors for style over substance
Use pie charts at all
Use 3D visualizations unnecessarily, especially for jazzed-up 2D graphs
Use a library like D3 because it's powerful
Do These Instead
Keep it simple
Think of your audience
Get a second opinion
Make sure to label axes and include units
Include a header, maybe even a date
Include a caption with any caveats, especially if your chart might be "borrowed"
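As a rough illustration of the list above, here is a minimal Matplotlib sketch with labeled axes, units, a dated header, and a caption carrying a caveat. The data, the labels, the `make_chart` helper, and the output filename are all made up for this example; treat it as one possible starting point rather than the right way to do it.

```python
import datetime as dt

import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data, invented purely for illustration.
hours = [0, 1, 2, 3, 4, 5]
throughput_mb_s = [12.0, 15.5, 14.2, 18.9, 21.3, 19.8]


def make_chart(as_of):
    """Build a simple line chart with one y-axis, labels, units, and a caption."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(hours, throughput_mb_s, marker="o")

    # Label both axes and include the units.
    ax.set_xlabel("Time since job start (hours)")
    ax.set_ylabel("Throughput (MB/s)")

    # Header with a date, so a "borrowed" chart still carries its context.
    ax.set_title(f"Pipeline throughput, as of {as_of.isoformat()}")

    # Caption with caveats, placed in the bottom-left corner of the figure.
    fig.text(
        0.01,
        0.01,
        "Caveat: sample data for illustration only.",
        fontsize=8,
        ha="left",
        va="bottom",
    )
    fig.subplots_adjust(bottom=0.2)  # leave room for the caption
    return fig, ax


if __name__ == "__main__":
    fig, _ = make_chart(dt.date(2024, 1, 21))
    fig.savefig("throughput.png", dpi=150)
```

Putting the caveat inside the figure itself, rather than in the surrounding text, means it travels with the image if someone copies it onto a slide.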
Keep reading below for an overview of learning resources and popular tools used for data visualization or check out the rest of our Blog!
Learning Resources
Check out these helpful sources of information to learn more about visualization for deep dives and quick references.
A Periodic Table of Visualization Methods
I found this great interactive tool that catalogs visualization methods in a format that will be familiar to many scientists out there. Don't just admire my screenshot, though; you have to try it out yourself. This is just one of many examples of the wealth of information at https://www.visual-literacy.org.
Data Visualization With Python
There are tons of courses online for learning about data visualization. Here's the most popular one on Coursera.
The Visual Display of Quantitative Information
A popular read by author Edward Tufte. A brief description from www.edwardtufte.com:
The classic book on statistical graphics, charts, tables. Theory and practice in the design of data graphics, 250 illustrations of the best (and a few of the worst) statistical graphics, with detailed analysis of how to display data for precise, effective, quick analysis.
You can purchase directly from the site or you can find it on Amazon using the embedded preview.
Data Is Ugly
This is a great Reddit thread highlighting examples of not-so-great visualizations.
Open-Source Tools
There are lots and lots of tools for visualizing data. These are some of the most widely used libraries and frameworks available.
Matplotlib
Matplotlib is a fairly ubiquitous library for Python that has great integration with Jupyter and other tools. It can be a little unwieldy, especially if you are new to Python. Their documentation is great, though, so you can't go wrong learning from their website, https://matplotlib.org/.
Plotly
Plotly is a handy way to build interactive visualizations. The resulting charts are fairly responsive and can be easy to self-host for developers and data scientists. There is also a paid counterpart, Dash Enterprise, which helps manage and host Plotly apps. Check out the interactive demos at https://plotly.com/examples/.
D3.js
D3.js by Observable is a powerful visualization library for JavaScript. You can build responsive, interactive visualizations embedded in a website that others can quickly access. This beats trying to get your CEO to run a Docker container. Learn more at https://d3js.org/.
Honorable Mention
The data visualization space is always changing. The libraries above are the ones I learned when I was first starting, but there are more!
Seaborn
ggplot
bokeh
Grafana, Kibana, Prometheus
These tools are unique in that they are geared toward real-time dashboards and are often used by engineers to monitor the status of their applications and digital infrastructure. They can be exceptionally powerful and integrate with many data sources.
Proprietary Tools
You might not use these when working alone, but if you're visualizing data for a company, there's a good chance you have access to one of them.
Mode Analytics
Mode Analytics is (or was) the "new kid on the block" of analytics and BI tools. It is lightweight and easy to configure and use. I have had lots of good experiences with it.
Tableau and PowerBI
Salesforce Tableau and Microsoft PowerBI are heavy hitters in the "business intelligence" category. If you work for a really big company, you will probably encounter one of these. They are rich and complex and provide tons of ways to query and visualize data. They can be expensive and difficult to set up. However, they have decent customer service and technical support if you don't stray too far from their target audience.
Excel
Excel is a classic tool for managing tabular data. It also includes many visualization options that you can build with a visual editor. Sometimes exporting your data to CSV or XLSX is the best way to go. You can also export your visualizations to image files.
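If you go the CSV route, a minimal sketch using Python's standard `csv` module might look like the following. The rows, the column names, the `write_csv` helper, and the output filename are hypothetical and chosen only for illustration. Note that the units are baked into the column header so they survive the trip into Excel.

```python
import csv

# Hypothetical sample rows, invented purely for illustration.
rows = [
    {"hour": 0, "throughput_mb_per_s": 12.0},
    {"hour": 1, "throughput_mb_per_s": 15.5},
    {"hour": 2, "throughput_mb_per_s": 14.2},
]


def write_csv(records, fileobj):
    """Write the records to an open text file object as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=["hour", "throughput_mb_per_s"])
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    # newline="" is what the csv module expects when writing to a file.
    with open("throughput.csv", "w", newline="") as f:
        write_csv(rows, f)
```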