Project Team 23: Public Transportation Systems Efficiency Analysis, COSI 116A F23

Aaron Gold, Jimmy Kong, Abbie Murphy, Emily Szabo

Project-long Course Project as part of COSI 116A: Information Visualization, taught by Prof. Dylan Cashman, Brandeis University.

Motivation

The primary motivation for selecting this research topic is to discover patterns and trends in public transportation spending efficiency. While there are several potential uses for the visualizations generated, discovering new hypotheses and trends is the foremost reason. This is because discovery “is about the generation and verification of hypotheses, associated with modes of scientific inquiry” (Brehmer, 2013). The goal is for governments that want to improve their public transportation systems to use the trends from these visualizations to better allocate their resources and improve the quality of their systems. We plan to address these needs by making it clear what aspects of various systems are successful, essentially handing them a blueprint from which to improve their own designs.

Visualization

Explore the interactive graphs by engaging with the scatterplot points or brushing over them to select one or more countries. The side bar charts will dynamically update to display data for the selected countries. Similarly, selections made on the side bar charts will synchronize with the scatterplot.
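The synchronization described above can be sketched as a shared selection model that every view observes. This is a minimal illustration in Python, not the project's actual implementation; the names `SelectionModel` and `LinkedView` are hypothetical.

```python
from typing import Callable, Iterable, List


class SelectionModel:
    """Single source of truth for the selected countries (hypothetical name)."""

    def __init__(self) -> None:
        self.selected = frozenset()
        self._listeners: List[Callable[[frozenset], None]] = []

    def subscribe(self, listener: Callable[[frozenset], None]) -> None:
        self._listeners.append(listener)

    def select(self, countries: Iterable[str]) -> None:
        new_selection = frozenset(countries)
        if new_selection == self.selected:
            return  # nothing changed, so views are not notified again
        self.selected = new_selection
        for listener in self._listeners:
            listener(new_selection)

    def clear(self) -> None:
        # Plays the role of the "Clear Selection" button.
        self.select(())


class LinkedView:
    """Stand-in for a chart; it only tracks what it would highlight."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.model = model
        self.highlighted = frozenset()
        model.subscribe(self._on_selection)

    def _on_selection(self, selected: frozenset) -> None:
        self.highlighted = selected

    def brush(self, countries: Iterable[str]) -> None:
        # A user interaction in this view updates the shared model,
        # which then notifies every subscribed view, including this one.
        self.model.select(countries)


model = SelectionModel()
scatterplot = LinkedView("scatterplot", model)
bar_chart = LinkedView("bar chart", model)

scatterplot.brush(["Switzerland", "Belgium"])
print(sorted(bar_chart.highlighted))    # ['Belgium', 'Switzerland']
bar_chart.brush(["Austria"])
print(sorted(scatterplot.highlighted))  # ['Austria']
model.clear()
print(sorted(scatterplot.highlighted))  # []
```

Routing every interaction through one model means neither chart needs to know about the other, which keeps the two directions of synchronization symmetric.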

The visualizations are divided into three categories:

Hover over countries to view detailed information. Clicking on a country reveals a pair of percentage bars: one showing the share of infrastructure spending that goes to maintenance, and one showing the share that goes to rail. If data for one of these metrics is not available, only the bar with available data appears.
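The missing-data behavior can be sketched as a small function that returns one bar per metric that has a value. This is an illustrative sketch only: the function name, labels, and dictionary keys are assumptions. The USA rail figure of 8.3% is derived from the 91.7% non-rail share quoted in the Data Analysis section, and the USA has no maintenance data.

```python
from typing import List, Optional


def percentage_bars(maintenance_pct: Optional[float],
                    rail_pct: Optional[float]) -> List[dict]:
    """Describe the percentage bars to draw; metrics without data are skipped."""
    metrics = (
        ("Maintenance share of infrastructure spending", maintenance_pct),
        ("Rail share of inland transport spending", rail_pct),
    )
    bars = []
    for label, pct in metrics:
        if pct is None:
            continue  # no data: omit the bar rather than draw an empty one
        bars.append({
            "label": label,
            "filled": pct,
            "remainder": round(100.0 - pct, 1),
        })
    return bars


# USA: no maintenance data, rail share derived as 100 - 91.7 = 8.3
usa_bars = percentage_bars(None, 8.3)
print(len(usa_bars))              # 1
print(usa_bars[0]["remainder"])   # 91.7
```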

To reset your selections and explore anew, use the "Clear Selection" button.

Demo Video

Visualization Explanation

Data Analysis

Our initial purpose in creating this visualization was to explore which countries may be spending efficiently on rail by comparing rail usage to rail spending in each country. While looking at trends is important, we made sure to follow the advice of finding the story that pops out of our data (R. Kosara and J. Mackinlay, pp. 44-50, 2013). Our first goal was to illustrate which countries may be spending ineffectively, as a wakeup call to those who may be inclined to push for a better system and to those who may have the ability to reallocate spending.

First, we must determine how we will deem a country an “effective spender” versus an “ineffective” one. There is no magic ratio of spending to usage, but we can compare how much countries with similar ridership spend. For most countries, more spending appears to correlate with more usage: toward the left end of the axis there is a cluster of points in which usage increases as spending does. These countries can be considered effective spenders, since they are “getting out what they are putting in.” They are also on the low end of spending compared to the less effective spenders. The less effective spenders are outliers from this trend and include Turkey, Australia, the USA, Germany, and France, which have similar or worse ridership than the countries in the cluster but, with the exception of Turkey, spend significantly more. For example, Germany and France perform well on usage; they fall in the top half of countries by usage, ranking fourth and fifth respectively. They are still not deemed “effective spenders,” because they are also the two biggest spenders yet attain only a similar level of usage to Belgium. Why do they have less ridership than Belgium while spending 10x (France) and 8x (Germany) the amount? Turkey is the exception: it falls in the low spending range but still sits outside the trend, spending a similar amount to Austria (our runner-up in ridership) while having the third-worst ridership (not much better than Lithuania).
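The Belgium comparison above can be made concrete with a usage-per-spending ratio. The sketch below uses relative units only, not the underlying data: Belgium's spending is set to 1, the 8x and 10x multiples come from the paragraph above, and treating the three countries' usage as exactly equal is a simplifying assumption. The ratio is a comparison aid, not a threshold used in the analysis.

```python
def usage_per_spending(usage: float, spending: float) -> float:
    """Usage obtained per unit of spending (higher means more effective)."""
    if spending <= 0:
        raise ValueError("spending must be positive")
    return usage / spending


# (relative usage, relative spending); usage assumed equal for illustration
relative = {
    "Belgium": (1.0, 1.0),
    "Germany": (1.0, 8.0),
    "France": (1.0, 10.0),
}

ratios = {
    country: usage_per_spending(usage, spending)
    for country, (usage, spending) in relative.items()
}
ranked = sorted(ratios, key=ratios.get, reverse=True)

print(ranked)             # ['Belgium', 'Germany', 'France']
print(ratios["Germany"])  # 0.125
print(ratios["France"])   # 0.1
```

Under these assumptions, France gets one tenth and Germany one eighth of the usage per unit of spending that Belgium does, which is why both fall outside the effective cluster despite their high usage rankings.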

Some glaring takeaways from these comparisons are the extreme outliers: Switzerland and the USA. Switzerland is the model citizen for usage while still spending a moderate amount. In comparison, the USA spends more than 2.36x as much as Switzerland but has the worst ridership. In fact, the USA spends approximately 80x as much as Lithuania but has even worse ridership, with only 0.03 passengers per capita. This is so drastic compared to other countries that the center of the USA point does not seem to diverge from the y axis at all. Additionally, the USA's rider usage does not even register on the usage bar graph. In terms of our initial story, these two points are prime examples: Switzerland of a country to learn from, and the USA of a country to consider for improvement.

After deciding which countries spend effectively on rail, we look at the breakdown of this spending. Do the “effective spenders” spend their money differently from the “ineffective spenders”? First, does spending more on maintenance or on other infrastructure correlate with more effective spending? Should countries like France and Turkey follow the ratio of Switzerland, with 18.2% going to maintenance, or that of other effective spenders? Note that for this measure, there is no data for the USA, Germany, and Australia. There appears to be no formula for how much spending should go to maintenance. Within the countries deemed “effective spenders,” there are large variations in ratios. Turkey (an outlier) has a similar ratio to the Slovak Republic (an effective spender), and France (ineffective) is similar to Belgium and Austria (both effective). Beyond this comparison, it is essential to look at other factors surrounding a country's spending.

The next split is the percentage of the total inland transport budget that goes to rail. Inland transportation includes rail, roads, waterways, and air infrastructure. This also showed little correlation; however, the USA is notable for the small proportion of its total budget going to rail, roughly 8%. This may be something to consider in the bigger picture of why USA ridership is so low. While the USA spends a large amount of money in comparison to other countries, it spends around 91.7% of its total inland budget on other forms of transportation infrastructure, such as roads and waterways.

This leads into another factor we took into consideration: the infrastructure each country is working with. While data for the year of ridership measurement (2022) was not available, the quality and access measures provide a little more insight into the story of certain countries. For example, Australia, Turkey, and the United States are all “ineffective spenders” by our definition. They were also working with less rail density in 2021, during the investment stage, ranking in the bottom three for density in kilometers of rail per 100 square kilometers of area. Additionally, they are in the bottom three for the percentage of the population with access to a public transport location. This may mean people are not riding because they do not have access, and that public transport might not be effective at taking passengers where they need to go in these countries. These countries may be catching up to others with their infrastructure; the USA, for example, is a large country and invests more in other inland transport, such as roads, than in rail. But whether the spending of those countries goes toward catching up on those measures is unknown. This also does not fully explain why countries like Germany are “ineffective” spenders despite ranking fairly high in these measures. These measures do not directly link effective spending and ridership, but they can provide more insight into the full picture of a country's transport systems.

Finally, the quality measure, on a scale from 1 to 7, indicates the general quality of transport systems as reported in a survey of citizens. This measure seems to say little about rail usage or spending effectiveness. In fact, the USA, Germany, and France all rank among the top in reported quality. This could indicate a few things: that people are unaware of the true quality of their systems compared to the rest of the world, that quality is not a motivator for people to ride, or that quality is not an indicator of effective spending. Perhaps the United States is focusing on the wrong aspects of its systems to improve, as it is second in quality but lowest in rider usage.

Overall, this visualization is a starting point for our audience to look at certain countries and ask further questions about possible improvement. It has no definitive answers or exact correlations but may be a wakeup call for some countries and an indicator as to why countries may be spending ineffectively.

Task Analysis

The domain tasks are the essential questions that drive the research and discovery in infrastructure investment analysis, such as understanding the correlations between spending and efficiency. Initially, our domain tasks included a range of questions that looked at the cost of transport and the geographical extent of system coverage in a range of countries. However, as we narrowed our focus, these became less critical as we decided to focus mainly on infrastructure investment. Ultimately, the most important domain task we outlined was the correlation between annual investment and the efficiency of public transportation.

The mid-level search tasks include Browse, Locate, and Explore. These tasks focus on identifying relevant data points and uncovering patterns within the dataset. In our visualization, browse and locate are expected to be more frequently utilized, as most users will be searching the data without a specific target. From this they are able to locate specific countries and browse countries grouped in the same area.

Finally, the highest-level tasks include Discover and Present, which focus on how the visualization is being used. These tasks enable users to consume and discover new knowledge and uncover patterns. Our visualization's primary goal is for users to utilize these tasks to gain insight into infrastructure investment across different countries.

Design Process

We adopted the ICE-T methodology—"Insight, Confidence, Essence, and Time"—to guide the design of various components (Amar, R.A., Eagan, J.R., & Stasko, J.T., pp. 3-4). This approach emphasizes the following:

In designing this model, we also wanted to keep in mind storytelling strategies for visualizations (R. Kosara and J. Mackinlay, pp. 44-50, 2013). Since our model is mainly about exploration, the aim was for any evident story to pop out of the visualization. The scatterplot provides a starting point for the story, allowing obvious trends or outliers to be perceived clearly. Additionally, to make the scatterplot “easier to understand,” we followed the recommendation to include multiple views of the same data. In this case, we included bar graphs of utilization and spending. This makes the relationship between the two evident in the scatterplot while also showing another side of the data: each country's placement in the rankings.

We chose consecutive years (spending in 2021 and usage in 2022) because of the temporal nature of storytelling (R. Kosara and J. Mackinlay, pp. 44-50, 2013). Overall, we are looking for possible causal relationships, both between spending and usage and between other measures and effectiveness. While the scatterplot is the origin of the story, one can branch off from it in different ways to investigate points of interest in deeper context. We give examples in the data analysis section, but clicking on a point breaks spending down by type, which might further explain a country's results. The additional bar graphs under quality and access provide further possible explanations and context to add to the stories uncovered.

Finally, we made sure to account for human memory in our interactions. We wanted users to be able to follow a specific country and investigate its placement on the graph in more depth. This is the reason for the toggle feature between bar graph categories. While we could have had different visualizations for each of these measures, or arranged them so they were all displayed at once, we knew that a user who was investigating a group of points and had to scroll down to see all measures might not remember the layout of the scatterplot. This could make it hard to follow or pick out any trends between these quality measures and spending effectiveness. The goal of our visualization was to let users follow a point through all additional measures while still comparing it to the original scatterplot.

Prototypes

In the initial planning phase of our visualizations, we explored various sketch designs and ultimately combined three of them to create the final visualizations presented above. After finalizing our sketches, we translated them into a semi-interactive Figma prototype to better understand the user workflow. The sections below showcase our initial designs.

Figma


Sketches

Conclusion

Throughout this project, we explored and implemented a range of visualization tools to effectively present our analysis. One of the key challenges we faced was finding comprehensive datasets that included all of the information we wanted to showcase for each country. For instance, we were unable to find infrastructure maintenance spending data for every country, so it is absent for some of the countries in our visualization. As we progressed from planning to implementation, we experienced the depth of refinement that can occur, continuing to identify improvements and adjustments that would enhance the clarity of our visualization. In the future, we would prioritize expanding our dataset to be more uniform and to include more countries. We would also be interested in including more metrics about the cost of using public transportation. Overall, this project has been an amazing learning experience, and there is a lot of potential to delve deeper into the topic of public transportation around the world.

Console Error Note

Because the visualization uses multiple interaction tools, clicking a single point in the scatterplot produces an error message in the console. The click activates the brush, but it does not cover enough area for any point to count as selected, so the brush outputs an error to the console. This is not preventable, but it has no effect on the visualization beyond the console message.

Acknowledgments