Why Mapping COVID is So Hard

A Geography Student Investigates the Outbreak in Florida

Kate Talano & Ben Meader

Naples, Florida in the fall of 2020. Photo by Madison Krol.

No one likes to hear their community is a disease “hot zone.” When the reality of shutdowns and pandemic measures hit the country in March of this year, Middlebury College student Kate Talano watched the outbreak spread through her home state of Florida from distant Vermont. Like many other students, she had to decide how and when to go home, a simple act that now raised many questions: Should she self-quarantine before returning? Should she get tested as a precaution or rather reserve resources for front-line workers? During a remote internship with Rhumb Line Maps this past summer, Kate decided to focus on mapping COVID-19, the disease caused by the SARS-CoV-2 virus, in Florida for her independent project.

As a geography major, the frequent reports of unprecedented levels of COVID-19 across the state left me feeling both deep concern and curiosity. I wanted to better understand the robustness of this ‘hot zone’ claim: not just where the zones were, but how they were defined and who was in them.
          - Kate

In particular, a New York Times article in late June detailed an outbreak in Immokalee, a community near Kate’s home town of Naples. The story covers how agricultural workers in Immokalee were greatly affected by the pandemic.

Landsat Imagery showing southern Florida. Characteristics to note: 1) the depopulated wetlands in the south, 2) sugar plantations just west of West Palm Beach, 3) high coastal populations, and 4) the mixed agriculture surrounding Immokalee.

Earlier that same month a statistician and GIS specialist working for the State of Florida resigned in a highly publicized critique of the government’s messaging. So with a healthy level of skepticism, Kate set out to investigate how the COVID-19 outbreak was being represented, with an eye towards seeing if the data recorded by the state provided evidence for the New York Times story. [^1]

With news headlines, statistics, and visual graphics of COVID-19 in Florida constantly changing, it was extremely difficult to parse out the severity and extent of the spread of the disease. Depending on the data source and level of analysis, some reports had the entire state painted as a fiery red “danger zone,” whereas others seemed to highlight only specific areas. Also, the majority of maps were either proportional circle or choropleth maps, generally made at the county-wide level. These mapping techniques are useful, but can feel somewhat one-dimensional; the phenomenon is much more nuanced.
          - Kate

To be fair, the depiction of the spread of COVID-19 will probably be the subject of fierce debate for years to come. To attempt to create even a simple diagram for Florida at the basic county level is no small task, one that is mired in disjointed testing methods, variable reporting standards, and collection methods that have evolved and changed. So Kate and I, with help from Maja Cannavo, another Middlebury intern, found as many COVID-19 maps as we could on the web, ranging from the best to the not-as-great.

Surprisingly, many of the minimalist maps were among the most useful. The Johns Hopkins map, for instance, was especially noteworthy at the outset of the pandemic. Displaying live data for perusal and basic in-browser metrics, it also provides data for public download. The accompanying map (still live now) reflects many other services, like state websites, Google’s COVID resources, and others. These proportional circle maps give a sense of how many confirmed cases are where, but little visual sense about whether or not those numbers are problematic relative to other variables. Other story maps, like this one, provide stunning visuals that correctly alarm the viewer, but act more as a narrative illustration than an information graphic. Even though we saw a few articles that reviewed aspects of COVID mapping (check out this great article by Kelsey Taylor), many visuals we saw early in the pandemic used a similar symbology: virus-like dots and circles engulf large geographic areas in, as Kate said, a fiery red. [See footnote ^A].

In this regard, it seemed appropriate to begin to experiment with other visual methods. It has, after all, been six months. We’re living with this virus now, for the foreseeable future. Therefore, risk and alarm, like the data that help us understand the virus itself, must be normalized to be properly understood. Oh, but how to normalize it? What statistics are most meaningful? Active cases per 100,000? Death rates per case? Perhaps new cases per day?

Population and population density estimates for ZIP Codes and Counties were based on the 2010 Census. ZIP Code populations (used in the second visual below) were estimated using an areal weighted re-aggregation from census blocks. Example of a “choropleth” map showing population density before and after re-aggregation. Data from IPUMS NHGIS, University of Minnesota.
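For readers curious about what an areal weighted re-aggregation involves, here is a minimal Python sketch of the idea. It is not the script used for the maps: the function name, the block and ZIP labels, and all of the numbers are hypothetical, and it assumes the share of each census block's area falling inside each ZIP Code has already been measured in a GIS.

```python
from collections import defaultdict


def areal_weighted_population(block_population, overlaps):
    """Estimate ZIP Code populations from census block populations.

    block_population: {block_id: population}
    overlaps: list of (block_id, zip_code, share), where share is the
              fraction of the block's area that lies inside that ZIP Code.
    Assumes people are spread evenly across each block.
    """
    totals = defaultdict(float)
    for block_id, zip_code, share in overlaps:
        # A block contributes population in proportion to its overlapping area.
        totals[zip_code] += block_population[block_id] * share
    return dict(totals)


# Hypothetical example: block b2 straddles two ZIP Codes.
blocks = {"b1": 120, "b2": 80, "b3": 200}
overlaps = [
    ("b1", "ZIP_A", 1.0),
    ("b2", "ZIP_A", 0.25),
    ("b2", "ZIP_B", 0.75),
    ("b3", "ZIP_B", 1.0),
]
zip_population = areal_weighted_population(blocks, overlaps)
print(zip_population)
```

Because every block's shares add up to one in this example, the total population is preserved; only its allocation between reporting areas changes.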

We liked this visual from the Brown School of Public Health, and this one from Act Now because they leave the time variable to a chart and use a well-researched classification scheme for their choropleth maps. The symbology connotes current risk based on geographic area instead of attempting to represent quantities or other qualities of the pandemic’s spread. In particular, we appreciated the metric: “average daily new cases per 100k.” The figure attempts to consolidate the concept of active spread into a single ratio. [^2]
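As a rough illustration of how “average daily new cases per 100k” can be computed, here is a minimal Python sketch. The function name and the sample week are hypothetical; they are not drawn from Florida's data or from either site's code.

```python
def avg_daily_new_cases_per_100k(daily_new_cases, population):
    """Mean of the daily new-case counts, scaled to a population of 100,000."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not daily_new_cases:
        raise ValueError("need at least one day of counts")
    mean_daily = sum(daily_new_cases) / len(daily_new_cases)
    return mean_daily / population * 100_000


# Hypothetical county of 350,000 residents and one week of new cases.
week = [30, 42, 35, 51, 48, 20, 19]  # 245 cases in total, 35 per day on average
rate = avg_daily_new_cases_per_100k(week, 350_000)
print(round(rate, 1))  # about 10 new cases per day per 100,000 residents
```

Dividing by the population is what makes a small rural county and a large urban one comparable on the same color scale.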

 
Our “Risk” Classification Scheme:

This legend applies to the color schemes in both visuals below. Although based on the Harvard Global Health Institute’s classification scheme, we added and subdivided classes to further illuminate the distribution of observations.
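To make the idea of a classification scheme concrete, here is a small Python sketch of the base four-level scheme as we understand it from the Harvard Global Health Institute framework (fewer than 1, 1 to 9, 10 to 24, and 25 or more average daily new cases per 100k). The function name is hypothetical, and the extra classes we added for our legend are not reproduced here, so treat this as an approximation of the starting point rather than the legend itself.

```python
def hghi_risk_level(rate_per_100k):
    """Map average daily new cases per 100k to a base HGHI-style color level.

    Thresholds are an assumption based on the published HGHI framework;
    the article's own legend subdivides these classes further.
    """
    if rate_per_100k < 0:
        raise ValueError("rate cannot be negative")
    if rate_per_100k < 1:
        return "green"
    if rate_per_100k < 10:
        return "yellow"
    if rate_per_100k < 25:
        return "orange"
    return "red"


for sample_rate in (0.5, 4, 12, 40):
    print(sample_rate, hghi_risk_level(sample_rate))
```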

Even though “average daily new cases per 100k” sounds like a mouthful, it is in fact a simple figure that can be translated for the layperson: relative to its population, about how many new cases is my county seeing each day this week? With this metric in mind, Kate turned back to Immokalee.

Lost in the raw case counts and the early focus on the age variable was the fact that our migrant farm and essential working community was hit really hard, and hit really early. In early to mid-May, there was an entire hot spot of cases that was completely overlooked, perhaps in part due to their status as migrant workers. It wasn’t until Doctors Without Borders stepped in that adequate testing and health services were made accessible to this historically under-resourced community. [^3]
          - Kate

But then, how to map it? Is it even possible to map one story? First, we had to discuss the stumbling blocks. One issue is that mapping time-series data requires a “grain” or “resolution,” just like the pixels in an image. The temporal unit that seems to have gained traction is the week: daily new cases are averaged over a seven-day span. [See footnote ^B]. Daily infection counts are cyclical and semi-chaotic, so, depending on the community, this weekly sampling method helps smooth out the spikes in the observations.

The next problem is that both raw and normalized figures are important in this phenomenon, especially across scales. Put another way: some outbreaks have relatively few total cases, but these cases constitute a large proportion of that particular population (think nursing homes). Other outbreaks, however, have comparatively high case counts, but these make up only a modest proportion of the whole community (think San Francisco or Singapore). Because the ratio and volume of occurrences are both meaningful, we chose to create a bivariate map: extruded bars with graduated colors are centered on each reporting area. For our first visual, made at the county level for the whole state, the raw number of cases per week is symbolized by the height of the extruded bar. The relative infection rate is represented with the color scheme described above. The goal of the map is to show how the risk of spread and number of weekly cases changed over time, and to see if the media coverage of the outbreak in Immokalee was justified by the data recorded.

Click here for a full screen view of the diagram. In this map the vertical dimension relates to “total number of cases per week” to show occasional disparities between risk and total cases. For example: in the last two weeks of May and the first week of June, Dade County has many more new cases per week than Collier County, but the overall risk rate is lower. Click the Population Density Map for reference. The inset map below details southern Florida around the time of the NYT article’s publication. [Map sources: ^4].

The result is interesting. Some counties appear as a small, violent blip: outliers that came and went. Nearly all urban places in Florida appear to be in Tier 3 or 4 at different points, and they follow similar regional changes. Immokalee, which is part of Collier County, is somewhat averaged out in comparison to the phenomenon as a whole. At this scale, it is hard to tell if the story of the outbreak among migrant workers was unique or atypical, but we suspected that might be the case, as the “county” reporting unit is very generalized. Meaning: Collier County includes many community types (urban, suburban, and rural), and throwing them all in the same bucket skews and sloshes the numbers.

So, we decided to shift the geographic and temporal resolutions of the map to see if we could focus in on the outbreak at a local scale. Looking at just southern Florida, we used ZIP Code counts at both the weekly and daily reporting units. Below, if you can bear the flickering of daily positive cases, you’ll see that these spatial and temporal resolutions do indeed show a serious outbreak in Immokalee in late May; however, they also show that when the story came out in late June, other serious outbreaks began to appear everywhere. [See footnote ^C].

In this animation the vertical dimension represents new cases per day: the actual number of cases in the upper section and a weekly average in the lower. This weekly summary helps us see overall trends: risk peaks early, but it is later matched by surrounding communities. Total weekly sums (as shown at the county-wide scale) would be interesting but would require two differing linear scales, which would make the visual semantics less coherent. [Map sources: ^5].

So, you might say, why not map the whole country at the ZIP Code level? Aside from creating unnecessary and excessive visual noise at that scale, ZIP Code boundaries change frequently, and are less reliable for comparing population information. We also found that the reporting was variable. In this case, the ZIP Code level at a weekly temporal resolution does show support in the data for the New York Times story in late June. There clearly was a disproportionate level of infections in that particular community; however, these rates quickly became the norm for the majority of Florida during the major spikes of July and early August.

As cartographers, we have great power over how people interpret the world. Therefore it’s important that we’re mindful of how we both intentionally and unintentionally frame a given narrative. With something as widespread, prolonged, and fatal as COVID-19, how can we establish transparent and digestible norms for visualizing the associated trends? Throughout this process, we’ve found that giving special attention to both spatial and temporal scales of a dynamic phenomenon is, perhaps, the most important way you can control the interpretation. So many maps set out simply to engage the viewer in an emotion, or else they show a simple phenomenon with more flair than substance.   
          - Kate

Perhaps the lesson we found here was that when visualizations focus on limited variables that carry explanatory power, they not only have narrative resonance, but also can serve as useful statistical references. If you’re mapping a dynamic process, then static maps of either raw or even normalized values are only one frame in a flip-book, and even the flip-book itself can be a type of tunnel vision.

Should the goal of cartography in journalism, then, be to relay valuable information in as stoic and straightforward a method as possible? Or should it be to aid in the illustration of a narrative? Or maybe these two aren’t mutually exclusive? As data visualizers and cartographers, maybe the best we can do, at least in the case of assisting journalists, is to regard our work as more akin to that of the photographer than the writer. We set our apertures and shutter speeds, twist the lens to find the focal point, and add significance to the narrative when possible. We’re all accustomed to the outrage machine, the echo chamber, and the near-incessant barrage of information. Hopefully the snapshots we offer the world lend clarity instead of confusion to the public.


Special thanks:

Middlebury College’s Geography Department, the Center for Careers and Internships, and the Robert Churchill Fund for supporting the internship program; Cathy Jewitt for her editing; and a very special thanks as well to Kate Talano. Without her spirit, persistence, and hard work, this article would probably not have come to fruition.


References, Further Reading:

1. Articles concerning COVID-19 in Florida:

Mazzei, Patricia. “Florida's Coronavirus Spike Is Ravaging Migrant Farmworkers.” The New York Times, 18 June 2020, link.

Wamsley, Laurel. “Fired Florida Data Scientist Launches A Coronavirus Dashboard Of Her Own.” Special Series: The Coronavirus Crisis, NPR, 14 June 2020, link.

2. Mapping COVID, examples, resources:

“COVID-19 Map.” Johns Hopkins Coronavirus Resource Center, Johns Hopkins University & Medicine, link.

Watkins, Derek, et al. “How the Virus Won.” The New York Times, 25 June 2020, link.

Taylor, Kelsey. “7 Best Practices for Mapping a Pandemic.” Mapbox Blog, Mapbox, 19 Mar. 2020, link.

“How Severe Is the Pandemic Where You Live?” County by County: Explore Your COVID-19 Risk Level, Brown School of Public Health, link.

“Florida.” America's COVID Warning System, Covid Act Now, link.

“The Path to Zero: Suppressing COVID-19 through TTSI.” Edmond J. Safra Center for Ethics, Harvard Global Health Institute, 1 July 2020, link.

3. Medical Relief Concerning Immokalee Farming Community:

“Florida: MSF and Local Health Partners Bring COVID-19 Testing and Mobile Health Clinics to Migrant Farmworkers.” News & Stories, Doctors Without Borders - USA, 18 May 2020, link.

“As COVID-19 Cases Increase Nearly Tenfold in Two Weeks in Immokalee, Doctors Without Borders Turns to the Press for Tests, Resources.” CIW Blog, Coalition of Immokalee Workers, 20 May 2020, link.

4. MAP: Florida COVID-19: Weekly Cases and Rate by County:

General shoreline: Made with Natural Earth data.
County level COVID data: Florida Department of Health Open Data, Florida COVID Cases by County, link. Florida COVID-19 Hub, data archive: link.
Population data: IPUMS NHGIS, University of Minnesota, link.
Created with: QGIS, Qgis2threejs, R (script by Kate Talano), Adobe Animate.

5. MAP: Southern Florida COVID-19: Daily vs. Weekly Average Cases and Rates by ZIP Code:

General shoreline: Made with Natural Earth data.
ZIP level COVID data: Florida Department of Health Open Data, Florida COVID Cases by ZIP, link. Florida COVID-19 Hub, data archive: link.
Population data: IPUMS NHGIS, University of Minnesota, link.
Created with: QGIS, Qgis2threejs, R (script by Kate Talano), Adobe Animate.


Footnotes:

A. On “COVID maps” and timing:

It should be noted that this post was compiled during the authors’ free time over the course of several months. There have been hundreds if not thousands of “COVID maps” published in the intervening months; we’ve only listed a small sampling of maps found during midsummer, ones that specifically addressed the nature of the outbreak itself. We hope that a healthy discussion of other COVID visuals can happen in the comments section.

B. On Weekly Averages:

Most other maps we’ve seen use a “rolling weekly average” whereas we used a “fixed weekly average.” The rolling weekly average does indeed stand on firmer ground; it samples each day by uniquely framing it with the days that immediately precede and follow it, rather than setting arbitrary week boundaries. We chose to use “fixed” for several reasons. It was computationally easier, the heterogeneity of the ZIP Code data allowed for somewhat more regularity with a fixed method, and we found that our numbers were largely similar to those we saw on other sites.
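The difference between the two methods is easier to see side by side. The Python sketch below is only an illustration under our own assumptions: the function names and the two weeks of daily counts are hypothetical, the rolling version is centered on each day as described above, and windows at the ends of the series are simply shortened.

```python
def fixed_weekly_average(daily, week_length=7):
    """One mean per non-overlapping block of days (arbitrary week boundaries).

    A short final block is averaged over the days it actually contains.
    """
    chunks = (daily[i:i + week_length] for i in range(0, len(daily), week_length))
    return [sum(chunk) / len(chunk) for chunk in chunks]


def rolling_weekly_average(daily, window=7):
    """Centered rolling mean: each day is averaged with its neighbors."""
    half = window // 2
    averages = []
    for i in range(len(daily)):
        start = max(0, i - half)
        stop = min(len(daily), i + half + 1)
        averages.append(sum(daily[start:stop]) / (stop - start))
    return averages


# Hypothetical daily new cases for one reporting area over two weeks.
daily = [0, 2, 1, 9, 3, 4, 2, 10, 12, 8, 30, 9, 11, 4]
fixed = fixed_weekly_average(daily)
rolling = rolling_weekly_average(daily)
print(fixed)    # one value per week
print(rolling)  # one value per day
```

The fixed method returns one number per week, which jumps abruptly at the week boundary, while the rolling method returns one number per day and rises gradually across that same boundary.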

C. On Reporting Trends:

There’s a great section in Visual Explanations by Edward Tufte (pp. 27-37) where he discusses both spatial and temporal sampling, specifically in relation to London’s cholera epidemic of 1854. The famous story of how John Snow used a dot density map to find the well responsible for transmitting the disease is interesting, but the discussion of whether or not the outbreak may have stopped without interference is all the more fascinating. Death rates were in decline even before the well was identified as the culprit. Depending on how you sample the time-series data that record the number of deaths (by day or by various week intervals), it can appear that: 1) the removing of the Broad Street pump’s handle significantly helped to resolve the issue, or 2) removing the handle was simply an event that co-occurred with a trend already underway. In the case of these COVID numbers, it would be callous and false to say that the New York Times story was insignificant; the terrible nature of the outbreak in Immokalee was definitely an outlier worthy of investigation. The comparison to make here is only that the reality of the situation changed, and it could be possible to “use” the reported data to either support or discredit pieces of the narrative. But that is, perhaps, the eternal hurdle for journalism. As soon as a story is published, the context that surrounds it will almost certainly have changed.