Based on Canada's and Alberta's official statistics.
This dashboard is a collection of tabs and panels. If you hover your mouse over the borders of one of the panels, you'll see the cursor turn into an arrow. If you click and drag, you can change the shape of those boundaries. Go ahead, try moving one or two and get a feel for how they work.
In the upper-left of a panel, you'll see the name of a tab: "Welcome / Help", "Overview Map", and so on. Each tab names the content currently shown in that panel. Next to each tab name you'll see an "x", which closes that tab. For instance, that map is kind of buggy. Let's get rid of it! Click on the "x" next to "Overview Map", and watch it disappear. As an added bonus, you should now see a lot more of this help text.
In the upper-right of the panel, you'll see a box and another "x". Hover your mouse over them, and they'll give you hints as to what they do. Try maximizing the "Welcome / Help" panel. Neat, right? The box has turned into a line, which you can use to "minimize" this panel back down to normal. You should probably do that now; it's tough to read text that wide.
That map did look quite pretty, though. Let's bring it back. Hover the mouse over that bright red bar at the top, and a series of buttons will pop down. Click on the "Overview Map" button, drag it into this panel, and release it. The map is back! And, well, buggy. Not a problem: refresh the page, and it'll fix itself. Did you notice that all the panels stayed in the same place? This means that all the time you spent customizing this page won't go to waste if you refresh the page or even close the tab. No luck bringing up the buttons? Not a problem: you scrolled past another set of them in this panel. They work the same way, but will only disappear if you close this panel.
A quick aside on those map colours. They're calculated by taking the average number of new COVID-19 cases per day over the past week in each district, then assigning a colour on a gradient scaled to the highest average found. This means the scale adjusts automatically with the data, so a vivid colour doesn't necessarily mean "nasty outbreak", but rather "this area is worth keeping an eye on." In contrast, some districts have done especially well and recorded no new cases in the past two weeks. Since most people with COVID-19 who develop symptoms will do so within two weeks, that's a pretty good indicator that either nobody in that district has COVID-19, or anyone who has it isn't turning up sick at hospitals that can detect COVID-19.
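The colour scaling described above can be sketched in a few lines of pandas. This is a minimal illustration, not the dashboard's actual code. It assumes daily new cases arrive as a DataFrame with one column per district (the district names below are hypothetical). The white-to-red gradient in `to_hex` is a stand-in for whatever palette the map really uses.

```python
import pandas as pd


def colour_levels(new_cases: pd.DataFrame) -> pd.DataFrame:
    """new_cases: one row per day (oldest first), one column per district."""
    # Average new cases per day over the most recent seven days.
    weekly_avg = new_cases.tail(7).mean()
    # Scale against the largest average, so the most active district maps to 1.0.
    peak = weekly_avg.max()
    level = weekly_avg / peak if peak > 0 else weekly_avg * 0.0
    # Districts with no new cases at all in the past two weeks.
    quiet = new_cases.tail(14).sum() == 0
    return pd.DataFrame({"weekly_avg": weekly_avg, "level": level, "no_cases_14d": quiet})


def to_hex(level: float) -> str:
    """Hypothetical white-to-red gradient: 0.0 -> white, 1.0 -> pure red."""
    fade = int(round(255 * (1.0 - float(level))))
    return f"#ff{fade:02x}{fade:02x}"


# Placeholder data for three made-up districts over two weeks.
new_cases = pd.DataFrame({
    "District A": [2] * 14,
    "District B": [1] * 7 + [4] * 7,
    "District C": [0] * 14,
})
table = colour_levels(new_cases)
table["colour"] = table["level"].map(to_hex)
print(table)
```

Because every level is divided by the current maximum, the most active district is always drawn at full intensity. This is why a vivid colour signals "relatively high" rather than an absolute threshold.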
We're just getting warmed up, though. Hover over the bright red bar, click on "Summary", and drag it immediately to the right of "Alberta: New Confirmed and Probable Cases." We just put two tabs in the same panel! Use the drop-down above the graph to change the data you're viewing (why would we need two of the same graph, after all?). Notice that the name of the tab changes to match. You can click back and forth between the tabs, too. Want more? Go on, add more! When there are too many tabs to display, a small upside-down triangle will appear that lets you swap one of the visible tabs for a hidden one. Added too many? Get rid of them all by clicking on the panel's "x", the one that says "close" when you hover over it.
That "Preview" panel has likely been begging you to click on something. Might as well humour it: scroll around on the map for a bit until you find a district that strikes your fancy, and click on it. Voilà, there's now a chart where the "Preview" panel used to be. You can change the data series being shown, but unlike the "Summary" panel you can't change the district you're viewing. That's easy enough to fix: click on another district, and it will take the first one's place. If you want to compare multiple districts, hover over that bright red bar and drag a new "Preview" panel into place. Now if you click on a district, the first panel will stay the same and the second will change. You can drag in as many Preview panels as you like; the last one you add is the one that changes when you click on the map.
After all this tinkering, you might make a right mess of everything. That's OK: hover over the bright red bar to bring back the menu. See the "Reset Layout" button? Click it, and all of your changes are forgotten. As long as you remember where that button is, you can muck about to your heart's content.
The charts themselves offer a lot of options. Hover your mouse over the upper-right corner of a chart, and another menu of options will pop up. These allow you to pan, zoom, and select data. "Autoscale" shows you all of the data, while "Reset Axes" brings you back to the default two-week window; double-clicking the chart toggles between the two. If you really like what you're seeing, there's an option to take a snapshot of it, too.
Local Geographic Area: Alberta Health Services divides the province into a complex hierarchy. At the highest level are five zones: North, Edmonton, Central, Calgary, and South. At the lowest level are 132 "local geographic areas." This lowest level is ideal for showing the scope and spread of COVID-19 in fine detail. For instance, as of May 17th it is possible to tell at a glance that Calgary and the Newell/Brooks region are the primary sources of new cases in the province; that the severity within Calgary is correlated with proximity to the airport; and that the Cardston and High Level regions may be developing new outbreaks. This fine-grained approach is especially useful in Canada, where a combination of low population, large distances, and limited domestic air and rail travel helps isolate population centres and reduces the accuracy of generalizations across an entire province.
Average Daily Cases: The average number of new cases per day over the last seven days. This window was chosen because it cancels out day-of-week effects while remaining narrow enough to respond to short-term changes.
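The sketch below shows why a seven-day window removes day-of-week effects. It builds a hypothetical series with a steady 10 cases per day plus a weekly reporting pattern that sums to zero over a week, then smooths it with a trailing rolling mean. The numbers are made up. The point is that any pattern repeating every seven days cancels out of a seven-day average.

```python
import pandas as pd


def average_daily_cases(new_cases: pd.Series) -> pd.Series:
    # Trailing seven-day mean; the first six days lack a full window and stay NaN.
    return new_cases.rolling(window=7).mean()


# Placeholder data: a steady 10 cases/day plus a reporting pattern that repeats
# weekly and sums to zero (catch-up on Mondays, fewer reports on weekends).
weekday_effect = [6, 0, 0, 0, 0, -3, -3]  # Monday=0 ... Sunday=6
dates = pd.date_range("2020-05-04", periods=21, freq="D")  # starts on a Monday
raw = pd.Series([10 + weekday_effect[d.dayofweek] for d in dates], index=dates)
smoothed = average_daily_cases(raw)
print(raw.min(), raw.max())        # the raw series swings between 7 and 16
print(smoothed.dropna().unique())  # every full window averages to 10
```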
New Confirmed and Probable Cases: The count of new cases reported on a given day. Probable cases are included along with cases confirmed by testing, as the majority of probable cases are COVID-19 and detected cases are likely an undercount of total cases. Note that for Alberta these numbers may be off by roughly one to three cases on a per-district basis. For technical reasons, the cumulative case count includes preliminary data from the given day and does not include retroactive corrections for data-entry errors. The summary data for Canada does not suffer from this issue.
This series is useful because it shows the derivative of the cumulative case count. If the spread of SARS-CoV-2 is slowing, the number of daily cases should trend downward; if it is infecting about as many new people as are recovering, these numbers will remain steady. As it only includes known cases, though, it is subject to testing bias. If only those with the typical symptoms of COVID-19 are tested, asymptomatic cases will not be detected and cannot appear in this count. Estimates of undetected cases vary, but they could easily double the number of known cases. Case count is usually a lagging indicator: the median time for COVID-19 symptoms to appear after infection is about 5 days, and in rare cases it can be as long as two weeks. If most or all of the people tested have symptoms, then the circumstances that led to their infection occurred days or weeks earlier.
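Daily new cases are the discrete derivative (the first difference) of the cumulative count, so they can be recovered from cumulative totals with a single difference. The sketch below does that, then labels the trend by comparing the last week's mean with the week before. The 10% `tolerance` for calling a series "steady" is a hypothetical choice, not a value from the dashboard.

```python
import pandas as pd


def new_cases_from_cumulative(cumulative: pd.Series) -> pd.Series:
    # Daily new cases are the first difference of the cumulative count;
    # the first day has nothing to subtract, so it keeps its cumulative value.
    return cumulative.diff().fillna(cumulative.iloc[0]).astype(int)


def trend(daily: pd.Series, window: int = 7, tolerance: float = 0.1) -> str:
    # Compare the mean of the latest `window` days with the `window` days before.
    recent = daily.iloc[-window:].mean()
    previous = daily.iloc[-2 * window:-window].mean()
    if previous == 0:
        return "growing" if recent > 0 else "steady"
    change = (recent - previous) / previous
    if change < -tolerance:
        return "slowing"
    if change > tolerance:
        return "growing"
    return "steady"


# Placeholder data: 10 new cases/day for a week, then 5/day.
cumulative = pd.Series([10] * 7 + [5] * 7).cumsum()
daily = new_cases_from_cumulative(cumulative)
print(trend(daily))  # slowing
```

If a cumulative series is ever revised downward, the first difference for that day comes out negative. Data corrections can therefore show up as negative "new cases".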
New Deaths: The count of new deaths reported on a given day. Note that this is likely an undercount of all COVID-19 mortality in Canada: as in most pandemics, there is excess mortality, and Canada's statistics are of limited use for estimating it. The caveats above for the per-district Alberta data apply here as well. Deaths are also a lagging indicator since, when death occurs, it happens roughly 18 days after infection.
Fraction Resolved: The number of "resolved" COVID-19 cases divided by the total number of cases. "Resolved" includes both recoveries and deaths. At the start of an epidemic this number will be zero; at the end, it will be one. Thus it provides a rough estimate of where a district is in the course of the pandemic.
It is important to stress that this is only a rough measure. In South Korea, for instance, COVID-19 was well controlled until a small number of people in a small religious cult caught the disease; as of the end of February, that handful was responsible for two-thirds of all cases in South Korea. A week before South Korea lifted its social distancing guidelines, one person visited several nightclubs and inadvertently triggered a new outbreak of at least 131 cases of COVID-19. Superspreader events are rare, but when case counts are low they have a disproportionate impact. A high fraction resolved should therefore be treated with skepticism.
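The metric itself is a simple ratio. The sketch below computes it for a hypothetical district and shows how a single cluster of new cases, like the superspreader events described above, can pull a high fraction resolved back down. All counts are invented for illustration.

```python
def fraction_resolved(recovered: int, deaths: int, total_cases: int) -> float:
    # "Resolved" counts both recoveries and deaths.
    if total_cases == 0:
        return 0.0
    return (recovered + deaths) / total_cases


# Hypothetical district: 95 of 100 cases resolved...
before = fraction_resolved(recovered=93, deaths=2, total_cases=100)
# ...then one cluster adds 40 new, still-active cases.
after = fraction_resolved(recovered=93, deaths=2, total_cases=140)
print(f"{before:.2f} -> {after:.2f}")  # 0.95 -> 0.68
```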
Fraction New Positives: The number of new cases divided by the number of tests performed, per day. This provides a crude estimate of how many cases are going undiagnosed in the community. In outbreak areas this metric has been correlated with the severity of the outbreak: New York's rates, for instance, have ranged from 7% to 31%, with higher numbers in regions with greater prevalence of COVID-19. Alberta's rate is consistently lower, but peaked at 8% when the Cargill outbreak was discovered. At the other end, preliminary research puts the false positive rate of COVID-19 viral tests at approximately 0.8-4%, though it can vary widely depending on a range of factors. As a general rule, then, the further the fraction of new positives rises above 4%, the worse the outbreak.
This metric is highly dependent on who is tested, however. If most of those tested are health care workers being repeatedly checked for COVID-19, this number will be artificially deflated. If only symptomatic people are tested, it will be artificially inflated. There is also a lag between when someone is tested and when the results are known, which depends heavily on the workload and schedule of the lab performing the tests.
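Here is a minimal sketch of the daily calculation. It assumes new cases and tests performed are given as aligned pandas Series (the numbers are placeholders). Days with zero reported tests are left undefined rather than divided by zero. The 4% upper bound on the false positive rate quoted above serves as the point beyond which positivity likely reflects real spread.

```python
import pandas as pd

# Upper end of the preliminary false-positive range (0.8-4%) quoted above.
FALSE_POSITIVE_CEILING = 0.04


def fraction_new_positives(new_cases: pd.Series, tests: pd.Series) -> pd.Series:
    # Daily positivity; days with no reported tests are left as NaN instead of
    # dividing by zero.
    return new_cases / tests.where(tests > 0)


# Placeholder counts for three days.
cases = pd.Series([30, 120, 45])
tests = pd.Series([3000, 1500, 0])
rate = fraction_new_positives(cases, tests)
above_noise = rate > FALSE_POSITIVE_CEILING  # NaN compares as False
print(pd.DataFrame({"rate": rate, "above_4pct": above_noise}))
```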
The objective is to investigate a short-term, data-based predictive model for the total number of COVID-19 cases observed in a geographic region. Currently, we are focusing on the province of Alberta (AB). The main idea is to look at countries/regions that experienced the onset of the COVID-19 pandemic before Alberta, and use their trajectories as fitting functions for AB's trajectory, to get a sense of how it will unfold in the short term.
For this task, we used the global case data aggregated by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). This dataset feeds the popular COVID-19 dashboard maintained by JHU and is updated at regular intervals. It is available on GitHub at the following link: https://github.com/CSSEGISandData/COVID-19.git
Specifically, we used the global time-series data (available at the following path), which tabulates the total number of confirmed cases on a daily basis. In addition to country-wide data, for some countries (e.g. Australia, Canada and China) this dataset also includes case data for provinces.
All data cleanup and wrangling tasks were performed using the pandas library. The dataset was split into two parts: one containing Canadian cases, and the other containing cases for the rest of the world. From the latter, a number of comparator countries/regions were extracted.
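The split between Canadian and non-Canadian rows might look something like the following sketch. It assumes the wide layout of the JHU CSSE global time-series CSV: columns `Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date. A few made-up rows stand in for the real file. The `split_canada` helper and the "Country - Province" labels are hypothetical.

```python
import io

import pandas as pd

# Placeholder rows in the wide layout of the JHU CSSE global time-series file
# (one row per country/province, one column per date). Counts and coordinates
# are made up for illustration; they are not real data.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/20/20,3/21/20,3/22/20
Alberta,Canada,0.0,0.0,100,120,150
Ontario,Canada,0.0,0.0,200,260,300
,Italy,0.0,0.0,1000,1200,1500
Hubei,China,0.0,0.0,5000,5000,5000
"""


def split_canada(raw: pd.DataFrame):
    """Return (canada, world), each with one column per region and one row per date."""
    counts = raw.drop(columns=["Lat", "Long"])
    is_canada = counts["Country/Region"] == "Canada"

    # Canadian rows are keyed by province.
    canada = counts[is_canada].set_index("Province/State").drop(columns="Country/Region")

    # Other rows are keyed by "Country", or "Country - Province" when a province is given.
    world = counts[~is_canada].copy()
    world["Region"] = world["Country/Region"].where(
        world["Province/State"].isna(),
        world["Country/Region"] + " - " + world["Province/State"],
    )
    world = world.set_index("Region").drop(columns=["Country/Region", "Province/State"])

    # Transpose so each column is one region's cumulative trajectory over time.
    canada, world = canada.T, world.T
    for frame in (canada, world):
        frame.index = pd.to_datetime(frame.index, format="%m/%d/%y")
    return canada, world


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
canada, world = split_canada(raw)
print(canada)
print(world)
```

Transposing to one column per region makes each trajectory a column vector, which is the shape a later fitting step can use directly.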
A comparator country is one that has reported 100 or more cases for more than N consecutive days, with the first such day being at least seven days before the day Alberta reported 100 or more cases.
The value N was adjusted to further refine the list of comparator countries/regions to include only those that have at least two additional weeks of data as compared to Alberta. This way, the trajectory of AB can be compared against the trajectories of the comparator countries to get a sense of how things will unfold in the next two weeks. Animated transitions between weeks are also included to facilitate week-by-week comparisons.
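The comparator rule can be expressed as a filter over cumulative trajectories. In this sketch the helper names are hypothetical and the regions and counts are placeholders. "At least two additional weeks of data" is read as a run of days at or above 100 cases that is at least 14 days longer than Alberta's. The original may have tuned N differently.

```python
import pandas as pd


def onset_and_run(series: pd.Series, threshold: int = 100):
    """Position of the first day at or above `threshold`, and how many
    consecutive days from there stay at or above it."""
    above = (series >= threshold).to_numpy()
    if not above.any():
        return None, 0
    start = int(above.argmax())  # first day at or above the threshold
    rest = above[start:]
    run = len(rest) if rest.all() else int(rest.argmin())  # first miss ends the run
    return start, run


def find_comparators(world: pd.DataFrame, alberta: pd.Series,
                     threshold: int = 100, lead_days: int = 7, extra_days: int = 14):
    """Regions whose onset is at least `lead_days` before Alberta's and whose run
    above `threshold` exceeds Alberta's by at least `extra_days`.
    Assumes `world` and `alberta` share the same daily date index."""
    ab_start, ab_run = onset_and_run(alberta, threshold)
    if ab_start is None:
        raise ValueError("Alberta has not reached the threshold yet")
    min_run = ab_run + extra_days
    chosen = []
    for region in world.columns:
        start, run = onset_and_run(world[region], threshold)
        if start is not None and ab_start - start >= lead_days and run >= min_run:
            chosen.append(region)
    return chosen


# Hypothetical example: 40 days, regions crossing 100 cases on different days.
days = pd.date_range("2020-03-01", periods=40, freq="D")


def step(onset):  # placeholder: 50 cases before `onset`, 150 from then on
    return [50] * onset + [150] * (40 - onset)


alberta = pd.Series(step(20), index=days)
world = pd.DataFrame({"Region X": step(5), "Region Y": step(10), "Region Z": step(15)},
                     index=days)
print(find_comparators(world, alberta))  # ['Region X']
```

Region Y starts early enough but has too few extra days. Region Z starts only five days before Alberta.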
A sparse linear model was used to fit the trajectory of AB as a sparse linear combination of the trajectories of the comparator countries. The orthogonal matching pursuit (OMP) algorithm was used to perform the fit. The level of sparsity was varied in the range (0,1], and only fits with an R-squared score greater than or equal to 0.95 were retained.
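One possible implementation of this fit uses scikit-learn's `OrthogonalMatchingPursuit`. The sketch below is not the project's code and rests on several assumptions:

- The comparator trajectories are already aligned with Alberta's days and extend past them.
- A sparsity level s in (0, 1] allows round(s * regions) non-zero weights.
- The fit uses no intercept, with columns scaled to unit norm beforehand.

Fits with R-squared below 0.95 are discarded. The retained models are applied to the comparators' extra days to produce a short-term forecast.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit


def fit_sparse_combination(comparators, alberta, min_r2=0.95, sparsity_levels=None):
    """Fit Alberta's trajectory as a sparse combination of comparator trajectories.

    comparators: array of shape (days, regions); rows must already be aligned with
        Alberta's days and extend past them (the extra rows feed the forecast).
    alberta: 1-D array with Alberta's cumulative counts.
    """
    n_days = len(alberta)
    X_fit, X_ahead = comparators[:n_days], comparators[n_days:]

    # Unit-norm columns, so the greedy OMP step compares shapes rather than sizes.
    norms = np.linalg.norm(X_fit, axis=0)
    X_fit_n, X_ahead_n = X_fit / norms, X_ahead / norms

    # Assumption: sparsity level s in (0, 1] allows round(s * regions) non-zero weights.
    n_regions = comparators.shape[1]
    if sparsity_levels is None:
        sparsity_levels = np.linspace(0.1, 1.0, 10)
    ks = sorted({max(1, round(float(s) * n_regions)) for s in sparsity_levels})

    fits = []
    for k in ks:
        model = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
        model.fit(X_fit_n, alberta)
        r2 = model.score(X_fit_n, alberta)
        if r2 >= min_r2:  # keep only fits with R-squared >= 0.95
            fits.append({
                "k": k,
                "r2": r2,
                "weights": model.coef_ / norms,  # weights on the raw trajectories
                "forecast": model.predict(X_ahead_n),
            })
    return fits


# Synthetic example: four made-up comparator curves over 40 days; "Alberta" is
# 26 days of a mix of two of them plus a small wiggle.
t = np.arange(40, dtype=float)
comparators = np.column_stack([
    1000 / (1 + np.exp(-0.3 * (t - 15))),
    500 * np.exp(0.05 * t),
    50 * t,
    2000 / (1 + np.exp(-0.15 * (t - 25))),
])
alberta = (0.4 * comparators[:, 0] + 0.2 * comparators[:, 2] + 5 * np.sin(t))[:26]
for fit in fit_sparse_combination(comparators, alberta):
    print(fit["k"], round(fit["r2"], 3), fit["forecast"][:3].round())
```

Fitting without an intercept keeps each model a pure weighted sum of comparator trajectories. Normalizing the columns matters because OMP's greedy step picks the column with the largest inner product with the residual, which would otherwise favour regions with large raw counts.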
The results of our prediction are included in the dashboard, and the predictions were updated on a weekly basis. To show how the pandemic has been evolving in AB, a slider lets the user explore AB's trajectory week by week. The predictions are underlaid on the visualization. For past weeks, the actual reported AB data is also included, so that we can get a sense of the accuracy of this short-term predictive model.