After cleaning our data, we moved on to exploratory data analysis (EDA) and built visualizations to see which features we could use in our model to predict COVID-19 outbreaks. We first looked at vaccine distribution and administration per state to visualize our problem statement. Below we can see that distribution from the US government to individual states tracks population, with Alaska as the exception.
Here is a visual showing total vaccinations for the top 10 states, with their populations shown on top.
Next, we wanted to see how many vaccines are actually administered to people once a state obtains them. For California, less than half of the vaccines have been administered, while Michigan has administered 70% of its vaccines.
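The administration share discussed above is the number of doses administered divided by the number of doses distributed. Here is a minimal pandas sketch of that calculation. The column names (`doses_distributed`, `doses_administered`) and the function name are assumptions for illustration, not necessarily what our notebooks use.

```python
import pandas as pd


def administration_rate(
    df: pd.DataFrame,
    distributed_col: str = "doses_distributed",    # assumed column name
    administered_col: str = "doses_administered",  # assumed column name
) -> pd.DataFrame:
    """Add the percentage of distributed doses that were administered.

    Returns a copy sorted from the highest to the lowest percentage.
    """
    out = df.copy()
    out["pct_administered"] = out[administered_col] / out[distributed_col] * 100
    return out.sort_values("pct_administered", ascending=False)


# Placeholder numbers purely for illustration, not real state data.
example = pd.DataFrame(
    {
        "state": ["State A", "State B"],
        "doses_distributed": [1000, 1000],
        "doses_administered": [400, 700],
    }
)
ranked = administration_rate(example)
```

Sorting by the percentage rather than the raw count keeps large and small states comparable.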
To see why vaccine administration was so slow, we looked at the number of people who had received a vaccine since January 16th. Interestingly, we see no change from January 16th to January 20th, followed by a sudden spike. One concern, though, is that the number of people who have received their second dose is lower than expected.
Afterwards, we checked whether a survey of mask use in each county indicated that an area was more susceptible to COVID-19 outbreaks. This turned out not to be the case: the differences between counties were very small, so the survey wasn't a helpful feature. Because many outbreak statistics are reported per 100k population, we scaled our data the same way. Below are some plots showing the 7-day rolling average of newly confirmed cases and new deaths per 100k population.
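For reference, the per-100k scaling and 7-day rolling average can be sketched roughly as follows. This is only an illustration: it assumes a long-format table with one row per county per day, already sorted by date within each county, and the column names (`county`, `population`, `new_cases`) and the helper name are hypothetical.

```python
import pandas as pd


def rolling_per_100k(
    df: pd.DataFrame,
    value_col: str,
    pop_col: str = "population",  # assumed column name
    group_col: str = "county",    # assumed column name
    window: int = 7,
) -> pd.Series:
    """Scale a daily count to a per-100k rate, then smooth it per county.

    Rows are assumed to be sorted by date within each county.
    The first (window - 1) rows of every county are NaN because
    a full window is not yet available.
    """
    rate = df[value_col] / df[pop_col] * 100_000
    # transform keeps the original row order and index of df
    return rate.groupby(df[group_col]).transform(
        lambda s: s.rolling(window).mean()
    )


# Placeholder data purely for illustration, not real county figures.
demo = pd.DataFrame(
    {
        "county": ["A"] * 8 + ["B"] * 8,
        "date": list(pd.date_range("2021-01-01", periods=8)) * 2,
        "new_cases": [10] * 8 + [4] * 8,
        "population": [100_000] * 8 + [200_000] * 8,
    }
)
demo["cases_per_100k_7d"] = rolling_per_100k(demo, "new_cases")
```

Grouping by county before rolling keeps one county's window from spilling into the next county's rows.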
We then moved on to the hospital data to see how similar it was to the cases data. While it doesn't line up perfectly with the cases data, it follows a similar trend. We expected this, since someone who gets COVID-19 doesn't necessarily have to stay at the hospital if their symptoms are minor. Below are charts showing the hospital data for each county, based on the total number of available hospital beds as well as the total number of available ICU beds.
Finally, we looked at the total number of cases relative to population density in each county and plotted it on a map, as shown below.
If you want a deeper dive into our EDA process, you can check out the notebooks below.