5) Predictions, Recommendations and Conclusions

Predicting the number of vaccines that will be administered

Getting all the datasets

EDA on county population

EDA on prediction data

EDA on vaccination administration data

Two steps are needed for this:

Step 1: Roll up the data 7 days ahead, i.e. the value for 2021-01-16 should be the sum of that day and the 6 days that follow.

Step 2: Split the data for each county by date.
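
The two steps above can be sketched as follows. This is a minimal illustration, not necessarily the notebook's own implementation: the column names (`county`, `date`, `doses`), the helper names, and the demo values are all hypothetical, and it assumes one row per county per day with no missing dates. Dates that do not have a full 7 days of data ahead of them are left as NaN.

```python
import pandas as pd

def add_forward_sum(df, value_col="doses", window=7):
    """Add the sum of `value_col` over each date plus the next window-1 days, per county."""
    out = df.sort_values(["county", "date"]).reset_index(drop=True)

    def forward_sum(s):
        # Reverse, take a trailing rolling sum, reverse back:
        # the result is a forward-looking window in the original order.
        return s[::-1].rolling(window, min_periods=window).sum()[::-1]

    out[f"{value_col}_next_{window}d"] = (
        out.groupby("county")[value_col].transform(forward_sum)
    )
    return out

def split_by_county(df):
    """Return one date-indexed DataFrame per county."""
    return {county: g.set_index("date") for county, g in df.groupby("county")}

# Made-up demo data: 8 daily rows for each of two placeholder counties.
dates = pd.date_range("2021-01-16", periods=8, freq="D")
demo = pd.DataFrame({
    "county": ["County A"] * 8 + ["County B"] * 8,
    "date": list(dates) * 2,
    "doses": list(range(1, 9)) + [10] * 8,
})

rolled = add_forward_sum(demo)        # Step 1
by_county = split_by_county(rolled)   # Step 2
```

Reversing the series is used here only because it works across pandas versions; a forward-looking window indexer would be an alternative.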

Generating recommendations for how many vaccines each county should get

Merge data sets together

Create a DataFrame with the columns County, Date, Prediction, County population and Vaccine allocation

Append the 'county to population ratio' to merged data set.
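
A minimal sketch of the merge and ratio steps, under stated assumptions: the column names, the `TOTAL_VACCINES` constant and all values are made up for illustration. It also assumes that the "county to population ratio" means each county's share of the total population in the table, and that vaccines are allocated in proportion to that share; the study's actual allocation rule may differ.

```python
import pandas as pd

# Made-up illustrative inputs (placeholder counties and numbers).
predictions = pd.DataFrame({
    "county": ["County A", "County B", "County C"],
    "date": pd.to_datetime(["2021-02-01"] * 3),
    "predicted_cases": [50, 20, 5],
})
population = pd.DataFrame({
    "county": ["County A", "County B", "County C"],
    "population": [600, 300, 100],
})

TOTAL_VACCINES = 1_000  # hypothetical supply for the period

# Many prediction rows may map to one population row per county.
merged = predictions.merge(
    population, on="county", how="left", validate="many_to_one"
)

# Assumed definition: county population as a share of the total population.
merged["population_ratio"] = merged["population"] / population["population"].sum()

# Assumed rule: allocation proportional to population share.
merged["vaccine_allocation"] = merged["population_ratio"] * TOTAL_VACCINES
```

`validate="many_to_one"` makes the merge fail loudly if the population table has duplicate county rows, which would otherwise silently inflate the merged data.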

Calculate the delta: the difference between predicted new patients and vaccines available
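
The delta calculation can be sketched as below. The column names and values are hypothetical, and the sign convention (predicted cases minus allocated vaccines, so a positive delta means a shortfall) is an assumption rather than something the study states.

```python
import pandas as pd

# Made-up illustrative values for two placeholder counties.
merged = pd.DataFrame({
    "county": ["County A", "County B"],
    "predicted_cases": [120, 40],
    "vaccine_allocation": [100, 90],
})

# Assumed convention: positive delta = more predicted patients than vaccines.
merged["delta"] = merged["predicted_cases"] - merged["vaccine_allocation"]
merged["shortfall"] = merged["delta"] > 0
```

With this convention, counties can be ranked by `delta` to see where reallocation would matter most.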


The primary findings from our study are as follows:

(1.) With 2.8M vaccines allocated to California over the next 22 weeks (until July 2021), no county in California should face a shortfall in vaccinating everyone.

(2.) Uncertainty and external factors affect the findings of this study. These include (but are not restricted to) government decisions, the cadence of vaccine distribution and administration, the spread of new strains of the virus, and the reopening of cities and suburbs, all of which may affect the spread of the virus. A majority of these factors are driven by government policy. All other factors remaining the same, the results of this study can be implemented.

(3.) Counties in Southern California, especially Los Angeles, Orange County and Riverside, are expected to see a spike in new cases (by volume), while Los Angeles, Marin County and Kern County are expected to see a spike by density. We recommend monitoring the number of COVID-19 patients in these counties in case vaccine reallocation is needed.

(4.) This study also raises ethical questions about the allocation of life-saving resources. Using data techniques to divert life-saving technology means making decisions whose impact can be irreversible. In addition, the disease may affect demographic groups (by age, gender, race and other ethnographic parameters) differently, which adds complexity to the decision making.