In this notebook, I am going to explore two datasets, COVID-19 Vaccine Distribution Allocations by Jurisdiction - Pfizer and COVID-19 Vaccine Distribution Allocations by Jurisdiction - Moderna, and then combine them.

https://data.cdc.gov/Vaccinations/COVID-19-Vaccine-Distribution-Allocations-by-Juris/b7pe-5nws

https://data.cdc.gov/Vaccinations/COVID-19-Vaccine-Distribution-Allocations-by-Juris/saz5-9hgg

Data Dictionary

| Moderna Columns | Pfizer Columns |
|---|---|
| Jurisdiction | Jurisdiction |
| HHS Region | HHS Region |
| | Doses allocated week of 12/14 |
| | Second Dose Shipment (21 days later) week of 12/14 |
| Doses allocated week of 12/21 | Doses allocated week of 12/21 |
| Second Dose Shipment (28 days later) week of 12/21 | Second Dose Shipment (21 days later) week of 12/21 |
| Doses allocated week of 12/28 | Doses allocated week of 12/28 |
| Second Dose Shipment (28 days later) week of 12/28 | Second Dose Shipment (21 days later) week of 12/28 |
| Doses allocated for distribution week of 01/04 | Doses allocated for distribution week of 01/04 |
| Second dose shipment for distribution (28 days later) week of 01/04 | Second dose shipment for distribution (21 days later) week of 01/04 |
| Doses allocated for distribution week of 01/10 | Doses allocated for distribution week of 01/10 |
| Second dose shipment for distribution (28 days later) week of 01/10 | Second dose shipment for distribution (21 days later) week of 01/10 |
| Doses allocated for distribution week of 01/18 | Doses allocated for distribution week of 01/18 |
| Second dose shipment for distribution (28 days later) week of 01/18 | Second dose shipment for distribution (21 days later) week of 01/18 |
| Doses allocated for distribution week of 01/25 | |
| Second dose shipment for distribution (28 days later) week of 01/25 | |
| Total Moderna Allocation "First Dose" Shipments | Total Pfizer Allocation "First Dose" Shipments |
| Total Allocation Moderna"Second Dose" Shipments | Total Allocation Pfizer "Second Dose" Shipments |

COVID-19_Vaccine_Distribution_Allocations_byJurisdiction-_Moderna.csv
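A minimal loading sketch follows. To keep it runnable without the download, it reads a hypothetical two-row excerpt through `io.StringIO`. The jurisdiction names and numbers are made up, and only one column header is taken from the data dictionary above. The point is that a comma-formatted count such as "25,200" is read as text rather than as a number, which is why the punctuation and type-conversion steps below are needed.

```python
import io

import pandas as pd

# Hypothetical excerpt shaped like the CDC file (made-up names and values).
# With the real data you would pass the CSV filename to pd.read_csv instead.
sample_csv = io.StringIO(
    "Jurisdiction,HHS Region,Doses allocated week of 12/21\n"
    'State A,Region 1,"25,200"\n'
    'State B,Region 2,"17,100"\n'
)
moderna = pd.read_csv(sample_csv)

# The quoted, comma-separated counts come back as strings, not integers.
print(moderna.dtypes)
```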

Check whether the total of the first doses is identical to the total of the second doses.
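One way to run this check is sketched below, assuming the two total columns carry the header names listed in the data dictionary. Because the check may run before the punctuation cleanup, the helper strips commas itself and coerces anything unparseable (including missing cells) to NaN, which `sum()` then skips. The toy frame is hypothetical.

```python
import pandas as pd

# Header names copied from the data dictionary above (assumed to match the CSV).
FIRST = 'Total Moderna Allocation "First Dose" Shipments'
SECOND = 'Total Allocation Moderna"Second Dose" Shipments'


def column_total(series):
    """Sum a column that may still hold comma-formatted text."""
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce").sum()


def totals_match(df, first_col, second_col):
    """Return (first_total, second_total, whether they are equal)."""
    first_total = column_total(df[first_col])
    second_total = column_total(df[second_col])
    return first_total, second_total, first_total == second_total


# Hypothetical toy data, not the real allocations.
toy = pd.DataFrame({FIRST: ["1,000", "2,500", None], SECOND: ["1,000", "2,500", None]})
print(totals_match(toy, FIRST, SECOND))
```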

Rows 54 to 59 are islands far away from the mainland, so I am going to drop those rows.
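Below is a sketch of the drop. It assumes the frame still has its default `RangeIndex`, so that index labels equal row positions, and that "54 to 59" is inclusive. The 62-row toy frame is hypothetical.

```python
import pandas as pd


def drop_rows(df, start, stop):
    """Drop rows whose index labels run from start to stop, inclusive."""
    return df.drop(index=range(start, stop + 1))


# Hypothetical 62-row frame standing in for the real table.
toy = pd.DataFrame({"Jurisdiction": [f"J{i}" for i in range(62)]})
trimmed = drop_rows(toy, 54, 59)
print(len(trimmed))
```

Filtering on the `Jurisdiction` column by name would be more robust than positional labels if the CDC ever reorders the file.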

Remove all the punctuation from each cell so the numbers can be converted later.

https://www.pythondaddy.com/python/how-to-remove-punctuation-from-a-dataframe-in-pandas-and-python/
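A minimal version of this step uses `DataFrame.replace` with a regular expression. This is one common approach and not necessarily the exact code from the linked article. The pattern `[^\w\s]` deletes every character that is neither a word character nor whitespace. Regex replacement only touches string cells, so numbers and NaN are left alone. Be aware that it also strips periods or hyphens from jurisdiction names. The toy frame is hypothetical.

```python
import pandas as pd


def strip_punctuation(df):
    """Remove punctuation from every string cell; other cells are unchanged."""
    return df.replace(r"[^\w\s]", "", regex=True)


# Hypothetical toy data.
toy = pd.DataFrame({"Jurisdiction": ["State A"], "Doses allocated week of 12/21": ["1,234"]})
clean = strip_punctuation(toy)
print(clean)
```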

NaN values

San Antonio and Houston don't have any numbers; their doses are probably included in Texas.
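To find rows like these, you can count NaN values per column and then list the rows that contain any NaN. The sketch below does only that and does not decide how the gaps are filled. The toy frame is hypothetical.

```python
import pandas as pd


def rows_with_missing(df):
    """Return only the rows that contain at least one NaN."""
    return df[df.isna().any(axis=1)]


# Hypothetical toy data: "City B" has no allocation value.
toy = pd.DataFrame({
    "Jurisdiction": ["State A", "City B"],
    "Doses allocated week of 12/21": ["1200", None],
})
print(toy.isna().sum())        # NaN count per column
print(rows_with_missing(toy))  # the rows to inspect
```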

Convert the columns from object to integer
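Here is a sketch of the conversion, assuming the punctuation has already been removed. It uses pandas' nullable `Int64` dtype, which keeps missing cells (such as the San Antonio and Houston rows) as `<NA>` instead of raising an error. If you decide that missing means zero, `fillna(0).astype(int)` is the alternative. The column name `d` is hypothetical.

```python
import pandas as pd


def to_integer(df, columns):
    """Convert the given text columns to the nullable Int64 dtype."""
    out = df.copy()
    for col in columns:
        # to_numeric parses the digit strings (missing cells become NaN);
        # Int64 then stores whole numbers and keeps NaN as <NA>.
        out[col] = pd.to_numeric(out[col]).astype("Int64")
    return out


# Hypothetical toy data.
toy = pd.DataFrame({"d": ["1234", None]})
converted = to_integer(toy, ["d"])
print(converted.dtypes)
```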


COVID-19_Vaccine_Distribution_Allocations_byJurisdiction-_Pfizer.csv

The total of the first doses is the same as the total of the second doses.

Remove all the punctuation from each cell

NaN values

Convert the columns from object to integer
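The Pfizer file goes through the same steps as the Moderna file, so the cleaning can be wrapped in one reusable function and applied to both. The sketch below makes two assumptions: `Jurisdiction` and `HHS Region` are the only non-numeric columns, and every other column holds counts. The function name and the toy frame are hypothetical.

```python
import pandas as pd

# Assumed identifier columns; every other column is treated as a dose count.
ID_COLUMNS = ["Jurisdiction", "HHS Region"]


def clean_allocations(df):
    """Strip punctuation from text cells, then convert count columns to Int64."""
    out = df.replace(r"[^\w\s]", "", regex=True)
    for col in out.columns.difference(ID_COLUMNS):
        # Raises ValueError if a cell still holds non-numeric text.
        out[col] = pd.to_numeric(out[col]).astype("Int64")
    return out


# Hypothetical toy data shaped like the Pfizer table.
pfizer_raw = pd.DataFrame({
    "Jurisdiction": ["State A", "State B"],
    "HHS Region": ["Region 1", "Region 2"],
    "Doses allocated week of 12/14": ["1,950", None],
})
pfizer = clean_allocations(pfizer_raw)
print(pfizer)
```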


Merge the two data frames and clean further
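A merge sketch follows. It assumes both frames have already been renamed to the snake_case names in the data dictionary below, and that `jurisdiction` and `hhs_region` are spelled identically in both files; if they are not, the outer merge produces duplicate rows. Columns the two tables share are added together, with a missing value on one side counted as zero. Columns present in only one table (such as Pfizer's 12/14 week) pass through unchanged. The toy frames are hypothetical.

```python
import pandas as pd

KEYS = ["jurisdiction", "hhs_region"]


def merge_allocations(moderna, pfizer):
    """Outer-merge two allocation tables and add their shared dose columns."""
    merged = moderna.merge(pfizer, on=KEYS, how="outer", suffixes=("_moderna", "_pfizer"))
    shared = [c for c in moderna.columns if c in pfizer.columns and c not in KEYS]
    for col in shared:
        left, right = f"{col}_moderna", f"{col}_pfizer"
        merged[col] = merged[left].add(merged[right], fill_value=0)
        merged = merged.drop(columns=[left, right])
    return merged


# Hypothetical toy data.
moderna = pd.DataFrame({
    "jurisdiction": ["State A", "State B"],
    "hhs_region": ["Region 1", "Region 2"],
    "doses_allocated_12_21": [100, 200],
})
pfizer = pd.DataFrame({
    "jurisdiction": ["State A", "State B"],
    "hhs_region": ["Region 1", "Region 2"],
    "doses_allocated_12_14": [50, 60],
    "doses_allocated_12_21": [10, 20],
})
combined = merge_allocations(moderna, pfizer)
print(combined)
```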

Data Dictionary

| Column Name | Description |
|---|---|
| jurisdiction | States and other jurisdictions (e.g., San Antonio and Houston) |
| hhs_region | U.S. Department of Health & Human Services Regions |
| doses_allocated_12_14 | Doses allocated week of 12/14 |
| doses_allocated_12_21 | Doses allocated week of 12/21 |
| doses_allocated_12_28 | Doses allocated week of 12/28 |
| doses_allocated_01_04 | Doses allocated for distribution week of 01/04 |
| doses_allocated_01_10 | Doses allocated for distribution week of 01/10 |
| doses_allocated_01_18 | Doses allocated for distribution week of 01/18 |
| total_first_allocation | Total Allocation "First Dose" Shipments from Pfizer and Moderna |
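The original CDC headers can be mapped to these short names with one rule-based function rather than a hand-written dictionary per file. This is a minimal sketch, assuming the headers match the spellings in the first data dictionary. Second-dose columns get no name and are dropped. Moderna's 01/25 column would become `doses_allocated_01_25`, which the table above does not list. The helper names are hypothetical.

```python
import re

import pandas as pd

WEEK_PATTERN = re.compile(r"Doses allocated (?:for distribution )?week of (\d{2})/(\d{2})")


def to_snake_name(column):
    """Return the short column name, or None for columns not kept."""
    if column == "Jurisdiction":
        return "jurisdiction"
    if column == "HHS Region":
        return "hhs_region"
    match = WEEK_PATTERN.fullmatch(column)
    if match:
        return f"doses_allocated_{match.group(1)}_{match.group(2)}"
    if '"First Dose"' in column:
        return "total_first_allocation"
    return None  # e.g. second-dose shipment columns


def rename_for_merge(df):
    """Keep only the mapped columns and give them their short names."""
    mapping = {c: to_snake_name(c) for c in df.columns if to_snake_name(c) is not None}
    return df[list(mapping)].rename(columns=mapping)


# Hypothetical one-row frame using headers from the Pfizer data dictionary.
toy = pd.DataFrame(columns=[
    "Jurisdiction",
    "Doses allocated for distribution week of 01/04",
    "Second dose shipment for distribution (21 days later) week of 01/04",
    'Total Pfizer Allocation "First Dose" Shipments',
], data=[["State A", 1, 2, 3]])
print(list(rename_for_merge(toy).columns))
```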