Chapter 3 Data transformation

3.1 Percentage data

To investigate the trend in each state, it is necessary to study the change in the percentage of bed occupations and confired/suspected cases besides comparing the real numbers. Therefore, we transform the values of some columns into percentage.

3.1.1 Increase in covid-19 cases

  • The relationship between the covid-19 cases and hospital utilization might be highly related. We are interested in how many of the suspected/confirmed cases are in each state. Therefore, we would transform the counts of newly increased cases into the percentage of admitted covid-19 cases among the total cases in each state.

  • Raw values:

  1. tot_cases: [int]
  2. new_case: [int]
  3. tot_death: [int]
  4. new_death: [int]
  • Transformed values:
  1. new_case_P: [float] new_case / tot_cases
  2. tot_death_P: [int] tot_death / tot_cases
  3. new_death_P: [int] new_death / tot_death

3.1.2 Inpatient Bed Occupation

  • Some real numbers of counting need to be trasformed into the percentage for the use of studying the trends. For example, the number of inpatiend beds used for covid-19 will be transformed into the percetage in order to see if the inpatiend beds of the hospitals are overloaded due to covid-19.

  • Raw values:

  1. inpatient_beds: [int]
  2. inpatient_beds_used: [int]
  3. inpatient_beds_used_covid: [int]
  • Transformed values:
  1. inpatient_beds_used_P: [float] inpatient_beds_used/inpatient_beds
  2. inpatient_beds_used_covid_P: [float] inpatient_beds_used_covid/inpatient_beds

3.1.3 Previous Day Confirmed/Suspected Case for Adult & Pediatric

  • To study the differences between the impact on the adults and pediatrics from covid-19, we will split the data into two groups and study the changes of suspected/confirmed cases respectively in both groups. The impact of covid-19 on two groups might be different in different states so the average changes of two groups in US will be taken as the baseline to compare with the one of each state as well.

  • Raw values: Adult:

  1. previous_day_admission_adult_covid_confirmed: [int]
  2. previous_day_admission_adult_covid_suspected: [int]

Pediatric: 1) previous_day_admission_pediatric_covid_confirmed: [int] 2) previous_day_admission_pediatric_covid_suspected: [int]

  • Transformed values: Adult:
  1. previous_day_admission_adult_covid_cases: [int] previous_day_admission_adult_covid_confirmed + previous_day_admission_adult_covid_suspected
  2. previous_day_admission_adult_covid_confirmed_P: [float] previous_day_admission_adult_covid_confirmed / previous_day_admission_adult_covid_cases
  3. previous_day_admission_adult_covid_suspected_P: [float] previous_day_admission_adult_covid_suspected / previous_day_admission_adult_covid_cases

Pediatric: 1) previous_day_admission_pediatric_covid_cases: [int] previous_day_admission_pediatric_covid_confirmed + previous_day_admission_pediatric_covid_suspected 2) previous_day_admission_pediatric_covid_confirmed_P: [float] previous_day_admission_pediatric_covid_confirmed / previous_day_admission_pediatric_covid_cases 3) previous_day_admission_pediatric_covid_suspected_P: [float] previous_day_admission_pediatric_covid_suspected / previous_day_admission_pediatric_covid_cases

date state new_case_P tot_death_P new_death_P inpatient_beds_used_P inpatient_beds_used_covid_P previous_day_admission_adult_covid_cases previous_day_admission_pediatric_covid_confirmed_P previous_day_admission_adult_covid_suspected_P previous_day_admission_pediatric_covid_cases previous_day_admission_pediatric_covid_confirmed_P.1 previous_day_admission_pediatric_covid_suspected_P
2020-03-05 AK NaN NaN NaN NA NA NA NA NA NA NA NA
2020-03-05 AL NaN NaN NaN 0.2580645 0 NA NA NA NA NA NA
2020-03-05 AR NaN NaN NaN NA NA NA NA NA NA NA NA
2020-03-05 AZ 0.3333333 0.0000000 NaN 0.9800000 0 NA NA NA NA NA NA
2020-03-05 CA 0.2000000 0.0222222 1 0.3636364 0 NA NA NA NA NA NA
2020-03-05 CO 1.0000000 0.0000000 NaN NA NA NA NA NA NA NA NA

3.2 Nationwide data

  • We group-by the data by date and summaries the counting values of some columns to investigate the trend of the nationwide situation. To study the trends of covid-19 over states and its impact on the hospital utilization, we have grouped-by the data by states and look into the situations changing along with time. Since the trends might be different due to the policies, population size and resources of each state, we will also be interested in how the differeces look like by comparing the trend of each state with the one of the whole country.