Misconception regarding group_by and summarize in R (dplyr package)
I had to plot a graph of fatalities per year. I extracted the year from the date, grouped by it, and summarized the fatalities per year. But when I run the code, it gives me the total fatalities across the whole dataset instead.
I don't understand why. Is there an alternative way to get fatalities per year?
In the dataset, fatalities are recorded per incident, and many incidents happened every year.
> crash_data <- read.csv("https://raw.githubusercontent.com/gluque/analytics_task2/master/airplane_crashes_and_fatalities_since_1908.csv")
> crash_data$date <- as.Date(crash_data$date, "%m/%d/%Y")
> crash_data$date <- format(crash_data$date, "%Y")
> cd <- subset(crash_data, select = c(fatalities, date))
> ab <- group_by(cd, date)
> ef <- summarize(ab, fatalities = sum(fatalities, na.rm = TRUE))
> ef
  fatalities
1     105479
> group_by(cd, date) %>% summarize(fatalities = sum(fatalities, na.rm = TRUE))
# # tibble: 98 x 2
#    date  fatalities
#    <chr>      <int>
#  1 1908           1
#  2 1912           5
#  3 1913          45
#  4 1915          40
#  5 1916         108
#  6 1917         124
#  7 1918          65
#  8 1919           5
#  9 1920          24
# 10 1921          68
# ... 88 more rows
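The post does not say why the first attempt collapsed to a single row. One common cause of that symptom is that another attached package, such as plyr, provides its own summarize that masks dplyr's and ignores the grouping; that is only a guess here. A minimal sketch that sidesteps the question by calling the dplyr functions with an explicit namespace, and that also shows a base R alternative with aggregate(), might look like this. The small cd data frame is made-up toy data standing in for the crash dataset, and the names per_year and per_year_base are purely illustrative.

```r
library(dplyr)

# Hypothetical toy data standing in for the crash dataset (not real figures)
cd <- data.frame(
  date = c("1908", "1912", "1912", "1913", "1913"),
  fatalities = c(1L, 2L, 3L, NA, 45L),
  stringsAsFactors = FALSE
)

# Namespace-qualified calls, so a summarize() from another attached
# package cannot be picked up by accident
per_year <- cd %>%
  dplyr::group_by(date) %>%
  dplyr::summarise(fatalities = sum(fatalities, na.rm = TRUE))

# Base R alternative: with the formula interface, rows containing NA
# are dropped by the default na.action before summing
per_year_base <- aggregate(fatalities ~ date, data = cd, FUN = sum)

print(per_year)
print(per_year_base)
```

Both results have one row per year. If the unqualified summarize still returns a single row in your session, typing summarize at the console and checking the environment shown at the end of the printout tells you which package it comes from.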