r - Last observation carried forward conditional on multiple columns -
i have dataset structure:
id = c(1,1,1,1,2,2,2,3,3,3,3) l40 = c(1, na, na, na, 1, na, na, na, 1, na, na) k50 = c(na, na, na, na, na, 1, na, na, na, na, 1) df = data.frame(id, l40, k50)
when missing values occur in columns l40 , k50, want carry forward last non-missing value in column, conditional on id being same previous id , values in l40 , k50 in current row being empty. applied following code:
library(tidyr) df2 <- df %>% group_by(id) %>% fill(l40:k50)
this not achieve looking for. want previous non-missing value carried forward next row when other columns (except id) in row empty. want:
id = c(1,1,1,1,2,2,2,3,3,3,3) l40 = c(1, 1, 1, 1, 1, na, na, na, 1, 1, na) k50 = c(na, na, na, na, na, 1, 1, na, na, na, 1) df3 = data.frame(id, l40, k50)
we can use na.locf
library(data.table) library(zoo) setdt(df)[, if(any(is.na(k50[-1]))) lapply(.sd, na.locf) else .sd , = id] # id l40 k50 #1: 1 1 na #2: 1 1 na #3: 1 1 na #4: 1 1 na #5: 2 1 na #6: 2 na 1 #7: 3 na 1 #8: 3 na 1 #9: 3 na 1
an option using dplyr
be
library(dplyr) df %>% mutate(ind = rowsums(is.na(.))) %>% group_by(id) %>% mutate_each(funs(if(any(ind>1)) na.locf(., na.rm=false) else .), l40:k50) %>% select(-ind) # id l40 k50 # <dbl> <dbl> <dbl> #1 1 1 na #2 1 1 na #3 1 1 na #4 1 1 na #5 2 1 na #6 2 na 1 #7 3 na 1 #8 3 na 1 #9 3 na 1
Comments
Post a Comment