R: data frame from data stored in a variable-length concatenated string
I have a data frame that contains a number of features against each id, delimited by |:
df = data.frame(id = c("1","2","3"), features = c("1|2|3","4|5","6|7"))
df
My goal is to have a column for each feature, with an indicator of its presence for each id, e.g.
id | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
The features are stored in a different table, so a complete list of possible features is available, but it would be better if the columns could be generated dynamically.
My first attempt was to use a horribly slow loop with grepl() to populate a pre-created matrix 'm', e.g.
for (i in 1:dim(df)[1]) {
  print(i)
  if (grepl("1\\|", df$features[i])) {m[i,1] <- 1}
  if (grepl("2\\|", df$features[i])) {m[i,2] <- 1}
  if (grepl("3\\|", df$features[i])) {m[i,3] <- 1}
  if (grepl("4\\|", df$features[i])) {m[i,4] <- 1}
  if (grepl("5\\|", df$features[i])) {m[i,5] <- 1}
  if (grepl("6\\|", df$features[i])) {m[i,6] <- 1}
  if (grepl("7\\|", df$features[i])) {m[i,7] <- 1}
}
Ignoring the fact that the regex will fall over when the features reach the teens, this is terribly slow on the ~400,000 rows I need to run it over. Additionally, I need to create an if() for every single feature id instead of it happening dynamically.
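To make the regex problem concrete, here is a minimal sketch. The strings in `x` are made up for illustration and are not from the real data, and the anchored pattern is just one possible way to match a whole feature.

```r
# Hypothetical feature strings, not taken from the real data
x <- c("11|12", "1|2", "2|1")

# "1\\|" matches any "1" followed by a pipe, so "11|12" is a false positive,
# and "2|1" is missed because the last feature has no trailing pipe
naive <- grepl("1\\|", x)

# Requiring start-of-string or a pipe before, and a pipe or end-of-string after,
# matches feature 1 as a whole token only
anchored <- grepl("(^|\\|)1(\\||$)", x)
```

Even with an anchored pattern, one grepl() call per feature is still needed, so this does not address the dynamic-column part of the question.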
Is there a way to do this more succinctly, with dynamic column generation?
The natural object to return is a matrix. Here is a way in base R.
# split the features column on the literal pipe symbol; fixed = TRUE is needed because
# "|" is a regex metacharacter, and it keeps multi-digit features intact
temp <- strsplit(as.character(df$features), split = "|", fixed = TRUE)
# use %in% to return a logical vector of the desired length, convert to integer, and rbind the list
mymat <- do.call(rbind, lapply(temp, function(i) as.integer(1:7 %in% i)))
# add the ids as row names
rownames(mymat) <- df$id
This returns
mymat
  [,1] [,2] [,3] [,4] [,5] [,6] [,7]
1    1    1    1    0    0    0    0
2    0    0    0    1    1    0    0
3    0    0    0    0    0    1    1
If you want a data.frame, you can use
temp <- strsplit(as.character(df$features), split = "|", fixed = TRUE)
mydf <- cbind(id = df$id,
              data.frame(do.call(rbind, lapply(temp, function(i) as.integer(1:7 %in% i)))))
which returns
mydf
  id X1 X2 X3 X4 X5 X6 X7
1  1  1  1  1  0  0  0  0
2  2  0  0  0  1  1  0  0
3  3  0  0  0  0  0  1  1
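The 1:7 above is hard-coded, whereas the question asks for the columns to be generated dynamically. The following is a minimal sketch of one way that might be done in base R, assuming the features are integer codes separated by a literal pipe. The names `feats`, `all_feats` and `dyn_mat` are introduced here for illustration only.

```r
df <- data.frame(id = c("1", "2", "3"),
                 features = c("1|2|3", "4|5", "6|7"),
                 stringsAsFactors = FALSE)

# Split on the literal pipe; fixed = TRUE keeps multi-digit features intact
feats <- strsplit(as.character(df$features), split = "|", fixed = TRUE)

# Derive the column set from the data instead of hard-coding 1:7
# (assumption: every feature is an integer code)
all_feats <- sort(unique(as.integer(unlist(feats))))

# Pre-allocate a zero matrix with ids as row names and features as column names
dyn_mat <- matrix(0L, nrow = nrow(df), ncol = length(all_feats),
                  dimnames = list(as.character(df$id), as.character(all_feats)))

# Set all indicators at once using (row, column) index pairs
row_idx <- rep(seq_len(nrow(df)), lengths(feats))
col_idx <- match(as.integer(unlist(feats)), all_feats)
dyn_mat[cbind(row_idx, col_idx)] <- 1L
```

Because the matrix is filled with a single indexed assignment rather than one lapply() call per row, this may also scale better to the ~400,000 rows mentioned in the question, though that has not been measured here. If the complete feature list from the other table is preferred, it could be substituted for `all_feats`.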