R data frame from data stored in a variable length concatenated string


I have a data frame that contains a number of features against an id, delimited by |:

    df = data.frame(id = c("1","2","3"), features = c("1|2|3","4|5","6|7"))
    df

My goal is to have a column for each feature and an indicator of its presence for each id, e.g.

id | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |

The features are stored in a different table, so a complete list of possible features is available, but it would be better if the columns could be generated dynamically.

My first attempt was to use a horribly slow loop with grepl() to populate a pre-created matrix 'm', e.g.

    for (i in 1:dim(df)[1]) {
      print(i)
      if (grepl("1\\|", df$features[i])) {m[i,1] <- 1}
      if (grepl("2\\|", df$features[i])) {m[i,2] <- 1}
      if (grepl("3\\|", df$features[i])) {m[i,3] <- 1}
      if (grepl("4\\|", df$features[i])) {m[i,4] <- 1}
      if (grepl("5\\|", df$features[i])) {m[i,5] <- 1}
      if (grepl("6\\|", df$features[i])) {m[i,6] <- 1}
      if (grepl("7\\|", df$features[i])) {m[i,7] <- 1}
    }

Ignoring the fact that the regex will fall over once the features reach the teens, this is terribly slow on the ~400,000 rows I need to run it over. Additionally, I need to create an if() for every single feature id instead of it happening dynamically.
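To make the regex problem concrete, here is a small sketch. The feature strings are hypothetical, chosen only to trigger the two failure modes; comparing whole tokens after splitting on the literal pipe is one way around both.

```r
# Hypothetical feature strings, chosen to show both failure modes
false_hit <- grepl("1\\|", "11|12")  # "1|" matches inside "11|", but feature 1 is absent
missed    <- grepl("3\\|", "1|2|3")  # the last feature has no trailing pipe

# Splitting on the literal pipe and comparing whole tokens avoids both
tokens <- strsplit("11|12", split = "|", fixed = TRUE)[[1]]
has_1  <- "1" %in% tokens
has_12 <- "12" %in% tokens

print(c(false_hit = false_hit, missed = missed, has_1 = has_1, has_12 = has_12))
```

Here false_hit is TRUE and missed is FALSE, so the pattern both over-matches and under-matches, while the token comparison reports feature 1 as absent and feature 12 as present.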

Is there a way to do this more succinctly, with dynamic column generation?

The natural object to return is a matrix. Here is a way in base R.

    # Split the features column on the pipe symbol; fixed = TRUE treats "|" literally
    # rather than as a regex, so multi-digit features stay intact
    temp <- strsplit(as.character(df$features), split = "|", fixed = TRUE)
    # Use %in% to return a logical vector of the desired length, convert to integer, and rbind the list
    mymat <- do.call(rbind, lapply(temp, function(i) as.integer(1:7 %in% i)))
    # Add the ids as row names
    rownames(mymat) <- df$id

This returns

    mymat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    1    1    1    1    0    0    0    0
    2    0    0    0    1    1    0    0
    3    0    0    0    0    0    1    1
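The hard-coded 1:7 can be replaced by the set of features that actually occur in the data, which is what generating the columns dynamically comes down to. The following is a minimal sketch, assuming the features are integer codes; the name feats is introduced here for illustration, and for non-numeric features sort(unique(unlist(temp))) would be the analogous choice.

```r
df <- data.frame(id = c("1", "2", "3"),
                 features = c("1|2|3", "4|5", "6|7"),
                 stringsAsFactors = FALSE)

# Split each string on the literal pipe
temp <- strsplit(df$features, split = "|", fixed = TRUE)

# Every feature that occurs anywhere, in numeric order
# (assumption: features are integer codes)
feats <- sort(unique(as.integer(unlist(temp))))

# One row of 0/1 indicators per id, one column per observed feature
mymat <- do.call(rbind, lapply(temp, function(i) as.integer(feats %in% i)))
dimnames(mymat) <- list(df$id, feats)

print(mymat)
```

If the complete list of possible features from the other table is preferred, it can be assigned to feats instead; features that never occur then simply become all-zero columns.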

If you want a data.frame, you can use

    # Split on the literal pipe (fixed = TRUE), then bind the 0/1 rows next to the id column
    temp <- strsplit(as.character(df$features), split = "|", fixed = TRUE)
    mydf <- cbind(id = df$id,
                  data.frame(do.call(rbind, lapply(temp, function(i) as.integer(1:7 %in% i)))))

which returns

    mydf
      id X1 X2 X3 X4 X5 X6 X7
    1  1  1  1  1  0  0  0  0
    2  2  0  0  0  1  1  0  0
    3  3  0  0  0  0  0  1  1
