I have a CSV file. It contains the output of previous R operations and is filled with index numbers (such as [1], [[1]]). When it is read into R, it looks like this, for example:
                                 v1
    1                       [1] 789
    2                         [[1]]
    3 [1] "png" "d115" "dx06" "slz"
    4                       [1] 787
    5                         [[1]]
    6               [1] "d010" "hc"
    7                       [1] 949
    8                         [[1]]
    9               [1] "hc" "dx06"
(I don't know why there is wasted space between the line number and the output data.)
I need the above data to appear as follows (without [1], [[1]], or " ", and with the data placed beside its corresponding number), like:
    789 png,d115,dx06,slz
    787 d010,hc
    949 hc,dx06
(Possibly 789 and its corresponding data png,d115,dx06,slz should be separated by a tab, and likewise for each row.)
How can I achieve this in R?
We can create a grouping variable ('indx') and split the 'v1' column by that grouping index, after removing the bracketed index part at the beginning of each string and the double quotes (") within it. Assuming that you need the first column as the numeric element and the second column as the non-numeric part, we can use a regex to replace the spaces with commas (as shown in the expected result) and rbind the list elements.
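To see how the grouping index and the cleaning regex behave before combining them, the intermediate values can be inspected on their own. This is only a sketch: the vector `v1` is typed in by hand to mirror the example data, and the names `is_marker` and `cleaned` are made up for illustration.

```r
# Same values as the 'v1' column in the example, typed in by hand
v1 <- c('[1] 789', '[[1]]', '[1] "png" "d115" "dx06" "slz"',
        '[1] 787', '[[1]]', '[1] "d010" "hc"',
        '[1] 949', '[[1]]', '[1] "hc" "dx06"')

# TRUE on every '[[1]]' row
is_marker <- grepl('\\[\\[', v1)

# Shift the flags up by one row so the counter increases on the row
# before each '[[1]]', i.e. on the row that holds the number
indx <- cumsum(c(is_marker[-1], FALSE))
indx
# [1] 1 1 1 2 2 2 3 3 3

# Drop the double quotes and everything up to the last ']'
cleaned <- gsub('"|^.*\\]', '', v1)
split(cleaned, indx)[[1]]
# [1] " 789"               ""                   " png d115 dx06 slz"
```

Each group therefore holds three strings: the number (with a leading space), an empty string left over from the `[[1]]` row, and the space-separated values.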
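    # Group every three rows: number, '[[1]]', values
    indx <- cumsum(c(grepl('\\[\\[', df1$v1)[-1], FALSE))
    # Strip quotes and the leading index, split by group, build one row per group
    do.call(rbind, lapply(split(gsub('"|^.*\\]', '', df1$v1), indx), function(x)
        data.frame(ind = as.numeric(x[1]),  # as.numeric() also drops the leading space
                   val = gsub('\\s+', ',', gsub('^\\s+|\\s+$', '', x[-1][x[-1] != ''])))))
    #  ind               val
    #1 789 png,d115,dx06,slz
    #2 787           d010,hc
    #3 949           hc,dx06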
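The question also mentions separating the number and its values with a tab. One possible way to do that is sketched below. The data frame `res` is typed in by hand to stand in for the cleaned result, and the file name `cleaned.tsv` is only a placeholder.

```r
# Hypothetical cleaned result, typed in by hand to keep the example self-contained
res <- data.frame(ind = c(789, 787, 949),
                  val = c("png,d115,dx06,slz", "d010,hc", "hc,dx06"),
                  stringsAsFactors = FALSE)

# One tab-separated line per row, printed to the console
out_lines <- paste(res$ind, res$val, sep = "\t")
cat(out_lines, sep = "\n")

# Or write the same lines to a file ("cleaned.tsv" is a placeholder name)
out_file <- file.path(tempdir(), "cleaned.tsv")
write.table(res, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Setting `quote = FALSE` keeps `write.table` from wrapping the values in double quotes again, and dropping the row and column names leaves only the two tab-separated fields on each line.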
data
    df1 <- structure(list(v1 = c("[1] 789", "[[1]]",
        "[1] \"png\" \"d115\" \"dx06\" \"slz\"", "[1] 787", "[[1]]",
        "[1] \"d010\" \"hc\"", "[1] 949", "[[1]]", "[1] \"hc\" \"dx06\"")),
        .names = "v1", class = "data.frame",
        row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))