regex - How to remove the [1]s, [[1]]s and double quotes from CSV data in R?

I have a CSV file. It contains the output of previous R operations, so it is filled with index numbers (such as [1] and [[1]]). When I read it into R, it looks like this, for example:

                                                  v1
        1                                    [1] 789
        2                                      [[1]]
        3 [1] "png"        "d115"    "dx06"    "slz"
        4                                    [1] 787
        5                                      [[1]]
        6                  [1] "d010"           "hc"
        7                                    [1] 949
        8                                      [[1]]
        9                            [1] "hc" "dx06"

(I don't know why there is so much wasted space between the line numbers and the output data.)

I need the data above to appear as follows (without the [1], [[1]] or double quotes, with each record's data placed beside its corresponding number):

    789 png,d115,dx06,slz
    787 d010,hc
    949 hc,dx06

(Ideally, 789 and its corresponding data png,d115,dx06,slz should be separated by a tab, and likewise for each row.)

How can I achieve this in R?

We can create a grouping variable ('indx'), then split the 'v1' column by that grouping index after removing the bracketed part at the beginning of each string and the double quotes (") within it. Assuming you need the numeric element in the first column and the non-numeric part in the second, we can use a regex to replace the runs of spaces with commas (as shown in your expected result), and then rbind the list elements.

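    # Grouping index: each group starts at the number row just above a "[[1]]" row
    indx <- cumsum(c(grepl('\\[\\[', df1$v1)[-1], FALSE))
    # Strip the quotes and the leading "[1]"/"[[1]]" prefix, split by group, then build
    # one row per group: the number, plus the remaining words joined by commas
    do.call(rbind, lapply(split(gsub('"|^.*\\]', '', df1$v1), indx),
        function(x) data.frame(ind = x[1],
            val = gsub('\\s+', ',', gsub('^\\s+|\\s+$', '', x[-1][x[-1] != ''])))))
    #   ind               val
    #1  789 png,d115,dx06,slz
    #2  787           d010,hc
    #3  949           hc,dx06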
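To see how the grouping index and the first gsub() call behave, here is a small sketch that prints the intermediate values for the sample data. The vector v is simply df1$v1 rebuilt inline (an illustrative name, not part of the answer) so the block runs on its own:

```r
# Same nine strings as df1$v1 in the data below (v is an illustrative name)
v <- c("[1] 789", "[[1]]",
       "[1] \"png\"        \"d115\"    \"dx06\"    \"slz\"",
       "[1] 787", "[[1]]", "[1] \"d010\"           \"hc\"",
       "[1] 949", "[[1]]", "[1] \"hc\" \"dx06\"")

# grepl() flags the "[[1]]" rows (2, 5, 8). Dropping the first element and
# appending FALSE shifts each flag one row earlier, onto the number row above,
# so cumsum() starts a new group at every number row.
indx <- cumsum(c(grepl('\\[\\[', v)[-1], FALSE))
print(indx)
# [1] 1 1 1 2 2 2 3 3 3

# Remove every double quote plus everything up to the last "]" on each line;
# the "[[1]]" rows become empty strings, the number rows keep a leading space
cleaned <- gsub('"|^.*\\]', '', v)
print(split(cleaned, indx))
```

Because the shift relies on a "[[1]]" row directly following each number row, a record (other than the first) that lacks this marker would be merged into the preceding group.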

data

    df1 <- structure(list(v1 = c("[1] 789", "[[1]]",
        "[1] \"png\"        \"d115\"    \"dx06\"    \"slz\"",
        "[1] 787", "[[1]]", "[1] \"d010\"           \"hc\"", "[1] 949",
        "[[1]]", "[1] \"hc\" \"dx06\"")), .Names = "v1",
        class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6",
        "7", "8", "9"))
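The question also asks for a tab between each number and its values. Below is a minimal sketch of one way to get that. It uses a different grouping rule from the answer above: it assumes every record begins with a line of the exact form "[1] <number>", groups on those lines directly instead of shifting the "[[1]]" flags, and then writes the result with write.table() using a tab separator. The names v, grp, clean and res are illustrative only.

```r
# Same nine strings as df1$v1 above (v is an illustrative name)
v <- c("[1] 789", "[[1]]",
       "[1] \"png\"        \"d115\"    \"dx06\"    \"slz\"",
       "[1] 787", "[[1]]", "[1] \"d010\"           \"hc\"",
       "[1] 949", "[[1]]", "[1] \"hc\" \"dx06\"")

# Assumption: each record starts with a "[1] <number>" line
grp <- cumsum(grepl('^\\[1\\] [0-9]+$', v))

# Drop the double quotes and the leading "[1]" or "[[1]]" marker, then trim spaces
clean <- trimws(gsub('"|^\\[+1\\]+', '', v))

res <- do.call(rbind, lapply(split(clean, grp), function(x) {
  # All words after the number row; empty strings (former "[[1]]" rows) dropped
  tokens <- unlist(strsplit(x[-1], '\\s+'))
  data.frame(ind = x[1], val = paste(tokens[tokens != ''], collapse = ','),
             stringsAsFactors = FALSE)
}))

# Tab between the number and its values, one record per line, no quotes or header
write.table(res, file = "", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

To save the result instead of printing it, pass a path such as file = "result.tsv" (a placeholder name) to write.table().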
