i have file comma separated values this:
1.42104e+16,220899,1,,e-page remote auto,,,"allied martian banks, p.l.c.",moon,mn,, 1.42105e+16,637039,1,,e-page remote auto,,,bank of jupiter,europa,io,,
i count number of commas excluding in quotation marks such "allied martian banks, p.l.c.".
i know that:
length(i.data_record)-length(replace(i.data_record,',',''))
would return number of commas, count comma in 1st line compared 2nd when, purposes, should counted having same number.
is there quick , simple way of ignoring commas in quotation marks?
i understand create loop , start breaking string in bits, counting them, , whenever find quotation mark ignore commas until find quotation mark, know if there's simpler, more streamlined way of achieving without resorting loops.
many thanks!
eliminate delimited content first, count afterwards:
regexp_count ( regexp_replace ( regexp_replace ( i.data_record , '(^|,)"[^"]*"(,|$)' , '\1\2' ) , '(^|,)"[^"]*"(,|$)' , '\1\2' ) , ',' )
the nesting of regexp_replace
calls unfortunately necessary in order handle consecutive quote-delimited fields correctly: separating comma consumed regexp pattern , willnot taken account subsequent match.
oracle's regexen not support lookahead operator natural way handle situation.
given performance hit of regexp_... calls might better off use
length(i.data_record) - length ( replace ( regexp_replace ( i.data_record, '(^|,)"[^"]*"(,|$)', '\1\2' ),',','' ) )
caveat
this solution not handle dquotes within field values, represented ""
or \"
.
the former case can handled elegantly: instead of interpreting ""
inside quote-delimited field, consider whole field content juxtaposition of 1 or more dquote-delimited strings not contain dquotes. while wouldn't follow route in processing data (all dquotes lost), may employ perspective sake of counting:
regexp_count ( regexp_replace ( regexp_replace ( i.data_record , '(^|,)("[^"]*")+(,|$)' -- changed , '\1\3' -- changed ) , '(^|,)("[^"]*")+(,|$)' -- changed , '\1\3' -- changed ) , ',' )
test cases
-- works select regexp_count ( regexp_replace ( regexp_replace ( '1,"data,and more so","more data,and more so"', '(^|,)"[^"]*"(,|$)', '\1\2' ), '(^|,)"[^"]*"(,|$)', '\1\2' ), ',' ) dual; select regexp_count ( regexp_replace ( regexp_replace ( '1,"data,and more so",2,"more data,and more so"', '(^|,)"[^"]*"(,|$)', '\1\2' ), '(^|,)"[^"]*"(,|$)', '\1\2' ), ',' ) dual; select regexp_count ( regexp_replace ( regexp_replace ( '1,"""data"",and more so",2,"more data,and more so"', '(^|,)("[^"]*")+(,|$)', '\1\3' ), '(^|,)("[^"]*")+(,|$)', '\1\3' ), ',' ) dual; -- fails select regexp_count ( regexp_replace ( '1,"data,and more so","more data,and more so"', '(^|,)"[^"]*"(,|$)', '\1\2' ), ',' ) dual; select regexp_count ( regexp_replace ( '1,"data,and more so",2,"more data,and more so"', '(^|,)"[^"]*"(,|$)', '\1\2' ), ',' ) dual;
Comments
Post a Comment