python - Having some issues with re.sub


In my program I'm parsing Japanese definitions, and I need to take a few things out. There are 3 kinds of delimiters whose contents I need to remove: 「text」 (text) 《text》

To take out the things between 「」 I've been doing:

sentence = re.sub('「[^)]*」','', sentence)

The problem is that, for some reason, if there are parentheses within the 「」 it will not replace anything. Also, I've tried using the same code for the other 2 things:

sentence = re.sub('([^)]*)','', sentence)
sentence = re.sub('《[^)]*》','', sentence)

but it doesn't work for some reason. There isn't an error or anything; it just doesn't replace anything.

How can I make this work, or is there a better way of doing this?

Edit:

I'm having a slight problem with part of this, though. Before I replace, I check the length of the match to make sure it's over a certain length.

parse = re.findall(r'「[^」]*」','', match.text)
if len(str(parse)) > 8:
    sentence = re.sub(r'「[^」]*」','', match.text)

This seems to be causing an error now:

Traceback (most recent call last):
  File "c:/users/dominic/pycharmprojects/untitled9/main.py", line 48, in <module>
    parse = re.findall(r'「[^」]*」','', match.text)
  File "c:\python34\lib\re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
  File "c:\python34\lib\re.py", line 275, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

I sort of understand what's causing this, but I don't understand why it's not working after such a slight change. I know the re.sub part is fine; it's the first 2 lines that are causing problems.
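A note on the traceback above: re.findall takes (pattern, string, flags), while re.sub takes (pattern, repl, string). Copying the '' over from the re.sub call pushes match.text into the flags slot, which is why the error is about 'str' & 'int'. A minimal sketch of the length check without the extra argument follows. The helper name strip_corner_quotes and the min_len parameter are made up for illustration, and measuring the combined length of the matched spans (rather than len(str(parse)), which also counts the list's brackets and quotes) is an assumption about what was intended.

```python
import re

# Text enclosed in corner brackets, e.g. 「いぬ」
CORNER = r'「[^」]*」'

def strip_corner_quotes(text, min_len=8):
    """Remove 「...」 spans only when their combined length exceeds min_len.

    Hypothetical helper; min_len=8 mirrors the threshold in the question.
    """
    # findall takes (pattern, string): there is no replacement argument here
    parse = re.findall(CORNER, text)
    if len(''.join(parse)) > min_len:
        # sub takes (pattern, repl, string)
        return re.sub(CORNER, '', text)
    return text

short = strip_corner_quotes('犬「いぬ」')                      # left unchanged
long_ = strip_corner_quotes('猫「これはとても長い引用です」です')  # span removed
```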

You should read a tutorial on regular expressions to understand what your regexps do.

The regexp '「[^)]*」' matches anything between the angles that is not a closing parenthesis, so a ) inside the 「」 stops the match before it reaches the closing 」. You need this:

sentence = re.sub(r'「[^」]*」','', sentence) 
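To see the difference, here is a small sketch that runs both patterns on a made-up sentence (the sample text is only an illustration, not taken from the question's data):

```python
import re

# Made-up sample: the 「...」 span contains parentheses
sentence = '犬「いぬ (名詞)」は動物です'

# Original pattern: [^)]* cannot step over the ")" inside the brackets,
# so the match never reaches 」 and nothing is replaced.
broken = re.sub(r'「[^)]*」', '', sentence)

# Corrected pattern: exclude the closing bracket 」 instead.
fixed = re.sub(r'「[^」]*」', '', sentence)

print(broken)  # 犬「いぬ (名詞)」は動物です
print(fixed)   # 犬は動物です
```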

The second regexp has an additional problem: parentheses have a special meaning (when not inside square brackets), so to match literal parentheses you need to write \( and \). You need this:

'\([^)]*\)' 
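As a rough sketch, the same idea applied to the other two delimiters looks like this. The sample sentence is invented, and the 《》 pattern is written by analogy with the 「」 fix (exclude the closing delimiter inside the character class):

```python
import re

# Made-up sample containing both (…) and 《…》
sentence = '山(やま)《mountain》に登る'

# Escaped parentheses match literal "(" and ")"; inside [...] the ")"
# needs no escaping.
no_parens = re.sub(r'\([^)]*\)', '', sentence)

# For 《》 the excluded character must be the closing 》, not ")".
no_angles = re.sub(r'《[^》]*》', '', no_parens)

print(no_parens)  # 山《mountain》に登る
print(no_angles)  # 山に登る
```

Note that full-width parentheses （ ） are different characters from ASCII ( ), so if the definitions use them they need their own pattern.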

Finally: you should always use raw strings for Python regexps. It doesn't happen to make a difference in this case, but it very often does, and the resulting bugs are maddening to spot. E.g., use:

r'\([^)]*\)' 
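One case where it does make a difference is \b. A small sketch with a made-up string: in a normal string literal '\b' is a single backspace character, while in a raw string it stays a backslash plus b, which the regex engine reads as a word boundary.

```python
import re

text = 'cat concat cat'

# Normal string: '\b' is one backspace character, so the pattern looks
# for literal backspaces around "cat" and finds none.
plain = re.sub('\bcat\b', 'dog', text)

# Raw string: r'\b' reaches the regex engine as a word-boundary assertion.
raw = re.sub(r'\bcat\b', 'dog', text)

print(len('\b'), len(r'\b'))  # 1 2
print(plain)                  # cat concat cat
print(raw)                    # dog concat dog
```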
