In my program I'm parsing Japanese definitions, and I need to take a few things out. There are 3 kinds of brackets I need to take the text out from between:
「text」
(text)
《text》
To take out the things between 「」, I've been doing:
sentence = re.sub('「[^)]*」','', sentence)
The problem is that, for some reason, if there are parentheses within the 「」, it will not replace anything. Also, I've tried using the same code for the other 2 things:
sentence = re.sub('([^)]*)','', sentence)
sentence = re.sub('《[^)]*》','', sentence)
but it doesn't work for some reason. There isn't an error or anything, it just doesn't replace anything.
How can I make this work, or is there a better way of doing this?
Edit:
I'm having a slight problem with another part of it though. Before I replace, I check the length to make sure it's over a certain length.
parse = re.findall(r'「[^」]*」','', match.text)
if len(str(parse)) > 8:
    sentence = re.sub(r'「[^」]*」','', match.text)
This seems to be causing an error now:
Traceback (most recent call last):
  File "c:/users/dominic/pycharmprojects/untitled9/main.py", line 48, in <module>
    parse = re.findall(r'「[^」]*」','', match.text)
  File "c:\python34\lib\re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
  File "c:\python34\lib\re.py", line 275, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'
I sort of understand what's causing this, but I don't understand why it's not working with that slight change. I know the re.sub part is fine; it's just the first 2 lines that are causing problems.
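You should read a tutorial on regular expressions to understand what your regexps do.
The regexp '「[^)]*」' matches a 「, then any run of characters that are not a closing parenthesis, then a 」. So it fails as soon as there is a ) between the brackets. You need this: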
sentence = re.sub(r'「[^」]*」','', sentence)
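To see the difference, here is a minimal sketch comparing the two patterns. The sample sentence is made up for illustration and assumes ASCII parentheses inside the 「」:

```python
import re

# Made-up sample: a 「」 span that contains ASCII parentheses.
sentence = "犬「いぬ(名詞)」は動物"

# Original pattern: [^)]* cannot cross the ")", so no match is found
# and the string comes back unchanged.
old = re.sub('「[^)]*」', '', sentence)

# Corrected pattern: exclude the closing bracket 」 instead of ")".
new = re.sub(r'「[^」]*」', '', sentence)

print(old)
print(new)
```

The negated class should exclude whatever closes the span you are matching, which is why the same idea gives [^》] for the 《》 case.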
The second regexp has an additional problem: parentheses have a special meaning (when not inside square brackets), so to match literal parentheses you need to write \( and \). You need this:
'\([^)]*\)'
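As a quick check, the sketch below shows that the unescaped version is compiled as a capturing group, while the escaped version matches literal parentheses. The sample text is made up and assumes ASCII parentheses:

```python
import re

# Unescaped: the parentheses create one capturing group.
unescaped = re.compile('([^)]*)')

# Escaped: the parentheses are literal characters, so there are no groups.
escaped = re.compile(r'\([^)]*\)')

# Made-up sample with a reading in ASCII parentheses.
sentence = "食べる(たべる)こと"
cleaned = escaped.sub('', sentence)

print(unescaped.groups, escaped.groups)
print(cleaned)
```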
Finally: you should always use raw strings for Python regexps. It doesn't happen to make a difference in this case, but it very often does, and the bugs are maddening to spot. E.g., use:
r'\([^)]*\)'
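As for the TypeError in the edit: the signature is re.findall(pattern, string, flags=0), so in re.findall(r'「[^」]*」','', match.text) the empty string is taken as the text to search and match.text is passed as flags. Here is a rough sketch that puts the three patterns together with a length check. The names BRACKETED and strip_long_brackets are hypothetical, and I'm assuming the threshold of 8 is meant to apply to each bracketed span (brackets included), which may not be what you intended:

```python
import re

# Hypothetical combined pattern: one alternative per bracket type,
# each excluding its own closing character.
BRACKETED = re.compile(r'「[^」]*」|\([^)]*\)|《[^》]*》')

def strip_long_brackets(text, min_len=8):
    """Remove bracketed spans longer than min_len characters (assumed rule)."""
    def replace(m):
        # Keep short spans as they are, drop the long ones.
        return '' if len(m.group(0)) > min_len else m.group(0)
    return BRACKETED.sub(replace, text)

# findall takes (pattern, string); there is no replacement argument.
found = re.findall(r'「[^」]*」', "語「あ」と「い」")

result = strip_long_brackets("語「あいうえおかきく」と「い」")
print(found)
print(result)
```

Passing a function as the replacement lets you decide per match, which avoids calling len() on str(parse), the string form of the whole list.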