Python - Parsing tokens with PLY


I've been trying to parse a given text with PLY for a while, but I haven't been able to figure it out. I have these tokens defined:

tokens = ['id', 'int', 'assignment']  

and I want to classify the words I find into these tokens. For example, if the scanner is given:

var = 5 

It should print this:

id : 'var'
assignment : '='
int : 5

This works fine. The problem is when the program is given the following text:

9var = 5

The output will be:

int : 9
id : 'var'
assignment : '='
int : 5

This is wrong. It should take 9var as an id, yet according to the id regex it is not a valid name for an id. These are my regular expressions:

def t_id(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    return t

def t_int(t):
    r'\d+'
    t.value = int(t.value)
    return t

t_assignment = r'\='
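To see why 9var comes out as two tokens, here is a minimal stand-in for the lexer in the question, written with only the standard `re` module so it runs without PLY installed. It assumes PLY's documented strategy for function rules: they are combined into one master regex in definition order, and at each position the first alternative that matches wins (not the longest one). The names `RULES`, `MASTER` and `tokenize` are hypothetical and not part of PLY.

```python
import re

# Hypothetical stand-in for the PLY rules, listed in definition order.
RULES = [
    ("id", r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("int", r"\d+"),
    ("assignment", r"="),
]

# One master regex of named groups; Python's alternation takes the first
# alternative that matches, which mirrors how PLY tries its rules in order.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in RULES))


def tokenize(text):
    """Return a list of (type, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        value = int(m.group()) if kind == "int" else m.group()
        tokens.append((kind, value))
        pos = m.end()
    return tokens


print(tokenize("var = 5"))   # [('id', 'var'), ('assignment', '='), ('int', 5)]
print(tokenize("9var = 5"))  # [('int', 9), ('id', 'var'), ('assignment', '='), ('int', 5)]
```

At position 0 of "9var" the id pattern cannot start with a digit, so int claims "9"; scanning then resumes at "var", which id accepts. Nothing in the rules ever looks at 9var as a whole.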

How can I fix this?

Your help is appreciated!

You say: "It should take 9var as an id." But then you point out that 9var doesn't match your id regex pattern. So why should 9var be scanned as an id?

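If you want 9var to be an id, it's easy enough to change the regex from [a-zA-Z_][a-zA-Z_0-9]* to [a-zA-Z_0-9]+. (That will also match pure integers, so you'd need to make sure the int pattern wins for those. Be aware that PLY tries rules in order and takes the first match, not the longest, so simply putting int first would still split 9var into 9 and var. The alternative [a-zA-Z_0-9]*[a-zA-Z_][a-zA-Z_0-9]* avoids this because it requires at least one letter or underscore, so it can safely be tried before int.)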
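The effect of rule order on the two widened id patterns can be checked with a small first-match tokenizer built on the standard `re` module. It is a sketch that assumes PLY's behavior of trying rules in order and keeping the first alternative that matches; `make_tokenizer`, `int_first` and `id_first` are hypothetical names, not PLY APIs.

```python
import re


def make_tokenizer(rules):
    """Build a first-match tokenizer from (name, regex) pairs tried in order.

    Hypothetical helper that mimics PLY's ordered master regex.
    """
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))

    def tokenize(text):
        tokens, pos = [], 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            m = master.match(text, pos)
            if m is None:
                raise SyntaxError(f"illegal character {text[pos]!r} at {pos}")
            kind = m.lastgroup
            tokens.append((kind, int(m.group()) if kind == "int" else m.group()))
            pos = m.end()
        return tokens

    return tokenize


# Option 1: int first, then the widened [a-zA-Z_0-9]+ id pattern.
# First match wins, so int still claims the "9" and 9var is split again.
int_first = make_tokenizer([
    ("int", r"\d+"),
    ("id", r"[a-zA-Z_0-9]+"),
    ("assignment", r"="),
])

# Option 2: id requires at least one letter or underscore, so it can be
# tried before int without swallowing plain integers such as 5.
id_first = make_tokenizer([
    ("id", r"[a-zA-Z_0-9]*[a-zA-Z_][a-zA-Z_0-9]*"),
    ("int", r"\d+"),
    ("assignment", r"="),
])

print(int_first("9var = 5"))  # [('int', 9), ('id', 'var'), ('assignment', '='), ('int', 5)]
print(id_first("9var = 5"))   # [('id', '9var'), ('assignment', '='), ('int', 5)]
```

Putting the plain [a-zA-Z_0-9]+ pattern first instead would make 9var a single id, but it would also turn 5 into an id, which is why the pattern that demands a non-digit is the safer choice under first-match ordering.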

I suspect you want 9var to be recognized as a lexical error rather than a parsing error. But if it is going to be recognized as an error in any case, does it really matter whether it is a lexical error or a syntax error?

It's worth mentioning that Python's own lexer has traditionally worked the same way yours does: it scans 9var as two tokens and leaves it to the parser to report a syntax error.

Of course, it's possible that your language has some syntactically correct construction in which an id can directly follow an int. Or, if not, a keyword may be allowed to directly follow an int, as in the Python expression 3 if x else 2. (Again, Python has historically accepted 3if x else 2 without complaint.)

So if you really insist on flagging a scanner error for tokens that start with a digit and continue with non-digits, you can add a pattern such as [0-9]+[a-zA-Z_][a-zA-Z_0-9]*, make sure it is tried before the int rule, and have its action raise an error.
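The error-rule approach can be sketched with a standard-library stand-in that, like PLY, tries rules in order and keeps the first match. A hypothetical `badid` rule for digit-led names is listed ahead of `int`, so it sees 9var before `\d+` can claim the 9. In PLY itself this would presumably be a function rule defined above t_int whose body raises; the `badid` name and the `LexError` class are illustrative assumptions, and you should check the PLY documentation on whether such a rule's name must also appear in tokens.

```python
import re

# Hypothetical rule set: "badid" is listed first so it is tried before int.
RULES = [
    ("badid", r"[0-9]+[a-zA-Z_][a-zA-Z_0-9]*"),
    ("id", r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("int", r"\d+"),
    ("assignment", r"="),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in RULES))


class LexError(Exception):
    """Raised when the scanner meets a malformed token."""


def tokenize(text):
    """Return (type, value) pairs, raising LexError on digit-led names."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise LexError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "badid":
            # The rule's "action": report an error instead of emitting a token.
            raise LexError(f"invalid identifier {lexeme!r} at position {pos}")
        tokens.append((kind, int(lexeme) if kind == "int" else lexeme))
        pos = m.end()
    return tokens


print(tokenize("var = 5"))  # [('id', 'var'), ('assignment', '='), ('int', 5)]
try:
    tokenize("9var = 5")
except LexError as err:
    print(err)  # invalid identifier '9var' at position 0
```

Plain integers are unaffected, because the badid pattern needs a letter or underscore after the digits and so fails on input like 5, letting int match instead.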

