In Java, I'm looking for a regular expression that accepts Persian (or Arabic) letters but not Persian (or Arabic) numbers. To match only the letters, I found this regular expression:
[\u0600-\u065f\u066a-\u06ef\u06fa-\u06ff]
Although this is correct and works for me, I know I can use \\p{L}+, a regular expression that accepts letters from all the languages in the world, and that in my case (Arabic/Persian) it can be modified to [\\p{InArabic}]+$.
But with [\\p{InArabic}]+$, not only Arabic (Persian) letters are accepted; Arabic numbers such as ۱ and ۲ are accepted too.
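A minimal sketch of the difference described above, assuming whole-string validation with String.matches. Both patterns come from the question; the class name RegexComparison and the sample strings are illustrative only, and the strings are written as \u escapes so the file compiles regardless of the platform's default source encoding.

```java
public class RegexComparison {
    // The question's explicit ranges; the gaps U+0660..U+0669 and U+06F0..U+06F9
    // are the Arabic-Indic and extended Arabic-Indic (Persian) digits.
    static final String EXPLICIT = "[\\u0600-\\u065f\\u066a-\\u06ef\\u06fa-\\u06ff]+";
    // The whole Unicode block. String.matches already anchors both ends, so $ is redundant here.
    static final String BLOCK = "[\\p{InArabic}]+$";

    public static void main(String[] args) {
        String word = "\u0633\u0644\u0627\u0645"; // a Persian word (seen, lam, alef, meem)
        String digits = "\u06f1\u06f2";           // Persian digits one and two
        System.out.println(word.matches(EXPLICIT));   // true
        System.out.println(digits.matches(EXPLICIT)); // false: digits fall in the gaps
        System.out.println(word.matches(BLOCK));      // true
        System.out.println(digits.matches(BLOCK));    // true: the block includes the digits
    }
}
```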
So my question is: how can I modify [\\p{InArabic}]+$ so that it accepts letters but not numbers? In other words, how can I restrict [\\p{InArabic}]+$ so that it does not accept numbers?
Please note that Persian (Arabic) numbers look like this: ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۰
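For reference, the digits listed above are the Unicode characters U+06F1 to U+06F9 followed by U+06F0 (EXTENDED ARABIC-INDIC DIGIT ONE through NINE, then ZERO), all in the decimal-digit category Nd. The following sketch (class name hypothetical) prints each one's code point, whether Character.isDigit reports it as a digit, and its numeric value:

```java
public class PersianDigitCodePoints {
    // The digits listed in the question, in the same order (one..nine, then zero),
    // written as escapes so the source file stays ASCII-only.
    static final String LISTED =
            "\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9\u06f0";

    public static void main(String[] args) {
        for (char ch : LISTED.toCharArray()) {
            // Code point, whether Java treats it as a digit, and its numeric value.
            System.out.printf("U+%04X isDigit=%b value=%d%n",
                    (int) ch, Character.isDigit(ch), Character.getNumericValue(ch));
        }
    }
}
```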
You can use the following regex:
"[\\p{InArabic}&&\\PN]"
\p{InArabic}: matches any character in the Unicode block Arabic (from U+0600 to U+06FF).
\PN: matches any character that does not belong to any of the Number categories (note the capital P).
Intersecting the two sets gives the desired result: both digit ranges, U+0660 to U+0669 and U+06F0 to U+06F9, are excluded.
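The class above matches a single character; to validate a whole word, a + can be added after the intersected class. A minimal sketch, with a hypothetical class name ArabicNoDigits and helper isValid:

```java
import java.util.regex.Pattern;

public class ArabicNoDigits {
    // Any character of the Arabic block (U+0600..U+06FF) that is not in a
    // Unicode Number category; the trailing + requires one or more of them.
    private static final Pattern ARABIC_NO_DIGITS = Pattern.compile("[\\p{InArabic}&&\\PN]+");

    // True only if the whole string consists of such characters.
    static boolean isValid(String s) {
        return ARABIC_NO_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("\u0633\u0644\u0627\u0645")); // Persian word: true
        System.out.println(isValid("\u06f1\u06f2"));             // Persian digits: false
        System.out.println(isValid("\u0661\u0662"));             // Arabic-Indic digits: false
        System.out.println(isValid("\u0633\u06f1"));             // letter followed by digit: false
    }
}
```

Because \p{InArabic} covers the whole block, this pattern still accepts non-letter characters from the block, such as the Arabic comma (U+060C) and combining marks; it only removes the Number categories.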
Testing code:
// Print each code point of the Arabic block (hex) and whether it matches the class
for (int i = 0x600; i <= 0x6ff; i++) {
    String c = "" + (char) i;
    System.out.println(Integer.toString(i, 16) + " " + c.matches("[\\p{InArabic}&&\\PN]"));
}
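A variation on the testing code that prints only the rejected code points, as a sketch for checking the claim that exactly the two digit ranges are excluded (class and method names are hypothetical). With the Unicode data I am aware of, it should list U+0660 to U+0669 and U+06F0 to U+06F9, twenty code points in all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ExcludedCodePoints {
    // The single-character class from the answer.
    private static final Pattern CLASS = Pattern.compile("[\\p{InArabic}&&\\PN]");

    // Returns every code point in U+0600..U+06FF that the class does NOT match.
    static List<Integer> rejected() {
        List<Integer> out = new ArrayList<>();
        for (int cp = 0x600; cp <= 0x6ff; cp++) {
            String s = new String(Character.toChars(cp));
            if (!CLASS.matcher(s).matches()) {
                out.add(cp);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int cp : rejected()) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```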