Wednesday, July 18, 2007

Jython - UnicodeData mirrored is complete!


from test_support import verify, verbose
import sha
import unicodedata

encoding = 'utf-8'

def test_mirrored():
    # Hash the mirrored() flag of every BMP code point so the whole
    # table can be checked against a single known digest.
    h = sha.sha()

    for i in range(65536):
        c = unichr(i)
        h.update(str(unicodedata.mirrored(c)))
        print "%i : %i%c" % (i, unicodedata.mirrored(c), unichr(10)),

    # Value returned by Python 2.5, which uses Unicode 4.1
    #verify('91cd30c6c81911835dbcbed083f99fc9fc073e4a' == h.hexdigest(),
    #       h.hexdigest())

    # Value returned by current Jython implementation, which uses Unicode 5.0
    verify('595795a212ca0ac629d6b2dfb09c703a472adb03' == h.hexdigest(),
           h.hexdigest())

# Add next test!

if __name__ == '__main__':
    test_mirrored()
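
For anyone reading along on a newer Python, where the sha module and unichr no longer exist, the same fingerprint idea can be sketched with hashlib. This is only a sketch, not the test from the Jython tree: the name mirrored_digest and its limit parameter are made up for illustration, and the digest depends on the Unicode version the interpreter was built with, so no expected value is shown.

```python
import hashlib
import unicodedata

def mirrored_digest(limit=0x10000):
    """Return a SHA-1 hex digest over mirrored() for code points below limit.

    Hypothetical helper: the name and the limit parameter are assumptions,
    not part of Jython or CPython.
    """
    h = hashlib.sha1()
    for i in range(limit):
        # mirrored() returns 1 for mirrored characters such as '(' and 0 otherwise.
        h.update(str(unicodedata.mirrored(chr(i))).encode('ascii'))
    return h.hexdigest()

if __name__ == '__main__':
    # Print the Unicode version next to the digest, since the two go together.
    print(unicodedata.unidata_version, mirrored_digest())
```

Passing limit=0x110000 would cover all seventeen planes rather than just the BMP.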


OK, it's only for the BMP, but it's a good start. Supporting supplementary characters (in Java terminology), or the other sixteen planes, would need a more fundamental change to PyUnicode, methinks. Now I need to start adding the other unicodedata methods, which should be fairly straightforward. Then I'll have a working implementation to post to the dev list. Maybe by the end of this month, unless Baby comes and I lose my late-night hacking time?
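
To see why the other planes are harder, recall that a Java char is a 16-bit UTF-16 code unit, so any code point above U+FFFF has to be stored as a pair of surrogates. A rough sketch of that arithmetic follows; to_surrogate_pair is a hypothetical helper for illustration, not something in Jython or PyUnicode.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into UTF-16 high and low surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000          # 20 bits left to encode
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

if __name__ == '__main__':
    high, low = to_surrogate_pair(0x1D11E)  # MUSICAL SYMBOL G CLEF
    print("U+1D11E -> %04X %04X" % (high, low))  # prints "U+1D11E -> D834 DD1E"
```

Any code that indexes a string one char at a time sees two units there instead of one character, which is roughly the kind of assumption that would have to change.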
