Wednesday, July 18, 2007

Jython UnicodeData mirroring



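import unicodedata

# Print the mirrored property (0 or 1) for every code point in the BMP, one per line.
for i in range(65536):
    c = unichr(i)
    print "%i : %i%c" % (i, unicodedata.mirrored(c), unichr(10)) ,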



jabley@miq-jabley ~/work/eclipse/workspaces/personal/jython-trunk/jython
$ diff jython-mirrored.txt python-mirrored.txt
10177c10177
< 10176 : 0
---
> 10176 : 1
10180,10183c10180,10183
< 10179 : 0
< 10180 : 0
< 10181 : 0
< 10182 : 0
---
> 10179 : 1
> 10180 : 1
> 10181 : 1
> 10182 : 1
11779,11782c11779,11782
< 11778 : 0
< 11779 : 0
< 11780 : 0
< 11781 : 0
---
> 11778 : 1
> 11779 : 1
> 11780 : 1
> 11781 : 1
11786,11787c11786,11787
< 11785 : 0
< 11786 : 0
---
> 11785 : 1
> 11786 : 1
11789,11790c11789,11790
< 11788 : 0
< 11789 : 0
---
> 11788 : 1
> 11789 : 1
11805,11806c11805,11806
< 11804 : 0
< 11805 : 0
---
> 11804 : 1
> 11805 : 1
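The decimal values in the diff are easier to look up in the Unicode charts as U+XXXX. A minimal sketch, assuming the fifteen code points are copied by hand from the diff above; `DIFFERING` and `describe` are names made up for this example, and the character names come from whatever UCD version the local Python ships, so they may be missing on older interpreters:

```python
import unicodedata

try:
    unichr            # Python 2 / Jython
except NameError:     # Python 3: chr already covers the full range
    unichr = chr

# Code points reported by the diff (decimal, as printed by the test loop).
DIFFERING = [10176, 10179, 10180, 10181, 10182,
             11778, 11779, 11780, 11781,
             11785, 11786, 11788, 11789, 11804, 11805]

def describe(code_point):
    """Return 'U+XXXX NAME'; falls back to a placeholder if the local UCD has no name."""
    name = unicodedata.name(unichr(code_point), "<unnamed in this UCD>")
    return "U+%04X %s" % (code_point, name)

for cp in DIFFERING:
    print(describe(cp))
```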

I was hoping to use java.lang.Character.isMirrored(char), but the diff above shows where Jython and Python disagree when running my test. Looking in more detail, Java 1.4 supports UCD 3.2, while Java 5 and Java 6 both only support UCD 4.0. Python 2.5.1, on the other hand, ships UCD 4.1.0, so the mismatches are presumably code points whose mirrored property only arrived in 4.1:
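One possible workaround, sketched here as an assumption rather than anything Jython actually does, is to keep delegating to the Java method and patch in only the code points where the diff showed a mismatch. `OVERRIDE_RANGES`, `make_mirrored` and `fake_java_is_mirrored` are hypothetical names; under Jython the stand-in would be replaced by `java.lang.Character.isMirrored`:

```python
try:
    unichr            # Python 2 / Jython
except NameError:     # Python 3: chr already covers the full range
    unichr = chr

# Inclusive ranges where the diff showed 0 from Jython and 1 from CPython 2.5.1.
OVERRIDE_RANGES = [(10176, 10176), (10179, 10182), (11778, 11781),
                   (11785, 11786), (11788, 11789), (11804, 11805)]

def in_overrides(code_point):
    """True if the code point falls in one of the patched ranges."""
    for low, high in OVERRIDE_RANGES:
        if low <= code_point <= high:
            return True
    return False

def make_mirrored(base_is_mirrored):
    """Wrap a boolean 'is mirrored' function so it returns 0/1 like unicodedata.mirrored."""
    def mirrored(ch):
        if in_overrides(ord(ch)) or base_is_mirrored(ch):
            return 1
        return 0
    return mirrored

# Stand-in for java.lang.Character.isMirrored so the sketch runs outside Jython;
# it only knows a handful of ASCII brackets.
def fake_java_is_mirrored(ch):
    return ch in u"()[]{}<>"

mirrored = make_mirrored(fake_java_is_mirrored)
print("%i %i %i" % (mirrored(unichr(10176)), mirrored(u"("), mirrored(u"a")))
```

The override table only covers the BMP code points found by the test loop, so it would need regenerating whenever either side moves to a newer UCD.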

jabley@miq-jabley ~/work/eclipse/workspaces/personal/jython-trunk/jython
$ python
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.unidata_version
'4.1.0'

And I'm getting the sneaking feeling that I've done something like that before, but it's been so long since I did any development in this area that I've forgotten it!
