I haven't made as much progress with this as I would like, because I've run into some trouble: not of a technical nature, but of a legal one. My employment contract is apparently fairly standard and says that my employer owns all of my thoughts (even the ones I'm having while writing this!). As what I thought was a courtesy, I asked my manager whether it was OK to contribute to open source projects, so that there would be no grey areas about who owned what code. Well, I'm still waiting for an answer and a bit of paper from my employer. Until I get that, I'm holding off writing any real code.
What I do have is a bit of sketching around the general area. First off, I wasn't sure about some aspects of the CPython implementation, so I asked. My question got bounced by the python-dev moderator with the advice that I should post it to python-list. I did, but as I expected, it was really a question for python-dev, and the only (private) response I've had is from Martin V. Loewis, who made the last change to that part of CPython. Maybe people are busy with PyCon. I was pointed at the C implementation, which doesn't generate that part from the UnicodeData.txt file, but is instead a horrible case statement. I think I need to raise a bug.
The other thing I've done is write some Learning Tests for java.lang.Character, to see what it offers me. Obviously, it is attractive: it's a core library, it is well tested, debugged, and used by millions of people, and it has all the other advantages Josh Bloch enumerates in Effective Java. It seems to have a few little idiosyncrasies, which I have captured in my tests. (Note to self: I haven't seen any JUnit (or TestNG - Cedric!) tests in the Jython source tree. Must ask the dev list about that.) Then maybe I should check whether JRuby needs this sort of thing, and make it reusable, with a Jython wrapper for the API that it needs, etc.
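To give a flavour of what such a Learning Test can look like, here is a minimal sketch in plain Java. It is not the actual test code: the class name CharacterLearningSketch and the check helper are made up for illustration, and the particular behaviours probed are just a guess at the sort of thing worth pinning down. It avoids JUnit so that it runs on its own.

```java
// Hypothetical learning tests for java.lang.Character (illustrative only).
final class CharacterLearningSketch {

    // Fails loudly if an expectation about the library does not hold.
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError(description);
        }
    }

    static void run() {
        // Latin letters report a numeric value: a..z map to 10..35.
        check(Character.getNumericValue('a') == 10, "letter a has value 10");

        // A fraction (VULGAR FRACTION ONE HALF) cannot be a nonnegative int, so -2.
        check(Character.getNumericValue('\u00BD') == -2, "one half gives -2");

        // A character with no numeric value at all gives -1.
        check(Character.getNumericValue('!') == -1, "punctuation gives -1");

        // Decimal digits outside ASCII count as digits (ARABIC-INDIC DIGIT THREE).
        check(Character.isDigit('\u0663'), "Arabic-Indic three is a digit");
        check(Character.getNumericValue('\u0663') == 3, "and its value is 3");

        // The general category comes back as an int constant, not a name.
        check(Character.getType('A') == Character.UPPERCASE_LETTER, "A is Lu");

        // Supplementary characters need the int overloads and two chars.
        int deseret = 0x10400; // DESERET CAPITAL LETTER LONG I
        check(Character.isLetter(deseret), "supplementary letter is a letter");
        check(Character.charCount(deseret) == 2, "it takes a surrogate pair");
    }

    public static void main(String[] args) {
        run();
        System.out.println("all checks passed");
    }
}
```

Each check records one fact about the library, so if a later JDK changes its behaviour, the failing description says which assumption broke.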