Tuesday, December 29, 2009

Objective-C for Java Developers

I'm coming from an Eclipse on Ubuntu background, but this is equally applicable for IDEA on Windows. What are the equivalents for iPhone development?

Java | iPhone | Notes
JUnit (unit testing framework) | ? | It is possible to use TDD for Swing apps, although I've been predominantly a server-side guy with client stuff happening in the browser for quite a while now. Cucumber with iPhone looks worth exploring...
Hudson (continuous integration tool) | ? | On my first iPhone app, it rapidly became apparent how easy it was for people to do bad merges and delete classes from the Xcode project file / strings from the UTF-16 l10n Localizable.strings file. You can argue that people should take more care; yeah, that'll fix it. git bisect is great, but a tool that builds on each commit is better.

Saturday, December 26, 2009

ScaleCamp - Queue PUBSUB

For some reason I went to this thinking PubSubHubbub, but it was more a refresher about making an app asynchronous, why you'd want to do that and how implementation complexity goes up as you go after certain desirable behaviours.

ScaleCamp - How do you scale Activity Feeds?

Popular session this - standing room only, so no notes from me.

The short answer is Redis, courtesy of Simon Willison. Since the consensus was that Redis would do the trick, we then touched on Simon's other new favourite technology - node.js.

Digression: Alex from mediamolecule made a comment that 100MB of data in a key-data structure store should only require 100MB of memory to store in such a server app (plus a little extra for housekeeping, but it shouldn't be a 1:10 ratio or similar). I didn't take that as a direct criticism of Redis but more of a reminder about choosing good data structures and the importance of CompSci (says this mathematician). I mention that, since it prompted me to investigate a suspected bad data structure in one of our apps, and coupled with the Eclipse Memory Analyser recommended by the Guardian guys, I found something that was using far too much of the heap on our Tomcat nodes, and had a change rolled out within 4 days of attending this conference. That reduced the memory footprint for that data structure from 250MB to 16MB. Ouch, shocker, but great to have found, prioritised and fixed.

Monday, December 07, 2009

ScaleCamp - Scaling Java and Oracle for the Guardian

Guardian.co.uk

Graham and various other people from the development and operations team pitching in.

3 years ago - published static files with apache SSI to fill-in gaps. Moved to a fully dynamic system. Now, they're somewhere in between.

Stack -
  • apache

  • resin

  • spring / hibernate / velocity

  • Oracle DB backend (not recommended!)


Measured the application - 1300 requests to DB just to render homepage.

Added ehcache to hibernate as 2nd level cache and added a warmup script before putting into load balancer
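
A warmup script of that sort can be very small. This is only my sketch of the idea, not the Guardian's script: the function name, the host and the list of paths are all made up, and the point is simply to request the expensive pages so the caches are populated before the node takes real traffic.

```python
from urllib.request import urlopen


def warm_up(base_url, paths, fetch=urlopen):
    """Request each path once so caches fill before the node joins the load balancer.

    Hypothetical helper: returns a list of (path, error) for requests that failed,
    so the caller can decide whether the node is fit to go live.
    """
    failures = []
    for path in paths:
        try:
            with fetch(base_url + path) as response:
                response.read()  # drain the body so the page is fully rendered
        except OSError as exc:   # URLError and HTTPError are OSError subclasses
            failures.append((path, exc))
    return failures
```

Usage would be something like warm_up("http://app01.example", ["/", "/uk", "/sport"]) and only adding the node to the pool if the returned list is empty.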

30m unique users per month
270m pages per month
250 requests/second at lunchtime
1500 requests/second peak.

GC tools



Google weakref cache (part of Google Collections)

Eclipse memory analyser - what's using all my memory?

Cacti for monitoring - DB usage was killing it.

8 app servers in each co-lo (London and Manchester).

400MB used by cache - churn meant it was pretty ineffective.

Tried or considered ehcache distribution and jboss cache distribution.

Rejected since cache eviction via replication would have thrashed it.

memcached



massive improvement in response times, but DB load still high.

went to caching every query for 5 minutes. DB load vanished and is flat even as more app servers come on-line.
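
The shape of "cache every query for N minutes" is roughly the following. It's a sketch under my own assumptions: a plain in-process dict stands in for memcached, and TTLCache / get_or_load are names I've invented for illustration.

```python
import time


class TTLCache:
    """Keep each value for a fixed number of seconds (dict stands in for memcached)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no database call
        value = loader(key)      # miss or expired: go to the database once
        self.store[key] = (now + self.ttl, value)
        return value
```

With a shared cache, each distinct query reaches the database at most about once per TTL however many app servers are asking, which would fit the flat DB load they described.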

servlet filter writing to memcached made it stinking fast.

took a day's worth of logs and used Hadoop to see how long the cache should be. 1 minute was the sweet spot.
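
I don't know exactly what their Hadoop job computed, but the underlying question can be sketched in a single process: replay the log against a pretend cache and see what hit ratio each candidate TTL would have given. The function name and the (timestamp, url) input format are my assumptions.

```python
def hit_ratio(requests, ttl_seconds):
    """Replay (timestamp_seconds, url) pairs, sorted by time, against a TTL cache.

    Returns the fraction of requests that would have been served from cache.
    """
    expires = {}  # url -> time at which the cached copy goes stale
    hits = total = 0
    for timestamp, url in requests:
        total += 1
        if expires.get(url, float("-inf")) > timestamp:
            hits += 1
        else:
            expires[url] = timestamp + ttl_seconds  # miss: re-render and re-cache
    return hits / total if total else 0.0
```

Running it for a range of TTLs over the same log shows where a longer TTL stops buying a meaningfully better hit ratio, which is the trade-off against serving stale pages.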

Emergency switch to serve a static copy of the site, minus personalization features.

Daemon or script scrapes the site; they can handle 700req/s/node when the site's operating in this mode.

new content published in this mode has a copy pressed so it can be served from disk - publish is slower than with the other system but updates still possible

Highly recommend that this sort of emergency degrade read-only mode should be built-in from the off - they've used this approach with the MPs Expenses apps built to crowd-source investigations.

ScaleCamp - Scaling Java with Shared Nothing

Thoughtworks guys again.

Basic Servlet overview - in-memory sessions don't scale, duh!

Preferred option - state is serialized and goes into cookies. Security, legal aspects? Pretty well common to most frameworks these days.
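
On the security question, the usual minimum is to sign the cookie so the client can't tamper with it. The following is my own sketch rather than anything shown in the talk; SECRET, encode_state and decode_state are invented names. Note that signing only gives integrity: the user can still read the state, so anything confidential needs encrypting or keeping server-side.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical key, shared by all app servers


def encode_state(state, secret=SECRET):
    """Serialize state to JSON, base64 it, and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode("ascii")
    return (payload + b"." + signature).decode("ascii")


def decode_state(cookie, secret=SECRET):
    """Verify the signature and return the state; raise ValueError if tampered with."""
    payload, _, signature = cookie.encode("ascii").rpartition(b".")
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(signature, expected):
        raise ValueError("cookie signature mismatch")
    return json.loads(base64.urlsafe_b64decode(payload))
```

Any app server holding the key can decode the cookie, so no node needs to remember the session.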

Page composition in the server with proxy server holding StringTemplate objects. Interesting idea - seen variants of this in other places. I'm curious as to whether doing this could mean having a poor man's macro system for Java, since XSLTs can be written to create XSLTs; maybe Velocity templates could similarly generate Velocity templates or StringTemplate -> StringTemplate?

Again, application developers need to have a good idea of caching directives for this to work. One objection I had with this approach is that you potentially increase your hardware requirement and can open the app up to liveness failures here. Request A comes in and is serviced by Thread 1. As part of that, it makes a request to the proxy server for a template. At the proxy server, a cache miss means that another request needs to be made to the app server. Then Request A is tying up 2 app server threads. What about applications which parallelize the requests? They might use more than 2 app server request-handling threads at a time, etc.
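
To put rough numbers on that objection, here's a back-of-envelope estimate. It's my own simplification (every in-flight page request holds one thread, plus one more per template sub-request that misses at the proxy), and the function and parameter names are made up.

```python
def app_threads_in_use(concurrent_requests, template_miss_rate, subrequests_per_page=1):
    """Estimate app server threads tied up when proxy cache misses loop back to the app."""
    return concurrent_requests * (1 + template_miss_rate * subrequests_per_page)
```

With 100 concurrent requests and a cold proxy cache (miss rate 1.0) that's 200 threads for the same traffic. If the pool is smaller than that, outer requests can end up holding every thread while waiting for inner requests that can't get one, which is the liveness failure I was worried about.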

Thoughtworks seem to do fun, interesting work.

ScaleCamp - LittleBigPlanet

Alex and James from mediamolecule.com. Fascinating perspective of embedded developers coming to server-programming and refusing to accept commonly held views on best practices for doing so. This was the surprise hit of the conference for me; I just elected to go since there wasn't anything else in that slot that I was really passionate about. I'd been talking to them both in the queue for tea earlier and made a poorly judged joke about Map-Reduce (we pretty much had this conversation). The session they ran was an awesome talk about scaling server-side within the games sector - Little Big Planet is theirs.

They've written their own C-based key-data structure store, a category in which we're spoilt for choice just now. Alex commented that he's looked at Redis and it has some nice stuff, but when they came to need it, there wasn't anything that met their needs, and experience with running the recommended Java stack had left them with the impression that they should stick to what they know. What they know is writing very tight code in constrained environments, so applying that mind-set to server-side development seemed to have yielded some very pleasing numbers. Other parts are in Ruby (presumably 1.9, since they're using Fibers?). I didn't get around to asking James how well that works or which implementation they're using. Very happy with that programming model though - James is or was a Java guy - funny how nice Ruby feels coming from there!

ScaleCamp - Varnish

Artur Bergman talking about Varnish; this followed on from the Squid talk and most of the same crowd hung around for this one.

purges gone from multicast to RabbitMQ (damn, can't remember if I got that right or what it means!)
2 8-core servers in the London data centre: 350MB/s with 5000 requests/sec. Intel X25 SSDs have changed a certain class of application. If disk is the new tape, then it's probably still acceptable to go to disk if you're running those babies. See also Last.fm's experience with them.

Attempt cache hits in all data centres (UK -> US) before going to the app. Much better performance.

CDN gets broken with query string parameters - common misconfiguration which can be defended against.
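
I didn't note down which misconfiguration was meant, so this is a guess at one common variant: every distinct query string is treated as a distinct object, so reordered or irrelevant parameters wreck the hit rate. One defence is to normalise the URL before using it as the cache key. The ignored-parameter list and the function name below are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}  # hypothetical list


def cache_key(url):
    """Drop parameters that don't affect the response and sort the rest."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in IGNORED_PARAMS
    ]
    params.sort()
    # Fragment is dropped too: it never reaches the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))
```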

Varnish protects against thundering herd. Interesting - need to read more about that to better understand it.
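
The general technique is request coalescing: when many clients ask for the same missing object at once, one request goes to the backend and the rest wait for its answer. The sketch below shows that idea with threads; it is not Varnish's implementation, and SingleFlight and its method names are my own.

```python
import threading


class SingleFlight:
    """Let only one caller per key run the loader; concurrent callers share its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> state of the call currently in progress

    def fetch(self, key, loader):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = call
        if leader:
            try:
                call["value"] = loader(key)   # the single trip to the backend
            except Exception as exc:
                call["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call["done"].set()            # wake everyone who piled up behind us
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["value"]
```

This only coalesces requests that overlap in time; it doesn't cache anything once the call finishes, so in practice it sits in front of a cache rather than replacing one.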

ScaleCamp - scaling with Squid

Summarising a recent Thoughtworks experience with this, from Chris Read and Sam Newman.

This was for a high volume retailer. A Proxy / caching solution was supposed to be provided and TW would do the app. At t-4, it turned out that TW would also have to provide the proxy / cache, so this was a hasty investigation into Squid.

1 hardware LB
2 Squid boxes - 8 core machine

1 carp process
lots of child processes

going to 16 app servers.

16,000 requests per second = 5% of traffic

TTL for items ranged from 5 minutes to 1 hour for stable furniture.

whole site does 250 million requests per day

peak 24,000 requests per second

importance of good HTTP Caching directives. Discussion of making the application (and by implication, the application developers) aware of considering ETag / Expires / Vary for all parts of the application, versus just adding it via apache ReWrite or similar. Most people in the room (including me, very strongly; RFC2616 is my favourite RFC) were in the former camp.
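
For what "the application being aware" might look like, here's a framework-free sketch of a handler that sets its own validators and answers conditional requests. The function name, the 300 second lifetime and the choice of Cache-Control max-age in place of Expires are my assumptions, not anything from the talk.

```python
import hashlib


def respond(body, request_headers, max_age=300):
    """Return (status, headers, body), answering If-None-Match with a 304 when it matches."""
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Cache-Control": "public, max-age=%d" % max_age,
        "Vary": "Accept-Encoding",
    }
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # client or proxy copy is still good
    return 200, headers, body
```

The point is that only the code producing the page knows how long it can safely live and what it varies on; a rewrite rule bolted on afterwards has to guess.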

heap LFUDA was a good change to make.

Tried Varnish, but couldn't get good numbers out of it, in the timescales available. Artur opined that Varnish should provide better numbers than Squid; he's arguably conflicted, but seemed pretty convincing! Another factor in that was very likely that they were running RHEL old shit, and Varnish works best with a shiny new kernel (cite?)

Update: they also presented a version of this talk at DevOps 2010.

ScaleCamp - State of the Nation for Monitoring

Where are we now and what's broken with it?


People are almost getting to the point of needing more powerful machines to do the monitoring than the app servers! Maybe something's broken somewhere...


RRDTool - overall, the consensus seemed to be that this was a little dated.



  1. Does lots of writes due to the way it stores data

  2. throws away data by the way it aggregates - to see fine-grained data of last year's sales, you need to keep a backup of the files / graphs, rather than being able to query it.

  3. can't be cleansed of bad data, or it's a bitch of a job to do so.
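
The second complaint is easy to show with a toy example. RRDTool folds older samples into coarser buckets using a consolidation function such as AVERAGE; the sketch below is just that idea in a few lines, not RRDTool's code, and the function name is mine.

```python
def consolidate(samples, bucket_size):
    """Average fixed-size runs of samples, as older data is folded into coarser buckets."""
    averages = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        averages.append(sum(bucket) / len(bucket))
    return averages
```

Feed it [10, 10, 10, 100, 10, 10] with a bucket size of 3 and the spike of 100 comes out as an average of 40; once the fine-grained samples have been overwritten there's no way to get the original peak back.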


Alternative options



  • hbase

  • reconnoiter

  • Tokyo Tyrant / Cabinet

  • timesplicedb looks to be an interesting attempt to provide a replacement. More language bindings needed, don't be shy!

A good start to the conference for me and it gave me a flavour of the depth and breadth of discussions available.

ScaleCamp UK 2009

On Friday I was fortunate enough to attend the inaugural ScaleCamp UK event, organised by Michael Brunton-Spall at the Guardian. This was a great conference. It was a BarCamp-style approach (not that I've been to BarCamp yet!) with the schedule evolving over conversations and planned on a board in the morning. Some of the sessions I took notes at; others were standing room only, so I'll try to remember what was talked about. Obviously, this is a personal perspective focused on my interests; others should be blogging about Javascript and the like.

I met lots of very passionate, smart people doing cool stuff. That bodes well for the economy; if you want to do interesting work, then hooking up with any of the people that attended there wouldn't be a bad place to start.

Thursday, December 03, 2009

iMac - Guide for Linux Users

Got a MacBook Pro recently at work, for doing more iPhone stuff. I've long admired Macs as hardware, but haven't ever owned one due to an irrational distrust of Steve Jobs. Oh well, lots of friends recommend them and have told me it's the best computing experience going. I'm expecting a learning curve, but here goes:

  1. Go through basic setup for my user account. No friction so far apart from the keyboard. I know which keys to check when I'm installing a Linux, so I do the same. SHIFT+2 and SHIFT+' give me @". WTF? Need to remap certain stuff; that's not the British English layout I'm used to; none of my other computers are Macs and I have 10 years of muscle memory when it comes to typing. I'll come back to fixing that. First off, Apple | System Preferences | Keyboard | Modifier Keys and sort out Caps Lock and CTRL, for good emacs usage.

  2. Where's a bloody terminal? Spend 5 minutes learning the nuances of the trackpad (I'm used to a nipple) and then drag one out of Applications | Utilities onto the Dock.

  3. Good, ssh is available. Copy my SSH keys and config off the Dell laptop running Ubuntu 9.10. Test all ssh stuff and grin like a maniac.

  4. Right, best install any system updates before I start configuring the arse off it. For a Unix, Mac OS X seems to need a lot of restarts for simple stuff like iTunes updates.

  5. WTF!? Nothing like aptitude? That seems like a glaring omission. What are my options? Googling seems to point to MacPorts, Fink and Homebrew as the available options.

  6. IRC client - download Colloquy and start talking to real people about their experiences.

  7. After that very small skewed sample, decide to go with MacPorts for now with an intention to properly evaluate Homebrew Real Soon Now.

  8. Install git with git-svn support. $ sudo port install git-core +svn gitX

  9. Checkout my github stuff. Git is missing something - completion!

  10. $ sudo port install bash_completion
    $ curl -o git "http://repo.or.cz/w/git.git/blob_plain/HEAD:/contrib/completion/git-completion.bash"
    $ sudo mv git /opt/local/etc/bash_completion.d/

  11. Download the behemoth that is Xcode from the Apple Developer site and start checking out Objective-C



That'll do as a minimally usable system for now. Hardware-wise, it's a delight. Being able to watch all of InfoQ content without the teething problems that I always seem to have on Ubuntu is just a major relief - I've got a lot of stuff in delicious tagged from there that I've never managed to watch, so I can start getting through that backlog as well.