James' Pad

Friday, November 21, 2014

VelocityConf EU 2014 – Day 1

TLDR; Velocity is a great conference for web and operations people, and why didn't you go already?

Is TLS Fast Yet?

This was a talk by Ilya Grigorik full of practical, actionable things that you can do to serve your site over TLS, and make it fast.

My notes on Ilya's talk.

Monitoring: The Math Behind Bad Behavior

Theo Schlossnagle gave an excellent talk (which didn't involve much maths) about the problems that Circonus see with handling massive amounts of data, and reliably detecting anomalies. I found it quite hard to take notes on this one, and it wasn't as practical in my context as the first talk, but it was still really interesting.

My notes on Theo's talk.

Design Reviews for Operations

Mandi Walls of Chef showed us how operations should be involved early on. She did a great job of emphasising the importance of having the right people having the right conversations at the right time.

I felt a little over-qualified for this talk, given that I've worked with Gareth Rushgrove for most of the last 3 years, and helped write some of the user stories for operations that GDS published on GOV.UK. Not everyone has had that privilege though!

My notes on Mandi's talk.

What Ops Can Learn From Design

Rob Treat of OmniTI brought together The Design of Everyday Things and The Art of UNIX Programming to show that designing with empathy to create intuitive interfaces is easy to overlook, but can have a massive impact on people using your stuff.

My notes on Rob's talk.

Statistical Learning-based Automatic Anomaly Detection @Twitter

Anomaly Detection seemed to be quite popular this year (see Theo's talk and Baron's proposed talk). Here, Arun Kejariwal talked about the state of the art, how it didn't quite fit for Twitter's usage, and what they did about it. The tools and code should be open-sourced in a few weeks, so people can plug it into their own problems.

My notes on Arun's talk.

Your Place or Mine: A Discussion of Where to Host Your Site

This was an emergency panel, convened because the originally planned speaker had something come up. It was a nice end to the day, talking about cloud and similar issues. Michael did a nice job of not answering someone who seemed to be either aggrieved, or trolling quite hard. He's a proper civil servant.

Sunday, April 13, 2014

Things GDS doesn't tell you

Working at the Government Digital Service (GDS) changes a person. It’s not a thing we highlight; just acknowledge internally in furtive conversations. This post should expose a few truths about that.

Pedantry

Words are really important. A common effect of working at GDS is that large parts of the internet become unusable for you, since the writing is so poor. Caring about serial commas is the norm.

New ideas

Content design. User research. These are all things that were new to me, and it turns out they have a massive impact in creating award-winning web sites. Another portion of the internet becomes blacklisted since it fails to meet your minimum standards for user experience.

Intolerance

Working with amazing people every day has a horrible effect on an individual. Working with less talented people becomes very unattractive. This is a deliberate retention strategy, and seems to work very well.

Elitism

Presenting well is a learned skill. Once you’ve learned it from one of the best there is, you start to notice things. Bad things. PowerPoint things. You cannot unsee these things.

If these side-effects repulse you, make sure you don’t apply to work here.

Thursday, September 27, 2012

How to check the encryption used in a zip file

Sadly, I've not found a nice CLI way of doing this, but I recently had to validate that a 3rd party was transferring files to us in an approved way (AES 256-bit) and this is what I did:

Open the zip file in emacs.
Use fundamental-mode to stop showing a listing of the zip contents. (M-x fundamental-mode)
Use hexl-mode to get a binary view of the file. (M-x hexl-mode)
Search for the string "0199 0700" to find the AES Extra header field. (C-s 0199 0700)
Check that the 2 bytes after the 0700 and the 2 vendor version bytes (for example 0200) are 4145 (the characters AE), followed by 01, 02 or 03 representing the AES encryption strength. In our case, we wanted 03, or AES-256.
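
The same check can probably be scripted rather than done by eye. Here is a minimal sketch in Python 3, assuming the archive follows the WinZip AES layout described in the steps above (extra field ID 0x9901, 2 vendor version bytes, the characters AE, then a strength byte). The function names aes_strength and report are my own, and this is not something I used at the time.

```python
import struct
import zipfile

AES_EXTRA_ID = 0x9901  # little-endian on disk, so it shows up as bytes 01 99
AES_BITS = {1: 128, 2: 192, 3: 256}  # strength byte -> key size in bits


def aes_strength(extra):
    """Return the AES key size in bits found in a zip extra field, or None."""
    pos = 0
    # The extra field is a sequence of (id, size, data) records.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        # data layout: 2 vendor version bytes, "AE", 1 strength byte, ...
        if header_id == AES_EXTRA_ID and len(data) >= 5 and data[2:4] == b"AE":
            return AES_BITS.get(data[4])
        pos += 4 + size
    return None


def report(path):
    """Print the AES strength (if any) recorded for each entry in a zip file."""
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            bits = aes_strength(info.extra)
            print(info.filename, "AES-%d" % bits if bits else "no AES header")
```

Calling report("transfer.zip") (a hypothetical file name) only reads the archive's directory, so it should not need the password. It reports what the headers claim, which is the same thing the hex view shows.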

Thursday, September 13, 2012

How to convert an Oracle .dmp into a more portable format

One of the things that I've come across has been legacy applications which use Oracle, where we don't have access to the database, but do get provided with Oracle .dmp files. These aren't very helpful when all you have (or are used to!) is MySQL and PostgreSQL.
One approach that I've had success with is to download a VirtualBox image of Oracle, and then play with the data in there. I chose the Database App Development VM since I wasn't sure what parts of Oracle I'd need, having strenuously, and pretty successfully, avoided Oracle for most of my time in the industry.
I then imported this into VirtualBox (running on OS X Lion) and configured networking:

One adapter running NAT so that I can browse the internet in the guest OS. This let me download the .dmp file to the guest OS filesystem.
Set up Port Forwarding for the NAT interface so that I can ssh to port 2022 on the host which will go to port 22 on the guest, thus allowing ssh access.
Optionally, I set up the VM to have another adapter (Host-only) so that I can set up NFS shares to mount part of the host filesystem under the guest.

Next, I needed to get the .dmp data onto the guest OS (and later get the transformed data off the guest). ssh-copy-id is good for this: it puts an SSH public key into the authorized_keys for the oracle user on the guest, so files can then be copied over SSH without a password prompt. You can also get data into and out of the guest using python -m SimpleHTTPServer run in the appropriate directory, which let me browse the host filesystem or guest filesystem as needed. ifconfig in the host or guest lets me know which IP address to use.
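SimpleHTTPServer is the Python 2 module; on Python 3 the equivalent one-liner is python3 -m http.server. If you would rather start it from a script, a minimal sketch might look like this, assuming Python 3.7 or later. The serve_directory name and the default port of 8000 are my own choices for illustration.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000, bind="0.0.0.0"):
    """Serve `directory` over HTTP in a background thread; return the server."""
    # The `directory` argument of the handler needs Python 3.7+.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((bind, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Call server.shutdown() when the transfer is done. Like the one-liner, this has no authentication, so it is only sensible on a host-only or otherwise trusted network.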
Now, I needed to create a tablespace and user to allow me to import the data. I advise doing this, since (for me at least!) importing the data is an iterative process, and creating a separate tablespace (with separate data files) is good practice: it avoids bloating the system tablespace and means that disk space can be reclaimed. That is pretty much the only Oracle knowledge I have!
Before you create the tablespace, it's a good idea to check the size of your .dmp and the available space on the filesystem. I had a 1.4GB .dmp which didn't fit into the space left on the filesystem, and I burnt a bit of time figuring out Oracle error messages for the failed import before I worked out that the filesystem wasn't big enough. In this case, I created a symlink in $ORACLE_HOME/dbs/ which pointed to a large enough partition and set the owner / permissions as required. Creating the tablespace was just a case of running:

$ sqlplus / as sysdba
...
SQL> CREATE BIGFILE TABLESPACE mytablespace DATAFILE 'mytablespace/f1.dat' SIZE 20M AUTOEXTEND ON;

Tablespace created.

SQL> CREATE USER myuser IDENTIFIED BY password DEFAULT TABLESPACE mytablespace;

User created.

SQL> GRANT CREATE SESSION,CREATE SYNONYM,CONNECT,RESOURCE,CREATE VIEW,IMP_FULL_DATABASE to myuser;

Grant succeeded.

SQL> exit
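
The disk space check mentioned before the tablespace commands is easy to automate. This is a minimal sketch in Python 3 of the comparison I ended up doing by hand. The fits_on_filesystem name and the headroom factor of 2 are assumptions of mine; I don't know how much larger than the .dmp the imported data files will be in general.

```python
import os
import shutil


def fits_on_filesystem(dump_path, target_dir, headroom=2.0):
    """Compare the size of a dump file with the free space under target_dir.

    headroom is a guessed multiplier for how much bigger the imported
    data might be than the dump itself; adjust it to taste.
    Returns (fits, bytes_needed, bytes_free).
    """
    needed = os.path.getsize(dump_path) * headroom
    free = shutil.disk_usage(target_dir).free
    return needed <= free, needed, free
```

For example, fits_on_filesystem("path/to/data.dmp", os.path.expandvars("$ORACLE_HOME/dbs")) would tell you whether the data file location has room before you start the import.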

We should now be in a position to try to import the data.

$ time imp myuser/password file=path/to/data.dmp full=yes

If this fails because the data was exported by a different user from the one you created, then stop the import and clear out the user and tablespace.

$ sqlplus / as sysdba

SQL> DROP USER myuser CASCADE;

User dropped.

SQL> DROP TABLESPACE mytablespace INCLUDING CONTENTS AND DATAFILES;

Then re-create the tablespace and the new user and try the import again.
Once the import has succeeded, you want to get the data out of the database into a less proprietary format. One way is to use SQL Developer (a GUI tool included in the VM image).
Open SQL Developer and define a new database connection:

Connection name: myuser
User name: myuser
Password: password
Save Password?: Checked

SID is orcl rather than xe, in the Developer Days VM that I used.
Test the connection. It should work. Then open the connection and examine the tables.

In the menu, click Tools | Database Export
Choose to export the data only, as CSV.
Choose the connection, choose the tables, choose the destination file.

For large databases, this can take a while to process (2.5 hours in my case). It may be faster to write your own export routine using Perl, PL/SQL or similar. In the end, that's what I did, so that I could script the entire process like so:
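
The script itself is not shown here. As a stand-in, this is a minimal sketch of the idea in Python 3, and it is not the routine I actually wrote. It assumes a DB-API cursor; the demo uses the standard library's sqlite3 so that it runs anywhere, and with Oracle the cursor would come from an Oracle driver instead (cx_Oracle is one I'd assume works). The function and table names are made up for illustration.

```python
import csv
import io
import sqlite3


def export_query_to_csv(cursor, query, out_file, batch_size=1000):
    """Run `query` and write a header row plus all result rows as CSV.

    Rows are fetched in batches so that large tables are not held in memory.
    Returns the number of data rows written.
    """
    cursor.execute(query)
    writer = csv.writer(out_file)
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
        count += len(rows)
    return count


def demo():
    """Export a tiny in-memory SQLite table (a stand-in for the Oracle data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO people VALUES (1, 'alice')")
    conn.execute("INSERT INTO people VALUES (2, 'bob, jr')")
    buf = io.StringIO()
    export_query_to_csv(conn.cursor(), "SELECT id, name FROM people ORDER BY id", buf)
    return buf.getvalue()
```

Looping over the table names and writing one file per table gives you something that can be re-run unattended after each import attempt.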

Tuesday, May 08, 2012

Joining GDS

Basically echoing what others have said. I've held off working in London forever, not wanting to spend a large portion of my day on a train. But then something like this comes along, with an opportunity to transform how Government delivers services (work on stuff that matters), and working with a ridiculously talented set of people. Chances like that don't come along often. It's going to be an exciting ride, and I'm grateful to my wife and kids for letting me get on.

Saturday, March 03, 2012

My reaction to Raganwald's "How to do what you love"

You should buy this book. I know (all?) the content is available online already and if you've been reading raganwald's output over the years, you might have already read the articles collated in this slim volume. I still suggest you should buy the book; my only nitpick was that (at the time I purchased it) the maximum payable price seemed lower than what I would have paid.

I guess for me there were 2 reasons to buy it. One is partly a reflection on my evolving personal philosophy, that people who create great stuff should be somehow rewarded, so that they can carry on creating great stuff. In Renaissance times, this would be patronage. These days, tip jars or similar can be simple, low-friction ways of allowing a much larger potential audience to support an artist. Also, I prefer to buy stuff that is free, because I am fortunate enough to be in a position to do that, and to try to ensure that the supply of free stuff doesn't dry up.

The second reason is that I am grateful to Reg for providing me with so many hours of stimulating thought.

I don't believe I had previously read all of the compositions, and 3 things struck me upon reading this book.

First, I don't have a publicly viewable portfolio demonstrating that I am in any way a competent professional. There is the usual odd bunch of patches littered in various libraries that I use or have used, and one former employer released a large chunk of their code as open source (but with all identification / attribution removed), but there is nothing meaty that is mine (apparently, apart from vbunitfree, which is very dead). I have in the past railed against walled gardens in terms of mobile carriers and their view of the web; in this case I have been working with other walled gardens, in terms of writing code that is proprietary, for corporate entities. My GitHub account needs some TLC to showcase my skills.

Second, in recent years I have neglected communication and other soft skills, choosing instead to focus on technical skills for quite some time. That is a mistake. As I've got older, I've come to think that communication is more important; it's all about the conversations you have with people. Reg certainly seems to share that viewpoint. This blog was initially created since all of my blogging output was going onto an internal, employer-owned blog and I wanted to develop those skills further (and stop putting all of the good stuff in a walled garden!). I need to dedicate some time to this.

Finally, NDAs are evil. In that instance, not only is your professional output (in terms of code at least) locked up in a walled garden so that no-one can view it, but neither can you even talk about it. I agonised for a long time about the last NDA that I signed. No more. If you need me to sign an NDA, I suggest that perhaps you need to examine why you are asking me to do that. Surely you should have confidence in your ability to execute on a plan, and the speed at which you will iterate?

Friday, February 24, 2012

Merging Subversion trunk into a branch; how to deal with merge conflicts

TLDR meh.

I'd inherited a 4-month-old branch which needed to be merged back into trunk at some point. As a first step, I wanted to merge the (hopefully smaller) changeset from trunk back into the branch. I tried git-svn. It didn't work for me. This has not been a pretty task.


$ pushd path/to/svn/repo
$ svn sw https://example.com/svn/project/branches/my-branch
$ svn up
$ svn merge https://example.com/svn/project/trunk --accept postpone
...
svn: One or more conflicts were produced while merging r3097:4432 into
'.' --
resolve all conflicts and rerun the merge to apply the remaining
unmerged revisions

At this point I have my working copy in a partially merged state with various file-level and tree/directory-level conflicts. As an example of how a repository might get into this state, imagine this happening in the branch:


$ svn mv dir1 dir2
$ mkdir dir1
...
# add files to dir1
# and commit a few times.

Meanwhile, in trunk:


...
# add and modify files in dir1
# commit a few times.

Since the changes hadn't been cherrypicked, you get tree conflicts. Let's take a look at those conflicts.


$ svn stat | grep 'C '
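
The grep lumps file-level and tree conflicts together. Since they need different handling, it can help to split them. This is a minimal sketch in Python 3, assuming the usual svn status layout where a C in column 1 or 2 marks a text or property conflict and a C in column 7 marks a tree conflict. The function names are my own.

```python
import subprocess


def classify_conflicts(status_output):
    """Split `svn status` output into (file_conflicts, tree_conflicts) paths."""
    file_conflicts, tree_conflicts = [], []
    for line in status_output.splitlines():
        if len(line) < 9:
            continue
        # Seven one-character status columns, a space, then the path.
        flags, path = line[:7], line[8:]
        if flags[6] == "C":
            tree_conflicts.append(path)
        elif flags[0] == "C" or flags[1] == "C":
            file_conflicts.append(path)
    return file_conflicts, tree_conflicts


def svn_conflicts(working_copy="."):
    """Run `svn status` in a working copy and classify its conflicts."""
    result = subprocess.run(
        ["svn", "status", working_copy],
        capture_output=True, text=True, check=True,
    )
    return classify_conflicts(result.stdout)
```

Printing the lengths of the two lists gives a quick view of how much of the remaining work is simple file merging and how much is tree conflicts.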

I had 46 issues listed for the merge up to this point. File-level conflicts can be easily resolved using fmresolve, which I've written about previously.


$ fmresolve path/to/conflicted/file

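and then mark the conflict as resolved, using whichever of these fits: keep the working copy as edited, take the incoming version wholesale, or keep the branch's version wholesale.


$ svn resolve --accept working path/to/conflicted/file


$ svn resolve --accept theirs-full path/to/conflicted/file


$ svn resolve --accept mine-full path/to/conflicted/file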

Tree conflicts can only be resolved using the working copy, so I needed to check out / copy the relevant file and edit until I was happy with each one, and then mark each conflict as resolved, accepting the working copy. 21 of these needed attention at this stage.
Then you can proceed with the merge.


$ svn merge https://example.com/svn/project/trunk --accept postpone

Repeat until done.
Hopefully merging the branch back into trunk will go a little easier.