Constrained optimisation to allocate modules at York

A week ago today I handed in my final-year project in Computer Science (the infamous dissertation that the fine people of Twitter are completely bored of hearing me talk about). Like most in the department, mine consisted of about 20,000 words over seventy pages and a ten minute presentation describing the work I’d been doing for the last year or so. According to the staff involved, a University software project implemented by a student was something that had never been tried before, and as I had such a fun time doing it I thought I’d write a bit here. The aim was to create software that allocates students to modules.

The project had two fairly distinct halves: a web interface to let students express their preferences for optional modules, and a bit of software to take that input and do the allocation. The end result had to satisfy constraints from the departments (e.g. a class can’t have more than x students), while also trying to keep students as happy as possible – it’s a constrained optimisation problem. The more I think about it, this project could not have been any more perfect for me; half web stuff, half maths. Lovely.

The web development side of things was nothing too outrageously challenging. A lot of research and testing with staff and students to make sure it was as easy to use as possible, though the end product is just a jQuery sortable implementation. Those of you who have used the YUSU site to vote during elections will recognise that this software goes for the same mental model. Students in the Departments of History and Archaeology noticed that too, which was wonderful. Here’s a bit of video of one of the earlier functional prototypes (on an iPad, because iPads make everything cooler) – please excuse the music:

The constrained optimisation and linear programming stuff was where it started to get really interesting from a computer science point of view. The first I’d heard of CO was in about May last year, so having an expert supervisor was more than a little helpful. We used a solver (in this case Gurobi) to save a whole load of work on the implementation, and I’m shocked by how powerful it is. It’s got wonderful interfaces for several languages and is completely free for academic use! Linear programming and all that related gunk is such a huge field that if you try hard enough I bet you’ll be able to think of somewhere you could use it.

Once it was all set up, it’s just a case of creating a ton of (binary) variables, defining the objective function (what the solver should try to achieve), loading in the constraints and hitting the big red button. This system used the following:

  • Binary variables: one for every possible allocation (an allocation is a student, module pair), set to 1 if the student is allocated the module or 0 otherwise.
  • Objective function: a linear function that can be maximised or minimised by the solver. I chose to include every binary variable with a coefficient indicating the “goodness” of that allocation, which is based on the rank the student gave.
  • Constraints: each class had a maximum and minimum size, and each student had to take the required number of classes.

We’re talking about painfully simple code to interact with Gurobi, too. Here’s a bit of it in Python:

from gurobipy import *

model = Model("modalloc") # new Gurobi model
gurobivar = model.addVar(vtype=GRB.BINARY, name=student+"_"+module) # new binary var
model.update()
objfn = LinExpr() # objective function
objfn.addTerms(rank_coeff[ranks[(student, module)]], gurobimavs[student][module])
model.addConstr(num_students <= modules[module]['max'], "classmax") # constraint
model.optimize()

(There’s a slightly more fully-featured gist available, if you’re that way inclined…)

I think what I love most about this software is that it’s a great example of a nice modular web project: none of the parts by themselves are horrendously complex, but they come together to make a system that actually solved a problem – code I wrote was used by 800 students in two departments, and the data it generated is now stored in the central student database. All my fingers are crossed that IT Services in York will be able to take the code on, spruce it up a little and offer it to many more departments next year. The report’s on GitHub, but private at the moment. I’ll make it public once it’s marked if I’m able to.

The web, growing up

Jeremy Keith recently huffduffed a conversation about digital preservation. It revolves around the creation of archiving software that will hopefully help reduce the impact of link rot on the web.

(That conversation is interesting, and you should definitely have a listen. What follows is only tangentially related.)

This audio got me thinking about something that had always niggled at me in the past: the BBC not updating old article styles when they change the appearance of BBC News. Take, for example, what must be one of the most viewed BBC News articles of all time: “US rocked by terror attacks”, published on Tuesday 11 September 2001.

The BBC News site has gone through a thousand and one different looks over the last ten years (most of which spent a few weeks receiving hateful comments), but that article still looks exactly as it did on the day it was published. Contrast that with similar articles dated 2001 from The Guardian, Wired and The New York Times, and you can see they all match the current look of the site they’re a part of.

What the BBC are doing had always bothered me subconsciously, and now something’s changed. I’m not sure if it’s the audio that Jeremy posted which has swayed me, but suddenly I find myself really liking what the BBC are doing. I love that that article is still at exactly the same URL it was when it was published. I love that the source is filled with <table>s, <map>s and <font>s. And I love that the whole thing is 600 pixels wide.

It’s a piece of history. Even though the BBC have been criticised by many for taking historical content offline (including Jeremy, who hosts a sample of that content), I hadn’t realised how much I appreciate everything they put into BBC News.

I’m going to try my best to do the same with everything I publish:

  1. URLs that never change.
  2. Individual articles that look exactly as they did when they were published.

The first will be a challenge, since I’m already falling out of love with this domain name.

As to the second, I’ve made a start using the Custom Post Template plugin for WordPress. My post on York a year before I arrived here (reading that again is so odd) looks almost exactly as it did when it was first posted. (Yorkies: the Louis mentioned in that post is Louis Rose.)

And if I move away from WordPress in the future, it should be relatively easy to just keep shuffling the plain old HTML around so that everything keeps looking the same.

Most of all, I love that that York post now looks as old as it is. It was written by a sixteen-year-old in 2007, and it should look like it was designed by one too.

Site verification

Screenshot of a root directory of a web server

Is it just the root of my server that looks a bit like this? With the upcoming launch of Twitter’s analytics service for website owners (which will presumably require some kind of verification), I was wondering:

Why isn’t there a standard, similar to robots.txt, for verifying that you own a domain? verification.txt?

Screenshot of an example verification.txt file

google: Pt8rpfeQC4SCyzABFilZiJC5Tqw8f
googlehosted: Rb0S6k5si3nOj5rVfPmnzU0pYit
bing: eXB4oWQR1iVLtKp6Yk2hTq7N7JPjmoX5
yahoo: y_key_uyxZucu9dvrzsr3PWaYx9G5gLT
twitter: rqcSmFJIlugbPjXvI6vGuZi7YCiWXYR

I can’t think of a reason each service needs more than a few alphanumeric characters for this.

Contingency

I’ve been thinking quite a lot recently about contingency plans, I think partially spurred on by (re-)finding Alex Payne’s rules for computing happiness. His rules have encouraged me to try and keep things as simple as possible, and to not to rely too much on anything that’s far outside my control.

So, I present a few scenarios that could easily happen over the next few years. While I hope none of them do, thinking about it in advance can’t hurt.

What if Apple stopped making the best OS?

This is becoming more of a worry with news dripping out about OS X Lion – Apple seem to be moving towards a simpler, iOS-like experience on the desktop. And while I hope they never abandon the nerds who love using OS X for development, I’ve got a horrible feeling we might wake up one day and find the command line gone.

I’d still keep using Apple hardware (it’s the best, see below), but I’d probably move to a Linux distribution. I’ve thought about this a bit recently, and I reckon the reason I’m such a fan of OS X is because it’s much more closely related to a LAMP server than Windows 7; making web development and testing that bit easier. Moving to from OS X to Linux would make more sense than moving from OS X to Windows.

Perhaps Ubuntu to start with, and then another distribution if I my needs changed or I felt a little more adventurous. Depending on how easily they installed on Apple hardware, of course, I’ve not looked in to that.

What if Apple stopped making the best hardware?

It’s really hard to imagine a world in which the quality of MacBook Pros drops so far that they’re no longer appealing. Given what I’ve seen recently at work I wouldn’t ever touch a Dell. I think a souped-up Lenovo ThinkPad might do the job quite nicely. An IBM ThinkPad T21 was my first laptop back the early 2000s, don’t cha know?

They seem like they make decent quality, not-beautiful-but-not-ugly laptops. A cursory glance at their online store suggests something like a ThinkPad X220 might work. £1100 for a 12.5 inch 1366×768, Core i3, 8GB RAM, 128GB SSD is comparable to a MacBook Pro on price though; the argument that “Apple is more expensive” seems to have been well and truly eroded.

A couple of months ago, Engadget called it “arguably one of the best laptops [they’ve] ever tested”. That’ll do.

What if Twitter shut up shop?

This would be really tough. Twitter is a few things to me:

  1. It’s a place to meet new people with similar interests. I suppose I’d have to spend more time using Convore, Reddit, IRC? I’ve met quite a few people because of Twitter, and even got a few freelance web development jobs that turned out really well.
  2. It’s a place to socialise with friends; Facebook would probably get that traffic.
  3. It’s a place to find out the latest news that I’m interested in. At a few job interviews the interviewer has asked me how I keep up to date with the latest web goings-on. I haven’t been able to think of a better answer than Twitter and RSS.

I’d start blogging a lot more, and probably investigate installing my own service that would allow me to keep posting brief updates.

What if Spotify closed down?

Simply put, I’d have to find somewhere else to get my music. I’ve recently started paying £5/month for Spotify, and I’m sure there’s another service that would do suitably well as a replacement, though probably not with the same huge catalog. Last.fm (especially the radio) would probably get a lot more use from me.

What if Dropbox turned (more) evil?

I’d head over to Amazon and buy the smallest, cheapest USB flash drive they had going, and put it on my keys. We’ve got a few Sandisk Cruzer Blades at work; they’re currently under £14 for 16GB, and the price of flash memory drops every day. And Verbatim do 16GB in fingernail-size for £24, so that’d be another to go for.

I don’t think there’s a service currently available that offers the same convenience as Dropbox. And if the only file-syncing solutions are stupidly tough to use, I’d rather go back to carrying round my data with me.

What if the iPhone stopped being so damn attractive?

I’d probably go the route of webOS first – there’s something fundamental about the lack of attention to detail of Android that really bothers me.

I’d try the latest HP/Palm Pre in an O2 store, and see whether that works for me. And if not, probably whatever the latest HTC Desire S or Google Nexus S type thing is.

And on, and on, and on

What’s the one service that you completely rely on? And what would you do without it?

Log files, laziness and stupidity

Up until a few months ago, my site was hosted by the lovely people at NearlyFreeSpeech.NET. Although I’ve had a Linode since the end of 2009 for development, sheer laziness had stopped me from moving this WordPress installation from NFS to my virtual server.

With hindsight, I did several stupid hosting-related things over the last year and a bit:

  1. If I had just got around to it and moved the site from NearlyFreeSpeech to my Linode in January 2010, I would’ve saved myself $100 in payments to NFS.
  2. Why so much? Well, if I had kept an eye on what was going on I would have noticed that I had (for some bizarre reason, which we’ll chalk up to ignorance) disabled log file rotation in the NFS admin interface. Which meant that every time somebody visited my site, the log file grew, and grew, and grew. And I paid for it. If I had enabled log file rotation, I still would have been out $40. But not a hundred.

Here’s a graph of my monthly payments to NearlyFreeSpeech:

NearlyFreeSpeech.NET cost per month

This isn’t a knock at NFS: they provide a great service for very reasonable prices, as long as you don’t want to store too much stuff with them. I’d still happily recommend them for small(ish) sites.

Part of the reason I kept putting off the move was a worry that something would break and I wouldn’t be able to repair it. But the reason I rented a Linode in the first place was to learn more about how server maintenance and configuration works; I now know that “it might break” isn’t a good enough excuse.

The next time I procrastinate over something simple that could end up costing me $100, I’m going to slap myself hard and re-read this post.

On Spotify

Spotify’s recent announcement was an interesting one to read; I assume (correctly or not) that they wouldn’t make this change unless it was the only reason they could stay in business, and that the advertising just wasn’t supporting them as much as they hoped it would.

Yesterday was the 1st May, so their new restrictions came into force for all existing customers. They presented this screen to free users:

Please upgrade to get the most out of Spotify

Which is why it was very, very interesting to wake up this morning and see this on Facebook:

"...feels dirty but is buying spotify unlimited"

Aside from the huge sense of entitlement towards being able to get stuff for free, this is odd because Spotify have managed to get students to pay for something online. For those who may not understand the gravity of that statement, the phrase “blood from a stone” comes to mind. University students—at least in the UK—are not particularly accustomed to handing out money all over the place, especially in the form of regular, monthly payments.

Spotify is the only business I can think of that has had phenomenal success and gained millions of users by providing a free service, and moved entirely to a paid model. And they’ve set the price so low that affording it isn’t an issue for a large number of young people.

There’s been a slew of new businesses setting up with no way to make money to be seen (Twitter, of course, is the best known offender). But Spotify have apparently cracked it. They’ve made something that people rely on so much that, once it was taken away, a load of people felt the need to pay.

Searching Facebook and Twitter for mentions of Spotify yields a lot of posts about unlimited/premium sign-ups:

Spotify, what a load of crap! Guess I am forced into having to buy the premium version just to keep it :@

has caved and actually paid for spotify…dark days…

Same, I did it last night…

@Richard28Wood Bloody spotify.. I might buy it at some point. How’s revision going?

Just upgraded to Spotify unlimited, byebye adverts and time-limits, byebye £5 a month.

Might bite the bullet and pay Spotify Unlimited

There’s a lot of anger out there from people who feel like Spotify owe them something, but I’d nevertheless love to see a graph of free/paid conversions for today. I don’t think they’ve done too badly.

“Sucked into drugs by the Internet”

Euan posted a photo to Twitter tonight from a tabloid (though I’m not sure which one). The headline read:

Beautiful Issy was sucked into drugs by the Internet… it killed her

This of course is the story of 15-year-old Isobel Jones-Reilly, who (at the risk of speculating a little) died after taking ecstasy at a friend’s house.

Euan’s dead on with his caption:

Oh FFS

Arthur Martin and Tamara Cohen, writing for The Mail (God, I hate it so), are reporting that she was “sucked in by the drug-taking exploits of the celebrities she idolised” and “was hooked on the internet”. Take it from somebody who, according to his mother (!), does have an unhealthy addiction to “that machine” (as she so affectionately calls it): I may be screwed up in all sorts of ways, but I’ve thus far managed to avoid ingesting lethal doses of various drugs. I’m not particularly sure the two are related.

The way they’re trying to spin this makes me so angry:

Isobel, described as a ‘member of the Myspace Generation’, used at least seven social networking sites

As do a huge number of 15-year-olds, I’ll bet. How many die each year?

Jaye Williamson, who was Isobel’s English teacher at Chiswick Community College, in west London, said: ‘She was into the kind of things that teenagers get into, but she got hooked on the worldwide web. She was part of the Myspace generation. She got caught and we are devastated.’

I’ve seen first hand the difference between the words a journalist hears and what ends up on the page, so I’m hoping that Ms Williamson doesn’t stand by that quote. What on earth does “she got caught” mean?

There’s already going to be a backlash of protective parents stopping their teenage children from leaving the house, at least in the short term. And thanks to articles like this, it’s going to extend to the web as well. Why do journalists (shudder) always feel the need to find something that’s obviously unrelated to blame? I’d love to be a fly on the wall watching articles like this being produced. The little jokes across the office, the one-line emails, the instant messages, would all be so revealing.

I don’t mean for this post to sound insensitive. While what’s happened is obviously incredibly sad, I wish the papers wouldn’t see it as an opportunity to push the bizarre agenda du jour.

Web Application Cracking Workshop #1

Yesterday I was at the Hackspace to learn about web exploits and security testing from Darren McDonald. The following is a write-up of what we did and learnt that afternoon.

The brief warning we were given at the beginning went something like this: if you try and compromise somebody’s machine without their permission, you stand a decent chance of being prosecuted under the Computer Misuse Act. Which probably means never getting a job involving computers again. All the hardware we were trying to break yesterday was owned by Darren.

To start, we were pointed to the (free) Burp Suite and shown how to intercept HTTP requests by proxying traffic through 127.0.0.1:8080. Every request and response is shown in Burp before being passed on, with the option to modify any part of the request before it reaches the server.

As a web developer, I’d always heard that relying on client-side validation was A Bad Idea, but I’d never seen just what can happen if you don’t validate input server-side. The moral of the story?

If you’re relying on the browser to provide decent, well-formed information (including from things like hidden fields): don’t.

We were shown a purposefully badly written online store as an example, where the cost of a product was a parameter in the request (!). Using something like Burp, it’s easy to set a GET or POST parameter (the cost, in this case) to whatever you want.

And then it was time to talk about XSS. The simplest test for a cross-site scripting vulnerability on this site was to enter <script>alert(1)</script> into an input field. The example we were given was a guestbook, and we ended up with dialog boxes, JavaScript redirects and XKCD comics galore. I was reminded of the #cashgordon incident last year.

An hour after the session began, I was thinking about how session cookies are actually a huge headache, and one I hadn’t considered before Firesheep got everybody thinking. Using Darren’s session ID, it was easy to steal his session and abuse his “10% staff discount”.

We worked through some examples of pages with progressively better and better input sanitation, thinking about what the developer had done to protect their site and how we could get around it.

Moving on, we looked at SQL injection attacks and how you can then tell which kind of database the site is using. I’ve been working on limiting the permissions on, for example, the WordPress database user here, and this reinforced just how important it is.

Apparently 10—20% of sites have some kind of SQL injection vulnerability, which is a worryingly large number.

It was great to get a broad overview of the basics of web security, and it’s given me plenty to think about for past and future projects. Brilliant to see a couple of people interested in joining the Hackspace after attending the session, too. Many thanks to Darren for taking the afternoon to share his expertise, and I’m looking forward to the next one.

A long way from home

Last week, Tom wrote about how fairly new online services are affecting the way we communicate overseas. He’s right, of course. To me, the most interesting part is his third sentence:

For instance, it seems completely unreasonable that it should cost 10-20 pence for someone in the UK to send an SMS message’s amount of data to me in the US—of course negating the outlandish prices that are charged for SMS messages already.

The days of mobile operators being able to charge us way over the odds for communicating internationally are long gone. My two week holiday in South Africa is just coming to an end, so this is pretty timely: I’ve used about 400MB of mobile data this trip, which O2 would have liked to charge me something in the region of £2,400 for. Instead, I bought a local SIM card, put it in my unlocked iPhone and spent just under £20.

On previous holidays I might’ve sent a dozen text messages to keep up with friends, which would have cost something like £5. Indeed, my monthly bill during a trip to the US a while back was about double the usual figure. This holiday, I sent zero text messages. In fact, barely anybody has my South African phone number.

The difference, though, is that I did send 21 direct messages on Twitter and nine (slightly longer) messages on Facebook. And seven emails.

And the amount O2 are going to be billing me for roaming this month? £0. I can only imagine mobile companies are losing—or are going to be losing—a hell of a lot of revenue from lost international charges.

One Click Orgs

One Click Orgs is a service to allow organisations to virtualise the way they’re run, providing a legal structure to make it easy to do things like open group bank accounts and organise the decision-making process.

One Click Orgs January Hack day

It’s a Ruby on Rails app with the source available on GitHub, and I’ve been working with the team since last April. Chris and Martin have given me so much help with Rails and Git, and I’ve learnt so much over the course of the last year.

While everyone else has been focused on making the service actually work, I’ve been messing around with the way things look; in this case, using Haml and Sass.

It’s been such a great project to be involved in, especially for an open source first-timer. Everybody has answered every stupid question I’ve had, and the local monthly meetings have made things really easy.

This week version 1 was released and it’s giving me great satisfaction to see something I worked on actually out there. Being used.

Here’s some screenshots of what I helped to sort out, with Charles, Colin and the rest of the team. It begins with version 0.6, which was the latest version when I started, and continues through to the 1.0 that was released last night:

OCO screenshots

I’m very much looking forward to seeing the progress and refinement as we move on.


Looking for something else to read?

Why not try Browsing?