I’ve given up on trying to make DevLogs really comprehensive and complete, and will just try to write them – even if they are incomplete. Perhaps this will help me actually write more devlogs, and thus make them more comprehensive…
I started on the Functional Programming in Scala course today. I wanted to do it the last time, but did not find the time. This time will be different – I’m in a different spot mentally, and a big drain on my emotional / time resources (“college”) is no longer a factor. I’ve already gone through the first week’s lectures (most seem straight ports from SICP), and am 2/3 of my way through the assignments for the first week. Though I’m normally not fond of the typical types of exercises (“How many ways can you make change for this scenario?”), doing them functionally seems to be enough of a twist to keep me interested. The regular languages I seem to work with these days (Java, PHP, Puppet, Python and a bit of JS) aren’t exactly well tuned to the functional style of programming (except JS, but I don’t really do that much JS these days) – so going through this course and hopefully building something in Scala can exercise my ‘OMG HOLY SHIT THIS NEW THING IS AWESOME!!1’ muscles.
The course recommends using Scala IDE. Which is Eclipse based. Despite my deep distaste for Eclipse, I’ve given it a shot and it seems fairly stable and much less shitty than the Eclipse I’m used to! Let’s see how that goes.
The way I learn best seems to be to find a project I want to build, and then build it in a new language. I’m looking around for fun projects that I could build with Scala, and I’m pretty sure I’ll find one at some point. I should also try to diversify the communities I contribute to, so perhaps I should look for a non-Wikimedia project that I can contribute to in Scala. Should be fun :D
Other than that – I’ve been investigating performance problems with the Campaigns API – turns out Mediawiki’s Parser is really, really slow. Who would’ve thought, eh? ;). The ‘solution’, of course, is to just add more caching. Which is a sort of biggish, hairy-ish problem because of the number of changes that can cause cache invalidation. The way Mediawiki handles caching is… rather complex – but that is more due to how much functionality exists than anything else. Needs a proper ‘clean’ solution that is not just “Let’s cache everything for 5 minutes!!!1”. Should be fun to write up and fix!
I’ve also been spending time integrating Mediawiki-Vagrant with Wikimedia Labs. This will make it easy for anyone to set up a base mediawiki installation on Labs, and save time dicking around with various mundane deployment-of-test-instance issues (did I turn on caching? How do I get this extension on? etc). This is interesting because it merges two different puppet repos – operations/puppet.git and mediawiki/vagrant.git – on one machine and provides different ways of managing them. Since this is also using puppet as a sort of ‘deployment’ tool (rather than just a configuration-of-systems tool), that is an interesting / fun aspect too. Should be able to get the patch merged in a few weeks.
I visited San Francisco for the last 2 weeks. I don’t really feel insecure anymore :)
DevLogs are something I've not been writing much of lately. Time to fix that!
WLM Android App
Spent some time reviving the WLM Android App. Wasn't too hard, and am surprised at how well it still runs :) Some work still needed to update the templates and other metadata to refer to WLM2013 rather than WLM2012 – but that should not be too hard. The fact that it is an issue at all is simply because I ripped out all the Campaign related APIs a few weeks ago with my UploadCampaign rewrite.
multichill was awesome in moving the Monuments API to Tool Labs – hence making it much faster! Initially we thought that the Tool Labs DB was too slow for writes – but this turned out to be a mistake, since apparently the Replica Databases had slow writes, but tools-db itself was fine. There's a bug tracking this now. The Tool Labs version of the API still seems much faster to me than Toolserver's :)
Mediawiki sucks. Eeeew! Specifically, writing API modules – why can't we just be happy and have everything be JSON? Sigh!
I'm adding a patch that allows UploadCampaigns to be queried selectively, rather than just via the normal page APIs. Right now, this only lets us filter by enabled status – but in the future, this should be able to also filter on a vast array of other properties. Properties about Geographic location come to mind as the most useful. That patch still has a good way to go before it can be merged (continue support being the foremost missing piece), but it is getting there :)
The ickiest part of the patch is perhaps that it sends out raw JSON data as a… string. So no matter which format you are using on your client, you need to use a JSON parser to deal with the Campaigns data. This sort of makes sense, since that is how the data is stored anyway. Doesn't make it any less icky, though!
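To show what that double parsing looks like on the client, here is a minimal sketch. The response shape and the field names (campaigns, name, enabled, config) are made up for illustration and are not the real API output; the only point is that the campaign data arrives as a JSON string inside the outer response, so it has to be decoded a second time.

```python
import json


def build_sample_response():
    # Hypothetical response: the field names are invented for this sketch.
    # Note that "config" holds a JSON document serialized as a plain string.
    config = {"title": "WLM 2013"}
    return json.dumps({
        "campaigns": [
            {"name": "wlm-2013", "enabled": True, "config": json.dumps(config)}
        ]
    })


def parse_campaigns(response_text):
    """Decode the outer response, then decode each embedded JSON string."""
    outer = json.loads(response_text)
    campaigns = []
    for entry in outer["campaigns"]:
        parsed = dict(entry)
        # Second parse: the campaign data is still a string at this point.
        parsed["config"] = json.loads(entry["config"])
        campaigns.append(parsed)
    return campaigns


campaigns = parse_campaigns(build_sample_response())
```

A client asking for XML or any other output format would still need that inner json.loads step, which is exactly the icky part.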
Not bad for a lazy Sunday, eh?
Update: After not being able to sleep, I also submitted a patch to make phpcs pass for UploadWizard, and also fought with the UploadCampaigns API patch to have it (brokenly?) support continuing. Yay?
Someone pointed out this happening on Gerrit…
When running Redis in a shared cluster/hosting environment (such as Wikimedia Tool Labs, on which I've been having fun doing a lot of work), you would want to try to provide at least some guarantee of isolation for your keys from everyone else's keys. Since Redis doesn't do ACLs, this is problematic.
This can be solved in a couple of ways:
Run a Redis instance for each user
This is simple enough to do – each user runs their own Redis instance, and has full access to it. Security is handled by setting a secret password, and running redis-server as the user in question. Boom, secure!
This doesn't really scale with a large number of users, because they each have less memory to work with now. Also, making users who just want to run their tools deal with keeping their Redis instance up and running fine isn't really good. Having the sysadmins be responsible for users' Redis instances is… not going to work :) This would also require all the Redis instances to run on one box and/or have a separate cluster just for them, which isn't good either.
Add ACL support to Redis
Not happening, because I'm not good enough to do that yet :P But more realistically, it won't ever happen, since this will probably add a lot of overhead for what is arguably an edge case.
Build a small server that sits in front of Redis
Such a server would simply authenticate incoming requests via some mechanism (Keystone perhaps), and then enforce ACLs. It will have to speak the exact same protocol as Redis, since users should be able to use any library that connects to Redis. This isn't too hard – just replace the password functionality of the Redis protocol to take in both a username and password (or token, or some other method of auth).
I discarded this too – it will still affect performance, which will now be limited by how fast this server runs, and that is definitely not faster than Redis. And it will also be hard to maintain, since I'll have to completely mimic Redis' protocol and make sure it is kept up to date. Debugging protocol issues with random client libraries is not my idea of fun. Another major disadvantage is that I would now be writing auth code, and I don't think handrolling auth code is a good idea, ever.
Security By Obscurity!
This is what we finally settled on :D It sounds horrible by the title, but I think it is Good Enough™.
Since Redis is a key value store at heart, you can do anything once you know the key. So, if an 'attacker' doesn't know the key, there isn't much they can do. So it can be considered SecureEnough for our purposes if we can make it so that other users can not find out or guess your keys.
We essentially did so with the following:
1. Disable all Redis commands that let users list all / many keys.
2. Have users use a random and long key prefix for all their keys.
(1) prevents someone from just listing all keys to find something interesting. (2) prevents people from brute-forcing or guessing keys. Since all code run on Tool Labs must be open source, guessing key names is otherwise super easy: you can just read them off the source. With a 'secret' prefix in front, knowing the key names alone is useless. This also prevents accidental key overwrites from different tools using a common key name.
Disabling commands is easy to do by using Redis' rename-command config directive. I added support for rename-command to Wikimedia's Redis puppet module, and then it was simple enough to configure a specific instance to disable 'list keys' type commands. That's the following commands:
After going through the list of Redis commands, I am guessing this is going to be GoodEnough to prevent key listing. (Note: if there's more that I'm missing, please, please let me know).
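For illustration, here is a small sketch of how a list of command names turns into redis.conf lines. Renaming a command to the empty string is how rename-command disables it outright. The two commands used here (KEYS and RANDOMKEY) are only examples I picked because they obviously expose key names; they are an assumption, not the actual list that was deployed, and the helper name is hypothetical.

```python
def render_rename_commands(disabled):
    """Render one redis.conf line per command, disabling each of them.

    Renaming a command to "" makes Redis refuse it entirely.
    """
    return "\n".join(
        'rename-command {} ""'.format(name.upper()) for name in disabled
    )


# Assumption: an illustrative list only, not the commands actually disabled.
HYPOTHETICAL_DISABLED = ["KEYS", "RANDOMKEY"]

config_fragment = render_rename_commands(HYPOTHETICAL_DISABLED)
print(config_fragment)
```

In practice the puppet module would own this fragment; the sketch just shows the shape of the resulting config.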
We also tell people to use a secure prefix that's at least 64 bytes long, saved in a file that is only user readable. Generating that is as simple as:
openssl rand -base64 64
That should be long enough to be hard to brute force, even with Redis being as fast as it is.
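If you would rather do the whole thing from inside a tool, a rough Python equivalent might look like this. It is a sketch under a few assumptions: the function names, the file path you pass in, and the ":" separator between prefix and key are all my own choices for illustration, not something Tool Labs prescribes.

```python
import base64
import os


def generate_prefix(num_bytes=64):
    """Return num_bytes of OS randomness, base64 encoded like the openssl call."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


def save_prefix(path, prefix):
    """Write the prefix to a file readable and writable only by its owner."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(prefix)
    # Tighten permissions even if the file already existed with a looser mode.
    os.chmod(path, 0o600)


def load_prefix(path):
    """Read the saved prefix back, ignoring any trailing newline."""
    with open(path) as handle:
        return handle.read().strip()


def prefixed(prefix, key):
    """Build the key actually sent to Redis (the ':' separator is an assumption)."""
    return "{}:{}".format(prefix, key)
```

Every Redis call in the tool then goes through something like prefixed(load_prefix(path), "counter") instead of using the bare key name.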
The major problem with this is, of course, the fact that humans are involved :) I've heard "I do not care about my keys, do not need security" a fair number of times already. The fact that the prefix generation is optional means that there will be people who do not use prefixes, and it will work for them for a (probably) long time – until it doesn't, and they have no idea why. This is personally acceptable to me, since they have been made aware of the risks beforehand.
This has now been deployed on Tool Labs for a month or so, and I've already written a couple of fun tools using it (and so have other people). We had a patched memcached server that we'll kill in a few weeks, so people who used memcached before are also migrating to Redis. And I was able to do all this without even having root! This is mostly thanks to the fact that we try to keep all our configuration in puppet (Wikimedia's Puppet repository) – for both our production cluster and for everything else. So I could re-use our production redis module, make changes to it, and build the new solution – all while being vetted by 'proper' ops people (whom I dearly love and respect). Building infrastructure in such a collaborative manner is a lot of fun, and I think I'm hooked. It's fun!
Mediawiki has a number of humorous bug reports, though not as many as I’d like. Still, they’re quite funny.
I think my favorites are the request for RESOLVED MURDERED, the real reason why MediaWiki runs on MySQL, someone asking where the community consensus is to stop Volcanoes exploding, offers of shamanic help, a ground breaking report on US Government corruption, and a warning about zoning laws required for shipping Ponies,