James Forrester

Tags	Research, Strategy, Tools
Primary Session	Evolving the MediaWiki Architecture
Secondary Sessions	Research, Analytics, and Machine Learning

Fundamentally, Wikimedia's technology are tools to achieve our mission - absolutely vital tools, but not objectives in themselves. Where a tool has dulled we should sharpen it, where it has rusted we should polish it, and where it has blunted we should replace it.

The majority of our tools have sprouted over time in response to immediate needs, and grown ad hoc when we've spotted something they can also do, or been pruned back when they proved too unwieldy to retain. Our communities have taken these tools and built amazing things with them, often despite rather than in line with their intended use. Subsequently these unplanned use patterns have shaped what we think about the tools and how they should be used, when we do so.

This haphazard, tactical development has worked well enough, but has limited us in several ways. We often fail to serve some of our audience because we rush in with a quick fix that listens to a few voices and decides that that's the best thing to build. When we've tried to build more systemic change, it's often been unrooted in serious evidence, and so is like constructing ivory towers into the clouds: baffling, hopeless, and unfamiliar.

We should develop comprehensive methods to collect and monitor actionable data on how well our tools are serving their purposes, and where we can improve. This should come from all stakeholders, covering our great, already-empowered, experienced editors in major languages but also those from whom we rarely hear - those contributing in and speaking smaller languages or not interacting with other users on meta-editing issues, and those with a looser relationship to the movement like readers and casual editors.

We should have numbers clearly attached to our tools as to how we expect them to perform. How these are obtained will differ. Sometimes quick numbers like success rates of false positives against false negatives from anti-abuse features, or how many users having made changes try to press the submit button, will work. Sometimes simple surveys with expected happiness thresholds will be appropriate. In others we may need to work harder to come up with the right way to understand how different tools and experiences interact with each other, like how much "knowledge" readers successfully glean from the article, or whether the burden of allowing logged-out editing is worth the mindshare of "anyone can edit" feeling true.

Ideally, changes to user features and especially introductions of new features should progressively roll out based on these numbers - and if they have adverse effects, they should be automatically rolled back. This is how others operate, but it's very distant from today. It's a far-off dream now, but I believe we can build it.