Evolving the MediaWiki Architecture

From devsummit
Related Phabricator Task: T183313
Topic Leaders: Daniel Kinzler, Corey Floyd

13 primary statements. 4 secondary statements.

  • Analytics
  • API
  • Architecture
  • Code Review
  • Collaboration
  • Communities
  • Complexity
  • Contributors
  • Cross-wiki
  • Data Center
  • Discussions
  • Documentation
  • Drupal
  • Gadgets
  • Hosting
  • Infrastructure
  • Innovation
  • Installation
  • JavaScript
  • Job Queue
  • Knowledge as a Service
  • Lua
  • Microservices
  • Mobile
  • Multi-Datacenter
  • New Users
  • Open Source
  • Performance
  • Refactoring
  • Research
  • Security
  • Strategy
  • Technical Debt
  • Templates
  • Third Parties
  • Tools
  • User Experience


Author | Tags | Primary Session | Secondary Sessions | Position Statement
David Barratt | API, Drupal, Mobile, Open Source | Evolving the MediaWiki Architecture | Supporting Third-Party Use of MediaWiki

The Mission

In a Wikimedia cultural orientation, the moderator instructed the class by explaining that 'technology is not part of our mission, technology is only a means to an end.' While it may be appropriate to use digital technology in order to disseminate free educational content effectively and globally, the existence of Wikimedia Technology is not strictly part of Wikimedia's mission. Wikimedia is therefore stuck in a precarious position of maintaining a large open source software project only as a means to an end. This produces a double-minded mentality within the movement: satisfying the mission versus satiating a massive software operation. It is with this understanding that Wikimedia ought to consider partnering with an existing open source community in order to evolve MediaWiki to support the mission, maintain and grow the technical community, and build technologies necessary for embracing mobility.

While Wikimedia's mission statement does not cover technology, the mission (https://www.drupal.org/about/mission-and-principles) of Drupal 'is to build the best open source content management framework.' In addition, Drupal is more than capable of handling all of Wikimedia's traffic needs and is flexible and modular enough to allow us to implement all of MediaWiki's features in a UI and API backwards-compatible way. In fact, every feature of Drupal core is an extension and every extension is a first-class citizen that has full control over every aspect of Drupal. Perhaps the next major version of MediaWiki ought to be a collection of Drupal extensions that can be run independently and are also available in a pre-configured 'MediaWiki' distribution of Drupal (https://www.drupal.org/project/project_distribution).

The primary user of MediaWiki is, by far, Wikimedia. While the software can be run by others outside of Wikimedia, its usage outside of Wikimedia is extremely low (https://trends.builtwith.com/cms/MediaWiki). Because of the low adoption rate, it is difficult to gain any users outside of Wikimedia. As a result, there are very few developers outside of the movement who contribute to its development, and there is almost no outside financial commitment to the project. In contrast, Drupal's usage within the top 1,000 websites is 14 times (https://trends.builtwith.com/cms/Drupal) that of MediaWiki. 'The Drupal community is one of the largest (https://www.drupal.org/about/mission-and-principles) open source communities in the world.' By building MediaWiki on top of Drupal, Wikimedia would be tapping into a user and developer base that is substantially larger than MediaWiki's. Wikimedia would be assisting Drupal in their mission and Drupal would be helping Wikimedia in theirs.

Drupal is committed (https://dri.es/drupal-is-api-first-not-api-only) to an API-first strategy. This strategy has enabled Drupal to expose all of its resources through a consumable, highly cacheable API. They believe strongly in this strategy because it is part of their mission, and pursuing it helps others, like Wikimedia, achieve their missions on a global scale. By embracing the API-first strategy, Wikimedia would propel its mobile development into the future.

To further Wikimedia's mission, the Foundation should consider building its software on Drupal. Doing so would facilitate evolving MediaWiki to support the mission, maintaining and growing the technical community, and building the technologies necessary for embracing mobility.

Jan Dittrich | Documentation, Gadgets, Infrastructure, JavaScript, Open Source, Research | Evolving the MediaWiki Architecture | Research, Analytics, and Machine Learning

I believe that we need to achieve a better separation of concerns - in code as well as in product work and in our communication with the communities - to reduce the debt we build up in these areas. Therefore, I want to suggest three interrelated topics:

  • Use of modern MVVM frameworks for our front end code, to develop more efficiently
  • Provision of a modern customization infrastructure, to decouple gadgets from our code
  • Participation beyond code and feature wishes
  1. Use of modern MVVM frameworks for our front end code

Traditionally, MediaWiki has been focused on PHP. Over the last few years, more and more interactivity has been implemented in JavaScript. Together with the significant growth of the JavaScript ecosystem, this could have meant quicker development and a clearer separation of concerns. However, since our solutions are mostly usable only in a MediaWiki context, involving external developers has been difficult, and so has onboarding developers in core teams.

I see a large opportunity in introducing modern MVVM libraries that are open source and not limited to use in Wikimedia software. We could build upon others' experience as well as their documentation - things that have traditionally been problematic in our isolated MediaWiki solutions.

Strong contenders are React and Vue.js. While the React ecosystem is larger, I would recommend Vue for its better documentation and clear compartmental structure, which hopefully helps us to avoid further isolated solutions.

  2. Provision of a modern customization infrastructure

The introduction and wider use of an MVVM framework could also be a chance to provide clear frontend APIs for gadgets. Gadgets currently rely on DOM hacks, which break continually and would not be possible at all with a modern frontend framework (due to DOM flushing).

Why should we bother? Because we have a large user base in which different tasks are shared using specific tools, just as every manual trade has many different specialized, often even customized, tools.

Additionally, gadgets and user scripts could provide a low-barrier opportunity to onboard new developers. Other projects have successfully shown that user-provided extensions can enrich an ecosystem with user-driven innovation and help with onboarding developers, e.g. Firefox's and Chrome's WebExtensions as well as LibreOffice.

I would like to work on finding a way to fulfil the possibilities of gadgets and extend them, while providing sustainable and secure infrastructure for doing so.

  3. Participation beyond code and feature wishes

We already do extensive user research. A large area for expansion and further development is doing this research and sense-making *with* the community. This may already happen implicitly, based on feature- or UI-focused requests from community members. But this has large caveats: the requested solution may not be feasible or sustainable to implement. Furthermore, without understanding the underlying need, we risk building up technical and UX debt, and we give up the chance to learn from our community.

To achieve an active, needs-based involvement of communities in design and research, we could build on existing participatory design methods and integrate them into our research and product planning frameworks. Clearly integrating the community into up-front research could enable us to gather needed knowledge, have community participation, and reach a better understanding between the Wikimedia Foundation and the communities, as well as among the communities themselves. I want to define future participatory design strategies to be used on our way towards 2030.

Eric Evans | Architecture, Infrastructure, Strategy | Evolving the MediaWiki Architecture | none

More than just servers

A modern approach to tooling and infrastructure is needed, not only so that we can scale to future content and users, but also so that we can scale the deployment of new technologies and services.

Our way of approaching infrastructure is in need of modernization. What we do, we do well, but if we are to grow our capacity, not just for storing and serving content but for creating the technologies that empower movement strategies, we need to depart from the idea of infrastructure as merely clusters of servers. We need high-level, easily consumed platforms for computation, storage, deployment, and management. We need environments that make experiments cheap, and allow teams to fail fast or iterate on the next stage quickly. We need systems that are distributed by nature, self-service, secure, and that are able to provide insight into availability and performance.

Some efforts have been (or are being) made here. Examples include recent work toward a Kubernetes deployment, a change-propagation service, and RESTBase. These efforts, while worthwhile, are piecemeal, and no holistic strategy exists. It is my belief that as we discuss the future of our platform, we should consider the requirements in the context of the bigger picture, and discuss a strategy aimed at modernizing our infrastructure.

Matthew Flaschen | Discussions, Open Source, Strategy | Evolving the MediaWiki Architecture | Embracing Open Source Software

We need to re-evaluate scaling, on both the technical community side and the content side.

On the technical side, too often we think as if we were an isolated organization, rather than a respected leader that many wish to collaborate with. This causes us to ask ourselves the wrong questions and get the wrong answers.

For example, we asked ourselves whether we should limit ourselves to existing open source translation tools, or use proprietary translation services to fill in the gaps. Instead, we should have stayed committed to open source, and asked how we can use our engineering and financial resources to advance open source translation. This is a major problem that no organization can solve on its own. However, we have both the motivation and resources to be a major contributor to the solution.

We also asked whether we should support the proprietary MP4 format, or limit ourselves to weak device support for open formats. Instead, we should be staying committed to open standards, and working to support their uptake among software developers and device manufacturers. We already have significant relationships with wireless carriers that give us a foot in the door with such manufacturers.

By seeking important partnerships, where we are prepared to put in significant effort, we can greatly scale both our own efforts and those of the broader movement.

On the content side, to achieve sustained long-term growth, we need to grow every type of user activity, including writing, editing, discussion, organization, curation, maintenance, workflows, and moderation. We have historically provided good (and improving) support for writing, editing, discussion, and moderation.

However, we have neglected the related processes of organization (e.g. categorization, tagging), maintenance (e.g. tracking articles that need fixes, updating them as they become out of date), curation (e.g. quality images, featured articles), and workflows (used in multiple areas, but particularly supporting organization, maintenance, curation, and moderation).

It is vital that we improve discussion, curation, workflows, and moderation tools. Otherwise, we will be unable to keep up with increasing content and activity as our improvements to writing and editing succeed. We should look at past successes (e.g. the Teahouse) and failures (e.g. Article Feedback) and learn lessons. In both cases, we made a very specific product, which then succeeded or failed. This is not scalable to hundreds of wikis, and it is hard to iterate in response to lessons learned.

Instead, we should focus on platforms, such as workflow systems. In order to keep up with the community, we need to give them the flexibility to keep adapting the software to their needs.

Corey Floyd | none | Evolving the MediaWiki Architecture | none
James Forrester | Research, Strategy, Tools | Evolving the MediaWiki Architecture | Research, Analytics, and Machine Learning

Fundamentally, Wikimedia's technologies are tools to achieve our mission - absolutely vital tools, but not objectives in themselves. Where a tool has dulled we should sharpen it, where it has rusted we should polish it, and where it has blunted we should replace it.

The majority of our tools have sprouted over time in response to immediate needs, and grown ad hoc when we've spotted something they can also do, or been pruned back when they proved too unwieldy to retain. Our communities have taken these tools and built amazing things with them, often despite rather than in line with their intended use. Subsequently these unplanned use patterns have shaped what we think about the tools and how they should be used, when we do so.

This haphazard, tactical development has worked well enough, but has limited us in several ways. We often fail to serve some of our audience because we rush in with a quick fix that listens to a few voices and decides that that's the best thing to build. When we've tried to build more systemic change, it's often been unrooted in serious evidence, and so is like constructing ivory towers into the clouds: baffling, hopeless, and unfamiliar.

We should develop comprehensive methods to collect and monitor actionable data on how well our tools are serving their purposes, and where we can improve. This should come from all stakeholders, covering our great, already-empowered, experienced editors in major languages but also those from whom we rarely hear - those contributing in and speaking smaller languages or not interacting with other users on meta-editing issues, and those with a looser relationship to the movement like readers and casual editors.

We should have numbers clearly attached to our tools that state how we expect them to perform. How these are obtained will differ. Sometimes quick numbers will work, like the rate of false positives against false negatives from anti-abuse features, or how many users who have made changes go on to press the submit button. Sometimes simple surveys with expected happiness thresholds will be appropriate. In other cases we may need to work harder to come up with the right way to understand how different tools and experiences interact with each other, like how much "knowledge" readers successfully glean from an article, or whether the burden of allowing logged-out editing is worth the mindshare of "anyone can edit" feeling true.

Ideally, changes to user features, and especially introductions of new features, should roll out progressively based on these numbers, and if they have adverse effects, they should be rolled back automatically. This is how others operate, but it is very distant from how we work today. It's a far-off dream now, but I believe we can build it.
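
A minimal sketch of what metric-gated rollout with automatic rollback could look like, in Python. The stage schedule, the single "higher is better" metric and the threshold are hypothetical assumptions for illustration, not an existing Wikimedia system; a real controller would also need enough traffic per stage to make each measurement meaningful, and would usually watch several metrics at once.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical rollout schedule: percentage of users who get the new feature.
STAGES = [1, 5, 25, 50, 100]


@dataclass
class Rollout:
    """Progressive rollout gated on one success metric where higher is better."""

    threshold: float  # minimum acceptable metric value, e.g. an edit success rate
    stage_index: int = 0
    rolled_back: bool = False
    history: List[Tuple[int, float]] = field(default_factory=list)

    @property
    def percent(self) -> int:
        # After a rollback nobody gets the feature.
        return 0 if self.rolled_back else STAGES[self.stage_index]

    def observe(self, metric: float) -> int:
        """Record the metric measured at the current stage, then advance or roll back."""
        if self.rolled_back:
            return 0
        self.history.append((self.percent, metric))
        if metric < self.threshold:
            self.rolled_back = True  # adverse effect: revert without waiting for a human
        elif self.stage_index < len(STAGES) - 1:
            self.stage_index += 1
        return self.percent


rollout = Rollout(threshold=0.95)
print(rollout.observe(0.97))  # 5
print(rollout.observe(0.96))  # 25
print(rollout.observe(0.90))  # 0: metric fell below the threshold, so it rolled back
```

The design choice worth noting is that rollback is the default reaction to a bad number; a human can still decide to retry, but nobody has to notice the problem first.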

Roan Kattouw | Architecture, Contributors, Data Center | Evolving the MediaWiki Architecture | Advancing the Contributor Experience

Users should not be punished for logging in

WMF wikis are slower for logged-in users than for anonymous users, which is unhelpful when we are trying to get users to contribute. This is a long-standing problem that's hard to solve, but we should have a vision for how we're going to solve it.

WMF has caching data centers in strategic locations around the world (Amsterdam, San Francisco and soon Singapore), which make the wikis faster for users who are not near the primary data center (in Virginia) but are near a caching location. However, this only benefits anonymous users. For logged-in users, every page view contains their user name and other user-specific information in the personal tools area, so logged-in page views are considered uncacheable and are always routed to the primary data center.

This means that if a new user browses the site for a while, then creates an account because they want to contribute (or makes an anonymous edit), the wiki suddenly becomes slower for them. All users are affected, because uncached requests are slower to serve than cached ones, but users outside North/South America are affected the most, because their traffic now has to cross an ocean that it didn't have to cross before. It's not nice that a new user's 'reward' for creating an account is a slower experience, but it's especially not nice that users in emerging communities are affected the most. If we want to encourage readers to become contributors, slowing the site down as soon as someone contributes is not very helpful.

Some requests will always have to go to the primary data center, such as POST requests saving an edit, and those are always going to be slower for users outside North America. But for logged-in page views this isn't fundamentally necessary, and serving them from the edge caches would speed up the site for logged-in users and reduce the load on the app servers. There are different ways that this could be done, each with its own obstacles. For example, a single-page application for MediaWiki could use a content service to retrieve only the new page's contents when navigating, but this would require modifying or rewriting a lot of code in MediaWiki; ESI could be used to have Varnish inject cached page contents into a user-specific chrome, but that would require using advanced and partly unproven Varnish features. In both cases, we'd have to reimplement certain rendering preferences using CSS or a post-processing step. It's far from trivial, but let's start talking seriously about how we can address this problem.
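
The ESI idea can be sketched as follows: the user-independent article body is cached once and shared by everyone, while only the small personal tools area is rendered per user. This is plain Python standing in for what Varnish would do at the edge (where the user-specific chrome would carry an <esi:include> tag pointing at the cached body); the function names, cache and markup are hypothetical.

```python
import html
from typing import Dict, Optional

# Hypothetical edge cache of rendered article bodies, keyed by title.
# The bodies contain nothing user-specific, so every visitor can share them.
content_cache: Dict[str, str] = {}
backend_renders = 0  # counts trips to the primary data center


def render_at_primary(title: str) -> str:
    """Stand-in for the expensive parse done by app servers in the primary data center."""
    global backend_renders
    backend_renders += 1
    return f"<p>Article text of {html.escape(title)}</p>"


def get_content(title: str) -> str:
    # Only a cache miss has to travel to the primary data center.
    if title not in content_cache:
        content_cache[title] = render_at_primary(title)
    return content_cache[title]


def render_chrome(user: Optional[str]) -> str:
    # The small user-specific part: the personal tools area.
    if user is None:
        return '<div class="personal-tools"><a href="#login">Log in</a></div>'
    return f'<div class="personal-tools">{html.escape(user)}</div>'


def assemble_page(title: str, user: Optional[str]) -> str:
    """Splice the shared cached body into per-user chrome, as ESI would do at the edge."""
    return f"<html><body>{render_chrome(user)}<main>{get_content(title)}</main></body></html>"


anon = assemble_page("Earth", None)
alice = assemble_page("Earth", "Alice")
print(backend_renders)  # 1: both page views reused one cached body
```

Note that rendering preferences which change the body itself (rather than the chrome) would break this sharing, which is why the text mentions reimplementing them via CSS or post-processing.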

Daniel Kinzler | Architecture, Collaboration, Hosting, JavaScript, Mobile, Strategy | Evolving the MediaWiki Architecture | Research, Analytics, and Machine Learning

I would like to discuss how assumptions drive our day-to-day work, and how to make sure we properly understand and regularly challenge these assumptions. I'm particularly interested in how technological assumptions shape product decisions, and how product assumptions shape technological decisions. Three major axioms come to mind:

MediaWiki needs to run in a shared hosting environment. This has been an explicit requirement for a long time now, but the baseline product that actually does run in such an environment (LAMP with no root access) is becoming more and more sub-par. We are already struggling to provide a decent mobile browsing experience there, not to mention search or WYSIWYG editing. So we should have a discussion about how long we want to keep this requirement, what the consequences of dropping it would be, and what alternative platform we should target for the baseline installation of MediaWiki.

Editing has to work with old browsers and without JavaScript. It has long been an explicit requirement that no basic functionality, particularly editing, can require JavaScript to be enabled. However, this causes us to fall behind other sites further and further. With more and more sites requiring JS, it's becoming less and less clear to me that this requirement is still sensible. This is especially true in the light of many developing countries skipping straight from mostly-offline to mobile-only.

The primary medium for knowledge sharing is text. This assumption used to be hard-coded into MediaWiki until the introduction of ContentHandler, and it still seems to be hard-coded in the minds of many long-term contributors, to the software and to the wikis. I believe that it is high time to invest in exploring other media formats and alternative forms of collaboration. It seems to me like "Beyond Wikitext" is the major technological challenge that has come out of the movement strategy process, and that we should start thinking and talking about it - from the technological side as well as the product side.

Niharika Kohli | Communities, Cross-wiki, Gadgets, Lua, New Users, Templates, Tools | Evolving the MediaWiki Architecture | Next Steps for Languages and Cross Project Collaboration

Investing in our communities

This position statement captures my thoughts about why and how we should be investing in our communities. There are a lot of ways we can encourage and support them that we currently don't. Prioritizing the building of tools for our communities is a crucial step for the long-term survival of our projects.

It's fairly common knowledge that a lot of our communities suffer from toxicity. It's incredibly hard for newcomers to edit, to stick around, and to stay engaged amid the existing toxicity in the community. The problem frequently also exists in smaller communities. Just recently, the English Wikipedia community pushed WMF into implementing ACTRIAL, preventing brand-new users from creating articles on the site. These are signs that all is not well with our communities. If we envision a future with an active, thriving editor community 15 years from now, we have to become more aware of how our communities function and do more to support them than we do today.

The problems also exist on the technical side. Communities without technical resources lose out on gadgets, templates, editing toolbar gadgets and so on, and the editors on these wikis are still forced to do a lot of things the hard way. Non-Wikipedia projects are probably the worst affected. Quite often our software projects cater only to the bigger projects, often just the Wikipedias. I am sure we can't solve everything, but I'm sure we can try to help solve at least some of the problems. We can invest in better tools for new users to create articles and to edit and experiment with wikitext markup. We can build a better onboarding experience for new users. For example, English Wikipedia currently has the "Article Creation Wizard" (https://en.wikipedia.org/wiki/Wikipedia:Article_wizard), which is outdated, poorly maintained and often very confusing. We can think about a more standardized solution that would be useful across wikis. We can also try to showcase user contributions in a better way, to build user engagement. Various wikis have been striving to create and sustain "WikiProjects" for a while, with the result that several big Wikipedias have come up with their own homegrown solutions. These are things the Foundation can help build and standardize for all wikis.

For the technical problems, there is a big backlog of projects that are long overdue. Global cross-wiki watchlists and global gadgets, templates and Lua modules have been requested by the community for many years now. There are many more such projects to be found on Phabricator and in the wishlist survey. These projects can be building blocks in making our communities more sustainable and thriving places. They are big and important enough that they should make it into the product roadmaps of teams outside of Community Tech. Another important thing we should think about is tools. Pageviews Analysis, for example, is one of the most important volunteer-maintained tools out there. What happens when it stops being maintained? When is a tool important enough for the Foundation to start thinking about incorporating its functionality into an extension or core? These are all important discussions to be had.

Giuseppe Lavagetto | Analytics, Data Center, Infrastructure, Job Queue, Multi-Datacenter, Performance | Evolving the MediaWiki Architecture | none

The future of the MediaWiki infrastructure at the Wikimedia Foundation.

MediaWiki is at the core of the infrastructure that serves all of the Wikimedia projects, and the current setup of MediaWiki in production poses various challenges: from the future of our current runtime (HHVM), to the ability to serve MediaWiki from multiple datacenters, to long-standing issues such as resource usage efficiency and flexibility.

Here are some of the things we have to tackle in the future.

Transition off of HHVM:

Since the HHVM team has made it clear they're parting ways with full PHP compatibility, and since maintaining support for both HHVM and PHP in MediaWiki would be arduous, we need to make plans to move off of HHVM, back to PHP 7.x. This transition, while technically necessary, should not come at a cost for our users: page load times should not degrade. We can ensure this by marking responses coming from either engine, collecting metrics and analyzing the data. To do so, we should run the two runtimes in parallel on the same servers (which have plenty of capacity, given that no MediaWiki cluster has a utilization over 40%); we will then be able to programmatically route individual users, a percentage of traffic, or even specific wikis to one or the other. The deadline for this transition is set for the end of 2018 (EOL of the last compatible version of HHVM), and planning and resources should be allocated to this goal.
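
To illustrate the routing described above, here is a minimal sketch of a deterministic split between the two runtimes. Hashing a stable client key (for example a user ID or session cookie, which is an assumption here) keeps each user on one engine, which makes per-engine page load metrics comparable across requests. The wiki names, the percentage and the engine labels are hypothetical; in production such a decision would more likely live in the load balancer or cache layer than in application code.

```python
import hashlib

# Hypothetical routing policy while HHVM and PHP 7 run side by side.
PHP7_WIKIS = {"testwiki"}  # wikis sent entirely to PHP 7 (illustrative)
PHP7_PERCENT = 10          # share of the remaining traffic sent to PHP 7


def bucket(client_key: str) -> int:
    """Map a key to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_engine(wiki: str, client_key: str, opt_in: bool = False) -> str:
    """Return "php7" or "hhvm"; the same label can tag the response for metrics."""
    if opt_in or wiki in PHP7_WIKIS:
        return "php7"  # explicit opt-in or a whole wiki pinned to PHP 7
    return "php7" if bucket(client_key) < PHP7_PERCENT else "hhvm"
```

Raising PHP7_PERCENT step by step moves more users over without reshuffling those already on PHP 7, since a user's bucket never changes.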

Multi-Datacenter support:

We are currently using our datacenters in an active/passive setup, as far as MediaWiki is concerned. While this is acceptable in principle, it is a huge waste of resources: it means that 50% of our servers are doing nothing at all, and it limits our ability to expand the number of core datacenters we can use. Diverting the read load to secondary datacenters could also allow us to use caching less aggressively where it is not needed. There is already a program underway to add first-class multi-DC support to MediaWiki, so we can focus on what specifically needs to be done to achieve this long-time goal: our final goal should be to serve readers' traffic from all datacenters, and to be able to switch the "master" datacenter in a matter of minutes.
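
One way to picture the active/active goal is a router that sends reads to the nearest datacenter and writes to whichever datacenter is currently the master, so that switching the master becomes a configuration change. The sketch below is hypothetical: the recently_wrote flag is an assumption standing in for the read-your-writes stickiness a real multi-DC setup needs after an edit, and eqiad and codfw are used only as example datacenter names.

```python
from dataclasses import dataclass
from typing import Tuple

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}  # requests that do not change state


@dataclass
class DatacenterRouter:
    """Hypothetical active/active router: reads stay local, writes go to the master."""

    master: str
    datacenters: Tuple[str, ...]

    def route(self, method: str, nearest: str, recently_wrote: bool = False) -> str:
        # A user who just edited is kept on the master briefly so they see their own change.
        if method.upper() in SAFE_METHODS and not recently_wrote and nearest in self.datacenters:
            return nearest
        return self.master

    def switch_master(self, new_master: str) -> None:
        # In this model, promoting another datacenter is just a configuration change.
        if new_master not in self.datacenters:
            raise ValueError(f"unknown datacenter: {new_master}")
        self.master = new_master


router = DatacenterRouter(master="eqiad", datacenters=("eqiad", "codfw"))
print(router.route("GET", nearest="codfw"))   # codfw: read served locally
print(router.route("POST", nearest="codfw"))  # eqiad: write goes to the master
```

The hard part in practice is everything this sketch hides: replication lag, sessions and caches that must be consistent across sites before reads can safely stay local.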

Elasticity, resource usage efficiency:

At the moment, our infrastructure is plainly inadequate to react to sudden spikes of non-wiki content production and to changes that generate a lot of asynchronous jobs, such as a change to a popular template. The issues with the current job queue are widely known and publicized, but even the current transition to a new model won't solve the resource starvation that results in a degraded user experience. Moreover, a single editor uploading videos via video2commons can easily overflow our media processing capacity for weeks. This happens because we allocate our resources statically (we have 4 videoscalers per datacenter, for example), our resource consumption is inefficient, and reallocating servers requires time and effort. Modern application stacks are elastic, meaning that scaling the capacity of a single cluster or functionality up or down can be handled programmatically and/or manually whenever the need arises, allowing the infrastructure to react to such changes. For economic and privacy/security reasons, Wikimedia doesn't make use of external cloud services, so the only way to achieve such flexibility is to build a serviceable infrastructure that can serve MediaWiki and any other project Wikimedia will support: the effort to do that is underway with the rollout of our Kubernetes-based IaaS in production. I think we should work, sooner rather than later, on moving the MediaWiki application stack (and maybe its semi-ephemeral caching) to the Kubernetes platform. While the advantages of such an approach seem clear, it won't come without costs: specifically, habits around code deployment, testing and configuration changes will need to be completely revisited and superseded by new approaches.
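
For a sense of what scaling up or down programmatically means, the Kubernetes Horizontal Pod Autoscaler documents a proportional rule: desired replicas = ceil(current replicas * current metric / target metric), with no change when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below applies that rule to a hypothetical per-replica metric, queued jobs per videoscaler; the replica bounds are illustrative assumptions, and this is not how Wikimedia's videoscalers are actually configured.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 50,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling rule as documented for the Kubernetes Horizontal Pod Autoscaler.

    current_metric and target_metric are per-replica values, here assumed to be
    queued jobs per videoscaler; target_metric must be positive.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: don't flap
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 30, 5))   # 24: a video upload burst scales up
print(desired_replicas(24, 1, 5))   # 5: capacity is released once the backlog drains
```

The upper bound matters as much as the formula: it is what keeps one editor's upload burst from consuming capacity that other services need.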

Marko Obrovac | Architecture, Microservices, Refactoring, Technical Debt | Evolving the MediaWiki Architecture | none

All of the Wikimedia projects have, in technical terms, MediaWiki - the software - at their core. Thanks to this fact, MediaWiki has become a widely deployed system, drawing many volunteer developers. Alas, there is a disparity in scale between the WMF-run install and other, external setups, which hinders the speed with which the platform supporting the Wikimedia projects can evolve. On the other hand, architecting microservices has proven to be a good way of achieving scalability, increasing developer productivity, improving maintenance and reducing technical debt. Gradually moving towards 'de-monolithising' our core infrastructure will enable developers (both WMF staff and volunteers) to start working on all sorts of interesting features, ranging from simple add-ons to full-blown companion sub-systems. While this transition is (arguably) already happening, everything still gravitates around MediaWiki - the software. Instead of focusing our efforts on compatibility across scales (e.g. one JobQueue system for WMF, another for external installs), we should focus on the products and features that allow the projects to grow, both in terms of the number of projects and the features they offer, and in the number (and diversity) of their users. Microservices can greatly help in achieving this goal, since each install can select the components it wants to run based on the resources at its disposal and its potential reach or scale. Much like the advent of extensions enabled various parties to complete their systems with the functionality they sought, microservices can refocus our technical community on features and components, without worrying about scaling them (up and/or down). If we want our developers to assist the Wikimedia projects and their communities, we need to bring our core infrastructure into the 21st century. Let's not leave the technology behind - it is central to the success of the communities we are trying to enable.

Brion Vibber | API, Gadgets, JavaScript, Mobile, Security, Templates, Tools | Evolving the MediaWiki Architecture | none

Infrastructure for Open: Safe code sharing in the Wikiverse

Wikipedia has always been a place where people build things, starting with MediaWiki itself... Talk pages were created out of formatting conventions manually followed. Templates and Lua modules grew out of users' need to automate common markup & text blocks. Gadgets came about to let users add new capabilities to their experience.

To scale our users' ability to work, we need to build modern infrastructure and APIs for on-wiki code: templates, gadgets, and custom workflows.

First, gadgets and templates need to be maintainable and sharable in a centralized place; copy-pasting doesn't scale. Integrate with "real developer" tools like git, so complex tools can be edited and archived off-wiki.

Second, they need to be safer and more future-proof. Template & module output is in wikitext, a fragile format; consider separating sanitized "true" templates from the data sources.

JS gadgets can access internal or deprecated APIs that may break ... or hijack a session as malware! We should create narrower APIs and run the gadgets in isolated JS contexts to provide fault isolation -- this would also enable using them in different contexts such as mobile apps, by implementing the same interfaces.
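
The narrower, isolated API could take the shape of a message-passing host: the gadget runs in its own context (in a browser, for instance a sandboxed iframe or Web Worker talking via postMessage) and can only ask the host for allowlisted actions. The Python sketch below shows only the shape of that interface, not real isolation; the action names are hypothetical, and a mobile app could supply a different handler table behind the same interface.

```python
from typing import Any, Callable, Dict

# A handler receives the message parameters and returns a result.
Handler = Callable[[Dict[str, Any]], Any]


class GadgetHost:
    """Serves requests from an isolated gadget through a narrow, allowlisted API.

    Web pages and mobile apps could each construct a host with their own handler
    table while gadgets keep calling the same action names (hypothetical ones here).
    """

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self._handlers = dict(handlers)

    def handle(self, message: Dict[str, Any]) -> Dict[str, Any]:
        action = message.get("action")
        handler = self._handlers.get(action)
        if handler is None:
            # Internal or deprecated APIs are simply not reachable.
            return {"ok": False, "error": f"action not permitted: {action!r}"}
        try:
            return {"ok": True, "result": handler(message.get("params", {}))}
        except Exception as exc:
            # Fault isolation: a failing call becomes an error reply, not a broken page.
            return {"ok": False, "error": str(exc)}


host = GadgetHost({"getPageTitle": lambda params: "Earth"})
print(host.handle({"action": "getPageTitle"}))       # {'ok': True, 'result': 'Earth'}
print(host.handle({"action": "readSessionCookie"}))  # refused: not in the allowlist
```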

Third, we need to make content "smarter" by giving it the ability to run interactive scripts safely -- a mix of what templates/modules and what gadgets can do. This can be used to make animated widgets for article pages, but more importantly could be used to implement discussion & editing workflows to supplement what you can do with just a talk page and a set of conventions. At a minimum, think of what people do with Google Forms to guide input, and let folks do that on-wiki.

On-wiki tool-building is a "force multiplier" that lets people get more done by organizing themselves. Providing better tools for tool builders will lead to happier, more productive users working for our mission.

Madhumitha Viswanathan API, Complexity, Documentation, Knowledge as a Service, Tools Evolving the MediaWiki Architecture Knowledge as a Service

We are in the business of democratizing knowledge, and I believe that lowering and removing technical barriers to entry, and creating a culture of inclusion in our technical spaces, are essential to our success.

The Knowledge as a Service aspect of our strategic direction focuses on building infrastructure and platforms that help create and share open knowledge. The key to successfully building and scaling such infrastructure, in the context of the Wikimedia technical spaces, is enabling everyone, irrespective of experience or background, to use and create research, data, and tools on top of our infrastructure. When designing infrastructure and other technical products, we often fail to take into account the technical barriers, inessential complexities, and social costs that can discourage or prevent people from leveraging them. For instance, is it enough to build a dataset and store it in a database if we do not provide friendly ways for researchers to access and analyze this data? Is it enough to put out a call to contribute to a project without providing easy-to-set-up development environments for testing changes? Is it sufficient to have a state-of-the-art environment to host applications, but not design good, simple processes for gaining access and deploying to it?

These conversations are crucial, because we are not building products for technology's sake; we are trying to build a culture where it is easy to use and contribute to our technical projects, whether you are a volunteer with a few hours to spare or a paid employee, a newcomer or a long-time contributor. We also want our technical communities to be diverse, and complex systems and processes, along with unspoken social norms about how to interact with our projects, are often biased against populations that are traditionally underrepresented in technology.

I have always worked on or pushed for creating and supporting simple graphical interfaces that provide unified access to data sources; building platforms and processes that let people easily create tools, APIs, and dashboards and host them painlessly; developing tutorials and good documentation for getting involved in our projects; and codifying friendly and inclusive social norms and promoting a culture of being excellent to each other in our technical spaces.

When talking about the future direction of new and existing projects, we should take into account the costs and barriers to access, and who we may be failing to include as a result. I hope to be that voice at the Developer Summit.


Author Tags Primary Session Secondary Sessions Position Statement
Cindy Cicalese Innovation, Installation, Open Source, Third Parties Supporting Third-Party Use of MediaWiki Evolving the MediaWiki Architecture, Growing the MediaWiki Technical Community

How and with whom should we partner to create the technologies needed to support the mission?

A substantial, growing community of MediaWiki users and developers outside the Wikimedia movement has evolved, creating wikis that vary in size, number of editors, number of readers, access restrictions, and activity. The Wikimedia movement benefits from the technology contributions and innovation of this third-party developer and user community. Similarly, this community benefits from the Wikimedia movement's stewardship of MediaWiki as the foundational technology behind Wikipedia and its sister projects. In many areas the needs of the two groups are identical, including stable, well-performing software that supports community authoring. Partnering with the third-party MediaWiki community will result in a platform that is better for all parties.

How should MediaWiki evolve to support the mission?

There is much knowledge in the world that has no place in Wikipedia or its sister projects. MediaWiki is powerful software crafted especially to support the expression of all knowledge. Third-party MediaWiki wikis can provide a home for knowledge that does not belong in Wikipedia, supporting the mission of sharing in the sum of all knowledge. For the third-party MediaWiki community to continue to thrive and grow, several impediments to MediaWiki adoption that especially affect that community must be addressed:

- Installing and maintaining all but the smallest and most basic MediaWiki installation currently requires a high level of craftsmanship and expertise.

- While a large number of novel MediaWiki extensions exist to support third-party applications, it is difficult to ascertain the level of maturity and support of these extensions.

- Some enterprise consumers require a guaranteed level of support and/or service level agreements before adopting a technology.

- The barrier to entry for those wishing to experiment with MediaWiki in production-quality environments is high.

How do we maintain and grow the technical community and ready it for the mission ahead?

The third-party MediaWiki community already contributes significantly to the code base. In the last two years, 22% of the commits to MediaWiki core were made by third-party contributors, and 62% of the authors of commits to MediaWiki core were third-party contributors. Even more striking, in the last two years, 40% of the commits to MediaWiki extensions hosted on Gerrit were made by third-party contributors, and 67% of the authors of those commits were third-party contributors. The third-party MediaWiki community is a significant training ground for skilled MediaWiki developers who are used to tackling a diverse set of challenging problems and are poised to help forge the path ahead for MediaWiki.

Derk-Jan Hartman Complexity, Gadgets, Strategy, Tools, User Experience Advancing the Contributor Experience Evolving the MediaWiki Architecture

Growth and complexity

Our strategy is pointing us towards a bold and inclusive world in terms of projects and people. Almost by definition, this will lead to increased complexity, not only in our technology but also in how we deliver it and enable people to make use of it.

In the last few years we have spent energy on creating more APIs and a more service-oriented architecture. However, we have not made comparable changes in how we design for and work with the front end of the software, which is where most people actually use everything else we build.

Here we continue to think in terms of large products and big problems to solve, and we quite often fail, or even clash with our own 'customers'. A more diverse strategy risks making us even more vulnerable to this. I have two suggestions:

Smaller engineering. Allow more time for smaller projects, smaller bugs, smaller tests of ideas, and refinement of existing software. Let's embrace the success of Community Wishlists and be closer to our communities by writing more Gadgets or tools (Toolforge) when we can, instead of going for 'the big fix'. Run three one-week tests instead of one six-month beta. Fix small bugs that annoy many people and make our website feel amateurish, and improve the experience for everyone. Work more often on the needs of smaller projects, giving them a bigger voice and improving our own solutions through more diverse experience. Be closer to our communities by working nearer to them.

The second point we should work on is to stop thinking of our platform as a website. It is a work environment for an increasingly diverse crowd. We have a limited amount of screen space and a huge number of tasks that various people want to do. Gadgets, and even more so user scripts, are hugely helpful, but have long since become unmanageable.

It is time to think beyond simple APIs and widget kits. We need to take a step towards becoming an application environment. We need users to be able to install and use complete apps made from recognizable and reusable building blocks. I want to see and use Gadgets the way my browser uses extensions. I want those extensions to put apps in recognizable and consistent spots, to allow fullscreen or split-screen views, and to have a familiar UI, without having to cram everything into the limited shared space that we have. Apps can be gateways for diversifying the specific solutions we build.
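One way to read the "Gadgets as browser extensions" idea, sketched under assumptions: each app ships a manifest declaring where its launcher lives and how it opens, and the host, not the app, decides the layout. The manifest fields (`launcher`, `view`) and the `arrangeApps` function are hypothetical; they loosely echo how browser extension manifests declare their UI, but they are not an existing MediaWiki format.

```typescript
// Hypothetical manifest fields: where an app's launcher appears and how it opens.
type LauncherSlot = "toolbar" | "sidebar";
type ViewMode = "panel" | "splitscreen" | "fullscreen";

interface AppManifest {
  id: string;
  title: string;
  launcher: LauncherSlot;
  view: ViewMode;
}

interface AppLayout {
  // Launcher titles per fixed slot, sorted so positions stay predictable.
  launchers: { toolbar: string[]; sidebar: string[] };
  // At most one app view is open at a time, so apps never fight over space.
  openApp: { id: string; view: ViewMode } | null;
}

function arrangeApps(apps: AppManifest[], openId: string | null): AppLayout {
  const launchers = { toolbar: [] as string[], sidebar: [] as string[] };
  let openApp: AppLayout["openApp"] = null;
  for (const app of apps) {
    launchers[app.launcher].push(app.title);
    if (app.id === openId) {
      openApp = { id: app.id, view: app.view };
    }
  }
  launchers.toolbar.sort();
  launchers.sidebar.sort();
  return { launchers, openApp };
}

// Usage with three hypothetical apps; the user has opened "cite".
const installed: AppManifest[] = [
  { id: "ref-check", title: "Reference check", launcher: "toolbar", view: "panel" },
  { id: "patrol", title: "Patrol queue", launcher: "sidebar", view: "fullscreen" },
  { id: "cite", title: "Citation helper", launcher: "toolbar", view: "splitscreen" },
];
const layout = arrangeApps(installed, "cite");
console.log(layout.launchers.toolbar.join(", ")); // → Citation helper, Reference check
```

Keeping placement decisions in the host is what makes spots "recognizable and consistent": an app can ask for a split-screen view, but it cannot push other apps out of the shared space.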

Birgit Müller Code Review, Collaboration, Communities, Documentation, Gadgets, Open Source, Refactoring, Tools Growing the MediaWiki Technical Community Evolving the MediaWiki Architecture

Refactoring the Open: First steps to get ready for the next level

Wikimedia's technical environment has grown into a very complex system over the past 15 years. Measured in internet years, parts of the software are ancient. When implementing a new feature, a piece of the (extended) MediaWiki software often has to be refactored first. Following this principle of a) refactoring and b) implementing something new, I suggest starting the discussion of the future technology direction by reflecting on (and possibly refactoring) the current Open Source practices and processes within the Wikimedia context.

A mono perspective won't let us survive (and is less fun, too)

When we talk about "Open Source" within Wikimedia, we're not only talking about free licenses and open code repositories. We're talking about global collaboration and the technical contributions of many. Through this, we ensure that the Wikimedia projects stay alive and evolve, that we constantly develop new ideas, and that multiple and diverse perspectives shape the development of our infrastructure and tools.

We are great at having ideas, and we are good at trying things out. But we still partly fail at prioritising the problems we know we have and addressing them accordingly.

I believe that we should

better maintain the Technical Community and find ways to grow by

  • allocating stable code review resources from paid staff for volunteer and 3rd party developers
  • improving the documentation of the code base
  • providing a single, easily accessible entry point for interested developers
  • building partnerships with Open Source communities with which we might share interests in the future (for example, communities around audio, video or translation technologies)

constantly take diverse perspectives into account by

  • finding better ways to gather and address feedback from smaller language communities and non-Wikipedia sites
  • being less Wikipedia-centric when it comes to research: future or emerging communities might not be interested in creating articles, but rather in contributing data or multimedia content, or in building tools to reuse data and multimedia content

build more bridges across local wikis and increase knowledge of local requirements by

  • fostering cross-wiki exchange (example activity: template Hackathon)
  • increasing the knowledge of the requirements that come along with different languages (example activity: multilingual support conference)

Open Source doesn't mean anything is possible - does it?

We have established processes and regulations for contributions to MediaWiki itself. But we lack processes and practices for local development that ensure both the freedom and space for the Technical Community to experiment and the stability and reliability of tools for users.

I believe that we should, for example:

  • raise the priority of implementing a code review process for JS/CSS pages on Wikimedia sites
  • start thinking about a technical sysop user right
  • make it clear which user scripts/gadgets/tools are maintained, which are stable, and which are proofs of concept or prototypes (for example, a central 'store' of maintained gadgets/tools with different levels such as "stable version" and "experimental version")
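The proposed 'store' with maturity levels can be sketched as a small catalog model: every entry records its stability level and its maintainers, so the store can label entries honestly and filter out unmaintained or experimental ones by default. The level names, fields, and functions below are hypothetical, chosen only to mirror the levels named in the list above.

```typescript
// Hypothetical maturity levels for a central gadget/tool catalog.
type Stability = "prototype" | "experimental" | "stable";

interface CatalogEntry {
  title: string;
  stability: Stability;
  maintainers: string[]; // an empty list means nobody currently maintains it
}

// Higher rank means more mature.
const STABILITY_RANK: Record<Stability, number> = { prototype: 0, experimental: 1, stable: 2 };

// Entries at or above a minimum level, optionally hiding unmaintained ones.
function visibleEntries(
  catalog: CatalogEntry[],
  minimum: Stability,
  requireMaintainer: boolean = true,
): CatalogEntry[] {
  return catalog.filter(
    (entry) =>
      STABILITY_RANK[entry.stability] >= STABILITY_RANK[minimum] &&
      (!requireMaintainer || entry.maintainers.length > 0),
  );
}

// A label that shows users the status before they install anything.
function describeEntry(entry: CatalogEntry): string {
  const upkeep =
    entry.maintainers.length > 0
      ? "maintained by " + entry.maintainers.join(", ")
      : "unmaintained";
  return entry.title + " [" + entry.stability + ", " + upkeep + "]";
}

// Usage with a hypothetical catalog.
const catalog: CatalogEntry[] = [
  { title: "Link checker", stability: "stable", maintainers: ["Alice"] },
  { title: "Draft helper", stability: "experimental", maintainers: ["Bob", "Chen"] },
  { title: "Map widget", stability: "prototype", maintainers: [] },
  { title: "Old toolbar", stability: "stable", maintainers: [] },
];
console.log(visibleEntries(catalog, "experimental").map(describeEntry).join("\n"));
// → Link checker [stable, maintained by Alice]
// → Draft helper [experimental, maintained by Bob, Chen]
```

Treating "maintained" separately from "stable" matters: an old, once-stable gadget with no maintainer ("Old toolbar" above) is hidden by default even though its stability level is high.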

Let's start refactoring.

Timo Tijhof Infrastructure, Open Source, User Experience Embracing Open Source Software Evolving the MediaWiki Architecture
  • Embrace open source and hold our software to the same standards we hold other open-source software to. This would prevent our software from becoming isolated, hard to maintain, or hard to contribute to.
  • Scale the contributor experience. Ensure our content remains of high quality and value to readers, ultimately so that we do not fail our mission. I envision this requiring a radical shift in how our application is served: a non-static service that can scale to the traffic our CDN handles and yet vary responses by user.


We must understand the dangers of producing software that isn't reusable. Such software may be hard to maintain and hard to contribute to, both for future contributors and for our future selves.

"Current needs" only exist to serve our long-term needs. Losing track of long-term needs can make software too specific to a current need, risking a trend of releasing software that is open-source only as a courtesy, for transparency, without being reusable.

Reusable software has a defined purpose and serves it well. It tends to be easy to install, well-documented, and easy to contribute to. It can be reused between different services as well as externally, for example by community tools, cloud services, or other third parties. Having a defined goal also encourages designing APIs that we can agree not to break or change too often, because they are public.


Our current infrastructure is highly optimised for the passive reader who doesn't contribute. We serve a static CDN response to most users. For users who have logged in or made contributions, we bypass these layers on every page load. As a result, their document load time is 5x-10x longer (e.g. as measured by the NavTiming metric responseStart).
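The routing rule described above can be sketched as follows, next to one hypothetical reading of the "non-static service" from the opening list. The cookie names, the `ResponsePlan` shape, and the edge-composed variant are assumptions for illustration only; they do not describe Wikimedia's actual CDN configuration.

```typescript
// Hypothetical cookie names marking a user as logged in or as a contributor.
// The real names and rules used by Wikimedia's CDN are not described here.
const BYPASS_COOKIES = ["session", "contributor"];

type ServedFrom = "cdn-cache" | "app-servers" | "edge-service";

interface ResponsePlan {
  pageBody: ServedFrom;            // where the bulk of the HTML comes from
  userFragment: ServedFrom | null; // separately produced per-user parts, if any
}

function isActiveUser(cookieNames: string[]): boolean {
  return cookieNames.some((cookie) => BYPASS_COOKIES.indexOf(cookie) !== -1);
}

// Today's behaviour as described above: active users skip the cache entirely,
// so every page they load is rendered by the application servers.
function currentPlan(cookieNames: string[]): ResponsePlan {
  return isActiveUser(cookieNames)
    ? { pageBody: "app-servers", userFragment: null }
    : { pageBody: "cdn-cache", userFragment: null };
}

// One possible reading of the proposed shift: everyone gets the cached page,
// and only the small per-user part comes from a service that varies by user.
function edgeComposedPlan(cookieNames: string[]): ResponsePlan {
  return {
    pageBody: "cdn-cache",
    userFragment: isActiveUser(cookieNames) ? "edge-service" : null,
  };
}

console.log(currentPlan(["session"]).pageBody);      // → app-servers
console.log(edgeComposedPlan(["session"]).pageBody); // → cdn-cache
```

In the first plan, a single cookie moves a user's entire page load off the cache, which is the source of the load-time gap; in the second, the cost of personalisation is limited to the per-user fragment.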

In December 2014, Ori warned of this danger (<https://blog.wikimedia.org/2014/12/29/how-we-made-editing-wikipedia-twice-as-fast/>), saying that optimising our backend would "allow us to dissolve the invisible distinction between passive and active users" and "enable microcontribution [features] that draw [in] passive readers".

Banners (CentralNotice) are a good example of our needs being at odds with our infrastructure. We want banners to show as part of the page, and for banners to vary by user and location, plus random variance. Our current infrastructure could only do that by bypassing the CDN on all requests. As a result, banner selection currently happens entirely client-side and completes well after the page has loaded.
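The per-visitor decision described above can be sketched as a client-side function: filter campaigns by location and login state, then make a weighted random choice. This is a hypothetical model for illustration; the `Campaign` fields and the selection rule are assumptions, not CentralNotice's actual implementation.

```typescript
// Hypothetical campaign shape; CentralNotice's real data model may differ.
interface Campaign {
  title: string;
  countries: string[] | null; // null = shown in every country
  audience: "all" | "anonymous" | "logged-in";
  weight: number;             // relative share for random variance
}

interface Visitor {
  country: string;
  loggedIn: boolean;
}

// Pick at most one banner for this visitor. `roll` must be in [0, 1); in a
// browser it would come from Math.random(). Because the result depends on the
// visitor and on chance, it cannot be baked into one cached page for everyone.
function pickBanner(campaigns: Campaign[], visitor: Visitor, roll: number): Campaign | null {
  const eligible = campaigns.filter(
    (c) =>
      c.weight > 0 &&
      (c.countries === null || c.countries.indexOf(visitor.country) !== -1) &&
      (c.audience === "all" || (c.audience === "logged-in") === visitor.loggedIn),
  );
  if (eligible.length === 0) {
    return null;
  }
  const total = eligible.reduce((sum, c) => sum + c.weight, 0);
  let threshold = roll * total;
  for (const c of eligible) {
    if (threshold < c.weight) {
      return c;
    }
    threshold -= c.weight;
  }
  return eligible[eligible.length - 1]; // guard against floating-point rounding
}

// Usage: an anonymous reader in the Netherlands, with two hypothetical campaigns.
const campaigns: Campaign[] = [
  { title: "Local event", countries: ["NL"], audience: "all", weight: 1 },
  { title: "Reader survey", countries: null, audience: "anonymous", weight: 3 },
];
const reader: Visitor = { country: "NL", loggedIn: false };
const chosen = pickBanner(campaigns, reader, 0.1);
console.log(chosen ? chosen.title : "no banner"); // → Local event
```

With weights 1 and 3, a uniform `roll` selects "Local event" about a quarter of the time; moving this decision server-side would require the kind of per-user, non-static serving layer discussed in this statement.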

In the few cases where we do ask readers for data, it is for statistical purposes or to improve the software. Direct (or indirect) contributions to our content remain limited to complex actions like "edit". Bringing our contributor experience closer to the capabilities and performance of the reader experience would enable us to start accepting microcontributions that actually change content (either directly, or e.g. by consensus). It also opens the door to making our web platform work offline (e.g. with ServiceWorkers), which further enables high-performance interactions that can be uploaded at a later time.