Architecture

From devsummit

7 statements.

Author | Tags | Primary Session | Secondary Sessions | Position Statement
Eric Evans | Architecture, Infrastructure, Strategy | Evolving the MediaWiki Architecture

More than just servers

A modern approach to tooling and infrastructure is needed, not only so that we can scale to future content and users, but also so that we can deploy new technologies and services.

Our approach to infrastructure is in need of modernization. What we do, we do well, but if we are to grow our capacity, not just for storing and serving content but also for creating the technologies that empower movement strategies, we need to depart from the idea of infrastructure as merely clusters of servers. We need high-level, easily consumed platforms for computation, storage, deployment, and management. We need environments that make experiments cheap and allow teams to fail fast or iterate on the next stage quickly. We need systems that are distributed by nature, self-service, and secure, and that provide insight into availability and performance.

Some efforts have been (or are being) made here. Examples include recent work toward a Kubernetes deployment, a change-propagation service, and RESTBase. These efforts, while worthwhile, are piecemeal, and no holistic strategy exists. It is my belief that as we discuss the future of our platform, we should consider the requirements in the context of the bigger picture, and discuss a strategy aimed at modernizing our infrastructure.

Roan Kattouw | Architecture, Contributors, Data Center | Evolving the MediaWiki Architecture | Advancing the Contributor Experience

Users should not be punished for logging in

WMF wikis are slower for logged-in users than for anonymous users, which works against our efforts to get users to contribute. This is a long-standing problem that's hard to solve, but we should have a vision for how we're going to solve it.

WMF has caching data centers in strategic locations around the world (Amsterdam, San Francisco and soon Singapore), which make the wikis faster for users who are not near the primary data center (in Virginia) but are near a caching location. However, this only benefits anonymous users. For logged-in users, every page view contains their user name and other user-specific information in the personal tools area, so logged-in page views are considered uncacheable and are always routed to the primary data center.
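The routing rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the cookie-name suffixes and the function names are hypothetical, and the real decision is made in the cache layer's own configuration rather than in Python.

```python
# Hypothetical cookie-name suffixes that mark a logged-in session; real
# deployments use their own names and their own cache configuration.
SESSION_COOKIE_SUFFIXES = ("Session", "UserID")


def is_edge_cacheable(method: str, cookies: dict[str, str]) -> bool:
    """Return True if an edge cache may answer the request itself."""
    if method not in ("GET", "HEAD"):
        return False  # writes such as POST must reach the primary data center
    # Any session cookie implies user-specific output (user name, personal tools).
    return not any(name.endswith(SESSION_COOKIE_SUFFIXES) for name in cookies)


def route(method: str, cookies: dict[str, str]) -> str:
    """Name the tier that would serve the request under this simplified rule."""
    return "edge-cache" if is_edge_cacheable(method, cookies) else "primary-datacenter"
```

Under this rule, a reader in Europe is served from Amsterdam until the moment a session cookie appears, after which every page view crosses the ocean.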

This means that if a new user browses the site for a while, then creates an account because they want to contribute (or makes an anonymous edit), the wiki suddenly becomes slower for them. All users are affected, because uncached requests are slower to serve than cached ones, but users outside North/South America are affected the most, because their traffic now has to cross an ocean that it didn't have to cross before. It's not nice that a new user's 'reward' for creating an account is a slower experience, but it's especially not nice that users in emerging communities are affected the most. If we want to encourage readers to become contributors, slowing the site down as soon as someone contributes is not very helpful.

Some requests will always have to go to the primary data center, such as POST requests saving an edit, and those are always going to be slower for users outside North America. But for logged-in page views this isn't fundamentally necessary, and serving them from the edge caches would speed up the site for logged-in users and reduce the load on the app servers. There are different ways that this could be done, each with its own obstacles. For example, a single-page application for MediaWiki could use a content service to retrieve only the new page's contents when navigating, but this would require modifying or rewriting a lot of code in MediaWiki. Alternatively, Edge Side Includes (ESI) could be used to have Varnish inject cached page contents into a user-specific chrome, but that would require using advanced and partly unproven Varnish features. In both cases, we'd have to reimplement certain rendering preferences using CSS or a post-processing step. It's far from trivial, but let's start talking seriously about how we can address this problem.
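To make the ESI idea concrete, here is a minimal sketch of the composition step: a small user-specific shell contains an include tag, and the edge replaces that tag with a shared, cached content fragment. The fragment URL scheme, the `CONTENT_CACHE` dictionary, and the function names are assumptions made for illustration. In practice the substitution would be performed by the cache (for example Varnish), not by application code like this.

```python
import re

# Shared, user-independent fragments; a stand-in for what an edge cache holds.
CONTENT_CACHE = {"/fragment/Main_Page": "<article>Welcome to the wiki.</article>"}

# Matches a simplified include tag; real ESI syntax allows more attributes.
INCLUDE_TAG = re.compile(r'<esi:include src="([^"]+)"\s*/>')


def render_chrome(username: str, title: str) -> str:
    """Build the small user-specific shell around a shared content fragment."""
    return (
        f'<nav id="personal-tools">{username}</nav>'
        f'<esi:include src="/fragment/{title}"/>'
    )


def assemble(template: str, cache: dict[str, str]) -> str:
    """Replace each include tag with the cached fragment it points to."""
    return INCLUDE_TAG.sub(lambda match: cache[match.group(1)], template)


page = assemble(render_chrome("Alice", "Main_Page"), CONTENT_CACHE)
```

Only `render_chrome` depends on the user, so the expensive part (the article fragment) can be cached once and shared by everyone near the same edge location.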

Daniel Kinzler | Architecture, Collaboration, Hosting, JavaScript, Mobile, Strategy | Evolving the MediaWiki Architecture | Research, Analytics, and Machine Learning

I would like to discuss how assumptions drive our day-to-day work, and how to make sure we properly understand and regularly challenge these assumptions. I'm particularly interested in how technological assumptions shape product decisions, and how product assumptions shape technological decisions. Three major axioms come to mind:

MediaWiki needs to run in a shared hosting environment. This has been an explicit requirement for a long time now, but the baseline product that actually does run in such an environment (LAMP with no root access) is becoming more and more sub-par. We are already struggling to provide a decent mobile browsing experience there, not to mention search or WYSIWYG editing. So we should have a discussion about how long we want to keep this requirement, what the consequences of dropping it would be, and what alternative platform we should target for the baseline installation of MediaWiki.

Editing has to work with old browsers and without JavaScript. It has long been an explicit requirement that no basic functionality, particularly editing, can require JavaScript to be enabled. However, this causes us to fall further and further behind other sites. With more and more sites requiring JS, it's becoming less and less clear to me that this requirement is still sensible. This is especially true in light of many developing countries skipping straight from mostly-offline to mobile-only.

The primary medium for knowledge sharing is text. This assumption used to be hard-coded into MediaWiki until the introduction of ContentHandler, and it still seems to be hard-coded in the minds of many long-term contributors, both to the software and to the wikis. I believe that it is high time to invest in exploring other media formats and alternative forms of collaboration. It seems to me that "Beyond Wikitext" is the major technological challenge that has come out of the movement strategy process, and that we should start thinking and talking about it, from the technological side as well as the product side.

Katherine Maher | Alternative Interfaces, Architecture, Knowledge as a Service, Strategy, User Experience | Knowledge as a Service | Advancing the Contributor Experience

This proposal focuses on the "Knowledge as a service" part of the strategic direction.

When I look at the core of what we do, to some extent I see a model that we've mastered, and that we're making incremental improvements to. My concern is that, while that model is incredible and powerful as a community, the model for the interface and the delivery mechanism for the product the community creates are changing, and for us to continue what we're doing today may or may not prepare us for what the future actually looks like. I think it also limits our ability to unlock all of the tremendous knowledge, unstructured and structured, that exists within our projects. And I also believe that it limits us to certain forms of knowledge and a certain hierarchy of creation in a way that is very inward-looking.

Right now much of our information is sitting, unstructured, in a SQL database, rendered through PHP, read through a rendering engine into a browser to read/write in one interface: the browser. While this is amazing for the world of the browser, we're not going to be a browser-based information world for that much longer, any more than anything else. It's not that the browser is going to go away; the browser will be like books. Books haven't gone away, radio hasn't gone away, but there will be a transformation to a new interface, and we need to be ready for it. Perhaps we should actually backfill into those older interfaces that we're not currently part of, because people still use those interfaces, and those interfaces are valuable.

Essentially this is about taking the Model-View-Controller paradigm to the next level, and also about extending it to participation and to the "write" part of our read-write system. Even if Alexa is serving Wikimedia content outside the browser, there is no mechanism for contributing through Alexa. We need to be planning for an architecture of information and architecture of experiences that is independent of the browser.
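As a rough illustration of what separating content from interface could mean, the sketch below keeps one interface-independent model and renders it for two different front ends: a browser and a voice assistant. The `Article` fields and both renderer names are hypothetical and are not taken from any existing Wikimedia API.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Interface-independent content; the field names are illustrative only."""
    title: str
    summary: str
    body: str


def render_html(article: Article) -> str:
    """View for a browser: the full content, marked up."""
    return f"<h1>{article.title}</h1><p>{article.summary}</p><p>{article.body}</p>"


def render_spoken(article: Article, max_words: int = 30) -> str:
    """View for a voice interface: the title plus a short plain-text snippet."""
    snippet = " ".join(article.summary.split()[:max_words])
    return f"{article.title}. {snippet}"


example = Article(
    title="Amsterdam",
    summary="Amsterdam is the capital of the Netherlands.",
    body="A longer body of text would go here.",
)
```

The read side is the easy half. The harder part raised above is the write path, which would need an equivalent interface-independent way to accept contributions.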

How do you get the most value out of the existing content? How do you serve a snippet to someone who just needs a quick answer? How do you serve different layers of sophistication to 8th-graders versus the college graduate, versus the PhD? Can we engage in the knowledge ecosystem and leverage what we have as a platform, and our traffic distribution and awareness, to actually open up greater resources of knowledge?

These are some of the topics I would like to see discussed at the Dev Summit.

Marko Obrovac | Architecture, Microservices, Refactoring, Technical Debt | Evolving the MediaWiki Architecture

All of the Wikimedia projects have, in technical terms, MediaWiki - the software - at their core. Thanks to this fact, MediaWiki has become a widely deployed system, drawing many volunteer developers. Alas, there is a disparity in scale between the WMF-run install and other, external set-ups, which hinders the speed with which the platform supporting the Wikimedia projects can evolve.

On the other hand, architecting microservices has proven to be a good way of achieving scalability, increasing developer productivity, improving maintenance and reducing technical debt. Gradually moving towards 'de-monolithising' our core infrastructure will enable developers (both WMF staff and volunteers) to start working on all sorts of interesting features, ranging from simple add-ons to full-blown companion sub-systems. While this transition is (arguably) already happening, everything still gravitates around MediaWiki - the software.

Instead of focusing our efforts on compatibility in scale (e.g. one JobQueue system for WMF, another for external installs), we should focus on the products and features that allow the projects to grow, both in terms of the number of projects and the features they offer, and in the number (and diversity) of their users. Microservices can greatly help in achieving this goal, since all installs can select the components they want to run based on the resources at their disposal and their potential reach or scale. Much like the advent of extensions enabled various parties to complete their systems with sought-after functionality, microservices can refocus our technical community to think about features and components without worrying about scaling them (up and/or down). If we want our developers to assist the Wikimedia projects and their communities, we need to bring our core infrastructure into the 21st century. Let's not leave the technology behind - it is central to the success of the communities we are trying to enable.

Adam Shorland | Architecture, Open Source, Wikibase | Growing the MediaWiki Technical Community | Embracing Open Source Software

I believe that the technical community should strive to collect and effectively disseminate technical knowledge, as per the Wikimedia mission statement.

Our ability to grow our technical community can be compared with one's own ability to gain knowledge in technical spaces within the Wikimedia movement. Currently there are many barriers to entry that have been surfaced year after year, with only a little progress made past them.

To scale and ready the community we should push forward and enable the use of emerging trends in technology, such as knowledge-retaining Q&A platforms. There are many other organizations and software projects that do this much better than Wikimedia, and we should learn from them. Looking at Q&A platforms specifically, talk pages have never really been a good place to ask questions and retain knowledge in a searchable way for use in the future. Stack Overflow, as an example, has proven to be an invaluable resource for people in technical spaces and we can learn from that. MediaWiki is an amazing piece of software, but we should not feel 'boxed in' by it. The Wikimedia Foundation is not the MediaWiki Foundation, and MediaWiki does not always have to be just a wiki page.

Our commitment to Open Source is often something that slows down many actions within the movement; however, this is not something that should change, as it is integral to what Wikimedia stands for at the core. We should embrace our Open Source commitments and reach out to and engage with organizations using our software more. Wikimedia Germany does this outreach specifically with the Wikibase extension, looking for other users and engaging them to discover how they are using it, why, and how it can be better. The Wikibase extension, and specifically the Wikibase Query Service, shows us that not everything has to be a wiki page, as the query service disseminates knowledge under a free licence effectively.

I hope that the summit will agree that entry to our technical space, and increasing knowledge persistence within our technical space, need some thought and work, and that we should stay committed to MediaWiki as a software and platform, but that it can look, feel and act different while Wikimedia stays true to its mission.

Tim Starling | Architecture, Complexity, Refactoring | Growing the MediaWiki Technical Community

A key question for me is how we can maximise the richness of our feature offering despite having entered a period of slow growth in revenue and staff numbers. Wikimedia serves a very large number of users, with a diverse set of needs -- nobody can say that the site as it stands is sufficient to satisfy all of them.

There are two main threats to our goal of providing a rich feature set. One is maintenance burden. We are faced with the prospect of sunsetting features because we find the maintenance burden to be too great. But there is no incontrovertible rule in software engineering which says that code, once written, must constantly be rewritten. Maintenance burden most commonly arises from changes in the platform on which the code is implemented. Minimising maintenance burden for a given feature set thus necessitates choosing a stable platform. We need to consider the programming languages we use, and the libraries we require, through this lens.

The second threat is needless complexity. Concepts which are hard to understand, and which thus restrict related development to highly skilled developers, are appropriate only if hidden behind a module boundary. In order to enable contributions from developers less skilled than ourselves, and to minimise the time required for learning and familiarisation, the bulk of our code should be simple. Complexity is alluring because it provides developers the opportunity to take pride in their work. But for the benefit of the organisation as a whole, its efficiency, and thus the richness of its product offering, we should introduce complexity only with due caution.

Code which is complex but stable can be valuable, presenting no great risks. For example, the diff algorithm we currently use in wikidiff2 has its origin in Perl code written in approximately 1998. Only in the last year have we considered adding substantial features to it. We have a PHP port and a C++ port, and neither requires significant maintenance. This is because the requirements are stable and the two respective platforms (C++ and PHP) are stable.

Contrast this to OCG, which is at risk of sunsetting only three years after its original deployment. The reason is that its input and output formats are constantly changing, that is, it has changing requirements; and it was written on a modern and rapidly changing platform. Its main developer wrote "the architecture which was state-of-the-art in 2014 is already looking a little dated in 2016".

My goal for the developer summit is to encourage people to think carefully about writing code on top of a conceptually complex, rapidly changing platform. I want WMF and the MediaWiki community to write code which is stable and long-lasting, and can thus support a richly featured website into the future.