Property:Statement
H
Free Software is Fundamental to Our Mission
Position
MediaWiki is a prominent free software project, and the Wikimedia projects have always run on free and open-source technology, but our relationship to free and open-source software needs clarification. We should formalize that we are committed to making, using, and leading in the development of free software, even when doing so is more difficult or less efficient in delivering user value than adopting closed solutions, as a central part of our educational mission.
Discussion
How does free software relate to the free knowledge movement?
In this movement we are building a body of open knowledge, curated collectively and accessible to all. We develop the software that powers these projects in the open, and we run our backing infrastructure on free and open-source technology.
We choose to do these things not because they are easy, but because they are hard. Existing free software is not always, or even most of the time, practically superior.[1] We work in the open so that others can contribute to and learn from our processes; our work product is educational content in its own right, and in that way directly contributes to our mission.[2] By ensuring that our tools and processes are open, and working through problems with free software projects rather than rejecting them in favor of closed solutions, we empower others everywhere to join us in doing this hard work, or to launch like-minded projects of their own.
It's often tempting to conclude that our users could be better served by adopting closed or proprietary software solutions to our engineering problems, rather than adapting free software to meet our needs or writing our own. This may be true in the short term, but over the long term this contributes to the cloistering of software engineering expertise in closed commercial enterprises.
Our goal is to expand and not restrict the knowledge of software engineering principles and practices, and we are playing a long game.
What could a formal commitment to free software mean in practical terms?
This is intended as an open-ended question for discussion, but here are a few ideas:
* We should take a leadership role in the development of free software languages and technologies on which we depend (e.g., PHP).
* Where we develop software for closed platforms (such as the mobile apps), we should promote free alternatives for their distribution channels (e.g., F-Droid [3]) and ensure they can be run without depending on proprietary software.
* We should encourage and recognize contributions by our engineers in the broader free software community.
[1] https://www.gnu.org/philosophy/when-free-software-isnt-practically-superior.html
[2] https://wikimediafoundation.org/wiki/Mission_statement
[3] https://f-droid.org/
WMF should focus on the technical issues it is uniquely positioned to handle, and let the volunteers have the fun stuff.
When we think about what technical work the WMF is engaging in, I don't believe enough time is spent considering volunteer motivation, and the great potential we are systematically choosing to ignore, or end up devaluing entirely due to the inherent unpredictability of volunteer work. I do believe that there is a long enough history of deeply understaffed WMF engineering teams getting set up to tackle fancy front-facing projects, only to have those teams simultaneously struggle to deliver, and deter everyone else from getting too near decision-making in their territory. It is time to change our approach.
I would like to talk about what it would take to refocus the majority of WMF's technical work away from taking full ownership of all the 'important' new ideas, and toward making it as easy as possible for momentarily highly motivated outside parties to make meaningful contributions to new features. I imagine many new tools would be required to scale release engineering, security, and the technical community in general. We would have to take a greater role in mentoring interested parties. There are also known big hairy unsolved problems in the way we currently think of maintainership. Major changes would have to be made in our current approach to product timelines and product/project management.
Of course, there will always be things that do require a high level of predictability in the outcomes. Donor money can and should be spent on ensuring predictability around the things we absolutely cannot function without. However, there is a whole world of ideas that absolutely do not have to be accomplished on a strict 'shipping' timeline, and it seems that the WMF will always hold the keys to that door. I would like to figure out how the WMF could start embracing that unpredictability at every level, and move much more deliberately from 'bottleneck' to 'enabler'.
J
My purpose in attending the Dev Summit is to enjoy the benefit of collaborating in person with others who are passionate about technology that brings information to the world in a variety of languages.
When I imagine a world where everyone really can share in all knowledge, I don't imagine all of them doing so in their native language. The most important foundation for language technologies that will reach as many people as possible is informed realism, with insights from both linguistics and computer science.
* The most common estimate of the number of languages is 6,000. An unfortunate number are critically endangered, with only dozens of speakers; 50-90% of them will have no speakers by the end of the century.
Providing knowledge to *everyone* in their own language is unrealistic. We should always seek to support any community working to document, revive, or strengthen a language, but expecting to create and curate extensive knowledge repositories in a language with barely half a dozen octogenarian speakers whose grandchildren have no interest in the language is more fantasy than goal.
* Statistical machine translation has eclipsed rule-based machine translation for unpaid, casual internet use, and building it doesn't require linguists or even speakers. But it does require data, in the form of large parallel corpora, which simply aren't available for most languages.
Even providing knowledge in translation is impractical for most of the world's languages.
* English speakers are notoriously monolingual, but in many places multilingualism is the norm, with people speaking a home language and a major world language.
A useful planning tool would be an assessment of the most commonly spoken languages among people whose preferred language does not have an extensive Wikipedia. Whether building on the model of Simple English or increasing the readability of the larger Wikipedias, we can bring more knowledge to more people through Hindi/Urdu, Indonesian, Mandarin, French, Arabic, Russian, Spanish, and Swahili (all of which boast on the order of 100 million non-native speakers or more) than by trying to create a thousand Wikipedias for less commonly spoken languages.
* English is particularly suited to simple computational processing, a fact often lost on English speakers: it uses few characters, has few inflections, and words are conveniently separated.
Navigating copious amounts of knowledge requires search. The simplest form of search just barely works for English, but often fails in Spanish (with dozens of verb forms), Finnish (with thousands of noun forms), Chinese (without spaces), and most other languages. Fortunately, for major world languages we have software that can overcome this by regularizing words for indexing and search.
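To make "regularizing words for indexing and search" concrete, here is a minimal sketch in Python. The suffix list is a deliberately tiny, hypothetical stand-in for a real Spanish analyzer, and the documents are invented for illustration; production search engines use far richer language-specific rules.

```python
import unicodedata

# Hypothetical, deliberately tiny list of Spanish verb endings (accents removed).
# Real analyzers use much larger rule sets or dictionaries.
SUFFIXES = ("abamos", "aremos", "amos", "aron", "ando", "aba",
            "ar", "as", "an", "a", "o", "e")


def fold(word):
    """Lowercase a word and strip combining accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(word):
    """Fold the word, then strip the longest known suffix (keeping >= 3 letters)."""
    folded = fold(word)
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if folded.endswith(suffix) and len(folded) - len(suffix) >= 3:
            return folded[: -len(suffix)]
    return folded


def build_index(docs, key):
    """Map key(token) -> set of ids of the documents containing that token."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(key(token), set()).add(doc_id)
    return index


# Invented sample documents.
docs = {
    1: "ellos hablaron ayer",
    2: "nosotros hablábamos mucho",
    3: "ella canta bien",
}

exact_index = build_index(docs, str.lower)    # simplest form of search
stemmed_index = build_index(docs, normalize)  # regularized forms

query = "hablar"
exact_hits = exact_index.get(query.lower(), set())
stemmed_hits = stemmed_index.get(normalize(query), set())
```

With exact matching, the query "hablar" finds nothing, because the documents contain only inflected forms. With the regularized index, the same query finds documents 1 and 2.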
Again, none of this is to say that we should ever stop or even slow our efforts where there is a passionate language community, or even one passionate individual, working to build knowledge repositories or language-enabling software. But we must be realistic about what it takes to reach the majority of people in a language they understand.
K
Languages in the world of Wikimedia
One of the central topics of Wikimedia's world is languages. Currently, most projects cover around 290 languages, with widely varying depth of coverage.
In theory, all information in Wikipedia can be replicated and connected, so that different cultures' knowledge is interlinked and accessible no matter which language you speak. In reality, however, this can be tricky. The authors of [1] show that much of even English Wikipedia's content is not represented in other languages, even in other big Wikipedias. The reverse also holds: the content in underserved languages is often not covered in English Wikipedia.
A possible solution is translation by the community, as done with the content translation tool [2]. However, that means translating articles from every language into every other language, a never-ending effort that is barely feasible, especially for small language communities. And it's not only about Wikipedia: the other Wikimedia projects will need a similar effort!
Another approach for a better coverage of languages in Wikipedia is the ArticlePlaceholder [3]. Using Wikidata's inherently multi- and cross-lingual structure, AP displays data in a readable format on Wikipedias, in their language.
However, even Wikipedia has a lack of support for languages, as we were able to show in [4]. The question is therefore how we can get more multilingual data into Wikidata using the tools and resources we already have, and eventually how to reuse Wikidata's data on Wikipedia and other Wikimedia projects in order to support under-resourced language communities and enable them to access information in their language more easily. Accessible content in a language will eventually also mean they are encouraged to contribute to the knowledge.
Currently, we are investigating machine learning tools in order to support the display of data and the gathering of new multilingual labels for information in Wikidata.
It can be assumed that over the coming years language accessibility will be one of the key topics for Wikimedia and its projects, and it is therefore important to invest in the topic now and enable an exchange about it.
[1] Hecht, B., & Gergle, D. (2010, April). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 291-300). ACM.
[2] https://en.wikipedia.org/wiki/Wikipedia:Content_translation_tool
[3] https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf
[4] https://eprints.soton.ac.uk/413433/
How should MediaWiki evolve to support the mission?
One of the greatest barriers to the spread of human knowledge is the barrier of language. While Wikipedia does a great job of supporting hundreds of languages, the amount of content available in most language Wikipedias is still paltry and has a small impact on the knowledge available to speakers of those languages. For a huge percentage of the world's population, the key to unlocking knowledge isn't discovering Wikipedia, but learning new languages. Even for English speakers, the impact of learning a new language can be life-changing and open up many new opportunities.
The Wikimedia Foundation is the steward of one of the greatest repositories of information about language in human history, Wiktionary. Unlike all other dictionaries on Earth, Wiktionary aims to define (in 172 languages) all words from all languages. In other words, not just defining English words in English and French words in French, but also French words in English, English words in French, Latin words in Swahili, Mopan Maya words in Arabic, etc. Its ambitious aim is to be the ultimate Rosetta Stone for the human species.
While Wikipedia is in some respects maturing and gradually yielding diminishing returns for more investment, Wiktionary is still a small and growing project that has yet to fulfill its potential or break into mainstream consciousness the way that Wikipedia has. While one of the impediments to Wiktionary reaching its potential is lack of structured data support, which is being worked on, there are many improvements that could be made in the meantime to improve the usefulness of the site to both readers and editors. These include converting many of the fragile gadgets and site scripts into maintainable extensions, customizing the user interface to more closely match what users expect from a dictionary site, and adding dictionary-specific tools to the editing interface. There is also unexplored potential with building apps around the Wiktionary data, including apps tailored around language learning.
Now that the Wikimedia Foundation has nearly 100 software engineers (and dozens of volunteer developers), it should explore the potential of its lesser known projects, especially Wiktionary, which has the potential to actually make a large impact on the Foundation's mission and bring more of the sum of human knowledge to more people around the globe.
Users should not be punished for logging in
WMF wikis are slower for logged-in users than for anonymous users, which is unhelpful when we are trying to get users to contribute. This is a long-standing problem that's hard to solve, but we should have a vision for how we're going to solve it.
WMF has caching data centers in strategic locations around the world (Amsterdam, San Francisco and soon Singapore), which make the wikis faster for users who are not near the primary data center (in Virginia) but are near a caching location. However, this only benefits anonymous users. For logged-in users, every page view contains their user name and other user-specific information in the personal tools area, so logged-in page views are considered uncacheable and are always routed to the primary data center.
This means that if a new user browses the site for a while, then creates an account because they want to contribute (or makes an anonymous edit), the wiki suddenly becomes slower for them. All users are affected, because uncached requests are slower to serve than cached ones, but users outside North/South America are affected the most, because their traffic now has to cross an ocean that it didn't have to cross before. It's not nice that a new user's 'reward' for creating an account is a slower experience, but it's especially not nice that users in emerging communities are affected the most. If we want to encourage readers to become contributors, slowing the site down as soon as someone contributes is not very helpful.
Some requests will always have to go to the primary data center, such as POST requests saving an edit, and those are always going to be slower for users outside North America. But for logged-in page views this isn't fundamentally necessary, and serving them from the edge caches would speed up the site for logged-in users and reduce the load on the app servers. There are different ways that this could be done, each with their own obstacles. For example, a single-page application for MediaWiki could use a content service to retrieve only the new page's contents when navigating, but this would require modifying or rewriting a lot of code in MediaWiki; ESI could be used to have Varnish inject cached page contents into a user-specific chrome, but that would require using advanced and partly unproven Varnish features. In both cases, we'd have to reimplement certain rendering preferences using CSS or a post-processing step. It's far from trivial, but let's start talking seriously about how we can address this problem.
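The core idea behind both options is to split a logged-in page view into a shared, cacheable body and a small user-specific chrome. The Python sketch below illustrates only that split. It is not Varnish ESI or MediaWiki code, and every name in it (`EdgeCache`, `CHROME`, `origin`) is hypothetical.

```python
from string import Template

# Hypothetical user-specific "chrome" wrapped around a shared page body.
CHROME = Template("<header>Logged in as $user</header>\n<main>$content</main>")


class EdgeCache:
    """Toy edge cache: page bodies are shared by every user at this location."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_hits = 0  # round trips to the (far away) primary data center

    def page_body(self, title):
        if title not in self._store:
            self.origin_hits += 1
            self._store[title] = self._fetch(title)
        return self._store[title]

    def render(self, title, user):
        # Only the small chrome depends on the user; the body is reused as-is.
        return CHROME.substitute(user=user, content=self.page_body(title))


def origin(title):
    """Stand-in for an expensive render at the primary data center."""
    return f"<h1>{title}</h1><p>Article text</p>"


edge = EdgeCache(origin)
page_for_alice = edge.render("Amsterdam", "Alice")
page_for_bob = edge.render("Amsterdam", "Bob")
```

In this toy model, two logged-in users viewing the same article cause a single origin fetch. In a real deployment, the rendering preferences mentioned above are what make the shared body hard to reuse unchanged.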
I would like to discuss how assumptions drive our day-to-day work, and how to make sure we properly understand and regularly challenge these assumptions. I'm particularly interested in how technological assumptions shape product decisions, and how product assumptions shape technological decisions. Three major axioms come to mind:
''MediaWiki needs to run in a shared hosting environment.'' This has been an explicit requirement for a long time now, but the baseline product that actually does run in such an environment (LAMP with no root access) is becoming more and more sub-par. We are already struggling to provide a decent mobile browsing experience there, not to mention search or WYSIWYG editing. So we should have a discussion about how long we want to keep this requirement, what the consequences of dropping it would be, and what alternative platform we should target for the baseline installation of MediaWiki.
''Editing has to work with old browsers and without JavaScript.'' It has long been an explicit requirement that no basic functionality, particularly editing, can require JavaScript to be enabled. However, this causes us to fall behind other sites further and further. With more and more sites requiring JS, it's becoming less and less clear to me that this requirement is still sensible. This is especially true in the light of many developing countries skipping straight from mostly-offline to mobile-only.
''The primary medium for knowledge sharing is text.'' This assumption used to be hard-coded into MediaWiki until the introduction of ContentHandler, and it still seems to be hard-coded in the minds of many long-term contributors, to the software and to the wikis. I believe that it is high time to invest in exploring other media formats and alternative forms of collaboration. It seems to me that "Beyond Wikitext" is the major technological challenge that has come out of the movement strategy process, and that we should start thinking and talking about it, from the technological side as well as the product side.
Our platforms should refocus on collaborating, drafting, and experimenting.
Currently much focus is on polished presentation + restriction, hindering experiments and limiting participation.
__Technical aspects__
* Editing tools focus on fast smooth drafting: multiple simultaneous editors, suggested changes. A terse, readable history highlights major revisions of an article. Discussion is integrated into the draft interface, and toggled on / off.
* Articles can be forked & merged, supporting all sorts of experimentation. Different groups can work on parallel forks, merging later if they like. Newcomers not following a policy can be channeled to an individual branch while sorting it out, avoiding edit wars. Sandboxing helps avoid "deletion": questionable or disputed contributions can be sandboxed to a hidden or low-visibility personal page. [This is also conducive to distributing an online/offline federation of editors, e.g. over IPFS]
* Editing, creation, and uploading are encouraged prominently in every page interface. Matchmaking tools help creators find others with similar interests, learn + collaborate. Tools for similarity checking, merging, metadata / license review, + meta-moderation, help anyone contribute and learn new ways to do so. Deleted material [unless oversighted] is reviewable by all who know where to look.
* The reading experience focuses on contextual connections + human connections. Real-time conversation is available as an overlay while reading. Data-rich interfaces help readers browse multiple versions of an article, and get a sense of persistence, reliability, + interest. For instance, heatmaps for revised / controversial / commented areas; wikiblame for granular provenance; different colors for different sorts of cites; visual cues about how much complementary or conflicting knowledge is available in other articles, files, languages or Projects.
__Cultural aspects (& related tools) __
* Namespaces include every potentially useful topic: completeness, notability, + copyright uncertainty affect how things are presented, not whether they exist. Similarly, media repositories include all useful material that is legal to host.
* File uploads are welcome as contributions to the global commons even when they need work. Files are transcoded to free formats where possible. File formats with no free-codec options, or that cannot be thoroughly checked for malware, are stored in their own flexible repository [such as the Internet Archive]: using the same Wikimedia upload interface + metadata, and providing similar wikilinks to reference files from within the Projects.
* The newcomer experience is simple, flexible, + protected. Contributions from people who "don't know how to do it right" are welcome, and kept separate from the flow of updates from regulars, with their own visibility defaults. Matchmaking tools help newcomers find active work in their area. Blocks, deletions, + warnings happen only for spam / vandalism. Other concerns at worst hide their work from public view, with a friendly review with a peer after the first weeks. A broad group of peers can protect newcomers, for instance by redirecting concerns and complaints about a newcomer to themselves.
==
It is time to move away from a "single latest revision viewable by all" model, and the conservative policies designed around it. We need a more flexible model embracing multiple working copies, long-lived drafts, and a greater freedom to experiment, collaborate, + create.
Investing in our communities
This position statement captures my thoughts about why and how we should be investing in our communities. There are a lot of ways we can encourage and support them that we currently don't. Prioritizing building tools for our communities is a crucial step for the long-term survival of our projects.
It's fairly common knowledge that a lot of our communities suffer from toxicity. It's incredibly hard for newcomers to edit, stick around, and stay engaged amid the existing toxicity. The problem frequently also exists in smaller communities. Just recently, the English Wikipedia community pushed WMF into implementing ACTRIAL, preventing brand new users from being able to create articles on the site. These are signs that all is not well with our communities. If we envision a future with an active, thriving editor community 15 years from now, we have to become more aware of how our communities function and do more to support them than we do today.
The problems also exist on the technical side. Communities without technical resources lose out on gadgets, templates, editing toolbar gadgets and so on. The editors on these wikis are still forced to do a lot of things the hard way. Non-Wikipedia projects are probably the worst affected. Quite often our software projects also cater only to the bigger projects, often just the Wikipedias.
I am sure we can't solve everything, but I'm sure we can try to help solve at least some of the problems. We can invest in better tools for new users to create articles, and to edit and experiment with wikitext markup. We can build a better onboarding experience for new users. For example, English Wikipedia currently has an "Article Creation Wizard" (https://en.wikipedia.org/wiki/Wikipedia:Article_wizard), which is outdated, poorly maintained, and often very confusing. We can think about a more standardized solution that would be useful across wikis.
We can also try to showcase user contributions in a better way, to build user engagement. Various wikis have been striving to create and sustain "wikiprojects" for a while, with the result that several big Wikipedias have come up with their own homegrown solutions. These are things the Foundation can help build and standardize for all wikis.
For the technical problems, there is a big backlog of projects that are long overdue. Global cross-wiki watchlists and global gadgets, templates, and Lua modules have been asked for by the community for many years now. There are many more such projects to be found on Phabricator and in the wishlist survey. These projects can be building blocks in making our communities more sustainable and thriving places. They are big and important enough that they should make it into the product roadmap of teams outside of Community Tech.
Another important thing we should think about is tools. Some tools, such as [https://tools.wmflabs.org/pageviews/ pageviews analysis], are among the most important volunteer-maintained tools out there. What happens when one stops being maintained? When is a tool important enough for the Foundation to start thinking about incorporating that functionality in an extension/core? These are all important discussions to be had.
How do we maintain and grow the technical community and ready it for the mission ahead?
Maintaining and growing a technical community is difficult, particularly when the majority of that community is contributing their time and code on a volunteer basis. However, we can look at other successful projects for guidance, to see what we can learn and apply to our own movement:
Clearly articulating the value for participants. It's important that we articulate what participants will get (socially, professionally, personally) from contributing to our projects, and it's important to socialize that value through feedback loops, communication, and positive reinforcement. One of my favorite projects - the Smithsonian Transcription Project - hired a full-time community manager for their volunteer community. It was her role to pair participants with projects, follow up to ensure things were going well, and intervene if changes needed to be made.
Creating feedback loops that reinforce the value for participants. It's not enough to get people in the door; we must continually reinforce why participation is meaningful for both participants and the mission of free and open knowledge. People will have different reasons for participating: some want to build a skill set, others want to contribute to a meaningful project, still others are completing an assignment. The value for all of these participants differs, and the messaging/communication should reflect that.
Finding pathways to participants through non-technical means. GitHub does it particularly well. They want to reach students. So they have a space - https://education.github.com/ - aimed at teachers. This is a particularly smart strategy: How do we reach participants where they are, and think about conduits who might identify possible participants?
#100WikiCodeDays: The project #100WikiDays is successful because it creates a habit for participants, gives them ample feedback, provides them with community support, and gives them a goal. Are there similar efforts that we could think about re: code contributions?
Continually communicate the value: The best open source projects continually communicate to participants and the larger world about what's happening. Someone files their first bug report? Great, maybe they get an email saying 'Here's the next step you can take.' Someone creates a tool for Tool Labs, and it's amazing? Send them to the blog for a profile. Let's elevate their work and use it to bring others in.
L
The future of the MediaWiki infrastructure at the Wikimedia Foundation.
MediaWiki is at the core of the infrastructure that serves all of the Wikimedia projects, and the current setup of MediaWiki in production poses various challenges: from the future of our current runtime (HHVM), to the ability to serve MediaWiki from multiple datacenters, to long-standing issues such as resource usage efficiency and flexibility.
Here are some of the things we have to tackle in the future.
Transition off of HHVM:
Since the HHVM team has made it clear they're parting ways with full PHP compatibility, and since maintaining support for both HHVM and PHP in MediaWiki would be arduous, we need to make plans to move off of HHVM and back to PHP 7.x. This transition, while technically necessary, should not come at a cost for our users: page load times should not degrade. We can proceed by marking responses coming from either engine, collecting metrics and analyzing data. In order to achieve this, we should run the two runtimes in parallel on the same servers (which have plenty of capacity, given that no MediaWiki cluster has a utilization over 40%), and we will then be able to programmatically route individual users or a percentage of traffic, or even specific wikis, to one or the other. The deadline for this transition is set for the end of 2018 (EOL of the last compatible version of HHVM), and planning and resources should be allocated to this goal.
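As a rough illustration of routing "individual users or a percentage of traffic, or even specific wikis" while marking responses, consider the Python sketch below. It is not the production routing layer; the function names, the `X-Runtime-Engine` header, and the bucketing scheme are all assumptions made for the example.

```python
import hashlib


def bucket(user_id, buckets=100):
    """Map a user id to a stable bucket in [0, buckets) via a hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_runtime(user_id, wiki, php7_percent,
                   php7_wikis=frozenset(), php7_users=frozenset()):
    """Pick 'php7' or 'hhvm' for a request.

    Explicitly listed users and wikis always go to PHP 7; everyone else is
    split by a stable hash so a given user keeps seeing the same engine.
    """
    if user_id in php7_users or wiki in php7_wikis:
        return "php7"
    return "php7" if bucket(user_id) < php7_percent else "hhvm"


def tag_response(headers, runtime):
    """Return a copy of the headers marked with the engine that served them."""
    tagged = dict(headers)
    tagged["X-Runtime-Engine"] = runtime  # hypothetical header name
    return tagged


runtime = choose_runtime("user-42", "enwiki", php7_percent=10)
headers = tag_response({"Content-Type": "text/html"}, runtime)
```

Because the bucket is derived from a hash of the user id, raising `php7_percent` only moves users from HHVM to PHP 7 and never back. That keeps per-engine metrics comparable during a gradual rollout.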
Multi-Datacenter support:
We currently use our datacenters in an active/passive setting, as far as MediaWiki is concerned. While this is fine in principle, it is a huge waste of resources: it means that 50% of our servers are doing nothing at all, and it also limits our ability to expand the number of core datacenters we can use. Diverting the read load to secondary datacenters could also allow us to use caching in a less aggressive way when not needed. There is already a program underway to add first-class multi-DC support to MediaWiki, so we can focus on what specifically needs to be done in order to achieve this longtime goal: our final goal should be to serve reader traffic from all datacenters, and to be able to switch the "master" datacenter in a matter of minutes.
Elasticity, resource usage efficiency
At the moment, our infrastructure is plainly inadequate to react to sudden spikes of non-wiki content production and to changes that generate a lot of asynchronous jobs, such as a change to a popular template. The issues with the current jobqueue are widely known and publicized, but even the current transition to a new model won't solve the starvation of resources that results in a degraded user experience. Moreover, a single editor uploading videos via video2commons can easily overflow our media processing capacity for weeks. This happens because we allocate our resources statically (we have 4 videoscalers per datacenter, for example), our resource consumption is inefficient, and reallocating servers requires time and effort. Modern application stacks are elastic, meaning the operation of scaling up or down the capacity of a single cluster or functionality can be handled programmatically and/or manually whenever the need occurs, allowing the infrastructure to react to such changes. For economic and privacy/security reasons, Wikimedia doesn't make use of external cloud services, so the only way to achieve such flexibility is to build a serviceable infrastructure that can serve MediaWiki and any other project Wikimedia will support: the effort to do that is underway with the rollout of our Kubernetes-based IaaS in production. I think we should work, sooner rather than later, on moving the MediaWiki application stack (and maybe its semi-ephemeral caching) to the Kubernetes platform. While the advantages of such an approach seem clear, it won't come without costs: specifically, habits around code deployment, testing and configuration changes will need to be completely revisited and superseded by new approaches.
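To illustrate the difference between static and elastic allocation, here is a minimal Python sketch of a scaling rule driven by queue backlog. The function name, parameters, and numbers are hypothetical. It is not how Kubernetes or the jobqueue actually decides capacity, only the general shape of such a rule.

```python
import math


def desired_workers(queued_jobs, jobs_per_worker, min_workers=1, max_workers=20):
    """Return how many workers a backlog calls for, clamped to fixed bounds."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


STATIC_WORKERS = 4  # a fixed pool, like a fixed number of servers per datacenter

# Invented backlog sizes: a quiet period, a busy period, and a large spike.
for backlog in (0, 95, 1000):
    elastic = desired_workers(backlog, jobs_per_worker=10)
    print(f"backlog={backlog:5d} static={STATIC_WORKERS} elastic={elastic}")
```

A static pool is oversized when the queue is empty and undersized during a spike. The elastic rule follows the backlog within whatever bounds the shared cluster can afford.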
Translation as a way to grow and connect our communities
The Wikimedia movement depends a lot on translation, but I believe we are not currently using the full potential of it. This affects us in many ways - most importantly:
- language barriers isolate communities - but we all need to work together,
- our content is not accessible to every human,
- our movement is massively multilingual, but not the forerunner in using translation and other language technology.
We should improve our translation tools and leverage machine translation in a sustainable way. Translation should be a core part of our infrastructure and integrate into our projects seamlessly. It will help our communities to grow, as demonstrated by the Content Translation tool. I suggest three focus areas.
#1 Find partners to build high quality open-source machine translation
Our projects run on free software. Currently, we depend a lot on proprietary data-driven (statistical) machine translation. If translation is to be an essential part of our infrastructure, this is neither sustainable nor acceptable. We already use expert-driven (rule-based) open-source machine translation software, e.g. Apertium, which provides some high quality language pairs. However, the proprietary services cover a lot more language pairs, albeit with lower quality. Building machine translation engines is hard work, therefore we should find partners to pursue both data-driven and expert-driven engines. The impact of this could be big and extend beyond our movement.
#2 Bring translation everywhere
We already have good translation tools, but we need to move beyond the user interface and Wikipedia pages. We should integrate translation tools into our discussion systems to support multilingual discussions as well as to understand discussions in foreign languages. This should be combined with summarizing tools.
We have a lot of (structured) content that can be translated but lacks proper tooling for translation, e.g. Wikidata and Commons image descriptions, and labels in SVG files. We should adapt and integrate our existing translation tools to support these types of content.
We should also make language selection available to all users, including those who are not logged in, in our multilingual projects such as Wikidata, so that they can see the translations.
#3 Improve our translation tools
Our translation tools have serious issues that result in slower translation, or in content not being translated at all.
Our translation memory is not working well. It often fails to suggest good matches. This is apparent when translating the Weekly Tech News. Translators' time is wasted when they need to re-translate (introducing inconsistencies) or search for previous translations manually. Without improvement, our translation memory is not suitable for use in Content Translation either.
When translating documentation pages, announcements, etc. using the Translate extension, a significant amount of extra markup is added to the wikitext. Editors find this markup inconvenient and justifiably resist using this tool. This feature should be improved so that it works with VisualEditor and doesn't require additional markup in the wikitext.
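For readers unfamiliar with the idea, a translation memory lookup can be sketched as a fuzzy search over previously translated segments. The following is a minimal illustration using Python's standard `difflib` module. It is not how the Translate extension's translation memory is implemented; the function name `suggest`, the 0.75 threshold and the sample segments are assumptions made for the example.

```python
from difflib import SequenceMatcher


def suggest(source, memory, threshold=0.75, limit=3):
    """Return up to `limit` past translations whose source text resembles `source`.

    `memory` maps previously translated source segments to their translations.
    Each candidate is scored with difflib's similarity ratio (0.0 to 1.0) and
    kept only if it reaches `threshold`; results are sorted best match first.
    """
    scored = []
    for past_source, past_translation in memory.items():
        score = SequenceMatcher(None, source.lower(), past_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, past_source, past_translation))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Hypothetical memory with two previously translated segments.
memory = {
    "Save your changes": "Änderungen speichern",
    "Delete this page": "Diese Seite löschen",
}
matches = suggest("Save your change", memory)
```

A near-identical segment such as the one above should come back with a high score, so the translator can reuse the earlier wording instead of re-translating it. How well this works in practice depends heavily on the similarity measure and the threshold, which is where a production system would need far more care than this sketch shows.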
M
This proposal focuses on the "Knowledge as a service" part of the strategic direction.
When I look at the core of what we do, to some extent I see a model that we've mastered, and that we're making incremental improvements to. My concern is that, while that model is incredible and powerful as a community, the model for the interface and the delivery mechanism for the product the community creates are changing, and for us to continue what we're doing today may or may not prepare us for what the future actually looks like. I think it also limits our ability to unlock all of the tremendous knowledge, unstructured and structured, that exists within our projects. And I also believe that it limits us to certain forms of knowledge and a certain hierarchy of creation in a way that is very inward-looking.
Right now much of our information is sitting, unstructured, in a SQL database, rendered through PHP, read through a rendering engine into a browser to read/write in one interface: the browser. While this is amazing for the world of the browser, we're not going to be a browser-based information world for that much longer, any more than anything else. It's not that the browser is going to go away; the browser will be like books: books haven't gone away, radio hasn't gone away, but there will be a transformation to a new interface, and we need to be ready for it. Perhaps we should actually backfill into those older interfaces that we're not currently part of, because people still use those interfaces, and those interfaces are valuable.
Essentially this is about taking the Model-View-Controller paradigm to the next level, and also about extending it to participation and to the "write" part of our read-write system. Even if Alexa is serving Wikimedia content outside the browser, there is no mechanism for contributing through Alexa. We need to be planning for an architecture of information and architecture of experiences that is independent of the browser.
How do you get the most value out of the existing content? How do you serve a snippet to someone who just needs a quick answer? How do you serve different layers of sophistication to 8th-graders versus the college graduate, versus the PhD? Can we engage in the knowledge ecosystem and leverage what we have as a platform, and our traffic distribution and awareness, to actually open up greater resources of knowledge?
These are some of the topics I would like to see discussed at the Dev Summit.
Dependency management for JavaScript packages
I believe we should be using a dependency management tool for JavaScript libraries in MediaWiki.
Currently there is no convenient way to manage JavaScript packages in MediaWiki, even though MediaWiki itself, as well as many extensions, use them widely (both "own" JavaScript libraries and third-party libraries). The lack of such a solution in our infrastructure leads to numerous issues. Libraries are duplicated in our code bases, and there is little control over which version of a package is used by different components. Updating dependencies becomes a complex and error-prone manual process. As each of our components includes its dependencies separately, our users might be loading the same package multiple times.
Having a dependency management system for JavaScript packages would be beneficial in multiple areas. Developers would be able to easily control and maintain the packages their software depends on. Deploying MediaWiki and extensions would become easier and more transparent with regard to JavaScript packages, both for WMF infrastructure and for non-WMF users of our software. With de-duplication of dependencies, users would be served fewer bytes. Finally, once we have a convenient and standardized way of managing JavaScript packages, our software would be more interesting and welcoming for new developers coming from the dynamically growing JavaScript community.
We addressed a similar problem for PHP libraries a while ago, when we started to use Composer for PHP dependency management. We have been discussing solutions for JavaScript packages for a couple of years. Possible tools like npm, yarn, or even Composer have been discussed, but we haven't come up with a plausible solution yet.
I hope the summit will be able to collect our requirements, re-evaluate the previous investigations we have made, collect new ideas, and come up with a solution for JavaScript dependency management. I believe that once we have it, we will be walking into the future with confidence.
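The duplication problem can be illustrated with a small sketch that, given a manifest of which libraries each component bundles, reports every library shipped by more than one component and the versions involved. This is a hypothetical illustration of what a dependency manager makes visible: the function `find_shared_libraries`, the component names and the library names and versions are all invented for the example.

```python
from collections import defaultdict


def find_shared_libraries(components):
    """Report libraries bundled by more than one component.

    `components` maps a component name to a dict of {library: version}.
    The result maps each shared library to {version: [components using it]},
    which exposes both plain duplication and version drift.
    """
    usage = defaultdict(lambda: defaultdict(list))
    for component, bundled in components.items():
        for library, version in bundled.items():
            usage[library][version].append(component)

    report = {}
    for library, by_version in usage.items():
        users = sum(len(names) for names in by_version.values())
        if users > 1:
            report[library] = {v: sorted(names) for v, names in by_version.items()}
    return report


# Hypothetical manifest: three extensions, each bundling its own copies.
components = {
    "ExtensionA": {"lib-one": "1.2.0", "lib-two": "3.0.1"},
    "ExtensionB": {"lib-one": "1.4.0"},
    "ExtensionC": {"lib-one": "1.2.0", "lib-three": "0.9.0"},
}
report = find_shared_libraries(components)
```

In this made-up manifest, `lib-one` is bundled three times in two different versions. Without a shared dependency declaration, that information exists only implicitly in each repository, which is what makes updates manual and error-prone.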
We have well established that volunteers are the lifeblood of the Wikimedia movement. We prioritize their contributions and work to ensure they are given the tools they need to succeed. But in the Wikimedia development community, we've neglected volunteers instead of nurturing them - and this is a serious problem that we need to rectify. There are a lot of areas where we can improve, but I'm going to focus on just one: improving the volunteer developer's code review experience.
While Wikimedia Foundation product teams are building new things, it's usually the volunteers who are keeping alive the critical tools that the community depends upon (AbuseFilter, CheckUser, etc.). The MediaWiki codebase has gotten so massive that it's not practical for the Wikimedia Foundation to attempt to maintain all of it. It would not be a good use of movement funds either. Instead, I'm proposing that we utilize our volunteer base and ensure they are valued and respected members of the Wikimedia development community. I think we can do it in three steps: first, set reasonable standards for code and the review process; second, prioritize code review of patches coming from volunteers; and finally, empower volunteers to be maintainers and owners of code and create a sustainable community.
1. Set Reasonable Standards for Code and the Review Process
The status quo is that, depending on who reviews your code, you will have a wildly different experience. Some will mandate that principles like dependency injection are followed; others will require 100% test coverage. And others might not care for any of that and just ensure the code does what it is supposed to before merging. But the people who face the worst of it are volunteers - WMF staff will have consistent reviewers through teammates who have already communicated standards for merging code.
So we need reasonable standards for code we accept, and use those throughout the review process. As an example of 'reasonable', if someone is trying to fix a bug in legacy code that is difficult to test, it would be unreasonable to mandate a test case before merging the fix.
2. Prioritize code review of patches coming from volunteers
Our current process of reviewing volunteers' patches after finishing code review for teammates isn't working - we have a giant pile of unreviewed patches. When you start your day and look through your list of reviews, pick one or two patches from a volunteer and review them first. Most likely it'll take minimal time, but for previously-neglected volunteers, it will make a big difference.
3. Empower volunteers to be maintainers and owners of code
Some of our volunteers have been around for quite a while and are well trusted. Let's give them +2 rights! There's nothing that makes you feel better than getting an email from someone telling you that your contributions are valued and they'd like to nominate you for +2 access (exactly how I got hooked). And quite a few years later I'm still around, so it must have worked.
MediaWiki has evolved away from easy installation. Yes, there is still the web-based installer, but it only gives you the most basic version of MediaWiki, and the extensions that provide the best features are increasingly difficult to install. I installed MediaWiki for the first time six years ago. Since then I've become an active developer and system admin, and despite that experience I still find things like RESTBase difficult to install. The barrier to entry for a newbie to set up a fully-functional MediaWiki (e.g. with all the bells and whistles like Wikipedia) is huge. This should not be the case. It should be easy for a newbie to set up a MediaWiki installation with VisualEditor, CirrusSearch, etc., without first gaining years of experience.
Standing on the shoulders of giants
MediaWiki is built on many other open source tools, libraries, packages and other types of software. Our ability to write, run and use MediaWiki depends on their availability, on upstream support, and on their maintainability. A few examples: Debian, the OS WMF is running; PHP; and Elasticsearch, our search back end.
In light of recent discussions about migrating from HHVM to Zend PHP as our runtime, I would like to raise the question of what our position is in the open source world with regard to the underlying parts of our stack: whether we choose to be just a user of what upstream produces, or whether we want to actively influence the decisions made while the software is written.
As the well-known phrase says, "decisions are made by those who show up". To influence those decisions, we will need to show up in those communities, take an active part in them and contribute, in exactly the same manner we hope third-party MediaWiki re-users will contribute, discuss, send patches and show up.
If we choose this path, it has resource implications: time, money and dedication to involvement in other communities. For instance, sponsoring a PHP developer to work on our needs upstream might be a good investment, but might also turn out to be a waste as a whole.
I would like to have an open discussion about this approach: whether it is desired, feasible, and worth the effort. I think it might affect where our tech stack will be in the years to come, and it makes a significant statement towards the outside open source ecosystem.
Thank you
Refactoring the Open: First steps to get ready for the next level
Wikimedia's technical environment has grown into a very complex system over the past 15 years. Measured in internet years, parts of the software are ancient. When implementing a new feature, refactoring a piece of the (extended) MediaWiki software is often required first. Following this principle of a.) refactoring and b.) implementing something new, I suggest starting the discussion of the future technology direction by reflecting on (and possibly refactoring) the current Open Source practices and processes within the Wikimedia context.
A mono perspective won't let us survive (and is less fun, too)
When we talk about "Open Source" within Wikimedia we're not only talking about free licenses and open code repositories. We're talking about global collaboration and the technical contributions of many: Through this, we ensure that the Wikimedia projects stay alive and evolve, that we constantly develop new ideas, that multiple and diverse perspectives shape the development of our infrastructure and tools.
We are great at having ideas, and we are good at trying things out. But we still partly fail at prioritising the problems we know we have and addressing them accordingly.
I believe that we should
better maintain the Technical Community and find ways to grow by
* allocating stable code review resources from paid staff for volunteer and 3rd party developers
* improving the documentation of the code base
* providing a single entry point that is easily accessible for interested developers
* building up partnerships with Open Source communities with which we might share interests in the future (for example, communities around audio, video or translation technologies)
constantly take diverse perspectives into account by
* finding better ways to gather and address feedback from smaller language communities and non-Wikipedia sites
* being less Wikipedia-centric when it comes to research: Not yet existing or emerging communities might not be interested in creating articles, but in contributing data or multimedia content or in building tools to reuse data and multimedia content
build more bridges across local wikis and increase knowledge of local requirements by
* fostering cross-wiki exchange (example activity: template Hackathon)
* increasing the knowledge of the requirements that come along with different languages (example activity: multilingual support conference)
Open Source doesn't mean anything is possible - does it?
We have established processes and regulations for contributions to MediaWiki itself. But we lack processes and practices for local developments that ensure both the freedom and space to experiment for the Technical Community and the stability and reliability of tools for users.
I believe that we should, for example,
* raise priority for implementing a code review process for JS/CSS pages on Wikimedia sites
* start thinking about a technical sysop user right
* make it clear which user scripts/gadgets/tools are maintained, which are stable and which are proofs of concept or prototypes (for example: provide a (central) 'store' of maintained gadgets/tools with different levels: "stable version", "experimental version" ...)
Let's start refactoring.
O
All of the Wikimedia projects have, in technical terms, MediaWiki - the software - at their core. Thanks to this fact, MediaWiki has become a widely-deployed system, drawing many volunteer developers. Alas, there is a disparity in scale between the WMF-run install and other, external set-ups, which hinders the speed with which the platform supporting the Wikimedia projects can evolve. On the other hand, architecting microservices has proven to be a good way of achieving scalability, increasing developer productivity, improving maintenance and reducing technical debt. Gradually moving towards 'de-monolithising' our core infrastructure will enable developers (both WMF staff and volunteers) to start working on all sorts of interesting features, ranging from simple add-ons to full-blown companion sub-systems. While this transition is (arguably) already happening, everything still gravitates around MediaWiki - the software. Instead of focusing our efforts on compatibility in scale (e.g. one JobQueue system for WMF, another for external installs), we should focus on the products and features that allow the projects to grow, both in terms of the number of projects and the features they offer, and in the number (and diversity) of their users. Microservices can greatly help in achieving this goal, since all installs can select the components they want to run based on the resources at their disposal and their potential reach or scale. Much like the advent of extensions enabled various parties to complete their systems with sought-after functionality, microservices can refocus our technical community to think about features and components without worrying about scaling them (up and/or down). If we want our developers to assist the Wikimedia projects and their communities, we need to bring our core infrastructure into the 21st century. Let's not leave the technology behind - it is central to the success of the communities we are trying to enable.
P
The strategic direction that has emerged has two components: "Knowledge as a service" and "Knowledge equity". "Knowledge as a service", which focuses on infrastructure, seems like the one most related to technology. This proposal is about exploring the less obvious intricacies between the two components, and in particular the technology implications of Knowledge Equity.
Wikimedia is a complex socio-technical system, so it's not really possible to separate people from technology when talking about it. A direction of Knowledge Equity invites the contributors of the Wikimedia movement to take a critical look at themselves and assess their biases and privileges. This, in turn, can help identify structural biases that have been reproduced and ingrained in our technical platform.
For example, MediaWiki is currently doing a great job at providing a localized interface in many languages. However, beyond language, interaction design and UX patterns seem very specific to Western culture. Similarly, when our strategic direction talks about building strong and diverse communities, this invites us to consider whether the current tools available to contributors enable them to provide an environment where newcomers can experiment, be mentored, and fail safely.
Beyond software, little effort has been invested in exploring alternative interfaces beyond the connected browser. Our primary interface for contribution (the web site) may work well for middle-class contributors from Europe and North America, but isn't necessarily what enables people from other backgrounds or geographies to contribute.
These are some of the topics I would like to bring up for discussion at the Developer Summit.