
Name	Tags	Statement
C. Scott Ananian	Machine Learning Machine Translation Translation User Experience Censorship Infrastructure Languages	'One World, One Wiki!' Instead of today's many siloed wikis, separated by language and project, our goal should be to re-establish a unified community of collaborators. We will still respect language and cultural differences - there will still be English, German, Hebrew, Arabic, etc. Wikipedias, and they will disagree at times - but instead of separate domains, we'll embrace a single user experience with integrated navigation between projects and languages and the possibility of split-screen views aligning related content. On a single page we can work on articles in different languages, or simultaneously edit textbook content and encyclopedia articles. Machine translation can facilitate conversations and collaborations spanning languages and projects without forcing a single culture or perspective; it plays a key role in removing these barriers and enabling new content and collaborators. We should invest in our own engineers and infrastructure supporting machine translation, especially between minority languages and script variants. Our editing community will continually improve our training data and translation engines, both by explicitly authoring parallel texts (as with the Content Translation Tool) and by micro-contributions such as clicking yes/no on a proposed translation or pair of parallel texts ('bandit learning'). Using 'zero-shot translation' models, our training data from 'big' wikis can improve translation for 'small' wikis. Every contribution further improves the ability of our tools to make additional articles from other languages available. A translation suggestion tool will suggest an edit in one language whenever an edit is made to a parallel text in another language. 
The correspondences between parallel texts can be created manually (for example, via the Content Translation Tool), but our translation engine can also automatically search for and score potential new correspondences, or prune old entries when the translations have drifted apart. Again, each new correspondence trains the engine and improves its ability to suggest further correspondences and edits. Red links and stubs will be replaced with article text from one of the user's preferred fallback languages, perhaps split-screened with a machine translation into the user's primary language. This will keep 'small' language wikis sticky, and prevent readers from getting into the habit of searching in a 'big' language first. We should build clusters specifically for training translation (and other) deep learning models. As a supplement to our relationships with the statistical translation system Moses and the rule-based system Apertium, we should partner with the OpenNMT project for modern neural machine translation research. We should investigate whether machine translation can replace LanguageConverter, our script conversion tool; conversely, our editing fluency in ANY language pair should approach what LanguageConverter provides for its supported languages. By embracing unity between projects and erasing barriers between languages, we encourage the flow of diverse content from minority languages around the world into all of our wikis, and make all of our content more available in indigenous languages. Language tools route around cultural or governmental censorship: by putting parallel texts and translations at the forefront of our UX, we expose our differences and challenge preconceptions, learning from each other.
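A minimal sketch of the red-link fallback described above. It assumes two hypothetical inputs: the set of language codes in which an article exists, and the reader's ordered language list (primary language first, then preferred fallbacks). The function name `resolve_red_link` and its return convention are illustrative only, not an existing MediaWiki API.

```python
from typing import Optional, Sequence, Set, Tuple


def resolve_red_link(
    available: Set[str], user_langs: Sequence[str]
) -> Tuple[Optional[str], bool]:
    """Choose which language version of an article to show a reader.

    Hypothetical sketch: `available` holds the language codes in which the
    article exists; `user_langs` lists the reader's languages, primary first,
    followed by preferred fallbacks.

    Returns (language_to_show, needs_machine_translation). The flag is True
    when the text comes from a fallback language and would be split-screened
    with a machine translation into the primary language.
    """
    if not user_langs:
        return None, False
    primary = user_langs[0]
    if primary in available:
        return primary, False  # article exists in the primary language
    for lang in user_langs[1:]:
        if lang in available:
            return lang, True  # show fallback text alongside MT into `primary`
    return None, False  # genuine red link: nothing the reader can read
```

For a Yoruba reader with French and English fallbacks, `resolve_red_link({"en", "de"}, ["yo", "fr", "en"])` returns `("en", True)`: show the English text, split-screened with a machine translation into Yoruba. Keeping the choice a pure function of its inputs makes it easy to reuse for stubs as well as red links.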
Dan Andreescu	Analytics Big Data Open Source Research Third Parties Collaboration Documentation	Our strategic goals include scaling our communities to a truly global level and expanding our understanding of human knowledge. To do this, in my opinion, we need a much better understanding of our communities' actual work. We have tens of thousands of people doing millions of hours of work every month, and nobody knows exactly what is being done, what the definition of "done" is, or how fast or slow the progress is. We are the leaders of the free knowledge movement, yet we are mostly blind beyond a few big-picture metrics like pageviews and edits. We need to develop a good understanding of the work being done on the wikis. Very capable people have already spent lots of time trying to do this, but I believe we have largely failed because of technical limitations. This is a big data and big compute problem, and we have not yet approached it as such. It needs close collaboration between our communities and the Analytics, Research, and Audiences teams, as well as the power of the WMF Hadoop cluster. I have already held sessions on this topic, and am excited to finish planning and move on to actual work. Taking on and finishing this work has some very valuable implications. Most importantly, we will all be able to talk more objectively about community frustrations over changes that cause "more work". For example, when we launched VisualEditor there was a huge backlash about the amount of work this change implied for our community. But because this backlash was largely based on subjective opinions, emotions got involved, and it took years to calm their negative effect. This effort would also give us, for the first time, a way to celebrate these millions of hours of work. People could see, share, and take pride in their part of building human knowledge (if they wanted to; privacy is, of course, one of our top priorities). 
I am also interested in expanding our Open Source efforts and examining changes we can make to spur more collaboration. My reading of the strategic goals for 2030 is that the WMF will not have enough resources to achieve them by itself. That's where collaboration will be crucial, and where problems like in-house libraries without a true Open Source presence will slow us down. We let documentation and third-party user support lag behind because we're busy with other priorities, and that has arguably been fine at our scale so far. But this approach will not allow us to grow the way our Strategy envisions.
David Barratt	API Drupal Open Source Mobile	The Mission: In a Wikimedia cultural orientation, the moderator instructed the class by explaining that 'technology is not part of our mission, technology is only a means to an end.' While it may be appropriate to use digital technology to disseminate free educational content effectively and globally, Wikimedia Technology itself is not strictly part of Wikimedia's mission. Wikimedia is therefore in the precarious position of maintaining a large open source software project only as a means to an end. This produces a double-minded mentality within the movement: satisfying the mission versus feeding a massive software operation. With this in mind, Wikimedia ought to consider partnering with an existing open source community in order to evolve MediaWiki to support the mission, maintain and grow the technical community, and build the technologies necessary for embracing mobility. While Wikimedia's mission statement does not cover technology, the mission (https://www.drupal.org/about/mission-and-principles) of Drupal 'is to build the best open source content management framework.' In addition, Drupal is more than capable of handling all of Wikimedia's traffic needs and is flexible and modular enough to allow us to implement all of MediaWiki's features in a UI- and API-backwards-compatible way. In fact, every feature of Drupal core is an extension, and every extension is a first-class citizen with full control over every aspect of Drupal. Perhaps the next major version of MediaWiki ought to be a collection of Drupal extensions that can be run independently and are also available in a pre-configured 'MediaWiki' distribution of Drupal (https://www.drupal.org/project/project_distribution). The primary user of MediaWiki is, by far, Wikimedia. 
While the software can be run by others outside of Wikimedia, its usage outside of Wikimedia is extremely low (https://trends.builtwith.com/cms/MediaWiki). Because of this low adoption rate, it is difficult to attract users outside of Wikimedia. As a result, very few developers outside of the movement contribute to its development, and there is almost no outside financial commitment to the project. In contrast, Drupal's usage within the top 1,000 websites is 14 times (https://trends.builtwith.com/cms/Drupal) that of MediaWiki. 'The Drupal community is one of the largest (https://www.drupal.org/about/mission-and-principles) open source communities in the world.' By building MediaWiki on top of Drupal, Wikimedia would be tapping into a user and developer base that is substantially larger than MediaWiki's. Wikimedia would be assisting Drupal in its mission, and Drupal would be helping Wikimedia in its own. Drupal is committed (https://dri.es/drupal-is-api-first-not-api-only) to an API-first strategy. This strategy has enabled Drupal to expose all of its resources through a consumable, highly cacheable API. The Drupal community believes strongly in this strategy because it is part of their mission, and pursuing it helps others like Wikimedia achieve their missions on a global scale. By embracing the API-first strategy, Wikimedia would propel its mobile development into the future. To further Wikimedia's mission, the Foundation should consider using Drupal as the basis of its software. Doing so would facilitate evolving MediaWiki to support the mission, maintaining and growing the technical community, and building the technologies necessary for embracing mobility.
Adam Baso	Machine Learning Schema.org Templates Wikidata Structured Data Wikibase Mobile	== Structure Most Things with Schema.org == The future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, and Microsoft, their international equivalents, and social networks. We're thankful they extend our reach, even as we seek to help consumers on these platforms join our movement. We could help platform providers, their users, and our users solve problems better by adopting the open standard Schema.org in Wikipedia pages, mapped through templates and, ideally, federated and synchronized with Wikidata properties. Benefits: Wikipedia will have even better presentation and placement in search engines and other data-rich experiences. We provide an opportunity for a more consistent data model for template authors and for people and bots filling in template values. The richly defined Schema.org entities provide a good target model for all entities represented in the Wikipedia/Wikimedia corpora. Standardization can reduce duplication of effort and inconsistencies. We introduce an easier vector for mobile contribution, which could include simpler and different data entry, mapping, and modeling. We can elevate an open standard and push its adoption forward while increasing the movement's standing in the open standards community. Schema.org-compliant data is more easily amenable to machine learning models that cover data structures, the relations between entities, and the dynamics of sociotechnical systems. This could bolster practical applications like vandalism detection, coverage analysis, and much more. This might provide a means for the education sector to teach students about knowledge creation, data modeling, and more. It might also give scientists and other practitioners a further standardized way to model the knowledge in their fields. What would it take? And can this be done in harmony with the existing {{Template}} system? 
This session will discuss the following: Are we aligned on the benefits, and which ones? Implementation options. Can we extend templates so they can be mapped to Schema.org? Would it be okay to derive the mapping by manual and automated analysis at WMF/WMDE and apply it behind the scenes? Would that be sustainable? Could we make it easy for template authors to mark up their templates for Schema.org compatibility and have some level of enforcement? Could Schema.org attributes and entity types be autosuggested to template creators? Is it easy to relate most existing and proposed Wikidata entity types and properties to existing Schema.org entities and properties? What would it take to streamline MCR Schema.org data structures, or MCR Wikibase property clusters mapped to Schema.org on defined entity types? Furthermore, if we can do #1 and #2, what's to prevent us from letting templates, as they are, serve merely as the interface for Schema.org-compliant Wikibase entities and properties (e.g., by duck typing / autosynthesis)? How could we bidirectionally synchronize data between Wikipedia and Wikidata with confidence, in a way compatible with patroller expectations? And what storage and event processing would be needed? Can the systems be scaled to accommodate the arrival of real-time and increasingly fine-grained information?
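One way to picture the question of deriving a template-to-Schema.org mapping and applying it behind the scenes is a lookup table applied to each template invocation. The sketch below assumes a hand-written mapping for a single template; `TEMPLATE_MAPPINGS`, `template_to_jsonld`, and the chosen parameters are hypothetical, while `Person`, `name`, `birthDate`, and `deathDate` are real Schema.org vocabulary. It ignores wikitext parsing, nested templates, and value normalization, all of which a real implementation would need.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, hand-written mapping: template name -> (Schema.org type,
# {template parameter: Schema.org property}). In practice such a table might
# be derived by manual and automated analysis, or declared by template authors.
TEMPLATE_MAPPINGS: Dict[str, Tuple[str, Dict[str, str]]] = {
    "Infobox person": (
        "Person",
        {"name": "name", "birth_date": "birthDate", "death_date": "deathDate"},
    ),
}


def template_to_jsonld(template: str, params: Dict[str, str]) -> Optional[dict]:
    """Translate one template invocation into a Schema.org JSON-LD object.

    Returns None for templates with no known mapping. Parameters without a
    mapped property, and parameters with empty values, are skipped.
    """
    mapping = TEMPLATE_MAPPINGS.get(template.strip())
    if mapping is None:
        return None
    schema_type, properties = mapping
    doc = {"@context": "https://schema.org", "@type": schema_type}
    for param, value in params.items():
        prop = properties.get(param.strip())
        if prop is not None and value.strip():
            doc[prop] = value.strip()
    return doc
```

Calling `template_to_jsonld("Infobox person", {"name": "Ada Lovelace", "birth_date": "1815-12-10", "image": "Ada.jpg"})` yields a `Person` object with `name` and `birthDate`; the unmapped `image` parameter is dropped, and the result can be serialized with `json.dumps` for embedding as JSON-LD.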
Erik Bernhardson	Analytics Machine Learning Open Source Privacy Structured Data Collaboration	Title: Empowering Editors with Machine Learning Background: Advances in machine learning, powered by open source libraries, are becoming the foundational backbone of technology organizations the world over. Many tedious, time-consuming tasks that previously required 100% human involvement can now be augmented with human-in-the-loop machine learning, empowering editors to get more done with the limited time they have available to contribute to the sum of all human knowledge. Advice: 1) Invest directly in applying proven machine learning, such as pre-trained ImageNet classifiers, to add structured data to our multimedia repositories and increase their discoverability. This could take the form of tools that offer editors a list of suggested items that they can add to a media file with a single click, if appropriate. 2) Engage academia to work with Wikimedia data sets, and employ developers to move the most promising research results into production. A significant amount of work is already being done in academia to test and evaluate machine learning with our data sets, but little to none of it ever makes it back into Wikimedia sites. With more focus on collaboration, we can encourage research that is specifically applicable to deployment goals. 3) Wikimedia has the ability to collect significant amounts of implicit user data via browsing sessions, searches, watchlists, editing histories, etc. that can be used for machine learning purposes. We need to be continuously thoughtful about the privacy implications of how we use this data.
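Advice 1 above can be sketched as a human-in-the-loop filter: a pre-trained classifier proposes labels with confidence scores, only confident and new labels are shown, and nothing is saved unless an editor clicks it. The `Prediction` type, both function names, the 0.8 threshold, and the limit of five suggestions are illustrative assumptions; the classifier itself is out of scope.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Prediction:
    label: str    # candidate item for a media file (hypothetical label space)
    score: float  # classifier confidence in [0, 1]


def suggest_items(
    predictions: Iterable[Prediction],
    existing: Set[str],
    threshold: float = 0.8,
    limit: int = 5,
) -> List[str]:
    """Return the most confident labels not already on the file, best first."""
    fresh = [p for p in predictions if p.score >= threshold and p.label not in existing]
    fresh.sort(key=lambda p: p.score, reverse=True)
    return [p.label for p in fresh[:limit]]


def apply_editor_choices(
    existing: Set[str], suggested: List[str], clicked: Set[str]
) -> Set[str]:
    """Add only suggestions the editor explicitly clicked; the model never saves on its own."""
    return existing | {label for label in suggested if label in clicked}
```

Keeping the write step separate from the suggestion step means the model can only propose; the editor remains the one who adds data.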
Amanda Bittaker	Social Impact Strategy	Title: Frameworks to connect infrastructure to the mission. Thesis: We will better achieve social impact, succeed in our strategy, and fulfill our mission when the Foundation uses nonprofit programmatic frameworks to prioritize and plan improvements to MediaWiki and other technologies. Proposal: Impact is an intangible, abstract social benefit, and it can be difficult to consider how changes we make in MediaWiki will help or harm it. To illustrate the connections between infrastructure choices and impact, and to incorporate those connections into our plans, we can use programmatic frameworks developed in nonprofit professional communities. Frameworks used by these communities for various types of programs and impact can explicitly and concretely link our engineering choices to the movement strategy and the social benefit we create. This increased attention to the social impact of our technical decisions and investments will in turn attract increased investment from our communities, partners, and potential allies beyond our community towards fulfilling our mission. WMF programs such as New Readers and Structured Data on Commons, and Wikimedia community programs such as Wiki Loves Monuments, model how building technology for well-defined social impact can structure our engineering and infrastructure choices towards more strategic and mission-driven impact. These programmatic frameworks can be helpful during annual planning, at quarterly check-ins, and throughout the process of deciding on, planning, implementing, and evaluating technological changes. We would be able to weigh the needs of, and design intentionally for, a broad base of end users while also supporting the targeted and specific organizing communities who use our technology towards our desired social impact. 
We could expand the impact we achieve by consulting expert communities, such as educators, librarians, and activists, who could design additional social-impact programs and processes on top of those tools. We could also identify parts of our communities that already create the desired impacts, and build technologies and technological services that increase the scale, effectiveness, and efficiency of organizing contributors to fulfill our mission. Socio-technological decisions in our movement are most successful when they consider both social and technological benefits.
David Chan	Real-time Collaboration Contributors Editing Social	Embracing real-time collaboration: Real-time collaboration (as in Google Docs and Etherpad) has many benefits but also imposes certain workflow requirements. There is prototype code that can enable real-time collaboration within VisualEditor. But rolling out collaborative editing requires more than technical work; it will require a coordinated effort to re-imagine what editing is like. We will need ways to create user groups, real-time chat, a means of temporarily persisting collaborative sessions, and perhaps even new core mechanisms for describing revisions. We also need to think about social mechanisms and about preventing harassment and vandalism in collaborative sessions. In exchange, we will gain improved mechanisms for mentoring, translating long articles, reporting on current events, and assisting non-native speakers. We should embrace this opportunity to reimagine our platform, starting by organizing a number of trials to learn whether, or how best, a real-time collaborative editing option would benefit our projects. Sessions at previous Wikimanias, Hackathons, and Developer Summits identified potential uses, including: mentoring; translating long articles; current events; and assisting non-native speakers. Potential issues identified include: how to log authorship; who decides when to publish; preventing in-session abuse; and coexisting with non-real-time editors. References: https://wikimania2017.wikimedia.org/wiki/Submissions/Waiting_for_Real-Time_Collaboration (Wikimania 2017 panel discussion on real-time collaboration); https://phabricator.wikimedia.org/T165941 (Wikimedia Hackathon 2017 showcase including live demo of VisualEditor real-time collaboration)
Cindy Cicalese	Open Source Third Parties Installation Innovation	How and with whom should we partner to create the technologies needed to support the mission? A substantial, growing community of MediaWiki users and developers outside the Wikimedia movement has evolved, creating wikis that vary in size, number of editors, number of readers, access restrictions, and activity. The Wikimedia movement benefits from the technology contributions and innovation of this third-party MediaWiki developer and user community. Similarly, this community benefits from the Wikimedia movement's stewardship of MediaWiki as the foundational technology supporting Wikipedia and its sister projects. There are many areas in which the needs of these two groups are identical, including stable, well-performing software that supports community authoring. Partnering with the third-party MediaWiki community will result in a platform that is better for all parties. How should MediaWiki evolve to support the mission? There is much knowledge in the world that cannot find its place within Wikipedia or its sister projects. MediaWiki is powerful software crafted especially to support the expression of all knowledge. Third-party MediaWiki wikis can provide a home for knowledge that does not belong in Wikipedia, supporting the mission of sharing in the sum of all knowledge. In order for the third-party MediaWiki community to continue to thrive and grow, several impediments to MediaWiki adoption that especially affect that community must be addressed: - Installing and maintaining all but the smallest and most basic MediaWiki installations currently requires a high level of craftsmanship and expertise. - While a large number of novel MediaWiki extensions exist to support third-party applications, it is difficult to ascertain the maturity and level of support of these extensions. - Some enterprise consumers require a guaranteed level of support and/or service level agreements before adopting a technology. 
- The barrier to entry for those wishing to experiment with MediaWiki in production-quality environments is high. How do we maintain and grow the technical community and ready it for the mission ahead? The third-party MediaWiki community already contributes significantly to the code base. In the last two years, 22% of the commits to MediaWiki core were made by third-party contributors, and 62% of the authors of commits to MediaWiki core were third-party contributors. Even more striking, in the last two years, 40% of the commits to MediaWiki extensions hosted on Gerrit were made by third-party contributors, and 67% of the authors of those commits were third-party contributors. The third-party MediaWiki community is a significant training ground for skilled MediaWiki developers who are used to tackling a diverse set of challenging problems and who are poised to help forge the path ahead for MediaWiki.
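The gap between the commit share and the author share quoted above follows naturally when many third-party contributors each make a few commits while a smaller group of staff make many. The sketch below computes both shares from a list of commits; the `Commit` record and its `third_party` flag are assumptions for illustration, and it makes no claim about how the quoted figures were actually computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    author: str
    third_party: bool  # assumed flag: author is not a Wikimedia staff contributor


def third_party_shares(commits: List[Commit]) -> Tuple[float, float]:
    """Return (share of commits, share of distinct authors) from third parties."""
    if not commits:
        return 0.0, 0.0
    third_party_commits = sum(1 for c in commits if c.third_party)
    # Assumes each author has a single, consistent affiliation flag.
    authors = {c.author: c.third_party for c in commits}
    third_party_authors = sum(1 for flag in authors.values() if flag)
    return third_party_commits / len(commits), third_party_authors / len(authors)
```

With two staff authors making four commits each and three third-party authors making one commit each, third parties account for 3 of 11 commits (about 27%) but 3 of 5 authors (60%).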
Jan Dittrich	JavaScript Open Source Research Documentation Gadgets Infrastructure	I believe that we need a better separation of concerns - in code as well as in product work and in our communication with the communities - to reduce the debt we build up in these areas. Therefore, I want to suggest three interrelated topics: (1) use of modern MVVM frameworks for our front-end code, to develop more efficiently; (2) provision of a modern customization infrastructure, to decouple gadgets from our code; and (3) participation beyond code and feature wishes. Use of modern MVVM frameworks for our front-end code: Traditionally, MediaWiki has been focused on PHP. Over recent years, more and more interactivity has been implemented in JavaScript. Given the significant growth of the JavaScript ecosystem, this could have meant quicker development and a clearer separation of concerns. However, since our solutions are mostly used in a MediaWiki context, involving external developers has been difficult, and so has onboarding developers in core teams. I see a large opportunity in introducing modern MVVM libraries that are open source and not constrained to use in Wikimedia software; we could build upon others' experience and documentation, things that have traditionally been problematic in our isolated MediaWiki solutions. Strong contenders are React and Vue.js. While the React ecosystem is larger, I would recommend Vue for its better documentation and clear compartmental structure, which will hopefully help us avoid further isolated solutions. Provision of a modern customization infrastructure: The introduction and wider use of an MVVM framework could also be a chance to provide clear front-end APIs for gadgets. Gadgets currently rely on DOM hacks, which break continually and would not be possible anyway with a modern front-end framework (due to DOM flushing). 
Why should we bother? Because we have a large user base in which different tasks are shared using specific tools, just as each manual trade has many different specialized, often even customized, tools. Additionally, gadgets and user scripts could provide a low-barrier opportunity to onboard new developers. Other organizations successfully show that user-provided extensions can enrich an ecosystem with user-driven innovation and help with onboarding developers, e.g. Firefox's and Chrome's WebExtensions, as well as LibreOffice extensions. I would like to work on finding a way to fulfil the possibilities of gadgets and extend them while providing sustainable and secure infrastructure for doing so. Participation beyond code and feature wishes: We already do extensive user research. A large area for expansion and further development is doing this research and sense-making with the community. This may already happen, often implicitly, through feature- or UI-focused requests from community members. But this has large caveats: the requested solution may not be feasible or sustainable to implement. Furthermore, without understanding the underlying need, we risk building up technical and UX debt and forgo the opportunity to learn from our community. To achieve an active, needs-based involvement of communities in design and research, we could build on existing participatory design methods. They could be used in, and integrated into, our research and product planning frameworks. Clearly integrating the community into up-front research could enable us to gather needed knowledge, have community participation, and reach a better understanding between the Wikimedia Foundation and the communities, as well as among the communities themselves. I want to define future participatory design strategies to be used on our way towards 2030.
Eric Evans	Architecture Infrastructure Strategy	More than just servers: A modern approach to tooling and infrastructure is needed, not only so that we can scale to future content and users, but also so that we can scale the deployment of new technologies and services. Our way of approaching infrastructure is in need of modernization; what we do, we do well, but if we are to grow our capacity, not just for storing and serving content but for creating the technologies that empower movement strategies, we need to depart from the idea of infrastructure as merely clusters of servers. We need high-level, easily consumed platforms for computation, storage, deployment, and management. We need environments that make experiments cheap and allow teams to fail fast or iterate quickly on the next stage. We need systems that are distributed by nature, self-service, and secure, and that provide insight into availability and performance. Some efforts have been (or are being) made here. Examples include recent work toward a Kubernetes deployment, a change-propagation service, and RESTBase. These efforts, while worthwhile, are piecemeal, and no holistic strategy exists. I believe that as we discuss the future of our platform, we should consider its requirements in the context of the bigger picture, and discuss a strategy aimed at modernizing our infrastructure.
Benoît Evellin	Strategy New Users Discussions User Experience Mobile	== How to build a discussion system that would ease user interactions and content creation on the wikis? == I believe that structured discussions are a must-have for MediaWiki. Building such a system will narrow the communication gap on the wikis, ease newcomers' first steps, empower all users, and allow powerful interactive tools to be built. It will also greatly increase the adoption of MediaWiki as a knowledge creation system. The MediaWiki community has a strategic priority decision to make on this topic. The Wikimedia communities and organizations, through MediaWiki, want to give everyone, everywhere, a way to create (free) knowledge collaboratively. Imagine doing that without a powerful discussion tool that can handle international interactions, scale, and keep everyone aware of ongoing work. Experience with MediaWiki-powered wikis has shown that it is not possible. Unstructured messages are based on a blank page that hasn't evolved since 2002. You can do anything using a blank talk page, but discussions as they are now don't provide basic things people are used to on social networks or Google Docs, for example. Among many missing features, users can't reply to a discussion by email or using the mobile interface; users have to know where to post and how to follow a unique technical etiquette to discuss; and more. The current default discussion system does not welcome everyone. Several communities, such as Wikimedia's and wikiHow's, have created inventive ways to add some structure to discussions: templates, preloaded content, mentions, discussions overloaded with HTML, local scripts, and bots... These are not unified and are supported only by the communities themselves. Some wikis have decided to use Flow and expect improvements that will give a better experience. Some other communities, often small ones, prefer to use Facebook or other social networks to discuss, which is not a free, safe, and open environment. 
The approach supported by the Wikimedia Foundation is the Structured Discussions extension (re-scoped from Flow) to focus on user-to-user discussions. Treating that extension as a high-priority MediaWiki building block is a political decision the MediaWiki community needs to make. It will help build strong and diverse communities by lowering technical barriers. Building that discussion system requires a clear strategy and resources, as was done for the visual editor a few years ago. Any major effort will have side benefits for other projects (as the VE project did, notably by developing Parsoid): components built for discussions could be reused by other extensions or services to create very powerful features, like in-article notes or suggestions, or easier request systems. Work on discussions on the Web is not a new topic. We can benefit from studies of online discussions, covering both UX design and technical implementation. The MediaWiki community also has some experience, gained from LiquidThreads and Flow, of what is not possible or not desirable.
Matthew Flaschen	Open Source Strategy Discussions	We need to re-evaluate scaling, on both the technical community side and the content side. On the technical side, too often we think as if we were an isolated organization, rather than a respected leader that many wish to collaborate with. This causes us to ask ourselves the wrong questions and get the wrong answers. For example, we asked ourselves whether we should limit ourselves to existing open source translation tools, or use proprietary translation services to fill in the gaps. Instead, we should have stayed committed to open source, and asked how we can use our engineering and financial resources to advance open source translation. This is a major problem that no organization can solve on its own. However, we have both the motivation and resources to be a major contributor to the solution. We also asked whether we should support the proprietary MP4 format, or limit ourselves to weak device support for open formats. Instead, we should be staying committed to open standards, and working to support their uptake among software developers and device manufacturers. We already have significant relationships with wireless carriers that give us a foot in the door with such manufacturers. By seeking important partnerships, where we are prepared to put in significant effort, we can greatly scale both our own efforts and those of the broader movement. On the content side, to achieve sustained long-term growth, we need to grow every type of user activity, including writing, editing, discussion, organization, curation, maintenance, workflows, and moderation. We have historically provided good (and improving) support for writing, editing, discussion, and moderation. However, we have neglected the related processes of organization (e.g. categorization, tagging), maintenance (e.g. tracking articles that need fixes, updating them as they become out of date), curation (e.g. 
quality images, featured articles), and workflows (used in multiple areas, but particularly supporting organization, maintenance, curation, and moderation). It is vital that we improve discussion, curation, workflows, and moderation tools. Otherwise, we will be unable to keep up with increasing content and activity as our improvements to writing and editing succeed. We should look at past successes (e.g. the Teahouse) and failures (e.g. Article Feedback) and learn from them. In both cases, we made a very specific product, which then succeeded or failed. This is not scalable to hundreds of wikis, and it is hard to iterate in response to lessons learned. Instead, we should focus on platforms, such as workflow systems. In order to keep up with the community, we need to give them the flexibility to continually adapt the software to their needs.
Corey Floyd
James Forrester	Research Strategy Tools	Fundamentally, Wikimedia's technologies are tools to achieve our mission: absolutely vital tools, but not objectives in themselves. Where a tool has dulled we should sharpen it, where it has rusted we should polish it, and where it has blunted we should replace it. The majority of our tools have sprouted over time in response to immediate needs, and grown ad hoc when we've spotted something they can also do, or been pruned back when they proved too unwieldy to retain. Our communities have taken these tools and built amazing things with them, often despite rather than in line with their intended use. These unplanned use patterns have in turn shaped how we think about the tools and how they should be used. This haphazard, tactical development has worked well enough, but has limited us in several ways. We often fail to serve some of our audience because we rush in with a quick fix that listens to a few voices and decides that that's the best thing to build. When we've tried to build more systemic change, it has often not been rooted in serious evidence, and so is like building ivory towers into the clouds: baffling, hopeless, and unfamiliar. We should develop comprehensive methods to collect and monitor actionable data on how well our tools are serving their purposes, and where we can improve. This should come from all stakeholders, covering our great, already-empowered, experienced editors in major languages, but also those from whom we rarely hear: those contributing in and speaking smaller languages or not interacting with other users on meta-editing issues, and those with a looser relationship to the movement, like readers and casual editors. We should attach clear numbers to our tools stating how we expect them to perform. How these are obtained will differ. 
Sometimes quick numbers will work, like the rate of false positives against false negatives from anti-abuse features, or how many users who have made changes go on to press the submit button. Sometimes simple surveys with expected happiness thresholds will be appropriate. In other cases we may need to work harder to find the right way to understand how different tools and experiences interact with each other, like how much "knowledge" readers successfully glean from an article, or whether the burden of allowing logged-out editing is worth the mindshare of "anyone can edit" feeling true. Ideally, changes to user features, and especially the introduction of new features, should roll out progressively based on these numbers, and if they have adverse effects, they should be automatically rolled back. This is how others operate, but it's very distant from where we are today. It's a far-off dream now, but I believe we can build it.
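The metric-gated rollout described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing Wikimedia system: the stage fractions, the `MetricTarget` type, and the `save_rate` metric name are all hypothetical. Users are bucketed deterministically by hashing their ID, the feature advances one stage only while every target number is met, and any miss triggers an automatic rollback.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical rollout stages: fraction of users who get the new feature.
STAGES = [0.01, 0.05, 0.25, 1.0]
ROLLED_BACK = -1  # sentinel stage: feature switched off for everyone


@dataclass
class MetricTarget:
    """An expected-performance number attached to a feature (hypothetical)."""
    name: str
    minimum: float  # roll back if the observed value falls below this


def in_rollout(user_id: str, fraction: float) -> bool:
    """Hash the user ID into [0, 1) so each user always lands in the same bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the bucket stays strictly below 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < fraction


def next_stage(current: int, observed: dict, targets: list) -> int:
    """Advance one stage if every target is met; otherwise roll back automatically."""
    for target in targets:
        # A missing measurement counts as a failure, erring on the side of safety.
        if observed.get(target.name, float("-inf")) < target.minimum:
            return ROLLED_BACK
    return min(current + 1, len(STAGES) - 1)


targets = [MetricTarget("save_rate", 0.80)]
print(next_stage(0, {"save_rate": 0.85}, targets))  # 1: widen to 5% of users
print(next_stage(1, {"save_rate": 0.70}, targets))  # -1: automatic rollback
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a rollout that cannot be measured does not advance.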
Markus Glaser	Software Development Practices Third Parties Code Review Gadgets Templates Documentation	MediaWiki needs a professional ecosystem There is a huge potential for MediaWiki development outside of the Foundation's organized tech world. Thousands of organisations are running MediaWiki on the internet or on intranets. They are investing time and money to make it their platform for information sharing, knowledge management or collaborative work. Yet a lot of the development and design work stays contained in those installations instead of being published and provided to the greater MediaWiki community. I think this is not because the authors are secretive, but because we make it hard for outsiders to contribute. So how can we tap into this potential? I think there are a number of measures we can take. Among others, these are: Support standard ways for code contribution. For example, a lot of developers have a GitHub account and know the GitHub workflow of forking and sending pull requests. However, there is currently no way for them to contribute their code directly; instead, they have to set up access to our Gerrit infrastructure. This is a hurdle many will not clear. Maintain extensions as a community. There are a lot of extensions which are not actively maintained by their authors. In order to get them working, you have to wait for the maintainer to +2 your code. Although I have +2 rights, it is not clear under which circumstances I should actually +2 code, nor is there a general review queue for extensions. We can establish a group of volunteers who review changes to extensions on a regular basis. Create a template and gadget repository. A lot of work goes into site customisation using gadgets, templates or on-site CSS. There are brilliant solutions out there, but we do not have a structured way to centrally collect this content or even curate it. Make it attractive for professional developers and consultants to build their projects on top of MediaWiki. 
For example, by increasing the visibility of highly used extensions on MediaWiki.org, by providing good entry points for technical documentation or by adding automated quality checks to the extensions. There are already some initiatives pursuing the general goal of fostering an ecosystem, e.g. MediaWiki Stakeholders or the recently announced Enterprise MediaWiki Consortium. Together with the Foundation, they can encourage MediaWiki maintainers to contribute their ideas and code and be part of the MediaWiki world.
Ariel Glenn	Censorship	Think like a Pirate: How to beat Internet censorship Universal access to a digital good, such as the knowledge curated and made available via Wikimedia projects, presupposes access without censorship. Censorship and circumvention methods become more advanced over time. Censorship ranges from blocks of single articles to targeting DNS providers to seizing servers to shutting off Internet access completely. Some of these methods are in use right now against Wikimedia projects. One form of censorship evasion has proven virtually impossible to stamp out: piracy of copyrighted content, in particular music and movies. Let's look at the methods used by the pirates and adapt them for use by Wikimedia content providers and users. We would like our content to be widely shared, available everywhere. Here is what we need to get started: 1. Content must be downloadable and usable off-line. * Content meant to be used online that requires contact with an external server fails this test. Movies and music do not. 2. Content must be partitionable. * You don't grab all alternative music for 2017, but just the albums from the artists you want. Users will likely not need or want all of the English-language Wikipedia (for example) but only subsets. 3. Content must be usable off-line by applications everyone has. * Movies and music are downloaded in formats that play in apps that come standard with every OS on every platform. Usability must include navigation and search of content. 4. Downloadable content must be easy to find, both before and after censorship. * You ask Google to find the music or movie you want on YouTube or elsewhere, click, and download. Failing that, there is a fallback (see below). 5. Tech-savvy downloaders must be able to seed the distribution of content to everyone else. 
* For music or movies, folks who download from private torrent trackers make copies to give to all their friends; six degrees of separation later, we have reached saturation. 6. Content must be popular enough to be widely shared. * If a group of consumers cannot locate a content source or redistributor, the distribution chain breaks. Poorly seeded torrents are the classic example. 7. People must not rely solely on the original online content source for access. * If no one has downloaded or mirrored a copy before access to the original content source is blocked, this approach fails. Note that most people will have little incentive to save copies of content for offline use from a reliable site, unless Internet access itself is spotty, or the content bundles for download add value. In some jurisdictions, it may be dangerous to possess certain content, including that of the Wikimedia projects. This issue is outside the scope of this proposal. Related topics: https://www.mediawiki.org/wiki/Wikimedia_Apps/Offline_support, http://www.kiwix.org/, http://xowa.org/ and so on
Anne Gomez	Multimedia Strategy Structured Data	Wikimedia properties need to keep pace with the norms of browsing and information-consuming behavior to stay relevant, grow readership, and bring new editors to add their knowledge to the repository. We need to support smaller content types - both for contributions and for consumption. At the same time, we need to support multimedia content, from video to interactive graphics to augmented reality. Structured data will allow us to be more flexible in our presentation of information, and create more complex interactions with that information. Video and audio will open the doors to new contributors and new projects. Content consumers online now, whether among the highly connected or using the internet for the first time, are looking for the right information available to them at the right time. They don't necessarily want long, encyclopedic content, but instead prefer snippets of information served to them just when they need it. And they learn through more immersive experiences - video, augmented reality, interactive graphics - rather than long form text. Even beyond that, huge portions of the world can't access our content for a number of reasons: they don't have internet access, they can't read, their languages don't have keyboard support, there isn't content in their language. The internet as a whole is evolving to meet these changing needs. Messaging apps support walkie-talkie like communication, Google serves just the right answer to any question (in English), and language support for smaller languages is growing cross-platform. Our infrastructure needs to meet these needs.
Aaron Halfaker	Artificial Intelligence	The future of responsible AI design is auditing systems. When we deploy AIs that make inherently subjective judgments that affect people and their work, we must also provide a means for them to audit and critique the AI. Did the AI mark the wrong thing as vandalism? Then it can silence a contribution. Did the AI fail to note a high quality article? Then we might direct traffic away from good content. Did the AI recommend the wrong type of thing? Then we might keep people in a filter bubble rather than helping them broaden their knowledge. There's a lot of conversation in the public sphere about how AIs cause ethical and social problems. Google's image search suggests that all CEOs are men. Facebook's feed filter reduces the visibility of conflicting opinions. The general call among researchers and ethicists is for transparency. At Wikimedia, transparency is an old idea. We've always developed our technologies transparently. But this hasn't made us immune from the problems that AIs wreak. Auditing systems are the future. They are a means towards giving users power over the AIs that govern our experience. We should be talking about how to build them.
Derk-Jan Hartman	Strategy User Experience Gadgets Tools Complexity	Growing and complexity Our strategy is pointing us towards a bold and inclusive world in terms of projects and people. Almost by definition this will lead to increased complexity, not only of our technology, but also of how we deliver it and enable people to make use of it. In the last few years we have spent energy on creating more APIs and a more service-oriented architecture. However, we have not made such major changes in how we design for and work with the front end of the software, which is where most people actually use everything else we make. Here we continue to think in terms of large products and problems to solve, and quite often fail and even clash with our own 'customers'. By taking on a more diverse strategy, we risk being even more vulnerable to this. I have two suggestions: Smaller engineering. Allowing more time for smaller projects, smaller bugs, smaller tests of ideas and refinement of existing software. Let's embrace the success of Community Wishlists and be closer to our communities by writing more Gadgets or tools (Toolforge) when we can, instead of going for 'the big fix'. Have three 1-week tests instead of one 6-month beta, and so on. Fix small bugs that annoy many and that make our website feel amateurish, and improve the experience for everyone. Work more often on the needs of smaller projects, giving them a bigger voice and improving our own solutions by gaining more diverse experience. Be closer to our communities by working nearer to them. The second point that we should work on is to stop thinking of our platform as a website. It is a work environment for an increasingly diverse crowd. We have a limited amount of space on the screen and a huge number of tasks that various people want to do. Gadgets, and even more so user scripts, are hugely helpful, but have long since become unmanageable. 
It is time to think beyond the simple APIs and widget kits. We need to take a step towards becoming an application environment. We need users to be able to install and use complete apps made from recognizable and reusable building blocks. I want to see and use Gadgets as my browser uses extensions. I want those extensions to put apps in recognizable and consistent spots, to allow for fullscreen or splitscreen views, to have a familiar UI, but without having to cram everything into the limited shared space that we have. Apps as gateways for diversifying the specific solutions we build.
Mark Hershberger	Third Parties Offline Editing Synchronization	When Marshall McLuhan said "The medium is the message", he was saying that how the message is understood is affected by what is used to present that message. MediaWiki is a fundamental part of the medium used to present Wikimedia's work (the "message"). Because the medium is an integral part of the message, it requires comparable attention to its availability and accessibility. For example, effort is made to ensure that people in remote areas have access to selected content through Kiwix, but only very limited effort has been made to incorporate their knowledge into the "sum of all knowledge." While there are efforts underway that include copying edits into Wikipedia by hand, it should be possible to provide people in remote areas with an editable copy of Wikipedia so that their edits could be incorporated with less intervention. The installation process and resource consumption of a simple MediaWiki installation could be improved without sacrificing the current PHP-based application, so that someone could, for example, run a current MediaWiki installation on an unrooted Android phone. Work could then be done to automate the synchronization of that MediaWiki with the current Wikipedia content. This work on MediaWiki could, of course, also benefit people besides the WMF who use the tool, which could create a virtuous cycle that would benefit the Foundation. In fact, deeply incorporating McLuhan's thinking into WMF culture would mean that, while Wikipedia would remain the most visible product of the Foundation, there would be more room to focus on expanding MediaWiki's capabilities beyond what fits into the current focus on GLAM efforts, the website, etc. Most of the world does not use Wikipedia every day, but many people use something they've learned as a result of reading from or contributing to Wikipedia every day. 
Making it easier for people to deploy MediaWiki where the potential users do not have the resources of the WMF (for example, in a place that doesn't have a stable Internet connection) could encourage more people to actively embrace Wikimedia's vision of freely sharing knowledge.
Marius Hoch	Third Parties Wikidata	Making use of Wikidata's knowledge on Wikimedia projects and beyond It is foreseeable that the way our content is consumed will change a lot in the next few years, as both the demographics of the internet and the devices used by our readers are changing. We should adapt to this by offering our content in ways best suited to many user scenarios. To achieve this, we need to modularize and structure our content so that it can be easily re-interpreted and used in many different ways. Wikidata lets us cater for this trend by providing machine-readable data about any subject, which can be formatted and presented in a wide variety of ways and languages. Wikidata makes it easier to maintain up-to-date information on subjects in many languages, without the burden of manual data maintenance. We should strengthen this by improving the integration of Wikidata with other Wikimedia projects, by providing easy ways to use and profit from Wikidata, especially for small communities, and by making the power of Wikidata more visible. While all Wikimedia communities will profit from this, it can be especially worthwhile for small communities, which currently don't have the resources to manage data such as infoboxes themselves. Example projects that will help in this area are the ArticlePlaceholder, which serves Wikidata data about a subject when there is no article about it, the plans for automatic infoboxes derived solely from Wikidata, and other ways of using Wikidata data on Wikimedia projects. While these projects can have a big community impact, they need to fit in with the current infrastructure. They also pose new scalability and data presentation challenges that need to be addressed. 
Furthermore, Wikidata's information should be easy for third-party projects to reuse, to increase visibility and to gain contributions and data donations, making Wikidata the true data hub of the internet. This goal raises longstanding issues with the current Wikimedia dump infrastructure, which is neither very flexible nor provides a machine-readable interface for data consumers. Bringing more individual editors and organizations into Wikidata also poses infrastructure and scalability issues, both in coping with the sheer amount of data and changes and in providing convenient tools for establishing and maintaining data quality.
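To make the multilingual point concrete, here is a minimal sketch of how a page could pick a label for an item in the reader's language, falling back through a chain of other languages, in the spirit of the ArticlePlaceholder. It assumes the entity JSON shape Wikidata serves (a `labels` map keyed by language code, each entry holding a `value`); the sample entity is abridged and inlined so the example runs offline, and the fallback chain is a hypothetical example, not MediaWiki's actual language fallback configuration.

```python
from typing import Optional, Tuple

# Abridged, illustrative entity in the JSON shape Wikidata uses for items:
# "labels" maps a language code to {"language": ..., "value": ...}.
ENTITY = {
    "id": "Q64",
    "labels": {
        "en": {"language": "en", "value": "Berlin"},
        "de": {"language": "de", "value": "Berlin"},
        "ja": {"language": "ja", "value": "ベルリン"},
    },
}


def pick_label(entity: dict, fallback_chain: list) -> Optional[Tuple[str, str]]:
    """Return (language, label) for the first language in the chain that has a label."""
    labels = entity.get("labels", {})
    for lang in fallback_chain:
        if lang in labels:
            return lang, labels[lang]["value"]
    return None  # no usable label: a page would need another fallback, e.g. the item ID


# Hypothetical reader preferences: Alemannic first, then German, then English.
print(pick_label(ENTITY, ["gsw", "de", "en"]))  # ('de', 'Berlin')
```

Because the same item carries labels in many languages, one structured record can serve every wiki in this way, which is what makes it attractive for small communities that cannot maintain such data by hand.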
Michael Holloway	Open Source	Free Software is Fundamental to Our Mission Position MediaWiki is a prominent free software project, and the Wikimedia projects have always run on free and open-source technology, but our relationship to free and open-source software needs clarification. We should formalize that we are committed to making, using, and leading in the development of free software, even when doing so is more difficult or less efficient in delivering user value than adopting closed solutions, as a central part of our educational mission. Discussion How does free software relate to the free knowledge movement? In this movement we are building a body of open knowledge, curated collectively and accessible to all. We develop the software that powers these projects in the open, and we run our backing infrastructure on free and open-source technology. We choose to do these things not because they are easy, but because they are hard. Existing free software is not always, or even most of the time, practically superior.[1] We work in the open so that others can contribute to and learn from our processes; our work product is educational content in its own right, and in that way directly contributes to our mission.[2] By ensuring that our tools and processes are open, and working through problems with free software projects rather than rejecting them in favor of closed solutions, we empower others everywhere to join us in doing this hard work, or to launch like-minded projects of their own. It's often tempting to conclude that our users could be better served by adopting closed or proprietary software solutions to our engineering problems, rather than adapting free software to meet our needs or writing our own. This may be true in the short term, but over the long term this contributes to the cloistering of software engineering expertise in closed commercial enterprises. 
Our goal is to expand and not restrict the knowledge of software engineering principles and practices, and we are playing a long game. What could a formal commitment to free software mean in practical terms? This is intended as an open-ended question for discussion, but here are a few ideas: We should take a leadership role in the development of free software languages and technologies on which we depend (e.g., PHP). Where we develop software for closed platforms (such as the mobile apps), we should promote free alternatives for their distribution channels (e.g., F-Droid [3]) and ensure they can be run without depending on proprietary software. We should encourage and recognize contributions by our engineers in the broader free software community. [1] https://www.gnu.org/philosophy/when-free-software-isnt-practically-superior.html [2] https://wikimediafoundation.org/wiki/Mission_statement [3] https://f-droid.org/
Katie Horn	Third Parties Volunteer Developers	WMF should focus on the technical issues it is uniquely positioned to handle, and let the volunteers have the fun stuff. When we think about what technical work the WMF is engaging in, I don't believe enough time is spent considering volunteer motivation, and the great potential we are systematically choosing to ignore, or end up devaluing entirely due to the inherent unpredictability of volunteer work. I do believe that there is a long enough history of deeply understaffed WMF engineering teams getting set up to tackle fancy front-facing projects, only to have those teams simultaneously struggle to deliver, and deter everyone else from getting too near decision-making in their territory. It is time to change our approach. I would like to talk about what it would take to refocus the majority of WMF's technical work away from taking full ownership of all the 'important' new ideas, and toward making it as easy as possible for momentarily highly motivated outside parties to make meaningful contributions to new features. I imagine many new tools would be required to scale release engineering, security, and the technical community in general. We would have to take a greater role in mentoring interested parties. There are also known big hairy unsolved problems in the way we currently think of maintainership. Major changes would have to be made in our current approach to product timelines and product/project management. Of course, there will always be things that do require a high level of predictability in the outcomes. Donor money can and should be spent on ensuring predictability around the things we absolutely cannot function without. However, there is a whole world of ideas that absolutely do not have to be accomplished on a strict 'shipping' timeline, and it seems that the WMF will always hold the keys to that door. 
I would like to figure out how the WMF could start embracing that unpredictability at every level, and move much more deliberately from 'bottleneck' to 'enabler'.
Trey Jones	Languages Machine Translation	My purpose in attending the Dev Summit is to enjoy the benefit of collaborating in person with others who are passionate about technology that brings information to the world in a variety of languages. When I imagine a world where everyone really can share in all knowledge, I don't imagine all of them doing so in their native language. The most important foundation for language technologies that will reach as many people as possible is informed realism, with insights from both linguistics and computer science. The most common estimate of the number of languages is 6,000. An unfortunate number are critically endangered, with only dozens of speakers, and 50-90% of the world's languages are expected to have no speakers by the end of the century. Providing knowledge to everyone in their own language is unrealistic. We should always seek to support any community working to document, revive, or strengthen a language, but expecting to create and curate extensive knowledge repositories in a language with barely half a dozen octogenarian speakers whose grandchildren have no interest in the language is more fantasy than goal. Statistical machine translation has eclipsed rule-based machine translation for free, casual internet use, and building it doesn't require linguists or even speakers. But it does require data, in the form of large parallel corpora, which simply aren't available for most languages. Even providing knowledge in translation is impractical for most of the world's languages. English speakers are notoriously monolingual, but in many places multilingualism is the norm, with people speaking a home language and a major world language. A useful planning tool would be an assessment of the most commonly spoken languages among people whose preferred language does not have an extensive Wikipedia. 
Whether building on the model of Simple English or increasing the readability of the larger Wikipedias, we can bring more knowledge to more people through Hindi/Urdu, Indonesian, Mandarin, French, Arabic, Russian, Spanish, and Swahili, all of which boast on the order of 100 million non-native speakers or more, than by trying to create a thousand Wikipedias for less commonly spoken languages. English is particularly suited to simple computational processing, a fact often lost on English speakers: it uses few characters, has few inflections, and its words are conveniently separated. Navigating copious amounts of knowledge requires search. The simplest form of search just barely works for English, but often fails in Spanish (with dozens of verb forms), Finnish (with thousands of noun forms), Chinese (without spaces), and most other languages. Fortunately, for major world languages we have software that can overcome this by regularizing words for indexing and search. Again, none of this is to say that we should ever stop or even slow our efforts where there is a passionate language community, or even one passionate individual, working to build knowledge repositories or language-enabling software. But we must be realistic about what it takes to reach the majority of people in a language they understand.
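The regularization step mentioned above can be illustrated with a toy inverted index. This sketch is a deliberate simplification: the hypothetical `toy_stem` strips a handful of Spanish -ar verb endings and folds accents, which is nowhere near a real stemmer (production search relies on proper language-specific analyzers), and it does nothing for languages written without spaces, such as Chinese, which need word segmentation first. It only shows why exact-match indexing misses inflected forms that a normalizer can unify.

```python
import unicodedata
from collections import defaultdict

# Toy list of a few Spanish -ar verb endings, longest first. A real system
# would use a proper stemmer or morphological analyzer instead.
TOY_SUFFIXES = sorted(["aron", "amos", "ando", "ado", "ar", "as", "an", "o", "a"],
                      key=len, reverse=True)


def fold(word: str) -> str:
    """Lowercase and strip accents, so 'habló' and 'hablo' index the same way."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(word: str) -> str:
    """Strip the longest matching toy suffix, keeping a stem of at least 3 letters."""
    w = fold(word)
    for suffix in TOY_SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def build_index(docs: dict, normalize) -> dict:
    """Map each normalized token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index: dict, query: str, normalize) -> set:
    """Return the documents that contain every (normalized) query term."""
    hits = [index.get(normalize(t), set()) for t in query.split()]
    return set.intersection(*hits) if hits else set()


DOCS = {1: "Ellos hablaron ayer", 2: "Nosotros hablamos mucho", 3: "Él habló poco"}
print(search(build_index(DOCS, str.lower), "hablar", str.lower))  # set(): no exact match
print(sorted(search(build_index(DOCS, toy_stem), "hablar", toy_stem)))  # [1, 2, 3]
```

The only difference between the two searches is the normalizer passed in, which mirrors how real search engines keep indexing generic and plug in a language-specific analyzer.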
Lucie-Aimée Kaffee	Languages Machine Learning Translation Wikidata	Languages in the world of Wikimedia One of the central topics of Wikimedia's world is languages. Currently, most projects cover around 290 languages, some far more thoroughly than others. In theory, all information in Wikipedia can be replicated and connected, so that different cultures' knowledge is interlinked and accessible no matter which language you speak. In reality, however, this can be tricky. The authors of [1] show that large parts of even English Wikipedia's content are not represented in other languages, including other big Wikipedias. Conversely, content in underserved languages is often not covered in English Wikipedia. A possible solution is translation by the community, as done with the Content Translation tool [2]. However, that would mean translating all articles in every language into every other language, an effort that never ends and is barely feasible, especially for small language communities. And it's not only about Wikipedia: the other Wikimedia projects will need a similar effort! Another approach to better language coverage in Wikipedia is the ArticlePlaceholder [3]. Using Wikidata's inherently multilingual and cross-lingual structure, the ArticlePlaceholder displays data in a readable format on Wikipedias, in their language. However, even Wikidata lacks support for many languages, as we were able to show in [4]. The question is therefore how we can get more multilingual data into Wikidata using the tools and resources we already have, and eventually how to reuse Wikidata's data on Wikipedia and other Wikimedia projects in order to support under-resourced language communities and make it easier for them to access information in their language. Content that is accessible in a language will eventually also encourage its speakers to contribute to the knowledge. 
Currently, we are investigating machine learning tools to support the display of data and the gathering of new multilingual labels for information in Wikidata. It can be assumed that over the coming years language accessibility will be one of the key topics for Wikimedia and its projects, and it is therefore important to invest in the topic now and enable an exchange about it. [1] Hecht, B., & Gergle, D. (2010, April). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 291-300). ACM. [2] https://en.wikipedia.org/wiki/Wikipedia:Content_translation_tool [3] https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf [4] https://eprints.soton.ac.uk/413433/
Ryan Kaldari	Languages Wiktionary Gadgets Structured Data	How should MediaWiki evolve to support the mission? One of the greatest barriers to the spread of human knowledge is the barrier of language. While Wikipedia does a great job of supporting hundreds of languages, the amount of content available in most language Wikipedias is still paltry and has a small impact on the knowledge available to speakers of those languages. For a huge percentage of the world's population, the key to unlocking knowledge isn't discovering Wikipedia, but learning new languages. Even for English speakers, the impact of learning a new language can be life-changing and open up many new opportunities. The Wikimedia Foundation is the steward of one of the greatest repositories of information about language in human history, Wiktionary. Unlike all other dictionaries on Earth, Wiktionary aims to define (in 172 languages) all words from all languages. In other words, not just defining English words in English and French words in French, but also French words in English, English words in French, Latin words in Swahili, Mopan Maya words in Arabic, etc. Its ambitious aim is to be the ultimate Rosetta Stone for the human species. While Wikipedia is in some respects maturing and gradually yielding diminishing returns for more investment, Wiktionary is still a small and growing project that has yet to fulfill its potential or break into mainstream consciousness the way that Wikipedia has. While one of the impediments to Wiktionary reaching its potential is lack of structured data support, which is being worked on, there are many improvements that could be made in the meantime to improve the usefulness of the site to both readers and editors. These include converting many of the fragile gadgets and site scripts into maintainable extensions, customizing the user interface to more closely match what users expect from a dictionary site, and adding dictionary-specific tools to the editing interface. 
There is also unexplored potential with building apps around the Wiktionary data, including apps tailored around language learning. Now that the Wikimedia Foundation has nearly 100 software engineers (and dozens of volunteer developers), it should explore the potential of its lesser known projects, especially Wiktionary, which has the potential to actually make a large impact on the Foundation's mission and bring more of the sum of human knowledge to more people around the globe.
Roan Kattouw	Architecture Contributors Data Center	Users should not be punished for logging in WMF wikis are slower for logged-in users than for anonymous users, which is unhelpful for trying to get users to contribute. This is a long-standing problem that's hard to solve, but we should have a vision for how we're going to solve it. WMF has caching data centers in strategic locations around the world (Amsterdam, San Francisco and soon Singapore), which make the wikis faster for users who are not near the primary data center (in Virginia) but are near a caching location. However, this only benefits anonymous users. For logged-in users, every page view contains their user name and other user-specific information in the personal tools area, so logged-in page views are considered uncacheable and are always routed to the primary data center. This means that if a new user browses the site for a while, then creates an account because they want to contribute (or makes an anonymous edit), the wiki suddenly becomes slower for them. All users are affected, because uncached requests are slower to serve than cached ones, but users outside North/South America are affected the most, because their traffic now has to cross an ocean that it didn't have to cross before. It's not nice that a new user's 'reward' for creating an account is a slower experience, but it's especially not nice that users in emerging communities are affected the most. If we want to encourage readers to become contributors, slowing the site down as soon as someone contributes is not very helpful. Some requests will always have to go to the primary data center, such as POST requests saving an edit, and those are always going to be slower for users outside North America. But for logged-in page views this isn't fundamentally necessary, and serving them from the edge caches would speed up the site for logged-in users and reduce the load on the app servers. 
There are different ways that this could be done, each with their own obstacles. For example, a single-page application for MediaWiki could use a content service to retrieve only the new page's contents when navigating, but this would require modifying or rewriting a lot of code in MediaWiki; ESI could be used to have Varnish inject cached page contents into a user-specific chrome, but that would require using advanced and partly unproven Varnish features. In both cases, we'd have to reimplement certain rendering preferences using CSS or a post-processing step. It's far from trivial, but let's start talking seriously about how we can address this problem.
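The following Python sketch makes the ESI idea concrete by showing the composition it would perform. One cached, user-independent article body is stitched into a small per-user wrapper. The sketch only mimics the idea: it is not Varnish or ESI syntax, and `content_cache`, `render_article` and the chrome template are hypothetical.

```python
import html
from string import Template
from typing import Optional

# Shared cache of rendered article bodies, identical for every user (hypothetical).
content_cache = {"Main_Page": "<p>Welcome</p>"}

# Per-user "chrome": the only part of the page that differs between logged-in users.
CHROME = Template("<header>$personal_tools</header><main>$content</main>")


def render_article(title: str) -> str:
    # Placeholder for the expensive parse that would run on the app servers.
    return f"<p>{html.escape(title)}</p>"


def assemble(title: str, user: Optional[str] = None) -> str:
    content = content_cache.get(title)
    if content is None:  # cache miss: render once, then reuse for every user
        content = render_article(title)
        content_cache[title] = content
    tools = f"Hello, {html.escape(user)}" if user else "Log in"
    return CHROME.substitute(personal_tools=tools, content=content)
```

The expensive rendering step happens once per title. Only the cheap wrapper varies per user, and that split is what would let logged-in page views be served close to the user.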
Daniel Kinzler	Architecture JavaScript Strategy Collaboration Hosting Mobile	I would like to discuss how assumptions drive our day-to-day work, and how to make sure we properly understand and regularly challenge these assumptions. I'm particularly interested in how technological assumptions shape product decisions, and how product assumptions shape technological decisions. Three major axioms come to mind. (1) MediaWiki needs to run in a shared hosting environment. This has been an explicit requirement for a long time now, but the baseline product that actually does run in such an environment (LAMP with no root access) is becoming more and more sub-par. We are already struggling to provide a decent mobile browsing experience there, not to mention search or WYSIWYG editing. So we should have a discussion about how long we want to keep this requirement, what the consequences of dropping it would be, and what alternative platform we should target for the baseline installation of MediaWiki. (2) Editing has to work with old browsers and without JavaScript. It has long been an explicit requirement that no basic functionality, particularly editing, can require JavaScript to be enabled. However, this causes us to fall further and further behind other sites. With more and more sites requiring JS, it's becoming less and less clear to me that this requirement is still sensible. This is especially true in light of many developing countries skipping straight from mostly-offline to mobile-only. (3) The primary medium for knowledge sharing is text. This assumption used to be hard-coded into MediaWiki until the introduction of ContentHandler, and it still seems to be hard-coded in the minds of many long-term contributors, to the software and to the wikis. I believe that it is high time to invest in exploring other media formats and alternative forms of collaboration. 
It seems to me like "Beyond Wikitext" is the major technological challenge that has come out of the movement strategy process, and that we should start thinking and talking about it - from the technological side as well as the product side.
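ContentHandler lets each page declare a content model instead of assuming wikitext. The following Python sketch only illustrates that dispatch idea. MediaWiki's real ContentHandler is a PHP class hierarchy with a different API, and the model names, handler classes and `render_page` below are hypothetical.

```python
import html
import json


class TextHandler:
    """Fallback handler: treat the page body as plain text."""

    def render(self, raw: str) -> str:
        return f"<pre>{html.escape(raw)}</pre>"


class JsonHandler:
    """Structured data: parse a JSON object and render it as a key/value table."""

    def render(self, raw: str) -> str:
        data = json.loads(raw)  # raises ValueError on invalid JSON; assumes an object
        rows = "".join(
            f"<tr><th>{html.escape(str(k))}</th><td>{html.escape(str(v))}</td></tr>"
            for k, v in data.items()
        )
        return f"<table>{rows}</table>"


# Registry of content models; new media formats plug in here instead of in the parser.
HANDLERS = {"text": TextHandler(), "json": JsonHandler()}


def render_page(content_model: str, raw: str) -> str:
    """Dispatch on the page's declared content model rather than assuming wikitext."""
    return HANDLERS.get(content_model, TextHandler()).render(raw)
```

In this design, going "beyond wikitext" means registering another handler rather than changing every code path that touches page content.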
SJ Klein	Strategy New Users	Our platforms should refocus on collaborating, drafting, and experimenting. Currently much focus is on polished presentation + restriction, hindering experiments and limiting participation. __Technical aspects__ Editing tools focus on fast smooth drafting: multiple simultaneous editors, suggested changes. A terse, readable history highlights major revisions of an article. Discussion is integrated into the draft interface, and toggled on / off. Articles can be forked & merged, supporting all sorts of experimentation. Different groups can work on parallel forks, merging later if they like. Newcomers not following a policy can be channeled to an individual branch while sorting it out, avoiding edit wars. Sandboxing helps avoid "deletion": questionable or disputed contributions can be sandboxed to a hidden or low-visibility personal page. [This is also conducive to distributing an online/offline federation of editors, e.g. over IPFS] Editing, creation, and uploading are encouraged prominently in every page interface. Matchmaking tools help creators find others with similar interests, learn + collaborate. Tools for similarity checking, merging, metadata / license review, + meta-moderation, help anyone contribute and learn new ways to do so. Deleted material [unless oversighted] is reviewable by all who know where to look. The reading experience focuses on contextual connections + human connections. Real-time conversation is available as an overlay while reading. Data-rich interfaces help readers browse multiple versions of an article, and get a sense of persistence, reliability, + interest. For instance, heatmaps for revised / controversial / commented areas; wikiblame for granular provenance; different colors for different sorts of cites; visual cues about how much complementary or conflicting knowledge is available in other articles, files, languages or Projects. 
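The "wikiblame for granular provenance" idea means attributing each piece of the current text to the revision that introduced it. Below is a minimal line-level sketch using Python's standard `difflib`. It is a generic illustration, not how any existing Wikimedia tool works, and `blame` and its revision format are hypothetical.

```python
import difflib
from typing import Hashable, List, Sequence, Tuple


def blame(revisions: Sequence[Tuple[Hashable, str]]) -> List[Tuple[str, Hashable]]:
    """Attribute each line of the latest text to the revision that introduced it.

    `revisions` is a list of (revision_id, full_text) pairs, oldest first."""
    attributed: List[Tuple[str, Hashable]] = []
    for rev_id, text in revisions:
        old_lines = [line for line, _ in attributed]
        new_lines = text.splitlines()
        matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        result: List[Tuple[str, Hashable]] = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                result.extend(attributed[i1:i2])  # unchanged lines keep their original revision
            else:
                # "insert"/"replace": new lines are credited to this revision;
                # for "delete", new_lines[j1:j2] is empty, so nothing is added.
                result.extend((line, rev_id) for line in new_lines[j1:j2])
        attributed = result
    return attributed
```

A heatmap of revised or controversial areas could be built on the same attribution, for example by counting how often each line's credited revision changes.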
__Cultural aspects (& related tools)__ Namespaces include every potentially useful topic: completeness, notability, + copyright uncertainty affect how things are presented, not whether they exist. Similarly, media repositories include all useful material that is legal to host. File uploads are welcome as contributions to the global commons even when they need work. Files are transcoded to free formats where possible. File formats with no free-codec options, or that cannot be thoroughly checked for malware, are stored in their own flexible repository [such as the Internet Archive]: using the same Wikimedia upload interface + metadata, and providing similar wikilinks to reference files from within the Projects. The newcomer experience is simple, flexible, + protected. Contributions from people who "don't know how to do it right" are welcome, and kept separate from the flow of updates from regulars, with their own visibility defaults. Matchmaking tools help newcomers find active work in their area. Blocks, deletions, + warnings happen only for spam / vandalism. Other concerns at worst hide their work from public view, with a friendly peer review after the first few weeks. A broad group of peers can protect newcomers, for instance by redirecting concerns and complaints about a newcomer to themselves. == It is time to move away from a "single latest revision viewable by all" model, and the conservative policies designed around it. We need a more flexible model embracing multiple working copies, long-lived drafts, and greater freedom to experiment, collaborate, + create.
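Moving beyond "a single latest revision viewable by all" could look like the following data model: one public copy plus any number of drafts, each with its own visibility. In this model, newcomers' edits land in a personal draft instead of the public copy. This hypothetical Python sketch is one reading of the proposal. The class, branch naming and visibility labels are assumptions, and `adopt` is deliberately the simplest possible merge.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Article:
    """Hypothetical article with several working copies.

    "main" is the public copy; every other branch is a draft with its own visibility."""
    branches: Dict[str, List[str]] = field(default_factory=lambda: {"main": []})
    visibility: Dict[str, str] = field(default_factory=lambda: {"main": "public"})

    def fork(self, source: str, name: str, visibility: str = "low") -> None:
        # A new working copy starts from the source branch's full history.
        self.branches[name] = list(self.branches[source])
        self.visibility[name] = visibility

    def commit(self, branch: str, text: str) -> None:
        self.branches[branch].append(text)

    def head(self, branch: str = "main") -> str:
        revisions = self.branches[branch]
        return revisions[-1] if revisions else ""

    def adopt(self, draft: str, target: str = "main") -> None:
        # Simplest possible merge: copy the draft's latest text onto the target.
        self.commit(target, self.head(draft))


def submit_edit(article: Article, user: str, text: str, is_newcomer: bool) -> str:
    """Send newcomers' edits to a personal low-visibility draft instead of the public copy."""
    if not is_newcomer:
        article.commit("main", text)
        return "main"
    branch = f"draft/{user}"
    if branch not in article.branches:
        article.fork("main", branch)
    article.commit(branch, text)
    return branch
```

A newcomer's edit in this model never overwrites the public copy, so there is nothing to revert or edit-war over. A regular can later adopt the draft.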
Niharika Kohli	Communities Cross-wiki New Users Tools Gadgets Templates Lua	Investing in our communities. This position statement captures my thoughts about why and how we should be investing in our communities. There are a lot of ways we can encourage and support them that we currently don't. Prioritizing building tools for our communities is a crucial step for the long-term survival of our projects. It's fairly common knowledge that a lot of our communities suffer from toxicity. It's incredibly hard for newcomers to edit, stick around and stay engaged in the midst of that toxicity. The problem frequently exists in smaller communities too. Just recently, the English Wikipedia community pushed WMF into implementing ACTRIAL, which prevents brand new users from creating articles on the site. These are signs that all is not well with our communities. If we envision a future with an active, thriving editor community 15 years from now, we have to become more aware of how our communities function and do more to support them than we do today. The problems also exist on the technical side. Communities without technical resources lose out on gadgets, templates, editing toolbar gadgets and so on. The editors on these wikis are still forced to do a lot of things the hard way. Non-Wikipedia projects are probably the worst affected. Quite often our software projects cater mainly to the bigger projects, often just the Wikipedias. I know we can't solve everything, but I'm sure we can help solve at least some of the problems. We can invest in better tools for new users to create articles, to edit and to experiment with wikitext markup. We can build a better onboarding experience for new users. For example, English Wikipedia currently has the "Article Creation Wizard" (https://en.wikipedia.org/wiki/Wikipedia:Article_wizard), which is outdated, poorly maintained and often very confusing. 
We can think about a more standardized solution that would be useful across wikis. We can also try to showcase user contributions better, to build user engagement. Various wikis have been striving to create and sustain "wikiprojects" for a while, with the result that several big Wikipedias have come up with their own homegrown solutions. The Foundation can help build these and standardize them for all wikis. For the technical problems, there is a big backlog of long-overdue projects. Global cross-wiki watchlists and global gadgets, templates and Lua modules have been requested by the community for many years now. Many more such projects can be found on Phabricator and in the wishlist survey. These projects can be building blocks for making our communities more sustainable and thriving places. They are big and important enough that they should make it into the product roadmaps of teams outside of Community Tech. Another important thing we should think about is tools. Some tools, such as pageviews analysis, are among the most important volunteer-maintained tools out there. What happens when they stop being maintained? When is a tool important enough for the Foundation to start thinking about incorporating its functionality into an extension or core? These are all important discussions to be had.
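At its core, a global cross-wiki watchlist merges per-wiki change feeds and filters them by the pages a user watches. Below is a small Python sketch using `heapq.merge`. The `Change` record and the function name are hypothetical, and a real implementation would also need a cross-wiki store for the watched pages themselves.

```python
import heapq
from typing import Dict, List, NamedTuple, Set, Tuple


class Change(NamedTuple):
    timestamp: int  # e.g. seconds since the epoch
    wiki: str
    title: str


def global_watchlist(feeds: Dict[str, List[Change]],
                     watched: Set[Tuple[str, str]]) -> List[Change]:
    """Merge per-wiki change feeds into one newest-first list of watched pages.

    Each feed must already be sorted newest first, as a per-wiki watchlist would be."""
    merged = heapq.merge(*feeds.values(), key=lambda c: c.timestamp, reverse=True)
    return [c for c in merged if (c.wiki, c.title) in watched]
```

Because `heapq.merge` consumes the sorted feeds lazily, this shape stays cheap even when many wikis contribute changes.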
Melody Kramer	Communication Communities Retention Volunteer Developers	How do we maintain and grow the technical community and ready it for the mission ahead? Maintaining and growing a technical community is difficult, particularly when the majority of that community is contributing their time and code on a volunteer basis. However, we can look at other successful projects for guidance, to see what we can learn and apply to our own movement. Clearly articulating the value for participants: It's important that we articulate what participants will get (socially, professionally, personally) from contributing to our projects, and it's important to socialize that value through feedback loops, communication, and positive reinforcement. One of my favorite projects - the Smithsonian Transcription Project - hired a full-time community manager for their volunteer community. It was her role to pair participants with projects, follow up to ensure things were going well, and intervene if changes needed to be made. Creating feedback loops that reinforce the value for participants: It's not enough to get people in the door; we must continually reinforce why participation is meaningful for both participants and the mission of free and open knowledge. People will have different reasons for participating - some want to build a skillset, others want to contribute to a meaningful project, still others are completing an assignment. The value for all of these participants differs, and the messaging/communication should reflect that. Finding pathways to participants through non-technical means: GitHub does this particularly well. They want to reach students, so they have a space - https://education.github.com/ - aimed at teachers. This is a particularly smart strategy: How do we reach participants where they are, and think about conduits who might identify possible participants? 
100WikiCodeDays: The project #100WikiDays is successful because it creates a habit for participants, gives them ample feedback, provides them with community support, and gives them a goal. Are there similar efforts that we could think about re: code contributions? Continually communicate the value: The best open source projects continually communicate to participants and the larger world about what's happening. Someone files their first bug report? Great, maybe they get an email saying 'Here's the next step you can take.' Someone creates a tool for Tool Labs? Is it amazing? Profile them on the blog. Let's elevate their work and use it to bring others in.
Giuseppe Lavagetto	Analytics Infrastructure Job Queue Multi-Datacenter Performance Data Center	The future of the MediaWiki infrastructure at the Wikimedia Foundation. MediaWiki is at the core of the infrastructure that serves all of the Wikimedia projects, and the current setup of MediaWiki in production poses various challenges: from the future of our current runtime (HHVM), to the ability to serve MediaWiki from multiple datacenters, to long-standing issues such as resource usage efficiency and flexibility. Here are some of the things we have to tackle in the future. Transition off of HHVM: Since the HHVM team has made it clear they're parting ways with full PHP compatibility, and maintaining support for both HHVM and PHP in MediaWiki would be arduous, we need to make plans to move off of HHVM, back to PHP 7.x. This transition, while technically necessary, should not come at a cost to our users: page load times should not degrade. We can ensure this by marking responses coming from either engine, collecting metrics and analyzing the data. To do so, we should run the two runtimes in parallel on the same servers (which have plenty of capacity, given that no MediaWiki cluster has a utilization over 40%); we will then be able to programmatically route individual users, a percentage of traffic, or even specific wikis to one or the other. The deadline for this transition is set for the end of 2018 (EOL of the last compatible version of HHVM), and planning and resources should be allocated to this goal. Multi-Datacenter support: We currently use our datacenters in an active/passive setup, as far as MediaWiki is concerned. While this is acceptable in principle, it is a huge waste of resources: 50% of our servers are doing nothing at all, and it also limits our ability to expand the number of core datacenters we can use. 
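The side-by-side rollout described above needs two small pieces. One is a deterministic rule for which runtime serves a request; the other is a marker on each response, so that metrics can be split by engine. Below is a hedged Python sketch of both. The function names, the header name and the use of SHA-256 for bucketing are assumptions, not the production routing layer.

```python
import hashlib
from typing import Dict, FrozenSet


def choose_runtime(user_key: str, wiki: str, php7_percent: int,
                   php7_wikis: FrozenSet[str] = frozenset()) -> str:
    """Return "php7" or "hhvm" for one request."""
    if wiki in php7_wikis:
        return "php7"  # whole wikis can be pinned to the new runtime
    # A stable hash puts every user in one of 100 buckets, so a given user keeps
    # the same runtime between requests, and raising php7_percent only ever moves
    # users from HHVM to PHP 7, never back.
    digest = hashlib.sha256(user_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "php7" if bucket < php7_percent else "hhvm"


def tag_response(headers: Dict[str, str], runtime: str) -> Dict[str, str]:
    """Return a copy of the response headers marked with the engine that served it."""
    tagged = dict(headers)
    tagged["X-Runtime-Engine"] = runtime  # hypothetical header name
    return tagged
```

Hashing by user rather than by request keeps each user's experience consistent, which makes per-engine page load metrics easier to compare.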
Diverting the read load to secondary datacenters could also allow us to use caching less aggressively when it isn't needed. There is already a program underway to add first-class multi-DC support to MediaWiki, so we can focus on what specifically needs to be done to achieve this long-standing goal: our final goal should be to serve readers' traffic from all datacenters, and to be able to switch the "master" datacenter in a matter of minutes. Elasticity, resource usage efficiency: At the moment, our infrastructure is plainly inadequate to react to sudden spikes of non-wiki content production and to changes that generate a lot of asynchronous jobs, such as an edit to a popular template. The issues with the current job queue are widely known and publicized, but even the current transition to a new model won't solve the resource starvation that results in a degraded user experience. Moreover, a single editor uploading videos via video2commons can easily overflow our media processing capacity for weeks. This happens because we allocate our resources statically (we have 4 videoscalers per datacenter, for example), our resource consumption is inefficient, and reallocating servers requires time and effort. Modern application stacks are elastic: scaling the capacity of a single cluster or functionality up or down can be handled programmatically and/or manually whenever the need occurs, allowing the infrastructure to react to such changes. For economic and privacy/security reasons, Wikimedia doesn't make use of external cloud services, so the only way to achieve such flexibility is to build a serviceable infrastructure that can serve MediaWiki and any other project Wikimedia will support: the effort to do that is underway with the rollout of our Kubernetes-based IaaS in production. I think we should work, sooner rather than later, on moving the MediaWiki application stack (and maybe its semi-ephemeral caching) to the Kubernetes platform. 
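The multi-datacenter goal can be summarised in three rules. Reads go to the nearest datacenter, writes go to the current primary, and switching the primary is one configuration change. The Python sketch below models only that routing decision. The datacenter names and the `Topology` class are hypothetical, and a real switchover also involves database replication, cache warm-up and much more.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Topology:
    datacenters: Tuple[str, ...]  # all datacenters able to serve MediaWiki reads
    primary: str                  # the only datacenter that accepts writes

    def route(self, method: str, nearest_dc: str) -> str:
        # Reads can be served from any datacenter; writes must reach the primary.
        if method in ("GET", "HEAD") and nearest_dc in self.datacenters:
            return nearest_dc
        return self.primary

    def switch_primary(self, new_primary: str) -> None:
        # In this model a switchover is a single configuration change.
        if new_primary not in self.datacenters:
            raise ValueError(f"unknown datacenter: {new_primary}")
        self.primary = new_primary
```

Once every datacenter serves reads in this model, none of them sits idle, which addresses the "50% of our servers doing nothing" problem.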
While the advantages of such an approach seem clear, it won't come without costs: specifically habits around code deployment, testing and configuration changes will need to be completely revisited and superseded by new approaches.
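The elasticity described above boils down to recomputing capacity from observed load. The Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and it skips scaling when the ratio is within a tolerance (0.1 by default). The Python sketch below implements just that rule with a min/max clamp. It ignores stabilization windows and other HPA details, and the function name and defaults are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Replica count following the HPA-style rule, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, suppose four videoscalers each had three times their target number of queued transcodes (a hypothetical metric). This rule would then ask for twelve.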
Niklas Laxström	Machine Translation Translation	Translation as a way to grow and connect our communities. The Wikimedia movement depends a lot on translation, but I believe we are not currently using its full potential. This affects us in many ways, most importantly: language barriers isolate communities, but we all need to work together; our content is not accessible to every human; and our movement is massively multilingual, but not the forerunner in using translation and other language technology. We should improve our translation tools and leverage machine translation in a sustainable way. Translation should be a core part of our infrastructure and integrate seamlessly into our projects. It will help our communities to grow, as demonstrated by the Content Translation tool. I suggest three focus areas. 1. Find partners to build high-quality open-source machine translation. Our projects run on free software. Currently, we depend a lot on proprietary data-driven (statistical) machine translation. If translation is to be an essential part of our infrastructure, this is neither sustainable nor acceptable. We already use expert-driven (rule-based) open-source machine translation software, e.g. Apertium, which provides some high-quality language pairs. However, the proprietary services cover a lot more language pairs, albeit with lower quality. Building machine translation engines is hard work, so we should find partners to pursue both data-driven and expert-driven engines. The impact of this could be big and extend beyond our movement. 2. Bring translation everywhere. We already have good translation tools, but we need to move beyond the user interface and Wikipedia pages. We should integrate translation tools into our discussion systems, both to support multilingual discussions and to help people understand discussions in foreign languages. This should be combined with summarizing tools. 
We have a lot of (structured) content that can be translated but lacks proper translation tooling, e.g. Wikidata and Commons image descriptions, or labels in SVG files. We should adapt and integrate our existing translation tools to support these types of content. We should also make language selection available to all users of our multilingual projects, such as Wikidata, including users who are not logged in, so they can see the translations. 3. Improve our translation tools. Our translation tools have serious issues that result in slower translation, or content not being translated at all. Our translation memory is not working well: it often fails to suggest good matches. This is apparent when translating the Weekly Tech News. Translators' time is wasted when they need to re-translate (introducing inconsistencies) or search previous translations manually. Without improvement, our translation memory is not suitable for use in Content Translation either. When translating documentation pages, announcements, etc. using the Translate extension, a significant amount of extra markup is added to the wikitext. Editors find this markup inconvenient and justifiably resist using this tool. This feature should be improved so that it works with VisualEditor and doesn't require additional markup in the wikitext.
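A translation memory suggests earlier translations whose source text resembles a new segment. The sketch below uses `difflib.SequenceMatcher.ratio()` as a crude similarity score. It is not how the Translate extension's translation memory works, and `tm_suggest`, the 0.75 threshold and the in-memory dictionary are illustrative assumptions.

```python
import difflib
from typing import Dict, List, Tuple


def tm_suggest(source: str, memory: Dict[str, str],
               threshold: float = 0.75, limit: int = 3) -> List[Tuple[float, str, str]]:
    """Return up to `limit` (score, stored_source, stored_translation), best match first."""
    scored = []
    for stored_source, translation in memory.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they diverge.
        score = difflib.SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, stored_source, translation))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]
```

A recurring segment such as a newsletter heading that differs only in its issue number should score close to 1.0. Failing to surface such near-identical segments is exactly the gap the statement describes.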
Katherine Maher	Architecture Strategy User Experience Alternative Interfaces Knowledge as a Service	This proposal focuses on the "Knowledge as a service" part of the strategic direction. When I look at the core of what we do, to some extent I see a model that we've mastered, and that we're making incremental improvements to. My concern is that, while that model is incredible and powerful as a community, the model for the interface and the delivery mechanism for the product the community creates are changing, and continuing what we're doing today may or may not prepare us for what the future actually looks like. I think it also limits our ability to unlock all of the tremendous knowledge, unstructured and structured, that exists within our projects. And I also believe that it limits us to certain forms of knowledge and a certain hierarchy of creation in a way that is very inward-looking. Right now much of our information is sitting, unstructured, in a SQL database, rendered through PHP, and read through a rendering engine into a browser, to be read and written in one interface: the browser. While this is amazing for the world of the browser, we're not going to be a browser-based information world for that much longer. It's not that the browser is going to go away; the browser will be like books: books haven't gone away, radio hasn't gone away, but there will be a transformation to a new interface, and we need to be ready for it. Perhaps we should actually backfill into those older interfaces that we're not currently part of, because people still use those interfaces, and those interfaces are valuable. Essentially this is about taking the Model-View-Controller paradigm to the next level, and also about extending it to participation and to the "write" part of our read-write system. Even though Alexa serves Wikimedia content outside the browser, there is no mechanism for contributing through Alexa. 
We need to be planning for an architecture of information and architecture of experiences that is independent of the browser. How do you get the most value out of the existing content? How do you serve a snippet to someone who just needs a quick answer? How do you serve different layers of sophistication to 8th-graders versus the college graduate, versus the PhD? Can we engage in the knowledge ecosystem and leverage what we have as a platform, and our traffic distribution and awareness, to actually open up greater resources of knowledge? These are some of the topics I would like to see discussed at the Dev Summit.
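Serving "a snippet to someone who just needs a quick answer" can already be approximated with the Wikimedia REST API's page summary endpoint, whose JSON response includes a plain-text `extract`. In the sketch below, `fetch_summary` needs network access and is not exercised here. The `snippet` helper and its naive regex sentence splitting are assumptions, and real sentence segmentation varies by language.

```python
import json
import re
import urllib.parse
import urllib.request


def fetch_summary(title: str, lang: str = "en") -> dict:
    """Fetch a page summary from the Wikimedia REST API (needs network access)."""
    path = urllib.parse.quote(title.replace(" ", "_"), safe="")
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{path}"
    # Wikimedia asks API clients to send a descriptive User-Agent; this one is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "snippet-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def snippet(summary: dict, max_sentences: int = 1) -> str:
    """Trim the summary's plain-text extract to a short, voice-friendly answer."""
    extract = summary.get("extract", "").strip()
    sentences = re.split(r"(?<=[.!?])\s+", extract)  # naive: also splits after "e.g. "
    return " ".join(sentences[:max_sentences])
```

Varying `max_sentences` is a crude stand-in for the "different layers of sophistication" question. Doing that properly would need content structured for it, not just truncation.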
Leszek Manicki	JavaScript	Dependency management for JavaScript packages. I believe we should be using a dependency management tool for JavaScript libraries in MediaWiki. Currently there is no convenient way to manage JavaScript packages in MediaWiki, even though MediaWiki itself and many extensions make wide use of them (both our "own" JavaScript libraries and third-party libraries). The lack of such a solution in our infrastructure leads to numerous issues. Libraries are duplicated across our code bases, and there is little control over which version of a package is used by different components. Updating dependencies becomes a complex and error-prone manual process. As each of our components includes its dependencies separately, our users might be loading the same package multiple times. Having a dependency management system for JavaScript packages would be beneficial in multiple areas. Developers would be able to easily control and maintain the packages their software depends on. Deploying MediaWiki and extensions would become easier and more transparent with regard to JavaScript packages, both for WMF infrastructure and for non-WMF users of our software. With de-duplication of dependencies, users will be served fewer bytes. Finally, once we have a convenient and standardized way of managing JavaScript packages, our software would be more interesting and welcoming to new developers coming from the rapidly growing JavaScript community. We addressed a similar problem for PHP libraries a while ago when we started to use Composer for PHP dependency management. We have been discussing solutions for JavaScript packages for a couple of years. Possible tools like npm, yarn, or even Composer have been discussed, but we haven't come up with a workable solution yet. 
I hope the summit will be able to collect our requirements, re-evaluate the previous investigations we have made, collect new ideas, and will come up with a solution for JavaScript dependency management. I believe once we have it we will be walking into the future with confidence.
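De-duplication is at the core of what this proposal asks for: each JavaScript package should be loaded once, at a version every component agrees on. The Python sketch below uses exact version pins and hypothetical component and package names. Real tools such as npm or yarn resolve semver ranges and can nest conflicting versions, which this sketch deliberately ignores.

```python
from typing import Dict, List, Tuple

Deps = Dict[str, str]  # package name -> exact version


def dedupe(components: Dict[str, Deps]) -> Tuple[Deps, Dict[str, Dict[str, List[str]]]]:
    """Return (resolved, conflicts).

    resolved:  packages every declaring component pins to the same version (load once).
    conflicts: {package: {version: [components]}} where components disagree."""
    by_package: Dict[str, Dict[str, List[str]]] = {}
    for component, deps in components.items():
        for package, version in deps.items():
            by_package.setdefault(package, {}).setdefault(version, []).append(component)
    resolved: Deps = {}
    conflicts: Dict[str, Dict[str, List[str]]] = {}
    for package, versions in by_package.items():
        if len(versions) == 1:
            resolved[package] = next(iter(versions))
        else:
            conflicts[package] = versions
    return resolved, conflicts
```

Even this naive report would make the "which version does each component use" question answerable, which the statement says is hard today.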
Kunal Mehta	Code Review Volunteer Developers	It is well established that volunteers are the lifeblood of the Wikimedia movement. We prioritize their contributions and work to ensure they are given the tools they need to succeed. But in the Wikimedia development community, we've neglected volunteers instead of nurturing them - and this is a serious problem that we need to rectify. There are a lot of areas where we can improve, but I'm going to focus on just one: improving the volunteer developer's code review experience. While Wikimedia Foundation product teams are building new things, it's usually volunteers who keep alive critical tools that the community depends upon (AbuseFilter, CheckUser, etc.). The MediaWiki codebase has gotten so massive that it's not practical for the Wikimedia Foundation to try to maintain all of it. It would not be a good use of movement funds either. Instead, I'm proposing that we utilize our volunteer base and ensure they are valued and respected members of the Wikimedia development community. I think we can do it in three steps: first, set reasonable standards for code and the review process; second, prioritize code review of patches coming from volunteers; and finally, empower volunteers to be maintainers and owners of code and create a sustainable community. 1. Set Reasonable Standards for Code and the Review Process. The status quo is that depending on who reviews your code, you will have a wildly different experience. Some reviewers will mandate that principles like dependency injection are followed, while others will require 100% test coverage. Others might not care about any of that and just ensure the code does what it is supposed to before merging. But the people who face the worst of it are volunteers - WMF staff have consistent reviewers in their teammates, who have already communicated their standards for merging code. So we need reasonable standards for the code we accept, and we need to apply them throughout the review process. 
As an example of 'reasonable', if someone is trying to fix a bug in legacy code that is difficult to test, it would be unreasonable to mandate a test case before merging the fix. 2. Prioritize code review of patches coming from volunteers. Our current process of reviewing volunteers' patches after finishing code review for teammates isn't working - we have a giant pile of unreviewed patches. When you start your day and look through your list of reviews, pick one or two patches from a volunteer and review them first. Most likely it'll take minimal time, but for previously neglected volunteers, it will make a big difference. 3. Empower volunteers to be maintainers and owners of code. Some of our volunteers have been around for quite a while and are well trusted. Let's give them +2 rights! There's nothing that makes you feel better than getting an email from someone telling you that your contributions are valued and that they'd like to nominate you for +2 access (exactly how I got hooked). And quite a few years later I'm still around, so it must have worked.
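The "pick one or two volunteer patches first" habit can be expressed as a simple ordering of a reviewer's queue. The Python sketch below is hypothetical: the `Patch` record, the `volunteer_quota` parameter and the age-based tie-breaking are assumptions, and it does not query Gerrit or any real review system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Patch:
    change_id: int
    by_volunteer: bool
    submitted: date


def review_order(patches: List[Patch], volunteer_quota: int = 2) -> List[Patch]:
    """Oldest volunteer patches first (up to the quota), then everything else oldest first."""
    by_age = sorted(patches, key=lambda p: p.submitted)
    first = [p for p in by_age if p.by_volunteer][:volunteer_quota]
    chosen = {id(p) for p in first}  # identity, so equal-looking patches aren't confused
    return first + [p for p in by_age if id(p) not in chosen]
```

Taking the oldest volunteer patches first gradually drains the long-neglected part of the backlog, rather than only the newest submissions.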
James Montalvo	Installation Third Parties	MediaWiki has evolved away from easy installation. Yes, there is still the web-based installer, but it only gives you the most basic version of MediaWiki, and the extensions that provide the best features are increasingly difficult to install. I installed MediaWiki for the first time six years ago. Since then I've become an active developer and system admin, and despite that experience I still find things like RESTBase difficult to install. The barrier to entry for a newbie setting up a fully functional MediaWiki (e.g. with all the bells and whistles of Wikipedia) is huge. This should not be the case. It should be easy for a newbie to set up a MediaWiki installation with VisualEditor, CirrusSearch, etc., without first gaining years of experience.
Matanya Moses	Communities Open Source	Standing on the shoulders of giants. MediaWiki is built on the basis of many other open source tools, libraries, packages and other software. Our ability to write, run and use MediaWiki depends on their availability, upstream support and maintainability. A few examples are Debian, the OS WMF runs; PHP; and Elasticsearch, our search back end. In light of recent discussions about migrating from HHVM to Zend PHP as our runtime, I would like to raise the question of what our position is in the open source world of the underlying parts of our stack: whether we choose to be just a user of what upstream produces, or we want to actively influence the decisions made while writing the software. As the well-known phrase says, "decisions are made by those who show up". To influence those decisions, we will need to show up in those communities, be an active part of them and contribute, in exactly the same manner we hope third-party MediaWiki re-users will contribute, discuss, send patches and show up. If we choose this path, it has resource implications: time, money and dedication to involvement in other communities. For instance, sponsoring a PHP developer to work on our needs upstream might be a good investment, but it might also be a waste overall. I would like to have an open discussion about this approach: whether it is desired, feasible, and worth the effort. I think it might affect where our tech stack will be in the years to come, and it makes a significant statement to the outside open source ecosystem. Thank you
Birgit Müller	Code Review Communities Documentation Open Source Refactoring Tools Collaboration Gadgets	Refactoring the Open: First steps to get ready for the next level. Wikimedia's technical environment has grown into a very complex system over the past 15 years. Measured in internet years, parts of the software are ancient. When implementing a new feature, refactoring a piece of the (extended) MediaWiki software is often required first. Following this principle of a) refactoring and b) implementing something new, I suggest starting the discussion of the future technology direction by reflecting on (and possibly refactoring) the current Open Source practices and processes within the Wikimedia context. A mono perspective won't let us survive (and is less fun, too). When we talk about "Open Source" within Wikimedia we're not only talking about free licenses and open code repositories. We're talking about global collaboration and the technical contributions of many: through this, we ensure that the Wikimedia projects stay alive and evolve, that we constantly develop new ideas, and that multiple and diverse perspectives shape the development of our infrastructure and tools. We are great at having ideas, and we are good at trying things out. But we still partly fail at prioritising the problems we know we have and addressing them accordingly. 
I believe that we should better maintain the Technical Community and find ways to grow it by: allocating stable code review resources from paid staff for volunteer and third-party developers; improving the documentation of the code base; providing a single, easily accessible entry point for interested developers; building up partnerships with Open Source communities we might share interests with in the future (for example, communities around audio, video or translation technologies); constantly taking diverse perspectives into account by finding better ways to gather and address feedback from smaller language communities and non-Wikipedia sites; being less Wikipedia-centric when it comes to research, since not yet existing or emerging communities might not be interested in creating articles, but in contributing data or multimedia content or in building tools to reuse data and multimedia content; building more bridges across local wikis and increasing knowledge of local requirements by fostering cross-wiki exchange (example activity: template Hackathon); and increasing knowledge of the requirements that come along with different languages (example activity: multilingual support conference). Open Source doesn't mean anything is possible - does it? We have established processes and regulations for contributions to MediaWiki itself. But we lack processes and practices for local developments that ensure both the freedom and space to experiment for the Technical Community and the stability and reliability of tools for users. I believe that we should, for example: raise the priority of implementing a code review process for JS/CSS pages on Wikimedia sites; start thinking about a technical sysop user right; and make it clear which user scripts/gadgets/tools are maintained, which are stable and which are proofs of concept or prototypes (for example, by providing a central 'store' of maintained gadgets/tools with different levels: "stable version", "experimental version", ...). Let's start refactoring.
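The proposed "store" of gadgets with different levels could start as nothing more than metadata on each gadget. The hypothetical Python sketch below shows a catalogue that, by default, lists only stable gadgets with at least one maintainer. The levels, field names and gadget names are assumptions, not an existing Wikimedia feature.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class Level(Enum):
    STABLE = "stable"
    EXPERIMENTAL = "experimental"
    PROTOTYPE = "prototype"


@dataclass
class Gadget:
    name: str
    level: Level
    maintainers: Tuple[str, ...] = ()

    @property
    def maintained(self) -> bool:
        return len(self.maintainers) > 0


def catalogue(gadgets: Iterable[Gadget],
              levels: Tuple[Level, ...] = (Level.STABLE,)) -> List[str]:
    """Names of maintained gadgets at the requested levels (stable only by default)."""
    return [g.name for g in gadgets if g.level in levels and g.maintained]
```

Treating "unmaintained" as a separate signal from "experimental" keeps a once-stable gadget from silently staying in the default list after its maintainers leave.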
Marko Obrovac	Architecture Microservices Refactoring Technical Debt	All of the Wikimedia projects have, in technical terms, MediaWiki - the software - at their core. Thanks to this, MediaWiki has become a widely deployed system that draws many volunteer developers. Alas, there is a disparity in scale between the WMF-run install and other, external set-ups, which slows down the evolution of the platform supporting the Wikimedia projects. On the other hand, architecting microservices has proven to be a good way of achieving scalability, increasing developer productivity, improving maintenance and reducing technical debt. Gradually moving towards 'de-monolithising' our core infrastructure will enable developers (both WMF staff and volunteers) to start working on all sorts of interesting features, ranging from simple add-ons to full-blown companion sub-systems. While this transition is (arguably) already happening, everything still gravitates around MediaWiki - the software. Instead of focusing our efforts on compatibility in scale (e.g. one JobQueue system for WMF, another for external installs), we should focus on the products and features that allow the projects to grow, both in the number of projects and the features they offer, and in the number (and diversity) of their users. Microservices can greatly help in achieving this goal, since each install can select the components it wants to run based on the resources at its disposal and its potential reach or scale. Much like the advent of extensions enabled various parties to complement their systems with the functionality they needed, microservices can refocus our technical community on features and components without worrying about scaling them (up and/or down). If we want our developers to assist the Wikimedia projects and their communities, we need to bring our core infrastructure into the 21st century.
Let's not leave the technology behind - it is central to the success of the communities we are trying to enable.
Guillaume Paumier	Alternative Interfaces Knowledge as a Service Knowledge Equity New Users	The strategic direction that has emerged has two components: "Knowledge as a service" and "Knowledge equity". "Knowledge as a service", which focuses on infrastructure, seems like the one most related to technology. This proposal is about exploring the less obvious interplay between the two components, and in particular the technology implications of Knowledge Equity. Because Wikimedia is a complex socio-technical system, it's not really possible to separate people from technology when talking about it. A direction of Knowledge Equity invites the contributors of the Wikimedia movement to take a critical look at themselves and assess their biases and privileges. This, in turn, can help identify structural biases that have been reproduced and ingrained in our technical platform. For example, MediaWiki currently does a great job of providing a localized interface in many languages. However, beyond language, its interaction design and UX patterns seem very specific to Western culture. Similarly, when our strategic direction talks about building strong and diverse communities, this invites us to consider whether the tools currently available to contributors enable them to provide an environment where newcomers can experiment, be mentored, and fail safely. Beyond software, little effort has been invested in exploring interfaces other than the connected browser. Our primary interface for contribution (the web site) may work well for middle-class contributors from Europe and North America, but isn't necessarily what enables people from other backgrounds or geographies to contribute. These are some of the topics I would like to bring up for discussion at the Developer Summit.
Thomas Pellissier Tanon	Analytics API Multimedia Structured Data User Experience Wikidata Collaboration JavaScript Wikibase Mobile Lua	Title: Content structuring and metadata are key to fulfilling our strategy. Content: The Wikimedia movement strategy focuses on serving more kinds of knowledge and sharing them with allies and partners [1]. I believe that the most important groundwork for reaching these goals is to focus on the ongoing project of moving MediaWiki from a "wikitext plus media file" collaboration system to a platform that allows people to collaborate on many kinds of content and to organise it in a cohesive way. Two axes seem important to me in pursuing this goal: 1. Support a broader set of content types, not just wikitext, Wikibase items and Lua/JavaScript/CSS content, but also images, sounds, movies, books, etc., which should be editable just like wikitext pages so that people can improve them collaboratively. 2. Build platforms and tools that allow contributors to create and clean up metadata about this content, in order to build together the broadest cohesive body of knowledge ever available and increase its reusability. Moving in these directions would let us: Allow sister projects (and possible new ones) to use a content structure suited to their projects instead of a designed-for-everything wiki markup. This should make their content more reusable and user-friendly, just as the Structured Metadata on Commons project is aiming to do. Build powerful APIs to retrieve and edit content, as Wikidata has, and so make working with partners easier.
Increase the connections between our content and improve its discovery using its metadata. Build better task-focused mobile viewing and editing interfaces. Be better prepared for possible new changes in the environment, like voice-powered interfaces. Some examples of projects we could work on in order to move in this direction: Use the multi-content revision facility to progressively migrate the data that could be structured out of wikitext on all our projects (as the Structured Metadata on Commons project is aiming to do for files). Federate all our structured content into a "Wikimedia Query Service" that would allow unified and powerful analytics and ease the discoverability of all our content. The logical granularity of Wikisource, Wikibooks and Wikiversity content (and maybe other projects?) is not the wiki page but the set of wiki pages storing a book or a course. MediaWiki should be able to support such use cases by providing a "collection" system that allows us to add metadata to, and perform operations (renaming, adding to a watchlist) on, sets of wiki pages. Switch projects that store fairly structured data in wikitext templates (like Wiktionary or Wikispecies) from wikitext storage to structured storage. Build user interfaces on top of them to edit their local content (and maybe also the relevant data from Wikidata), and provide nice displays and APIs so that both humans and machines can retrieve this content. ... [1] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction#We_will_advance_our_world_through_knowledge.
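The "collection" system described in this statement could be modelled roughly as follows. This is a hypothetical Python sketch: the `Collection` class, the assumption that member pages are the root page plus subpages named `Title/...`, and the returned move map are illustrations only; a real implementation would carry out page moves and watchlist writes through MediaWiki itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Collection:
    """Hypothetical collection: a named set of wiki pages sharing metadata."""

    title: str
    pages: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)

    def rename(self, new_title: str) -> Dict[str, str]:
        """Rename the collection and every member page under its title.

        Assumes members are the root page or its subpages ("Title/Chapter").
        Returns the old -> new title map a real system would turn into page moves.
        """
        prefix = self.title + "/"
        moves: Dict[str, str] = {}
        for i, page in enumerate(self.pages):
            if page == self.title or page.startswith(prefix):
                moved = new_title + page[len(self.title):]
                moves[page] = moved
                self.pages[i] = moved
        self.title = new_title
        return moves

    def watch(self, watchlist: Set[str]) -> None:
        """Add every member page to a watchlist in a single operation."""
        watchlist.update(self.pages)
```

The point of the design is that one operation on the collection fans out to all of its pages, so a book or course is renamed or watched as a unit rather than page by page.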
Lydia Pintscher	Collaboration Wikidata	Breaking Down Barriers To Cross-Project Collaboration. "... a world in which every single human being can freely share in the sum of all knowledge." Wikipedia has been our flagship for many years now and our main means of achieving our vision. As information consumption and our readers' expectations change, Wikipedia needs to adapt. One crucial building block for this adaptation is reusing and integrating more content from the other Wikimedia projects and from other language versions of Wikipedia. Connecting our projects more closely is vital, especially for helping our smaller communities serve their readers. Surfacing more content from the other Wikimedia projects also gives them a chance to shine, find their audience and do their part in sharing in the sum of all knowledge. This integration comes with a lot of challenges. For many years the Wikipedias have lived largely independently of each other and of the other Wikimedia projects. This is changing. Sharing and benefiting from data on Wikidata, for example, means collaborating with people from potentially very different projects who speak different languages. It brings a perceived loss of local control. At the moment, editors see themselves first and foremost as editors of "their" Wikipedia and often don't perceive this integration as worth the effort - especially on the larger projects. We need to address this on both the social and the technical level if we want to bring our projects closer together, have them benefit from each other's strengths and compensate for each other's weaknesses. We need to think about and find answers to these questions: What can we do in order to bring our projects together more closely? How can we help break down perceived and real barriers to cross-project work? What can we do to make cross-project collaboration easier?
Sam Reed	Documentation Security Testing	Security is important. Although Wikimedia/MediaWiki has a generally good track record, we should always be striving for better, ideally in an automated fashion: by testing the code, and by providing a decent, easy-to-use framework inside MediaWiki that lets people do this without excess work or effort. In some cases, we can do this through documentation; we have some options that allow complex things to be done, but it's not very clear to developers that what they are doing may lead to security issues down the road. In the same way that we run phpcs and are moving towards running phan, it would be very nice to improve our automated testing with a security focus, pointing people to potential pain points in the future, or just to general bad practices now. As Tim Starling said a few years ago, people shouldn't be doing code review (CR) for code style and the like. It doesn't make sense; it's a waste of time. This should be automated away as much as possible, which is happening and is slowly improving the MediaWiki codebase. How can we use this basic idea to improve the MediaWiki software and its extensions with respect to security best practices?
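As a toy illustration of security-focused automated checks that could run alongside tools such as phpcs and phan, the Python sketch below flags PHP source lines that match a few risky patterns. The rule names and regular expressions are hypothetical examples chosen for illustration; they are not an existing phpcs sniff or phan plugin, and a real check would rely on proper parsing or taint analysis rather than regexes.

```python
import re
from typing import List, NamedTuple


class Finding(NamedTuple):
    line: int
    rule: str
    text: str


# Illustrative rules only; a production check would use real static/taint analysis.
RULES = {
    "superglobal": re.compile(r"\$_(GET|POST|REQUEST|COOKIE)\b"),
    "eval": re.compile(r"\beval\s*\("),
    "unserialize": re.compile(r"\bunserialize\s*\("),
}


def scan(source: str) -> List[Finding]:
    """Return one finding per (line, rule) match in the given PHP source."""
    findings = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(text):
                findings.append(Finding(lineno, rule, text.strip()))
    return findings
```

Because the output is a list of line-numbered findings, a CI job could report them as review comments automatically, which is the "no human CR for mechanical issues" idea applied to security.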
Keerthana S	Contributors Machine Learning	Breaking the ice and catering to would-be student Wikipedia contributors. Most of the valuable contributions, especially to technically advanced articles, come from people in academia. So my paper will discuss why it can be valuable to introduce university students to contributing to Wikipedia and to give them enough guidance to stick around, the existing infrastructure that helps this cause, and some ways in which it can improve. As the MediaWiki infrastructure evolves into a platform where Open Source beginners find it easy to contribute, with a well-documented code base, a friendly community and many outreach programs, we should also think about introducing university students to contributing to Wikipedia. Wikipedia serves as an invaluable tool for students worldwide, helping them assimilate their course content. Students already write term papers as part of their coursework, so it makes sense that making them aware of how to contribute to Wikipedia, and guiding them, can be a source of reliable, high-quality contributions. Existing infrastructure: the WikiEduDashboard, a project of the Wiki Education Foundation, caters to universities where students are required to contribute to Wikipedia articles as part of their course assessment, and provides tools for instructors to guide the students. Machine learning tools for guidance: there are existing automatic mechanisms on Wikipedia to detect plagiarism, promotional content and other forms of spam in edits. Automatic rating of Wikipedia articles has been achieved to an extent by the Scoring Platform Team, and these scores are used by many bots on Wikipedia to spot potential vandalism. This score prediction tool could also be used to give new editors immediate, friendly feedback and point out the faux pas in their edits.
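One way the score-based feedback idea could look, sketched in Python under stated assumptions: the inputs are probabilities from edit-quality models such as ORES's "damaging" and "goodfaith" models, while the thresholds, the function name `friendly_feedback` and the message wording are hypothetical placeholders.

```python
def friendly_feedback(p_damaging: float, p_goodfaith: float,
                      damaging_threshold: float = 0.5,
                      goodfaith_threshold: float = 0.5) -> str:
    """Turn two model probabilities into a message for a new editor.

    Thresholds and wording are placeholders that would need tuning per wiki.
    """
    if p_damaging < damaging_threshold:
        return "Thanks for your edit! It looks good."
    if p_goodfaith >= goodfaith_threshold:
        # Likely an honest newcomer mistake: encourage rather than warn.
        return ("Thanks for contributing! Parts of this edit may need another look; "
                "here is a guide to citing sources and writing neutrally.")
    return "This edit may not meet our content guidelines. Please review them before saving again."
```

The middle branch is the one that matters for students: an edit that looks problematic but was probably made in good faith gets guidance instead of a warning.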
Amir Sarabadani	Machine Learning	Machine learning and scaling the knowledge. There are lots of ways to contribute to the Wikimedia movement, and all of them are tedious, time-consuming, and sometimes frustrating. Take the classic example of fighting vandalism: Wikimedians are frustrated by the flow of vandalism, but at the same time they enjoy fighting it. By using a scalable machine learning platform for wikis, we are moving towards giving more power to our users without wearing them out with the work involved. To come back to our example, ORES filters out most good edits and leaves the rest for the community to take care of (so they still enjoy the gratification of doing the job), while at the same time handling the huge backlog of edits needing review. But if we used a bot to revert bad edits, it would damage users' motivation. We need to keep moving in this direction in other areas, like creating articles, improving quality, categorization, and so on.
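The division of labour described here (the model clears likely-good edits, people handle the rest, nothing is reverted automatically) can be sketched as a simple threshold triage. This Python sketch assumes each edit already carries a "damaging" probability from a model such as ORES; the `ScoredEdit` type, the `triage` function and the default threshold of 0.1 are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ScoredEdit(NamedTuple):
    rev_id: int
    p_damaging: float  # probability from an edit-quality model, e.g. ORES "damaging"


def triage(edits: Iterable[ScoredEdit],
           threshold: float = 0.1) -> Tuple[List[int], List[int]]:
    """Split edits into (likely good, human review queue, most suspicious first).

    Nothing is reverted here: every edit at or above the threshold goes to people.
    """
    edits = list(edits)  # allow generators, since we iterate twice
    likely_good = [e.rev_id for e in edits if e.p_damaging < threshold]
    to_review = sorted(
        (e for e in edits if e.p_damaging >= threshold),
        key=lambda e: e.p_damaging,
        reverse=True,
    )
    return likely_good, [e.rev_id for e in to_review]
```

Lowering the threshold shrinks the automatically cleared set and grows the human queue; the threshold is the knob that trades backlog size against how much of the work stays with the community.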
Subramanya Sastry	Knowledge as a Service Structural Semantics Templates API Schema.org Wikidata	PROBLEM: To satisfy the 'Knowledge as a service' theme, in addition to providing access to full page content, Wikimedia APIs should provide access to semantic units at: - an abstract document level (sections, headings, tables, etc.) - a domain-specific level (infoboxes, geolocation, taxoboxes, etc.) Wikitext, the core content creation technology on wikis, evolved as a string-processing language in which one set of strings is replaced with another, mostly via regular expression matches, to yield the output HTML string. There is no notion of document structure here. This lack of structural semantics gets in the way of robustly identifying semantic units and of developing tools and features that operate on a page structurally at sub-page granularity. SOLUTION: TRANSPARENT TYPING LAYER OVER WIKITEXT Types improve abstraction, reasoning, and tooling abilities in programming languages. A transparent typing layer on top of wikitext can provide similar benefits. A: ENFORCE STRUCTURAL TYPES ON OUTPUT OF WIKITEXT CONSTRUCTS INCLUDING TEMPLATES AND EXTENSIONS - Specify that all wikitext constructs have an output with type: String, DOM, CSS property, HTML attribute, or a List of one of those. - Extensions and templates can specify the expected output type. All other core wikitext constructs have the DOM output type. - The parser enforces the output type of all wikitext constructs. Examples: for DOM types, unclosed and misnested tags are fixed up; for String types, HTML tags are escaped and wikitext strings are nowikied; for CSS types, values are sanitized. Among other benefits, this basic typing mechanism enables MediaWiki to provide an API to extract and edit document fragments without introducing adverse side-effects on the rest of the page.
B: UNIFIED TYPING MECHANISM TO EXPOSE DOMAIN-SPECIFIC SEMANTIC INFORMATION Editors impose structure on documents through a rich library of templates, policies, and maintenance processes they have developed over the years. If this semantic information (infoboxes, navboxes, sports rankings, railway timetables, etc.) is mapped to a centralized ontology system (Wikidata, schema.org, something else), the parser can expose it in HTML, and MediaWiki APIs can expose it in a wiki-neutral way. There are multiple disparate mechanisms today through which template authors specify metadata about templates (TemplateData, TemplateStyles, possibly others?). Instead of creating newer mechanisms for specifying structural output types and semantic information types for templates, it is better to provide a consolidated mechanism that unifies all this template metadata into a single user-defined type declaration. This lets new applications and capabilities be developed in the future without code changes to the core mechanism. FEASIBILITY This typing layer only affects template authors. Editors who use source editing won't see any impact (besides fewer markup errors). Editors who use visual editing might see improved tooling. Even for template authors, this is meant to be an opt-in mechanism with gradual migration to the new model. The proposal here is a logical extension of what Parsoid does today. Parsoid provides an illusion of structured wikitext and demonstrates what is possible (VE, CX, Linter, Flow, among others) by embracing structured semantics.
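To make part A concrete, here is a minimal Python sketch of enforcing a declared output type on a construct's output, following the three examples in the statement: fixing up tags for DOM output, escaping HTML for String output, and sanitizing CSS values. The `OutputType` names, the `enforce` function and the CSS blocklist are assumptions for illustration, not Parsoid's or MediaWiki's implementation; nowiki-wrapping for String output and the HTML-attribute and List types are omitted, and the tag balancer is far simpler than HTML5 tree building.

```python
import html
import re
from enum import Enum
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class OutputType(Enum):
    STRING = "string"
    DOM = "dom"
    CSS = "css-property"


# Elements that never take a closing tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link", "wbr"}

# Hypothetical blocklist: reject values that could load resources or escape the declaration.
UNSAFE_CSS = re.compile(r"url\s*\(|expression\s*\(|[;{}<>\\]", re.IGNORECASE)


class TagBalancer(HTMLParser):
    """Re-serializes an HTML fragment so every element is closed and properly nested."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []
        self.open_tags: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        rendered = ""
        for name, value in attrs:
            escaped = html.escape(value or "", quote=True)
            rendered += f' {name}="{escaped}"'
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag: str) -> None:
        if tag not in self.open_tags:
            return  # stray end tag with no matching start: drop it
        # Close elements opened after `tag` (misnesting), then `tag` itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data: str) -> None:
        self.out.append(html.escape(data, quote=False))

    def result(self) -> str:
        # Close anything the fragment left open.
        return "".join(self.out) + "".join(f"</{t}>" for t in reversed(self.open_tags))


def enforce(output_type: OutputType, value: str) -> str:
    """Coerce a construct's raw output so that it matches its declared type."""
    if output_type is OutputType.STRING:
        return html.escape(value, quote=True)
    if output_type is OutputType.CSS:
        return "" if UNSAFE_CSS.search(value) else value.strip()
    balancer = TagBalancer()
    balancer.feed(value)
    balancer.close()
    return balancer.result()
```

For example, `enforce(OutputType.DOM, "<b>bold <i>both</b> tail")` yields `<b>bold <i>both</i></b> tail`, so a misnested template output stays self-contained instead of leaking formatting into the rest of the page.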
Moriel Schottlender	Accessibility Languages Open Source Tools	The Wikimedia Foundation is a leader in many fields, but in no field is our leadership as obvious, or the field otherwise as underserved, as in language and accessibility. We are not just the fifth biggest site online, or one of the biggest open source endeavors available; we are the de facto leaders in technology that commercial companies consider an 'edge case' and 'less profitable'. This gives us the opportunity to develop tools that don't just help our own audience, but could - and should - serve as a repository that allows everyone online to reach, support, and embrace these audiences with minimal effort. We already have many of these tools, for our own users and products, but they are still limited when it comes to sharing and using them outside the movement. Why should that be? Making our tools accessible to outside projects - and to cloud tools, to bots and to other Open Source organizations - is a doable task that is not just worthwhile in general, but also follows our mission. What better way to empower 'every single human being [to] freely share in the sum of all knowledge' than to share our own powerful tools with others, allowing everyone to prioritize support for language, accessibility and right-to-left technologies and push these technologies forward? I suggest we look across our technologies and libraries - from OOjs UI to CSSJanus, ResourceLoader to wfMessage(), and many others - and work to generalize them so that they better serve our own users in their projects, bots, and cloud tools, and so that we firmly and officially establish ourselves as the leaders in this technology that we already are. Now that my position is known, my direction is unknowable (Heisenberg Uncertainty Principle). So let's break reality, and figure out both.
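CSSJanus, named in this statement, is one example of such a tool: it flips left-to-right stylesheets for right-to-left languages. The Python sketch below shows only the core idea and is not CSSJanus's implementation or API. It swaps `left`/`right` in property names and values and reorders four-value shorthands such as `padding`, and it ignores many cases a production flipper has to handle (for example, values inside `url(...)`).

```python
import re

# Matches "left"/"right" as whole words, including inside names like "margin-left".
SIDE = re.compile(r"\b(left|right)\b")

# Shorthands whose four values run top, right, bottom, left.
FOUR_VALUE_SHORTHANDS = {"margin", "padding", "border-width", "border-style", "border-color"}


def swap_sides(text: str) -> str:
    return SIDE.sub(lambda m: "right" if m.group(1) == "left" else "left", text)


def flip_declaration(declaration: str) -> str:
    """Flip a single 'property: value' declaration from LTR to RTL."""
    prop, _, value = declaration.partition(":")
    prop, value = prop.strip(), value.strip()
    parts = value.split()
    if prop in FOUR_VALUE_SHORTHANDS and len(parts) == 4:
        # top right bottom left -> top left bottom right
        parts[1], parts[3] = parts[3], parts[1]
        value = " ".join(parts)
    return f"{swap_sides(prop)}: {swap_sides(value)}"


def flip(declarations: str) -> str:
    """Flip a semicolon-separated list of declarations (no selectors or braces)."""
    flipped = [flip_declaration(d) for d in declarations.split(";") if d.strip()]
    return "; ".join(flipped) + ";"
```

For instance, `flip("margin-left: 5px; float: right; padding: 1px 2px 3px 4px")` returns `"margin-right: 5px; float: left; padding: 1px 4px 3px 2px;"`. Packaging logic like this as a standalone, documented library is what makes it reusable by bots and cloud tools outside MediaWiki.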
Moritz Schubotz	Code Review Testing Volunteer Developers Artificial Intelligence	Title: Developing software in a wiki way. Background: Over the last 15 years, MediaWiki has evolved from a simple PHP script into a complex and highly integrated family of products and services, serving knowledge to billions of humans. Every change might cause instability or a complete failure of the system. Thus, measures including code review, automated unit testing and code/product ownership have been established to guarantee the stability of the software. The drawback of this approach is that improving the software has become very challenging for volunteer contributors. This proposal seeks to lower the barriers for volunteer contributors while maintaining the stability of the system. Advice: (1) Reduce the effort of code review by applying artificial intelligence methods, so that reviewers can focus on non-formal comments. (2) Develop a dialog platform that ensures that volunteer contributors are aware of the next steps and the roadmap for their change on the way to production. (3) Establish a team that supports volunteer developers who want to make a difference that is not listed in the annual plan, by providing temporary code or product ownership. (4) Improve testing and evaluation to measure the effect of every single change and to identify code or even whole services that are no longer necessary and can be switched off.
Max Semenik	Technical Debt	I'm highly interested in having a deep discussion about our technical debt. I've been active on this front myself and I really care how we fare in this respect; however, I see a lack of consensus on tech debt in the development community at large. We deprecate things and then continue using them. People get irritated when their extensions break due to the slightest core changes, even when the extensions themselves are misusing the core interfaces. We can't really run many types of static analysis against our code base, because the sheer number of problems detected would make the signal-to-noise ratio unacceptable. Developing skins for MediaWiki is incredibly painful. Our tests access the database a lot; as a result, they're slow and fragile. These are just a few examples of the pain points haunting our code base and exacting their daily toll on everyone working on it. I would like the Tech Debt SIG to work in person on addressing these issues. We should define code quality metrics, identify problem areas and create some actionable items to address them. We should also discuss approaches to handling this without causing too much discontent in the broader developer community. I believe this would be an important step towards making MediaWiki a better ecosystem and improving our development pace.
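Slow, fragile tests that hit the database are commonly addressed by having the code under test depend on a narrow interface and substituting an in-memory fake in unit tests. The Python sketch below illustrates that pattern only; `PageStore`, `InMemoryPageStore` and `append_signature` are hypothetical names, and MediaWiki's own PHPUnit tests would apply the idea through PHP interfaces and mocks.

```python
import unittest
from typing import Dict, Optional, Protocol


class PageStore(Protocol):
    """The narrow interface the code under test depends on (hypothetical)."""

    def get_text(self, title: str) -> Optional[str]: ...

    def save(self, title: str, text: str) -> None: ...


class InMemoryPageStore:
    """Test double that keeps pages in a dict instead of a database."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def get_text(self, title: str) -> Optional[str]:
        return self._pages.get(title)

    def save(self, title: str, text: str) -> None:
        self._pages[title] = text


def append_signature(store: PageStore, title: str, user: str) -> str:
    """Hypothetical unit under test: append a signature line to a page."""
    text = (store.get_text(title) or "") + f"\n-- {user}"
    store.save(title, text)
    return text


class AppendSignatureTest(unittest.TestCase):
    def test_appends_to_existing_page(self) -> None:
        store = InMemoryPageStore()
        store.save("Talk:Example", "Hi")
        self.assertEqual(append_signature(store, "Talk:Example", "Max"), "Hi\n-- Max")
        self.assertEqual(store.get_text("Talk:Example"), "Hi\n-- Max")
```

The production class that talks to the database would implement the same two methods, so only a small number of integration tests need a real database.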