Property:Statement
From devsummit
A
'One World, One Wiki!' Instead of today's many siloed wikis, separated by language and project, our goal should be to re-establish a unified community of collaborators. We will still respect language and cultural differences - there will still be English, German, Hebrew, Arabic, etc. Wikipedias; they will disagree at times - but instead of separate domains, we'll embrace a single user experience with integrated navigation between projects and languages and the possibility of split screen views aligning related content. On a single page we can work on articles in different languages, or simultaneously edit textbook content and encyclopedia articles. Via machine translation we can facilitate conversations and collaborations spanning languages and projects, without forcing a single culture or perspective.
Machine translation plays a key role in removing these barriers and enabling new content and collaborators. We should invest in our own engineers and infrastructure supporting machine translation, especially between minority languages and script variants. Our editing community will continually improve our training data and translation engines, both by explicitly authoring parallel texts (as with the Content Translation Tool) and by micro-contributions such as clicking yes/no on a proposed translation or pair of parallel texts ('bandit learning'). Using 'zero-shot translation' models, our training data from 'big' wikis can improve the translation of 'small' wikis. Every contribution further improves the ability of our tools to make additional articles from other languages available.
A translation suggestion tool will suggest an edit in one language whenever an edit is made to a parallel text in another language. The correspondences can be manually created (for example, via the Content Translation Tool), but our translation engine can also automatically search for and score potential new correspondences, or prune old entries when the translation has drifted. Again, each new correspondence trains the engine and improves its ability to suggest further correspondences and edits.
Red-links and stubs are replaced with article text from one of the user's preferred fallback languages, perhaps split-screened with a machine translation into the user's primary language. This will keep 'small' language wikis sticky, and prevent readers from getting into the habit of searching in a 'big' language first.
We should build clusters specifically for training translation (and other) deep learning models. As a supplement to our relationships with statistical translation tools Moses and Apertium, we should partner with the OpenNMT project for modern neural machine translation research. We should investigate whether machine translation can replace LanguageConverter, our script conversion tool; conversely, our editing fluency in ANY language pair should approach what LanguageConverter provides for its supported languages.
By embracing unity between projects and erasing barriers between languages, we encourage the flow of diverse content from minority languages around the world into all of our wikis, as well as improving the availability of all of our content into indigenous languages. Language tools route around cultural or governmental censorship: by putting parallel texts and translations in the forefront of our UX we expose our differences and challenge preconceptions, learning from each other.
Our strategic goals include scaling our communities to a truly global level, and expanding our understanding of human knowledge. To do this, in my opinion, we need to have a much better understanding of our communities' actual work. We have tens of thousands of people doing millions of hours of work every month, and nobody knows exactly what is being done, what the definition of "done" is, and how fast or slow the progress is. We are the leaders of the free knowledge movement, and we are mostly blind except for some big picture notions like pageviews and edits. It is my opinion that we need to develop a good understanding of the work being done on the wikis. Very capable people have already spent lots of time trying to do this, but I believe we have largely failed because of technical limitations. This is a big data and big compute problem, and we have not yet approached it as such. A close collaboration between our communities, Analytics, Research, and Audiences teams is needed, as well as the power of the WMF Hadoop cluster. I have had sessions on this topic already, and am excited to finish planning and transition to actual work. There are some very valuable implications of taking on and finishing this work. Most importantly, we will all be able to more objectively talk about frustrations in the community over changes that cause "more work". For example, when we launched Visual Editor there was huge backlash about the amount of work this change implied for our community. But because this was largely based on subjective opinions, emotions got involved and it took years to calm the negative effect of those emotions. This effort would also give us, for the first time, a way to celebrate these millions of hours of work. People could see, share, and take pride in their part of building human knowledge (if they wanted to, privacy is of course one of our top priorities).
I am also interested in expanding our Open Source efforts, and examining changes that we can make to spur more collaboration. My reading of the strategic goals for 2030 is that the WMF will not have enough resources to execute by itself. That's where collaboration will be crucial, and where problems like in-house developed libraries without true Open Source presence will slow us down. We let documentation and third-party user support lag behind because we're busy with other stuff, and that's arguably fine for our scale so far. But this approach will not allow us to grow the way our Strategy is defined.
B
The Mission
In a Wikimedia cultural orientation, the moderator instructed the class by explaining that 'technology is not part of our mission, technology is only a means to an end.' While it may be appropriate to use digital technology in order to disseminate free educational content effectively and globally, the existence of Wikimedia Technology is not strictly part of Wikimedia's mission. Wikimedia is therefore stuck in a precarious position of maintaining a large open source software project only as a means to an end. This produces a double-minded mentality within the movement: satisfying the mission versus satiating a massive software operation. It is with this understanding that Wikimedia ought to consider partnering with an existing open source community in order to evolve MediaWiki to support the mission, maintain and grow the technical community, and build technologies necessary for embracing mobility.
While Wikimedia's mission statement does not cover technology, the mission (https://www.drupal.org/about/mission-and-principles) of Drupal 'is to build the best open source content management framework.' In addition, Drupal is more than capable of handling all of Wikimedia's traffic needs and is flexible and modular enough to allow us to implement all of MediaWiki's features in a UI and API backwards-compatible way. In fact, every feature of Drupal core is an extension and every extension is a first-class citizen that has full control over every aspect of Drupal. Perhaps the next major version of MediaWiki ought to be a collection of Drupal extensions that can be run independently and are also available in a pre-configured 'MediaWiki' distribution of Drupal (https://www.drupal.org/project/project_distribution).
The primary user of MediaWiki is, by far, Wikimedia. While the software can be run by others outside of Wikimedia, it's usage outside of Wikimedia is extremely low (https://trends.builtwith.com/cms/MediaWiki). Because of the low adoption rate, it is difficult to gain any users outside of Wikimedia. As a result, there are very few developers outside of the movement that contribute towards its development and almost no outside financial commitment to the project. In contrast, Drupal's usage within the top 1,000 websites is 14 times (https://trends.builtwith.com/cms/Drupal) that of MediaWiki. 'The Drupal community is one of the largest (https://www.drupal.org/about/mission-and-principles) open source communities in the world.' By building MediaWiki on top of Drupal, Wikimedia would be tapping into a user and developer base that is substantially larger than MediaWiki's. Wikimedia would be assisting Drupal in their mission and Drupal would be helping Wikimedia in theirs.
Drupal is committed (https://dri.es/drupal-is-api-first-not-api-only) to an API-first strategy. This strategy has enabled Drupal to expose all of its resources in a consumable, highly-cacheable API. They believe strongly in this strategy, because it's part of their mission, and in doing so, helps others like Wikimedia achieve the mission on a global scale. By embracing the API-first strategy, Wikimedia would propel its mobile development into the future.
To further Wikimedia's mission, the foundation should consider using Drupal as the foundation of its software. Doing so would facilitate evolving MediaWiki to support the mission, maintaining and growing the technical community, and building technologies necessary for embracing mobility.
== Structure Most Things with Schema.org ==
The future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, Microsoft, and international equivalents and social networks. We're thankful they extend our reach, even as we seek to help consumers on the platforms join our movement.
We could help platform providers, their users, and our users solve problems better through adoption of the open standard Schema.org into Wikipedia pages mapped with templates and, ideally, federated and synchronized Wikidata properties.
Benefits:
# Wikipedia will have even better presentation and placement in search engines and other data rich experiences.
# We provide an opportunity for a more consistent data model for template authors and people/bots filling template values. And the richly defined Schema.org entities provide a good target to reach on all entities represented in the Wikipedia/Wikimedia corpora. Standardization can reduce duplication of effort and inconsistencies.
# We introduce an easier vector for mobile contribution, which could include simpler and different data entry, mapping, and modeling.
# We can elevate an open standard and push its adoption forward while increasing the movement's standing in the open standards community.
# Schema.org compliant data is more easily amenable to machine learning models that cover data structures, the relations between entities, and the dynamics of sociotechnical systems. This could bolster practical applications like vandalism detection, coverage analysis, and much more.
# This might provide a means for the education sector to educate students about knowledge creation, and data modeling, and more. It might also afford scientists and other practitioners a further standardized way to model the knowledge in their fields.
What would it take? And can this be done in harmony with the existing {{Template}} system?
This session will discuss the following:
# Are we aligned on the benefits, and which ones?
# Implementation options.
## Can we extend templates so they could be mapped to Schema.org?
### Would it be okay to derive the mapping by manual and automated analysis at WMF/WMDE and apply it behind the scenes? Would that be sustainable?
### Could we make it easy for template authors to mark up their templates for Schema.org compatibility and have some level of enforcement? Could Schema.org attributes and entity types be autosuggested for template creators?
## Is it easy to relate the most existing and proposed Wikidata entity types and properties to existing Schema.org entities and properties?
## What would it take to streamline MCR Schema.org data structures or MCR Wikibase property clusters mapped to Schema.org on defined entity types?
## Furthermore, if we can do #1 and #2, what's to prevent us from letting templates as is merely be the interface for Schema.org compliant Wikibase entities and properties (e.g., by duck typing / autosynthesis)?
## How could we bidirectionally synchronized between Wikipedia and Wikidata with confidence in a way compatible with patroller expectations? And what storage and event processing would be needed? Can the systems be scaled in a way to accommodate arrival of real-time and increasingly fine grained information?
Title:
Empowering Editors with Machine Learning
Background:
Advances in machine learning, powered by open source libraries, is becoming the foundational backbone of technology organizations the world over. Many tedious, time consuming, tasks that previously required 100% human involvement can now be augmented with human in the loop machine learning to empower editors to get more done with the limited time they have available to contribute to the sum of all human knowledge.
Advice:
1) Invest directly in applying known quantity machine learning, such as pre-trained ImageNet classifiers, to add structured data to our multimedia repositories to increase their discoverability. Perhaps via tools that provide editors with lists of appropriate items that they can easily click to add if appropriate to the multimedia.
2) Engage academia to work with Wikimedia data sets and employ developers to move the most promising results from research into production. There is already a significant amount of work being done in academia to test and evaluate machine learning with our data sets, but little to none of that work ever makes it back into Wikimedia sites. With more focus on collaboration we can encourage research that is specifically applicable to deployment goals.
3)Wikimedia has the ability to collect significant amounts of implicit user data via browsing sessions, searches, watchlists, editing histories, etc. that can be used for machine learning purposes. We need to be continuously thoughtful of the privacy implications of how we use this data. +
Title
Frameworks to connect infrastructure to the mission
Thesis
We will better achieve social impact, succeed in our strategy, and fulfill our mission when the Foundation uses non profit programmatic frameworks when prioritizing and planning improvements to MediaWiki and other technologies.
Proposal
Impact is an intangible, abstract social benefit and it can be difficult to consider how changes we make in MediaWiki will help or harm it. To illustrate the connections between infrastructure choices and impact and to incorporate those connections into our plans, we can use programmatic frameworks developed in the nonprofit professional communities. Frameworks used by these nonprofit communities for various types of programs and impact can explicitly and concretely link our engineering choices to the movement strategy and the social benefit we create. This increased attention to the social impact of our technical decisions and investments will in turn create increased investment from our communities, partners and potential allies beyond our community towards fulfilling our mission.
WMF programs such as New Readers and Structured Data on Commons, and Wikimedia community programs, such as Wiki Loves Monuments, model how building technology for well-defined social impact can structure our engineering and infrastructure choices towards more strategic and mission driven impact.
These programmatic frameworks can be helpful during annual planning, quarterly check ins, and throughout the process of deciding on, planning, implementing, and evaluating technological changes. We would be able to weigh and design intentionally for broad-end users while also supporting the targeted and specific organizing communities who use our technology towards our desired social impact. We could expand the impact that we achieve by consulting expert communities, such as educators, librarians, and activists, who will design additional social-impact programs and processes on top of those tools. We could also identify parts of our communities which already create desired impacts, and build technologies and technological services which increase the scale, effectiveness and efficiency of organizing contributors to fulfill our mission. Socio-technological decisions in our movement can be most successfully achieved when considering both social and technological benefits.
C
Embracing real-time collaboration
Real-time collaboration (like Google Docs and Etherpad) has many benefits but also imposes certain workflow requirements. There is prototype code that can enable real-time collaboration within VisualEditor. But rolling out collaborative editing requires more than technical work. It will require a coordinated effort to re-imagine what editing is like. We will need mechanisms to create user groups, real time chat mechanisms, mechanisms to temporarily persist collaborative sessions, perhaps even new core mechanisms for describing revisions. We also need to think about social mechanisms and preventing harassment and vandalism of collaborative sessions. In exchange, we will gain improved mechanisms for mentoring, translating long articles, reporting on current events, and assisting non-native speakers.
We should embrace this opportunity to reimagine our platform, starting by organizing a number of trials to gain insights into whether, or how best, a real-time collaborative editing option would benefit our projects.
Sessions in previous Wikimanias / Hackathons / Developer summits identified potential uses, including for:
- Mentoring
- Translating long articles
- Current events
- Assisting non-native speakers
Potential issues identified include:
- How to log authorship
- Who decides when to publish
- Preventing in-session abuse
- Coexisting with non-real-time editors
References
https://wikimania2017.wikimedia.org/wiki/Submissions/Waiting_for_Real-Time_Collaboration
(Wikimania 2017 panel discussion on real-time collaboration)
https://phabricator.wikimedia.org/T165941
(Wikimedia Hackathon 2017 showcase including live demo of VisualEditor real-time collaboration) +
How and with whom should we partner to create the technologies needed to support the mission?
A substantial, growing community of MediaWiki users and developers outside the Wikimedia movement has evolved, creating wikis that vary in size, number of editors, number of readers, access restrictions, and activity. The Wikimedia movement benefits from this third party MediaWiki developer and user community's technology contributions and innovation. Similarly, this community benefits from the Wikimedia movement's stewardship of MediaWiki as the foundational technology in support of Wikipedia and its sister projects. There are many areas in which the needs of these two groups are identical, including stable, well-performing software that supports community authoring. Partnering with the third party MediaWiki community will result in a platform that is better for all parties.
How should MediaWiki evolve to support the mission?
There is much knowledge in the world that cannot find its place within Wikipedia or its sister projects. MediaWiki is powerful software crafted especially to support the expression of all knowledge. Third party MediaWiki wikis can provide a home for knowledge that does not belong in Wikipedia, supporting the mission of sharing in the sum of all knowledge. In order for the third party MediaWiki community to continue to thrive and to grow, several impediments to MediaWiki adoption that especially affect that community must be addressed:
- Installing and maintaining all but the smallest and most basic MediaWiki installation currently requires a high level of craftsmanship and expertise.
- While a large number of novel MediaWiki extensions exist to support third party applications, it is difficult to ascertain the level of maturity and support of these extensions.
- Some enterprise consumers require a guaranteed level of support and/or service level agreements before adopting a technology.
- The barrier to entry for those wishing to experiment with MediaWiki in production quality environments is high.
How do we maintain and grow the technical community and ready it for the mission ahead?
The third party MediaWiki community already significantly contributes to the code base. In the last two years, 22% of the commits to MediaWiki core were made by third party contributors, and 62% of the authors of commits to MediaWiki core were third party contributors. Even more striking, in the last two years, 40% of the commits to MediaWiki extensions hosted on gerrit were made by third party contributors, and 67% of the authors of the commits were third party contributors. The third party MediaWiki community is a significant training ground for skilled MediaWiki developers used to tackling a diverse set of challenging problems who sit poised to help forge the path ahead for MediaWiki.
D
I believe that we need to achieve a better separation of concerns - in code as well as in work on product and our communication with the communities to reduce the dept we build up in these areas. Therefore, I want to suggest three interrelated topics:
* Use of modern MVVM frameworks for our front end code, to develop more efficiently
* Provision of a modern customization infrastructure, to decouple gadgets from our code
* Participation beyond code and feature wishes
# Use of modern MVVM frameworks for our front end code
Traditionally, Mediawiki has been focused on PHP. Over the last years, more and more interactivity via JavaScript has been used. In connection with the significant growth of the JavaScript ecosystem this could have meant quicker development and a clearer separation of concerns. However, since our solutions are mostly used in MediaWiki context, involving external developers has been difficult and so has been onboarding developers in core teams.
I see a large opportunity in introducing modern MVVM libraries that are open source and not constrained to use in Wikimedia software and could build upon other's experiences as well as documentation - things that have been traditionally problematic in our isolated MediaWiki solutions.
Strong contenders are React and vue.js. While the react ecosystem is larger, I would recommend vue for its better documentation and clear compartmental structure which hopefully helps us to avoid further isolated solutions.
# Provision of a modern customization infrastructure
The introduction and larger use of a MVVM could also be a chance to provide clear frontend APIs for Gadgets. They currently use DOM-hacks, which break continuously and would not anyway not possible when using a modern frontend framework (due to DOM flushing).
Why should bother, since we have a large user base in which different tasks are shared using specific tools, just like each manual work has many different specialized, often even customized tools.
Additionally, gadgets/userscripts could provide a low-barrier opportunity to onboard new developers. Other organizations successfully show that user provided extensions can enhance an ecosystem with user driven innovation and help with onboarding developers, e.g. Firefox' and Chrome's WebExtensions as well as LibreOffice.
I would like to work on finding a way fulfil the possibilites of gadgets and extend them while providing sustainable and secure infrastructure for doing so.
# Participation beyond code and feature wishes
We already do extensive user research. A large area for expansion and further development is doing this research and sense making *with* the community. This may already be done, often implicitly, based on feature- or UI focused requests of community members. But this has large caveats: The solution may net be feasible or sustainable to implement. Furthermore, without understanding the underlying need, we risk building technical- and UX debt and give away the possibility of learning from our community.
To achieve an active, needs-based involvement of communities in design and research we could build on existing participatory design methods. They could be used and integrated in our research and product planning frameworks. Clearly integrating community in up front research could enable us to gather needed knowledge, have community participation and reach a better understanding between Wikimedia Foundation and communities as well as of the communities among each other. I want to define future participatory design strategies to be used on our way towards 2030.
E
More than just servers
A modern approach to tooling and infrastructure is needed, not only so that we may scale future content and users, but the deployment of new technologies and services as well.
Our way of approaching infrastructure is in need of modernization; What we do we do well, but if we are to grow our capacity, not just for storing and serving content, but our capacity to create the technologies that empower movement strategies, we need a departure from the idea of infrastructre as merely clusters of servers. We need high-level, easily consumed platforms for computation, storage, deployment, and management. We need environments that make experiments cheap, and allow teams to fail-fast or iterate on the next stage quickly. We need systems that are distributed by nature, self-service, secure, and that are able to provide insight into availability and performance.
Some efforts have been (or are being) made here. Examples include recent work toward a Kubernetes deployment, a change-propagation service, and RESTBase. These efforts, while worthwhile, are piecemeal, and no holistic strategy exists. It is my belief that as we discuss the future of our platform, we should consider the requirements in the context of the bigger picture, and discuss a strategy aimed at modernizing our infrastructure. +
== How to built a discussion system that would ease user interactions and content creation on the wikis? ==
I believe that Structured discussions are a must-have for MediaWiki. Build such a system will reduce communication gap on the wikis, ease newcomers first steps, empower all users and allow powerful interactive tools to be built. It will also increase a lot the adoption of MediaWiki as the knowledge creator system. The MediaWiki community has a strategic priority decision to take on this topic.
The Wikimedia communities and organizations, through MediaWiki, wants to give everyone a way to create (free) knowledge collaboratively, for all users from everywhere. Imagine doing it without a powerful discussion tool that would face international interactions, scale and manage to keep everyone aware of the ongoing work. MediaWiki powered experiences have proven that it is not possible.
Unstructured messages are based on a blank page which hasn't evolved since 2002. You can do anything using a blank talk page. But Discussions as the are now don't provide basic things people are used to on social networks or Gdocs for example. Among many missing features, users can't reply to a discussion by email, or using mobile the interface; users have to know where to post and how to use a unique technical etiquette to discuss; and more. Current discussion default system is not welcoming everyone.
Several communities like Wikimedia and WikiHow create inventive ways to structure discussions a bit: templates, contents preload, mentions, surcharge of discussions with HTML, local scripts and bots... Those are not unified and supported by other than communities themselves. Some wikis have decided to use Flow and expect improvements to have a better experience. Some others communities, often the small ones, prefer to use Facebook or other social networks to discuss, which is not a free, safe and open environment.
The approach supported by the Wikimedia Foundation is Structured Discussions extension (re-scoped from Flow) to focus on user-to-user discussions. Consider that extension as a MediaWiki high-priority building block extension is a political decision the MediaWiki community needs to take. It will permit to build strong and diverse communities, decreasing technical barriers.
Built that discussion system requires a clear strategy and resources, like it has been done for the visual editor a few years ago. Any important effort will have side effects that will benefit to other projects (like VE project did notably by developing Parsoid), by being used by other extensions or services that would benefit discussions to create very powerful features, like in-articles notes or suggestions, or easier request systems.
Work on discussions on the Web is not a new topic. We can benefit of studies made about on-line discussions, both about UX design and technical implementation. The MediaWiki community also have some experience about what is not possible or not desirable, taken from LiquidThreads and Flow.
F
We need to re-evaluate scaling, on both the technical community side and the content side.
On the technical side, too often we think as if we were an isolated organization, rather than a respected leader that many wish to collaborate with. This causes us to ask ourselves the wrong questions and get the wrong answers.
For example, we asked ourselves whether we should limit ourselves to existing open source translation tools, or use proprietary translation services to fill in the gaps. Instead, we should have stayed committed to open source, and asked how we can use our engineering and financial resources to advance open source translation. This is a major problem that no organization can solve on its own. However, we have both the motivation and resources to be a major contributor to the solution.
We also asked whether we should support the proprietary MP4 format, or limit ourselves to weak device support for open formats. Instead, we should be staying committed to open standards, and working to support their uptake among software developers and device manufacturers. We already have significant relationships with wireless carriers that give us a foot in the door with such manufacturers.
By seeking important partnerships, where we are prepared to put in significant effort, we can greatly scale both our own efforts and those of the broader movement.
On the content side, to achieve sustained long-term growth, we need to grow every type of user activity, including writing, editing, discussion, organization, curation, maintenance, workflows, and moderation. We have historically provided good (and improving) support for writing, editing, discussion, and moderation.
However, we have neglected the related processes of organization (e.g. categorization, tagging), maintenance (e.g. tracking articles that need fixes, updating them as they become out of date), curation (e.g. quality images, featured articles), and workflows (used in multiple areas, but particularly supporting organization, maintenance, curation, and moderation).
It is vital that we improve discussion, curation, workflows, and moderation tools. Otherwise, we will be unable to keep up with increasing content and activity as our improvements to writing and editing succeed. We should look at past successes (e.g. the Teahouse) and failures (e.g. Article Feedback) and learn lessons. In both cases, we made a very specific product, which then succeeded or failed. This is not scalable to hundreds of wikis, and it is hard to iterate in response to lessons learned.
Instead, we should focus on platforms, such as workflow systems. In order to keep up with the community, we need to give them the flexibility to constantly use the software according to their needs.
Fundamentally, Wikimedia's technology are tools to achieve our mission - absolutely vital tools, but not objectives in themselves. Where a tool has dulled we should sharpen it, where it has rusted we should polish it, and where it has blunted we should replace it.
The majority of our tools have sprouted over time in response to immediate needs, and grown ad hoc when we've spotted something they can also do, or been pruned back when they proved too unwieldy to retain. Our communities have taken these tools and built amazing things with them, often despite rather than in line with their intended use. Subsequently these unplanned use patterns have shaped what we think about the tools and how they should be used, when we do so.
This haphazard, tactical development has worked well enough, but has limited us in several ways. We often fail to serve some of our audience because we rush in with a quick fix that listens to a few voices and decides that that's the best thing to build. When we've tried to build more systemic change, it's often been unrooted in serious evidence, and so is like constructing ivory towers into the clouds: baffling, hopeless, and unfamiliar.
We should develop comprehensive methods to collect and monitor actionable data on how well our tools are serving their purposes, and where we can improve. This should come from all stakeholders, covering our great, already-empowered, experienced editors in major languages but also those from whom we rarely hear - those contributing in and speaking smaller languages or not interacting with other users on meta-editing issues, and those with a looser relationship to the movement like readers and casual editors.
We should have numbers clearly attached to our tools as to how we expect them to perform. How these are obtained will differ. Sometimes quick numbers like success rates of false positives against false negatives from anti-abuse features, or how many users having made changes try to press the submit button, will work. Sometimes simple surveys with expected happiness thresholds will be appropriate. In others we may need to work harder to come up with the right way to understand how different tools and experiences interact with each other, like how much "knowledge" readers successfully glean from the article, or whether the burden of allowing logged-out editing is worth the mindshare of "anyone can edit" feeling true.
Ideally, changes to user features and especially introductions of new features should progressively roll out based on these numbers - and if they have adverse effects, they should be automatically rolled back. This is how others operate, but it's very distant from today. It's a far-off dream now, but I believe we can build it.
G
Mediawiki needs a professional ecosystem
There is a huge potential for MediaWiki development outside of the Foundation's organized tech world. Thousands of organisations are running MediaWiki on the internet or intranet. They are investing time and money to make it their platform for information sharing, knowledge management or collaborative work. Yet, a lot of the development and design work stays contained on those installations instead of being published and provided to the greater MediaWiki community. I think this is not because of seclusiveness of the authors, but because we make it hard for externals to contribute.
So how can we tap into this potential? I think there are a number of measures we can take. Among others, these are:
* Support standard ways for code contribution. For example, a lot of developers do have a github account, and know the github workflow of forking and requesting pulls. However, there is currently no way for them to contribute their code directly, instead they have to set up with our gerrit infrastructure. This is a hurdle many will not take.
* Maintain extensions as a community. There are a lot of extensions which are not actively maintained by their authors. In order to get them working, you have to wait for the maintainer to +2 your code. Although I have +2 rights, it is not clear under which circumstances I should actually +2 code, nor is there a general review queue for extensions. We can establish a group of volunteers who review changes to extensions on a regular basis.
* Create a template and gadget repository. A lot of work goes into site customisation using gadgets, templates or on-site-CSS. There are brilliant solutions out there, but we do not have a structured way to centrally collect this content or even curate it.
* Make it attractive for professional developers and consultants build their projects on top of MediaWiki. For example by increasing the visibility of highly used extensions on MediaWiki.org, by providing good entry points for technical documentation or by adding automated quality checks to the extensions.
There are already some initiatives pursuing the general goal of fostering an ecosystem, e.g. MediaWiki Stakeholders or the recently announced Enterprise MediaWiki Consortium. Together with the Foundation, they can encourage MediaWiki maintainers to contribute their ideas and code and be part of the MediaWiki world.
Think like a Pirate: How to beat Internet censorship
Universal access to a digital good such as the knowledge curated and made available via Wikimedia projects, presupposes access without censorship.
Censorship and circumvention methods become more advanced over time. Censorship ranges from blocks of single articles to targeting DNS providers to seizing servers to shutting off Internet access completely. Some of these methods are in use right now against Wikimedia projects.
One form of censorship evasion has proven virtually impossible to stamp out: piracy of copyrighted content, in particular music and movies. Let's look at the methods used by the pirates and adapt them for use by Wikimedia content providers and users. We would like our content to be widely shared, available everywhere. Here is what we need to get started:
1. Content must be downloadable and usable off-line.
* Content meant to be used online, that requires contact with an external server, fails this test. Movies and music do not.
2. Content must be partitionable.
* You don't grab all alternative music for 2017, but just the albums from the artists you want. Users will likely not need or want all of the English language Wikipedia (for example) but only subsets.
3. Content must be usable off-line by applications everyone has.
* Movies and music are downloaded in formats that play in apps that come standard with every OS on every platform. Usability must include navigation and search of content.
4. Downloadable content must be easy to find, both before and after censorship.
* You ask Google to find the music or movie you want on YouTube or elsewhere, click and download. Failing that, there is a fallback (see below).
5. Tech-savvy downloaders must be able to seed the distribution of content to everyone else.
* For music or movies, folks who download from private torrent trackers make copies to give to all their friends; six degrees of separation later, we have reached saturation.
6. Content must be popular enough to be widely shared.
* If a group of consumers cannot locate a content source or redistributor, the distribution chain breaks. Poorly seeded torrents are the classic example.
7. People must not rely solely on the original online content source for access.
* If no one has downloaded or mirrored a copy before access to the original content source is blocked, this approach fails. Note that most people will have little incentive to save copies of content for offline use from a reliable site, unless Internet access itself is spotty, or the content bundles for download add value.
In some jurisdictions, it may be dangerous to possess certain content, including that of the Wikimedia projects. This issue is outside the scope of this proposal.
Related topics: https://www.mediawiki.org/wiki/Wikimedia_Apps/Offline_support, http://www.kiwix.org/, http://xowa.org/ and so on
Wikimedia properties need to keep pace with the norms of browsing and information consuming behavior to stay relevant, grow readership, and bring new editors to add their knowledge to repository. We need to support smaller content types - both for contributions and for consumption. At the same time, we need to support multimedia content, from video to interactive graphics to augmented reality. Structured data will allow us to be more flexible in our presentation of information, and create more complex interactions with that information. Video and audio will open the doors to new contributors and new projects.
Content consumers online now, whether among the highly connected or using the internet for the first time, are looking for the right information available to them at the right time. They don't necessarily want long, encyclopedic content, but instead prefer snippets of information served to them just when they need it. And they learn through more immersive experiences - video, augmented reality, interactive graphics - rather than long form text. Even beyond that, huge portions of the world can't access our content for a number of reasons: they don't have internet access, they can't read, their languages don't have keyboard support, there isn't content in their language. The internet as a whole is evolving to meet these changing needs. Messaging apps support walkie-talkie like communication, Google serves just the right answer to any question (in English), and language support for smaller languages is growing cross-platform. Our infrastructure needs to meet these needs. +
H
The future of responsible AI design is auditing systems. When we deploy AIs that make inherently subjective judgments that affect people and their work, we must also provide a means for them to audit and critique the AI. Did the AI mark the wrong thing as vandalism? Then it can silence a contribution. Did the AI fail to note a high quality article? Then we might direct traffic away from good content? Did the AI recommend the wrong type of thing? Then we might keep people in a filter bubble rather than helping them broaden their knowledge.
There's a lot of conversation in the public sphere about how AIs cause ethical and social problems. Google's image search suggests that all CEOs are men. Facebooks feed filter reduces the visibility of conflicting opinions. The general call among researchers and ethicists is for transparency. At Wikimedia, transparency is an old idea. We've always developed our technologies transparently. But this hasn't made us immune from the problems that AIs wreck. Auditing systems are the future. They are a means towards giving users power over the AIs that govern our experience. We should be talking about how to build them. +
Growing and complexity
Our strategy is pointing us towards a bold and inclusive world in terms of projects and people. Almost by definition this will lead to increased complexity, not simply of our technology, but also of how to deliver to and to enable people to make use of our technology.
In the last few years we have spent energy in creating more api's and a more service oriented architecture. An area where we however have not made such major changes is how we design for and work with the front end of the software, which is where the majority of people are actually using all the other stuff we make.
Here we continue to think in larger products and problems to solve, and quite often tend to fail and even clash with our own 'customers'. By taking on a more diverse strategy, we risk being even more vulnerable to this. I have two suggestions:
Smaller engineering. Allowing more time for smaller projects, smaller bugs, smaller tests of ideas and refinement of existing software. Let's embrace the success of Community Wishlists and be closer to our communities by writing more Gadgets or tools (toolforge) when we can, instead of going for 'the big fix'. Have three 1 week tests instead of one 6 month beta. etc. Fix small bugs that annoy many and that make our website feel amateurish, and improve the experience for everyone. Working more often on the needs of smaller projects, giving them a bigger voice and sprucing up our own solutions by gaining a more diverse experience. Be closer to our communities by working nearer to them.
The second point that we should work on, is to stop thinking of our platform as a website. It is a work environment for an increasingly diverse crowd. We have a limited amount of space on the screen and a huge amount of tasks that various people want to do. Gadgets and even more so userscripts are hugely helpful, but have long since become unmanageable.
It is time to think beyond the simple APIs and widget kits. We need to take a step towards becoming an application environment. We need users to be able to install and use complete apps made from recognizable and reusable building blocks. I want to see and use Gadgets as my browser uses extensions. I want those extensions to put apps in recognizable and consistent spots, to allow for fullscreen or splitscreen views, to have a familiar UI, but without having to cram everything into the limited shared space that we have. Apps as gateways for diversifying the specific solutions we build.
When Marshall McLuhan said "The medium is the message", he was saying that how the message is understood is affected by what is used to present that message.
MediaWiki is a fundamental part of the medium used to present Wikimedia's work (the "message"). Because the medium is an integral part of the message, it requires comparable attention to its availability and accessibility.
For example, effort is made to ensure that people in remote areas have access to selected content through Kiwix, but a very limited effort has been made to incorporate their knowledge into the "sum of all knowledge."
While there are efforts underway that include copying edits into Wikipedia by hand, it should be possible to provide people in remote areas with an editable copy of Wikipedia so that their edits could be incorporated with less intervention.
Improvements in the installation and resource consumption of a simple MediaWiki installation could be made without sacrificing the current PHP-based application such that someone could, for example, run a current MediaWiki installation an a un-rooted Android phone. Work could then be done to automate the synchronization of that MediaWiki with the current Wikipedia content.
This work on MediaWiki could, of course, be used by other people who use the tool besides the WMF which could create a virtuous cycle that would benefit the Foundation.
In fact, deeply incorporating McLuhan's thinking into WMF culture would mean that, while Wikipedia would remain the most visible product of the Foundation, there would be more room to focus on expanding MediaWiki's capabilities beyond what fits into the current focus on GLAM efforts, the website, etc.
Most of the world does not use Wikipedia every day, but many people use something they've learned as a result of reading from or contributing to Wikipedia every day. Making it easier for people to deploy MediaWiki where the potential users do not have the resources of the WMF (for example, in a place that doesn't have a stable Internet connection) could encourage more people to actively embrace of Wikimedia's vision of freely sharing knowledge.
Making use of Wikidata's knowledge on Wikimedia projects and beyond
It is foreseeable that the way our content will be consumed is going to change a lot in the next years, as both the demographics of the internet as well as the devices used by our readers are changing. We should try to adapt to this by offering our content in ways best suited for many user scenarios. In order to achieve this, we need to modularize and structure our content so that it can be easily re-interpreted and used in many different ways.
Wikidata gives us the possibility to easily cater for this trend, by providing machine readable data about any subject, which can be formatted and presented in a wide variety of ways and languages. Wikidata makes it possible to more easily maintain up to date information on subjects in many languages, without the burden of manual data maintenance.
We should strengthen this by improving the integration of Wikidata with other Wikimedia projects, providing easy ways to use and profit from Wikidata especially for small communities and by making the power of Wikidata more visible. While all Wikimedia communities will profit from this, it can be especially worthwhile for small communities, that currently don't have the resources for managing data, like Infoboxes, themselves.
Example projects that will help in this area are the 'ArticlePlaceholder', that allows serving Wikidata-data about a certain subject if there's no article about it. Also the plans for automatic Infoboxes derived solely from Wikidata and other means of using Wikidata-data on Wikimedia projects. While both of these projects can have a big community impact, they need to fit in with the current infrastructure. Also they pose certain new scalability and data presentation challenges that need to be addressed.
Furthermore, Wikidata's information should be easy to reuse by third party projects to increase visibility and in order to gain contributions and data donations, making Wikidata the true data hub of the internet. This goal raises longstanding issues with the current Wikimedia dump infrastructure, which is neither very flexible nor does it provide a machine readable interface for data consumers. Also bringing more individual editors and organizations into Wikidata poses various infrastructure and scalability issues coping with the sheer amount of data and changes happening, as well as providing convenient tools for establishing and maintaining data quality.