Property:Statement
From devsummit
P
Title: Content structuration and metadata are keys to fulfil our strategy
Content:
The Wikimedia mouvement strategy is making a focus on serving more different kinds of knowledge and sharing them with allies and partners [1].
I believe that the most important ground work for reaching these goals is to focus on the outgoing project of moving MediaWiki from a "wikitext plus media file" collaboration system to a platform allowing people to be able to collaborate on many kind of contents and to organise them in a cohesive way.
Two axes seem important to me to pursue this goal:
1. Support a broader set of different contents, not just wikitext, Wikibase items and Lua/JavaScript/CSS contents but also images, sounds, movies, books, etc., that should bee editable just like wiki text pages in order to allow people to improve them in a collaborative way.
2. Build platforms and tools allowing contributors to create and clean metadata about these contents in order to build together the broadest cohesive set of knowledge ever available and increase its reusability.
Going in these directions would allow us to:
* Allow sister projects (and possible new ones) to use relevant content structure for their projects instead of a designed-for-everything wiki markup. It should lead to an increase of their reusability and their user-friendliness, just like what the Structured Metadata on Commons project is aiming for.
* Build powerful APIs to retrieve and edit content just like Wikidata has and so, make working with partners easier.
* Increase the connections between our contents and their discovery using their metadata
* Build better tasked-focused mobile viewing and edit interfaces
* Be more ready for the possible new environment changes like voice-powered interfaces
Some examples of projects we could work on in order to move in this direction:
* Use the multiple content revision facility to migrate progressively the data that could be structured out of Wikitext on all our projects (like the structured metadata on Commons project is aiming for files)
* Federate all our structured content into a "Wikimedia Query Service" that would allow to do unified and powerful analytics and to ease the discoverability of all our contents
* The logical granularity of Wikisource, Wikibooks and Wikiversity contents (and maybe other projects?) is not the wiki page but the set of wiki pages storing a book or a course. MediaWiki should be able to support such use cases by providing a "collection" system allowing us to add metadata and to do operations (renaming, add to watchlist) on sets of wiki pages.
* Switch projects that stores fairly structured data in wiki text templates (like Wiktionary or Wikispecies) from a Wikitext storage to a structured one. Build on top of them user interfaces to edit their local contents (and maybe also the relevant data from Wikidata) and provide nice displays and APIs to make humans and machine both able to retrieve these contents.
* ...
[1] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction#We_will_advance_our_world_through_knowledge.
Breaking Down Barriers To Cross-Project Collaboration
"... a world in which every single human being can freely share in the sum of all knowledge."
Wikipedia has been our flagship for many years now and our main means of achieving our vision. As information consumption and expectations of our readers are changing, Wikipedia needs to adapt. One crucial building block for this adaption is re-using and integrating more content from the other Wikimedia projects and other language versions of Wikipedia. Connecting our projects more is vital for helping especially our smaller communities serve their readers. Surfacing more content from the other Wikimedia projects also gives them a chance to shine, find their audience and do their part in sharing in the sum of all knowledge.
This integration comes with a lot of challenges. Over many years the Wikipedias have lived largely independent from each other and the other Wikimedia projects. This is changing. Sharing and benefiting for example from data on Wikidata means collaborating with people from potentially very different projects, speaking different languages. It brings a perceived loss of local control. Editors see them-self first and foremost as editors of "their" Wikipedia at the moment and often don't perceive this integration as worth the effort - especially on the larger projects.
We need to address this on both the social and technical level if we want to bring our projects closer together and have them benefit from each other's strength and compensate their weaknesses.
We need to think about and find answers to these questions: What can we do in order to bring our projects together more closely? How can we help break down perceived and real barriers for cross-project work? What can we do to make cross-project collaboration easier? +
R
Security is important. Although Wikimedia/MediaWiki has a generally good track record, we should always be striving for better, ideally, in an automated fashion via testing of the code, and providing a decent and easy to use framework inside MediaWiki to allow people to do this without causing them excess work or effort. In some cases, we can do this through documentation; we have some options that allow for complex things to be done, but it's not very clear to developers that what they are doing may lead to security issues down the road.
In the same way we run phpcs, and are moving towards running phan, it would be very nice to improve our automated testing with a security focus. Helping point people to potential pain points in the future, or just general bad practices now.
As Tim Starling said a few years ago, people shouldn't be doing CR for code style etc. It doesn't make sense, it's a waste of time. This should be automated away as much as possible. Which is happening, slowly improving the MediaWiki codebase.
How can we use this basic idea, and improve the MediaWiki software and it's extensions for security best practices? +
S
Breaking the ice and catering to the could be Student Wikipedia contributors
Most of the valuable contributors especially in technically advanced articles comes from people in academia. So my paper is going to discuss why it can be valuable to expose University students about contributing to Wikipedia and give enough guidance for them to stick around, existing infrastructure that helps this cause and some points on how this can improve. As the infrastructure of mediawiki evolves and becomes a platform where beginners to Open Source find it easy to contribute to the project with a really well documented code base, a friendly community and the many outreach programs we should also think about introducing to the University Students about contributing to WIkipedia.
Wikipedia serves as an invaluable tool for students worldwide helping them to assimilate their course content. They write term papers as a part of their course work so it only makes sense that giving an awareness to students about contributing wikipedia and giving them guidance can be a source of reliable and high quality contributions to Wikipedia.
Existing Infrastructure
WikiEduDashboard which is a project of the WikiEducation Foundation caters to universities where students are required to contribute to Wikipedia articles as part of their course assessment and provides tools for the instructors to guide the students in it.
Machine Learning tools for Guidance
There are existing automatic mechanisms in wikipedia to find out plagiarism/promotional content or any form of spam in the edits. Automatically rating wikipedia articles is to an extent achieved by the Scoring Platform Team. This is being utilised by many bots in wikipedia to spot potential vandalism. This score prediction tool can also be used to give some immediate feedback to the newbie editors in a more friendly manner and point out to the faux paus in their edits. +
Machine learning and scaling the knowledge
There are lots of ways to contribute to Wikimedia movement and all are tedious and time-consuming, and sometimes frustrating. Let's use the classic example of fighting vandalism, Wikimedians are frustrated by flow of vandalism but at the same time they enjoy fighting it. By using scalable machine learning platform for Wikis we are moving towards giving more power to our users without making them tired of the work needed. To come back to our example, ORES is filtering out most good edits, leaving the rest to the community to take care of (so they enjoy the gratification of doing the job) at the same time handling the huge backlog of edits needing review. But if we use a bot to revert bad edits, it damages the motivation of the users. We need to continue moving at this direction in other areas like creating articles, improving quality, categorization, and so on. +
PROBLEM
To satisfy the 'Knowledge as a service' theme, in addition to providing
access to full page content, Wikimedia APIs should provide access to
semantic units at:
- an abstract document level (sections, headings, tables, etc.)
- a domain specific level (infoboxes, geolocation, taxoboxes, etc.)
Wikitext, the core content creation technology on wikis, evolved as a
string-processing language where one set of strings is replaced with
another set of strings mostly via regular expression matches to yield
the output HTML string. There is no notion of document structures here.
This lack of structural semantics gets in the way of being able to robustly
identify semantic units and developing tools and features that operate on
a page structurally at sub-page granularities.
SOLUTION: TRANSPARENT TYPING LAYER OVER WIKITEXT
Types improve abstraction, reasoning, and tooling abilities in programming
languages. A transparent typing layer on top of wikitext can provide similar
benefits.
A: ENFORCE STRUCTURAL TYPES ON OUTPUT OF WIKITEXT CONSTRUCTS INCLUDING
TEMPLATES AND EXTENSIONS
- Specify that all wikitext constructs have an output with type:
String, DOM, CSS property, HTML-attribute, or a List of one of those
- Extensions and templates can specify the expected output type. All other
core wikitext constructs have the DOM output type.
- Parser enforces the output type of all wikitext constructs. Examples:
For DOM types, unclosed tags and misnested tags are fixed up.
For String types, HTML tags are escaped, wikitext strings are nowikied.
For CSS types, values are sanitized.
Among other benefits, this basic typing mechanism enables MediaWiki to
provide an API to extract and edit document fragments without introducing
adverse side-effects on the rest of the page.
B: UNIFIED TYPING MECHANISM TO EXPOSE DOMAIN-SPECIFIC SEMANTIC INFORMATION
Editors impose structure in documents through a rich library of templates,
policies, and maintainance processes they have developed over the years.
If this semantic information (infoboxes, navboxes, sports rankings,
railway timetables, etc) is mapped to a centralized ontology system (wikidata,
schema.org, something else), the parser can expose this information in HTML
and via MediaWiki APIs can expose this information in a wiki-neutral way.
There are multiple disparate mechanisms today wherein template authors specify
metadata about templates (templatadata, templatestyles, possibly others?)
Instead of creating newer mechanisms for specifying structural output types
and semantic information types for templates, it is better to provide
a consolidated mechanism that unifies all this template metadata into a single
user-defined type declaration. This lets newer applications and capabilities
to be developed in the future without code changes to the core mechanism.
FEASIBILITY
This typing layer only affects template authors. Editors that use source editing
won't see any impact (besides fewer markup errors). Editors that use visual editing
might see improved tooling. Even for template authors, this is meant to be an
opt-in mechanism with gradual migration over to the new model.
The proposal here is a logical extension of what Parsoid does today. Parsoid
provides an illusion of structured wikitext and demonstrates what is
possible (VE, CX, Linter, Flow among others) by embracing structured semantics.
The Wikimedia Foundation is a leader in many fields, but none as so obvious and otherwise so underserved anywhere else than that of language and accessibility. We are not just the fifth biggest site online, or one of the biggest open source endeavors available, we are the de facto leaders of technology that other commercial companies consider 'edge case' and 'less profitable'. This gives us an advantage of developing tools that don't just help our own audience, but could - and should - serve as a repository for allowing everyone online to reach, support, and embrace these audiences with minimal effort.
We have many of the tools available already, for our own users and products, but they are still limited when it comes to sharing and using them outside the movement. And why? Developing our tools to be accessible to outside projects - and to cloud tools, to bots and to other Open Source organizations - is a doable task that is not just worthy in general, it also follows our mission.
What better way to empower 'every single human being [to] freely share in the sum of all knowledge' than to share our own powerful tools with others to allow everyone to prioritize support for language, accessibility and right-to-left technologies and push these relevant technology forward?
I suggest we look across our technologies and libraries - from OOjs UI to CSSJanus, ResourceLoader to wfMessage(), and many others - and work to better generalize these to serve our own users better in their projects, bots, and cloud tools - and to place ourselves firmly and officially as the leaders of this technology that we already are.
Now that my position is known, my direction is unknowable - Heisenberg Uncertainty Principle.
So let's break reality, and figure out both. +
Title
Developing software in a wiki way
Background
Over the last 15 years, MediaWiki evolved from a simple PHP script to a complex and highly integrated family of products and services, serving knowledge to billions of humans. Every change might cause an instability or a complete failure of the system. Thus, measures including code review, automated unit testing and code/ product ownership, have been established to guarantee the stability of the software. The drawback of this approach is that improving the software became very challenging for volunteer contributors. This proposal seeks to lower the barriers for volunteer contributors while maintaining the stability of the system.
Advice
(1) Reduce the effort of code review by applying Artificial Intelligence methods. Thus, reviewers can focus on non-formal comments.
(2) Develop a dialog platform that ensures that volunteer contributors are aware of the next steps and the roadmap for their change on the way to production.
(3) Establish a team that supports volunteer developers, who want to make a difference that is not listed in the annual plan by providing temporary code or product ownership.
(4) Improve testing and evaluation to measure the effect of every single change and to identify code or even whole services that are no longer neccary and can be switched off. +
I'm highly interested in having a deep discussion about our technical debt. I've been active in operating on this front myself and I really care how we fare in this aspect, however I see a lack of consensus on tech debt from the development community at large.
We deprecate things and then continue using them. People get irritated when their extensions break due to slightest core changes, even when the extensions themselves are misusing the core interfaces. We can't really run lots of types of static analysis against our code base because the sheer amount of problems detected would make the signal to noise ratio unacceptable. Developing skins for MediaWiki is incredibly painful. Our tests access database a lot, as a result they're slow and fragile. These are just a few examples of pain points haunting our code base and extracting their daily toll from everyone working on it.
I would like to have Tech Debt SIG work in person on addressing these issues. We should define code quality metrics, identify problem areas and create some actionables to address them. We should also discuss approaches to handling this without causing too much discontent from broader developer community.
I believe this would be an important step towards making MediaWiki a better ecosystem and improve our development pace. +
I believe that the technicaly community should strive to collect and effectively disseminate technical knowledge as per the Wikimedia missions attement.
Ability to grow out technical community can be compared with ones own ability to gain knowledge in technical spaces within the Wikimedia movement.
Currently there are many barriers to entry that have been surfaced year after year with some but little movement forward past them.
To scale and ready the community we should push forward and enable the use of emerging trends in technology, such as knowledge retaining Q&A platforms.
There are many other organizations and softwares that do this much better than Wikimedia we should learn from them.
Looking at Q&A platforms specifically, talk pages have never really been a good place to ask questions and retain knowledge in a searchable way for use in the future.
Stackoverfolw, as an example, has proven to be an invaluable resource for people in technical spaces and we can learn from that.
MediaWiki is an amazing piece of software, but we should not feel 'boxed in' by it.
The Wikimedia foundation is not the MediaWiki foundation, MediaWiki does not have to always just be a wiki page.
Our commitment to Open Source is often something that slows down many actions within the movement, however this is not something that should change as it is integral to what Wikimedia stands for at the core.
We should embrace our Open Source commitments and reach out to and engage with organizations using our software more.
Wikimedia Germany does this outreach specifically with the Wikibase extension, looking for other users and engaging them to discover how they are using it, why, and how it can be better.
The Wikibase extension also specifically the Wikibase Query Service shows us that not everything has to be a wiki page, as the query service disseminates knowledge under a free licence effectively.
I hope that the summit will agree that entry to our technical space, and increasing knowledge persistence within our technical space needs some thought and work, and that we should stay committed to Mediawiki as a software and platform, but that it can look, feel and act different while Wikimedia stays true to its mission.
tl;dr: Empower more Wikipedians [1] in using Lua, incl. invoking data from Wikidata, and data from the Data namespace on Commons.
In the past 15 years it was relatively easy for every Wikipedian to create templates, and with the ParserFunctions to write some kind of simple program code.
In 2013 we got Lua as real and powerful programming language. Since a short time it is possible to invoke data from Wikidata and a very short time to store/read data in/from the Data namespace on Wikimedia Commons.
These are all good improvements and increases the possibilities in adding valuable data/information to the (often) high quality content of the German Wikipedia. But these techniques increases the requirements in technical knowledge to the Wikipedians. In other words: Substantial less Wikipedians are able to write Lua module than creating templates.
As an active community member of the German Wikipedia and Wikimedia Commons (and Wikidata) I see a real bottleneck in the situation, that we do not have enough Wikipedians with skills in Lua and the possibitiy how to invoke data from Wikidata & Co.
Checking the Module namespace via RecentChanges for September 2017 on the German Wikipedia: Only 13 Wikipedians contributed to Lua code. Often I read on the Village Pump and related pages: "Sorry, but my to-do-list as volunteer is full until end of the year, I do not have the capacity to solve your problem." and so on. This is very frustrating for all sides.
Other cool [3] projects, like integration of Maps via Kartographer, stucks because the German Wikipedia has not enough man power for these works
As a result of this notice we have to create some help to empower more Wikipedians in using Lua, incl. invoking data from Wikidata, and data from the Data namespace on Commons.
In Germany, Austria, and Switzerland it probably easier than in other countries because we can offer real-live workshops. In 2018 the team of the "Wikipedia:Lokal K" [2], a real-live supporting base for Wikipedia & Co, is planning, with the support of WMDE, some workshops for Wikidata. Same could be done for supporting Lua & Co.
As originator of the "Technical Wishlist" [4] I am thinking about adding support for Lua modules & Co to the wishlist.
On the summit I would like to discuss these and other solutions for this bottleneck.
[1] to be read as users of all sister projects
[2] https://de.wikipedia.org/wiki/Wikipedia:Lokal_K
[3] My POV as active OpenStreetMap community member
[4] https://meta.wikimedia.org/wiki/Community_Tech/Community_Wishlist_Survey_description
Originated by me in September 2013: https://de.wikipedia.org/wiki/Wikipedia:Technische_W%C3%BCnsche/%C3%9Cber_das_Projekt
A key question for me is how we can maximise the richness of our feature offering despite having entered a period of slow growth in revenue and staff numbers. Wikimedia serves a very large number of users, with a diverse set of needs -- nobody can say that the site as it stands is sufficient to satisfy all of them.
There are two main threats to our goal of providing a rich feature set. One is maintenance burden. We are faced with the prospect of sunsetting features because we find the maintenance burden to be too great. But there is no incontrovertible rule in software engineering which says that code, once written, must constantly be rewritten. Maintenance burden most commonly arises from changes in the platform on which the code is implemented. Minimising maintenance burden for a given feature set thus necessitates choosing a stable platform. We need to consider the programming languages we use, and the libraries we require, through this lens.
The second threat is needless complexity. Concepts which are hard to understand, and which thus restrict related development to highly skilled developers, are appropriate only if hidden behind a module boundary. In order to enable contributions from developers less skilled than ourselves, and to minimise the time required for learning and familiarisation, the bulk of our code should be simple. Complexity is alluring because it provides developers the opportunity to take pride in their work. But for the benefit of the organisation as whole, its efficiency, and thus the richness of its product offering, we should introduce complexity only with due caution.
Code which is complex but stable can be valuable, presenting no great risks. For example, the diff algorithm we currently use in wikidiff2 has its origin in Perl code written in approximately 1998. Only in the last year have we considered adding substantial features to it. We have a PHP port and a C++ port, and neither requires significant maintenance. This is because the requirements are stable and the two respective platforms (C++ and PHP) are stable.
Contrast this to OCG, which is at risk of sunsetting only three years after its original deployment. The reason is that its input and output formats are constantly changing, that is, it has changing requirements; and it was written on a modern and rapidly changing platform. Its main developer wrote "the architecture which was state-of-the-art in 2014 is already looking a little dated in 2016".
My goals for the developer summit are to encourage people to think carefully about writing code on top of a conceptually complex, rapidly changing platform. I want WMF and the MediaWiki community to write code which is stable and long-lasting, and can thus support a richly featured website into the future.
T
Mediawiki is one of the rare software system where the i18n is done right. This infrastructure need timely improvement and maintenance. The technology and resources for supporting that technology is important as 2017 movement strategy states: 'We will build the technical infrastructures that enable us to collect free knowledge in all forms and languages.'. But most of these infrastructure is running under volunteer capacity and no official team responsible.
1. Opensource strategy - Mediawiki language technology is isolated and a less known in general opensource ecosystem. There is a need to have proper ownership, maintenance and feature enhancements as good open source project, so that our contributions comes from other multi lingual projects, while we help with our expertise.
(a) Our localization file formats and the libraries on top of them are very advanced and supports languages more than any other system. But it was when the libraries made mediawiki independent, other projects started noticing it. We use that independent library(jquery.i18n) for VE, ULS, OOJS-UI and present in mediawiki core. But it is not actively maintained, issues and pull requests not addressed because there is nobody in foundation now in charge of it, except volunteer time. There is a lot of demand for its non jquery, general purpose js library. Code is aged, tech debt is increasing.
(b) We developed one of the largest repository of input methods(100+ languages) to support inputting in various languages and an input method library. This is a critical piece of software for many small wikis - jquery.ime. The code is aged, not actively maintained by anybody from foundation, except some in their volunteer time. Not updated to take advantage of browser technology updates about IMEs. This is a mediawiki independent open source library.
(c) Universal Language selector - a language selection, switching mechanism for our large list of languages, also delivering input methods, fonts, need ownership and tech debt removal. Navigating between different wikis is done using this and now the team authored this system does not exist.This is a also mediawiki independent open source library. VE, Translate, ContentTranslation, Wikidata depends on this library.
(d) Mediawiki core i18n features(php) are also started showing its age. There were plans to make some of them as standalone opensource libraries. Not happened. Nobody officially responsible for this infrastructure too.
2. The Translate extension - helping to have mediawiki interface available in 300 languages - something that we always proud of - is not officially maintained by foundation now. The localization happens because of volunteers and volunteer maintaining Translate extension code. Moreover, the translatewiki.net, which hosts the Translate where localization happens by volunteers also outside foundation infrastructure.
3. It is time to have machine translation infrastructure within wikimedia. Content translation used machine translation - but that is an isolated product. Translation of content can be used in various contexts for readers. CX tries to provide a service api for MT, There are lot of potential for that. Multiple MT services, even proprietary services might be needed to cover all the languages. At the same time, our content and translations are important for training new opensource MT engines.
4. Wikipedia follows very traditional approach for typography and layout. Language team had limited webfont delivery to aim missing font issue, but too old code not got any updates in last 3 years or so. Language team plans to abandon that feature due to maintenance burden, but not happened and no team now owns it. Other than this a few wikis does common.css hacks to have customization of default fonts. Typography refresh attempt from reading team was for Latin. Every script has its own characteristics about font size, preferred font family sequences, line heights etc. Presenting knowledge in all these language wikis, in 2017 or later need serious thoughts about readability, typography and general aesthetic of wikipedia in a language compared to other websites in that language.
* Embrace open-source and keep our software to the same standards we hold other open-source software. This would prevent our software from becoming isolated, hard to maintain, or hard to contribute to.
* Scale the contributor experience. Ensure our content remains of high quality and value to readers; ultimately to avoid failing our mission. I envision this requires a radical shift in how our application is served, by involving a non-static service capable of scaling to the traffic of our CDN and yet vary responses by user.
Open-source:
We must understand the dangers of producing software that isn't reusable. Such software may be hard to maintain, hard to contribute, both for future contributions, and our future selves.
"Current needs" only exist to serve our long-term needs. Losing track of long-term needs can make software too specific to a current need, risking a trend of releasing software that is only open-source as a courtesy, for transparency, and without being re-usable.
Reusable software has a defined purpose and serves it well. It tends be easy to install, well-documented, and easy to contribute to. Re-use between different services, as well as externally. Such as for community tools, cloud services, or other third parties. Having a defined goal also encourages designing APIs in a way that we can agree not to break or change too often, because they are public.
Scaling:
Our current infrastructure is highly optimised for the passive reader that doesn't contribute. We serve a static CDN response to most users. For users having logged-in, or made contributions, we bypass these layers for all page loads. As a result, their document load time increases by 5x-10x (eg. NavTiming metric responseStart).
In 2015, Ori mentioned the danger of this in (<https://blog.wikimedia.org/2014/12/29/how-we-made-editing-wikipedia-twice-as-fast/>), saying optimising our backend will "allow us to dissolve the invisible distinction between passive and active users". And "enable microcontribution [features] that draw [in] passive readers".
Banners (CentralNotice) are a good example of our needs being at odds with our infrastructure. We want banners to show as part of the page, and for banners to vary by user, location, plus random variance. Our current infrastructure could only do so by bypassing the CDN on all requests. As such, the current way is entirely client-side and completes well after page load.
In few cases where we do ask readers for data, it is for statistical purposes or to improve the software. Direct (or indirect) contributions to our content remains limited to complex actions like "edit". Moving our contributors experience to match some of the capabilities and performance of the reader experience, would enable us to start accepting micro contributions that actually produce a change in content (either directly, or e.g. by consensus). It also opens the door to making our web platform work offline (e.g. ServiceWorkers) which further enables high-performant interactions that can be uploaded at a later time.
We need a wikiproject incubator to enable innovation
We need infrastructure for cheap creation of new wikis and experimentation with new project types and their different technical and social needs.
Wikipedia was an unlikely bet that paid off extremely well. It does not cover the entirety of human knowledge however - it's limited to long-form, text-based representations of encyclopedic topics on subjects well covered by reliable sources. There have been recurring discussions in the past about covering new types of knowledge (genealogy, fact checking, 3D models, oral history, collaborative video etc. etc.) but they never went anywhere, because creating new projects is a lot of work and the outcome is unpredictable (some things only work in practice), so the WMF was unwilling to take the risk. If we want to make meaningful progress on our vision of curating the sum of *all* knowledge, and truly become the infrastructure of free knowledge, we need to overcome this barrier.
We need infrastructure for low-effort, low-risk experimentation with new projects - a "wiki nursery". A system where new wikis can be created at minimal cost, technical and social experiments can be performed on them without undue risk to other projects, and they can be discarded if they prove unsuccessful. Other organizations have used such mechanisms with great success (Wikia has tens of thousands of wikis; Stack Exchange has nearly two hundred sites).
Such a system has to be somewhat separate from the existing wiki cluster; a closely integrated approach like incubator.wikimedia.org is both too inflexible and too insecure. It needs to have some level of operational and legal maturity (more than what Cloud VPS offers). For risk segregation and eficiency, it would have to be different from our production cluster in a number of ways:
* single sign-on which does not rely on the local wiki for authentication
* no shared database and configuration acces
* ensure even distribution of server resources without constant human supervision
* flat domain structure (to avoid cookie leaks) with affordable certificate management
* new management interfaces and corresponding core support to avoid spending developer time on common tasks (wiki creation and destruction, configuration and extension management)
While there is a significant one-time cost to setting up such a system, keeping it running is cheap, and the benefits are well worth it - information on what content types and what policies enable productive collaboration, a space free of the conservativism and change aversion of large established projects where new concepts can prove themselves, an entry point for projects with a less western-centric interpretation of knowledge, lots of small projects which are welcoming to new editors and provide lots of low-hanging fruit for contribution. (Also, the same feature set is needed for paid wiki hosting service, which is one way to create non-WMF income streams into the MediaWiki software ecosystem - a valid strategic goal on its own.)
V
Infrastructure for Open: Safe code sharing in the Wikiverse
Wikipedia has always been a place where people build things, starting with MediaWiki itself... Talk pages were created out of formatting conventions manually followed. Templates and Lua modules grew out of users' need to automate common markup & text blocks. Gadgets came about to let users add new capabilities to their experience.
To scale our users' ability to work, we need to build modern infrastructure and APIs for on-wiki code: templates, gadgets, and custom workflows.
First, gadgets and templates need to be maintainable and sharable in a centralized place; copy-pasting doesn't scale. Integrate with "real developer" tools like git, so complex tools can be edited and archived off-wiki.
Second, they need to be safer and more future-proof. Template & module output is in wikitext, a fragile format; consider separating sanitized "true" templates from the data sources.
JS gadgets can access internal or deprecated APIs that may break ... or hijack a session as malware! We should create narrower APIs and run the gadgets in isolated JS contexts to provide fault isolation -- this would also enable using them in different contexts such as mobile apps, by implementing the same interfaces.
Third, we need to make content "smarter" by giving it the ability to run interactive scripts safely -- a mix of what templates/modules and what gadgets can do. This can be used to make animated widgets for article pages, but more importantly could be used to implement discussion & editing workflows to supplement what you can do with just a talk page and a set of conventions. At a minimum, think of what people do with Google Forms to guide input, and let folks do that on-wiki.
On-wiki tool-building is a "force multiplier" that lets people get more done by organizing themselves. Providing better tools for tool builders will lead to happier, more productive users working for our mission. +
We are in the business of democratizing knowledge, and I believe that lowering and removing technical barriers to entry, and creating a culture of inclusion in our technical spaces is essential to our success.'
The Knowledge as a Service aspect of our strategic directions focuses on building infrastructure and platforms that help create and share open knowledge. The key to successfully building and scaling such infrastructure, in the context of the Wikimedia Technical spaces, is enabling everyone, irrespective of their experience or backgrounds to be able to utilize and create research, data, and tools on top of our infrastructure. When designing infrastructure and other technical products, we often fail to take into account technical barriers, inessential complexities and social costs that can discourage or prevent people from being able to leverage them. For instance, is it enough to build a dataset and store it in a database, if we do not provide friendly ways for researchers to access and analyze this data? Is it enough to put out a call to contribute to a project, but not provide easy-to-setup development environments to be able to test changes? Is it sufficient to have a state of the art environment to host applications, but not design good, simple processes around gaining access and deploying to them?
These conversations are crucial, because we are not building products for technology's sake, but are in fact trying to build a culture where it is easy to use and contribute to our technical projects, whether you are a volunteer who has a few hours to spare or a paid employee; a newcomer or a long time contributor. We also want our technical communities to be diverse, and these complex systems and processes, and unsaid social constructs around how to interact with our projects, often bias against traditionally underrepresented populations in technology.
I have always worked on or pushed for creating and supporting simple graphical interfaces that provide unified access to data sources, building platforms and processes that lets people just create tools/APIs/dashboards and be able to painlessly host them, developing tutorials and good documentation for getting involved in our projects, and codifying friendly and inclusive social norms and promoting a culture of being excellent to each other in our technical spaces.
When talking about the future directions of new and existing projects, we should take into account the costs and barriers to access, and who we may be failing to include as a result. I hope to be this voice in the Developer Summit.
Developer resources & collaboration on Wikimedia User Interface Style Guide
Presenting the Wikimedia User Interface Style Guide targeted on developers needs. The style guide is including all its resources and is hosted on GitHub as a test case for design collaboration.
In the ideation of the new style guide, it was emphasized for it to be a successful resource, it has to address both, designers and developers needs.
So far a big part of the overarching and visual style principles alongside one development 'base' layer - WikimediaUI Base variables, already being used in OOUI and Marvin as building block - has been accomplished.
Within the next couple of weeks, there will also be a 'Components' and 'Resources' section where developers are presented design principles 'in action', combined with demos hot-linking. The presentation would be on principles behind and application of the components:
Ideation and design
This is for everyone, internationalization/language
Open to collaboration
Design consistency
Trustworthy yet joyful
Usability and UX best practices and underlying user research,
Responsive design and mobile first principles and
Accessibility measurements
There are clearly topics overlapping with the questions posed on the thematic overview like maintaining and growing technical community, role of open source, scaling, tools for embracing mobility.
At the end of the presentation there is an open questions/feedback slot planned getting ideas on how to extend/improve the style guide even more. +
W
What technologies are necessary for embracing mobility?
How and with whom should we partner to create the technologies needed to support the mission?
Multilateral, asynchronous, bidirectional synchronization of wikis: How astronauts taught the world to wiki on the go
The world is certainly transitioning their internet usage from the desk to their mobile device. Let's not limit our focus on mobile devices that always have an internet connection. Let's talk about the millions of travelers stuck in a moving vehicle with nothing better to do than look out their window. I'm talking about passengers on planes, trains, and automobiles. The easiest target here are the millions of people who fly. As a passenger onboard a plane for hours, we're lucky if we have an in-flight movie system. But what if the plane offered a local intranet with a copy of Wikipedia? What if the airlines gave promotions to those who contributed? What if each flight competed with other flights for most contributions? The same approach could be applied for passenger trains, buses, subways, and ferries.
The main limitation here is a technical one. If you have thousands of Wikipedia clones buzzing around, each collecting contributions during their offline time, how do you reconcile the changes with the master database? While tools like Kiwix already offer an offline copy of Wikipedia, there is much work needed to support thousands of wiki clones reconciling changes every few hours. This will require revolutionary branch management and revision conflict handling.
But if you pull this off, it might kick off the biggest surge in user participation in years.
With whom should you partner to accomplish this? Why not start with NASA? They use MediaWiki to train astronauts and plan for spacewalks. Begin this development by running wiki servers onboard the International Space Station. Get astronauts to contribute to the same wiki used for their training while they are putting all that knowledge to use. Once the NASA wiki synchronization between the ground and the ISS is working, expand this model to Wikipedia. Yes, have a clone of Wikipedia onboard the ISS. Astronauts love to share their experience, their story, and their photos from their 6-month stays aboard the station. These lucky few represent countries from around the world and they have a huge influence on the rest of us on the ground. Once people see astronauts contributing to Wikipedia during their journey, they will want to join the movement on their travels (albeit aboard slightly less cool vehicles).
tldr: Encourage use of MediaWiki outside WMF projects,
because does so furthers the mission (and improves the software).
MediaWiki is primarily a tool that helps the Wikimedia movement. The movement is not primarily about MediaWiki, but MediaWiki is the central tool with which we are currently fulfilling our mission. Lots of other people use MediaWiki too, but their needs are not the focus of WMF development. I think we should do more to encourage 3rd party use of MediaWiki, and in doing so broaden the developer community and end up with higher-quality software that is easier to use for more people.
Our mission is about empowerment and education, and people running their own wikis should be seen as part of that. Just because content isn't hosted on WMF servers doesn't mean it's not part of the movement. I imagine a future in which MediaWiki is as common for collaborative websites as WordPress is for blogs, and people don't have to rely on Facebook, YouTube, GitHub, etc. to host their content.
A couple of parallels (sort of):
* OpenStreetMap hosts the central database of their map, but they actively discourage people doing anything other than editing on the OSM infrastructure. Instead, there is a large ecosystem of tools and systems for serving that data. This is mainly because it would be impossible for OSM to provide the bandwidth and required formats etc. for all the possible uses - similarly, the set of Wikimedia sister projects are never going to provide every wiki that people want.
* Automattic runs wordpress.com and also manages the opensource development of WordPress; the latter seemingly in conflict with the former, but because of the long history (and relative late-starting of wordpress.com perhaps) the software has remained a favorite of self-hosted websites. (I'm ignoring the obvious security arguments for a now.)
One reason that MediaWiki is not better for 3rd party users could be that there are not all that many non-WMF people making a living out of developing for it, at least not in comparison with other web frameworks (Drupal, WordPress, etc.).
WordPress had to be easy to install on cheap web hosts because that's all there was; MediaWiki seems these days to only have that characteristic as a historical hangover, and it could well end up moving away from being aimed at amateur sysadmins all together.
Perhaps there's a fundamental clash between the scale requirements of the WMF sites vs. the ease-of-administration requirements of 3rd party wikis - but if there's a choice to be made, it should be explicit and well-communicated. At the moment, it feels like it's happening by attrition.