Advancing the Contributor Experience

Related Phabricator Task T183317
Topic Leaders Daren Welsh, Derk-Jan Hartman

13 primary statements. 4 secondary statements.


  • Accessibility
  • Alternative Interfaces
  • API
  • Architecture
  • Censorship
  • Collaboration
  • Complexity
  • Contributors
  • Data Center
  • Discussions
  • Documentation
  • Editing
  • Gadgets
  • Infrastructure
  • Knowledge as a Service
  • Knowledge Equity
  • Languages
  • Machine Learning
  • Machine Translation
  • Mobile
  • Multimedia
  • New Users
  • Offline Editing
  • Real-time Collaboration
  • Schema.org
  • Social
  • Strategy
  • Structural Semantics
  • Structured Data
  • Style Guide
  • Synchronization
  • Templates
  • Third Parties
  • Tools
  • Translation
  • User Experience
  • Wikibase
  • Wikidata

Primary

Author | Tags | Primary Session | Secondary Sessions | Position Statement
David Chan | Contributors, Editing, Real-time Collaboration, Social | Advancing the Contributor Experience

Embracing real-time collaboration

Real-time collaboration (like Google Docs and Etherpad) has many benefits but also imposes certain workflow requirements. There is prototype code that can enable real-time collaboration within VisualEditor, but rolling out collaborative editing requires more than technical work. It will require a coordinated effort to re-imagine what editing is like. We will need mechanisms to create user groups, real-time chat, ways to temporarily persist collaborative sessions, and perhaps even new core mechanisms for describing revisions. We also need to think about social mechanisms for preventing harassment and vandalism of collaborative sessions. In exchange, we will gain improved mechanisms for mentoring, translating long articles, reporting on current events, and assisting non-native speakers.
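To make these "new core mechanisms" concrete, here is a minimal sketch, in TypeScript, of what a temporarily persisted collaborative session record might look like. Every type and field name below is hypothetical; none of this is an existing MediaWiki schema.

  // Hypothetical data model for a temporarily persisted real-time editing
  // session. These types do not exist in MediaWiki today; they only
  // illustrate the kinds of records the statement says would be needed.

  interface SessionParticipant {
    userId: number;        // registered user, or a temporary id for anonymous editors
    displayName: string;
    joinedAt: Date;
    canPublish: boolean;   // relates to "who decides when to publish"
  }

  interface CollaborativeSession {
    sessionId: string;
    pageTitle: string;
    baseRevisionId: number;      // the revision the session started from
    participants: SessionParticipant[];
    chatChannelId?: string;      // real-time chat attached to the session
    expiresAt: Date;             // sessions are only persisted temporarily
  }

  // A published result would need to credit every participant, which is why
  // new core mechanisms for describing revisions might be needed.
  interface PublishedSessionRevision {
    newRevisionId: number;
    sessionId: string;
    authorUserIds: number[];     // all contributors, for attribution logging
  }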

We should embrace this opportunity to reimagine our platform, starting by organizing a number of trials to gain insights into whether, or how best, a real-time collaborative editing option would benefit our projects.

Sessions in previous Wikimanias / Hackathons / Developer summits identified potential uses, including for:

- Mentoring
- Translating long articles
- Current events
- Assisting non-native speakers

Potential issues identified include:

- How to log authorship
- Who decides when to publish
- Preventing in-session abuse
- Coexisting with non-real-time editors

References

https://wikimania2017.wikimedia.org/wiki/Submissions/Waiting_for_Real-Time_Collaboration (Wikimania 2017 panel discussion on real-time collaboration)

https://phabricator.wikimedia.org/T165941 (Wikimedia Hackathon 2017 showcase including live demo of VisualEditor real-time collaboration)

Benoît Evellin | Discussions, Mobile, New Users, Strategy, User Experience | Advancing the Contributor Experience

How to build a discussion system that would ease user interactions and content creation on the wikis?

I believe that structured discussions are a must-have for MediaWiki. Building such a system will reduce the communication gap on the wikis, ease newcomers' first steps, empower all users, and allow powerful interactive tools to be built. It will also greatly increase the adoption of MediaWiki as a knowledge-creation system. The MediaWiki community has a strategic priority decision to make on this topic.

The Wikimedia communities and organizations, through MediaWiki, want to give everyone a way to create (free) knowledge collaboratively, for all users from everywhere. Imagine doing that without a powerful discussion tool that can handle international interactions, scale, and keep everyone aware of the ongoing work. MediaWiki-powered experiences have proven that it is not possible.

Unstructured messages are based on a blank page, a model that hasn't evolved since 2002. You can do anything using a blank talk page, but discussions as they are now don't provide the basic things people are used to on social networks or Google Docs, for example. Among many missing features, users can't reply to a discussion by email or from the mobile interface; they have to know where to post and how to follow a unique technical etiquette to discuss; and more. The current default discussion system is not welcoming to everyone.

Several communities, like Wikimedia and WikiHow, have created inventive ways to structure discussions a bit: templates, content preloading, mentions, overloading discussions with HTML, local scripts and bots... These are not unified and are supported by no one other than the communities themselves. Some wikis have decided to use Flow and expect improvements for a better experience. Some other communities, often small ones, prefer to use Facebook or other social networks to discuss, which is not a free, safe, and open environment.

The approach supported by the Wikimedia Foundation is the Structured Discussions extension (re-scoped from Flow), focused on user-to-user discussions. Considering that extension a high-priority building block of MediaWiki is a political decision the MediaWiki community needs to make. It will make it possible to build strong and diverse communities while lowering technical barriers.

Building that discussion system requires a clear strategy and resources, as was done for the visual editor a few years ago. Any major effort will have side effects that benefit other projects (as the VE project did, notably by developing Parsoid): other extensions or services could build on discussions to create very powerful features, like in-article notes or suggestions, or easier request systems.

Work on discussions on the Web is not a new topic. We can benefit from studies of online discussions, covering both UX design and technical implementation. The MediaWiki community also has some experience of what is not possible or not desirable, drawn from LiquidThreads and Flow.

Ariel Glenn | Censorship | Advancing the Contributor Experience

Think like a Pirate: How to beat Internet censorship

Universal access to a digital good, such as the knowledge curated and made available via the Wikimedia projects, presupposes access without censorship.

Censorship and circumvention methods become more advanced over time. Censorship ranges from blocks of single articles to targeting DNS providers to seizing servers to shutting off Internet access completely. Some of these methods are in use right now against Wikimedia projects.

One form of censorship evasion has proven virtually impossible to stamp out: piracy of copyrighted content, in particular music and movies. Let's look at the methods used by the pirates and adapt them for use by Wikimedia content providers and users. We would like our content to be widely shared, available everywhere. Here is what we need to get started:

1. Content must be downloadable and usable off-line.

  * Content meant to be used online, that requires contact with an external server, fails this test. Movies and music do not.

2. Content must be partitionable.

  * You don't grab all alternative music for 2017, but just the albums from the artists you want.  Users will likely not need or want all of the English language Wikipedia (for example) but only subsets.

3. Content must be usable off-line by applications everyone has.

  * Movies and music are downloaded in formats that play in apps that come standard with every OS on every platform.  Usability must include navigation and search of content.

4. Downloadable content must be easy to find, both before and after censorship.

  * You ask Google to find the music or movie you want on YouTube or elsewhere, click and download. Failing that, there is a fallback (see below).

5. Tech-savvy downloaders must be able to seed the distribution of content to everyone else.

  * For music or movies, folks who download from private torrent trackers make copies to give to all their friends; six degrees of separation later, we have reached saturation.

6. Content must be popular enough to be widely shared.

  * If a group of consumers cannot locate a content source or redistributor, the distribution chain breaks.  Poorly seeded torrents are the classic example.

7. People must not rely solely on the original online content source for access.

  * If no one has downloaded or mirrored a copy before access to the original content source is blocked, this approach fails.  Note that most people will have little incentive to save copies of content for offline use from a reliable site, unless Internet access itself is spotty, or the content bundles for download add value.

In some jurisdictions, it may be dangerous to possess certain content, including that of the Wikimedia projects. This issue is outside the scope of this proposal.

Related topics: https://www.mediawiki.org/wiki/Wikimedia_Apps/Offline_support, http://www.kiwix.org/, http://xowa.org/ and so on

Anne Gomez | Multimedia, Strategy, Structured Data | Advancing the Contributor Experience

Wikimedia properties need to keep pace with the norms of browsing and information-consuming behavior to stay relevant, grow readership, and bring new editors to add their knowledge to the repository. We need to support smaller content types - both for contributions and for consumption. At the same time, we need to support multimedia content, from video to interactive graphics to augmented reality. Structured data will allow us to be more flexible in our presentation of information, and create more complex interactions with that information. Video and audio will open the doors to new contributors and new projects.

Content consumers online now, whether among the highly connected or using the internet for the first time, are looking for the right information available to them at the right time. They don't necessarily want long, encyclopedic content, but instead prefer snippets of information served to them just when they need it. And they learn through more immersive experiences - video, augmented reality, interactive graphics - rather than long form text. Even beyond that, huge portions of the world can't access our content for a number of reasons: they don't have internet access, they can't read, their languages don't have keyboard support, there isn't content in their language. The internet as a whole is evolving to meet these changing needs. Messaging apps support walkie-talkie like communication, Google serves just the right answer to any question (in English), and language support for smaller languages is growing cross-platform. Our infrastructure needs to meet these needs.

Derk-Jan Hartman | Complexity, Gadgets, Strategy, Tools, User Experience | Advancing the Contributor Experience | Evolving the MediaWiki Architecture

Growth and complexity

Our strategy is pointing us towards a bold and inclusive world in terms of projects and people. Almost by definition this will lead to increased complexity, not simply in our technology, but also in how we deliver it and enable people to make use of it.

In the last few years we have spent energy creating more APIs and a more service-oriented architecture. One area where we have not made such major changes, however, is how we design for and work with the front end of the software, which is where the majority of people actually use everything else we make.

Here we continue to think in larger products and problems to solve, and quite often tend to fail and even clash with our own 'customers'. By taking on a more diverse strategy, we risk being even more vulnerable to this. I have two suggestions:

Smaller engineering: allow more time for smaller projects, smaller bugs, small tests of ideas, and refinement of existing software. Let's embrace the success of the Community Wishlist and get closer to our communities by writing more gadgets or tools (on Toolforge) when we can, instead of going for 'the big fix'. Have three one-week tests instead of one six-month beta. Fix small bugs that annoy many people and make our website feel amateurish, and improve the experience for everyone. Work more often on the needs of smaller projects, giving them a bigger voice and sprucing up our own solutions by gaining more diverse experience. Be closer to our communities by working nearer to them.

The second point we should work on is to stop thinking of our platform as a website. It is a work environment for an increasingly diverse crowd. We have a limited amount of space on the screen and a huge number of tasks that various people want to do. Gadgets, and even more so user scripts, are hugely helpful, but have long since become unmanageable.

It is time to think beyond the simple APIs and widget kits. We need to take a step towards becoming an application environment. We need users to be able to install and use complete apps made from recognizable and reusable building blocks. I want to see and use gadgets the way my browser uses extensions. I want those extensions to put apps in recognizable and consistent spots, to allow for fullscreen or splitscreen views, and to have a familiar UI, without having to cram everything into the limited shared space that we have. Apps as gateways for diversifying the specific solutions we build.
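As a rough illustration of that idea, a hypothetical app manifest and registry might look like the sketch below. Nothing here is an existing Gadgets or ResourceLoader API; the names are invented purely to show the shape of the proposal.

  // Hypothetical manifest for a gadget packaged as an "app" with a
  // predictable placement, instead of competing for shared screen space.
  type AppPlacement = 'sidebar-panel' | 'fullscreen' | 'splitscreen' | 'toolbar';

  interface GadgetAppManifest {
    id: string;                     // e.g. 'reference-checker'
    name: Record<string, string>;   // localized display names
    placement: AppPlacement;        // consistent, recognizable spot in the UI
    permissions: string[];          // e.g. ['read-page', 'edit-page']
    entryModule: string;            // module to load when the app is opened
  }

  // A hypothetical registry the skin could use to list and launch apps.
  const appRegistry = new Map<string, GadgetAppManifest>();

  function registerApp(manifest: GadgetAppManifest): void {
    appRegistry.set(manifest.id, manifest);
  }

  registerApp({
    id: 'reference-checker',
    name: { en: 'Reference checker' },
    placement: 'splitscreen',
    permissions: ['read-page'],
    entryModule: 'ext.gadget.referenceChecker',
  });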

Mark Hershberger | Offline Editing, Synchronization, Third Parties | Advancing the Contributor Experience | Supporting Third-Party Use of MediaWiki

When Marshall McLuhan said "The medium is the message", he was saying that how the message is understood is affected by what is used to present that message.

MediaWiki is a fundamental part of the medium used to present Wikimedia's work (the "message"). Because the medium is an integral part of the message, it requires comparable attention to its availability and accessibility.

For example, effort is made to ensure that people in remote areas have access to selected content through Kiwix, but a very limited effort has been made to incorporate their knowledge into the "sum of all knowledge."

While there are efforts underway that include copying edits into Wikipedia by hand, it should be possible to provide people in remote areas with an editable copy of Wikipedia so that their edits could be incorporated with less intervention.

Improvements in the installation and resource consumption of a simple MediaWiki installation could be made without sacrificing the current PHP-based application, such that someone could, for example, run a current MediaWiki installation on an un-rooted Android phone. Work could then be done to automate the synchronization of that MediaWiki with the current Wikipedia content.

This work on MediaWiki could, of course, be used by other people who use the tool besides the WMF, which could create a virtuous cycle that would benefit the Foundation.

In fact, deeply incorporating McLuhan's thinking into WMF culture would mean that, while Wikipedia would remain the most visible product of the Foundation, there would be more room to focus on expanding MediaWiki's capabilities beyond what fits into the current focus on GLAM efforts, the website, etc.

Most of the world does not use Wikipedia every day, but many people use something they've learned as a result of reading from or contributing to Wikipedia every day. Making it easier for people to deploy MediaWiki where the potential users do not have the resources of the WMF (for example, in a place that doesn't have a stable Internet connection) could encourage more people to actively embrace Wikimedia's vision of freely sharing knowledge.

SJ Klein | New Users, Strategy | Advancing the Contributor Experience

Our platforms should refocus on collaborating, drafting, and experimenting. Currently much focus is on polished presentation + restriction, hindering experiments and limiting participation.

__Technical aspects__

  • Editing tools focus on fast smooth drafting: multiple simultaneous editors, suggested changes. A terse, readable history highlights major revisions of an article. Discussion is integrated into the draft interface, and toggled on / off.
  • Articles can be forked & merged, supporting all sorts of experimentation. Different groups can work on parallel forks, merging later if they like. Newcomers not following a policy can be channeled to an individual branch while sorting it out, avoiding edit wars. Sandboxing helps avoid "deletion": questionable or disputed contributions can be sandboxed to a hidden or low-visibility personal page. [This is also conducive to distributing an online/offline federation of editors, e.g. over IPFS]
  • Editing, creation, and uploading are encouraged prominently in every page interface. Matchmaking tools help creators find others with similar interests, learn + collaborate. Tools for similarity checking, merging, metadata / license review, + meta-moderation, help anyone contribute and learn new ways to do so. Deleted material [unless oversighted] is reviewable by all who know where to look.
  • The reading experience focuses on contextual connections + human connections. Real-time conversation is available as an overlay while reading. Data-rich interfaces help readers browse multiple versions of an article, and get a sense of persistence, reliability, + interest. For instance, heatmaps for revised / controversial / commented areas; wikiblame for granular provenance; different colors for different sorts of cites; visual cues about how much complementary or conflicting knowledge is available in other articles, files, languages or Projects.

__Cultural aspects (& related tools) __

  • Namespaces include every potentially useful topic: completeness, notability, + copyright uncertainty affect how things are presented, not whether they exist. Similarly, media repositories include all useful material that is legal to host.
  • File uploads are welcome as contributions to the global commons even when they need work. Files are transcoded to free formats where possible. File formats with no free-codec options, or that cannot be thoroughly checked for malware, are stored in their own flexible repository [such as the Internet Archive]: using the same Wikimedia upload interface + metadata, and providing similar wikilinks to reference files from within the Projects.
  • The newcomer experience is simple, flexible, + protected. Contributions from people who "don't know how to do it right" are welcome, and kept separate from the flow of updates from regulars, with their own visibility defaults. Matchmaking tools help newcomers find active work in their area. Blocks, deletions, + warnings happen only for spam / vandalism. Other concerns at worst hide their work from public view, with a friendly review with a peer after the first weeks. A broad group of peers can protect newcomers, for instance by redirecting concerns and complaints about a newcomer to themselves.

It is time to move away from a "single latest revision viewable by all" model, and the conservative policies designed around it. We need a more flexible model embracing multiple working copies, long-lived drafts, and a greater freedom to experiment, collaborate, + create.

Guillaume Paumier | Alternative Interfaces, Knowledge as a Service, Knowledge Equity, New Users | Advancing the Contributor Experience | Knowledge as a Service

The strategic direction that has emerged has two components: "Knowledge as a service" and "Knowledge equity". "Knowledge as a service", which focuses on infrastructure, seems like the one most related to technology. This proposal is about exploring the less obvious connections between the two components, and in particular the technology implications of Knowledge Equity.

Because Wikimedia is a complex socio-technical system, it's not really possible to separate people from technology when talking about it. A direction of Knowledge Equity invites the contributors of the Wikimedia movement to take a critical look at themselves and assess their biases and privileges. This, in turn, can help identify structural biases that have been reproduced and ingrained in our technical platform.

For example, MediaWiki is currently doing a great job at providing a localized interface in many languages. However, beyond language, interaction design and UX patterns seem very specific to Western culture. Similarly, when our strategic direction talks about building strong and diverse communities, this invites us to consider whether the current tools available to contributors enable them to provide an environment where newcomers can experiment, be mentored, and fail safely.

Beyond software itself, little effort has been invested in exploring alternative interfaces outside the connected browser. Our primary interface for contribution (the web site) may work well for middle-class contributors from Europe and North America, but isn't necessarily what enables people from other backgrounds or geographies to contribute.

These are some of the topics I would like to bring up for discussion at the Developer Summit.

Keerthana S | Contributors, Machine Learning | Advancing the Contributor Experience | Research, Analytics, and Machine Learning

Breaking the ice and catering to would-be student Wikipedia contributors

Most of the valuable contributors, especially for technically advanced articles, come from academia. My paper will discuss why it can be valuable to introduce university students to contributing to Wikipedia and to give them enough guidance to stick around, the existing infrastructure that supports this cause, and some points on how this can be improved. As the infrastructure of MediaWiki evolves and becomes a platform where beginners to open source find it easy to contribute, with a really well-documented code base, a friendly community, and many outreach programs, we should also think about introducing university students to contributing to Wikipedia.

Wikipedia serves as an invaluable tool for students worldwide, helping them assimilate their course content. They write term papers as part of their course work, so it only makes sense that making students aware of contributing to Wikipedia, and giving them guidance, can be a source of reliable, high-quality contributions.

Existing Infrastructure

WikiEduDashboard, a project of the Wiki Education Foundation, caters to universities where students are required to contribute to Wikipedia articles as part of their course assessment, and provides tools for instructors to guide the students in doing so.

Machine Learning tools for Guidance

There are existing automatic mechanisms in Wikipedia to detect plagiarism, promotional content, or any form of spam in edits. Automatically rating Wikipedia articles is, to an extent, achieved by the Scoring Platform team. This is utilised by many bots in Wikipedia to spot potential vandalism. This score prediction tool could also be used to give immediate feedback to newbie editors in a friendlier manner and point out the faux pas in their edits.
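As an illustration, a gadget or editing workflow along these lines could query the Scoring Platform's ORES service and translate its raw score into friendlier wording. The sketch below assumes the public ORES v3 scoring endpoint and a typical response shape; both should be checked against the ORES documentation before relying on them.

  // Sketch: fetch a 'damaging' score for a revision and phrase it as
  // friendly feedback rather than a warning. Endpoint and response shape
  // are assumptions based on the public ORES v3 API.
  interface DamagingScore {
    prediction: boolean;
    probability: { true: number; false: number };
  }

  async function getDamagingProbability(wiki: string, revId: number): Promise<number> {
    const url = `https://ores.wikimedia.org/v3/scores/${wiki}?models=damaging&revids=${revId}`;
    const response = await fetch(url);
    const data = await response.json();
    const score: DamagingScore = data[wiki].scores[String(revId)].damaging.score;
    return score.probability.true;
  }

  async function friendlyFeedback(wiki: string, revId: number): Promise<string> {
    const probablyDamaging = await getDamagingProbability(wiki, revId);
    if (probablyDamaging > 0.8) {
      return 'Thanks for editing! This change may need a reliable source or a more neutral tone; here is a short guide to help.';
    }
    return 'Thanks for your contribution!';
  }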

Subramanya Sastry | API, Knowledge as a Service, Schema.org, Structural Semantics, Templates, Wikidata | Advancing the Contributor Experience | Knowledge as a Service

PROBLEM

To satisfy the 'Knowledge as a service' theme, in addition to providing access to full page content, Wikimedia APIs should provide access to semantic units at:

- an abstract document level (sections, headings, tables, etc.)
- a domain-specific level (infoboxes, geolocation, taxoboxes, etc.)

Wikitext, the core content creation technology on wikis, evolved as a string-processing language where one set of strings is replaced with another set of strings mostly via regular expression matches to yield the output HTML string. There is no notion of document structures here.

This lack of structural semantics gets in the way of robustly identifying semantic units, and of developing tools and features that operate on a page structurally at sub-page granularities.

SOLUTION: TRANSPARENT TYPING LAYER OVER WIKITEXT

Types improve abstraction, reasoning, and tooling abilities in programming languages. A transparent typing layer on top of wikitext can provide similar benefits.

A: ENFORCE STRUCTURAL TYPES ON OUTPUT OF WIKITEXT CONSTRUCTS INCLUDING TEMPLATES AND EXTENSIONS

- Specify that all wikitext constructs have an output with a type: String, DOM, CSS property, HTML-attribute, or a List of one of those.

- Extensions and templates can specify the expected output type. All other core wikitext constructs have the DOM output type.

- Parser enforces the output type of all wikitext constructs. Examples:

 For DOM types, unclosed and misnested tags are fixed up.
 For String types, HTML tags are escaped and wikitext strings are nowikied.
 For CSS types, values are sanitized.

Among other benefits, this basic typing mechanism enables MediaWiki to provide an API to extract and edit document fragments without introducing adverse side-effects on the rest of the page.
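A minimal sketch of what such enforcement could look like is below: the type names come from the proposal, but the coercion logic is purely illustrative and is not Parsoid's actual implementation.

  // Illustrative-only enforcement of declared output types for wikitext
  // constructs (templates, extensions, core constructs).
  type OutputType = 'String' | 'DOM' | 'CSS' | 'HTMLAttribute';

  interface ConstructOutput {
    declaredType: OutputType;
    raw: string;
  }

  function enforceType(out: ConstructOutput): string {
    switch (out.declaredType) {
      case 'String':
        // HTML tags are escaped; stray wikitext would similarly be nowiki-ed.
        return out.raw.replace(/</g, '&lt;').replace(/>/g, '&gt;');
      case 'DOM':
        // Unclosed and misnested tags get fixed up; a real implementation
        // would parse and re-serialize the fragment with an HTML5 tree builder.
        return balanceTags(out.raw);
      case 'CSS':
        // Values are sanitized, e.g. dropping url() payloads.
        return out.raw.replace(/url\s*\([^)]*\)/gi, '');
      case 'HTMLAttribute':
        return out.raw.replace(/"/g, '&quot;');
    }
  }

  // Placeholder for DOM fix-up; shown only to keep the sketch self-contained.
  function balanceTags(fragment: string): string {
    return fragment;
  }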

B: UNIFIED TYPING MECHANISM TO EXPOSE DOMAIN-SPECIFIC SEMANTIC INFORMATION

Editors impose structure in documents through a rich library of templates, policies, and maintenance processes they have developed over the years. If this semantic information (infoboxes, navboxes, sports rankings, railway timetables, etc.) is mapped to a centralized ontology system (Wikidata, schema.org, something else), the parser can expose this information in HTML, and MediaWiki APIs can expose it in a wiki-neutral way.

There are multiple disparate mechanisms today wherein template authors specify metadata about templates (TemplateData, TemplateStyles, possibly others?).

Instead of creating newer mechanisms for specifying structural output types and semantic information types for templates, it is better to provide a consolidated mechanism that unifies all this template metadata into a single user-defined type declaration. This lets newer applications and capabilities be developed in the future without code changes to the core mechanism.
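Purely for illustration, such a consolidated declaration might unify the structural output type (part A) and the semantic mapping (part B) in one place. The field names and the schema.org mapping below are invented examples, not an existing TemplateData feature.

  // Hypothetical single type declaration for a template, combining the
  // structural output type with a domain-specific semantic mapping.
  interface TemplateTypeDeclaration {
    outputType: 'DOM' | 'String';          // structural type (part A)
    semanticType?: string;                 // e.g. 'schema:Person' (part B)
    params: {
      [name: string]: {
        valueType: 'String' | 'DOM' | 'CSS';
        mapsTo?: string;                   // ontology property, e.g. 'schema:birthDate'
      };
    };
  }

  // Example declaration for a hypothetical person infobox template.
  const infoboxPersonDeclaration: TemplateTypeDeclaration = {
    outputType: 'DOM',
    semanticType: 'schema:Person',
    params: {
      name:       { valueType: 'String', mapsTo: 'schema:name' },
      birth_date: { valueType: 'String', mapsTo: 'schema:birthDate' },
      image:      { valueType: 'String', mapsTo: 'schema:image' },
    },
  };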

FEASIBILITY

This typing layer only affects template authors. Editors that use source editing won't see any impact (besides fewer markup errors). Editors that use visual editing might see improved tooling. Even for template authors, this is meant to be an opt-in mechanism with gradual migration over to the new model.

The proposal here is a logical extension of what Parsoid does today. Parsoid provides an illusion of structured wikitext and demonstrates what is possible (VE, CX, Linter, Flow among others) by embracing structured semantics.

Volker | Accessibility, Collaboration, Documentation, Mobile, Style Guide | Advancing the Contributor Experience

Developer resources & collaboration on Wikimedia User Interface Style Guide

Presenting the Wikimedia User Interface Style Guide with a focus on developers' needs. The style guide, including all its resources, is hosted on GitHub as a test case for design collaboration.

In the ideation of the new style guide, it was emphasized that, to be a successful resource, it has to address both designers' and developers' needs.

So far, a big part of the overarching and visual style principles has been accomplished, alongside one development 'base' layer: the WikimediaUI Base variables, already being used as a building block in OOUI and Marvin.

Within the next couple of weeks, there will also be 'Components' and 'Resources' sections where developers are shown design principles 'in action', combined with hot-linked demos. The presentation would cover the principles behind the components and their application:

  • Ideation and design
  • This is for everyone (internationalization/language)
  • Open to collaboration
  • Design consistency
  • Trustworthy yet joyful
  • Usability and UX best practices and underlying user research
  • Responsive design and mobile-first principles
  • Accessibility measurements

There are clearly topics overlapping with the questions posed in the thematic overview, like maintaining and growing the technical community, the role of open source, scaling, and tools for embracing mobility.

At the end of the presentation, an open questions/feedback slot is planned to gather ideas on how to extend and improve the style guide even more.

Daren Welsh | Contributors, Mobile, Offline Editing, Synchronization | Advancing the Contributor Experience

What technologies are necessary for embracing mobility? How and with whom should we partner to create the technologies needed to support the mission?

Multilateral, asynchronous, bidirectional synchronization of wikis: How astronauts taught the world to wiki on the go

The world is certainly transitioning its internet usage from the desk to mobile devices. But let's not limit our focus to mobile devices that always have an internet connection. Let's talk about the millions of travelers stuck in a moving vehicle with nothing better to do than look out their window. I'm talking about passengers on planes, trains, and automobiles. The easiest target here is the millions of people who fly. As passengers onboard a plane for hours, we're lucky if we have an in-flight movie system. But what if the plane offered a local intranet with a copy of Wikipedia? What if the airlines gave promotions to those who contributed? What if each flight competed with other flights for most contributions? The same approach could be applied to passenger trains, buses, subways, and ferries.

The main limitation here is a technical one. If you have thousands of Wikipedia clones buzzing around, each collecting contributions during their offline time, how do you reconcile the changes with the master database? While tools like Kiwix already offer an offline copy of Wikipedia, there is much work needed to support thousands of wiki clones reconciling changes every few hours. This will require revolutionary branch management and revision conflict handling.
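As a toy illustration of the reconciliation step (far short of the revolutionary branch management this would ultimately require), a hub receiving edits from an offline clone has to decide whether each edit applies cleanly against the master's latest revision or needs merging. All names below are hypothetical.

  // Toy reconciliation check for edits arriving from an offline wiki clone.
  interface OfflineEdit {
    pageTitle: string;
    baseRevisionId: number;   // revision the clone last synced to
    newWikitext: string;
    author: string;
  }

  interface MasterPage {
    title: string;
    latestRevisionId: number;
    wikitext: string;
  }

  type ReconcileResult =
    | { status: 'applied'; newRevisionId: number }
    | { status: 'conflict'; reason: string };

  function reconcile(master: MasterPage, edit: OfflineEdit): ReconcileResult {
    if (edit.baseRevisionId === master.latestRevisionId) {
      // Fast-forward: nothing changed on the master while the clone was offline.
      return { status: 'applied', newRevisionId: master.latestRevisionId + 1 };
    }
    // Otherwise a real system would attempt a three-way merge between the base
    // revision, the master's latest text, and the offline edit, falling back
    // to human review when the automatic merge fails.
    return { status: 'conflict', reason: 'page changed on master since the base revision' };
  }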

But if you pull this off, it might kick off the biggest surge in user participation in years.

With whom should you partner to accomplish this? Why not start with NASA? They use MediaWiki to train astronauts and plan for spacewalks. Begin this development by running wiki servers onboard the International Space Station. Get astronauts to contribute to the same wiki used for their training while they are putting all that knowledge to use. Once the NASA wiki synchronization between the ground and the ISS is working, expand this model to Wikipedia. Yes, have a clone of Wikipedia onboard the ISS. Astronauts love to share their experience, their story, and their photos from their 6-month stays aboard the station. These lucky few represent countries from around the world and they have a huge influence on the rest of us on the ground. Once people see astronauts contributing to Wikipedia during their journey, they will want to join the movement on their travels (albeit aboard slightly less cool vehicles).

Brian Wolff | Censorship, Offline Editing, Synchronization | Advancing the Contributor Experience

Wikimedia should diversify its distribution methods.

Currently Wikimedia distributes its content almost exclusively using the Internet. However, the Internet is controlled by gatekeepers in the form of governments and ISPs. While historically these entities rarely controlled the flow of information, more recently we have seen an increase in censorship, particularly by governments. Since Wikimedia content is distributed almost entirely over the Internet, we are vulnerable to their whims.

The risk of having our distribution lines interfered with is an existential threat to our mission. While at present only a few geographic locations practise such interference, the future is unknowable and does not appear to be heading in a comforting direction.

Furthermore, in the face of such interference, there is very little we can do. Tor is often spoken of as a solution to censorship, but any such on-Internet system will either have to be obscure or rely on secret information (e.g. Tor bridges) to avoid blocking, and thus cannot be used by the public at large. The most effective solution to censorship so far seems to be political pressure, combined with bundling to make censorship decisions as broad as possible. When much content is bundled together, such as entire domains with TLS, or GitHub and the New York Times [1], it can reduce censorship if there is political will to censor a specific part but not the whole thing. However, political opinion is fickle, and cannot be relied upon.

Thus, we should reduce this risk by diversifying how we distribute our content. Multiple distribution routes mean no single point of failure. I see two ways of doing this:

First, by expanding offline versions of Wikimedia. Kiwix already provides an offline version of Wikimedia sites. We need to expand this capability to allow for better updating. Offline apps should be able to efficiently update their contents in a scenario where users only have intermittent access to the open Internet. More importantly, offline apps should be able to update in a P2P fashion with other apps. In a community with limited access to the open Internet, a single person with an up-to-date version of Wikipedia should be able to easily synchronise his/her app with other people's apps to spread the knowledge. This could be especially helpful in a scenario where a small number of people have access via methods such as Tor, but such methods are too burdensome for most people.

Second, we could experiment with broadcasting recent edits widely. To broadcast HTML versions of all main-namespace pages recently edited on English Wikipedia would only require about 12 KBps [2]. This is not a huge amount of bandwidth. During the Cold War it was common to broadcast propaganda using shortwave radio, which could be listened to across the world. Perhaps we could broadcast everything that is edited across the world in a similar fashion, allowing users to stay up to date regardless of their connectivity. This could be combined with the P2P app, so a few power users could listen in to the RC stream and then spread the data among their communities.

[1] https://en.wikipedia.org/wiki/Censorship_of_GitHub#DDoS_attack

[2] Based on a very rough experiment: ?action=render of a Wikipedia page roughly gzips to the size of the raw wikitext. From there, the 12 KBps number is based on the enwiki result of:

 SELECT sum(l)/(1024*3600*24)
 FROM (
   SELECT max(rc_new_len) 'l'
   FROM recentchanges
   WHERE rc_namespace = 0
     AND rc_timestamp BETWEEN '20170926000000' AND '20170927000000'
     AND rc_type <= 1
   GROUP BY rc_cur_id
 ) t;

Secondary

Author | Tags | Primary Session | Secondary Sessions | Position Statement
C. Scott Ananian | Censorship, Infrastructure, Languages, Machine Learning, Machine Translation, Translation, User Experience | Next Steps for Languages and Cross Project Collaboration | Advancing the Contributor Experience

'One World, One Wiki!' Instead of today's many siloed wikis, separated by language and project, our goal should be to re-establish a unified community of collaborators. We will still respect language and cultural differences - there will still be English, German, Hebrew, Arabic, etc. Wikipedias; they will disagree at times - but instead of separate domains, we'll embrace a single user experience with integrated navigation between projects and languages and the possibility of split screen views aligning related content. On a single page we can work on articles in different languages, or simultaneously edit textbook content and encyclopedia articles. Via machine translation we can facilitate conversations and collaborations spanning languages and projects, without forcing a single culture or perspective.

Machine translation plays a key role in removing these barriers and enabling new content and collaborators. We should invest in our own engineers and infrastructure supporting machine translation, especially between minority languages and script variants. Our editing community will continually improve our training data and translation engines, both by explicitly authoring parallel texts (as with the Content Translation Tool) and by micro-contributions such as clicking yes/no on a proposed translation or pair of parallel texts ('bandit learning'). Using 'zero-shot translation' models, our training data from 'big' wikis can improve the translation of 'small' wikis. Every contribution further improves the ability of our tools to make additional articles from other languages available.

A translation suggestion tool will suggest an edit in one language whenever an edit is made to a parallel text in another language. The correspondences can be manually created (for example, via the Content Translation Tool), but our translation engine can also automatically search for and score potential new correspondences, or prune old entries when the translation has drifted. Again, each new correspondence trains the engine and improves its ability to suggest further correspondences and edits.

Red-links and stubs are replaced with article text from one of the user's preferred fallback languages, perhaps split-screened with a machine translation into the user's primary language. This will keep 'small' language wikis sticky, and prevent readers from getting into the habit of searching in a 'big' language first.

We should build clusters specifically for training translation (and other) deep learning models. As a supplement to our relationships with statistical translation tools Moses and Apertium, we should partner with the OpenNMT project for modern neural machine translation research. We should investigate whether machine translation can replace LanguageConverter, our script conversion tool; conversely, our editing fluency in ANY language pair should approach what LanguageConverter provides for its supported languages.

By embracing unity between projects and erasing barriers between languages, we encourage the flow of diverse content from minority languages around the world into all of our wikis, as well as improving the availability of all of our content into indigenous languages. Language tools route around cultural or governmental censorship: by putting parallel texts and translations in the forefront of our UX we expose our differences and challenge preconceptions, learning from each other.

Adam Baso | Machine Learning, Mobile, Schema.org, Structured Data, Templates, Wikibase, Wikidata | Knowledge as a Service | Advancing the Contributor Experience; Research, Analytics, and Machine Learning

Structure Most Things with Schema.org

The future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, Microsoft, and international equivalents and social networks. We're thankful they extend our reach, even as we seek to help consumers on the platforms join our movement.

We could help platform providers, their users, and our users solve problems better through adoption of the open standard Schema.org into Wikipedia pages mapped with templates and, ideally, federated and synchronized Wikidata properties.
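For illustration, mapping an infobox-style template to Schema.org could look like the sketch below. The template parameter names and the mapping function are invented examples; only the Schema.org vocabulary itself (Person, birthDate, hasOccupation) is the real standard.

  // Invented example: turning infobox-style template parameters into a
  // Schema.org JSON-LD object that the parser could emit alongside the page.
  interface InfoboxPersonParams {
    name: string;
    birth_date?: string;
    occupation?: string;
  }

  function toSchemaOrgJsonLd(params: InfoboxPersonParams): Record<string, unknown> {
    return {
      '@context': 'https://schema.org',
      '@type': 'Person',
      name: params.name,
      ...(params.birth_date ? { birthDate: params.birth_date } : {}),
      ...(params.occupation
        ? { hasOccupation: { '@type': 'Occupation', name: params.occupation } }
        : {}),
    };
  }

  // The result could be serialized into a <script type="application/ld+json">
  // block for search engines and other platform providers to consume.
  const example = toSchemaOrgJsonLd({
    name: 'Ada Lovelace',
    birth_date: '1815-12-10',
    occupation: 'Mathematician',
  });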

Benefits:

  1. Wikipedia will have even better presentation and placement in search engines and other data rich experiences.
  2. We provide an opportunity for a more consistent data model for template authors and people/bots filling template values. And the richly defined Schema.org entities provide a good target to reach on all entities represented in the Wikipedia/Wikimedia corpora. Standardization can reduce duplication of effort and inconsistencies.
  3. We introduce an easier vector for mobile contribution, which could include simpler and different data entry, mapping, and modeling.
  4. We can elevate an open standard and push its adoption forward while increasing the movement's standing in the open standards community.
  5. Schema.org compliant data is more easily amenable to machine learning models that cover data structures, the relations between entities, and the dynamics of sociotechnical systems. This could bolster practical applications like vandalism detection, coverage analysis, and much more.
  6. This might provide a means for the education sector to educate students about knowledge creation, data modeling, and more. It might also afford scientists and other practitioners a further standardized way to model the knowledge in their fields.

What would it take? And can this be done in harmony with the existing {{Template}} system?

This session will discuss the following:

  1. Are we aligned on the benefits, and which ones?
  2. Implementation options.
    1. Can we extend templates so they could be mapped to Schema.org?
      1. Would it be okay to derive the mapping by manual and automated analysis at WMF/WMDE and apply it behind the scenes? Would that be sustainable?
      2. Could we make it easy for template authors to mark up their templates for Schema.org compatibility and have some level of enforcement? Could Schema.org attributes and entity types be autosuggested for template creators?
    2. Is it easy to relate most existing and proposed Wikidata entity types and properties to existing Schema.org entities and properties?
    3. What would it take to streamline MCR Schema.org data structures or MCR Wikibase property clusters mapped to Schema.org on defined entity types?
    4. Furthermore, if we can do #1 and #2, what's to prevent us from letting templates as is merely be the interface for Schema.org compliant Wikibase entities and properties (e.g., by duck typing / autosynthesis)?
    5. How could we bidirectionally synchronize between Wikipedia and Wikidata with confidence, in a way compatible with patroller expectations? And what storage and event processing would be needed? Can the systems be scaled to accommodate the arrival of real-time and increasingly fine-grained information?

Roan Kattouw | Architecture, Contributors, Data Center | Evolving the MediaWiki Architecture | Advancing the Contributor Experience

Users should not be punished for logging in

WMF wikis are slower for logged-in users than for anonymous users, which is unhelpful when trying to get users to contribute. This is a long-standing problem that's hard to solve, but we should have a vision for how we're going to solve it.

WMF has caching data centers in strategic locations around the world (Amsterdam, San Francisco and soon Singapore), which make the wikis faster for users who are not near the primary data center (in Virginia) but are near a caching location. However, this only benefits anonymous users. For logged-in users, every page view contains their user name and other user-specific information in the personal tools area, so logged-in page views are considered uncacheable and are always routed to the primary data center.

This means that if a new user browses the site for a while, then creates an account because they want to contribute (or makes an anonymous edit), the wiki suddenly becomes slower for them. All users are affected, because uncached requests are slower to serve than cached ones, but users outside North/South America are affected the most, because their traffic now has to cross an ocean that it didn't have to cross before. It's not nice that a new user's 'reward' for creating an account is a slower experience, but it's especially not nice that users in emerging communities are affected the most. If we want to encourage readers to become contributors, slowing the site down as soon as someone contributes is not very helpful.

Some requests will always have to go to the primary data center, such as POST requests saving an edit, and those are always going to be slower for users outside North America. But for logged-in page views this isn't fundamentally necessary, and serving them from the edge caches would speed up the site for logged-in users and reduce the load on the app servers. There are different ways that this could be done, each with their own obstacles. For example, a single-page application for MediaWiki could use a content service to retrieve only the new page's contents when navigating, but this would require modifying or rewriting a lot of code in MediaWiki; ESI could be used to have Varnish inject cached page contents into a user-specific chrome, but that would require using advanced and partly unproven Varnish features. In both cases, we'd have to reimplement certain rendering preferences using CSS or a post-processing step. It's far from trivial, but let's start talking seriously about how we can address this problem.
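As a sketch of the single-page-application option mentioned above, the user-specific chrome could stay in place while only the cacheable article body is fetched on navigation. The endpoint path and element id below are placeholders, not a committed API.

  // Illustrative navigation handler: fetch only the article body from a
  // cacheable, user-agnostic content endpoint and swap it into the page,
  // leaving the logged-in user's chrome untouched.
  async function navigateTo(title: string): Promise<void> {
    // The content response contains no user-specific data, so edge caches
    // can serve it to logged-in and anonymous users alike.
    const response = await fetch(`/api/content/v1/page/${encodeURIComponent(title)}/html`);
    const html = await response.text();

    const contentArea = document.getElementById('mw-content-text');
    if (contentArea) {
      contentArea.innerHTML = html;
      history.pushState({ title }, '', `/wiki/${encodeURIComponent(title)}`);
    }
  }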

Katherine Maher | Alternative Interfaces, Architecture, Knowledge as a Service, Strategy, User Experience | Knowledge as a Service | Advancing the Contributor Experience

This proposal focuses on the "Knowledge as a service" part of the strategic direction.

When I look at the core of what we do, to some extent I see a model that we've mastered, and that we're making incremental improvements to. My concern is that, while that model is incredible and powerful as a community, the model for the interface and the delivery mechanism for the product the community creates are changing, and for us to continue what we're doing today may or may not prepare us for what the future actually looks like. I think it also limits our ability to unlock all of the tremendous knowledge, unstructured and structured, that exists within our projects. And I also believe that it limits us to certain forms of knowledge and a certain hierarchy of creation in a way that is very inward-looking.

Right now much of our information is sitting, unstructured, in a SQL database, rendered through PHP, read through a rendering engine into a browser to read/write in one interface: the browser. While this is amazing for the world of the browser, we're not going to be a browser-based information world for that much longer, any more than anything else. It's not that the browser is going to go away, the browser will be like books: books haven't gone away, radio hasn't gone away, but there will be a transformation to a new interface, and we need to be ready for it. Perhaps we should actually backfill into those older interfaces that we're not currently part of, because people still use those interfaces, and those interfaces are valuable.

Essentially this is about taking the Model-View-Controller paradigm to the next level, and also about extending it to participation and to the "write" part of our read-write system. Even if Alexa is serving Wikimedia content outside the browser, there is no mechanism for contributing through Alexa. We need to be planning for an architecture of information and an architecture of experiences that is independent of the browser.

How do you get the most value out of the existing content? How do you serve a snippet to someone who just needs a quick answer? How do you serve different layers of sophistication to 8th-graders versus the college graduate, versus the PhD? Can we engage in the knowledge ecosystem and leverage what we have as a platform, and our traffic distribution and awareness, to actually open up greater resources of knowledge?

These are some of the topics I would like to see discussed at the Dev Summit.