Knowledge as a Service

From devsummit
Revision as of 15:36, 11 January 2018 by Ccicalese (talk | contribs)
Related Phabricator Task
Topic Leaders Adam Baso, Daniel Kinzler, Lydia Pintscher, Thomas Pellissier Tanon

5 primary statements. 4 secondary statements.


Loading...
  • Alternative Interfaces
  • Analytics
  • API
  • Architecture
  • Collaboration
  • Complexity
  • Documentation
  • Infrastructure
  • JavaScript
  • Knowledge as a Service
  • Knowledge Equity
  • Languages
  • Lua
  • Machine Learning
  • Mobile
  • Multimedia
  • New Users
  • OpenStreetMap
  • Oral Knowledge
  • Research
  • Schema.org
  • Strategy
  • Structural Semantics
  • Structured Data
  • Templates
  • Third Parties
  • Tools
  • Trust
  • User Experience
  • Wikibase
  • Wikidata

Primary

Author Tags Primary Session Secondary Sessions Position Statement
Adam Baso Machine Learning, Mobile, Schema.org, Structured Data, Templates, Wikibase, Wikidata Knowledge as a Service Advancing the Contributor Experience, Research, Analytics, and Machine Learning

Structure Most Things with Schema.org

The future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, Microsoft, and international equivalents and social networks. We're thankful they extend our reach, even as we seek to help consumers on the platforms join our movement.

We could help platform providers, their users, and our users solve problems better through adoption of the open standard Schema.org into Wikipedia pages mapped with templates and, ideally, federated and synchronized Wikidata properties.

Benefits:

  1. Wikipedia will have even better presentation and placement in search engines and other data rich experiences.
  2. We provide an opportunity for a more consistent data model for template authors and people/bots filling template values. And the richly defined Schema.org entities provide a good target to reach on all entities represented in the Wikipedia/Wikimedia corpora. Standardization can reduce duplication of effort and inconsistencies.
  3. We introduce an easier vector for mobile contribution, which could include simpler and different data entry, mapping, and modeling.
  4. We can elevate an open standard and push its adoption forward while increasing the movement's standing in the open standards community.
  5. Schema.org compliant data is more easily amenable to machine learning models that cover data structures, the relations between entities, and the dynamics of sociotechnical systems. This could bolster practical applications like vandalism detection, coverage analysis, and much more.
  6. This might provide a means for the education sector to educate students about knowledge creation, and data modeling, and more. It might also afford scientists and other practitioners a further standardized way to model the knowledge in their fields.

What would it take? And can this be done in harmony with the existing {{Template}} system?

This session will discuss the following:

  1. Are we aligned on the benefits, and which ones?
  2. Implementation options.
    1. Can we extend templates so they could be mapped to Schema.org?
      1. Would it be okay to derive the mapping by manual and automated analysis at WMF/WMDE and apply it behind the scenes? Would that be sustainable?
      2. Could we make it easy for template authors to mark up their templates for Schema.org compatibility and have some level of enforcement? Could Schema.org attributes and entity types be autosuggested for template creators?
    2. Is it easy to relate the most existing and proposed Wikidata entity types and properties to existing Schema.org entities and properties?
    3. What would it take to streamline MCR Schema.org data structures or MCR Wikibase property clusters mapped to Schema.org on defined entity types?
    4. Furthermore, if we can do #1 and #2, what's to prevent us from letting templates as is merely be the interface for Schema.org compliant Wikibase entities and properties (e.g., by duck typing / autosynthesis)?
    5. How could we bidirectionally synchronized between Wikipedia and Wikidata with confidence in a way compatible with patroller expectations? And what storage and event processing would be needed? Can the systems be scaled in a way to accommodate arrival of real-time and increasingly fine grained information?
Marius Hoch Third Parties, Wikidata Knowledge as a Service

Making use of Wikidata's knowledge on Wikimedia projects and beyond

It is foreseeable that the way our content will be consumed is going to change a lot in the next years, as both the demographics of the internet as well as the devices used by our readers are changing. We should try to adapt to this by offering our content in ways best suited for many user scenarios. In order to achieve this, we need to modularize and structure our content so that it can be easily re-interpreted and used in many different ways.

Wikidata gives us the possibility to easily cater for this trend, by providing machine readable data about any subject, which can be formatted and presented in a wide variety of ways and languages. Wikidata makes it possible to more easily maintain up to date information on subjects in many languages, without the burden of manual data maintenance.

We should strengthen this by improving the integration of Wikidata with other Wikimedia projects, providing easy ways to use and profit from Wikidata especially for small communities and by making the power of Wikidata more visible. While all Wikimedia communities will profit from this, it can be especially worthwhile for small communities, that currently don't have the resources for managing data, like Infoboxes, themselves.

Example projects that will help in this area are the 'ArticlePlaceholder', that allows serving Wikidata-data about a certain subject if there's no article about it. Also the plans for automatic Infoboxes derived solely from Wikidata and other means of using Wikidata-data on Wikimedia projects. While both of these projects can have a big community impact, they need to fit in with the current infrastructure. Also they pose certain new scalability and data presentation challenges that need to be addressed.

Furthermore, Wikidata's information should be easy to reuse by third party projects to increase visibility and in order to gain contributions and data donations, making Wikidata the true data hub of the internet. This goal raises longstanding issues with the current Wikimedia dump infrastructure, which is neither very flexible nor does it provide a machine readable interface for data consumers. Also bringing more individual editors and organizations into Wikidata poses various infrastructure and scalability issues coping with the sheer amount of data and changes happening, as well as providing convenient tools for establishing and maintaining data quality.

Katherine Maher Alternative Interfaces, Architecture, Knowledge as a Service, Strategy, User Experience Knowledge as a Service Advancing the Contributor Experience

This proposal focuses on the "Knowledge as a service" part of the strategic direction.

When I look at the core of what we do, to some extent I see a model that we've mastered, and that we're making incremental improvements to. My concern is that, while that model is incredible and powerful as a community, the model for the interface and the delivery mechanism for the product the community creates are changing, and for us to continue what we're doing today may or may not prepare us for what the future actually looks like. I think it also limits our ability to unlock all of the tremendous knowledge, unstructured and structured, that exists within our projects. And I also believe that it limits us to certain forms of knowledge and a certain hierarchy of creation in a way that is very inward-looking.

Right now much of our information is sitting, unstructured, in a SQL database, rendered through PHP, read through a rendering engine into a browser to read/write in one interface: the browser. While this is amazing for the world of the browser, we're not going to be a browser-based information world for that much longer, any more than anything else. It's not that the browser is going to go away, the browser will be like books: books haven't gone away, radio hasn't gone away, but there will be a transformation to a new interface, and we need to be ready for it. Perhaps we should actually backfill into those older interfaces that we're not currently part of, because people still use those interfaces, and those interfaces are valuable.

Essentially this is about taking the Model-View-Controller paradigm to the next level, and also about extending it to participation and to the "write" part of our read-write system. Even if Alexa is serving Wikimedia content outside the browser, there is no mechanism for contributing trough Alexa. We need to be planning for an architecture of information and architecture of experiences that is independent of the browser.

How do you get the most value out of the existing content? How do you serve a snippet to someone who just needs a quick answer? How do you serve different layers of sophistication to 8th-graders versus the college graduate, versus the PhD? Can we engage in the knowledge ecosystem and leverage what we have as a platform, and our traffic distribution and awareness, to actually open up greater resources of knowledge?

These are some of the topics I would like to see discussed at the Dev Summit.

Thomas Pellissier Tanon Analytics, API, Collaboration, JavaScript, Lua, Mobile, Multimedia, Structured Data, User Experience, Wikibase, Wikidata Knowledge as a Service

Title: Content structuration and metadata are keys to fulfil our strategy

Content: The Wikimedia mouvement strategy is making a focus on serving more different kinds of knowledge and sharing them with allies and partners [1]. I believe that the most important ground work for reaching these goals is to focus on the outgoing project of moving MediaWiki from a "wikitext plus media file" collaboration system to a platform allowing people to be able to collaborate on many kind of contents and to organise them in a cohesive way.

Two axes seem important to me to pursue this goal: 1. Support a broader set of different contents, not just wikitext, Wikibase items and Lua/JavaScript/CSS contents but also images, sounds, movies, books, etc., that should bee editable just like wiki text pages in order to allow people to improve them in a collaborative way. 2. Build platforms and tools allowing contributors to create and clean metadata about these contents in order to build together the broadest cohesive set of knowledge ever available and increase its reusability.

Going in these directions would allow us to:

  • Allow sister projects (and possible new ones) to use relevant content structure for their projects instead of a designed-for-everything wiki markup. It should lead to an increase of their reusability and their user-friendliness, just like what the Structured Metadata on Commons project is aiming for.
  • Build powerful APIs to retrieve and edit content just like Wikidata has and so, make working with partners easier.
  • Increase the connections between our contents and their discovery using their metadata
  • Build better tasked-focused mobile viewing and edit interfaces
  • Be more ready for the possible new environment changes like voice-powered interfaces


Some examples of projects we could work on in order to move in this direction:

  • Use the multiple content revision facility to migrate progressively the data that could be structured out of Wikitext on all our projects (like the structured metadata on Commons project is aiming for files)
  • Federate all our structured content into a "Wikimedia Query Service" that would allow to do unified and powerful analytics and to ease the discoverability of all our contents
  • The logical granularity of Wikisource, Wikibooks and Wikiversity contents (and maybe other projects?) is not the wiki page but the set of wiki pages storing a book or a course. MediaWiki should be able to support such use cases by providing a "collection" system allowing us to add metadata and to do operations (renaming, add to watchlist) on sets of wiki pages.
  • Switch projects that stores fairly structured data in wiki text templates (like Wiktionary or Wikispecies) from a Wikitext storage to a structured one. Build on top of them user interfaces to edit their local contents (and maybe also the relevant data from Wikidata) and provide nice displays and APIs to make humans and machine both able to retrieve these contents.
  • ...

[1] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction#We_will_advance_our_world_through_knowledge.

Leila Zia Infrastructure, Knowledge as a Service, Knowledge Equity, Languages, Oral Knowledge, Research, Strategy, Trust Knowledge as a Service Research, Analytics, and Machine Learning

Title: Knowledge is our direction. What's next?

Combined knowledge as a service (KAS) and knowledge equity (KE) is identified as our strategic direction (draft). We have decided to focus on knowledge in a broader sense and beyond just encyclopedic knowledge, create KE, and become the infrastructure that offers KAS. In this position paper, I offer some of my early thoughts on where we should focus our efforts to move in this strategic direction. Given the limits of word-count, I will not go through the details of research methods and techniques that can be used to address each point.

Knowledge

As the central focus of the strategic direction is knowledge, we need to arrive at a unified working definition of knowledge. English Wikipedia defines knowledge as familiarity, awareness, or understanding of someone or something which is acquired through experience or education, by perceiving, discovering, or learning. This definition, however, is not a working definition that can help us decide what new content to include.

Research on user behavior, needs, and learning patterns can help us define knowledge.

Knowledge equity

Our goal is to remove structural inequalities that limit our ability to represent knowledge from all people and by all people. To this end, we need to meet our users where they are. Today:

  • language is a barrier to sharing in knowledge. Content should be available to our users in their languages.
  • text-only knowledge is a blocker for gathering knowledge, especially from parts of the world that are already left behind. Our systems should become technologically receptive to accepting and allowing editability of new forms of knowledge (e.g., voice for oral knowledge).
  • limits in proficiency and literacy is a blocker for our users. The content and its presentation will need to become a function of these parameters.

Knowledge as a service

Our goal is to offer KAS: both in terms of the infrastructure that supports it as well as the content of it. To do this, we need to:

  • empower our users to learn, create, and go beyond consuming content: Wikimedia projects' talk and discussion pages are an asset for building systems that can help our users think critically and learn how to deliberate. We need to do research to surface this critical thinking and step by step deliberation to gain insights from it, and share it with others as part of our KAS effort.
  • do research and development on building systems where deliberation and decision making can be possible at scale. Today, there is no such system available but one of the building blocks of KAS is infrastructure for discussion, deliberation, and decision making.
  • empower our users with ways to assess the trustworthiness of the content. Trust and reputation become especially important as we move to new forms of knowledge such as oral knowledge. We should do research to build trust and reputation models for Wikimedia and its users and understand how to surface such metrics as measures of reliability of the knowledge we serve.

Secondary

Author Tags Primary Session Secondary Sessions Position Statement
Guillaume Paumier Alternative Interfaces, Knowledge as a Service, Knowledge Equity, New Users Advancing the Contributor Experience Knowledge as a Service

The strategic direction that has emerged has two components: "Knowledge as a service" and "Knowledge equity". "Knowledge as a service", which focuses on infrastructure, seems like the one most related to technology, This proposal is about exploring the less obvious intricacies between the two components, and in particular the technology implications of Knowledge Equity.

As a complex socio-technical system, it's not really possible to separate people from technology when talking about Wikimedia. A direction of Knowledge Equity invites the contributors of the Wikimedia movement to take a critical look at themselves and assess their biases and privileges. This, in turn, can help identify structural biases that have been reproduced and ingrained in our technical platform.

For example, MediaWiki is currently doing a great job at providing a localized interface in many languages. However, beyond language, interaction design and UX patterns seem very specific to Western culture. Similarly, when our strategic direction talks about building strong and diverse communities, this invites us to consider whether the current tools available to contributors enable them to provide an environment where newcomers can experiment, be mentored, and fail safely.

Beyond software, little effort has been invested in exploring alternative interfaces beyond the connected browser. Our primary interface for contribution (the web site) may work well for middle-class contributors from Europe and North America, but isn't necessarily what enables people from other backgrounds or geographies from contributing.

These are some of the topics I would like to bring up for discussion at the Developer Summit.

Subramanya Sastry API, Knowledge as a Service, Schema.org, Structural Semantics, Templates, Wikidata Advancing the Contributor Experience Knowledge as a Service

PROBLEM

To satisfy the 'Knowledge as a service' theme, in addition to providing access to full page content, Wikimedia APIs should provide access to semantic units at: - an abstract document level (sections, headings, tables, etc.) - a domain specific level (infoboxes, geolocation, taxoboxes, etc.)

Wikitext, the core content creation technology on wikis, evolved as a string-processing language where one set of strings is replaced with another set of strings mostly via regular expression matches to yield the output HTML string. There is no notion of document structures here.

This lack of structural semantics gets in the way of being able to robustly identify semantic units and developing tools and features that operate on a page structurally at sub-page granularities.

SOLUTION: TRANSPARENT TYPING LAYER OVER WIKITEXT

Types improve abstraction, reasoning, and tooling abilities in programming languages. A transparent typing layer on top of wikitext can provide similar benefits.

A: ENFORCE STRUCTURAL TYPES ON OUTPUT OF WIKITEXT CONSTRUCTS INCLUDING TEMPLATES AND EXTENSIONS

- Specify that all wikitext constructs have an output with type:

 String, DOM, CSS property, HTML-attribute, or a List of one of those

- Extensions and templates can specify the expected output type. All other

 core wikitext constructs have the DOM output type.

- Parser enforces the output type of all wikitext constructs. Examples:

 For DOM types, unclosed tags and misnested tags are fixed up.
 For String types, HTML tags are escaped, wikitext strings are nowikied.
 For CSS types, values are sanitized.

Among other benefits, this basic typing mechanism enables MediaWiki to provide an API to extract and edit document fragments without introducing adverse side-effects on the rest of the page.

B: UNIFIED TYPING MECHANISM TO EXPOSE DOMAIN-SPECIFIC SEMANTIC INFORMATION

Editors impose structure in documents through a rich library of templates, policies, and maintainance processes they have developed over the years. If this semantic information (infoboxes, navboxes, sports rankings, railway timetables, etc) is mapped to a centralized ontology system (wikidata, schema.org, something else), the parser can expose this information in HTML and via MediaWiki APIs can expose this information in a wiki-neutral way.

There are multiple disparate mechanisms today wherein template authors specify metadata about templates (templatadata, templatestyles, possibly others?)

Instead of creating newer mechanisms for specifying structural output types and semantic information types for templates, it is better to provide a consolidated mechanism that unifies all this template metadata into a single user-defined type declaration. This lets newer applications and capabilities to be developed in the future without code changes to the core mechanism.

FEASIBILITY

This typing layer only affects template authors. Editors that use source editing won't see any impact (besides fewer markup errors). Editors that use visual editing might see improved tooling. Even for template authors, this is meant to be an opt-in mechanism with gradual migration over to the new model.

The proposal here is a logical extension of what Parsoid does today. Parsoid provides an illusion of structured wikitext and demonstrates what is possible (VE, CX, Linter, Flow among others) by embracing structured semantics.

Raimond Spekking Lua, OpenStreetMap, Wikidata Growing the MediaWiki Technical Community Knowledge as a Service

tl;dr: Empower more Wikipedians [1] in using Lua, incl. invoking data from Wikidata, and data from the Data namespace on Commons.

In the past 15 years it was relatively easy for every Wikipedian to create templates, and with the ParserFunctions to write some kind of simple program code.

In 2013 we got Lua as real and powerful programming language. Since a short time it is possible to invoke data from Wikidata and a very short time to store/read data in/from the Data namespace on Wikimedia Commons.

These are all good improvements and increases the possibilities in adding valuable data/information to the (often) high quality content of the German Wikipedia. But these techniques increases the requirements in technical knowledge to the Wikipedians. In other words: Substantial less Wikipedians are able to write Lua module than creating templates.

As an active community member of the German Wikipedia and Wikimedia Commons (and Wikidata) I see a real bottleneck in the situation, that we do not have enough Wikipedians with skills in Lua and the possibitiy how to invoke data from Wikidata & Co.

Checking the Module namespace via RecentChanges for September 2017 on the German Wikipedia: Only 13 Wikipedians contributed to Lua code. Often I read on the Village Pump and related pages: "Sorry, but my to-do-list as volunteer is full until end of the year, I do not have the capacity to solve your problem." and so on. This is very frustrating for all sides.

Other cool [3] projects, like integration of Maps via Kartographer, stucks because the German Wikipedia has not enough man power for these works

As a result of this notice we have to create some help to empower more Wikipedians in using Lua, incl. invoking data from Wikidata, and data from the Data namespace on Commons.

In Germany, Austria, and Switzerland it probably easier than in other countries because we can offer real-live workshops. In 2018 the team of the "Wikipedia:Lokal K" [2], a real-live supporting base for Wikipedia & Co, is planning, with the support of WMDE, some workshops for Wikidata. Same could be done for supporting Lua & Co.

As originator of the "Technical Wishlist" [4] I am thinking about adding support for Lua modules & Co to the wishlist.

On the summit I would like to discuss these and other solutions for this bottleneck.


[1] to be read as users of all sister projects

[2] https://de.wikipedia.org/wiki/Wikipedia:Lokal_K

[3] My POV as active OpenStreetMap community member

[4] https://meta.wikimedia.org/wiki/Community_Tech/Community_Wishlist_Survey_description

   Originated by me in September 2013: https://de.wikipedia.org/wiki/Wikipedia:Technische_W%C3%BCnsche/%C3%9Cber_das_Projekt
Madhumitha Viswanathan API, Complexity, Documentation, Knowledge as a Service, Tools Evolving the MediaWiki Architecture Knowledge as a Service

We are in the business of democratizing knowledge, and I believe that lowering and removing technical barriers to entry, and creating a culture of inclusion in our technical spaces is essential to our success.'

The Knowledge as a Service aspect of our strategic directions focuses on building infrastructure and platforms that help create and share open knowledge. The key to successfully building and scaling such infrastructure, in the context of the Wikimedia Technical spaces, is enabling everyone, irrespective of their experience or backgrounds to be able to utilize and create research, data, and tools on top of our infrastructure. When designing infrastructure and other technical products, we often fail to take into account technical barriers, inessential complexities and social costs that can discourage or prevent people from being able to leverage them. For instance, is it enough to build a dataset and store it in a database, if we do not provide friendly ways for researchers to access and analyze this data? Is it enough to put out a call to contribute to a project, but not provide easy-to-setup development environments to be able to test changes? Is it sufficient to have a state of the art environment to host applications, but not design good, simple processes around gaining access and deploying to them?

These conversations are crucial, because we are not building products for technology's sake, but are in fact trying to build a culture where it is easy to use and contribute to our technical projects, whether you are a volunteer who has a few hours to spare or a paid employee; a newcomer or a long time contributor. We also want our technical communities to be diverse, and these complex systems and processes, and unsaid social constructs around how to interact with our projects, often bias against traditionally underrepresented populations in technology.

I have always worked on or pushed for creating and supporting simple graphical interfaces that provide unified access to data sources, building platforms and processes that lets people just create tools/APIs/dashboards and be able to painlessly host them, developing tutorials and good documentation for getting involved in our projects, and codifying friendly and inclusive social norms and promoting a culture of being excellent to each other in our technical spaces.

When talking about the future directions of new and existing projects, we should take into account the costs and barriers to access, and who we may be failing to include as a result. I hope to be this voice in the Developer Summit.