Wikidata


7 statements.

Each entry below lists the author, tags, primary session, and secondary sessions, followed by the position statement.
Author: Adam Baso · Tags: Machine Learning, Mobile, Schema.org, Structured Data, Templates, Wikibase, Wikidata · Primary Session: Knowledge as a Service · Secondary Sessions: Advancing the Contributor Experience; Research, Analytics, and Machine Learning

Structure Most Things with Schema.org

The future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, Microsoft, and their international equivalents, as well as social networks. We're thankful they extend our reach, even as we seek to help consumers on those platforms join our movement.

We could help platform providers, their users, and our users solve problems better by adopting the open Schema.org standard in Wikipedia pages, mapped via templates and, ideally, federated and synchronized with Wikidata properties.
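To make the template-to-Schema.org mapping concrete, here is a minimal Lua (Scribunto) sketch of the idea; the module name, parameter names, and mapping table are invented for illustration, and how the resulting JSON-LD would actually be injected into the rendered page is one of the open implementation questions discussed below.

  -- Module:SchemaOrgSketch (hypothetical name): map a few infobox-style
  -- template parameters onto Schema.org properties and emit JSON-LD.
  local p = {}

  -- Invented mapping: template parameter name -> Schema.org property.
  local paramToSchema = {
      name        = 'name',
      birth_date  = 'birthDate',
      birth_place = 'birthPlace',
      website     = 'url',
  }

  function p.jsonLd( frame )
      -- Arguments of the calling template (fall back to direct arguments).
      local args = ( frame:getParent() or frame ).args
      local doc = {
          ['@context'] = 'https://schema.org',
          ['@type']    = 'Person',          -- the entity type would itself be mapped
      }
      for param, schemaProperty in pairs( paramToSchema ) do
          if args[param] and args[param] ~= '' then
              doc[schemaProperty] = args[param]
          end
      end
      -- Return the JSON-LD text; actually placing it into the page HTML
      -- (and keeping it in sync with Wikidata) needs parser/extension support.
      return mw.text.jsonEncode( doc )
  end

  return p

Whether such a mapping is derived centrally, declared by template authors, or synthesized from Wikidata is exactly what the questions below try to settle.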

Benefits:

  1. Wikipedia will have even better presentation and placement in search engines and other data-rich experiences.
  2. We provide an opportunity for a more consistent data model for template authors and the people and bots filling in template values. The richly defined Schema.org entities provide a good target for all entities represented in the Wikipedia/Wikimedia corpora, and standardization can reduce duplication of effort and inconsistencies.
  3. We introduce an easier vector for mobile contribution, which could include simpler and different data entry, mapping, and modeling.
  4. We can elevate an open standard and push its adoption forward while increasing the movement's standing in the open standards community.
  5. Schema.org-compliant data is more amenable to machine learning models that cover data structures, the relations between entities, and the dynamics of sociotechnical systems. This could bolster practical applications like vandalism detection, coverage analysis, and much more.
  6. This might give the education sector a means to teach students about knowledge creation, data modeling, and more. It might also afford scientists and other practitioners a further standardized way to model the knowledge in their fields.

What would it take? And can this be done in harmony with the existing {{Template}} system?

This session will discuss the following:

  1. Are we aligned on the benefits, and which ones?
  2. Implementation options.
    1. Can we extend templates so they could be mapped to Schema.org?
      1. Would it be okay to derive the mapping by manual and automated analysis at WMF/WMDE and apply it behind the scenes? Would that be sustainable?
      2. Could we make it easy for template authors to mark up their templates for Schema.org compatibility and have some level of enforcement? Could Schema.org attributes and entity types be autosuggested for template creators?
    2. Is it easy to relate most existing and proposed Wikidata entity types and properties to existing Schema.org entities and properties?
    3. What would it take to streamline MCR (multi-content revision) Schema.org data structures, or MCR Wikibase property clusters mapped to Schema.org, on defined entity types?
    4. Furthermore, if we can do #1 and #2, what's to prevent us from letting templates, as they are, merely be the interface for Schema.org-compliant Wikibase entities and properties (e.g., by duck typing / autosynthesis)?
    5. How could we bidirectionally synchronize between Wikipedia and Wikidata, with confidence, in a way compatible with patroller expectations? What storage and event processing would be needed? Can the systems be scaled to accommodate the arrival of real-time and increasingly fine-grained information?
Author: Marius Hoch · Tags: Third Parties, Wikidata · Primary Session: Knowledge as a Service

Making use of Wikidata's knowledge on Wikimedia projects and beyond

It is foreseeable that the way our content is consumed will change a lot in the coming years, as both the demographics of the internet and the devices used by our readers change. We should adapt to this by offering our content in ways best suited to many user scenarios. In order to achieve this, we need to modularize and structure our content so that it can easily be re-interpreted and used in many different ways.

Wikidata allows us to cater to this trend by providing machine-readable data about any subject, which can be formatted and presented in a wide variety of ways and languages. Wikidata makes it possible to maintain up-to-date information on subjects in many languages without the burden of duplicated manual data maintenance.
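As a small illustration of what this reuse already looks like in a wiki module, here is a minimal Lua (Scribunto) sketch using the Wikibase client library (mw.wikibase); P569 ("date of birth") is only an example property, and error handling is kept to the bare minimum.

  -- Minimal sketch of pulling a Wikidata value into a page, assuming the
  -- Scribunto Wikibase client library (mw.wikibase) is available.
  local p = {}

  function p.birthDate( frame )
      -- The item connected to the current page via its sitelink.
      local entity = mw.wikibase.getEntity()
      if not entity then
          return ''                     -- page has no connected Wikidata item
      end
      -- Render the values of P569 ("date of birth") in the local wiki's
      -- language and formatting conventions.
      local formatted = entity:formatPropertyValues( 'P569' )
      return formatted and formatted.value or ''
  end

  return p

The value is maintained once on Wikidata and rendered in whichever language and format the local wiki uses, which is exactly the property described above.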

We should strengthen this by improving the integration of Wikidata with the other Wikimedia projects, by providing easy ways to use and profit from Wikidata, and by making the power of Wikidata more visible. While all Wikimedia communities will profit from this, it can be especially worthwhile for small communities that currently don't have the resources to manage data, such as infoboxes, themselves.

Example projects that will help in this area are the ArticlePlaceholder, which serves Wikidata data about a subject when there is no article about it, and the planned automatic infoboxes derived solely from Wikidata, along with other means of using Wikidata data on Wikimedia projects. While both of these projects can have a big community impact, they need to fit in with the current infrastructure, and they pose new scalability and data-presentation challenges that need to be addressed.

Furthermore, Wikidata's information should be easy for third-party projects to reuse, in order to increase visibility and to gain contributions and data donations, making Wikidata the true data hub of the internet. This goal raises longstanding issues with the current Wikimedia dump infrastructure, which is neither very flexible nor provides a machine-readable interface for data consumers. Bringing more individual editors and organizations into Wikidata also poses infrastructure and scalability challenges: coping with the sheer amount of data and changes, and providing convenient tools for establishing and maintaining data quality.

Author: Lucie-Aimée Kaffee · Tags: Languages, Machine Learning, Translation, Wikidata · Primary Session: Next Steps for Languages and Cross Project Collaboration · Secondary Sessions: Research, Analytics, and Machine Learning

Languages in the world of Wikimedia

One of the central topics of Wikimedia's world is languages. Currently, our projects cover around 290 languages, some better than others. In theory, all information in Wikipedia can be replicated and connected, so that knowledge from different cultures is interlinked and accessible no matter which language you speak. In reality, however, this is tricky. The authors of [1] show that large parts of English Wikipedia's content are not represented in other languages, even in other big Wikipedias. And the other way around: the content in underserved languages is often not covered in English Wikipedia.

A possible solution is translation by the community, as done with the content translation tool [2]. Nevertheless, that means translating articles from every language into every other language, an effort that never ends and that is barely feasible for small language communities in particular. And it is not only about Wikipedia: the other Wikimedia projects will need a similar effort. Another approach to better language coverage in Wikipedia is the ArticlePlaceholder [3]. Using Wikidata's inherently multi- and cross-lingual structure, the ArticlePlaceholder displays data in a readable format on Wikipedias, in their language. However, even Wikidata lacks support for many languages, as we were able to show in [4].

The question is therefore how we can get more multilingual data into Wikidata using the tools and resources we already have, and eventually how to reuse Wikidata's data on Wikipedia and the other Wikimedia projects in order to support under-resourced language communities and make it easier for them to access information in their language. Content that is accessible in a language also encourages its speakers to contribute to the knowledge. We are currently investigating machine learning tools to support the display of data and the gathering of new multilingual labels for information in Wikidata. Over the coming years, language accessibility will likely be one of the key topics for Wikimedia and its projects, and it is therefore important to invest in it now and enable an exchange about it.
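To illustrate how Wikidata's multilingual labels can serve an under-resourced language, here is a hedged Lua (Scribunto) sketch of a simple label-fallback chain; the chain, the function name, and the module shape are invented for illustration, and the real ArticlePlaceholder logic is considerably more involved.

  -- Sketch: pick a label for an item in the reader's language, falling back
  -- through a hypothetical chain of related languages before giving up.
  local p = {}

  local fallbackChain = { 'gsw', 'de', 'en' }   -- example chain for Alemannic

  function p.labelFor( itemId, preferredLang )
      local entity = mw.wikibase.getEntity( itemId )
      if not entity or not entity.labels then
          return nil
      end
      -- Try the reader's language first, then each fallback language in order.
      local candidates = { preferredLang }
      for _, lang in ipairs( fallbackChain ) do
          table.insert( candidates, lang )
      end
      for _, lang in ipairs( candidates ) do
          local label = entity.labels[lang]
          if label then
              return label.value, lang
          end
      end
      return nil
  end

  return p

The labels that such a fallback fails to find are precisely the gaps that the machine learning work mentioned above could help fill.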


[1] Hecht, B., & Gergle, D. (2010, April). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 291-300). ACM.

[2] https://en.wikipedia.org/wiki/Wikipedia:Content_translation_tool

[3] https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf

[4] https://eprints.soton.ac.uk/413433/

Author: Thomas Pellissier Tanon · Tags: Analytics, API, Collaboration, JavaScript, Lua, Mobile, Multimedia, Structured Data, User Experience, Wikibase, Wikidata · Primary Session: Knowledge as a Service

Content structure and metadata are key to fulfilling our strategy

The Wikimedia movement strategy focuses on serving more kinds of knowledge and sharing them with allies and partners [1]. I believe that the most important groundwork for reaching these goals is to focus on the ongoing project of moving MediaWiki from a "wikitext plus media files" collaboration system to a platform that allows people to collaborate on many kinds of content and to organise them in a cohesive way.

Two axes seem important to me in pursuing this goal:

  1. Support a broader set of content types: not just wikitext, Wikibase items, and Lua/JavaScript/CSS, but also images, sounds, movies, books, etc. These should be editable just like wikitext pages, so that people can improve them collaboratively.
  2. Build platforms and tools that allow contributors to create and clean metadata about this content, in order to build together the broadest cohesive set of knowledge ever available and to increase its reusability.

Going in these directions would allow us to:

  • Allow sister projects (and possibly new ones) to use content structures relevant to their projects instead of a designed-for-everything wiki markup. This should increase their reusability and user-friendliness, just like what the Structured Metadata on Commons project is aiming for.
  • Build powerful APIs to retrieve and edit content, just like Wikidata has, and so make working with partners easier.
  • Increase the connections between our contents, and improve their discovery, using their metadata.
  • Build better task-focused mobile viewing and editing interfaces.
  • Be better prepared for possible new environments like voice-powered interfaces.


Some examples of projects we could work on in order to move in this direction:

  • Use the multi-content revision (MCR) facility to progressively migrate data that can be structured out of wikitext on all our projects (as the structured metadata on Commons project aims to do for files).
  • Federate all our structured content into a "Wikimedia Query Service" that would allow unified and powerful analytics and ease the discoverability of all our content.
  • The logical granularity of Wikisource, Wikibooks, and Wikiversity content (and maybe other projects?) is not the wiki page but the set of wiki pages storing a book or a course. MediaWiki should support such use cases by providing a "collection" system that allows us to add metadata and perform operations (renaming, adding to a watchlist) on sets of wiki pages.
  • Switch projects that store fairly structured data in wikitext templates (like Wiktionary or Wikispecies) from wikitext storage to structured storage. Build user interfaces on top of them to edit their local content (and maybe also the relevant data from Wikidata), and provide nice displays and APIs so that both humans and machines can retrieve this content.
  • ...

[1] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction#We_will_advance_our_world_through_knowledge.

Author: Lydia Pintscher · Tags: Collaboration, Wikidata · Primary Session: Next Steps for Languages and Cross Project Collaboration

Breaking Down Barriers To Cross-Project Collaboration


"... a world in which every single human being can freely share in the sum of all knowledge."

Wikipedia has been our flagship for many years now and our main means of achieving our vision. As information consumption and the expectations of our readers change, Wikipedia needs to adapt. One crucial building block for this adaptation is re-using and integrating more content from the other Wikimedia projects and from other language versions of Wikipedia. Connecting our projects more closely is vital for helping especially our smaller communities serve their readers. Surfacing more content from the other Wikimedia projects also gives them a chance to shine, find their audience, and do their part in sharing in the sum of all knowledge.

This integration comes with a lot of challenges. For many years the Wikipedias have lived largely independently from each other and from the other Wikimedia projects. This is changing. Sharing and benefiting, for example, from data on Wikidata means collaborating with people from potentially very different projects who speak different languages. It brings a perceived loss of local control. Editors currently see themselves first and foremost as editors of "their" Wikipedia and often don't perceive this integration as worth the effort - especially on the larger projects.

We need to address this on both the social and the technical level if we want to bring our projects closer together and have them benefit from each other's strengths and compensate for each other's weaknesses.

We need to think about and find answers to these questions: What can we do in order to bring our projects together more closely? How can we help break down perceived and real barriers for cross-project work? What can we do to make cross-project collaboration easier?

Author: Subramanya Sastry · Tags: API, Knowledge as a Service, Schema.org, Structural Semantics, Templates, Wikidata · Primary Session: Advancing the Contributor Experience · Secondary Sessions: Knowledge as a Service

PROBLEM

To satisfy the 'Knowledge as a service' theme, in addition to providing access to full page content, Wikimedia APIs should provide access to semantic units at:

  • an abstract document level (sections, headings, tables, etc.)
  • a domain-specific level (infoboxes, geolocation, taxoboxes, etc.)

Wikitext, the core content creation technology on wikis, evolved as a string-processing language where one set of strings is replaced with another set of strings mostly via regular expression matches to yield the output HTML string. There is no notion of document structures here.

This lack of structural semantics gets in the way of being able to robustly identify semantic units and developing tools and features that operate on a page structurally at sub-page granularities.

SOLUTION: TRANSPARENT TYPING LAYER OVER WIKITEXT

Types improve abstraction, reasoning, and tooling abilities in programming languages. A transparent typing layer on top of wikitext can provide similar benefits.

A: ENFORCE STRUCTURAL TYPES ON OUTPUT OF WIKITEXT CONSTRUCTS INCLUDING TEMPLATES AND EXTENSIONS

  • Specify that every wikitext construct has an output with a type: String, DOM, CSS property, HTML attribute, or a List of one of these.
  • Extensions and templates can declare their expected output type. All other core wikitext constructs have the DOM output type.
  • The parser enforces the output type of all wikitext constructs. For example: for DOM types, unclosed and misnested tags are fixed up; for String types, HTML tags are escaped and wikitext strings are nowikied; for CSS types, values are sanitized.

Among other benefits, this basic typing mechanism enables MediaWiki to provide an API to extract and edit document fragments without introducing adverse side-effects on the rest of the page.
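A hedged Lua sketch of what the simplest enforcement step could look like is given below; the dispatch table and function names are invented, only the String case uses a real Scribunto call (mw.text.nowiki), and the DOM and CSS handlers are pass-through stand-ins for the real fix-up and sanitization passes.

  -- Illustrative only: none of this is an existing MediaWiki API.
  local function fixupDom( html )
      -- A real implementation would rebuild a well-formed tree, closing
      -- unclosed tags and repairing misnesting, as HTML5 tree building does.
      return html
  end

  local function sanitizeCss( value )
      -- A real implementation would whitelist CSS properties and values.
      return value
  end

  local enforce = {
      -- String output: escape markup so it cannot leak into the document.
      String = function ( text ) return mw.text.nowiki( text ) end,
      DOM    = fixupDom,
      CSS    = sanitizeCss,
  }

  -- Apply the declared output type of a template or extension to its raw output.
  local function enforceOutputType( declaredType, rawOutput )
      local handler = enforce[declaredType]
      return handler and handler( rawOutput ) or rawOutput
  end

  return enforceOutputType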

B: UNIFIED TYPING MECHANISM TO EXPOSE DOMAIN-SPECIFIC SEMANTIC INFORMATION

Editors impose structure on documents through a rich library of templates, policies, and maintenance processes that they have developed over the years. If this semantic information (infoboxes, navboxes, sports rankings, railway timetables, etc.) is mapped to a centralized ontology system (Wikidata, Schema.org, something else), the parser can expose this information in HTML, and MediaWiki APIs can expose it in a wiki-neutral way.

There are multiple disparate mechanisms today by which template authors specify metadata about templates (TemplateData, TemplateStyles, possibly others?).

Instead of creating new mechanisms for specifying structural output types and semantic information types for templates, it is better to provide a consolidated mechanism that unifies all this template metadata into a single user-defined type declaration. This lets newer applications and capabilities be developed in the future without code changes to the core mechanism.
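To make the idea concrete, a consolidated declaration might look something like the following Lua table; every field name is hypothetical and simply folds together what TemplateData-style parameter metadata, a TemplateStyles pointer, a structural output type (part A), and a Schema.org/Wikidata mapping might each want to say about one template.

  -- Hypothetical consolidated type declaration for a single template.
  -- None of these fields exist today; they only sketch how output typing,
  -- parameter metadata, and semantic mapping could live in one declaration.
  local infoboxPersonType = {
      outputType = 'DOM',                 -- structural type of the expansion (part A)
      semantics  = {
          ontology   = 'schema.org',      -- could equally be a Wikidata class
          entityType = 'Person',
      },
      params = {
          name       = { type = 'String', required = true,  mapsTo = 'name' },
          birth_date = { type = 'String', required = false, mapsTo = 'birthDate' },
          website    = { type = 'String', required = false, mapsTo = 'url' },
      },
      styles = 'Template:Infobox_person/styles.css',  -- TemplateStyles-like pointer
  }

  return infoboxPersonType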

FEASIBILITY

This typing layer only affects template authors. Editors that use source editing won't see any impact (besides fewer markup errors). Editors that use visual editing might see improved tooling. Even for template authors, this is meant to be an opt-in mechanism with gradual migration over to the new model.

The proposal here is a logical extension of what Parsoid does today. Parsoid provides an illusion of structured wikitext and demonstrates what is possible (VE, CX, Linter, Flow among others) by embracing structured semantics.

Author: Raimond Spekking · Tags: Lua, OpenStreetMap, Wikidata · Primary Session: Growing the MediaWiki Technical Community · Secondary Sessions: Knowledge as a Service

tl;dr: Empower more Wikipedians [1] to use Lua, including invoking data from Wikidata and data from the Data namespace on Commons.

Over the past 15 years it has been relatively easy for every Wikipedian to create templates and, with ParserFunctions, to write some kind of simple program code.

In 2013 we got Lua as a real and powerful programming language. For some time now it has been possible to invoke data from Wikidata, and since very recently to store and read data in the Data namespace on Wikimedia Commons.

These are all good improvements that increase the possibilities for adding valuable data and information to the (often) high-quality content of the German Wikipedia. But these techniques also raise the technical knowledge required of Wikipedians. In other words: substantially fewer Wikipedians are able to write Lua modules than to create templates.
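To illustrate the barrier, even a small module that combines both data sources already demands some programming fluency. The following is only a sketch, assuming the Wikibase client library (mw.wikibase) and the JsonConfig tabular-data interface (mw.ext.data.get); the tabular page name and column layout are invented.

  -- Sketch: combine a label from Wikidata with a row from a tabular Data:
  -- page on Commons. 'Example_population.tab' is an invented page name.
  local p = {}

  function p.populationRow( frame )
      local itemId = frame.args[1]                   -- e.g. 'Q64' (Berlin)
      local label  = mw.wikibase.getLabel( itemId ) or itemId

      -- Tabular data stored in the Data namespace on Commons (JsonConfig).
      local tab = mw.ext.data.get( 'Example_population.tab' )
      for _, row in ipairs( tab.data ) do
          -- Assumed column layout: { item id, year, population }.
          if row[1] == itemId then
              return string.format( '%s: %s (as of %s)',
                  label, tostring( row[3] ), tostring( row[2] ) )
          end
      end
      return label .. ': no data found'
  end

  return p

Each of these calls is routine for a programmer but opaque to a template-only author, which is exactly the gap the workshops described below are meant to close.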

As an active community member of the German Wikipedia and Wikimedia Commons (and Wikidata), I see a real bottleneck in the fact that we do not have enough Wikipedians with skills in Lua and knowledge of how to invoke data from Wikidata & Co.

Checking the Module namespace via RecentChanges for September 2017 on the German Wikipedia: only 13 Wikipedians contributed Lua code. Often I read on the Village Pump and related pages: "Sorry, but my to-do list as a volunteer is full until the end of the year; I do not have the capacity to solve your problem," and so on. This is very frustrating for all sides.

Other cool [3] projects, like the integration of maps via Kartographer, are stuck because the German Wikipedia does not have enough people for this work.

As a result, we have to create support to empower more Wikipedians to use Lua, including invoking data from Wikidata and data from the Data namespace on Commons.

In Germany, Austria, and Switzerland it is probably easier than in other countries because we can offer real-life workshops. In 2018 the team of "Wikipedia:Lokal K" [2], a real-life supporting base for Wikipedia & Co., is planning, with the support of WMDE, some workshops for Wikidata. The same could be done to support Lua & Co.

As originator of the "Technical Wishlist" [4] I am thinking about adding support for Lua modules & Co to the wishlist.

At the summit I would like to discuss these and other solutions to this bottleneck.


[1] to be read as users of all sister projects

[2] https://de.wikipedia.org/wiki/Wikipedia:Lokal_K

[3] My POV as active OpenStreetMap community member

[4] https://meta.wikimedia.org/wiki/Community_Tech/Community_Wishlist_Survey_description

   Originated by me in September 2013: https://de.wikipedia.org/wiki/Wikipedia:Technische_W%C3%BCnsche/%C3%9Cber_das_Projekt