Wikidata
7 statements.
Author | Tags | Primary Session | Secondary Sessions | Position Statement |
---|---|---|---|---|
Adam Baso | Machine Learning, Mobile, Schema.org, Structured Data, Templates, Wikibase, Wikidata | Knowledge as a Service | Advancing the Contributor Experience, Research, Analytics, and Machine Learning |
Structure Most Things with Schema.orgThe future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, Microsoft, and international equivalents and social networks. We're thankful they extend our reach, even as we seek to help consumers on the platforms join our movement. We could help platform providers, their users, and our users solve problems better through adoption of the open standard Schema.org into Wikipedia pages mapped with templates and, ideally, federated and synchronized Wikidata properties. Benefits:
What would it take? And can this be done in harmony with the existing {{Template}} system? This session will discuss the following:
|
Marius Hoch | Third Parties, Wikidata | Knowledge as a Service |
Making use of Wikidata's knowledge on Wikimedia projects and beyond It is foreseeable that the way our content will be consumed is going to change a lot in the next years, as both the demographics of the internet as well as the devices used by our readers are changing. We should try to adapt to this by offering our content in ways best suited for many user scenarios. In order to achieve this, we need to modularize and structure our content so that it can be easily re-interpreted and used in many different ways. Wikidata gives us the possibility to easily cater for this trend, by providing machine readable data about any subject, which can be formatted and presented in a wide variety of ways and languages. Wikidata makes it possible to more easily maintain up to date information on subjects in many languages, without the burden of manual data maintenance. We should strengthen this by improving the integration of Wikidata with other Wikimedia projects, providing easy ways to use and profit from Wikidata especially for small communities and by making the power of Wikidata more visible. While all Wikimedia communities will profit from this, it can be especially worthwhile for small communities, that currently don't have the resources for managing data, like Infoboxes, themselves. Example projects that will help in this area are the 'ArticlePlaceholder', that allows serving Wikidata-data about a certain subject if there's no article about it. Also the plans for automatic Infoboxes derived solely from Wikidata and other means of using Wikidata-data on Wikimedia projects. While both of these projects can have a big community impact, they need to fit in with the current infrastructure. Also they pose certain new scalability and data presentation challenges that need to be addressed. Furthermore, Wikidata's information should be easy to reuse by third party projects to increase visibility and in order to gain contributions and data donations, making Wikidata the true data hub of the internet. This goal raises longstanding issues with the current Wikimedia dump infrastructure, which is neither very flexible nor does it provide a machine readable interface for data consumers. Also bringing more individual editors and organizations into Wikidata poses various infrastructure and scalability issues coping with the sheer amount of data and changes happening, as well as providing convenient tools for establishing and maintaining data quality. | |
Lucie-Aimée Kaffee | Languages, Machine Learning, Translation, Wikidata | Next Steps for Languages and Cross Project Collaboration | Research, Analytics, and Machine Learning |
Languages in the world of Wikimedia One of the central topics of Wikimedia's world is languages. Currently, we cover around 290 languages in most projects, more or less well covered. In theory, all information in Wikipedia can be replicated and connected, so that different culture's knowledge is interlinked and accessible no matter which language you speak. In reality however, this can be tricky. The authors of [1] show, that even English Wikipedia's content is in big parts not represented in other languages, even in other big Wikipedias. And the other way around: The content in underserved languages is often not covered in English Wikipedia. A possible solution is translation by the community as done with the content translation tool [2]. Nevertheless, that means translation of all language articles into all other languages, which is an effort that's never ending and especially for small language communities barely feasible. And it's not only all about Wikipedia- the other Wikimedia projects will need a similar effort! Another approach for a better coverage of languages in Wikipedia is the ArticlePlaceholder [3]. Using Wikidata's inherently multi- and cross-lingual structure, AP displays data in a readable format on Wikipedias, in their language. However, even Wikipedia has a lack of support for languages as we were able to show in [4]. The question is therefore, how can we get more multilingual data into Wikidata, using the tools and resources we already have, and eventually how to reuse Wikidata's data on Wikipedia and other Wikimedia projects in order to support under-resourced language communities and enable them to access information in their language easier. Accessible content in a language will eventually also mean they are encouraged to contribute to the knowledge. Currently, we investigate machine learning tools in order to support the display of data and the gathering of new multilingual labels for information in Wikidata. It can be assumed, that over the coming years, language accessibility will be one of the key topics for Wikimedia and its projects and it is therefore important to already invest in the topic and enable an exchange about it.
|
Thomas Pellissier Tanon | Analytics, API, Collaboration, JavaScript, Lua, Mobile, Multimedia, Structured Data, User Experience, Wikibase, Wikidata | Knowledge as a Service |
Title: Content structuration and metadata are keys to fulfil our strategy Content: The Wikimedia mouvement strategy is making a focus on serving more different kinds of knowledge and sharing them with allies and partners [1]. I believe that the most important ground work for reaching these goals is to focus on the outgoing project of moving MediaWiki from a "wikitext plus media file" collaboration system to a platform allowing people to be able to collaborate on many kind of contents and to organise them in a cohesive way. Two axes seem important to me to pursue this goal: 1. Support a broader set of different contents, not just wikitext, Wikibase items and Lua/JavaScript/CSS contents but also images, sounds, movies, books, etc., that should bee editable just like wiki text pages in order to allow people to improve them in a collaborative way. 2. Build platforms and tools allowing contributors to create and clean metadata about these contents in order to build together the broadest cohesive set of knowledge ever available and increase its reusability. Going in these directions would allow us to:
| |
Lydia Pintscher | Collaboration, Wikidata | Next Steps for Languages and Cross Project Collaboration |
Breaking Down Barriers To Cross-Project Collaboration
Wikipedia has been our flagship for many years now and our main means of achieving our vision. As information consumption and expectations of our readers are changing, Wikipedia needs to adapt. One crucial building block for this adaption is re-using and integrating more content from the other Wikimedia projects and other language versions of Wikipedia. Connecting our projects more is vital for helping especially our smaller communities serve their readers. Surfacing more content from the other Wikimedia projects also gives them a chance to shine, find their audience and do their part in sharing in the sum of all knowledge. This integration comes with a lot of challenges. Over many years the Wikipedias have lived largely independent from each other and the other Wikimedia projects. This is changing. Sharing and benefiting for example from data on Wikidata means collaborating with people from potentially very different projects, speaking different languages. It brings a perceived loss of local control. Editors see them-self first and foremost as editors of "their" Wikipedia at the moment and often don't perceive this integration as worth the effort - especially on the larger projects. We need to address this on both the social and technical level if we want to bring our projects closer together and have them benefit from each other's strength and compensate their weaknesses. We need to think about and find answers to these questions: What can we do in order to bring our projects together more closely? How can we help break down perceived and real barriers for cross-project work? What can we do to make cross-project collaboration easier? | |
Subramanya Sastry | API, Knowledge as a Service, Schema.org, Structural Semantics, Templates, Wikidata | Advancing the Contributor Experience | Knowledge as a Service |
PROBLEM To satisfy the 'Knowledge as a service' theme, in addition to providing access to full page content, Wikimedia APIs should provide access to semantic units at: - an abstract document level (sections, headings, tables, etc.) - a domain specific level (infoboxes, geolocation, taxoboxes, etc.) Wikitext, the core content creation technology on wikis, evolved as a string-processing language where one set of strings is replaced with another set of strings mostly via regular expression matches to yield the output HTML string. There is no notion of document structures here. This lack of structural semantics gets in the way of being able to robustly identify semantic units and developing tools and features that operate on a page structurally at sub-page granularities. SOLUTION: TRANSPARENT TYPING LAYER OVER WIKITEXT Types improve abstraction, reasoning, and tooling abilities in programming languages. A transparent typing layer on top of wikitext can provide similar benefits. A: ENFORCE STRUCTURAL TYPES ON OUTPUT OF WIKITEXT CONSTRUCTS INCLUDING TEMPLATES AND EXTENSIONS - Specify that all wikitext constructs have an output with type: String, DOM, CSS property, HTML-attribute, or a List of one of those - Extensions and templates can specify the expected output type. All other core wikitext constructs have the DOM output type. - Parser enforces the output type of all wikitext constructs. Examples: For DOM types, unclosed tags and misnested tags are fixed up. For String types, HTML tags are escaped, wikitext strings are nowikied. For CSS types, values are sanitized. Among other benefits, this basic typing mechanism enables MediaWiki to provide an API to extract and edit document fragments without introducing adverse side-effects on the rest of the page. B: UNIFIED TYPING MECHANISM TO EXPOSE DOMAIN-SPECIFIC SEMANTIC INFORMATION Editors impose structure in documents through a rich library of templates, policies, and maintainance processes they have developed over the years. If this semantic information (infoboxes, navboxes, sports rankings, railway timetables, etc) is mapped to a centralized ontology system (wikidata, schema.org, something else), the parser can expose this information in HTML and via MediaWiki APIs can expose this information in a wiki-neutral way. There are multiple disparate mechanisms today wherein template authors specify metadata about templates (templatadata, templatestyles, possibly others?) Instead of creating newer mechanisms for specifying structural output types and semantic information types for templates, it is better to provide a consolidated mechanism that unifies all this template metadata into a single user-defined type declaration. This lets newer applications and capabilities to be developed in the future without code changes to the core mechanism. FEASIBILITY This typing layer only affects template authors. Editors that use source editing won't see any impact (besides fewer markup errors). Editors that use visual editing might see improved tooling. Even for template authors, this is meant to be an opt-in mechanism with gradual migration over to the new model. The proposal here is a logical extension of what Parsoid does today. Parsoid provides an illusion of structured wikitext and demonstrates what is possible (VE, CX, Linter, Flow among others) by embracing structured semantics. |
Raimond Spekking | Lua, OpenStreetMap, Wikidata | Growing the MediaWiki Technical Community | Knowledge as a Service |
tl;dr: Empower more Wikipedians [1] in using Lua, incl. invoking data from Wikidata, and data from the Data namespace on Commons. In the past 15 years it was relatively easy for every Wikipedian to create templates, and with the ParserFunctions to write some kind of simple program code. In 2013 we got Lua as real and powerful programming language. Since a short time it is possible to invoke data from Wikidata and a very short time to store/read data in/from the Data namespace on Wikimedia Commons. These are all good improvements and increases the possibilities in adding valuable data/information to the (often) high quality content of the German Wikipedia. But these techniques increases the requirements in technical knowledge to the Wikipedians. In other words: Substantial less Wikipedians are able to write Lua module than creating templates. As an active community member of the German Wikipedia and Wikimedia Commons (and Wikidata) I see a real bottleneck in the situation, that we do not have enough Wikipedians with skills in Lua and the possibitiy how to invoke data from Wikidata & Co. Checking the Module namespace via RecentChanges for September 2017 on the German Wikipedia: Only 13 Wikipedians contributed to Lua code. Often I read on the Village Pump and related pages: "Sorry, but my to-do-list as volunteer is full until end of the year, I do not have the capacity to solve your problem." and so on. This is very frustrating for all sides. Other cool [3] projects, like integration of Maps via Kartographer, stucks because the German Wikipedia has not enough man power for these works As a result of this notice we have to create some help to empower more Wikipedians in using Lua, incl. invoking data from Wikidata, and data from the Data namespace on Commons. In Germany, Austria, and Switzerland it probably easier than in other countries because we can offer real-live workshops. In 2018 the team of the "Wikipedia:Lokal K" [2], a real-live supporting base for Wikipedia & Co, is planning, with the support of WMDE, some workshops for Wikidata. Same could be done for supporting Lua & Co. As originator of the "Technical Wishlist" [4] I am thinking about adding support for Lua modules & Co to the wishlist. On the summit I would like to discuss these and other solutions for this bottleneck.
[1] to be read as users of all sister projects [2] https://de.wikipedia.org/wiki/Wikipedia:Lokal_K [3] My POV as active OpenStreetMap community member [4] https://meta.wikimedia.org/wiki/Community_Tech/Community_Wishlist_Survey_description Originated by me in September 2013: https://de.wikipedia.org/wiki/Wikipedia:Technische_W%C3%BCnsche/%C3%9Cber_das_Projekt |