Structured Data
5 statements.
Author | Tags | Primary Session | Secondary Sessions | Position Statement |
---|---|---|---|---|
Adam Baso | Machine Learning, Mobile, Schema.org, Structured Data, Templates, Wikibase, Wikidata | Knowledge as a Service | Advancing the Contributor Experience, Research, Analytics, and Machine Learning |
Structure Most Things with Schema.orgThe future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, Microsoft, and international equivalents and social networks. We're thankful they extend our reach, even as we seek to help consumers on the platforms join our movement. We could help platform providers, their users, and our users solve problems better through adoption of the open standard Schema.org into Wikipedia pages mapped with templates and, ideally, federated and synchronized Wikidata properties. Benefits:
What would it take? And can this be done in harmony with the existing {{Template}} system? This session will discuss the following:
|
Erik Bernhardson | Analytics, Collaboration, Machine Learning, Open Source, Privacy, Structured Data | Research, Analytics, and Machine Learning |
Title: Empowering Editors with Machine Learning Background: Advances in machine learning, powered by open source libraries, is becoming the foundational backbone of technology organizations the world over. Many tedious, time consuming, tasks that previously required 100% human involvement can now be augmented with human in the loop machine learning to empower editors to get more done with the limited time they have available to contribute to the sum of all human knowledge. Advice: 1) Invest directly in applying known quantity machine learning, such as pre-trained ImageNet classifiers, to add structured data to our multimedia repositories to increase their discoverability. Perhaps via tools that provide editors with lists of appropriate items that they can easily click to add if appropriate to the multimedia. 2) Engage academia to work with Wikimedia data sets and employ developers to move the most promising results from research into production. There is already a significant amount of work being done in academia to test and evaluate machine learning with our data sets, but little to none of that work ever makes it back into Wikimedia sites. With more focus on collaboration we can encourage research that is specifically applicable to deployment goals. 3)Wikimedia has the ability to collect significant amounts of implicit user data via browsing sessions, searches, watchlists, editing histories, etc. that can be used for machine learning purposes. We need to be continuously thoughtful of the privacy implications of how we use this data. | |
Anne Gomez | Multimedia, Strategy, Structured Data | Advancing the Contributor Experience |
Wikimedia properties need to keep pace with the norms of browsing and information consuming behavior to stay relevant, grow readership, and bring new editors to add their knowledge to repository. We need to support smaller content types - both for contributions and for consumption. At the same time, we need to support multimedia content, from video to interactive graphics to augmented reality. Structured data will allow us to be more flexible in our presentation of information, and create more complex interactions with that information. Video and audio will open the doors to new contributors and new projects. Content consumers online now, whether among the highly connected or using the internet for the first time, are looking for the right information available to them at the right time. They don't necessarily want long, encyclopedic content, but instead prefer snippets of information served to them just when they need it. And they learn through more immersive experiences - video, augmented reality, interactive graphics - rather than long form text. Even beyond that, huge portions of the world can't access our content for a number of reasons: they don't have internet access, they can't read, their languages don't have keyboard support, there isn't content in their language. The internet as a whole is evolving to meet these changing needs. Messaging apps support walkie-talkie like communication, Google serves just the right answer to any question (in English), and language support for smaller languages is growing cross-platform. Our infrastructure needs to meet these needs. | |
Ryan Kaldari | Gadgets, Languages, Structured Data, Wiktionary | Next Steps for Languages and Cross Project Collaboration |
How should MediaWiki evolve to support the mission? One of the greatest barriers to the spread of human knowledge is the barrier of language. While Wikipedia does a great job of supporting hundreds of languages, the amount of content available in most language Wikipedias is still paltry and has a small impact on the knowledge available to speakers of those languages. For a huge percentage of the world's population, the key to unlocking knowledge isn't discovering Wikipedia, but learning new languages. Even for English speakers, the impact of learning a new language can be life-changing and open up many new opportunities. The Wikimedia Foundation is the steward of one of the greatest repositories of information about language in human history, Wiktionary. Unlike all other dictionaries on Earth, Wiktionary aims to define (in 172 languages) all words from all languages. In other words, not just defining English words in English and French words in French, but also French words in English, English words in French, Latin words in Swahili, Mopan Maya words in Arabic, etc. It's ambitious aim is to be the ultimate Rosetta Stone for the human species. While Wikipedia is in some respects maturing and gradually yielding diminishing returns for more investment, Wiktionary is still a small and growing project that has yet to fulfill its potential or break into mainstream consciousness the way that Wikipedia has. While one of the impediments to Wiktionary reaching its potential is lack of structured data support, which is being worked on, there are many improvements that could be made in the meantime to improve the usefulness of the site to both readers and editors. These include converting many of the fragile gadgets and site scripts into maintainable extensions, customizing the user interface to more closely match what users expect from a dictionary site, and adding dictionary-specific tools to the editing interface. There is also unexplored potential with building apps around the Wiktionary data, including apps tailored around language learning. Now that the Wikimedia Foundation has nearly 100 software engineers (and dozens of volunteer developers), it should explore the potential of its lesser known projects, especially Wiktionary, which has the potential to actually make a large impact on the Foundation's mission and bring more of the sum of human knowledge to more people around the globe. | |
Thomas Pellissier Tanon | Analytics, API, Collaboration, JavaScript, Lua, Mobile, Multimedia, Structured Data, User Experience, Wikibase, Wikidata | Knowledge as a Service |
Title: Content structuration and metadata are keys to fulfil our strategy Content: The Wikimedia mouvement strategy is making a focus on serving more different kinds of knowledge and sharing them with allies and partners [1]. I believe that the most important ground work for reaching these goals is to focus on the outgoing project of moving MediaWiki from a "wikitext plus media file" collaboration system to a platform allowing people to be able to collaborate on many kind of contents and to organise them in a cohesive way. Two axes seem important to me to pursue this goal: 1. Support a broader set of different contents, not just wikitext, Wikibase items and Lua/JavaScript/CSS contents but also images, sounds, movies, books, etc., that should bee editable just like wiki text pages in order to allow people to improve them in a collaborative way. 2. Build platforms and tools allowing contributors to create and clean metadata about these contents in order to build together the broadest cohesive set of knowledge ever available and increase its reusability. Going in these directions would allow us to:
|