Property:Statement
From devsummit
W
Wikimedia should diversify its distribution methods.
Currently Wikimedia distributes its content almost exclusively using
the Internet. However, the Internet is controlled by gatekeepers
in the form of governments and ISPs. While historically these
entities rarely controlled the flow of information, more recently
we have seen an increase in censorship, particularly by governments.
Since Wikimedia is distributed almost entirely over the Internet, we
are vulnerable to their whims.
The risk of having our distribution lines interfered with is an
existential threat to our mission. While at present
only a few geographic locations practise such interference, the
future is unknowable and does not appear to be heading in a comforting
direction.
Furthermore, in the face of such interference, there is very little we
can do. Tor is often spoken of as a solution to censorship, but any such
on-Internet system will either have to be obscure or rely on secret
information (e.g. Tor bridges) to avoid blocking, and
thus cannot be used by the public at large. The most effective
solution to censorship so far seems to be political pressure, combined with
bundling to make censorship decisions as broad as possible. When much
content is bundled together, such as an entire domain served over TLS, or
GitHub and the New York Times [1], censorship becomes less likely if there is
political will to censor a specific part, but not the whole thing.
However, political opinion is fickle, and cannot be relied
upon.
Thus, we should reduce this risk by diversifying how we
distribute our content. Multiple distribution routes mean no single
point of failure. I see two ways of doing this:
First, by expanding offline versions of Wikimedia. Kiwix already
provides an offline version of Wikimedia sites. We need to expand this
capability to allow for better updating. Offline apps should be
able to update their contents efficiently when users
have only intermittent access to the open Internet. More importantly,
offline apps should be able to update in a P2P fashion with other
apps. In a community with limited access to the open Internet, a single
person with an up-to-date version of Wikipedia should be able to
easily synchronise their app with other people's apps to spread the
knowledge. This could be
especially helpful in a scenario where a small number of people have
access via methods such as Tor, but such methods are too burdensome for
most people.
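To make the P2P idea concrete, here is a minimal sketch of how two offline apps might reconcile their contents. It is only an illustration under stated assumptions: the `Revision` record, the `sync` and `sync_both` functions, and the rule "the higher revision ID wins" are all hypothetical and do not describe how Kiwix or any existing app works.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Revision:
    """One stored page version (hypothetical record layout)."""
    rev_id: int  # assumed to increase with every edit to the page
    html: str    # rendered page content


def sync(local: Dict[int, Revision], remote: Dict[int, Revision]) -> int:
    """Copy into `local` every page that `remote` holds in a newer revision.

    Returns the number of pages added or updated in `local`.
    """
    updated = 0
    for page_id, theirs in remote.items():
        mine = local.get(page_id)
        if mine is None or theirs.rev_id > mine.rev_id:
            local[page_id] = theirs
            updated += 1
    return updated


def sync_both(a: Dict[int, Revision], b: Dict[int, Revision]) -> Tuple[int, int]:
    """Two-way exchange: afterwards both peers hold the newest known revisions."""
    pulled = sync(a, b)  # a learns from b
    pushed = sync(b, a)  # b learns from a
    return pulled, pushed


if __name__ == "__main__":
    alice = {1: Revision(10, "old text"), 2: Revision(5, "only Alice has this")}
    bob = {1: Revision(12, "new text"), 3: Revision(7, "only Bob has this")}
    print(sync_both(alice, bob))
    print(alice == bob)
```

A real app would exchange only revision IDs first and transfer content afterwards, and would need some way to verify that the received content is authentic. Both are left out of this sketch.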
Second, we could experiment with broadcasting recent edits widely.
Broadcasting HTML versions of all main namespace pages
recently edited on English Wikipedia would require only about 12 KBps
[2].
This is not a huge amount of bandwidth. During the Cold War it was
common to broadcast propaganda using shortwave radio, which could be
listened to across the world. Perhaps we could broadcast everything
that is edited across the world in a similar fashion, allowing users
to stay up to date regardless of their connectivity. This could be
combined with the P2P app, so a few power users could listen in
to the recent changes (RC) stream, and then spread the data among their communities.
[1] https://en.wikipedia.org/wiki/Censorship_of_GitHub#DDoS_attack
[2] Based on a very rough experiment, the ?action=render output of a Wikipedia page
gzips to roughly the size of the raw wikitext. From there the 12 KBps
number is based on the enwiki result of:
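-- Total bytes of pages changed in one day, expressed as KB per second.
SELECT SUM(l) / (1024 * 3600 * 24) FROM
(SELECT MAX(rc_new_len) AS l FROM recentchanges
WHERE rc_namespace = 0 AND rc_timestamp
BETWEEN '20170926000000' AND '20170927000000'
-- rc_type <= 1 keeps edits and page creations; one row per page.
AND rc_type <= 1 GROUP BY rc_cur_id
) t;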
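The arithmetic behind that query can be sketched in Python as follows. This is only an illustration: the function names are mine, and the sample rows are made up to show the calculation; they are not real enwiki data.

```python
from typing import Dict, Iterable, Tuple

SECONDS_PER_DAY = 3600 * 24


def broadcast_rate_kb_per_s(rows: Iterable[Tuple[int, int]]) -> float:
    """Mirror the query: rows are (page_id, new_length_in_bytes) for one day.

    For each page keep the largest length seen that day, sum those lengths,
    and spread the total evenly over 24 hours.
    """
    largest: Dict[int, int] = {}
    for page_id, new_len in rows:
        largest[page_id] = max(new_len, largest.get(page_id, 0))
    return sum(largest.values()) / (1024 * SECONDS_PER_DAY)


def daily_bytes_for_rate(kb_per_s: float) -> float:
    """Inverse direction: how many bytes per day a given rate corresponds to."""
    return kb_per_s * 1024 * SECONDS_PER_DAY


if __name__ == "__main__":
    # Made-up sample: page 1 was edited twice, page 2 once.
    sample = [(1, 40_000), (1, 41_500), (2, 3_200)]
    print(broadcast_rate_kb_per_s(sample))
    print(daily_bytes_for_rate(12))
```

Read in the other direction, a rate of 12 KBps corresponds to 12 * 1024 * 86400 = 1,061,683,200 bytes, or roughly 1 GB of page text per day.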
Y
Embracing a new era with only small language obstacles
Recent progress in neural machine translation has produced better translation results. The industry is investing huge amounts of money in this area. For the first time, people can communicate with only small language obstacles. We should prepare for this near future by evaluating our position and understanding the impact. We should also seek new opportunities and contribute to the trend.
Advice:
(1) Cooperate with the industry to enhance our translation infrastructure
(2) Continuously release our translation data as an open corpus
(3) Evaluate the impact. For example, and this is probably very radical, how about setting up one unified Wikipedia in the future?
Z
Title: Knowledge is our direction. What's next?
The combination of knowledge as a service (KAS) and knowledge equity (KE) has been identified as our strategic direction (draft). We have decided to focus on knowledge in a broader sense, beyond just encyclopedic knowledge, to create KE, and to become the infrastructure that offers KAS. In this position paper, I offer some of my early thoughts on where we should focus our efforts to move in this strategic direction. Given the word-count limit, I will not go through the details of research methods and techniques that can be used to address each point.
==Knowledge==
As the central focus of the strategic direction is knowledge, we need to arrive at a unified working definition of knowledge. English Wikipedia defines knowledge as "familiarity, awareness, or understanding of someone or something", which is acquired through experience or education, by perceiving, discovering, or learning. This definition, however, is not a working definition that can help us decide what new content to include.
Research on user behavior, needs, and learning patterns can help us define knowledge.
==Knowledge equity==
Our goal is to remove structural inequalities that limit our ability to represent knowledge from all people and by all people. To this end, we need to meet our users where they are. Today:
* language is a barrier to sharing in knowledge. Content should be available to our users in their languages.
* text-only knowledge is a blocker for gathering knowledge, especially from parts of the world that are already left behind. Our systems should become technologically able to accept new forms of knowledge (e.g., voice for oral knowledge) and to make them editable.
* limits in proficiency and literacy are a blocker for our users. The content and its presentation will need to become a function of these parameters.
==Knowledge as a service==
Our goal is to offer KAS: both the infrastructure that supports it and the content itself. To do this, we need to:
* empower our users to learn, create, and go beyond consuming content: Wikimedia projects' talk and discussion pages are an asset for building systems that can help our users think critically and learn how to deliberate. We need to do research to surface this critical thinking and step-by-step deliberation, gain insights from it, and share it with others as part of our KAS effort.
* do research and development on building systems where deliberation and decision making are possible at scale. Today, no such system is available, yet one of the building blocks of KAS is infrastructure for discussion, deliberation, and decision making.
* empower our users with ways to assess the trustworthiness of the content. Trust and reputation become especially important as we move to new forms of knowledge such as oral knowledge. We should do research to build trust and reputation models for Wikimedia and its users, and to understand how to surface such metrics as measures of the reliability of the knowledge we serve.