Brian Wolff

From devsummit
Tags Censorship, Offline Editing, Synchronization
Primary Session Advancing the Contributor Experience
Secondary Sessions

Wikimedia should diversify its distribution methods.

Currently Wikimedia distributes its content almost exclusively using the Internet. However, the Internet is controlled by gate keepers in the form of governments and ISPs. While historically these entities rarely controlled the flow of information, more recently we have seen an increase in censorship, particularly by governments. Since Wikimedia is distributed almost entirely over the Internet, we are vulnerable to their whims.

The risk of having our distribution lines interfered with, is an existential threat to our mission. While at present time, only a few geographic locations practise such interference, the future is unknowable and does not appear to be heading in a comforting direction.

Furthermore, in the face of such interference, there is very little we can do. TOR is often spoken as a solution to censorship, but any such on-Internet system will either have to be obscure or rely on secret information (e.g. TOR bridges) to avoid blocking, and thus cannot be used by the public at large. The most effective solution to censorship so far seems to be political pressure, combined with bundling to make censorship decisions as broad as possible. When much content is bundled together, such as entire domains with TLS, or Github and New York Times[1], it can reduce censorship if there is political will to censor a specific part, but not the whole thing. However, political opinion is fickle, and cannot be relied upon.

Thus, we should reduce this risk by diversifying how we distribute our content. Multiple distribution routes means no single point of failure. I see two ways of doing this:

First, by expanding offline versions of Wikimedia. Kiwix already provides an offline version of Wikimedia sites. We need to expand this capability to allow for better updating. Offline apps should be able to efficiently update their contents in accordance to a scenario where users only have intermittent access to the open Internet. More importantly, offline apps should be able to update in a P2P fashion with other apps. In a community with limited access to open Internet, a single person with an up to date version of Wikipedia, should be able to easily synchronise his/her app with other people's apps to spread the knowledge. This could be especially helpful in a scenario where a small number of people have access via methods such as TOR, but such methods are too burdensome for most people.

Second, we could experiment with broadcasting recent edits widely. To broadcast html versions of all main namespace pages recently edited on English Wikipedia, would only require about 12 KBps [2]. This is not a huge amount of bandwidth. During the Cold War it was common to broadcast propaganda using short wave radio, which could be listened to across the world. Perhaps we could broadcast everything that is edited across the world in a similar fashion, allowing users to stay up to date regardless of their connectivity. This could be combined with the P2P app, so a few power users could listen in to the RC stream, and then spread the data among their communities.

[1] https://en.wikipedia.org/wiki/Censorship_of_GitHub#DDoS_attack [2] Based on very rough experiment, ?action=render of a wikipedia page roughly gzips to the size of the raw wikitext. From there the 12 KBps number is based on the enwiki result of: SELECT sum(l)/(1024*3600*24) FROM

(select max(rc_new_len) 'l' from recentchanges
 WHERE  rc_namespace = 0 and rc_timestamp
 BETWEEN '20170926000000' AND '20170927000000'
 AND rc_type <= 1 group by rc_cur_id
) t;