How to set up a Wikileaks mirror site using wget and GitHub


Wikileaks mirror site on github.com
Wikileaks, a non-profit internet media organization that launched its website in 2006, has been under threat of being removed from the internet. It lost its DNS service, and its main domain (wikileaks.org) has been unreachable because of heavy DDoS attacks. Several large U.S. companies, such as Bank of America, Amazon and PayPal, have already stopped providing services to Wikileaks. Many organizations and people do not support Wikileaks and, at the extreme, some are desperate to shut its website down. But Wikileaks is not just one website. For better or worse, it is far more powerful and influential than you might think.

There are millions of Wikileaks supporters all over the world, and you may be surprised to see how many mirror sites have been set up. Wikileaks has been seeking support, and by my count it has 1426 mirrors registered as of December 24, 2010. Of course, you too can become a Wikileaks supporter by hosting a mirror of the site. Wikileaks provides simple instructions for setting up a mirror on its Mass-mirroring Wikileaks page. Basically, you first need a Unix-based server running a web server, and then you give the Wikileaks staff access to that server so that they can upload a copy of the Wikileaks site. That is how you serve the Wikileaks content from your own web server.

But don’t you think it is risky to host the Wikileaks content on your own server? Your server could well suffer DDoS attacks. And what if the attackers find out who you are from your domain registration record or some other source? That is a frightening thought. So here I am going to introduce a way to set up a Wikileaks mirror site without exposing yourself to attackers and without providing any hosting space of your own. I use wget and GitHub in a Linux environment (Ubuntu).

GitHub as a web server

Basically, you only need to do two things:

  • 1. Mirror the website, with all its markup, text, CSS, scripts, images, etc., to your local machine.
  • 2. Save the content to a GitHub repository (project name: username.github.com).

GitHub Pages then lets you publish the content to the web as if it were on your own site. The GitHub Pages rule is very simple: if your GitHub username is ‘wikileaks-mirror-jp’ and you push the content to a repository named ‘wikileaks-mirror-jp.github.com’, the content becomes accessible at http://wikileaks-mirror-jp.github.com. In short, you can use GitHub as a web server to publish the mirrored Wikileaks content.
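To make the naming rule concrete, here is a tiny sketch that derives the repository name, the SSH remote and the public URL from a username. The variable names are my own and purely illustrative; only the username.github.com convention and the remote format come from this post.

```shell
# Hypothetical variable names; only the naming rule comes from the article.
GH_USER="wikileaks-mirror-jp"

# The user page repository must be named <username>.github.com
PAGES_REPO="${GH_USER}.github.com"

# SSH remote used when pushing, and the URL where the content is served
CLONE_URL="git@github.com:${GH_USER}/${PAGES_REPO}.git"
SITE_URL="http://${PAGES_REPO}"

echo "repository: ${PAGES_REPO}"
echo "remote:     ${CLONE_URL}"
echo "site URL:   ${SITE_URL}"
```

Changing GH_USER to your own account name shows the names you would use in the steps that follow.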

1. Set up a GitHub user account for the mirror

You need a GitHub user account so that you can push the content to GitHub repositories. Let’s say your GitHub username is ‘wikileaks-mirror-jp’ and you have an existing Unix user named ‘wikileaks-mirror-jp’ on your Linux machine. First of all, you need to add that user’s SSH public key to the GitHub account. Below is how I created the SSH key pair, whose public key is id_rsa_github.pub. See also Generating SSH keys for more detail.

$ su wikileaks-mirror-jp
$ ssh-keygen -t rsa

Generating public/private rsa key pair.
Enter file in which to save the key (/home/wikileaks-mirror-jp/.ssh/id_rsa): /home/wikileaks-mirror-jp/.ssh/id_rsa_github
Created directory '/home/wikileaks-mirror-jp/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/wikileaks-mirror-jp/.ssh/id_rsa_github.
Your public key has been saved in /home/wikileaks-mirror-jp/.ssh/id_rsa_github.pub.
The key fingerprint is:
92:23:42:9e:70:c8:ef:65:68:d5:23:0d:62:af:da:87 wikileaks-mirror-jp@ubuntu
The key's randomart image is:
….
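If you would rather script this step than answer the prompts, something along these lines may work. It is a sketch, not what I ran in the session above: the function name generate_github_key is hypothetical, -b 4096 is my choice of key size, and -N "" assumes you want an empty passphrase (convenient for unattended jobs, but it leaves the key unprotected if the machine is compromised).

```shell
# Hypothetical helper: create an RSA key pair without interactive prompts.
generate_github_key() {
  key_file="$1"
  key_dir=$(dirname "$key_file")

  # Refuse to overwrite an existing key instead of letting ssh-keygen prompt.
  if [ -e "$key_file" ]; then
    echo "key already exists: $key_file" >&2
    return 1
  fi

  # ssh expects the key directory to be private to the user.
  mkdir -p "$key_dir" || return 1
  chmod 700 "$key_dir" || return 1

  # -N "" is an assumption: an empty passphrase.
  # -C only sets the comment stored in the public key.
  ssh-keygen -q -t rsa -b 4096 -N "" -C "wikileaks-mirror-jp" -f "$key_file"
}
```

You would call it as generate_github_key ~/.ssh/id_rsa_github and then print ~/.ssh/id_rsa_github.pub to copy the public key.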

Then configure SSH as shown below so that the key created above is used when connecting to github.com over SSH. See also Multiple SSH Keys for more detail.

$ vi ~/.ssh/config

Host github.com
User wikileaks-mirror-jp
Port 22
Hostname github.com
IdentityFile ~/.ssh/id_rsa_github
TCPKeepAlive yes
IdentitiesOnly yes

Finally, add the SSH public key (the contents of id_rsa_github.pub) on your GitHub account settings page.
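Before going further, it may be worth checking that ssh really picks up the settings above. The sketch below uses ssh -G, which prints the options ssh would apply to a host without connecting, so it needs a reasonably recent OpenSSH. The function name show_github_ssh_settings is hypothetical.

```shell
# Hypothetical helper: print the options ssh resolves for github.com.
# No connection is made; ssh -G only evaluates the configuration file.
show_github_ssh_settings() {
  config_file="${1:-$HOME/.ssh/config}"
  ssh -F "$config_file" -G github.com |
    grep -E '^(hostname|user|port|identityfile|identitiesonly) '
}

# Once the public key is registered on GitHub, a live check would be:
#   ssh -T git@github.com
# GitHub should answer with a greeting that names your account
# rather than opening a shell.
```

Running show_github_ssh_settings with no argument reads ~/.ssh/config; the identityfile line should point at id_rsa_github.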

2. Create a GitHub repository for the mirror

Once the GitHub account is ready, create a new repository for the GitHub user page, wikileaks-mirror-jp.github.com. First, create the repository by entering a project name, description and homepage URL on the Create a New Repository page. The project name is wikileaks-mirror-jp.github.com.

Then create an empty git repository for wikileaks-mirror-jp.github.com on your local machine and add a sample file to it, as shown below.

$ git config --global user.name "wikileaks-mirror-jp"
$ mkdir ~/github
$ cd ~/github
$ mkdir wikileaks-mirror-jp.github.com
$ cd wikileaks-mirror-jp.github.com
$ git init
$ touch README
$ echo "wikileaks-mirror-jp.github.com" > README
$ git add README
$ git commit -m 'first commit'
$ git remote add origin git@github.com:wikileaks-mirror-jp/wikileaks-mirror-jp.github.com.git
$ git push origin master

You will see the sample file pushed to the repository on the project page.
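Depending on your git version, the first commit may stop with an identity error when only user.name is set, and re-running the commands above fails once the remote already exists. Here is a sketch of a repeatable variant. The function name prepare_pages_repo and the email address are hypothetical placeholders, and the settings are written to the repository rather than with --global.

```shell
# Hypothetical helper: create (or reuse) the local repository for the user page.
prepare_pages_repo() {
  gh_user="$1"    # GitHub username
  base_dir="$2"   # parent directory, e.g. ~/github
  repo_dir="${base_dir}/${gh_user}.github.com"

  mkdir -p "$repo_dir" || return 1
  (
    cd "$repo_dir" || exit 1

    # Initialise only if this is not a repository yet.
    [ -d .git ] || git init -q

    # Per-repository identity; the email is a placeholder, not a real address.
    git config user.name "$gh_user"
    git config user.email "mirror@example.com"

    # Add the remote only if it has not been configured before.
    git config --get remote.origin.url >/dev/null ||
      git remote add origin "git@github.com:${gh_user}/${gh_user}.github.com.git"

    # Show the result so the remote can be checked by eye.
    git remote -v
  )
}
```

For the example in this post the call would be prepare_pages_repo wikileaks-mirror-jp ~/github, followed by the README commit and push shown above.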

3. Wikileaks mirroring with wget

Here you mirror a Wikileaks website to your local machine with the wget command. Let’s say the target site is http://wikileaks.ch and you save all the files and subdirectories to the directory where you created the empty git repository for wikileaks-mirror-jp.github.com, namely ~/github/wikileaks-mirror-jp.github.com. Executing the following command line downloads all the files from http://wikileaks.ch into the directory for the GitHub repository.

$ wget --mirror --convert-links -w 2 -p -e robots=off \
-P ~/github/wikileaks-mirror-jp.github.com http://wikileaks.ch/

--2010-12-23 22:39:40-- http://wikileaks.ch/
Resolving wikileaks.ch... 178.21.20.9, 213.251.145.96, 46.59.1.2, ...
Connecting to wikileaks.ch|178.21.20.9|:80... connected.
HTTP request sent, awaiting response... 200 OK
....

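As for ‘-e robots=off’, you need to turn off robots.txt handling in order to download scripts and CSS files located under directories that the site’s robots.txt tells web robots not to visit. The other key options are ‘--mirror’, which turns on recursive download with timestamping, and ‘--convert-links’, which rewrites links in the downloaded pages so that they work locally. In addition, ‘-w 2’ waits two seconds between requests and ‘-p’ also fetches the images, CSS and other files each page needs. Please see the wget man page or the GNU manual for details of the options.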
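One thing to watch: with recursive retrieval, wget by default creates a directory named after the host, so the files are likely to end up under ~/github/wikileaks-mirror-jp.github.com/wikileaks.ch/ rather than at the top of the repository. If you want them at the top level, adding --no-host-directories (-nH) should do it. The sketch below spells out the same options in long form. The function name mirror_site and the WGET override are hypothetical, the latter added only so the command can be inspected without downloading anything.

```shell
# Hypothetical helper: mirror a site directly into a destination directory.
mirror_site() {
  url="$1"
  dest="$2"

  # WGET is a hypothetical hook: set WGET=echo to print the arguments
  # instead of running wget.
  "${WGET:-wget}" \
    --mirror \
    --convert-links \
    --page-requisites \
    --no-host-directories \
    --wait=2 \
    -e robots=off \
    --directory-prefix="$dest" "$url"
}
```

For this post the call would be mirror_site http://wikileaks.ch/ ~/github/wikileaks-mirror-jp.github.com. With WGET=echo set beforehand, it only prints the arguments that would be passed to wget.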

4. Push the content to the repository to publish to the web

Finally, push the downloaded content to the repository as shown below.

$ cd ~/github/wikileaks-mirror-jp.github.com
$ git add .
$ git commit -m "added mirror site"
$ git push origin master

Counting objects: 3043, done.
Compressing objects: 100% (3013/3013), done.
Writing objects: 100% (3042/3042), 20.37 MiB | 39 KiB/s, done.
Total 3042 (delta 2836), reused 0 (delta 0)
To git@github.com:wikileaks-mirror-jp/wikileaks-mirror-jp.github.com.git
aff2b93..1032dc2 master -> master

Now the content is accessible at http://wikileaks-mirror-jp.github.com. What’s more, if you want to automate the series of commands above, put them in a script and run it from crontab. That’s it!
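As a rough idea of what such a script could look like, here is a minimal sketch. The script name update-mirror.sh, the --run switch and the function names are all hypothetical. It re-runs the wget command from step 3 and then commits and pushes only when something actually changed. An unattended push also assumes the SSH key can be used without typing a passphrase.

```shell
#!/bin/sh
# update-mirror.sh: hypothetical script name; paths and URL follow the article.
REPO_DIR="${REPO_DIR:-$HOME/github/wikileaks-mirror-jp.github.com}"
SITE_URL="${SITE_URL:-http://wikileaks.ch/}"

# Re-run the same wget command as in step 3.
refresh_mirror() {
  wget --mirror --convert-links -w 2 -p -e robots=off -P "$REPO_DIR" "$SITE_URL"
}

# Stage everything, then commit and push only if the index differs from HEAD.
# The body runs in a subshell so the caller's working directory is untouched.
publish_changes() (
  cd "$REPO_DIR" || exit 1
  git add -A .
  if git diff --cached --quiet; then
    echo "no changes to publish"
    exit 0
  fi
  git commit -q -m "mirror update $(date -u +%Y-%m-%dT%H:%MZ)" &&
    git push -q origin master
)

# Run the whole cycle only when asked to, so the functions can also be sourced.
if [ "${1:-}" = "--run" ]; then
  # wget exits non-zero when any single request fails, so keep going.
  refresh_mirror || echo "wget reported errors; publishing what was fetched" >&2
  publish_changes
fi
```

A crontab entry along the lines of `0 3 * * * /home/wikileaks-mirror-jp/update-mirror.sh --run` would then refresh the mirror once a day at 03:00. Both the path and the schedule are only examples.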

Posted in: Environment Setup

1 Comment

Finally cables from U.S. embassy in Tokyo were released!
@see the WSJ article: “WikiLeaks Japan: Whale Diplomacy”:
http://blogs.wsj.com/japanrealtime/2011/01/03/wikileaks-japan-whale-diplomacy/

Comment by yoichi on January 6, 2011 9:29 pm
