Building R2G Tools

I started out building a tool to discover recently dropped domains, and in the process ended up with a domain search engine with a twist. You can check it out at R2G Tools. This was an extremely challenging project, with more than 120 million domains in the world and an extremely tight budget. I finally pulled it off last weekend. You might be surprised that whois.r2g.in runs on 3 commodity servers (mainly for redundancy; I could have squeezed it onto 1 with only a slight performance hit).

What’s behind it

Database: MongoDB is the main database backend, with MySQL holding the non-domain records. MongoDB was chosen for its ability to handle a large number of objects, its schemaless design and its ability to index keys. Thanks to MongoDB I can search, count and even insert and update records within a few seconds (most operations take a few milliseconds).

Web application framework: whois.r2g.in is built entirely with CodeIgniter. It’s my framework of choice; I like its clean and lean architecture.

Servers: I’m running Lighty behind the Varnish HTTP accelerator. I also use eAccelerator to optimize the PHP code, and a memcached instance on the web server caches data from MySQL. The goal is to speed up whois.r2g.in as much as possible. The application connects to 2 MongoDB instances that share the load. Everything runs on 3 Athlon X2 servers: 2 running the MongoDB instances and 1 running the application.

Why use both MongoDB and MySQL?

That’s because I didn’t want to put all my eggs in one basket. Any data that doesn’t change much, like the list of WHOIS servers, goes into a MySQL table. Information such as WHOIS data is stored in a MongoDB collection.

What did I gain?

A lot of experience in data mining, storage and analysis. I learned a lot about how to optimize data mining: I managed to bring the time it takes to analyze all the domains for drops down from a few days to a few hours, just by pre-sorting the zone files. I also gained a few rare domains and a few high PageRank domains. I have become a domainer thanks to the project 😀
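To show why pre-sorting helps so much, here is a minimal sketch in C. It is not the code behind R2G Tools; it assumes that a drop is detected by comparing yesterday's and today's lists of domain names, and that both lists are already sorted in strcmp (byte-wise) order. The function name find_dropped and its parameters are made up for this illustration. With sorted input, one pass over both lists is enough and no per-domain lookup is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk two sorted lists of domain names in lock step and collect the names
 * that appear in old_names but not in new_names (assumed to be the drops).
 * The dropped array must have room for old_count entries.
 * Returns the number of names written to dropped.
 */
size_t find_dropped(const char *const *old_names, size_t old_count,
                    const char *const *new_names, size_t new_count,
                    const char **dropped)
{
    size_t i = 0, j = 0, n = 0;

    assert(old_names != NULL && new_names != NULL && dropped != NULL);

    while (i < old_count && j < new_count) {
        int cmp = strcmp(old_names[i], new_names[j]);
        if (cmp == 0) {
            /* Present in both lists: still registered. */
            i++;
            j++;
        } else if (cmp < 0) {
            /* Only in the old list: the name has gone away. */
            dropped[n++] = old_names[i++];
        } else {
            /* Only in the new list: a new registration, skip it. */
            j++;
        }
    }

    /* Anything left in the old list has no match in the new one. */
    while (i < old_count)
        dropped[n++] = old_names[i++];

    return n;
}
```

The whole comparison costs one string compare per step, so the running time grows linearly with the combined size of the two lists once they are sorted.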

Please head over to R2G Tools and give it a try; you might discover a great domain while you are there and make a huge profit. Do not forget to send me some feedback 🙂

Using custom error pages in Varnish

While playing around with the fancy error pages, I wanted to use them with Varnish as well. Since Varnish has no way to include or serve a file on its own (without fetching it from a back end server), I went with an inline C snippet to read and serve the error pages. Please note that the style sheet and the images are served from a CDN; otherwise they would have to be cached before the back end server becomes inaccessible. Here is the whole vcl_error sub. You will notice that we fall back to the default Varnish error page for anything other than 5XX errors (500 to 505).

sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";

    if (obj.status >= 500 && obj.status <= 505) {
        C{
        #include <stdio.h>
        #include <string.h>

        FILE *pFile;
        char content[100];
        char page[10240];
        char fname[50];

        page[0] = '\0';
        /* Build the path of the error page matching the status code. */
        snprintf(fname, sizeof(fname), "/var/www/errors/%d.html", VRT_r_obj_status(sp));

        /* Read the file line by line. If it cannot be opened, page stays
           empty instead of crashing Varnish on a NULL file pointer. */
        pFile = fopen(fname, "r");
        if (pFile != NULL) {
            while (fgets(content, sizeof(content), pFile)) {
                /* Stop before the page buffer would overflow. */
                if (strlen(page) + strlen(content) >= sizeof(page))
                    break;
                strcat(page, content);
            }
            fclose(pFile);
        }
        VRT_synth_page(sp, 0, page, "<!-- XID: ", VRT_r_req_xid(sp), " -->", vrt_magic_string_end);
        }C
    } else {
        synthetic {"
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>"} obj.status " " obj.response {"</title>
</head>
<body>
<h1>Error "} obj.status " " obj.response {"</h1>
<p>"} obj.response {"</p>
<h3>Guru Meditation:</h3>
<p>XID: "} req.xid {"</p>
<address>
<a href="http://www.varnish-cache.org/">Varnish</a>
</address>
</body>
</html>
"};
    }

    return (deliver);
}

Hope someone will find this useful, as it took me some effort to find all the internal Varnish function names.
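If you want to try the file-reading part outside Varnish first, here is a small standalone sketch in plain C. It is only an illustration: the function name read_error_page and its parameters are my own for this example and are not part of Varnish. It reads a file into a fixed-size buffer, never writes past the end of it, and reports a missing file instead of crashing.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read at most cap - 1 bytes from the file at path into buf and
 * NUL-terminate the result. Longer files are truncated to fit.
 * Returns the number of bytes read, or -1 if the file cannot be opened
 * (buf is then left as an empty string).
 */
long read_error_page(const char *path, char *buf, size_t cap)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && buf != NULL && cap > 0);
    buf[0] = '\0';

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* fread never stores more than cap - 1 bytes, so the terminator fits. */
    n = fread(buf, 1, cap - 1, fp);
    buf[n] = '\0';
    fclose(fp);

    return (long)n;
}
```

Calling it with a 10240 byte buffer and a path such as /var/www/errors/503.html mirrors what the inline C does; a return value of -1 is the cue to serve some other page.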

Fancy HTTP Error pages – 5xx

If you hadn’t noticed, my site was giving HTTP 500 errors for the last couple of days. The issue turned out to be a segfault, and it’s fixed now. That got me to come up with a set of funny and slick HTTP error pages. I only made HTTP 5xx error pages, since I believe HTTP 4xx error pages should be specific to the site. You can download them here. If you want to take a peek, here is the list: HTTP 500, 501, 502, 503, 504, 505. Feel free to modify them and share what you come up with.