Friday, January 1, 2010

Automatically Backing up a Blogger.com Blog

I’ve been in the computer software industry for a very long time, long enough to know that the unexpected does happen and that it sometimes causes data loss. I can humbly say that my own work has been the cause of one or two “incidents”. With that said, I understand the importance of redundancy in some form.

Recently I stood up a Windows Home Server for the main purpose of automatically backing up the four computers that I have around the house. It does an excellent job. With that done, my next goal was to use it to back up my blog. I don’t have a lot of posts, but it’s still an investment that I don’t want to risk losing.

The manual way

I started the way that most computer people start – with a search engine. Unfortunately, I didn’t find anything that I liked. The most obvious answer was to use the Export feature that Blogger provides. On the Blog Settings page of the Blogger.com Dashboard is a set of tools for importing and exporting a blog.

The Export functionality does exactly what I want with one exception: it’s not automatic. There is also nothing within the Dashboard that allows me to schedule an export. Almost there…

Blogger API

It turns out that Blogger.com has an API that can be used to access and manipulate a blog. They’ve even gone as far as offering clients in a number of different languages: .NET, Java, JavaScript, PHP, and Python. So I took this as an opportunity to start another side project. I decided to create a service that would use the Blogger API and export my blog on a regular basis. And I’ll use my Windows Home Server to host the service.

I decided to not use the .NET client that was available and instead go with the native HTTP protocol that was provided. Why? Just to be different…

The export request is straightforward to create but requires an Authorization header, so I started with that.

Authentication

The Blogger API supports three authentication methods: AuthSub, ClientLogin and OAuth.

Both AuthSub and OAuth were designed to allow web-based applications to authenticate on behalf of a user. Since I’m not writing a web app, they weren’t applicable.

ClientLogin was designed to allow a user to authenticate from a desktop application. I’m not writing a desktop application, but the idea holds. Basically, you pass the email and password to Google’s authentication service and Google returns an authorization token. You then pass the token back with any other request that is being made on behalf of the user.

To make the Authentication request, you do a POST request to:

https://www.google.com/accounts/ClientLogin

I create the request like this:

    private HttpWebRequest CreateAuthenticationRequest()
    {
        var uri = new Uri("https://www.google.com/accounts/ClientLogin");
        HttpWebRequest authenticationRequest = (HttpWebRequest)WebRequest.Create(uri);
        authenticationRequest.AllowAutoRedirect = false;
        authenticationRequest.Method = "POST";
        authenticationRequest.ContentType = "application/x-www-form-urlencoded";
        authenticationRequest.KeepAlive = false;
        authenticationRequest.Expect = string.Empty;
        authenticationRequest.Headers.Add("GData-Version", "2");
        authenticationRequest.Timeout = this.TimeoutInMs;

        // Build the form-encoded body; every value is URL-encoded.
        var postBody = new StringBuilder();
        postBody.Append("accountType=GOOGLE&");
        postBody.AppendFormat("Email={0}&", this._email.ToUrlEncoded());
        postBody.AppendFormat("Passwd={0}&", this._password.ToUrlEncoded());
        postBody.Append("service=blogger&");
        postBody.AppendFormat("source={0}", "malevy.net-Blogger.Backup-1".ToUrlEncoded());

        // Caution: at trace level this writes the password to the log.
        if (logger.IsTraceEnabled) logger.Trace("body of post: " + postBody.ToString());

        // The URL-encoded body contains only ASCII characters.
        byte[] encodedData = (new ASCIIEncoding()).GetBytes(postBody.ToString());

        authenticationRequest.ContentLength = encodedData.Length;

        // Dispose the request stream even if the write throws.
        using (var stream = authenticationRequest.GetRequestStream())
        {
            stream.Write(encodedData, 0, encodedData.Length);
        }

        return authenticationRequest;
    }


ToUrlEncoded() is an extension method that URL-encodes a string. I based it on a post from Rick Strahl.
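The extension method itself isn't shown in this post. As a rough stand-in, here is a minimal sketch that assumes the framework's Uri.EscapeDataString is good enough for the job. The StringExtensions class name and the null handling are assumptions for illustration, not the code from the service and not Rick Strahl's version.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical stand-in for the ToUrlEncoded() extension used above.
    // Percent-encodes reserved characters (a space becomes %20, '@' becomes %40).
    public static string ToUrlEncoded(this string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        return Uri.EscapeDataString(value);
    }
}
```

With this in scope, "user@example.com".ToUrlEncoded() yields "user%40example.com", which is safe to drop into the form body.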



(Just a word of caution: the service parameter is case-sensitive. I wasted *way* too much time on that little typo.)



If the request succeeds, you’ll get back a 200 response with a body containing SID, LSID, and Auth values. The documentation says that we can ignore the SID and LSID values. Only the Auth value is required.
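Pulling the Auth value out of that body takes only a few lines. This is a minimal sketch, assuming the body is plain text with one name=value pair per line. The ClientLoginResponse class and ExtractAuthToken method are names made up for this example.

```csharp
using System;

public static class ClientLoginResponse
{
    // Returns the value of the "Auth" line, or null if the body has none.
    public static string ExtractAuthToken(string body)
    {
        if (string.IsNullOrEmpty(body))
        {
            return null;
        }

        var lines = body.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            // Split at the first '=' only; the value itself may contain '='.
            int separator = line.IndexOf('=');
            if (separator <= 0)
            {
                continue;
            }

            string name = line.Substring(0, separator);
            if (string.Equals(name, "Auth", StringComparison.Ordinal))
            {
                return line.Substring(separator + 1);
            }
        }

        return null;
    }
}
```

Returning null for a missing Auth line leaves it to the caller to decide whether that is a hard failure.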



To use the Auth token, you attach it as a header to a request using the following format:



Authorization: GoogleLogin auth={the-returned-auth-value}
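The export code further down calls token.ToAuthorizationHeader(), but the AuthorizationToken type isn't shown in this post. Here is a minimal sketch of what such a type could look like, assuming it simply wraps the Auth value. The class body is a guess for illustration; the "GoogleLogin auth=" scheme is the one given in Google's ClientLogin documentation.

```csharp
using System;

public sealed class AuthorizationToken
{
    private readonly string _value;

    public AuthorizationToken(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            throw new ArgumentException("The Auth value is required.", "value");
        }

        _value = value;
    }

    // Produces a complete "name: value" header line, the form that
    // WebHeaderCollection.Add(string) expects.
    public string ToAuthorizationHeader()
    {
        return "Authorization: GoogleLogin auth=" + _value;
    }
}
```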



Archive/Export



This is where things broke down for me. I’ll save you the reading time with a quick summary: I couldn’t get the Archive function to work. The request is really simple:



GET http://www.blogger.com/feeds/{blog-id}/archive



I created the request and attached the authorization token as laid out in the documentation. I was rewarded with a response saying that the authorization header was not recognized. I then spent a lot of time searching the Internet and came up with nothing. I even tried formatting the request and authorization header differently and I still wasn’t able to get it to work. As a last resort, I posted a message to the Blogger Developer Group. Posts from new members are moderated so I’ve not even seen my question appear in the list yet.



Many of the Blogger APIs require that you supply the ID of your blog. If you’re like me and don’t know the value, this Blogger Help article will help.



The Workaround



In the meantime, I’ve decided to use the querying capabilities of the API to pull down all the posts. The request looks like:



GET http://www.blogger.com/feeds/{blog-id}/posts/full?updated-min=1990-01-01T00:00:00&orderby=updated

This retrieves all the posts that have been updated since midnight on 1 Jan 1990. I use the following code snippet to create the request:

    private HttpWebRequest CreateExportRequest(AuthorizationToken token)
    {
        // updated-min is only honored when orderby=updated is also set.
        const string archiveUriFormat = "http://www.blogger.com/feeds/{0}/posts/full?updated-min=1990-01-01T00:00:00&orderby=updated";
        var uri = new Uri(string.Format(archiveUriFormat, this._blogId));
        HttpWebRequest exportRequest = (HttpWebRequest)WebRequest.Create(uri);

        exportRequest.AllowAutoRedirect = false;
        exportRequest.Method = "GET";
        exportRequest.KeepAlive = false;
        exportRequest.Expect = string.Empty;
        exportRequest.Headers.Add("GData-Version", "2");
        exportRequest.Timeout = this.TimeoutInMs;

        // Attach the ClientLogin token as the Authorization header.
        exportRequest.Headers.Add(token.ToAuthorizationHeader());

        return exportRequest;
    }
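Neither snippet so far actually sends the request or stores what comes back. Here is a minimal sketch of that last step, assuming the goal is simply to write the returned feed XML to a file. FeedSaver, CopyStream and SaveFeed are hypothetical names, not part of the service described here.

```csharp
using System.IO;
using System.Net;

public static class FeedSaver
{
    // Copies everything from source to destination in fixed-size chunks.
    public static void CopyStream(Stream source, Stream destination)
    {
        var buffer = new byte[8192];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
        }
    }

    // Sends the request and writes the response body to the given path.
    public static void SaveFeed(HttpWebRequest request, string path)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var body = response.GetResponseStream())
        using (var file = File.Create(path))
        {
            CopyStream(body, file);
        }
    }
}
```

A call would look something like FeedSaver.SaveFeed(CreateExportRequest(token), backupPath), where backupPath is whatever location you back up to. GetResponse() throws a WebException for error status codes, so a rejected token surfaces as an exception rather than as an empty file.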


I’m going to stick with this approach until I can get some help with the Archive/Export functionality.



In the next post, I’ll discuss the form and architecture of the application. In particular I’ll discuss using Quartz.Net to kick off the archive functionality and the use of Spring.Net to pull everything together.
