EEEK -- Bleg Related to Google Feedproxy Links

For years I have been blogging from articles in my Google Reader, which is going away in a month.  When I cut and paste the article URL from Reader, I get a Google shortcut like "http://feedproxy.google.com/~r/Twistedsifter/~3/BohimNYue3Y/".  This resolves to "http://twistedsifter.com/2013/04/strangely-similar-movies-released-around-the-same-time/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Twistedsifter+%28TwistedSifter+%29".  In many cases the links are stored in my WordPress database as the feedproxy version, so they depend on this Google service remaining live to work.

Does anyone know if the Google feedproxy servers are going away with Reader?  If so, about a zillion links on my site are about to break.  My hope is that Google uses these for more than just Reader, perhaps at Feedburner (though if Google is bailing on RSS, that might be next on the kill list).

I would normally just do a regex search-and-replace to fix this, but there is no systematic pattern to match: you have to resolve each link and then substitute the resolved URL.  Someone seems to have an app for this, but I am not sure it is ready for prime time and I do not want to use it unless I have to.  But once the servers are turned off, it will be too late.

Anyone know about this or have advice?  Obviously, I have been trying not to use these feedproxy URLs when I remember not to do so.

5 Comments

For technical blogs I get a drip feed of their old posts using http://www.streamspigot.com/feed-playback/ They use Google's feed archive to get data. I hope they don't cut that off. I learn quite a bit from those old blog posts in an easy fashion (slowly fed to me in my reader).

My condolences, that sounds like a real pain.

It would be better to de-proxify those links if possible. You don't really want the feedproxy in there anyway; it just snuck up on you because of how Google Reader works.

On a related note, you can also trim off the crud starting with "/?" That stuff is just for statistics, and it doesn't make any sense for a normal hyperlink.
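Here is a minimal Python sketch of that trimming, in case it helps. It is a cautious variant: rather than chopping everything after the "?", it drops only parameters whose names start with "utm_", on the assumption that those are the statistics ones and anything else might matter to the site. The function name strip_tracking is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_tracking(url, prefixes=("utm_",)):
    """Return url without query parameters whose names start with a tracking prefix.

    Only the "utm_" family is removed by default (an assumption about which
    parameters are purely statistical); all other parameters are kept.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(prefixes)
    ]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))

if __name__ == "__main__":
    long_url = (
        "http://twistedsifter.com/2013/04/"
        "strangely-similar-movies-released-around-the-same-time/"
        "?utm_source=feedburner&utm_medium=feed"
        "&utm_campaign=Feed%3A+Twistedsifter+%28TwistedSifter+%29"
    )
    print(strip_tracking(long_url))
```

Note that any parameters that survive are re-encoded by urlencode, so their escaping may be normalized slightly compared with the original.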

You could probably script this pretty easily. Curl accepts a "--head" argument that will show you the 301 redirect, along with the destination URL. For example:

[xxx.yyyyy@W80501BMAGZ:tmp] $ curl --head "http://feedproxy.google.com/~r/Twistedsifter/~3/BohimNYue3Y/"

HTTP/1.1 301 Moved Permanently

Location: http://twistedsifter.com/2013/04/strangely-similar-movies-released-around-the-same-time/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Twistedsifter+%28TwistedSifter+%29

Content-Type: text/html; charset=UTF-8
[snip]

Note that you could just grep out the location, and replace it with the new URL. I'm not sure how your data is stored (I don't know much about blogs), but you might even be able to generate a big lookup table, with the source & destination URLs, then script the search & replace based on that.
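A rough Python sketch of that lookup-table idea follows. It assumes you can get the post text out as plain strings (for example from a database export), that curl is installed, and that the feedproxy links all look like the one above; the regex and every function name here are illustrative guesses, not anything WordPress provides. Try it on a copy of the data first.

```python
import re
import subprocess

# Assumed shape of the links, based on the single example in the post.
FEEDPROXY_RE = re.compile(r'http://feedproxy\.google\.com/~r/[^\s"\'<>]+')

def extract_location(header_text):
    """Return the Location header value from `curl --head` output, or None."""
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None

def resolve(url):
    """Ask curl for the response headers only and read the redirect target."""
    result = subprocess.run(
        ["curl", "--head", "--silent", url],
        capture_output=True, text=True, check=True,
    )
    return extract_location(result.stdout)

def find_proxy_links(text):
    """Return the distinct feedproxy URLs found in text, sorted."""
    return sorted(set(FEEDPROXY_RE.findall(text)))

def build_table(text):
    """Map each feedproxy URL in text to its destination, skipping failures."""
    table = {}
    for url in find_proxy_links(text):
        target = resolve(url)
        if target:
            table[url] = target
    return table

def replace_links(text, table):
    """Swap every feedproxy URL that has a table entry; leave the rest alone."""
    return FEEDPROXY_RE.sub(lambda m: table.get(m.group(0), m.group(0)), text)
```

Something like replace_links(post, build_table(post)) would then rewrite one post. Saving the table to a file before touching anything would keep the mapping around even after the redirect servers are gone.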

You are correct for most sites. They use the /? for stats and tracking. However, on the Wall Street Journal's web site there have been times when I stripped the /? crap and ended up without a useful page.

WSJ is unbelievably clunky - if you use a dated browser, you can't even access it any more.