It not a joke!!! It is the truth!!!

Giving people what they want: violence and sloppy eating

lazyweb: filtering rss feeds
lovingboth
There is an RSS feed I am interested in, but it carries a lot of crap that I am not interested in. I would like to avoid having to look at the crap.

Does anyone use popfile (an email / newsgroup Bayes classifier, usually used to detect spam) and nntp//rss to do this? In theory, the latter should take the RSS feed and present it as a newsgroup, for the former to look at and classify according to the 'buckets' I set up. As you may guess :) it's not working for me at the moment...
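
For anyone wondering what the classifier half of that setup does, here is a minimal sketch in Python of a naive Bayes classifier with an arbitrary number of buckets. It is not popfile's actual code: the class name `BucketClassifier`, the bucket names and the sample item text are all made up for illustration, and a real setup would train it on the title and description of each feed item.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BucketClassifier:
    """Tiny multinomial naive Bayes with any number of named buckets."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.item_counts = Counter()             # bucket -> number of trained items
        self.vocabulary = set()                  # every word seen in training

    def train(self, bucket, text):
        """Tell the classifier that this text belongs in this bucket."""
        words = tokenize(text)
        self.word_counts[bucket].update(words)
        self.item_counts[bucket] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most likely bucket, or None if nothing was trained."""
        words = tokenize(text)
        total_items = sum(self.item_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket in sorted(self.item_counts):
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # Log prior for the bucket, then add-one smoothed log likelihoods.
            score = math.log(self.item_counts[bucket] / total_items)
            for word in words:
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


# Hypothetical training data: three buckets instead of a plain yes/no.
clf = BucketClassifier()
clf.train("wanted", "vintage road bike frame for sale")
clf.train("wanted", "second hand bike wheels")
clf.train("maybe", "free sofa collection only")
clf.train("crap", "work from home earn cash fast")
clf.train("crap", "earn cash now guaranteed")

print(clf.classify("bike frame"))
print(clf.classify("earn cash"))
```

Adding a fourth or fifth bucket is just a matter of calling `train` with a new bucket name, which is the bit the yes/no services cannot do.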

Are there any other ways to do it?

Edit: Oh, yes, I know about Sux0r.org, but public implementations of that want to concentrate on 'more academic' feeds than the one I am looking at. I might try their software though... Commercial services exist (FeedZero and FeedScrub) but I want, as ever, to do this for free.

Edit2: Ah, it looks like you can at least try the commercial services. Let's see how they do. Annoyingly, both only categorise things into 'yes/no' rather than an arbitrary number of categories (I'd like at least four...)

Edit3: It looks like FeedScrub cannot cope with long URLs for feeds. In a busy feed with lots of crap - exactly the sort of feed you need this for - it also has a habit of a) not checking the feed often enough (I cannot find a setting for how often it checks) and b) deciding everything in it is crap, and leaving you to look through the 'I'm not going to show you these, they are crap' feed. If you do that via their website, it only shows you about ten items (I cannot find an 'ok, show me more' button). And when you try to train it from there, it asks for your password regardless of whether you are logged in, and then apparently doesn't do anything.

Let's see how easy it is to install Sux0r...

Yahoo Pipes is probably what you want for any plumbing/manipulation of RSS feeds. Reportedly a little unresponsive on occasion of late, tho'.

Minds thinking alike, etc.

Pipes have recently been banned from Craigslist, for example - apparently they are a little heavy on the site, as they can fetch the feeds too often.

Hmm, see other comment. I think this is Bayes territory. It certainly will need training - I know what I want when I see it, but I cannot describe it in advance beyond saying 'it'll be interesting to me'.

I've not run my own Bayes filters for some time, and never on an RSS feed. My first thought for something that could do it, though, is to wire it up through http://pipes.yahoo.com/. You can add various regexes and other filters to remove what you don't want to see.
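
As a rough illustration of the regex approach, here is a sketch in plain Python rather than Yahoo Pipes itself. It drops any feed item whose title matches a pattern. The sample feed, the example.com links, the pattern and the function names are all invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Made-up feed standing in for the real one.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example ads</title>
<item><title>Road bike, good condition</title><link>http://example.com/1</link></item>
<item><title>EARN CASH from home!!!</title><link>http://example.com/2</link></item>
<item><title>Free sofa, collection only</title><link>http://example.com/3</link></item>
</channel></rss>"""

# Hypothetical 'crap' pattern; a real one would need constant upkeep.
BLOCK = re.compile(r"earn cash|work from home", re.IGNORECASE)


def filter_feed(feed_xml, pattern):
    """Return the feed XML with items whose title matches the pattern removed."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    # Copy the list first, since items are removed while looping.
    for item in list(channel.findall("item")):
        title = item.findtext("title", default="")
        if pattern.search(title):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


def item_titles(feed_xml):
    """List the item titles in a feed, in document order."""
    return [item.findtext("title") for item in ET.fromstring(feed_xml).iter("item")]


print(item_titles(filter_feed(SAMPLE_FEED, BLOCK)))
```

This only works for crap you can describe in advance as a pattern, which is the limitation raised in the reply below.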

From past experience with popfile as an email filter, where it could sort email into multiple categories better than I can describe it, I am not convinced regexes would work.

Distinguishing between 'ads I might be interested in' and 'ads I would probably not be interested in' is what I want to do here (along with a couple of other buckets). I can tell the difference when I see it, but doing it by hand is a pain, and when I was using popfile for email it used to be able to do this automatically.

