It not a joke!!! It is the truth!!!

Giving people what they want: violence and sloppy eating

lazyweb: filtering rss feeds
There is an RSS feed I am interested in which carries a lot of crap I am not interested in. I would like to avoid having to look at the crap.

Does anyone use POPFile (an email / newsgroup Bayes classifier, usually used to detect spam) together with nntp//rss to do this? In theory, the latter should take the RSS feed and present it as a newsgroup, for the former to read and classify into the 'buckets' I set up. As you may guess :) it's not working for me at the moment...
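For reference, the classifying half of that idea can be sketched in plain Python. This is only a rough illustration of a naive Bayes classifier with an arbitrary number of buckets, not how POPFile itself is implemented; the class name `BucketClassifier`, the helper `item_texts`, the bucket names and the sample feed are all made up for the example, and it assumes an RSS 2.0 feed with plain `<item>` / `<title>` / `<description>` elements.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BucketClassifier:
    """Hypothetical multinomial naive Bayes over any number of buckets."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.item_counts = Counter()             # bucket -> trained items
        self.vocabulary = set()

    def train(self, bucket, text):
        """Record the words of one hand-labelled item under a bucket."""
        words = tokenize(text)
        self.word_counts[bucket].update(words)
        self.item_counts[bucket] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable bucket, or None if nothing was trained."""
        total_items = sum(self.item_counts.values())
        words = tokenize(text)
        best_bucket, best_score = None, -math.inf
        for bucket, n_items in self.item_counts.items():
            counts = self.word_counts[bucket]
            # Add-one smoothing; the extra +1 is a slot for unseen words.
            denom = sum(counts.values()) + len(self.vocabulary) + 1
            score = math.log(n_items / total_items)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


def item_texts(rss_xml):
    """Yield 'title description' text for each <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        yield f"{title} {description}".strip()


# Made-up training data: four buckets, one labelled item each.
clf = BucketClassifier()
clf.train("politics", "election vote parliament minister")
clf.train("sport", "football match goal league")
clf.train("tech", "linux kernel release software")
clf.train("crap", "celebrity gossip diet shocking")

# Made-up feed, inlined so the sketch runs without network access.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Kernel release adds new software drivers</title></item>
<item><title>Shocking celebrity diet gossip</title></item>
</channel></rss>"""

for text in item_texts(SAMPLE):
    print(clf.classify(text), "<-", text)
```

A real setup would need far more training items per bucket, some way of persisting the counts between runs, and a step that writes each bucket back out as its own feed; none of that is shown here.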

Are there any other ways to do it?

Edit: Oh, yes, I know about Sux0r.org, but public implementations of that want to concentrate on 'more academic' feeds than the one I am looking at. I might try their software though... Commercial services exist (FeedZero and FeedScrub) but I want, as ever, to do this for free.

Edit2: Ah, it looks like you can at least try the commercial services. Let's see how they do. Annoyingly, both only categorise things into 'yes/no' rather than an arbitrary number of categories (I'd like at least four...)

Edit3: It looks like FeedScrub cannot cope with long feed URLs. In a busy feed with lots of crap (exactly the sort of feed you need this for) it also has a habit of a) not checking the feed often enough (I cannot find a setting for how often to check it) and b) deciding everything in it is crap, leaving you to look through the 'I'm not going to show you these, they are crap' feed. If you do that via their website, it only shows you about ten items (I cannot find an 'ok, show me more' button). And when you try to train it from there, it asks for your password regardless of whether you are logged in, and then apparently doesn't do anything.

Let's see how easy it is to install Sux0r...

Minds thinking alike, etc.

Pipes have recently been banned from Craigslist, for example. Apparently they are a little heavy on the site, as they can fetch the feeds too often.
