It not a joke!!! It is the truth!!!

Giving people what they want: violence and sloppy eating

Lazyweb: literary forensics
Are there any websites / Linux programs for helping determine if the same person wrote two different sets of message board posts?

I can find plagiarism checkers, but these tend to be looking for whether the texts have a common source, rather than a common author. (If I write two texts, one about soft fruit and one about deckchairs, they are liable to pass a plagiarism test, but should still have enough in common for someone to be able to say that the same person wrote both...)

This entry was originally posted at http://lovingboth.dreamwidth.org/469880.html, because despite having a permanent account, I have had enough of LJ's current owners trying to be evil. Please comment there using OpenID: if you have an LJ account, you can use it as your OpenID account. Or just join Dreamwidth! It only took a couple of minutes to copy all my entries from here to there.

What you're after is stylometry (also called stylometrics). One widely used tool that's as good a starting place as any is JGAAP: http://evllabs.com/jgaap/w/index.php/Main_Page

Your problem might be a very hard one, though. These things work best with decent corpora of training text, like dozens of novels' worth. And they can generally only tell you, for a given test text, which of the authors in the training corpus is most likely to have written it. They assume that it must have been one of the authors in the training materials, not any random writer.

Oh, and it's also worth noting that it's even harder to detect impersonation - e.g. determining whether a text is really Dickens or a skilled writer pretending to be Dickens.
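
To make the "which of the known authors is closest" idea above concrete, here is a minimal sketch in Python. It is not how JGAAP works internally; it just compares how often each text uses a handful of common function words, which is one classic stylometric signal. The word list, the function names and the distance measure are all assumptions chosen for illustration, and a real study would use far more words and far more text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words.
# Real stylometric studies typically use hundreds.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it",
                  "is", "but", "i", "for", "with", "as", "on"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p, q):
    """Mean absolute difference between two profiles (smaller = more alike)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def most_likely_author(known, test_text):
    """Pick the closest author from `known`, a dict of author -> sample text.

    This is closed-set attribution: it always names one of the known
    authors, even if none of them actually wrote the test text.
    """
    test = profile(test_text)
    scores = {author: distance(profile(sample), test)
              for author, sample in known.items()}
    return min(scores, key=scores.get), scores
```

Calling `most_likely_author({"poster_a": text_a, "poster_b": text_b}, mystery_text)` returns the nearest known author plus all the scores. Note that it will always name somebody, which is exactly the limitation described above, and with message-board-sized samples the scores are likely to be too noisy to trust.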
