It not a joke!!! It is the truth!!!

Giving people what they want: violence and sloppy eating

Google & PDFs, partly a note to myself
lovingboth
When Google converts PDFs to HTML, it doesn't look like it can cope with some ligatures. A ligature is a special character used to represent a particular combination of two letters: '&' is the most common example (it's 'et', the Latin for 'and') and 'æ' would be another. Google is OK with these.

But some programs use the ability in some fonts to do the same for combinations of letters that otherwise clash visually, like fi, fl, ft, tt etc. And Google isn't OK with these.

So "This will be the best chance in fifty years to change things for the better" can become, in Google's eyes, "This will be the best chance in fi y years to change things for the be er"!
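To make the effect concrete, here is a rough Python sketch. It is not how Google's converter works (I don't know its internals); it only illustrates two things under stated assumptions. First, when a ligature glyph does map to a Unicode code point (such as U+FB01 for 'fi'), standard NFKC normalisation expands it back to plain letters. Second, if a converter simply loses ligature glyphs it can't map, you get gaps like the ones in the example above. The function names and the choice of 'ft' and 'tt' as the lost ligatures are my assumptions, picked to match that example.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand ligatures that have Unicode code points (e.g. U+FB01 'fi')
    into their plain letters using NFKC compatibility normalisation."""
    return unicodedata.normalize("NFKC", text)


def simulate_lost_ligatures(text: str, ligatures=("ft", "tt")) -> str:
    """Hypothetical illustration only: pretend each listed letter pair was
    set as a single ligature glyph that the converter could not map to
    text, leaving a gap in its place."""
    for pair in ligatures:
        text = text.replace(pair, " ")
    return text


# "\ufb01" is the single-character 'fi' ligature, "\ufb02" is 'fl'.
expanded = expand_ligatures("\ufb01fty years, \ufb02at")

sentence = ("This will be the best chance in fifty years "
            "to change things for the better")
garbled = simulate_lost_ligatures(sentence)
```

Here `expanded` comes out as plain "fifty years, flat", while `garbled` reproduces the "fi y" and "be er" gaps. Ligatures like 'ft' and 'tt' have no Unicode code points of their own, which may be part of why they fare worse than 'fi' and 'fl'.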

It certainly can't cope with images in PDFs, which is odd. And text at an angle produces some interesting effects...

How often do people use this feature? Is it worth setting up the PDF to be usable in this way, or do people look at / print PDFs directly?

I'd assume the main purpose of Google converting a PDF to HTML is to make the text readable to its word-indexing software (it also, sometimes, allows you to read something in Google's cache that has otherwise been deleted in the original).
