MobileRead Forums - View Single Post - hello fb2 questions. calibre thoughts as well.

BobC · 05-01-2010, 06:00 AM

Personally, I find PDF a pretty poor starting point for converting ebooks, mainly because it is hard to identify properly which paragraphs should re-flow. I'd rather work with a clean plain-text file than a PDF. However, that doesn't address your problem.

Like Dave_S, I used to use Book Designer, but I have moved over to the OpenOffice plug-in (OOOFBTools) which, when teamed up with the AltSearch plug-in, gives excellent capabilities for converting texts. One reason I prefer this approach is that one size doesn't fit all. IMO you need to examine the document you have and decide how best to convert it. Typical questions: does it have italics? Are there chunks of text that need formatting as "cite" or "poem", or even as tables? Are there footnotes that need handling? Some of these cannot be fully automated, or need an intermediate step before you get rid of all the line breaks. Some things I do with the AltSearch macros; for others I use the OOOFBTools text correction facilities, which include special handling for hyphens that have been hard-coded into the text (typical if the source is OCR'd).
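
To give an idea of what the hyphen and line-break clean-up involves, here is a minimal Python sketch. It is not what OOOFBTools or AltSearch actually do internally; the function names `dehyphenate` and `reflow` are made up for illustration, and the rules are simple heuristics that assume blank lines mark paragraph breaks.

```python
import re


def dehyphenate(text):
    # Join a word that was split by a hard-coded hyphen at a line end,
    # e.g. "conver-" + newline + "sion" becomes "conversion".
    # Heuristic: only join when a lowercase letter follows the break.
    # It will wrongly merge genuine compounds broken at the hyphen.
    return re.sub(r"(\w)-\n([a-z])", r"\1\2", text)


def reflow(text):
    # Assumption: a blank line separates paragraphs. Every other line
    # break inside a paragraph is replaced by a single space.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```

Run `dehyphenate` first and `reflow` second, and only after you have marked up anything (poems, tables) whose line breaks matter.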

If your source is a PDF and there are no restrictions on it, you might be able to simply highlight the text, then copy and paste it into OOO. That way, unlike extracting the text (using "Save as Text" in Acrobat Reader), you will at least preserve the italics. You will almost certainly need to remove all page headers and footers (including page numbers) manually, unless you are very adept with RegExps and can automate the task.
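
For anyone who wants to try automating the header and footer removal, this is a rough Python sketch of the RegExp idea. The function name `strip_page_furniture` and the thresholds are my own assumptions: it treats a line containing only a number (optionally preceded by "Page") as a page number, and any short line that repeats at least `min_repeats` times as a running header or footer.

```python
import re
from collections import Counter


def strip_page_furniture(text, min_repeats=3):
    lines = text.split("\n")
    # A line holding only digits, or "Page N", is assumed to be a page number.
    page_num = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)
    # Count identical non-empty lines; short ones that repeat often are
    # assumed to be running headers or footers (for example the book title).
    counts = Counter(line.strip() for line in lines if line.strip())
    repeated = {s for s, n in counts.items() if n >= min_repeats and len(s) < 60}
    return "\n".join(
        line for line in lines
        if not page_num.match(line) and line.strip() not in repeated
    )
```

Check the result by eye: a short line of dialogue that happens to repeat, or a numbered chapter heading consisting only of digits, would be removed as well.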

I did a short guide to the OOO approach, and you should be able to find it by searching the forums for OOOFBTOOLS, as there is little English documentation on the subject.

Calibre, while a good tool for some formats, has only recently incorporated fb2, and I have commented on a few occasions that the fb2 code it produces is very poor: it often fails quite elementary validation and uses markup that doesn't appear in the fb2 Schema.
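
As a quick sanity check on any fb2 output, something like the Python sketch below can flag obviously foreign markup. To be clear about the assumptions: this is not schema validation, `KNOWN_BODY_TAGS` is a hand-picked subset of element names rather than the full fb2 Schema, and the function name `unknown_body_tags` is hypothetical. It only checks that the file is well-formed XML and lists elements inside `<body>` that are not in that subset.

```python
import xml.etree.ElementTree as ET

FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"

# Illustrative subset only; the real schema defines more elements than this.
KNOWN_BODY_TAGS = {
    "body", "section", "title", "subtitle", "p", "emphasis", "strong",
    "cite", "poem", "stanza", "v", "epigraph", "empty-line",
    "text-author", "a", "image", "table", "tr", "th", "td",
}


def unknown_body_tags(fb2_xml):
    # ET.fromstring raises ET.ParseError if the document is not well formed.
    root = ET.fromstring(fb2_xml)
    found = set()
    for body in root.iter("{" + FB2_NS + "}body"):
        for el in body.iter():
            # Tags look like "{namespace}local"; split off the local name.
            ns, _, local = el.tag.rpartition("}")
            if ns != "{" + FB2_NS or local not in KNOWN_BODY_TAGS:
                found.add(el.tag)
    return found
```

An empty result only means nothing in the body fell outside the subset; for a real verdict you still need to validate against the fb2 Schema itself.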

BobC
