01-18-2014, 03:29 AM   #71
LadyKate
Fanatic
LadyKate ought to be getting tired of karma fortunes by now.
 
Posts: 515
Karma: 1470724
Join Date: Jul 2013
Location: Quebec CA
Device: android 4 (samsung tablet and asus tablet)
unboggling, I do use Unpack Book (or Tweak Book, as it was) when I want to edit one book. For the worst offenders, PDF files, I usually start with a save from either my rather outdated Acrobat 7 or the old Mobipocket Creator.

Mobipocket Creator is good for creating a reasonable HTML file from a PDF. The PRC files it creates are rather horrid for calibre to handle and not the nicest of formats, and for my favorite authors I like nice formatting. (I'm one who enjoys going back to reread old favorites, so I want them looking nice.)

For the next step I like an old tool called "HTML Book Fixer" by Snowsoft. I can't find it anywhere at all these days lol. Perhaps it's time to throw up a small webpage with freebie tools that are considered outdated but still useful. This tool removes a lot of the spans and excess font settings found in most HTML that has come from a word processor or PDF file (possibly because a lot of PDF files are produced from word processors).


Another very useful tool is "HTML Book Checker", which checks for things like paragraphs with no punctuation.

BUT back to the reason for recommending either the pro or free version of EditPad.

One of the first steps after clearing out the excess spans and font settings in an HTML file is to check how the paragraphs are marked.

A book that uses <br> or <br/> to separate paragraphs of text will not let you use calibre to indent the first line of each paragraph.
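A quick way to see which kind of paragraph marking a file relies on is to count the tags. This is only a rough sketch in Python, not something any of the tools above do; the function name count_paragraph_markers and the sample text are made up for illustration.

```python
import re

def count_paragraph_markers(html: str) -> dict:
    """Count <p> openings and <br> tags, ignoring tag case."""
    return {
        # "<p" followed by whitespace or ">" so that <pre> is not counted
        "p": len(re.findall(r"<p[\s>]", html, flags=re.IGNORECASE)),
        # matches <br>, <br/> and <br />
        "br": len(re.findall(r"<br\s*/?>", html, flags=re.IGNORECASE)),
    }

# Hypothetical sample: one real paragraph, three break-separated lines
sample = "<p>One.</p>\nTwo.<br>\nThree.<br/>\nFour.<BR />"
print(count_paragraph_markers(sample))
```

If the br count dwarfs the p count, the book is probably using line breaks as paragraph separators.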

I run a regex search, with EditPad set to be case sensitive, for line breaks that are followed by a lowercase letter. A match usually means the break was put in mid-sentence to make the text look right in a word processor.
e.g. the search string would be
<br>
([a-z])

and the replacement string would be
\1

This is a very fast way to remove a chunk of stuff that needs to be cleaned up.
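For anyone without EditPad, the same idea can be sketched in Python. This is a rough equivalent, not the exact EditPad search: as assumptions of mine it also accepts <br/> and <br />, tolerates Windows line endings, and puts a single space in front of the captured letter so the two joined words don't run together. The name join_soft_breaks is made up for illustration.

```python
import re

# <br>, <br/> or <br /> followed by a line break and a lowercase letter.
# No re.IGNORECASE flag, so [a-z] stays case sensitive.
SOFT_BREAK = re.compile(r"[ \t]*<br\s*/?>[ \t]*\r?\n([a-z])")

def join_soft_breaks(html: str) -> str:
    """Replace a mid-sentence break with one space plus the captured letter."""
    return SOFT_BREAK.sub(r" \1", html)

# Hypothetical sample: the first break is mid-sentence, the second is not
sample = "It was a dark and<br>\nstormy night.<br>\nThe rain fell."
print(join_soft_breaks(sample))
```

Breaks followed by a capital letter are left alone, since those are more likely to be real paragraph ends.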

Oops. Just looked at how much I typed. Maybe I should start a thread on HTML cleanup tips.