Quote:
Originally Posted by LadyKate
OK, I tend to think of cleanup as starting with HTML.
HTML can be obtained by opening an ePub or Mobi file from Calibre, or by saving an RTF, DOC or DOCX file as HTML in an editor that handles those formats.
A PDF file can be converted to HTM or HTML using Acrobat Pro (I only have version 7, lol; I don't use it enough to buy a newer version), a word processor that can translate to HTML, or Mobipocket Creator, which generates an HTML file as part of the process of building the PRC.
In other words, I translate my original document to HTML using any method I can find, perhaps even taking an old text file and going through it to add tags. (I can't find the PHP files I had that used a bunch of rules for creating paragraphs out of a flat txt file. It took me quite a while to write them and figure out the regex for finding all the characters found in a paragraph.) ...
You want to clean the HTML, or generate clean HTML. This thread isn't the best place to discuss that. Like I said, I generally ignore the code level (HTML/XHTML, CSS). My knowledge/skills there are at the low end of the learning curve. I fix formatting problems interfering with readability if they are quickly fixable, but for my purpose, reading books for enjoyment, I don't care if the underlying code is clean or not.
btw, take a look at Toxaris' Word macro for clean HTML code:
https://www.mobileread.com/forums/showthread.php?t=142530
(for Word on Windows or OS X)
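
On the flat text file idea from the quoted post: this is not the lost PHP script, just a minimal Python sketch of the same general approach. It assumes paragraphs in the text file are separated by blank lines and that lines inside a paragraph are hard-wrapped. The function name `text_to_html_paragraphs` is made up for illustration.

```python
import html
import re


def text_to_html_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <p> tags."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines (assumption: a blank line marks a paragraph break).
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Rejoin hard-wrapped lines and collapse runs of whitespace.
        joined = " ".join(block.split())
        if joined:
            # Escape &, < and > so the text is safe inside HTML.
            paragraphs.append(f"<p>{html.escape(joined, quote=False)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First line\nwrapped.\n\nSecond & last paragraph.\n"
    print(text_to_html_paragraphs(sample))
```

Text files that mark paragraphs with indentation instead of blank lines would need a different splitting rule, so treat this as a starting point only.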