MobileRead Forums - View Single Post

unboggling · 02-10-2014, 04:40 AM

Quote:

Originally Posted by LadyKate

Ok, I tend to think of cleanup as starting with HTML.

HTML can be obtained by opening an ePub or a Mobi file from Calibre, or by saving an rtf, doc or docx file as HTML in some kind of editor that handles it.

A pdf file can be converted to HTM or HTML using Acrobat Pro (I only have version 7 lol. don't use it enough to buy a newer version), a word processor that can translate to HTML, or Mobipocket Creator, which generates an HTML file as part of the process of building the prc.

In other words, using any method I can find, I translate my original document to HTML, perhaps even taking an old text file and going through and adding tags to it. (I can't find the PHP files I had that used a bunch of rules for creating paragraphs out of a flat txt file. It took me quite a while to write them and figure out the regex for finding all the characters found in a paragraph) ...
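For anyone curious what the flat-text-to-paragraphs idea might look like, here is a rough sketch in Python. It is not the lost PHP rules mentioned above, just one simple assumption about how such rules could work: paragraphs are separated by blank lines, and hard-wrapped lines inside a paragraph get joined. The function name `text_to_html_paragraphs` is made up for this illustration.

```python
import html
import re


def text_to_html_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <p> tags.

    Assumption for this sketch: one or more blank lines mark a paragraph
    break, and single line breaks inside a block are just hard wraps.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split wherever two newlines are separated only by whitespace.
    blocks = re.split(r"\n\s*\n", text.strip())

    paragraphs = []
    for block in blocks:
        # Join hard-wrapped lines and collapse runs of whitespace.
        joined = " ".join(block.split())
        if joined:
            # Escape &, < and > so the text is safe inside HTML.
            paragraphs.append("<p>" + html.escape(joined, quote=False) + "</p>")

    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First line\nstill the first paragraph.\n\nSecond & last."
    print(text_to_html_paragraphs(sample))
```

Real text files are messier than this (indented first lines instead of blank lines, chapter headings, scene breaks), so a rule set like the one described would need more cases than a single blank-line split.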

You want to clean the HTML, or generate clean HTML. This thread isn't the best place to discuss that. Like I said, I generally ignore the code level (HTML/XHTML, CSS). My knowledge/skills there are at the low end of the learning curve. I fix formatting problems interfering with readability if they are quickly fixable, but for my purpose, reading books for enjoyment, I don't care if the underlying code is clean or not.

btw, take a look at Toxaris' Word macro for clean HTML code:

https://www.mobileread.com/forums/showthread.php?t=142530

(for Word on Windows or OS X)