01-27-2013, 12:26 PM | #1 |
Enthusiast
Posts: 38
Karma: 12
Join Date: May 2010
Device: iPhone apps
|
Converting to a Clean html File
What is the easiest/best way to convert a book I've created in Pages (Mac) to a clean html file to then create my ePub in Sigil? In the past, I've manually taken the text file output and gone through each paragraph, adding back in the bold, italics, etc., but this book is 400 pages and has a LOT of formatting and that would just take forever!
I've tried the export to ePub feature from Pages, but I hate the way it outputs the file (very messy and lots of unnecessary css code) - I *could* take that file and clean it up if I have to... But I'm wondering if there's any way to either copy and paste into Sigil and keep the formatting (just the basic formatting - bold, italic, underline) - or to import a file into Sigil and still have that formatting intact? Thanks so much for any help! |
01-27-2013, 01:31 PM | #2 |
A Hairy Wizard
Posts: 3,108
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Hey Jackibar!
The first step in making things easier is to ditch the Mac.... lol
Seriously though, there are a few programs that people have written to do what you suggest. There is one that you can find on this thread HERE that works very well for "Word on a PC"...but I'm not sure if they have been tried on the Mac...I have looked over the code on the linked macro and I didn't see anything that stood out as bad. You may have some luck with it - are you any good at transcribing code??
There are also some more editing tools that I have seen discussed in this thread HERE...but I haven't used them.
If you have some knowledge/experience with search/replace in Pages you can:
- replace the end paragraph markers with </p><p>
- replace anything formatted as bold/italics with the proper <b><i> tags
- replace lines formatted as headers with the appropriate <h> tag
Then SelectAll from the main screen and copy/paste into your favorite text editor or directly into an html page in Sigil. That keeps the html tags but gets rid of ALL the extraneous codes.
I hope that helps! |
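A minimal sketch of the paragraph-marker step above, in Python, for anyone who would rather script it than do it in the Pages search/replace dialog. It assumes the text has already been pasted out with one paragraph per line and with the <b>/<i> tags already typed in; the function name `paragraphs_to_html` is made up for this example.

```python
def paragraphs_to_html(text: str) -> str:
    """Wrap every non-empty line in <p>...</p>.

    Assumption: the pasted text has one paragraph per line, and any
    inline tags (<b>, <i>) were already inserted by search/replace.
    """
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(f"<p>{line}</p>" for line in lines if line)


sample = "First <b>bold</b> paragraph.\n\nSecond <i>italic</i> paragraph."
print(paragraphs_to_html(sample))
```

The result can be pasted into Code View in Sigil; header lines would still need their own <h> tags by hand or with a further rule.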
01-27-2013, 03:23 PM | #3 |
Guru
Posts: 696
Karma: 150000
Join Date: Feb 2010
Device: none
|
Possibly another suggestion: can Pages output other formats besides epub? If you could save the book as an rtf file, for example, you could then use LibreOffice (or OpenOffice) with the writer2xhtml extension to output a fairly decent epub that you could easily clean up in Sigil as necessary.
NB: writer2xhtml is a part of the writer2latex package at the above link. All of the above software is free-as-in-speech, as far as I know. |
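For the RTF route, a rough sketch of driving LibreOffice from a script. Note that this uses LibreOffice's built-in HTML export through the `soffice` command line, not the writer2xhtml extension mentioned above, so the output will likely be messier. It assumes `soffice` is on the PATH; the file name `book.rtf` and the helper names are placeholders.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path) -> list[str]:
    """Build a headless LibreOffice command that exports `source` as HTML."""
    return [
        "soffice",            # assumed to be on the PATH
        "--headless",         # run without opening the GUI
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source: Path, out_dir: Path) -> None:
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(build_convert_command(source, out_dir), check=True)


# Placeholder paths, shown only to illustrate the command that would run.
print(" ".join(build_convert_command(Path("book.rtf"), Path("out"))))
```

Calling `convert(Path("book.rtf"), Path("out"))` would then write the HTML file into `out`, ready to be opened and cleaned up in Sigil.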
01-27-2013, 06:41 PM | #4 | ||
Enthusiast
Posts: 38
Karma: 12
Join Date: May 2010
Device: iPhone apps
|
Quote:
Quote:
I downloaded and got it installed but when I ran it, it gave a compile error, so not sure what it's mad about...! I'll try some of the other tools recommended and see what I can come up with. Thanks again for the help! |
||
01-27-2013, 06:43 PM | #5 | |
Enthusiast
Posts: 38
Karma: 12
Join Date: May 2010
Device: iPhone apps
|
Quote:
I hadn't heard of the other resources you mentioned, so I'll give that a try... If all else fails, I can always do the search/replace method - was just wondering if there was a more automated way of getting this done! |
|
01-30-2013, 11:39 AM | #6 |
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
|
I believe Pages will save a file in *.doc format. I've never tried this (because I don't use Mac OS) but I have run an Open Office *.doc file through word2cleanhtml.com and it cleaned up nicely. Be worth a try, surely.
|
01-30-2013, 11:07 PM | #7 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
I've also been surprised at how clean an exported ePUB was from a very simple Pages document I was sent (as an experiment). It was poetry, and while the output was far from perfect, it would have been super-fast to regex it and have the book done and dusted. I didn't see anything untoward about the ePUB, given that it came from what is basically a word-processor. I think the output for clean-up is six of one, half-dozen of another, considering it will have to be cleaned either as HTML or XHTML. {shrug}. Just my $.02. Hitch |
|
01-31-2013, 09:25 AM | #8 | |
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
|
Quote:
(As opposed to saving it as a *.doc file, then opening it say in Open Office and following one of the more traditional routes to generating html?) |
|
01-31-2013, 09:39 AM | #9 |
A Hairy Wizard
Posts: 3,108
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
+1 for Hitch's rec. If Pages puts out anywhere near a clean-ish ePub then putting that through Sigil is probably the best/fastest route. Sigil has pretty much replaced my normal text editor for cleaning up the HTML (that took a bunch of recreating saved regexes) and since it keeps everything packaged in a compliant ePub it is much more convenient.
|
01-31-2013, 03:58 PM | #10 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
(Between you and Bear, this place is starting to feel very familiar, LOL)...yes, generally, I think I would. I'm not going to state it categorically, because my experience is limited. I don't much care for the Mac environment, so have not spent a lot of time banging around in it.
However, when I received that poetry in Pages, and had to open it over on the Mac, while I was exporting the .doc format, I thought, "what the hell" and tried other export options. The ePUB output from the original Pages file was remarkably clean. It would have been a matter of minutes (because we already have the relevant CSS ready-to-go) to rename the elements as needed, marry it to our House CSS and have it done. Now, I want to reiterate that it was an extremely simple file, just one line of text, followed by "enter" after another, so it wasn't a complicated test.
To me, it would have been fast either to a) do it in Sigil or b) explode the exported ePUB, open all the html files in NT Pro, and regex it. I don't see that going Pages-->Doc-->HTML-->Sigil would have been faster.
Now, that's someone who does this all the time speaking. I'm not sure it would be more intuitive for a noob DIY'er, like the folks on the KDP. That's my caveat, here. It's fine if you know HTML/XHTML/Regex. Probably not so easy if one doesn't, or needs to upload Word or HTML at the KDP.
Hitch |
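For readers who have not done the "explode the ePUB and regex it" step before, here is a small Python sketch of the idea. It is not Hitch's workflow; the cleanup rule (stripping every class attribute) is just an assumed example, and the tiny demo archive stands in for a real Pages export.

```python
import io
import re
import zipfile

# Hypothetical cleanup rule: drop every class="..." attribute so the
# markup can be restyled with a separate stylesheet afterwards.
CLASS_ATTR = re.compile(r'\s+class="[^"]*"')


def clean_epub_html(epub_bytes: bytes) -> dict[str, str]:
    """Open an ePub (a zip archive) and return cleaned text per HTML file."""
    cleaned = {}
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as book:
        for name in book.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                markup = book.read(name).decode("utf-8")
                cleaned[name] = CLASS_ATTR.sub("", markup)
    return cleaned


def make_demo_epub() -> bytes:
    """Build a tiny stand-in archive so the sketch runs without a real book."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as book:
        book.writestr("mimetype", "application/epub+zip")
        book.writestr(
            "OEBPS/ch1.xhtml",
            '<p class="s1">A line of <span class="s2">verse</span></p>',
        )
    return buffer.getvalue()


result = clean_epub_html(make_demo_epub())
print(result["OEBPS/ch1.xhtml"])
```

With a real file, the bytes would come from something like `Path("book.epub").read_bytes()`, and the regex would be swapped for whatever renaming the house CSS calls for.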
|
01-31-2013, 04:05 PM | #11 | |
A Hairy Wizard
Posts: 3,108
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Quote:
|
|
02-01-2013, 03:01 AM | #12 |
Connoisseur
Posts: 98
Karma: 122982
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
|
TextEdit puts out clean HTML. If the paragraph and font formatting is complex, you can remove the multiple fonts and multiple font-sizes by selecting the entire text and using the tools on the toolbar to change all the fonts to the same font (but preserving the italic and bold), as well as changing all the font sizes to just one size (or leaving the sizes all the same). Plus, you can examine the HTML code in TextEdit by changing the preferences to show the HTML codes instead of rendering the rich text. Afterwards, you can hand-edit the HTML code and just remove the font property altogether.
Also you can make all the paragraph styles the same using the Copy Ruler / Paste Ruler commands found under Format->Text. This ought to decrease the number of styles in your book down to 1 or 2. |
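The "remove the font property altogether" step can also be scripted instead of hand-edited. A minimal Python sketch follows; the sample rule is an assumption about what a TextEdit style rule looks like, so check it against your own export before relying on the pattern.

```python
import re

# Matches font, font-family and font-size declarations, but leaves
# other properties that merely start with "font-" (e.g. font-kerning).
FONT_DECL = re.compile(r"\s*(?<![\w-])font(?:-family|-size)?\s*:[^;}]*;?")


def strip_font_rules(css: str) -> str:
    """Remove font declarations from a block of CSS text."""
    return FONT_DECL.sub("", css)


# Assumed example of a style rule from a word-processor HTML export.
sample = "p.p1 {margin: 0.0px 0.0px 12.0px 0.0px; font: 12.0px Helvetica}"
print(strip_font_rules(sample))
```

Italic and bold survive because they live in the <i>/<b> tags in the body, not in the font declarations being removed.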
02-02-2013, 06:57 AM | #13 |
Guru
Posts: 878
Karma: 2457540
Join Date: Nov 2011
Device: none
|
If you're writing a book from scratch, there's a lot to be said for starting off in Sigil. You'll get the ultimate "Clean code", just paragraphs and maybe a Header style for chapter titles. Which is just about all it's sensible to ask of an eBook, though you can play around complicating it if you like!
If you want a printed version, Ctrl-A in Page View of each chapter followed by Ctrl-C and then Ctrl-V into Word doesn't take long. Then you can add page layout as much as you wish, in a medium that will take some notice of it! (Translate all that into Mac if necessary.)
Last edited by exaltedwombat; 02-02-2013 at 06:59 AM. |
02-04-2013, 09:15 AM | #14 |
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
|
I don't suppose the developers would be willing to add the basic WordStar formatting commands to Sigil?
I am at this moment writing a book in WordStar "non-document" (text) mode with the extension *.htm. I'd be happy to cut out the middleman! |
02-04-2013, 09:39 AM | #15 |
Well trained by Cats
Posts: 29,869
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
WordStar. I have not heard that name in a long time.
There is a set of WS 5.5 floppies around here somewhere; problem is, they are 5.25" |
|