09-06-2021, 06:42 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Sep 2021
Device: none (Apple Books desktop)
|
How do I get clean files from Apple Pages?
I am struggling to produce clean html and css from Apple Pages.
My documents are (to me) very simple and structured: essentially novel-like texts, 'correctly' formatted. Nothing fancy. I am trying to end up with ePub files with clean HTML and stylesheets. My only misbehaviour is that I explicitly identify first paragraphs.
I want only my named paragraph and character styles in the output. Exporting directly to ePub (from Pages or LibreOffice) ends up messy. Exporting to Word (docx) and then using Calibre also ends up with very messy files.
I just want <h>, <p class="SameNameAsWordProcessorParagraphStyle"> and <span class="WPCharacterStyle">. I don't want a dozen .blocks or .calibres. And I would prefer the generated stylesheets not to have formatting information, just a list of the named or used paragraph and character/span styles.
It is possible (even likely) I am using the wrong tools for this.
09-06-2021, 08:28 AM | #2 |
the rook, bossing Never.
Posts: 11,171
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
LO Writer
1. Edit in odt format.
2. Save As an extra copy in docx.
3. Convert the docx to epub2 in Calibre.
Only edit HTML if doing something NOT simple, and then maybe in Sigil or Calibre Editor, not with Pages or any other HTML WYSIWYG editor.
If you use paragraph styles, heading levels and links-anchors properly, the results from Word or an extra Save As in LO Writer to docx are perfect in Calibre to epub2. Never edit the docx itself with LO Writer: import it once, fix it, then edit only in odt and "Save As" an extra docx copy from LO Writer.
Samples in my sig. Each css block { } created by Calibre is from ONE LO Writer paragraph style. There are no extra blocks. Only items to appear in the NCX have a heading level (via paragraph style). Some textual headings and all scene breaks are just paragraph styles with body text level.
09-06-2021, 09:42 AM | #3 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Your best solution is to forget Pages ever existed. It's GARBAGE! Next, convert your word processing document to ePub using Calibre. Then use either the Calibre editor or Sigil to clean it up. That is what I would do. It's how I did it the last time I had a Word document to convert to ePub.
|
09-07-2021, 07:57 PM | #4 | ||||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Side Note: We recently brought this up on a tangent in this InDesign thread: https://www.mobileread.com/forums/sh...58#post4119758 Complaining about atrocious InDesign/GoogleDocs/Pages code. Quote:
If you export to HTML (or EPUB) from Pages, what does the actual HTML look like? So let's say you apply your "first" Style. Clean HTML would look like this:
Code:
<h2>Chapter 1</h2>
<p class="first">On a cold and stormy night...</p>
Quote:
2020: "eBook Formatting in Sigil"
Sounds like you're on a Mac though, so some of the really clean output tools are out of your reach (Windows only).
* * *
I'm not sure what your LibreOffice DOCX problems are... could be when LibreOffice imports your Pages document, it carries over all the Pages cruft, which then makes its way into your Calibre conversion.
Quote:
For a little bit more Mammoth ease-of-use, there's DiapDealer's "DOCXImport" (Sigil plugin)... but again, it's a very advanced method of conversion. IF you use Styles properly/consistently though, that could be what you need.
But... a lot of this depends on the actual Pages code. (Which I must admit, I haven't personally seen yet, only heard through the grapevine.)
Last edited by Tex2002ans; 09-08-2021 at 02:45 PM.
||||
09-08-2021, 02:13 PM | #5 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Besides Sigil, there is the Calibre editor.
|
09-13-2021, 08:44 AM | #6 |
Junior Member
Posts: 6
Karma: 10
Join Date: Sep 2021
Device: none (Apple Books desktop)
|
(Thank you for the replies. I am not ghosting, just still scratching my head, trying various things, trying to work out what to do. This was a lot easier last century (literally). Back then, I had Classic Macintosh System widgets I could drag-and-drop Word files onto that (in my memory, at least) spat out clean HTML.)
|
09-13-2021, 10:52 AM | #7 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
You can send me a Private Message on MobileRead by pressing on my username > Send Message.
Just upload them to Dropbox or Google Drive (or some other filesharing site) and send me the URL.
From rereading your first post, I think it may be helpful to get these 4 files:
Quote:
Garbage In, Garbage Out. These programs output absolutely atrocious HTML:
If you use Styles properly though (LibreOffice, Word, etc.), you can still get clean HTML out. It's just that 99.9%+ of people don't use Styles, or don't know they exist.
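A quick way to judge whether an export is "clean" in the sense discussed in this thread is to list every class name the HTML actually uses and separate your own named styles from converter-generated ones. What follows is only a minimal sketch in Python, not anything a poster here uses: the `audit` function, the `ClassAudit` parser and the `GENERATED` pattern are hypothetical names, and the pattern simply assumes that generated classes look like `calibre3` or `block_12`.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Assumption: converter-generated classes look like "calibre", "calibre3" or "block_12".
GENERATED = re.compile(r"^(calibre\d*|block_?\d*)$")


class ClassAudit(HTMLParser):
    """Count every class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def audit(html):
    """Return (named, generated) counters of class names for an HTML string."""
    parser = ClassAudit()
    parser.feed(html)
    parser.close()
    named, generated = Counter(), Counter()
    for cls, count in parser.classes.items():
        target = generated if GENERATED.match(cls) else named
        target[cls] = count
    return named, generated


sample = (
    '<h2 class="calibre1">Chapter 1</h2>'
    '<p class="first">On a cold and stormy night...</p>'
    '<p class="block_3">It rained.</p>'
)
named, generated = audit(sample)
print("named styles:", dict(named))
print("generated styles:", dict(generated))
```

If the "generated" list is empty and the "named" list matches the paragraph and character styles in your word processor, the export is about as clean as you asked for.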
||
09-13-2021, 11:59 AM | #8 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Even if you do use styles in Word, it's still a good idea to go over the code by hand to make sure it's good code.
|
09-15-2021, 10:49 AM | #9 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
I would be interested to know, Tex, if you see something odd in it. :-) Hitch |
|
09-15-2021, 11:06 AM | #10 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Pages is a disaster. Don't use it.
|
09-15-2021, 11:34 AM | #11 |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Jon, to be fair, I'm not sure it's Pages, per se.
I think it's about the same as all the rest--altho, to be fair, it is a bit more disastrous when it's trying to compensate for ad hoc styles created by users that want something that "just works." That don't use styles and so on. But if you get a file (Pages) from someone that's relatively good at using Styles/headings and the like, mostly, it's about on par with the others.
(To me, this is like that oft-repeated mantra "the HTML output from Word is HORRRRIBLE! Oh, the humanity! The world as we know it will end, if you use Word for your book!" and so on, all of which is utter bollocks.)
Where it might suffer is in a user mindset that is more "just works" than other pieces of software. But for all I know, my view of it is skewed, because the successful users make their own ePUBs and upload them and don't use companies like mine. I mean, after all, by and large, I tend to get the writers that do NOT follow the directions, rather than the ones that do. :-)
Hitch
09-15-2021, 11:41 AM | #12 |
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
|
In my experience, such conversions are trivial, unless it is OCR or the input contains weird things like negative spans. Usually it takes about 5 regexes to clean things up.
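The post above does not say which regexes are meant, so the following is only an illustrative sketch in Python of what a five-rule cleanup pass might look like. Every pattern in `RULES` is an assumption (for example, that generated classes look like `calibre3` or `block_12`), and `clean` is a hypothetical helper; a real export will need patterns written against its actual markup.

```python
import re

# Hypothetical cleanup rules, applied in order. None of these come from the thread.
RULES = [
    # 1. Drop converter-generated class attributes such as class="calibre3" or class="block_12".
    (re.compile(r'\s+class="(?:calibre|block_?)\d*"'), ""),
    # 2. Drop inline style attributes.
    (re.compile(r'\s+style="[^"]*"'), ""),
    # 3. Unwrap attribute-less spans that contain plain text only (nested tags are left alone).
    (re.compile(r"<span>([^<]*)</span>"), r"\1"),
    # 4. Remove paragraphs that are empty or hold only whitespace / &nbsp;.
    (re.compile(r"<p>(?:\s|&nbsp;)*</p>\s*"), ""),
    # 5. Collapse runs of spaces and tabs into a single space.
    (re.compile(r"[ \t]{2,}"), " "),
]


def clean(html):
    """Apply each rule once, in order, and return the cleaned string."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


messy = (
    '<p class="calibre1"><span style="font-size: 1em">On a  cold night</span></p>'
    '<p class="block_2">&nbsp;</p>'
)
print(clean(messy))
```

Named classes such as `class="first"` pass through untouched, which is the point: only the generated noise is stripped. Regexes on HTML are fragile, so this kind of pass is best run in Sigil or the Calibre editor where you can review each replacement.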
|
|