Old 09-06-2021, 06:42 AM   #1
Blood Black
Junior Member
Blood Black began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Sep 2021
Device: none (Apple Books desktop)
How do I get clean files from Apple Pages?

I am struggling to produce clean html and css from Apple Pages.

My documents are (to me) very simple and structured. Essentially novel-like texts, 'correctly' formatted. Nothing fancy. I am trying to end up with ePub files with clean HTML and stylesheets. My only misbehaviour is that I explicitly identify first paragraphs. I want only my named paragraph and character styles in the output.

Exporting directly to ePub (from Pages or LibreOffice) ends up messy. Exporting to Word (docx) and then using Calibre also ends up with very messy files.

I just want
<h>
<p class="SameNameAsWordProcessorParagraphStyle">
<span class="WPCharacterStyle">.

I don't want a dozen .blocks or .calibres.

And I would prefer the generated stylesheets to not have formatting information—just a list of the named or used paragraph and character/span styles.

It is possible (even likely) I am using the wrong tools for this.
Old 09-06-2021, 08:28 AM   #2
Quoth
the rook, bossing Never.
Posts: 11,171
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Use LO Writer:
  • Edit in odt format.
  • Save As an extra copy in docx.
Then convert the docx to epub2 in Calibre.
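
For anyone who would rather script that last step than click through the GUI, Calibre also ships a command-line converter, ebook-convert. Below is a minimal sketch in Python, assuming Calibre's command-line tools are installed and on the PATH. The helper names and the file names are placeholders for illustration, not anything posted in this thread.

```python
import shutil
import subprocess


def build_convert_command(docx_path, epub_path):
    """Return the ebook-convert argument list for a docx -> EPUB 2 conversion."""
    # --epub-version picks EPUB 2 or EPUB 3 output in Calibre's EPUB writer.
    return ["ebook-convert", docx_path, epub_path, "--epub-version", "2"]


def convert(docx_path, epub_path):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(docx_path, epub_path), check=True)


if __name__ == "__main__":
    # Placeholder file names: only prints the command, does not run it.
    print(" ".join(build_convert_command("novel.docx", "novel.epub")))
```

Calling convert("novel.docx", "novel.epub") would run the same conversion the GUI does with default settings; any look-and-feel options set in the GUI are not carried over by this sketch.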

Only edit HTML if doing something NOT simple, and then maybe in Sigil or Calibre Editor, not with Pages or any other HTML WYSIWYG editor.

If you use paragraph styles, heading levels and links-anchors properly the results from Word or extra Save As in LO Writer to docx are perfect in Calibre to epub2.

Never edit docx directly in LO Writer. Import it once, fix it, then edit only in odt and "Save As" an extra docx copy from LO Writer.

Samples in my sig. Each css block { } created by Calibre is from ONE LO Writer paragraph style. There are no extra blocks.

Only items to appear in the NCX have a heading level (via paragraph style). Some textual headings and all scene breaks are just paragraph styles with body text level.
Old 09-06-2021, 09:42 AM   #3
JSWolf
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Your best solution is to forget Pages ever existed. It's GARBAGE! Next, convert your word processing document to ePub using Calibre. Then use either the Calibre editor or Sigil to clean it up. That is what I would do. It's how I did it the last time I had a Word document to convert to ePub.
Old 09-07-2021, 07:57 PM   #4
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Blood Black View Post
I am struggling to produce clean html and css from Apple Pages.

My documents are (to me) very simple and structured. Essentially novel-like texts, 'correctly' formatted. Nothing fancy. I am trying to end up with ePub files with clean HTML and stylesheets.
From what I've gathered over the years, Pages's HTML output is a mess.

Side Note: We recently brought this up on a tangent in this InDesign thread:

https://www.mobileread.com/forums/sh...58#post4119758

Complaining about atrocious InDesign/GoogleDocs/Pages code.

Quote:
Originally Posted by Blood Black View Post
My only misbehaviour is that I explicitly identify first paragraphs. I want only my named paragraph and character styles in the output.
And you're sure you're using Styles properly?

If you export to HTML (or EPUB) from Pages, what does the actual HTML look like?

So let's say you apply your "first" Style. Clean HTML would look like this:

Code:
<h2>Chapter 1</h2>

<p class="first">On a cold and stormy night...</p>
What does your Apple Pages HTML look like?
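
One quick way to answer that question without reading the whole file is to count which tag/class combinations the export actually contains. Here is a minimal sketch using only the Python standard library. The ClassInventory and inventory names are made up for this example, and the sample string is just the clean HTML shown above, not real Pages output.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassInventory(HTMLParser):
    """Count every (tag, class) pair found in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Record a bare tag as (tag, None) so unstyled elements still show up.
        for name in classes or [None]:
            self.counts[(tag, name)] += 1


def inventory(html):
    """Return a Counter mapping (tag, class-or-None) to number of occurrences."""
    parser = ClassInventory()
    parser.feed(html)
    parser.close()
    return parser.counts


sample = '<h2>Chapter 1</h2>\n<p class="first">On a cold and stormy night...</p>'
for (tag, name), n in sorted(inventory(sample).items(), key=str):
    print(tag, name, n)
```

If the export is clean, the list should be short and every class should be recognisable as one of your own style names. A long tail of numbered or auto-generated classes suggests the word processor or converter is inventing styles.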

Quote:
Originally Posted by Blood Black View Post
Exporting directly to ePub (from Pages or LibreOffice) ends up messy. Exporting to Word (docx) and then using Calibre also ends up with very messy files.
There was a heck of a lot of great discussion about Styles + DOCX conversion + tools in:

2020: "eBook Formatting in Sigil"

Sounds like you're on a Mac though, so some of the really clean output tools are out of your reach (Windows only).

* * *

I'm not sure what your LibreOffice DOCX problems are... could be when LibreOffice imports your Pages document, it carries over all the Pages cruft, which then makes its way into your Calibre conversion.

Quote:
Originally Posted by Blood Black View Post
And I would prefer the generated stylesheets to not have formatting information—just a list of the named or used paragraph and character/span styles.

It is possible (even likely) I am using the wrong tools for this.
Sounds like Mammoth may work for you, but that's a much more advanced Python commandline tool.

For a little bit more Mammoth ease-of-use, there's DiapDealer's "DOCXImport" (Sigil plugin)... but again, it's a very advanced method of conversion.

IF you use Styles properly/consistently though, that could be what you need.
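
To give a rough idea of what the Mammoth route involves: Mammoth takes a "style map" that says which DOCX style becomes which HTML element and class, and leaves out visual formatting. The sketch below builds such a map from a list of style names and passes it to Mammoth's Python API. The style names are hypothetical, and the lowercase-with-hyphens class naming is my own assumption (CSS class names cannot contain spaces, so "same name as the style" needs some convention). Style names containing an apostrophe are not handled.

```python
import re


def class_name(style_name):
    """Turn a word-processor style name into a CSS-safe class name (assumed convention)."""
    return re.sub(r"[^a-z0-9]+", "-", style_name.lower()).strip("-")


def build_style_map(paragraph_styles, character_styles):
    """Build a Mammoth style map with one line per named style."""
    lines = []
    for name in paragraph_styles:
        # ":fresh" tells Mammoth to open a new <p> for every paragraph.
        lines.append(f"p[style-name='{name}'] => p.{class_name(name)}:fresh")
    for name in character_styles:
        # "r" matches a run of text, i.e. a character style.
        lines.append(f"r[style-name='{name}'] => span.{class_name(name)}")
    return "\n".join(lines)


def docx_to_html(docx_path, style_map):
    """Convert a DOCX file to HTML with Mammoth; returns (html, messages)."""
    import mammoth  # third-party: pip install mammoth

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=style_map)
    # result.messages holds warnings, e.g. for styles the map does not cover.
    return result.value, result.messages


# Hypothetical style names, for illustration only.
print(build_style_map(["First Paragraph", "Scene Break"], ["Book Title"]))
```

The warnings returned alongside the HTML are worth reading: they are a cheap way to spot styles in the DOCX that were never given a mapping.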

But... a lot of this depends on the actual Pages code. (Which I must admit, I haven't personally seen yet, only heard through the grapevine.)

Last edited by Tex2002ans; 09-08-2021 at 02:45 PM.
Old 09-08-2021, 02:13 PM   #5
JSWolf
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Besides Sigil, there is the Calibre editor.
Old 09-13-2021, 08:44 AM   #6
Blood Black
Junior Member
Blood Black began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Sep 2021
Device: none (Apple Books desktop)
(Thank you for the replies. I am not ghosting, just still scratching my head, trying various things, trying to work out what to do. This was a lot easier last century (literally). Back then, I had Classic Macintosh System widgets I could drag-and-drop Word files to that (in my memory, at least,) spat out clean HTML.)
Old 09-13-2021, 10:52 AM   #7
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Blood Black View Post
(Thank you for the replies. I am not ghosting, just still scratching my head, trying various things, trying to work out what to do.
Send me the files. I'll take a look at it and see if anything can be salvaged (or what the best method would be).

You can send me a Private Message on MobileRead by:

Pressing on my username > Send Message

Just upload them to Dropbox or Google Drive (or some other filesharing site) and send me the URL.

From rereading your first post, I think it may be helpful to get these 4 files:
  • Original file (whatever format Pages saves it as)
  • Pages -> DOCX
  • Pages -> EPUB
  • Pages -> HTML

Quote:
Originally Posted by Blood Black View Post
This was a lot easier last century (literally). Back then, I had Classic Macintosh System widgets I could drag-and-drop Word files to that (in my memory, at least,) spat out clean HTML.)
The problem is using garbage tools/workflows:

Garbage In, Garbage Out.

These programs output absolutely atrocious HTML:
  • Google Docs
  • iBooks Author
  • ???Apple Pages???
  • [...]

If you use Styles properly though (LibreOffice, Word, etc.), you can still get clean HTML out. It's just 99.9%+ of people don't use Styles, or know they exist.
Old 09-13-2021, 11:59 AM   #8
JSWolf
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Even if you do use styles in Word, it's still a good idea to go over the code by hand to make sure it's good code.
Old 09-15-2021, 10:49 AM   #9
Hitch
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by Tex2002ans View Post
Send me the files. I'll take a look at it and see if anything can be salvaged (or what the best method would be).

(snippage)



The problem is using garbage tools/workflows:

Garbage In, Garbage Out.

(snippage for space, but I agree absolutely with Tex's list)

If you use Styles properly though (LibreOffice, Word, etc.), you can still get clean HTML out. It's just 99.9%+ of people don't use Styles, or know they exist.

I would be interested to know, Tex, if you see something odd in it. :-)

Hitch
Old 09-15-2021, 11:06 AM   #10
JSWolf
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Pages is a disaster. Don't use it.
Old 09-15-2021, 11:34 AM   #11
Hitch
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by JSWolf View Post
Pages is a disaster. Don't use it.
Jon, to be fair, I'm not sure it's Pages, per se.

I think it's about the same as all the rest--altho, to be fair, it is a bit more disastrous when it's trying to compensate for ad hoc styles created by users who want something that "just works" and who don't use styles and so on.

But if you get a file (Pages) from someone that's relatively good at using Styles/headings and the like, mostly, it's about on par with the others.

(To me, this is like that oft-repeated mantra "the HTML output from Word is HORRRRIBLE! Oh, the humanity! The world as we know it will end, if you use Word for your book!" and so on, all of which is utter bollocks.)

Where it might suffer is in a user mindset that is more "just works" than other pieces of software. But for all I know, my view of it is skewed, because the successful users make their own ePUBs and upload them and don't use companies like mine.

I mean, after all, by and large, I tend to get the writers that do NOT follow the directions, rather than the ones that do. :-)

Hitch
Old 09-15-2021, 11:41 AM   #12
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
In my experience, such conversions are trivial, unless it is OCR or the input contains weird things like negative spans. Usually it takes about 5 regexes to clean things up.
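
As an illustration of what such a regex pass might look like (not the poster's actual regexes, which were not shared), here is a small Python sketch with three rules. The "calibre" and "block" class patterns are guesses based on the class names complained about earlier in the thread, and regexes on HTML are fragile: this version deliberately leaves nested spans alone.

```python
import re

# Assumed pattern for converter-generated class names, e.g. "calibre3", "block_12".
GENERATED = re.compile(r"^(?:calibre|block_?)\d*$")


def _clean_class_attr(match):
    """Keep only the class names that do not look auto-generated."""
    kept = [c for c in match.group(1).split() if not GENERATED.match(c)]
    if not kept:
        return ""  # drop the whole class attribute
    return ' class="' + " ".join(kept) + '"'


def clean(html):
    # 1. Strip generated class names from every class attribute.
    html = re.sub(r'\sclass="([^"]*)"', _clean_class_attr, html)
    # 2. Unwrap attribute-less spans that contain only text (no nested tags).
    html = re.sub(r"<span>([^<]*)</span>", r"\1", html)
    # 3. Remove paragraphs that contain nothing but whitespace.
    html = re.sub(r"<p>\s*</p>\n?", "", html)
    return html


messy = (
    '<p class="block_1 first">On a <span class="calibre2">cold</span> night</p>\n'
    '<p class="calibre1"> </p>\n'
)
print(clean(messy))
```

Run it on a copy of the file and compare the result with the original before trusting it, since a pattern that is too greedy can silently eat real content.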
All times are GMT -4.