#16 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Well, if I can get a PDF with a clean printout of each page and run it through FineReader, it would give me a useful EPUB (with some Sigil work).
It would be way easier to buy the paper book, but... why not... (This is driven more by personal curiosity and because I like to learn how to do stuff.)
#17 | |
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
The alternative, as you said, is to TRY to get it to a print layout and scan it from there. Calibre isn't really meant to tackle fixed-layout, so... not sure that there's a path there. My best guesstimate is that Tex may have some thoughts. INDD-generated FXL is a tough cookie.
Hitch
#18 |
Resident Curmudgeon
Posts: 79,750
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I was able to convert this eBook into a non-fixed-layout eBook. But it's still divided into pages, so it's not yet properly reflowable. You would have to merge the pages. One other issue is that sometimes the last paragraph on a page is split between two HTML files.
I did this with a lot of searching/replacing using regex and the Diap's Editing Toolbag plugin for the Calibre editor. But just WOW! That eBook is one heck of a disaster. It can be fixed if you want to put the work into it. To save you from having to edit the CSS, here is the CSS I ended up with after my editing.
Code:
@font-face {
  font-family: Constantia;
  font-style: normal;
  font-weight: normal;
  src: url(../font/Constantia.ttf);
}
@font-face {
  font-family: Constantia;
  font-style: normal;
  font-weight: bold;
  src: url(../font/Constantia-Bold.ttf);
}
@font-face {
  font-family: Constantia;
  font-style: italic;
  font-weight: bold;
  src: url(../font/Constantia-BoldItalic.ttf);
}
@font-face {
  font-family: Constantia;
  font-style: oblique;
  font-weight: bold;
  src: url(../font/Constantia-BoldItalic.ttf);
}
@font-face {
  font-family: Constantia;
  font-style: italic;
  font-weight: normal;
  src: url(../font/Constantia-Italic.ttf);
}
@font-face {
  font-family: Constantia;
  font-style: oblique;
  font-weight: normal;
  src: url(../font/Constantia-Italic.ttf);
}
@font-face {
  font-family: "Milton Two Bold";
  font-style: normal;
  font-weight: normal;
  src: url(../font/MiltonTwoBold.otf);
}
body {
  widows: 1;
  orphans: 1;
  margin-top: 0;
  margin-right: 0;
  margin-bottom: 0;
  margin-left: 0;
  text-align: justify;
}
img {
  max-height: 100%;
  max-width: 100%;
}
p {
  margin-top: 0;
  margin-bottom: 0;
  text-indent: 1.2em;
}
p.ParaOverride-1 {
}
img._idGenObjectAttribute-1 {
  min-width: 100%;
}
Last edited by JSWolf; 05-05-2021 at 02:54 PM.
#19 | |
Guru
Posts: 910
Karma: 12671918
Join Date: Jul 2017
Device: Boox Nova 2
|
Quote:
#20 |
Resident Curmudgeon
Posts: 79,750
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Here is one of the HTML files after my editing.
Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <meta content="width=396,height=612" name="viewport"/>
  <title>Petrified_Minds_TRIPLE_KINDLE_FINAL.mobi-12</title>
  <link href="css/idGeneratedStyles.css" rel="stylesheet" type="text/css"/>
  <meta content="urn:uuid:9ba17578-df77-4ec5-9ee1-6309573d6d23" name="Adept.expected.resource"/>
</head>
<body>
  <p class="ParaOverride-1">He also wasn’t looking for answers regarding the relationship between Jason and his parents—he wasn’t interested in that. Why? Because whatever Jason would tell him would be information from the conscious mind—that’s why the hypnotist needed to know about specific things. </p>
  <p class="ParaOverride-1">“Are they still married?” Jason indicated they were. “To each other?” </p>
  <p class="ParaOverride-1">Jason nodded. “Yes . . .”</p>
  <p class="ParaOverride-1">Then, it was time to get down to it. “Any brothers or sisters?”</p>
  <p class="ParaOverride-1">“One brother.” </p>
  <p class="ParaOverride-1">“Older or younger?”</p>
  <p class="ParaOverride-1">“Younger.”</p>
  <p class="ParaOverride-1">“How much younger?” </p>
  <p class="ParaOverride-1">“Three years . . .”</p>
  <p class="ParaOverride-1">Sterling paused for a moment. “You’re married—is this your first wife? </p>
  <p class="ParaOverride-1">“First wife.” Jason swiped his right palm on his pants.</p>
  <p class="ParaOverride-1">“How old were you when you got married?” </p>
  <p class="ParaOverride-1">“When I got married, I was twenty-two.” </p>
  <p class="ParaOverride-1">“And, you’re now twenty-four, so it’s a couple of years. Children?”</p>
  <p class="ParaOverride-1">“One . . .” </p>
  <p class="ParaOverride-1">“What’s your relationship with alcohol?” </p>
</body>
</html>
#21 |
Guru
Posts: 910
Karma: 12671918
Join Date: Jul 2017
Device: Boox Nova 2
|
For completeness, here's an example using an HTML/XML parser in my programming language of choice, R. I put one of the HTML snippets from this thread in a file called "test.html".
Code:
## Install the xml2 package and load it
install.packages("xml2")
library(xml2)

## Read in the HTML file
arf <- read_html("~/test.html", options = "RECOVER")

## Find all span nodes using XPath selectors
spans <- xml_find_all(arf, "//span")

## Replace them with just their text contents
xml_replace(spans, xml_contents(spans))

## Write out the file
write_html(arf, "~/testOut.html")
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body id="xx" lang="en-US" style="width:396px;height:612px" xml:lang="en-US">
<div class="Basic-Text-Frame" id="_idContainer250">
<div style="width:5760px;height:9540px;position:absolute;top:0px;left:0px;-webkit-transform-origin: 0% 0%; -webkit-transform: translate(0px,5.83px) rotate(0deg) scale(0.05);transform-origin: 0% 0%; transform: translate(0px,5.83px) rotate(0deg) scale(0.05);">
<p class="Chapter-Title ParaOverride-1">Time to Forgive</p>
<p class="Drop-Cap ParaOverride-1">“I want you to imagine your reflection in a beautiful mirror—the person who caused </p>
</div>
</div>
</body></html>
EDIT: I think R mangled the em dashes and quotes with its crappy text support, though.
Last edited by salamanderjuice; 05-05-2021 at 03:12 PM.
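For anyone without R, the same span-unwrapping idea can be sketched in Python with only the standard library. This is not the xml2 code above ported line for line: it assumes that dropping every <span> tag and keeping everything else untouched is what you want, and the names SpanStripper and strip_spans are made up for the illustration.

```python
from html.parser import HTMLParser


class SpanStripper(HTMLParser):
    """Re-emit the input markup, dropping <span> tags but keeping their text."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; pass through untouched
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <meta .../>
        if tag != "span":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "span":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def strip_spans(html: str) -> str:
    """Return html with every <span ...> and </span> removed (hypothetical helper)."""
    parser = SpanStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because the text is passed through as Python strings, curly quotes and em dashes should survive as long as the file is read and written as UTF-8.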
#22 |
Resident Curmudgeon
Posts: 79,750
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
It's easier to use regex and Diap's Editing Toolbag than a parser. I did my edits in less than 20 minutes. I can't say exactly how long because I was also watching TV at the same time.
I used the Calibre editor and it worked. The next step is to fix the broken paragraphs, merge the HTML files, and then split them appropriately. And also add the cover.
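A rough sketch of what "merge the pages and fix the split paragraphs" could look like in Python. It rests on a loud assumption: a paragraph that closes a page without sentence-ending punctuation is taken to continue in the first paragraph of the next page. That heuristic will misfire on some books, and the function names here are invented for the example, not part of any Calibre tool.

```python
import re

BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)
PARA = re.compile(r"<p\b[^>]*>.*?</p>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
# Assumption: a finished paragraph ends in punctuation or a closing quote
SENTENCE_END = re.compile(r"[.!?…”’\"']\s*$")


def paragraphs(page_html):
    """Return the <p>...</p> blocks found in one page file."""
    match = BODY.search(page_html)
    body = match.group(1) if match else page_html
    return PARA.findall(body)


def merge_pages(pages):
    """Concatenate paragraphs from all pages, rejoining ones split at a page break."""
    merged = []
    for page in pages:
        for index, para in enumerate(paragraphs(page)):
            previous_text = TAG.sub("", merged[-1]) if merged else ""
            split_across_pages = (
                index == 0 and merged and not SENTENCE_END.search(previous_text)
            )
            if split_across_pages:
                # Drop the opening <p ...> of the continuation and glue it on
                continuation = re.sub(r"^<p\b[^>]*>", "", para).lstrip()
                head = merged[-1][: -len("</p>")].rstrip()
                merged[-1] = head + " " + continuation
            else:
                merged.append(para)
    return merged
```

The result is a flat list of paragraphs, which can then be written into one file and split again at chapter boundaries.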
#23 | |
Guru
Posts: 910
Karma: 12671918
Join Date: Jul 2017
Device: Boox Nova 2
|
Quote:
#24 | |||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
It supports normal, reflowable EPUBs only.
Quote:
I've never really had to work on a book that "required" such fixed layouts (like double-page spread cookbooks, magazines, etc.).

* * *

IF I only had a Fixed-Layout EPUB to work from:

I think it might be better to use Calibre to convert to RTF or TXT (Markdown)... some simpler format that doesn't have <span>s and crap, but still supports basic formatting (Bold/Italics/Headings).

To test that, go into Calibre:

1. Right-Click your book > Convert Books > Convert individually.
2. In the upper right dropdown, select Output Format: RTF (or TXT).
2.5. If you chose TXT, on the left-hand side, select TXT output. In General > Formatting, change the dropdown from "plain" to "markdown".
3. Convert.

That should generate a file that's minus a lot of the HTML cruft, but should still carry over italics/bold, etc.

I think that'd be infinitely easier to clean than a FXL-EPUB->PDF/Screenshots->Finereader->EPUB roundtrip.

Note: This should work well for something like patrik's example, a normal Fiction book. For very complicated books like comics/children's books (with highlighting/read-along/curvy text within images) or heavy Maths/Physics, that method wouldn't work.

Alternate #2:

Maybe even a Calibre EPUB->EPUB conversion might be able to condense a lot of that crap down. But first, I'd run regex to remove inline:
Code:
id="_idTextSpan41408" top:2589.14px; left:1107.67px; letter-spacing:0.73px;
Then running a Calibre EPUB->EPUB should morph all those thousands of <span>s into a smaller number of <span class="calibre##"> classes. So:

Step 1 (Original):
Spoiler:
Step 2 (The 4 Regexes):
Code:
<p class="Drop-Cap ParaOverride-1"><span class="CharOverride-14" style="position:absolute;">“</span><span class="CharOverride-3" style="position:absolute;">I </span><span class="CharOverride-3" style="position:absolute;">want </span><span class="CharOverride-3" style="position:absolute;">you </span>
Code:
<p class="Drop-Cap ParaOverride-1"><span class="calibre1">“</span><span class="calibre2">I </span><span class="calibre2">want </span><span class="calibre2">you </span>
If you're lucky, the book will condense down to only a few dozen calibre## classes... If you're unlucky, the book will condense down to hundreds/thousands of calibre## classes.

Quote:
Agreed. "20 minutes manual cleanup per book" vs. "a few seconds to run a parser".

Although I haven't seen enough (disgusting) FXL EPUBs to know what potential ugly code you'd run across. All I know is that every single word (and sometimes every character, as Hitch said) is wrapped in a <span> with enough manual styling to fill up your entire screen, flying off the monitor.

To throw out all <span>s might not be right... so I'd go in and surgically remove certain inline styles (like top/left/letter-spacing). But sometimes it's easier to throw out nearly everything, then add the rare formatting exceptions back in later. (Like blockquotes, poetry, etc.) All depends on the book...

Side Note: InDesign's mentality is to always go back to InDesign as your "source document", do your fixes/adjustments there, then re-export. Never to create human-readable/maintainable code in the EPUB/output itself.

Last edited by Tex2002ans; 05-05-2021 at 07:12 PM.
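The regexes themselves are not shown in the post, so the following is only a guess at the kind of cleanup described: strip the generated `id="_idTextSpan…"` attributes and the `top`/`left`/`letter-spacing` declarations while leaving the rest of each style alone. It is a minimal Python sketch; the pattern names and `strip_positioning` are hypothetical, and the patterns assume double-quoted attributes like the ones in the samples above.

```python
import re

# Generated per-span ids such as id="_idTextSpan41408"
ID_ATTR = re.compile(r'\s+id="_idTextSpan\d+"')
# top / left / letter-spacing declarations; the lookbehind keeps
# margin-left, padding-top and similar properties untouched
POSITION_DECL = re.compile(r"(?<![\w-])(?:top|left|letter-spacing)\s*:[^;]*;?\s*")
STYLE_ATTR = re.compile(r'style="([^"]*)"')
EMPTY_STYLE = re.compile(r'\s+style="\s*"')


def _clean_style(match):
    """Rebuild one style attribute without the positioning declarations."""
    cleaned = POSITION_DECL.sub("", match.group(1)).strip()
    return f'style="{cleaned}"'


def strip_positioning(html):
    """Remove generated ids and absolute-position declarations from markup."""
    html = ID_ATTR.sub("", html)
    html = STYLE_ATTR.sub(_clean_style, html)
    # A style attribute left with nothing in it is dropped entirely
    return EMPTY_STYLE.sub("", html)
```

After a pass like this, many spans end up with identical attributes, which is what gives the EPUB->EPUB conversion a chance to collapse them into a small set of classes.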
#25 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Thanks everyone. You have given me plenty of stuff to ponder and try.
#26 |
Resident Curmudgeon
Posts: 79,750
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
If you replace the existing CSS with the CSS I posted, you can then run Remove unused CSS rules in Calibre's editor and clean up a lot of the mess. I cannot say how well fixed the HTML code will be after that, but it will be a lot better.
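Before swapping stylesheets, it may help to check which classes the HTML still uses that the replacement CSS has no rule for, since any bold or italic carried by those classes would silently disappear. A small Python sketch, assuming simple class selectors and double-quoted class attributes; the function names are invented for the example.

```python
import re


def css_classes(css):
    """Class names that appear in the selectors of a stylesheet."""
    names = set()
    # Text between a closing brace (or the start) and the next opening brace
    for selector in re.findall(r"([^{}]+)\{", css):
        if selector.lstrip().startswith("@"):
            continue  # skip at-rules such as @font-face
        names.update(re.findall(r"\.(-?[_a-zA-Z][\w-]*)", selector))
    return names


def html_classes(html):
    """Class names referenced by class="..." attributes in the markup."""
    names = set()
    for value in re.findall(r'class="([^"]*)"', html):
        names.update(value.split())
    return names


def undefined_classes(html, css):
    """Classes used in the HTML that have no rule in the given CSS."""
    return html_classes(html) - css_classes(css)
```

Running `undefined_classes` over each page against the new stylesheet gives a list to review by hand before deciding which rules need to be carried over.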
#27 |
Bibliophagist
Posts: 46,190
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
But gee golly whiz, Jon, I tried your CSS and bold, italic, etc. stopped working. There are subtler ways than using a sledgehammer to fix things.
#28 |
Resident Curmudgeon
Posts: 79,750
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Having looked at the original CSS, I didn't find any bold or italic in the main text that actually matters. If I am incorrect, please let me know where so I can see the code.
Last edited by JSWolf; 05-07-2021 at 10:20 AM. |
#29 | |
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
God knows, I've seen some books that I'd have LOVED to hammer....
Hitch
#30 |
Bibliophagist
Posts: 46,190
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
That I can agree with. Though at times, taking the hammer to the author's computer seems like it would be more useful.