06-10-2012, 08:22 AM | #91 | |
Jr. - Junior Member
Posts: 586
Karma: 2000358
Join Date: Aug 2010
Location: Alabama
Device: Archos, Asus, HP, Lenovo, Nexus and Samsung tablets in 7,8 and 10"
|
Quote:
I suppose Penguin would be considered a BPH. If so, I consider them one of the most egregious in this regard. Whole paragraphs out of order. Hit the convert button and push it out the door. Why worry, there is no refund on an ebook. My 2¢ Regards - John |
|
06-10-2012, 06:15 PM | #92 | |
Bookmaker & Cat Slave
Posts: 11,461
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Wherever you're buying, you should stop--if you buy at Amazon or Nook (and I think iBooks and Kobo), there damn sure IS a refund. I certainly wouldn't continue to buy Penguin titles if they are that bad--that's wholly unacceptable. This is absolutely not the type of "error" I was talking about--I was discussing the usual typos, etc., not wholesale neglect. @Jellby--that reminds me, please look for a PM from me on an unrelated topic, speaking of your mad skills--but I also agree with you that any publisher that does not require (on from-PDF or from-scan titles) a character-by-character comparison, like we do, is not living up to its responsibilities to its readers. Hitch |
|
06-10-2012, 10:48 PM | #93 |
Resident Curmudgeon
Posts: 73,932
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
There is no way to do a novel-length conversion from PDF without errors. OCR can be better if you correct any issues the OCR flags as it does its thing. But I do agree that you need a full A/B comparison to make sure it's correct.
|
06-11-2012, 06:27 AM | #94 | |
Bookmaker & Cat Slave
Posts: 11,461
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Just my $.02. Hitch |
|
06-11-2012, 10:09 AM | #95 |
Resident Curmudgeon
Posts: 73,932
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I have converted some PDFs using Acrobat Pro 8 that turned out not too badly. But of course there were errors, as nothing can convert without any errors.
|
06-11-2012, 12:07 PM | #96 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
I usually just crop out the headers, footers and page numbering, throw it over to Sumatra and save it out as plain text.
If it comes out in a reasonable form (i.e., not missing characters), I will progress to merging the paragraph lines, marking up chapters and the rest of it. If not, trying to clean and fix it won't save you any time: scan instead. |
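A minimal sketch of the "merging the paragraph lines" step described above, in Python (the thread does not say which tool was used). It assumes the plain-text export separates paragraphs with blank lines, which will not hold for every PDF; the function name `merge_paragraph_lines` and the sample text are hypothetical.

```python
def merge_paragraph_lines(text: str) -> str:
    """Join hard-wrapped lines into single-line paragraphs.

    Assumption: a blank line marks a paragraph boundary.
    """
    paragraphs = []
    current = []  # lines of the paragraph being collected
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # Blank line: close the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


sample = "It was a dark\nand stormy night.\n\nThe rain fell\nin torrents.\n"
merged = merge_paragraph_lines(sample)
print(merged)
```

Words hyphenated across a line break are left alone here, since rejoining them automatically can also damage legitimate hyphens.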
06-11-2012, 02:26 PM | #97 | |
Enthusiast
Posts: 35
Karma: 110336
Join Date: Dec 2011
Location: Los Angeles, CA
Device: Kindle n-T, Nook Color Tablet, Nexus 6
|
Quote:
What would be the regex for any number with a paragraph break before and after? Code:
<p> ### </p> |
|
06-11-2012, 04:12 PM | #98 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
<p[^<>]*>\s*\d+\s*</p>
<p[^<>]*> - p tags that may have a class or other attributes (this can also match other tags starting with p, but it is simple enough for books)
\s* - collects any whitespace padding the digits
\d+ - collects one or more digit characters |
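A minimal sketch of how the pattern above might be applied, in Python (my choice; the thread names no language or tool). The function name `strip_page_numbers` and the sample HTML are hypothetical. The `\b` after `p` is an addition of mine, not part of the posted regex; it keeps tags such as `<pre>` from matching.

```python
import re

# The posted pattern, plus a word boundary (\b) after "p" so that tags
# like <pre> or <param> are not matched. The \b is an assumption added
# for this sketch, not part of the original regex.
PAGE_NUMBER_P = re.compile(r"<p\b[^<>]*>\s*\d+\s*</p>")


def strip_page_numbers(html: str) -> str:
    """Remove paragraphs that contain nothing but digits and whitespace."""
    return PAGE_NUMBER_P.sub("", html)


sample = (
    '<p>Some text.</p>'
    '<p class="pg"> 123 </p>'
    '<p>Chapter 12 begins.</p>'
    '<pre>42</pre>'
)
cleaned = strip_page_numbers(sample)
print(cleaned)
```

Only the digits-only paragraph is removed; a paragraph that merely contains a number, and the `<pre>` element, are left in place.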
06-13-2012, 11:37 PM | #99 |
Junior Member
Posts: 3
Karma: 10
Join Date: May 2012
Device: Pandigital eReader
|
Calibre TOC not surviving Sigil Generate TOC
Hi guys (and gals),
I'm trying to generate eBooks from several sources (WordPerfect, PageMaker, InDesign, etc.). My first stop on the trail is Kompozer, which lets me see the text and the crap code presented by those other programs (and place the illustrations where they belong, fix minor transition errors, set up where I want page breaks, etc.). Then I move them to Calibre, where I put in the metadata and cover, and generate the ePub files. My final step is going to be Sigil, and that's where I hit a wall.
I don't always want to use <h1> codes for Table of Contents entries. For example, the copyright page is NOT going to have a big <h1> headline at the top that says "Copyright Page", nor are the Dedication or Acknowledgment pages! Okay, Sigil lets me "Add Semantics," so that should be okay. HOWEVER, that's not what I get when I hit the "Generate TOC" button. All the TOC entries created in Calibre disappear (except for the ones that actually use the <h1> and <h2> tags), and the entries I added via the Sigil "Add Semantics" are a no-show.
Here is a sample chapter head:
<body class="calibre">
<p class="ChapterNmbr2" id="calibre_pb_5"><span class="calibre14 calibre15 calibre19">-2-</span></p>
<div class="calibre4">
<p class="calibre7"><span class="calibre1 chapter calibre15" id="calibre_toc_5"><a class="calibre20" id="TOC1_2"></a>Endangering the King</span></p>
</div>
class="ChapterNmbr2" ==> is my pagebreak separator in Calibre. The 14, 15, and 19 are center, bold, and font.
It displays like this:
-2-
Endangering the King
The page break is above the -2-, and "Endangering the King" is what goes into the TOC as the chapter title.
So, how do I preserve the TOC coming from Calibre and get the Add Semantics entries to actually appear?
Terry Kepner |
06-14-2012, 02:20 AM | #100 | ||
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Quote:
If you want to be able to automatically generate TOC entries, you’ll need to use header tags and styles. You could easily simplify your example as follows: Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
  <style type="text/css">
  /*<![CDATA[*/
  h3 {
    text-align: center;
    font-weight: bold;
    font-size: 130%;
    page-break-before: always;
  }
  /*]]>*/
  </style>
</head>
<body>
  <h3 id="TOC1_2" title="Endangering the King">-2-<br />
  Endangering the King</h3>
  <!-- text of chapter 2 -->
</body>
</html>
Last edited by Doitsu; 06-14-2012 at 04:41 AM. |
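To illustrate why heading tags matter for automatic TOC generation, here is a rough Python sketch that collects TOC candidates from h1-h6 elements. It is not Sigil's implementation. The class name `HeadingCollector` is hypothetical, and preferring the `title` attribute over the heading's visible text is my reading of why the example above sets `title`, not something the thread confirms. The second heading in the sample is invented for illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, label) for every h1-h6 element."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.entries = []
        self._open = None  # [level, id, title attribute, text pieces]

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            a = dict(attrs)
            self._open = [int(tag[1]), a.get("id"), a.get("title"), []]

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._open is not None:
            self._open[3].append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._open is not None:
            level, anchor, title, pieces = self._open
            text = " ".join(" ".join(pieces).split())  # collapse whitespace
            # Assumption: a title attribute, if present, overrides the text.
            self.entries.append((level, anchor, title or text))
            self._open = None


sample = (
    '<h3 id="TOC1_2" title="Endangering the King">-2-<br />\n'
    'Endangering the King</h3>'
    '<p>Body text is ignored.</p>'
    '<h3 id="c3">-3-<br /> Next</h3>'
)
collector = HeadingCollector()
collector.feed(sample)
print(collector.entries)
```

Paragraphs styled to look like headings, as in the Calibre output quoted earlier, would be invisible to a collector like this, which matches the behaviour the original poster describes.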
||
06-14-2012, 11:58 AM | #101 |
Resident Curmudgeon
Posts: 73,932
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
WOW! That is some nasty looking code. What is the source that caused that?
|
06-14-2012, 12:10 PM | #102 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
|
06-14-2012, 12:18 PM | #103 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
06-14-2012, 01:49 PM | #104 |
Well trained by Cats
Posts: 29,792
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
06-15-2012, 03:10 AM | #105 |
Junior Member
Posts: 3
Karma: 10
Join Date: May 2012
Device: Pandigital eReader
|
The code came from WordPerfect originally and looked somewhat like this:
---------------------------
<p align="center"><span style="font-size: 10pt;"></span><span style="font-size: 11pt;"></span><span style="font-size: 10pt;"></span><span style="font-size: 15pt;"></span><span style="font-family: Goudy Old Style;"><strong></strong></span><span style="font-family: Goudy Old Style;">-2-</span></p>
<p><span style="font-family: Goudy Old Style;"></span><span style="font-size: 10pt;"></span><span style="font-size: 10pt;"><strong>Endangering the King</strong></span></p>
<br wp="BR1">
<br wp="BR2">
--------------------------------------
Then it went into Tidy, which removed the awful redundancies. Then I moved it into Calibre, which gave the code I posted.
Anyway, thanks for the response. Unfortunately it doesn't help. I cannot use <h1> headers on things like a title page, copyright page, acknowledgements page, and so forth--it would make the book look like an amateur did it instead of coming from a professional publications house. (Yeah, having the words "Title Page" above the title of the book would really look stupid; at least with Dedications and Acknowledgements I might be able to get away with using one of the other header tags.)
And while your code looks nice, considering where I am coming from, I don't want to become a professional coder for epub files any more than you would want to become a professional graphics person just to combine a simple picture with text. The procedures I am using deliver clean enough code to do the job; I was just hoping there was something I was missing that kept me from putting those Add Semantics entries into the TOC. It is something they should add as an option in the drop-down box in the Generate window: Include Semantics in TOC. Until then this is just one more limitation preventing eBooks from replacing real books.
Again, thanks for your help. |
|