Old 07-14-2020, 08:33 AM   #16
Shohreh
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thanks for the great input.

For some reason, the cover doesn't appear in the EPUB when read on my e-reader.

It's 322px wide and 500px tall. I tried JPG and PNG to no avail.

Is it something in LibreOffice?

Old 07-14-2020, 09:41 AM   #17
Quoth
Still reading
Posts: 13,851
Karma: 103895653
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
I add the cover in Calibre and use Calibre to convert the DOCX to EPUB.
Then from that EPUB to MOBI, AZW or whatever.
I edit in LO Writer, saving in ODT format. I do an EXTRA save as DOCX for Calibre, and never open that file in LO Writer, as Writer will ALWAYS convert any non-ODT file on load.

The LO Writer file (or, in earlier years, the MS Word file) never has the cover in it.

I edit covers in The Gimp, in layers, stored natively at about 4x the resolution needed for an ebook. I export JPG and PNG files at various resolutions for different purposes:
- Upload to Amazon / Smashwords etc. and thence to Kobo, Apple, B&N, Tolino etc. The uploaded ebook itself has a lower-resolution cover added by Calibre.
- A different JPG might be used for our blog or other promotional material.
- A paper version will use 300 dpi, 400 dpi or 600 dpi depending on process/quality, so a larger book format needs a larger image, as the DPI has to stay the same.
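
To make the print arithmetic concrete: the pixel size a cover needs is simply the trim size in inches multiplied by the DPI. The sketch below is only an illustration; the function name and the 6 x 9 inch trim size are assumptions, not values from this thread.

```python
def pixels_for_print(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels needed to print at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical 6 x 9 inch cover at the three DPI values mentioned above.
for dpi in (300, 400, 600):
    print(dpi, pixels_for_print(6.0, 9.0, dpi))
```

Doubling the DPI doubles each dimension, which is why keeping the master image at a much higher resolution than the ebook cover leaves room for print exports.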

The same Epub2 is uploaded to Amazon KDP and Smashwords, but Smashwords also gets a dual mobi (because they can't tell what Kindle their customers have) as well as maybe a .doc for additional formats.
Amazon does their own conversion into all their formats from the epub2, including fully enhanced typeset KFX (there is no reason why a Kindle can't have a FW update so azw renders the same). KFX is really about delivery and DRM.

The goal, usually achieved, is that azw, kfx, epub2 should all look about the same and the same as the view in LO Writer. Old Mobi should have at least serif, sans, mono all in normal, bold, italic, bold-italic, larger headings, correct justification, relatively similar relative offsets to non-body margins, TOC, page breaks and links corresponding to the epub2/azw.

Calibre does a good job, but it needs to be fed a docx where the styles and TOC are done correctly.
I auto create the index to level 2 (the headings are all level1 or level2 and EVERYTHING not in the index / TOC is body level), copy to a plain text editor, paste back and format.
I put anchors ONLY at the start of a paragraph (each heading is also a paragraph) and then select each line of the text index and edit link. The anchor is entered just in the URL box, not via document browse, just putting # prefix. The anchors are all lowercase with no punctuation, spaces or accents, typically ch2, ch3 etc. Then Calibre makes the ebook NCX from that correctly formatted user index, which is also inline in the ebook.
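
The anchor naming rule above (lowercase, no punctuation, spaces or accents) can be sketched in a few lines of Python. This is only an illustration of the rule, not a tool used in this thread; `anchor_name` and `toc_link` are hypothetical helper names.

```python
import re
import unicodedata


def anchor_name(text: str) -> str:
    """Reduce a heading to lowercase ASCII letters and digits only."""
    # Split accented characters into base letter + combining mark,
    # then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove spaces, punctuation and anything else that is not a-z or 0-9.
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def toc_link(text: str) -> str:
    """Return the value to type into the link URL box, with the # prefix."""
    return "#" + anchor_name(text)


print(toc_link("Ch. 2"))
```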

Last edited by Quoth; 07-14-2020 at 09:56 AM.
Old 07-14-2020, 10:39 AM   #18
Shohreh
Thanks for the tip.

For some reason, when Calibre converts the ODT into EPUB, the cover appears twice in the EPUB as displayed on the computer with SumatraPDF, one page after the other, but only once as expected on the e-reader. Oh, well.
Old 07-14-2020, 11:01 AM   #19
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Shohreh
For some reason, the cover doesn't appear in the EPUB when read on my e-reader.
How are you converting it from LibreOffice to EPUB?

Are you using Calibre? Or trying to use LibreOffice's built-in Export As > Export As EPUB?

* * *

If you just want a quick conversion "that just works":

Save a DOCX copy, then use Calibre to convert DOCX->EPUB.

Calibre should detect and convert the first image in the document as a cover.

Note: DOCX->EPUB works a little more cleanly than ODT->EPUB. Of course, keep your source document as ODT, and save as DOCX only temporarily for the conversion.

Quote:
Originally Posted by Shohreh
Is it something in LibreOffice?
LibreOffice covers should also be working fine. If you press Export As > Export As EPUB, do you see this?

[Attached screenshot: LibreOffice.6.4.4.EPUB.Cover.png]

Which version of LibreOffice do you have?

Last edited by Tex2002ans; 07-14-2020 at 11:04 AM.
Old 07-14-2020, 11:45 AM   #20
Shohreh
Directly from the ODT file.

Also, the ToC is totally different:


I'll try the ODT → DOCX → EPUB alternative.
Old 07-15-2020, 10:50 AM   #21
Quoth
Quote:
Originally Posted by Shohreh
Thanks for the tip.

For some reason, when Calibre converts the ODT into EPUB, the cover appears twice in the EPUB as displayed on the computer with SumatraPDF, one page after the other, but only once as expected on the e-reader. Oh, well.
Don't ever import ODT into Calibre. Do an EXTRA Save As to DOCX, and import the DOCX into Calibre.
There is a setting to NOT detect covers in DOCX, I think; otherwise the first image, whatever it is, will replace the cover set manually in Metadata via Browse for cover!
Don't include the cover in the actual word processor file!

The plugin (older LO) or built-in EPUB export in LO Writer is very poor compared to an extra Save As DOCX and import into Calibre.
Make sure the Page Setup output profile is 'Tablet' to avoid resizing images.
Convert to epub2.
Do any other formats from the epub2.
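
For anyone who prefers the command line, the same steps can be approximated with Calibre's `ebook-convert` tool. This is a rough sketch, not the exact workflow described above: the file names are hypothetical, and the mapping of the GUI settings to `--cover`, `--docx-no-cover`, `--output-profile tablet` and `--epub-version 2` is my assumption, so check it against your Calibre version.

```python
import shutil
import subprocess


def build_convert_command(docx_path: str, epub_path: str, cover_path: str) -> list[str]:
    """Build an ebook-convert command for DOCX -> EPUB 2 with an external cover."""
    return [
        "ebook-convert", docx_path, epub_path,
        "--cover", cover_path,         # cover supplied separately, not inside the DOCX
        "--docx-no-cover",             # do not treat the first image as the cover
        "--output-profile", "tablet",  # profile intended to avoid resizing images
        "--epub-version", "2",
    ]


def convert(docx_path: str, epub_path: str, cover_path: str) -> None:
    """Run the conversion; requires Calibre's ebook-convert on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    subprocess.run(build_convert_command(docx_path, epub_path, cover_path), check=True)


# Example call with hypothetical file names:
# convert("book.docx", "book.epub", "cover.jpg")
```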

Last edited by Quoth; 07-15-2020 at 10:54 AM.
Old 07-16-2020, 12:39 AM   #22
Sarmat89
Fanatic
Posts: 516
Karma: 2268308
Join Date: Nov 2015
Device: none
Please, never-never-never use Tesseract or other headless OCR systems for books. All text must be proofed interactively. Also, that frontend is very primitive and it will mess up the text formatting.
Old 07-16-2020, 07:11 AM   #23
roger64
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by Sarmat89
Please, never-never-never use Tesseract or other headless OCR systems for books. All text must be proofed interactively. Also, that frontend is very primitive and it will mess up the text formatting.
After over one year of exclusive use of Tesseract (about 50 ebooks), I strongly disagree.

Tesseract 4.11, coupled with the latest tessdata 2.4 (ENG and FRA tested), is quite capable of OCRing any book efficiently.

With a good-quality scan, you can even OCR a full book directly (about 30 pages a minute) and save it in text format. The graphical interface (gImageReader-qt5) is quite clean.
- first you can proofread your text line by line
- with a click, the text is changed into paragraphs interspersed with empty ones.
Roughly I would say, on average, you may have one mistake a page (including accents, punctuation).

Cons

No italics and no anchors; both need to be set up manually.
Garbage for full white pages (?)

Free tip

If you have a white text on a black background, Tesseract will give you a blank page. So, open a terminal and use imagemagick first with this command (adapt as needed), then proceed as usual.

Code:
convert name-image.jpg -channel RGB -negate output.jpg

Last edited by roger64; 07-23-2020 at 06:54 AM. Reason: had forgotten convert...
Old 07-18-2020, 04:50 PM   #24
Shohreh
This time, I'm trying to convert a PDF to EPUB.

Lucky me, gImageReader says: "PDFs with text: These PDF files already contain text".

Indeed, when opening the file in Windows, the text is copy/pastable with the mouse, so it's not scanned images.

FWIW, here's what cpdf says about it:
Code:
XMP pdf:Producer: Adobe Acrobat 10.0 Paper Capture Plug-in with ClearScan
XMP xmp:CreatorTool: Canon
What would you recommend I do to turn it into an EPUB?
Old 07-18-2020, 11:27 PM   #25
Tex2002ans
Quote:
Originally Posted by Shohreh
This time, I'm trying to convert a PDF to EPUB.

[...]

Code:
XMP pdf:Producer: Adobe Acrobat 10.0 Paper Capture Plug-in with ClearScan
What would you recommend I do to turn it into an EPUB?
ClearScan is just one of Adobe's technologies to clean a scan by replacing the actual bitmaps with generated "custom fonts". It may look like a purely digital file, but in reality it's still a scanned document.

For more info, see: https://blogs.adobe.com/acrolaw/2009...rscan_is_smal/

All OCR errors and usual PDF->EPUB recommendations still apply.

Last edited by Tex2002ans; 07-18-2020 at 11:30 PM.
Old 07-19-2020, 05:56 AM   #26
Shohreh
That's why it looked like scanned pages, even though the text is still selectable as in a text PDF.

I'll play with Sigil and see if it's more convenient to build an EPUB than LibreOffice Writer. The mid-page carriage returns are especially annoying.
Old 07-19-2020, 06:55 AM   #27
BetterRed
null operator (he/him)
Posts: 21,681
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Shohreh
The mid-page carriage returns are especially annoying.
The Transtools Unbreaker tool (a Word add-in) can fix most of those, including when they're in tables.

BR
Old 07-19-2020, 07:06 AM   #28
Shohreh
Thanks. Interesting that there's no free alternative for, e.g., LibreOffice. I guess it's harder than it looks.

http://www.translatortools.net/produ...ools/unbreaker

--
Edit: Opening the PDF in Abbyy FineReader does a pretty good job. Gone are the mid-sentence linebreaks (on a few test pages, at least).

An AutoIT script might come in useful to automate the process.
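
For plain text copied out of a PDF, the mid-sentence line breaks discussed above can be roughly repaired with a short script. This is a naive sketch under stated assumptions, not how Unbreaker or FineReader work: it joins a line to the next unless the line ends with sentence punctuation, treats blank lines as paragraph breaks, and rejoins words split by a trailing hyphen (which will wrongly merge genuine compounds such as "well-" / "known").

```python
_CLOSERS = "\"')]»”’"
_TERMINATORS = (".", "!", "?", ":")


def _ends_sentence(line: str) -> bool:
    """True if the line ends with sentence punctuation, ignoring closing quotes."""
    return line.rstrip(_CLOSERS).endswith(_TERMINATORS)


def unbreak(text: str) -> str:
    """Merge hard-wrapped lines into paragraphs separated by blank lines."""
    paragraphs, current = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line always closes the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
        elif not current:
            current = line
        elif current.endswith("-") and line[0].islower():
            # Word split across lines: drop the hyphen and rejoin.
            current = current[:-1] + line
        elif _ends_sentence(current):
            # Previous line ended a sentence: assume a new paragraph starts here.
            paragraphs.append(current)
            current = line
        else:
            # Mid-sentence break: join with a single space.
            current += " " + line
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```

The "sentence end means new paragraph" rule is the weak point: a wrapped line that happens to end with a full stop will be split off, so the result still needs proofreading.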

Last edited by Shohreh; 07-19-2020 at 10:05 AM.
Old 07-19-2020, 04:12 PM   #29
Shohreh
Incidentally, if a PDF contains two layers (scanned pages as bitmaps, and OCRed text), is there an application that can extract just the text layer, so I can open it in Sigil or LibreOffice?

I checked cpdf, mutool, and qpdf, but saw no obvious command, even just to list layers.
Old 07-19-2020, 05:28 PM   #30
j.p.s
Grand Sorcerer
Posts: 5,772
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by Shohreh
Incidentally, if a PDF contains two layers (scanned pages as bitmaps, and OCRed text), is there an application that can extract just the text layer, so I can open it in Sigil or LibreOffice?

I checked cpdf, mutool, and qpdf, but saw no obvious command, even just to list layers.
pdftotext:
https://en.wikipedia.org/wiki/Pdftotext

Also, k2pdfopt, documented in the PDF forum at mobileread.
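
As a minimal sketch of the pdftotext route, assuming the Poppler version of `pdftotext` is installed and on the PATH: the wrapper below only builds and runs the command, and the file names in the example are hypothetical. The `-layout` option tries to preserve the page layout; leave it off if you want the text in reading order for Sigil or LibreOffice.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path: str, txt_path: str, keep_layout: bool = False) -> list[str]:
    """Build a pdftotext command that writes the PDF's text layer as UTF-8."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        cmd.append("-layout")  # keep the original physical layout where possible
    return cmd + [pdf_path, txt_path]


def extract_text(pdf_path: str, txt_path: str, keep_layout: bool = False) -> None:
    """Run pdftotext; raises if the tool is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, keep_layout), check=True)


# Example call with hypothetical file names:
# extract_text("scan.pdf", "scan.txt")
```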