#16
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thanks for the great input.
For some reason, the cover doesn't appear in the EPUB when I read it on my e-reader. The image is 322 px wide and 500 px tall. I tried both JPG and PNG, to no avail. Could it be something in LibreOffice?
#17
Still reading
Posts: 13,851
Karma: 103895653
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
I add the cover in Calibre and use Calibre to convert DOCX to EPUB, then make MOBI, AZW or whatever else from the EPUB.
I edit in LO Writer, saving and editing in ODT format. I do an EXTRA save as DOCX for Calibre and never open that copy in LO Writer, because Writer will ALWAYS convert any non-ODT file on load. The LO Writer document (or, in earlier years, the MS Word one) never has the cover in it.
I edit covers in The GIMP, in layers, stored natively at about 4x the resolution needed for an ebook. I export JPG and PNG files at various resolutions for different purposes: upload to Amazon, Smashwords etc., and from there to Kobo, Apple, B&N, Tolino etc. The uploaded ebook has a lower-resolution cover added by Calibre. A different JPG might be used for our blog or other promotional material. A paper version will use 300, 400 or 600 dpi depending on the process and quality, so a larger book format needs a larger image, because the dpi has to stay the same.
The same EPUB2 is uploaded to Amazon KDP and Smashwords, but Smashwords also gets a dual MOBI (because they can't tell which Kindle their customers have), and maybe a .doc for additional formats. Amazon does its own conversion from the EPUB2 into all its formats, including fully enhanced typeset KFX (there is no reason why a Kindle couldn't get a firmware update so that AZW renders the same). KFX is really about delivery and DRM.
The goal, usually achieved, is that AZW, KFX and EPUB2 should all look about the same, and the same as the view in LO Writer. Old MOBI should have at least serif, sans and mono fonts, all in normal, bold, italic and bold-italic; larger headings; correct justification; roughly similar relative offsets for non-body margins; and a TOC, page breaks and links corresponding to the EPUB2/AZW.
Calibre does a good job, but it needs to be fed a DOCX where the styles and TOC are done correctly. I auto-create the index to level 2 (the headings are all level 1 or level 2, and EVERYTHING not in the index/TOC is body level), copy it to a plain text editor, paste it back and format it.
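To make the dpi point concrete: the pixel size a print image needs is the trim size in inches multiplied by the dpi, so at a fixed dpi a bigger format means a bigger image. A minimal Python sketch; the 5x8 in and 6x9 in trim sizes are illustrative assumptions, not from the post, and real print covers also need bleed and spine width, which this ignores.

```python
from __future__ import annotations


def print_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions needed to print width_in x height_in inches at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical trim sizes: same dpi, larger format -> larger image.
for width_in, height_in in [(5, 8), (6, 9)]:
    for dpi in (300, 400, 600):
        w, h = print_pixels(width_in, height_in, dpi)
        print(f"{width_in}x{height_in} in at {dpi} dpi: {w} x {h} px")
```

For example, a 6x9 in image at 300 dpi needs 1800 x 2700 px, and 3600 x 5400 px at 600 dpi.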
I put anchors ONLY at the start of a paragraph (each heading is also a paragraph), then select each line of the text index and edit its link. The anchor is typed directly into the URL box with a # prefix, not picked via the document browser. The anchors are all lowercase with no punctuation, spaces or accents, typically ch2, ch3 etc. Calibre then builds the ebook's NCX from that correctly formatted user index, which also appears inline in the ebook.
Last edited by Quoth; 07-14-2020 at 09:56 AM.
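The anchor rule above (lowercase, no punctuation, spaces or accents) is easy to apply mechanically. A minimal Python sketch, assuming you want to derive anchors from labels and check them before typing `#anchor` into each link's URL box; the function names are hypothetical and not part of LibreOffice or Calibre.

```python
import re
import unicodedata

VALID_ANCHOR = re.compile(r"[a-z0-9]+")


def make_anchor(label: str) -> str:
    """Lowercase ASCII anchor with accents, punctuation and spaces removed."""
    # NFKD splits accented letters (é -> e + combining accent); drop non-ASCII marks
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep only lowercase letters and digits
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def is_valid_anchor(anchor: str) -> bool:
    """True if the anchor follows the lowercase letters-and-digits rule."""
    return VALID_ANCHOR.fullmatch(anchor) is not None


def link_target(anchor: str) -> str:
    """The string to type into the link's URL box, e.g. '#ch3'."""
    if not is_valid_anchor(anchor):
        raise ValueError(f"anchor breaks the naming rule: {anchor!r}")
    return "#" + anchor
```

`make_anchor("Ch 2")` gives `ch2`, and `make_anchor("Chapitre 3 : L'été")` gives `chapitre3lete`. Short names like ch2 are what the post recommends; deriving them from full titles is optional.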
#18
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thanks for the tip.
For some reason, when Calibre converts the ODT to EPUB, the cover appears twice, on two consecutive pages, when the EPUB is displayed on the computer in SumatraPDF, but only once, as expected, on the e-reader. Oh, well.
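One way to see what is really inside the EPUB, rather than what a particular viewer chooses to show, is to list its spine (reading order). A minimal Python sketch using only the standard library; it assumes a well-formed EPUB with a single OPF package file. Calibre usually names its generated cover page `titlepage.xhtml`, but treat that as an assumption. Spine entries marked `linear="no"` may be skipped by some reading systems, which is one possible (unconfirmed) reason two viewers disagree.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_entries(epub):
    """Return (href, linear) for each spine item, in reading order.

    `epub` may be a file path or a binary file object.
    """
    with zipfile.ZipFile(epub) as z:
        # META-INF/container.xml points to the OPF package document
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
    # The manifest maps item ids to file paths (relative to the OPF)
    manifest = {item.get("id"): item.get("href")
                for item in opf.find("opf:manifest", NS)}
    # The spine lists itemrefs in reading order; linear defaults to "yes"
    return [(manifest[ref.get("idref")], ref.get("linear", "yes"))
            for ref in opf.find("opf:spine", NS)]
```

Usage: `for href, linear in spine_entries("book.epub"): print(href, linear)`. If two cover-like pages show up at the start, the duplicate is in the file itself, not just in the viewer.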
#19
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Are you using Calibre? Or are you trying to use LibreOffice's built-in Export As > Export As EPUB?
* * *
If you just want a quick conversion "that just works":
Save a DOCX copy, then use Calibre to convert DOCX->EPUB. Calibre should detect the first image in the document and convert it into the cover.
Note: DOCX->EPUB works a little more cleanly than ODT->EPUB. Of course, keep your source document as ODT, and only save as DOCX temporarily for the conversion.
LibreOffice covers should also be working fine. If you press Export As > Export As EPUB, do you see this? Which version of LibreOffice do you have?
Last edited by Tex2002ans; 07-14-2020 at 11:04 AM.
|
#20
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Directly from the ODT file.
Also, the ToC is totally different. I'll try the ODT → DOCX → EPUB alternative.
#21
Still reading
Posts: 13,851
Karma: 103895653
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
Quote:
There is a setting to NOT detect covers in DOCX, I think; otherwise the first image, whatever it is, will replace the cover you set manually in Metadata (Browse for cover)! Don't include the cover in the actual word-processor file!
The plugin (older LO) or the built-in EPUB export in LO Writer is very poor compared to an extra Save As DOCX and importing that into Calibre. Make sure the Page setup output profile is 'Tablet' to avoid resizing images. Convert to EPUB2, and do any other formats from the EPUB2.
Last edited by Quoth; 07-15-2020 at 10:54 AM.
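The workflow in this post can be scripted. A minimal Python sketch that only builds and runs the two command lines; it assumes `soffice` (LibreOffice) and calibre's `ebook-convert` are on your PATH, and the file names are placeholders. `--docx-no-cover` is calibre's DOCX input option for not turning the first image into the cover, `--output-profile tablet` corresponds to the 'Tablet' profile under Page setup, and `--epub-version 2` asks for EPUB2; check `ebook-convert book.docx book.epub -h` for the options in your calibre version.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_commands(odt: str, cover: str) -> list[list[str]]:
    """ODT -> DOCX (headless, never opened in Writer) -> EPUB2 with a separate cover."""
    src = Path(odt)
    docx = src.with_suffix(".docx")
    epub = src.with_suffix(".epub")
    return [
        # LibreOffice writes the DOCX copy next to the ODT source
        ["soffice", "--headless", "--convert-to", "docx",
         "--outdir", str(src.parent), str(src)],
        ["ebook-convert", str(docx), str(epub),
         "--cover", cover,              # cover lives outside the document
         "--docx-no-cover",             # don't promote the first image to cover
         "--output-profile", "tablet",  # no image resizing
         "--epub-version", "2"],
    ]


def run(odt: str, cover: str) -> None:
    """Run each step, stopping on the first failure."""
    for cmd in build_commands(odt, cover):
        subprocess.run(cmd, check=True)
```

Other formats can then be made from the EPUB2 the same way, e.g. `ebook-convert book.epub book.azw3`.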
|
#22
Fanatic
Posts: 516
Karma: 2268308
Join Date: Nov 2015
Device: none
Please, never-never-never use Tesseract or other headless OCR systems for books. All text must be proofed interactively. Also, that frontend is very primitive and it will mess up the text formatting.
#23
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Tesseract 4.11, coupled with the latest tessdata 2.4 (ENG and FRA tested), is quite able to OCR any book efficiently. With a good-quality scan, you can even OCR a full book directly (about 30 pages a minute) and save it in text format. The graphical interface (gImageReader-qt5) is quite clean:
- first, you can proofread your text line by line;
- with a click, the text is changed into paragraphs separated by empty ones.
Roughly, I would say you may have one mistake a page on average (including accents and punctuation).
Cons: no italics and no anchors; these need to be set up manually. Garbage for fully white pages (?).
Free tip: if you have white text on a black background, Tesseract will give you a blank page. So open a terminal and run ImageMagick first with this command (adapt as needed), then proceed as usual.
Code:
convert name-image.jpg -channel RGB -negate output.jpg
Last edited by roger64; 07-23-2020 at 06:54 AM. Reason: had forgotten convert...
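For 8-bit images, `-negate` maps each channel value v to 255 - v. If a book mixes normal pages with white-on-black ones, you could decide per page whether to invert before OCR. A minimal, dependency-free Python sketch on a flat list of grey levels (0 to 255); the average-brightness threshold of 128 is an assumption, and for real image files you would more likely use the ImageMagick command above or Pillow's `ImageOps.invert`.

```python
from statistics import mean


def is_dark_page(pixels, threshold=128):
    """True if the average grey level suggests light text on a dark background."""
    return mean(pixels) < threshold


def invert(pixels):
    """Per-value negation for 8-bit grey levels: v -> 255 - v."""
    return [255 - v for v in pixels]


def prepare_for_ocr(pixels):
    """Invert only pages that look dark, so OCR always sees dark-on-light text."""
    return invert(pixels) if is_dark_page(pixels) else list(pixels)
```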
|
#24
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
This time, I'm trying to convert a PDF to EPUB.
Lucky me: gImageReader says "PDFs with text: These PDF files already contain text". Indeed, when I open the file in Windows, the text can be copied and pasted with the mouse, so it isn't just scanned images. FWIW, here's what cpdf says about it:
Code:
XMP pdf:Producer: Adobe Acrobat 10.0 Paper Capture Plug-in with ClearScan
XMP xmp:CreatorTool: Canon
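A "Paper Capture ... ClearScan" producer string is a quick hint that the text layer came from OCR, so OCR errors should be expected. A minimal Python sketch that parses `key: value` lines like the two above and flags such files; the keyword list is an assumption based on this one producer string, not an exhaustive test.

```python
OCR_HINTS = ("Paper Capture", "ClearScan")


def parse_info(text: str) -> dict:
    """Split lines such as 'XMP pdf:Producer: Adobe ...' at the first ': '."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def looks_ocred(info: dict) -> bool:
    """True if any producer-like field mentions a known OCR hint."""
    producers = " ".join(v for k, v in info.items() if "Producer" in k)
    return any(hint in producers for hint in OCR_HINTS)
```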
#25
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
For more info, see: https://blogs.adobe.com/acrolaw/2009...rscan_is_smal/
All the usual OCR errors and PDF->EPUB recommendations still apply.
Last edited by Tex2002ans; 07-18-2020 at 11:30 PM.
|
#26
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
That's why it looked like scanned pages, yet the text is still selectable, as in a text PDF.
I'll play with Sigil and see whether it's more convenient than LibreOffice Writer for building an EPUB. The mid-page carriage returns are especially annoying.
#27
null operator (he/him)
Posts: 21,681
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
#28
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thanks. Interesting that there's no free alternative for, e.g., LibreOffice. I guess it's harder than it looks.
http://www.translatortools.net/produ...ools/unbreaker
Edit: Opening the PDF in ABBYY FineReader does a pretty good job. The mid-sentence line breaks are gone (on a few test pages, at least). An AutoIt script might come in handy to automate the process.
Last edited by Shohreh; 07-19-2020 at 10:05 AM.
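The job of such line-unbreaking tools can be approximated with a few heuristics. This is a minimal Python sketch, not the linked tool's algorithm: blank lines always end a paragraph, a line that ends a sentence and is clearly shorter than the longest line is treated as a paragraph's last line, and a trailing hyphen followed by a lowercase letter is re-joined. The 0.8 ratio is an assumption to tune, and genuine compound words split at a line end (e.g. "well-" / "known") will lose their hyphen.

```python
import re

# Sentence-final characters, including French closing guillemets
SENTENCE_END = re.compile(r'[.!?…»"”]$')


def unwrap(text: str, short_ratio: float = 0.8) -> str:
    """Join hard-wrapped lines into paragraphs separated by blank lines."""
    lines = [line.strip() for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    paragraphs, current = [], ""
    for line in lines:
        if not line:  # blank line: always a paragraph break
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith("-") and line[:1].islower():
            current = current[:-1] + line  # re-join a word hyphenated at the break
        else:
            current += " " + line
        # A short line that ends a sentence is probably a paragraph's last line
        if SENTENCE_END.search(line) and len(line) < short_ratio * width:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```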
#29
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Incidentally, if a PDF contains two layers (scanned pages as bitmaps, and OCRed text), is there an application that can extract just the text layer, so I can open it in Sigil or LibreOffice?
I checked cpdf, mutool and qpdf, but saw no obvious command, not even one to list layers.
#30
Grand Sorcerer
Posts: 5,772
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
https://en.wikipedia.org/wiki/Pdftotext
Also k2pdfopt, documented in the PDF forum at MobileRead.
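In OCRed PDFs, the recognised text is usually ordinary (often invisible) text drawn with the page image rather than a separate optional-content layer, which may be why layer-listing commands show nothing; a plain text extractor such as pdftotext just pulls it out. A minimal Python sketch, assuming `pdftotext` (Poppler/Xpdf) is installed; by default it separates pages with form-feed characters, which the helper uses to drop lines consisting only of a page number. That page-number rule is an assumption and may also remove legitimate lines such as a lone year.

```python
from __future__ import annotations

import re
import subprocess

PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")


def pdftotext_cmd(pdf: str, txt: str) -> list[str]:
    """pdftotext call writing UTF-8 text; pages stay separated by form feeds."""
    return ["pdftotext", "-enc", "UTF-8", pdf, txt]


def clean_pages(text: str) -> str:
    """Split on form feeds, drop bare page-number lines, rejoin the pages."""
    cleaned = []
    for page in text.split("\f"):
        lines = [line for line in page.splitlines() if not PAGE_NUMBER.match(line)]
        cleaned.append("\n".join(lines).strip("\n"))
    return "\n".join(page for page in cleaned if page)


def extract(pdf: str, txt: str) -> str:
    """Run pdftotext, then return the cleaned text."""
    subprocess.run(pdftotext_cmd(pdf, txt), check=True)
    with open(txt, encoding="utf-8") as f:
        return clean_pages(f.read())
```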
|
Thread | Thread Starter | Forum | Replies | Last Post |
An advice on OCRing, please. | nlundberg | Workshop | 6 | 03-13-2013 06:29 AM |
Book Designer Hints and Tips | Patricia | Workshop | 59 | 06-10-2010 07:14 AM |