Mathematics equations, technical mechanics and all kinds of diagrams

Difermo · 03-11-2017, 06:58 AM

I'm sorry for opening a new thread if one already exists. I searched on Google and everything I found is too old, from 2012 or 2009, and I guess a lot has changed since then.

I need to put some books on my eReader, a Kindle Paperwhite. But if MOBI or AZW3 can't support this, I'm ready to buy an EPUB reader.

The books I need to load are for civil engineering, so there is a lot of math (trigonometry, equations, integrals) and lots of diagrams and tables.
Here are some examples

Any suggestions on how to solve this?

Tex2002ans · 03-11-2017, 04:54 PM

Quote:

Originally Posted by Difermo

The books I need to load are for civil engineering, so there is a lot of math (trigonometry, equations, integrals) and lots of diagrams and tables.

I suspect these are just PDF scans of books?

Your best bet would probably be reading this as a PDF on a larger screen (tablet/monitor).

Depending on the PDF, you may be able to do some cropping to make it a bit easier to read on your device. For example, using a tool like k2pdfopt:

https://www.mobileread.com/forums/sh...d.php?t=144711

but most of the time it just might not be possible to shrink a very large and complex 8.5"x11" page into a smaller screen.
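To put a number on how much a full page shrinks, here is a small sketch. It is an illustration under stated assumptions, not how k2pdfopt works internally: `content_bbox` (a hypothetical helper) finds the non-white area of a page so the margins can be cropped, and `fit_scale` (also hypothetical) computes the uniform scale needed to fit that area on a screen. The screen size of roughly 3.6 x 4.8 inches is an assumed figure for a 6-inch e-ink panel.

```python
def content_bbox(gray, threshold=200):
    """Return (top, left, bottom, right) enclosing all pixels darker than
    `threshold` in a grayscale page (list of rows, 0 = black, 255 = white).
    bottom/right are exclusive. Returns None for a blank page."""
    dark_rows = [y for y, row in enumerate(gray) if any(p < threshold for p in row)]
    if not dark_rows:
        return None
    width = len(gray[0])
    dark_cols = [x for x in range(width) if any(row[x] < threshold for row in gray)]
    return dark_rows[0], dark_cols[0], dark_rows[-1] + 1, dark_cols[-1] + 1


def fit_scale(content_w, content_h, screen_w, screen_h):
    """Largest uniform scale at which the content fits on the screen.
    Both sizes must use the same unit (e.g. inches)."""
    return min(screen_w / content_w, screen_h / content_h)
```

With those assumptions, `fit_scale(8.5, 11, 3.6, 4.8)` is about 0.42, so 10 pt body text ends up near 4 pt. Trimming 1-inch margins (6.5 x 9 in) only raises the factor to about 0.53, which is why cropping helps but rarely rescues a dense page.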

Converting this type of complex material to a proper ebook is EXTREMELY labor intensive... and if the publisher doesn't release an ebook directly from the source material... it probably wouldn't be worth the time invested for a single individual to OCR (easily tens/hundreds of hours).

The more complex the layout (multi-column, lots of footnotes, tables, figures, captions, equations, [...]) the harder the books are to convert using OCR + the more manual intervention would be needed to fix all the broken formatting.

Quote:

Originally Posted by Difermo

I need to put some books on my eReader, a Kindle Paperwhite. But if MOBI or AZW3 can't support this, I'm ready to buy an EPUB reader.

Dedicated ereaders can read PDFs... but the experience is typically very poor: slow/sluggish page turns, having to pan/scan, not being able to easily highlight text or take notes, can't resize text, etc. etc.

For example, here is The Digital Reader showing off PDFs on a Kobo Aura One (Kindles/Nooks/others are similar):

http://the-digital-reader.com/2016/0...just-no-video/

Quote:

Originally Posted by Difermo

I'm sorry for opening a new thread if one already exists. I searched on Google and everything I found is too old, from 2012 or 2009, and I guess a lot has changed since then.

The largest change in this kind of material is probably MathML in EPUB3.

Equations/Formulas? Each equation is going to have to be included as bitmap images (or SVG or MathML).

Each and every equation would require laborious double-checking to make sure it is correct and require some serious markup.
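To show what "serious markup" means for the MathML option, here is a minimal sketch that builds presentation MathML for one formula with Python's standard `xml.etree.ElementTree`. The formula (maximum bending moment of a simply supported beam under a uniform load, M = qL²/8) is only an illustrative example, and the helper names `el` and `beam_moment_mathml` are hypothetical. Whether a given reader actually renders MathML has to be tested on the device.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def el(tag, *children, text=None):
    """Create a MathML element with optional text and child elements."""
    node = ET.Element(tag)
    if text is not None:
        node.text = text
    node.extend(children)
    return node


def beam_moment_mathml():
    """Return presentation MathML for M = q L^2 / 8 as a string."""
    math = ET.Element("math", {"xmlns": MATHML_NS, "display": "block"})
    math.append(el("mi", text="M"))
    math.append(el("mo", text="="))
    numerator = el("mrow", el("mi", text="q"),
                   el("msup", el("mi", text="L"), el("mn", text="2")))
    math.append(el("mfrac", numerator, el("mn", text="8")))
    return ET.tostring(math, encoding="unicode")
```

Even this one short formula needs about a dozen elements, which gives a feel for the effort involved across a whole textbook.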

The only program/engine I know of that handles OCRing Formulas is InftyReader:

http://www.sciaccess.net/en/InftyReader/

and that costs $800+.

Side Note: Back in 2013 I wrote a topic, "Tutorial: Formulas to PNG", where I sort of show off one method of digitizing equations (using LibreOffice Math):

https://www.mobileread.com/forums/sh...d.php?t=223254

I have used this to recreate formulas in books that had <50 equations... now I tend to prefer using LaTeX as a middleman... but it STILL requires a massive amount of manual work per equation. I shudder to think how long it would take working on a book that is as full as your example pages.
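As a rough illustration of the "LaTeX as a middleman" step, the sketch below writes one small `.tex` file per equation using the LaTeX `standalone` class, which crops the output to the formula itself. The template and the function name `write_equation_sources` are assumptions for illustration, not the workflow from the linked tutorial.

```python
from pathlib import Path

# Assumed template: the "standalone" document class crops the page to its
# content, so each compiled file contains just one formula.
TEMPLATE = r"""\documentclass[border=2pt]{standalone}
\usepackage{amsmath}
\begin{document}
$\displaystyle %s$
\end{document}
"""


def write_equation_sources(equations, out_dir):
    """Write eq001.tex, eq002.tex, ... (one per equation) and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, latex in enumerate(equations, start=1):
        path = out / f"eq{number:03d}.tex"
        path.write_text(TEMPLATE % latex, encoding="utf-8")
        paths.append(path)
    return paths
```

Each file could then be compiled with `pdflatex eq001.tex` and turned into a bitmap with a tool such as Poppler's `pdftoppm -png -r 300 eq001.pdf eq001`. The slow part, typing and proofreading every formula by hand, stays exactly as slow.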

Nate the great · 03-11-2017, 08:23 PM

Get an iPad. Use Goodreader.

Seriously, this type of content requires a _large_ screen. And honestly, the iPad is the best for this.

Difermo · 03-12-2017, 06:32 AM

Thanks @Tex2002ans for the very detailed answer
@Nate the great, thanks to you too

I guess it would be best to use a 9.7" tablet (an iPad or some Android device).
Some of the pages are A4 format and some are smaller, but 90% are books measuring 16 cm x 24 cm.
These are books for the building profession. The professor advised us to make maximum use of today's technology. Earlier it was difficult, actually impossible, to bring all the books to the construction site, and you always need to be reminded of something. So I thought of converting the books for the Kindle. Given that this is too complicated, it is better to keep them as PDFs on a tablet. Carrying a tablet is better than dragging a suitcase with 20 books.

It is too bad that the Kindle and Kobo aren't more powerful. On a tablet, in a PDF, I can't long-press a word and see what it means (if I don't understand it). That was the second reason I wanted to put the books on my Kindle Paperwhite.

Notjohn · 03-12-2017, 07:34 AM

Quote:

Dedicated ereaders can read PDFs... but the experience is typically very poor: slow/sluggish page turns, having to pan/scan, not being able to easily highlight text or take notes, can't resize text, etc. etc.

A publisher once sent me a review copy as a PDF, and it was two pages up! So yes, I had to do a lot of finger work on my 7-inch Fire tablet, but I did read the book, and it was better than ordering up a commercial copy. So that was what? -- 8.5 by 11 inches? Of course I only had to read half of it at any given moment, so maybe that's not a fair example.

BetterRed · 03-12-2017, 03:29 PM

Quote:

Originally Posted by Difermo

It is too bad that the Kindle and Kobo aren't more powerful. On a tablet, in a PDF, I can't long-press a word and see what it means (if I don't understand it). That was the second reason I wanted to put the books on my Kindle Paperwhite.

I assume you mean something like this

[Screenshot: Capture.JPG]

All I did was highlight the word 'manuscript' (double-click) in the PDF (opened in PDF-XChange) and press Ctrl+Shift+`, and voilà, WordWeb popped up with a definition. The only customisation was assigning that key sequence as the WordWeb hotkey. I have another gadget (clickto) that will look up the highlighted text on the Web (Google, Wikipedia, Wiktionary, etc., or even MobileRead).

If something similar can't be done on Android or iOS I'd be surprised, but if not, get a Surface; it's a walk in the park with Windows.

One of my dislikes of Android (and iOS I would guess, were I to use it) is the difficulty of getting 'apps' to interoperate, walled gardens full of walled gardens, or as I prefer - nests of recursive arboreta.

BR

PeterT · 03-12-2017, 04:08 PM

I expect that the ability to press on a word in a PDF will depend on whether or not there is a text layer (not sure of the real term) in the PDF. If the PDF is purely a scanned image I doubt that any lookup would be possible.
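One way to test PeterT's point on a particular file is a rough, standard-library-only heuristic: look inside the PDF's content streams for text-drawing operators (`BT` ... `Tj`/`TJ` ... `ET`). OCRed PDFs draw their invisible text with these same operators, so they count as having a text layer. This is a sketch, not a PDF parser: it only inflates Flate-compressed streams, ignores object streams and other filters, and can be fooled by binary data. The name `has_text_layer` is hypothetical.

```python
import re
import zlib

# Raw bytes between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object that actually shows a string: BT ... Tj/TJ ... ET.
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def _stream_bytes(raw):
    """Inflate a Flate (zlib) compressed stream; return other data unchanged."""
    try:
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw


def has_text_layer(pdf_bytes):
    """Heuristic: True if any content stream appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        if TEXT_OP_RE.search(_stream_bytes(match.group(1))):
            return True
    return False
```

Usage would be something like `has_text_layer(open("book.pdf", "rb").read())`. A `False` result suggests a pure image scan where word lookup won't work without OCR.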

BetterRed · 03-12-2017, 04:56 PM

True - maybe the Textify TSR (or whatever they're called now) might help.

Failing that, screenshot the page [Alt/PrtScn], paste it into a OneNote note [WinKey+N, Ctrl+V], use its Copy Text from Picture feature [Menu Key, x], paste the resulting text into the note [Ctrl+V], and do the lookup from the pasted text (correcting it if necessary).

That might even be doable with the Android and iOS versions of OneNote.

But for scanned-image PDFs, Evernote might be better: it can embed a PDF in a note and OCR it in one fell swoop (lawyers love it). Evernote runs on most platforms except Linux (or it didn't the last time I looked). And you can access your notebooks via the web without buggering about with Dropbox and the like.

BR

Difermo · 03-12-2017, 05:39 PM

@BetterRed

But that works only if the text can be selected.
I don't have the original PDFs of all the books; I'm taking pictures of them. So to be able to select text, they must be OCRed.
I think OCR will be very bad and will destroy lines, tables, etc., so the work to fix it all will probably be huge.
I'm still looking for the best way to create a PDF from the pictures. They are not all the same size, since my hand is not always at the same distance. I will have to make a DIY book scanner.
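For the "not all the same size" problem, a simple approach is to pick one target pixel size for the physical page and scale every cropped page photo to fit it. The sketch below only does the arithmetic; it assumes the page has already been cropped out of each photo, and the actual resampling would be done with an image library. The 16 cm x 24 cm size comes from the post above; the 300 DPI target and the function names are assumptions.

```python
CM_PER_INCH = 2.54


def target_pixels(width_cm, height_cm, dpi=300):
    """Pixel size a page of the given physical size should have at `dpi`."""
    return round(width_cm / CM_PER_INCH * dpi), round(height_cm / CM_PER_INCH * dpi)


def page_scale(photo_w, photo_h, target_w, target_h):
    """Uniform scale that fits a cropped page photo into the target size
    while keeping its aspect ratio."""
    return min(target_w / photo_w, target_h / photo_h)


def scaled_size(photo_w, photo_h, target_w, target_h):
    """Pixel size of the photo after applying page_scale."""
    s = page_scale(photo_w, photo_h, target_w, target_h)
    return round(photo_w * s), round(photo_h * s)
```

For example, `target_pixels(16, 24)` gives `(1890, 2835)`, and a page cropped to 1500 x 2250 px would be scaled by 1.26 to match.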

Tex2002ans · 03-12-2017, 07:43 PM

Quote:

Originally Posted by Difermo

I don't have the original PDFs of all the books; I'm taking pictures of them. So to be able to select text, they must be OCRed.
I think OCR will be very bad and will destroy lines, tables, etc., so the work to fix it all will probably be huge.

As PeterT said, you could create a PDF with the image layer on top and the invisible text layer (OCR) on the bottom.

For example, that is how you can search through all the books on Archive.org:

https://archive.org/details/engineeringbook00yeom

The most accurate Open Source program is probably tesseract:

https://github.com/tesseract-ocr/tesseract

but it is command-line only (there are a few programs based on it that do have a GUI).

I haven't tested it in years, but the last time I did, there were serious inaccuracies with formatted text (carrying over italics/bold/small caps/superscript/subscript), and you had to do a ton of finagling with dictionaries + training. I also have no idea how well it handles complex formatting like tables or charts/graphs with captions.
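As a sketch of the "image on top, invisible text underneath" approach with tesseract: as far as I know, passing `pdf` as the output config makes tesseract write a searchable PDF, and it accepts a plain-text file listing one image per line as input (check `tesseract --help` for your version). The helper names below are hypothetical, and the language code must match the traineddata you have installed.

```python
import shutil
import subprocess
from pathlib import Path


def write_image_list(image_paths, list_path):
    """Write one image path per line, in page order."""
    Path(list_path).write_text("\n".join(str(p) for p in image_paths) + "\n",
                               encoding="utf-8")


def tesseract_pdf_command(list_path, out_base, lang="eng"):
    """Command line for: tesseract <input> <outputbase> -l <lang> pdf
    (tesseract appends .pdf to <outputbase> itself)."""
    return ["tesseract", str(list_path), str(out_base), "-l", lang, "pdf"]


def make_searchable_pdf(image_paths, out_base, lang="eng"):
    """Run tesseract on all pages; needs tesseract installed and on PATH."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    list_path = Path(f"{out_base}_pages.txt")
    write_image_list(image_paths, list_path)
    subprocess.run(tesseract_pdf_command(list_path, out_base, lang), check=True)
    return Path(f"{out_base}.pdf")
```

Keeping the command builder separate from the call that runs it makes the exact command easy to inspect before launching a long OCR job.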

The most accurate Proprietary OCR is ABBYY Finereader (this is what I use):

https://www.abbyy.com/en-us/finereader/

It costs a bit of money ($199 for the latest version), but if you value your time, it will save you A TON of headaches.

The examples you gave of written maths or complex equations are just not going to work well with ANY OCR program... but at least you would be able to have all of the normal text in a book OCRed/searchable + accurate. :P

Quote:

Originally Posted by Difermo

I'm still looking for the best way to create a PDF from the pictures. They are not all the same size, since my hand is not always at the same distance. I will have to make a DIY book scanner.

The worse your input, the worse the OCR... and the worse your output will be.

Taking pictures with your shaky hand/phone is not ideal because you would most likely get very fuzzy text. This is ok if you are a human trying to quickly read the image, but disastrous for OCR.
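One way to catch fuzzy photos before they reach the OCR step is a focus measure. The sketch below uses the variance of the Laplacian, a common sharpness score that is swapped in here and was not mentioned in the thread: sharp text edges give large Laplacian responses, while blur flattens them. It works on a grayscale image given as a list of rows; loading real photos would need an imaging library, and any "too blurry" threshold would have to be calibrated on your own pictures.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a
    grayscale image (list of rows, 0 = black, 255 = white).
    Higher means more sharp edges; blurry or blank images score low."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

Photos scoring well below the rest of a batch are good candidates for a retake.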

The DIY Book Scanner forums discuss quite a few designs people have rigged up, plus their workflows:

https://forum.diybookscanner.org/

and we also discussed quite a lot of this in the topic, "Delicate text digitalizing + scanning issues":

https://www.mobileread.com/forums/sh...d.php?t=234146

BetterRed · 03-12-2017, 08:14 PM

Quote:

Originally Posted by Difermo

@BetterRed

But that works only if the text can be selected.
I don't have the original PDFs of all the books; I'm taking pictures of them. So to be able to select text, they must be OCRed.
I think OCR will be very bad and will destroy lines, tables, etc., so the work to fix it all will probably be huge.
I'm still looking for the best way to create a PDF from the pictures. They are not all the same size, since my hand is not always at the same distance. I will have to make a DIY book scanner.

Ah-ha, I assumed, as others did, that the books in question were already scanned into image PDF's.

Difermo · 03-12-2017, 08:28 PM

Thanks for the answer. I have ABBYY, but I'm not going to use it this time.
OCRing only parts of each page and merging them with the diagrams and pictures to create a searchable PDF is too much work and time, and there are lots of books to do. I would probably abandon the work after 10-20 pages once I realised how much there is to be done. Since I have very little free time, it is best just to take pictures and merge them into one PDF.
Maybe 10 or 20 years from now, when someone does it, I will buy the books.

I'll just build a book scanner, take pictures, create a PDF from them and use it like that. If I need to look up a word I don't know, I will check it manually in the dictionary.

Quote:

Originally Posted by BetterRed

Ah-ha, I assumed, as others did, that the books in question were already scanned into image PDF's.

Some books are in PDF format (maybe 20%), but lots of them are not. And there are lots of my own notes.
