MobileRead Forums > E-Book Formats > Workshop
06-05-2012, 12:31 PM #1
pouzzler
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2012
Device: PRS-T1
Saving old magazines in a useful way

Hello everyone,

I have a complete collection of what was probably the best French-language magazine ever. Various sites have it as .jpg scans, which is not very useful, particularly if you want to copy and paste text.

OCR won't work well enough as far as I'm concerned, at least on my platform (multi-column text, images, warped/rotated text, ...), so I'm aware I will have to put in a lot of hard work to create usable documents, but my motivation is high (for now).

I do believe in open formats, and a rough first search led me to think that Sigil as a tool and epub as a format would suit what I have in mind: a usable copy of my magazine collection (in particular, the text parts must be copy/pastable) that looks as close as possible to the original presentation of the magazine.

Would you agree? And if you don't, could you suggest something more useful?
An ideal tool would take a jpg scan, let me select text zones and OCR/edit those zones, and then produce an open format such as epub.

Best regards,
Sebastian
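The zone-based workflow described in the post can be sketched with Python's standard library alone. This is a minimal, hypothetical sketch, not an existing tool: the `Zone` structure, the naive reading-order rule (zones in the same column are assumed to share the same left edge) and the file names inside the package are all assumptions, and the OCR step itself is left out. It only shows how hand-corrected text zones could be packaged as a bare-bones EPUB 3 file.

```python
import uuid
import zipfile
from dataclasses import dataclass
from datetime import datetime, timezone
from html import escape
from io import BytesIO


@dataclass
class Zone:
    """A rectangular text region selected on a scanned page (hypothetical structure)."""
    x: int     # left edge in pixels
    y: int     # top edge in pixels
    text: str  # OCR output after manual correction


CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def page_xhtml(title: str, zones: list[Zone]) -> str:
    # Naive reading order: column by column (left edge), then top to bottom.
    ordered = sorted(zones, key=lambda z: (z.x, z.y))
    paras = "\n".join(f"<p>{escape(z.text)}</p>" for z in ordered)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>\n'
            '<html xmlns="http://www.w3.org/1999/xhtml">'
            f"<head><title>{escape(title)}</title></head>\n<body>\n{paras}\n</body></html>")


def build_epub(title: str, pages: list[list[Zone]], lang: str = "fr") -> bytes:
    if not pages:
        raise ValueError("need at least one page")
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    items = "".join(f'<item id="p{i}" href="page{i}.xhtml" media-type="application/xhtml+xml"/>'
                    for i in range(len(pages)))
    spine = "".join(f'<itemref idref="p{i}"/>' for i in range(len(pages)))
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{escape(lang)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    {items}
  </manifest>
  <spine>{spine}</spine>
</package>"""
    links = "".join(f'<li><a href="page{i}.xhtml">Page {i + 1}</a></li>' for i in range(len(pages)))
    nav = ('<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>\n'
           '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">'
           f'<head><title>Contents</title></head><body><nav epub:type="toc"><ol>{links}</ol></nav></body></html>')
    buf = BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # EPUB containers require "mimetype" as the first entry, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", nav)
        for i, zones in enumerate(pages):
            zf.writestr(f"OEBPS/page{i}.xhtml", page_xhtml(f"{title}, page {i + 1}", zones))
    return buf.getvalue()
```

For example, `build_epub("Numéro 1", [[Zone(300, 0, "Colonne 2"), Zone(0, 0, "Colonne 1")]])` returns the bytes of a one-page EPUB with the left column first. Note that this flattens the page into a single text flow, which is exactly the layout loss discussed further down the thread; the result could then be hand-tuned in an EPUB editor such as Sigil.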
06-06-2012, 01:54 PM #2
Terrysaurus
Junior Member
Posts: 4
Karma: 501112
Join Date: Mar 2012
Device: Kindle
Wow, good luck. I'm really curious to see what others say about this. I have the same issue with some old magazines too.
06-07-2012, 04:11 AM #3
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
ABBYY FineReader is currently considered the best OCR software around. But it's neither free nor open. It does, however, let you export to ePub, and older versions can at least export to HTML (which you can then import into Sigil). But ePub is not very good with the complex layouts found in magazines... It's great for novels and such, but anything beyond a single column is just asking for trouble.

PDF is better suited for complex layouts.

Unless you want to spend a lot of time proofreading the articles and go "vanilla" (no foreground image and background text), I suggest you apply the standard "good enough" OCR with FineReader and be done with it. It's still better than no text at all. Otherwise, you'll have to track down the fonts (or similar-looking ones), learn how to vectorize graphics, do a lot of micro-retouching and proofread the final product one last time. The quality will be amazing, and it's always a pleasure to read something done right. But the time spent will be significant: you'll need to set aside a couple of hours each day to learn this stuff, and in about a month, maybe two, the first magazine will be done. Sometimes it will feel like a repetitive, life-sucking chore, but you'll eventually get better at it and work faster. It's also very easy to get discouraged. Very few people stick with it.

Considered to be the best at layout and vectorizing: Adobe InDesign and Illustrator.
Open source and multi-platform: Scribus and Inkscape.

Have fun!
06-07-2012, 04:12 AM #4
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Personally, I'd go for PDF page scans with a searchable text layer. With a magazine, you generally want to preserve the appearance of the page, not merely the text.
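To illustrate what a "searchable text layer" means, here is a minimal, hypothetical Python sketch (standard library only) that wraps one JPEG scan in a single-page PDF and overlays OCR text drawn in text render mode 3, which paints nothing on the page but can still be searched and selected. The function name, the `(x, y, line_height, text)` zone format and the 300 dpi default are assumptions. It also assumes an RGB JPEG whose pixel size you already know, uses the built-in Helvetica font with WinAnsi encoding, and does not stretch text to the width of the printed words, so selection highlights will only roughly line up. Dedicated OCR tools handle all of this far more robustly.

```python
def _pdf_string(text: str) -> bytes:
    """Encode text as a PDF literal string (cp1252/WinAnsi; unmappable chars become '?')."""
    raw = text.encode("cp1252", errors="replace")
    raw = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + raw + b")"


def searchable_pdf(jpeg: bytes, width_px: int, height_px: int,
                   zones: list[tuple[int, int, int, str]], dpi: int = 300) -> bytes:
    """Build a one-page PDF: the scan as an image plus invisible text lines.

    zones: hypothetical (x_px, y_px_from_top, line_height_px, text) tuples.
    """
    scale = 72 / dpi  # PDF units are points (1/72 inch)
    w, h = width_px * scale, height_px * scale
    # Draw the scan over the whole page (the image is mapped onto the unit square).
    ops = [f"q {w:.2f} 0 0 {h:.2f} 0 0 cm /Im0 Do Q".encode()]
    for x, y, line_h, text in zones:
        size = line_h * scale
        baseline = h - (y + line_h) * scale  # PDF origin is bottom-left
        # 3 Tr = text render mode "neither fill nor stroke": invisible but searchable.
        ops.append(f"BT 3 Tr /F0 {size:.2f} Tf {x * scale:.2f} {baseline:.2f} Td ".encode()
                   + _pdf_string(text) + b" Tj ET")
    content = b"\n".join(ops)

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w:.2f} {h:.2f}] "
         "/Resources << /XObject << /Im0 4 0 R >> /Font << /F0 6 0 R >> >> /Contents 5 0 R >>").encode(),
        # DCTDecode embeds the JPEG bytes unchanged; DeviceRGB assumes a colour scan.
        (f"<< /Type /XObject /Subtype /Image /Width {width_px} /Height {height_px} "
         f"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode /Length {len(jpeg)} >>\n"
         "stream\n").encode() + jpeg + b"\nendstream",
        f"<< /Length {len(content)} >>\nstream\n".encode() + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>",
    ]
    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref = len(out)
    out += f"xref\n0 {len(objects) + 1}\n0000000000 65535 f \n".encode()
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()  # each xref entry is exactly 20 bytes
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\nstartxref\n{xref}\n%%EOF\n".encode()
    return bytes(out)
```

The design choice mirrors the suggestion above: the page image keeps the original look, while the hidden text layer is only there for search and copy/paste.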
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.