#1
Addict
Posts: 248
Karma: 1312
Join Date: Mar 2010
Device: jetbook lite
PDF to text
Could anyone recommend a program that converts PDF to text without inserting line breaks right smack in the middle of sentences? I also want to be able to convert hundreds of files at a time.
I've been using Nitro, but the damn thing still inserts line breaks where the page breaks are. It does a fine job of recognizing paragraphs, but whenever the text runs from one page to the next, the program inserts a line break. Many thanks.
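One workaround, if the converter already gets paragraphs right, is to post-process its output and rejoin paragraphs that were split at a page boundary. Below is a minimal Python sketch. It assumes the output has one paragraph per line, and that a page-break split shows up as a line with no closing punctuation followed (possibly after blank lines) by a line starting in lowercase. The names `mend_page_breaks` and `mend_folder` are made up for illustration; nothing here is part of Nitro.

```python
from __future__ import annotations

from pathlib import Path

# Characters that normally close a paragraph (assumption: English prose).
SENTENCE_END = ('.', '!', '?', ':', '"', '\u201d')


def mend_page_breaks(text: str) -> str:
    """Rejoin paragraphs that a converter split at a page boundary.

    Heuristic: if a line lacks closing punctuation and the next
    non-blank line starts with a lowercase letter, treat the break
    (and any blank lines in it) as spurious and join the two lines.
    Lines are stripped of surrounding whitespace.
    """
    out: list[str] = []
    pending_blanks = 0  # blank lines seen since the last text line
    for line in text.splitlines():
        s = line.strip()
        if not s:
            pending_blanks += 1
            continue
        if out and not out[-1].endswith(SENTENCE_END) and s[0].islower():
            out[-1] = out[-1] + ' ' + s  # drop the spurious break
        else:
            out.extend([''] * pending_blanks)  # keep genuine blank lines
            out.append(s)
        pending_blanks = 0
    return '\n'.join(out)


def mend_folder(src: Path, dst: Path) -> int:
    """Apply mend_page_breaks to every .txt file in src; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob('*.txt')):
        text = path.read_text(encoding='utf-8', errors='replace')
        (dst / path.name).write_text(mend_page_breaks(text), encoding='utf-8')
        count += 1
    return count
```

The lowercase test keeps a heading from being glued onto the next paragraph, but it will miss splits where the new page starts with a proper noun.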
#2
Wizard
Posts: 4,466
Karma: 6900052
Join Date: Dec 2009
Location: The Heart of Texas
Device: Boox Note2, AuraHD, PDA,
Don't know if this is a real answer, but the new JBL 0.16b BETA has PDF reflow.
Luck; Ken
#3
Addict
Posts: 248
Karma: 1312
Join Date: Mar 2010
Device: jetbook lite
I would still like to convert all my PDF ebooks to text. Did you know that the PDF format was created by the devil? There were many human sacrifices involved.
#4
Resident Curmudgeon
Posts: 79,556
Karma: 145863177
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
There is no program available that will convert your PDF to text without errors. The only way to fix the errors is to take the converted text, compare it side by side with the original PDF, and correct them by hand. Otherwise, you are out of luck.
#5
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
A pitch for the freebies: have you tried pdfreflow, or calibre, or Abiword?
No, don’t expect perfection from any of these. PDF was meant as an output format, not an input format, so it doesn’t contain the necessary semantic markup; all these conversions rely on somewhat imperfect algorithms that “guess” document structure from layout, which is not easy to do. They can all do batch jobs, since they can all be run from the command line. It’s probably best to settle on a tool before getting into the details of batch conversion, however. I stand by my belief that PDF is still the best ebook format there is; it’s just not the best for distribution.
Last edited by frabjous; 07-28-2010 at 12:08 AM.
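As a rough illustration of the command-line point, here is a Python sketch that batch-converts a folder of PDFs with calibre's `ebook-convert` tool, which picks the output format from the target file's extension. It assumes calibre is installed and `ebook-convert` is on the PATH; the folder layout and the function names `build_command` and `convert_folder` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build an ebook-convert command line for one PDF.

    ebook-convert infers the output format from the target's
    extension, so a .txt target requests plain text.
    """
    target = out_dir / (pdf.stem + '.txt')
    return ['ebook-convert', str(pdf), str(target)]


def convert_folder(src: Path, out_dir: Path) -> list[Path]:
    """Convert every PDF in src to text; return the PDFs that failed."""
    if shutil.which('ebook-convert') is None:
        raise RuntimeError('ebook-convert (part of calibre) is not on PATH')
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: list[Path] = []
    for pdf in sorted(src.glob('*.pdf')):
        # One conversion at a time; capture output so a bad file
        # doesn't flood the console.
        result = subprocess.run(build_command(pdf, out_dir),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

For example, `convert_folder(Path('pdfs'), Path('txt'))` would turn `pdfs/book.pdf` into `txt/book.txt` and return the files that failed. Another command-line converter, such as Poppler's `pdftotext input.pdf output.txt`, could be swapped into `build_command` the same way.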
#6
Bookaholic
Posts: 14,391
Karma: 54969924
Join Date: Oct 2007
Location: Minnesota
Device: iPad Mini 4, AuraHD, iPhone XR +
I have good luck with Acrobat Pro 9 converting to HTML. I've never tried it for plain text, if that's what you're after, since I want to keep formatting intact and TXT can't do that. If it's plain text you want, I think Acrobat Reader will export/save as text, but I don't know how good a job it does. It really depends on how the PDF was created and from what kind of source.
#7
Addict
Posts: 248
Karma: 1312
Join Date: Mar 2010
Device: jetbook lite
Quote:
#8
Resident Curmudgeon
Posts: 79,556
Karma: 145863177
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Even Acrobat 9 generates errors.
#9
Enthusiast
Posts: 47
Karma: 416
Join Date: Aug 2009
Device: Black PRS-650 with lighted cover & Kindle DX.
I use PDF Converter Pro from Nuance. My version is a little old: I have version 4, while version 6 is the current one, and according to a representative I talked to, version 7 should be out in August. It's a paid product, but I feel it's worth the price.
Attached is a sample I just made: a paper I downloaded from the web and converted, so you can see the results. http://www.nuance.com/
#10
Addict
Posts: 248
Karma: 1312
Join Date: Mar 2010
Device: jetbook lite
You are right. Nuance is a good converter. But then I tried changing the font around and it didn't take me long to notice the weakness.
#11
Enthusiast
Posts: 47
Karma: 416
Join Date: Aug 2009
Device: Black PRS-650 with lighted cover & Kindle DX.
Can you upload a sample of a document you want to convert?
#12
Bookaholic
Posts: 14,391
Karma: 54969924
Join Date: Oct 2007
Location: Minnesota
Device: iPad Mini 4, AuraHD, iPhone XR +
#13
Bookaholic
Posts: 14,391
Karma: 54969924
Join Date: Oct 2007
Location: Minnesota
Device: iPad Mini 4, AuraHD, iPhone XR +
Quote:
If you have an example file and want to upload it, I can run it through the exporter for you to see how it does (provided there aren't copyright issues).
#14
Digitally confused
Posts: 500
Karma: 1500000
Join Date: Mar 2010
Location: London, UK
Device: KPW, K2i, Nexus 7 32gb, Kobo Mini
I can't understand why there aren't simple post-processors for the text. Take the text output and join the lines together unless they end in a full stop, a question mark, or a double quote.
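The joining rule described above takes only a few lines. A minimal Python sketch, assuming plain-text input where blank lines (if any) also separate paragraphs; `join_lines` is a hypothetical helper name:

```python
from __future__ import annotations

# Line endings that close a paragraph under the rule above:
# full stop, question mark, or a straight or curly double quote.
LINE_ENDS = ('.', '?', '"', '\u201d')


def join_lines(text: str) -> str:
    """Join wrapped lines into paragraphs separated by blank lines.

    A line is joined to the next one unless it ends with one of
    LINE_ENDS; blank lines in the input also end a paragraph.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        s = line.strip()
        if not s:  # blank line: close any open paragraph
            if current:
                paragraphs.append(' '.join(current))
                current = []
            continue
        current.append(s)
        if s.endswith(LINE_ENDS):
            paragraphs.append(' '.join(current))
            current = []
    if current:  # text left over at the end
        paragraphs.append(' '.join(current))
    return '\n\n'.join(paragraphs)
```

Because any line ending in a full stop closes a paragraph, a line that happens to wrap right after a sentence still starts a new paragraph there, and words hyphenated across lines come out as `exam- ple`.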
You may need to remove page numbers, if present, and any chapter titles that appear at the top of each page. I managed to get this far, but then found there were various funny characters in the text representing double ll's, etc., and these need to be converted. My aim was to generate HTML and then use the chapter titles to create a TOC. I got halfway there.
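The clean-up steps in this paragraph can be sketched the same way. This assumes the "funny characters" are Unicode ligatures such as `ﬁ` (U+FB01) and `ﬂ` (U+FB02), which Unicode NFKC normalization expands to plain letters, and it guesses that running chapter titles are short lines repeated on many pages. The regular expression, the `max_header_len` limit, and the `min_repeats` threshold are arbitrary choices for illustration, and `clean_text` is a hypothetical name.

```python
from __future__ import annotations

import re
import unicodedata
from collections import Counter

# Lines such as "12", "Page 12" or "- 12 -" are treated as page numbers.
PAGE_NUMBER = re.compile(r'^\s*(?:page\s+)?[-\u2013]?\s*\d+\s*[-\u2013]?\s*$',
                         re.IGNORECASE)


def clean_text(text: str, min_repeats: int = 3, max_header_len: int = 60) -> str:
    """Drop page numbers and running headers, and expand ligatures.

    A running header (e.g. a chapter title printed on every page) is
    guessed to be any short line that occurs at least min_repeats times.
    """
    # NFKC maps compatibility characters such as the ligatures
    # U+FB01 and U+FB02 to the plain letters "fi" and "fl".
    text = unicodedata.normalize('NFKC', text)
    lines = text.splitlines()
    counts = Counter(ln.strip() for ln in lines if ln.strip())
    kept: list[str] = []
    for line in lines:
        s = line.strip()
        if PAGE_NUMBER.match(s):
            continue  # page number
        if s and len(s) <= max_header_len and counts[s] >= min_repeats:
            continue  # repeated short line, probably a running header
        kept.append(line)
    return '\n'.join(kept)
```

Run it before joining lines so headers and page numbers don't get glued into paragraphs. It also drops the first occurrence of each repeated title, so collect chapter titles beforehand if you want them for a TOC. NFKC rewrites other compatibility characters too (superscript digits become ordinary digits, for instance), so check a sample before running it over a whole library.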
#15
Digitally confused
Posts: 500
Karma: 1500000
Join Date: Mar 2010
Location: London, UK
Device: KPW, K2i, Nexus 7 32gb, Kobo Mini
It's a PDF, so I suspect there just might be copyright issues. Luckily, the same problem occurs with just about any PDF on the web, so you could download any PDF and try converting that. Remember to delete it after testing, though!
Thread | Thread Starter | Forum | Replies | Last Post
eBook PDF - free tool for creating PDF eBooks from text files | KACartlidge | | 6 | 01-04-2012 09:41 AM
Best pdf to text/rtf/whatever I have ever seen | jblitereader | Ectaco jetBook | 13 | 07-10-2010 12:02 AM
PDF to Text | Laura81 | Calibre | 5 | 02-18-2010 07:27 PM
PDF Text too small! | thacursedpie | iRex | 9 | 03-18-2008 02:53 PM