Old 12-16-2018, 02:18 AM   #16
BetterRed
null operator
Posts: 11,894
Karma: 10633600
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@exaltedwombat - the screenshot in first post, presumably from Word, is riddled with anomalies - I've marked those I can see at a glance with my eyeballs, afaik conversion by almost any means will faithfully convert them to epub :

[Attachment 168387: 1.jpg]
Given you have the document in Word, it should only take a short while to fix most of the anomalies with simple Word macros and Tox's ePub Tools.

I hazard it's a scanned PDF. As has so often been said - getting a perfect conversion of a scanned PDF is tedious. Hitch has explained how they do it in her business on numerous occasions - she probably has it tucked away in her paste buffer

BR
Old 12-16-2018, 02:34 PM   #17
Hitch
Bookmaker & Cat Slave
Posts: 7,224
Karma: 65872031
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, and NookColor. 2 Droid, 1 Win8 ePUB rdrs
Quote:
Originally Posted by BetterRed View Post
@exaltedwombat - the screenshot in first post, presumably from Word, is riddled with anomalies - I've marked those I can see at a glance with my eyeballs, afaik conversion by almost any means will faithfully convert them to epub :

Attachment 168387

Given you have the document in Word, it should only take a short while to fix most of the anomalies with simple Word macros and Tox's ePub Tools.

I hazard it's a scanned PDF. As has so often been said - getting a perfect conversion of a scanned PDF is tedious. Hitch has explained how they do it in her business on numerous occasions - she probably has it tucked away in her paste buffer

BR
Actually, I think it might be a "save as...Word" type product, (when he says that Acrobat has "OCR"ed the file) but it's 6 of 1, half-dozen of another, almost literally. That's what I see, too.

I am struggling to figure out what's being asked. My last take on this is that when he puts the text into Sigil (?), he's NOT seeing indents. If that's the case, it's simply the Styles. (Or, if he's copy-pasting into Sigil, same thing--the styles have to be created/set up in the CSS.)

If that's not it, then I still don't understand the question. I thought it was "how do I NOT import the broken paragraph pilcrows/codes," but his last post indicates that isn't the question.

So....at this point, your guess is as good as mine.

Hitch
Old 12-16-2018, 04:42 PM   #18
BetterRed
null operator
Quote:
Originally Posted by Hitch View Post
. . . Actually, I think it might be a "save as...Word" type product, (when he says that Acrobat has "OCR"ed the file) . . .
I overlooked the mention of Acrobat, probably because I've not used it since I stopped wearing button up boots :lol:

I suspect he's missed seeing the tiny second paragraph break (pilcrow) where I have written "scene break?"; that would explain his complaint about EPub Tools OCR Postprocess inserting spurious scene breaks.

And those triple spaces (···) are probably missing paragraph breaks.

The one I highlighted that reads: ". . . one."···"Oh, shit!" almost certainly is.
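
For anyone who wants to hunt these down outside Word, here is a minimal Python sketch of the idea. It assumes (and this is only a guess at the pattern) that a run of three or more spaces right after sentence-ending punctuation, optionally followed by a closing quote, is a lost paragraph break. The names `LOST_BREAK` and `restore_breaks` are made up for illustration; this is not how EPub Tools does it.

```python
import re

# Assumption: three or more spaces directly after sentence-ending punctuation
# (plus an optional closing quote) mark a paragraph break that got lost.
CLOSING_QUOTES = "\"'\u201d\u2019"
LOST_BREAK = re.compile(r"([.!?][" + CLOSING_QUOTES + r"]?) {3,}(?=\S)")


def restore_breaks(text: str) -> str:
    """Replace each suspicious gap with a blank-line paragraph break."""
    return LOST_BREAK.sub(lambda m: m.group(1) + "\n\n", text)


if __name__ == "__main__":
    sample = '". . . one."   "Oh, shit!"'
    print(restore_breaks(sample))
```

Single spaces (such as those inside a spaced ellipsis) and double spaces after a full stop are left alone, so the result still needs a human eye before it is trusted.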

BR

Last edited by BetterRed; 12-29-2018 at 06:25 PM.
Old 12-16-2018, 05:01 PM   #19
elibrarian
Imperfect Perfectionist
Posts: 169
Karma: 345678
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none
I think the OP wants to join the invalid line breaks/paragraph breaks. If that assumption is correct, I don't think there is any way of coercing Word to do that while exporting. But there are some tools available to do some of the heavy work (though not necessarily for free):

TransTools for Word has an "UnBreaker" tool. It lists all the line breaks that are (probably) not valid, and the user can go through the list and mark those he thinks should be rectified. (TransTools is a suite of VBA macros, and as such rather slow, so it should be tried on a small subset of the file in question first to see if it's usable.)

Pepito Cleaner (for OpenOffice/LibreOffice) has something similar. (It's broken on LibreOffice 6.1, but should work on 6.0.)

If the PDF has a text layer, the OP might try SoftMaker's FlexiPDF (it must be the Pro version!) to export the text layer. It does a hell of a job figuring out the correct paragraph breaks. It has a somewhat steep price (I've never regretted buying it, though), but another software vendor, Ashampoo, has leased it from SoftMaker and sometimes sells it for $20 or so.

Just some 0.02 cents suggestions - hope it might help.

Regards,

Kim
Old 12-16-2018, 05:17 PM   #20
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by elibrarian View Post
I think the OP wants to join the invalid line breaks/paragraph breaks. If that assumption is correct, I don't think there is any way of coercing Word to do that while exporting. But there are some tools available to do some of the heavy work (though not necessarily for free):

TransTools for Word has an "UnBreaker" tool. It lists all the line breaks that are (probably) not valid, and the user can go through the list and mark those he thinks should be rectified. (TransTools is a suite of VBA macros, and as such rather slow, so it should be tried on a small subset of the file in question first to see if it's usable.)

Pepito Cleaner (for OpenOffice/LibreOffice) has something similar. (It's broken on LibreOffice 6.1, but should work on 6.0.)

If the PDF has a text layer, the OP might try SoftMaker's FlexiPDF (it must be the Pro version!) to export the text layer. It does a hell of a job figuring out the correct paragraph breaks. It has a somewhat steep price (I've never regretted buying it, though), but another software vendor, Ashampoo, has leased it from SoftMaker and sometimes sells it for $20 or so.

Just some 0.02 cents suggestions - hope it might help.

Regards,

Kim
Hey, Kim:

Long time no see. If the FlexiPDF works, then $80 is cheap, at least for anyone working with pdfs regularly. Hell, I'd be willing to try it, if it can even be used, let's say, only on fiction PDFs. You have the pro version, presumably?

(FYI, I think the OP is saying, in a roundabout way, that the export to ePUB is resulting in block-style paras. I know, it sounds daft, but that's my latest interpretation. When I last tried to discuss broken paras, he sounded exasperated ["I'm still not getting this across, am I?"], so I don't think it's the broken paras. I hope we find out soon!)

Hitch
Old 12-16-2018, 05:30 PM   #21
elibrarian
Imperfect Perfectionist
Quote:
Originally Posted by Hitch View Post
Hey, Kim:

Long time no see. If the FlexiPDF works, then $80 is cheap, at least for anyone working with pdfs regularly. Hell, I'd be willing to try it, if it can even be used, let's say, only on fiction PDFs. You have the pro version, presumably?

(FYI, I think the OP is saying, in a roundabout way, that the export to ePUB is resulting in block-style paras. I know, it sounds daft, but that's my latest interpretation. When I last tried to discuss broken paras, he sounded exasperated ["I'm still not getting this across, am I?"], so I don't think it's the broken paras. I hope we find out soon!)

Hitch
Oh well, I'm still here - and yes, I have the Pro version. I don't know how FlexiPDF will work on books other than fiction (I use it mostly on OCR'ed books and newspapers from various libraries), but I think they have a 30-day full trial to play with.

Regarding the question from the OP, it would be SO nice to have a sample of the actual file, since none of us here is clairvoyant (I'm not, AFAIK).

Regards,

Kim
Old 12-16-2018, 06:01 PM   #22
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by elibrarian View Post
Oh well, I'm still here - and yes, I have the Pro version. I don't know how FlexiPDF will work on books other than fiction (I use it mostly on OCR'ed books and newspapers from various libraries), but I think they have a 30-day full trial to play with.

Regarding the question from the OP, it would be SO nice to have a sample of the actual file, since none of us here is clairvoyant (I'm not, AFAIK).

Regards,

Kim
Well, we have his screenshots. But I'm still murky on the actual question, as, apparently, are we all. I'm going to download that and give it a try, and also the TransTools. You'd be appalled at how many files we get, in Word and Word equivalents, that are rampant with broken paragraphs--some are actually typed that way (I s**t thee not) and many are, of course, "save as..." from Acrobat, etc. If TransTools works, or the Flexi, it would be well worth it for us. I've never heard of either before today, so I thank you.

ETA:
Uh, FlexiPDF exports every PAGE in a PDF, as a Word file? (update: figured out how to work around that.) No, thanks. I downloaded and tried the export...hopefully, I've missed something obvious, but who the hell would want that?

And--can't say I've ever seen THIS before, it puts a paragraph mark BEFORE each paragraph, in addition to after each paragraph. Kim, have you actually used this? Honestly, at the moment, it seems that the native Adobe export would be far, far cleaner than this. Are you using this?

ETA2: and every paragraph came out, in a TABLE? Oy, run, do not walk, from that product. What a disaster! Maybe I'm a dimwit, and I used it wrongly, but so far, it's a bloody mess.

Hitch

Last edited by Hitch; 12-16-2018 at 10:14 PM. Reason: Tried FlexiPDF...
Old 12-17-2018, 12:43 PM   #23
Hitch
Bookmaker & Cat Slave
Update 2 on the FlexiPDF program

So, sports fans:

I ran a test, using the FlexiPDF, and the short answer is, however Kim is using it, is 100% different than what we do here. Gotta be. Not only did I have to screw around to get it to export the entire document, but every time I do, the resulting 62 MB Word file (from a 400-page, super-clean PDF novel) crashes Word, causing serious errors. Maybe it works better in simpler environments, but...

On those pages that I did manage to export, without crashing, as a test, every paragraph is put inside a TABLE.

I did a "save as...Word" export, from the same PDF, and sure, I got broken paragraphs, but I also got the body text styles, etc. The Acrobat export was far, far superior to the FlexiPdf export.

In short, again--Kim must be using it very, very, VERY differently than we do, because this is not a program that I'd use, having seen the results. I've tried probably 10 different "convert from PDF" programs, and honestly, this ranks amongst the worst. None of them are good, of course; that's the bottom line. But this one, putting each paragraph in a table? That's a whole new low.

I did buy the $25 TransTools suite; I figure if it's a disaster, I can afford to lose the money. I'm going to test it against the broken paras in the from-Acrobat export I did as part of the test. I'll report back how that does on broken paragraphs because that is a feature I could really use.

Hitch
Old 12-17-2018, 02:34 PM   #24
elibrarian
Imperfect Perfectionist
Quote:
Originally Posted by Hitch View Post
So, sports fans:

I ran a test, using the FlexiPDF, and the short answer is, however Kim is using it, is 100% different than what we do here. Gotta be. Not only did I have to screw around to get it to export the entire document, but every time I do, the resulting 62 MB Word file (from a 400-page, super-clean PDF novel) crashes Word, causing serious errors. Maybe it works better in simpler environments, but...

On those pages that I did manage to export, without crashing, as a test, every paragraph is put inside a TABLE.

I did a "save as...Word" export, from the same PDF, and sure, I got broken paragraphs, but I also got the body text styles, etc. The Acrobat export was far, far superior to the FlexiPdf export.

In short, again--Kim must be using it very, very, VERY differently than we do, because this is not a program that I'd use, having seen the results. I've tried probably 10 different "convert from PDF" programs, and honestly, this ranks amongst the worst. None of them are good, of course; that's the bottom line. But this one, putting each paragraph in a table? That's a whole new low.

I did buy the $25 TransTools suite; I figure if it's a disaster, I can afford to lose the money. I'm going to test it against the broken paras in the from-Acrobat export I did as part of the test. I'll report back how that does on broken paragraphs because that is a feature I could really use.

Hitch
I only work in clean text/XHTML - not Word. That's probably the only difference. Every other piece of software I've ever used to get text layers from PDFs has exported each and every space and line break for every (EVERY!) single line - except FlexiPDF.

That said, I just tried to export the text layer from an OCR'ed PDF I've got from the Royal Library of Copenhagen to Word, and using the standard settings, I'll admit it stinks. But if you press "Format" at the bottom right of the export dialogue, you can alter the standard settings. I removed everything except "Text Output" and "De-hyphenate" for the Word export, and got a nice Word doc with none of the issues you mention. Not perfect (because the OCR from the Royal Library is not perfect), but very, very usable.
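
I don't know what the "De-hyphenate" option does internally, but the general technique is easy to illustrate. A minimal Python sketch, assuming that a hyphen at the very end of a line, sitting between two lowercase letters, is a line-wrap artifact (`WRAP_HYPHEN` and `dehyphenate` are hypothetical names, not anything from FlexiPDF):

```python
import re

# Assumption: "conver-" at a line end followed by "sion" on the next line is
# one word split by the page layout, not a genuine hyphenated compound.
WRAP_HYPHEN = re.compile(r"(?<=[a-z])-\n(?=[a-z])")


def dehyphenate(text: str) -> str:
    """Rejoin words that were split across a line end with a hyphen."""
    return WRAP_HYPHEN.sub("", text)


if __name__ == "__main__":
    print(dehyphenate("a perfect conver-\nsion of a scanned PDF"))
```

The obvious weakness is that a real compound such as "well-known" that happens to wrap at the hyphen gets glued together too, so a dictionary check or a manual pass is still needed.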

You'll probably have to fiddle with the settings to get exactly what you want, but I think you might be a little too fast condemning FlexiPDF.

Regards,

Kim
Old 12-17-2018, 05:30 PM   #25
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by elibrarian View Post
I only work in clean text/XHTML - not Word. That's probably the only difference. Every other piece of software I've ever used to get text layers from PDFs has exported each and every space and line break for every (EVERY!) single line - except FlexiPDF.

That said, I just tried to export the text layer from an OCR'ed PDF I've got from the Royal Library of Copenhagen to Word, and using the standard settings, I'll admit it stinks. But if you press "Format" at the bottom right of the export dialogue, you can alter the standard settings. I removed everything except "Text Output" and "De-hyphenate" for the Word export, and got a nice Word doc with none of the issues you mention. Not perfect (because the OCR from the Royal Library is not perfect), but very, very usable.

You'll probably have to fiddle with the settings to get exactly what you want, but I think you might be a little too fast condemning FlexiPDF.

Regards,

Kim

Well, yes, exporting to plain text would certainly make things less complicated, but even the HTML export I tried was lame-ish. I shall try it again, to see what I get. I certainly don't want to send it to the Guillotine prematurely, but...we'll see!

Hitch
Old 12-29-2018, 04:34 PM   #26
Toxaris
Wizard
Posts: 4,497
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
Quote:
Originally Posted by exaltedwombat View Post
@Tex2002ans, Thanks for the response.

Yes, I have Toxaris's EPUB Tools. I'm afraid the Postprocess OCR function adds a lot of spurious scenebreaks. (And then, with this 500 page book, attempting to Generate EPUB fails with an 'out of memory' error on this powerful PC with 24GB RAM, but that's another problem.) Calibre, with Heuristic Processing turned on, does a rather better job, but there will still be a dozen false paragraph breaks in each chapter needing manual intervention.

The point is, the export from PDF to docx retains ALL the paragraph indentations. If they could only be marked in some way...?
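
The idea in the quoted post, using the surviving first-line indents as the only trusted paragraph markers, can be sketched in a few lines of Python. This is an illustration under a big assumption: that the text has been saved so each true paragraph start still begins with leading spaces. In a real .docx the indent is more likely stored as paragraph formatting and would have to be read with a docx library instead. `rebuild_paragraphs` and the four-space `indent` are hypothetical.

```python
def rebuild_paragraphs(lines, indent="    "):
    """Start a new paragraph only at indented lines; join everything else."""
    paragraphs = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # empty lines carry no information in this scheme
        if raw.startswith(indent) or not paragraphs:
            paragraphs.append(text)
        else:
            # No indent: treat the line as a continuation of the paragraph.
            paragraphs[-1] += " " + text
    return paragraphs


if __name__ == "__main__":
    exported = [
        "    It was late.",
        "He waited",
        "for the train.",
        '    "Finally," she said.',
    ]
    for para in rebuild_paragraphs(exported):
        print(para)
```

Any line without the indent is folded into the paragraph before it, which is exactly the "mark the indents, ignore the other breaks" behaviour being asked for.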
I am late to the party, sorry about that. If you run out of memory, either the Word document is not correct or you ran into the issue Word 365 had recently. That the document is 500 pages is not relevant, not even with 1 GB (my development machine has 1 GB of memory).

Also, I *never* use the export function for PDF. I would rather re-OCR them; the results are much better. It was also mentioned that you can fine-tune the scene detection.
Old 12-29-2018, 05:29 PM   #27
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by elibrarian View Post
I only work in clean text/XHTML - not Word. That's probably the only difference. Every other piece of software I've ever used to get text layers from PDFs has exported each and every space and line break for every (EVERY!) single line - except FlexiPDF.

That said, I just tried to export the text layer from an OCR'ed PDF I've got from the Royal Library of Copenhagen to Word, and using the standard settings, I'll admit it stinks. But if you press "Format" at the bottom right of the export dialogue, you can alter the standard settings. I removed everything except "Text Output" and "De-hyphenate" for the Word export, and got a nice Word doc with none of the issues you mention. Not perfect (because the OCR from the Royal Library is not perfect), but very, very usable.

You'll probably have to fiddle with the settings to get exactly what you want, but I think you might be a little too fast condemning FlexiPDF.

Regards,

Kim
P.S., Kim:

I did want to say, the Unbreaker in TransTools is simply AMAZEBALLS, and we all love it here. So, well done on that one! We are appreciative for the referral. LOVE IT.

(Still not wild about the FlexiPDF, but that may just be a matter of taste.)

Hitch
Old 12-30-2018, 04:22 PM   #28
Toxaris
Wizard
Quote:
Originally Posted by Hitch View Post
P.S., Kim:

I did want to say, the Unbreaker in TransTools is simply AMAZEBALLS, and we all love it here. So, well done on that one! We are appreciative for the referral. LOVE IT.
It sounds very interesting, so I am going to take a look at it to see if I can take some of the ideas from it and implement that into the tools. Their language tool also looks interesting and I wonder how they get certain information from Word.
Old 12-30-2018, 04:44 PM   #29
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by Toxaris View Post
It sounds very interesting, so I am going to take a look at it to see if I can take some of the ideas from it and implement that into the tools. Their language tool also looks interesting and I wonder how they get certain information from Word.
I don't know, but we REALLY like it. I've acquired licenses for everyone here that deals with cleaning Word files. Now, like all automated programs, if you get a genius that does this:

Typety-type-type-type (continuing sentence) ENTER
ENTER
typety-type-type-type-ENTER
ENTER

between lines--not paragraphs, and then does NOT hit another "enter" between paragraphs, it's deaf dumb and mute. (People who treat Word like a typewriter and type "manuscript style" with 2-line spacing, manually.) It's helpless, but so is every other automated program, even the ones we've written in-house. {shrug}. Without true AI, I don't know how anyone would address that one. Sure, you could look for lower-case letters, etc., but then, figuring out new paragraphs would be a mother. Ya know? (Well, yes, YOU do know, better than most.)
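
To make the "look for lower-case letters" idea concrete, here is a rough Python sketch. It is not how TransTools or any in-house tool works; the rule (no closing punctuation before the break, lowercase letter after it) and the names `is_broken` and `unbreak` are assumptions for illustration only.

```python
import re

# Assumption: a paragraph that really ends will finish with . ! ? or :
# possibly followed by closing quotes or a bracket.
TERMINAL = re.compile("[.!?:][\"'\u201d\u2019)]*$")


def is_broken(prev: str, nxt: str) -> bool:
    """Guess whether the break between prev and nxt is spurious."""
    prev, nxt = prev.rstrip(), nxt.lstrip()
    if not prev or not nxt:
        return False
    return not TERMINAL.search(prev) and nxt[0].islower()


def unbreak(paragraphs):
    """Merge each paragraph into the previous one when the break looks wrong."""
    merged = []
    for para in paragraphs:
        if merged and is_broken(merged[-1], para):
            merged[-1] = merged[-1].rstrip() + " " + para.lstrip()
        else:
            merged.append(para)
    return merged


if __name__ == "__main__":
    print(unbreak(["He walked to the", "station and waited.", "The train was late."]))
    # A break before a capitalised word mid-sentence is missed:
    print(unbreak(["She flew to", "Paris last week."]))
```

The second call shows the limit: a break that falls before a proper noun, or a typed line that happens to end on a full stop, looks exactly like a real paragraph boundary to a rule this simple.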

Hitch
Old 12-31-2018, 07:36 PM   #30
roger64
Wizard
Posts: 2,318
Karma: 2385865
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by elibrarian View Post
.../...

Pepito Cleaner (for OpenOffice/Libreoffice) has something similar. (It's broke on LibreOffice 6.1, but should work on 6.0)

.../...
Kim
Linux. LO "fresh" 6.1.4.2.
Pepito Cleaner is not broken. The extension icon is not displayed in the toolbar, but you can open it with "Edition/Pepito cleaner" and it works as usual.

0.01 cent.