Old 07-08-2020, 05:10 PM   #2
hobnail
I've converted some books that are available as PDF + TXT from archive.org. I use SumatraPDF to open the PDFs since it knows about the invisible/hidden text layer; maybe all PDF readers do.

I use Sigil and don't need to manually remove the line breaks in the middle of paragraphs. You can use search and replace to replace the blank lines between paragraphs with an end-paragraph tag followed by a begin-paragraph tag: </p><p>. Then jump to the top of the book and add the missing opening <p> tag, and to the bottom of the book to add the missing closing </p> tag. After that, use Sigil's Mend and Prettify so it looks good in Sigil. The hyphens that were at the ends of lines can be found by searching for a hyphen followed by a space; you can't blindly remove them all because sometimes the word is one that's normally hyphenated.
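
If you'd rather do the same thing outside Sigil, here is a minimal Python sketch of the idea, not what Sigil does internally. The function names `text_to_paragraphs` and `find_hyphen_breaks` are made up for illustration, and it assumes the TXT file separates paragraphs with blank lines.

```python
import re
from html import escape


def text_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of text in <p>...</p>."""
    # Normalize Windows/old-Mac line endings to plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines mark a paragraph boundary (assumption).
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join the lines inside a paragraph with single spaces.
        joined = " ".join(line.strip() for line in block.split("\n") if line.strip())
        if joined:
            # Escape &, < and > so the result is safe to paste into XHTML.
            paragraphs.append("<p>" + escape(joined, quote=False) + "</p>")
    return "\n".join(paragraphs)


def find_hyphen_breaks(text: str) -> list[str]:
    """List 'word- word' candidates left over from end-of-line hyphens."""
    # These still need a human decision: some words keep their hyphen.
    return [m.group(0) for m in re.finditer(r"\w+- \w+", text)]
```

Joining the lines with spaces is what turns an end-of-line hyphen into "hyphen followed by a space", so `find_hyphen_breaks` only lists candidates to review rather than fixing them automatically.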

The OCR that archive.org uses often misreads an exclamation mark (!) or a lowercase L (l) as the digit 1, so search for digits. There are threads here about this and other common OCR errors, with regexps for finding them.
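
As a rough illustration of that kind of search, here is a small Python sketch. The pattern is my own guess at a useful narrowing (a 1 that touches a letter), not one taken from those threads, and `find_suspect_ones` is a hypothetical helper name. A plain search for `\d` casts a wider net if you want to check every digit.

```python
import re

# A "1" directly before or after a letter is more likely a misread
# "!" or "l" than a real number (assumption; real years like 1920 are skipped).
SUSPECT_ONE = re.compile(r"(?<=[A-Za-z])1|1(?=[A-Za-z])")


def find_suspect_ones(text: str, context: int = 10) -> list[tuple[int, str]]:
    """Return (offset, surrounding snippet) for each suspicious '1'."""
    hits = []
    for m in SUSPECT_ONE.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        hits.append((m.start(), text[start:end]))
    return hits
```

Each hit still needs checking against the PDF page, since the script can't tell whether the original was an exclamation mark, an ell, or a genuine digit.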

Last edited by hobnail; 07-08-2020 at 05:15 PM.