07-09-2018, 08:39 PM | #1 |
Bookmaker & Cat Slave
Posts: 11,447
Karma: 157030631
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Book File Brain Teasers, part 9,842...
Hey, guys:
Here's a new one. Now, this is a "Word" file, not an ePUB or whatever, but we all know enough to know how stuff goes from A-->Zed, right? I've received a Word file, which the client swears is a source file, typed by her. I suspect that this is bollocks, and here's why--the file is full of broken paragraphs. Now, that can happen from scans, from "save as Word" functions from any number of programs; we've all seen it.
But here's the thing: Virtually all of the broken paragraphs come before EITHER a lower-case a, or an upper-case I. ALL of them. There's one exception, out of several hundred broken paragraphs. This woman can't SPELL HTML, much less use regex. Right? I mean, my brain immediately went to regex, but...not in a million years.
Anyone have ANY ideas as to what the hell could have precipitated this? ANY ideas, no matter how crazy? I mean, there's no rush; I finally went through the source file manually and tagged all the bps, but it frustrates me that I can't do any sort of forensic reconstruction on it (trust me, I've asked and asked), so I thought I'd ask you, lads and ladies. Anyone here ever seen this precise result? The Is have it? (couldn't resist).
Hitch |
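For anyone wanting to check a file for the same pattern, here is a minimal Python sketch of the kind of regex the post has in mind. It is not anything used in the thread. It assumes the document has been exported to plain text with one paragraph per line, and it treats a break as "broken" when the previous paragraph does not end in closing punctuation and the next one starts with the word "a" or "I". The names `BROKEN_BREAK`, `count_broken_breaks`, and `rejoin_broken_paragraphs` are made up for illustration.

```python
import re

# Assumption: characters that plausibly end a complete paragraph.
TERMINATORS = ".!?:;\"'\u201d\u2019"

# A newline that does NOT follow a terminator and IS followed by the
# standalone word "a" or "I" (\b keeps "and", "It", etc. from matching).
BROKEN_BREAK = re.compile(
    r"(?<![" + re.escape(TERMINATORS) + r"])\n(?=(?:a|I)\b)"
)


def count_broken_breaks(text: str) -> int:
    """Count suspicious paragraph breaks before 'a' or 'I'."""
    return len(BROKEN_BREAK.findall(text))


def rejoin_broken_paragraphs(text: str) -> str:
    """Replace each suspicious break with a single space."""
    return BROKEN_BREAK.sub(" ", text)


sample = (
    "She handed me\n"
    "a book and then\n"
    "I left.\n"
    "A new paragraph starts here\n"
    "and this stays split"
)
print(count_broken_breaks(sample))
print(rejoin_broken_paragraphs(sample))
```

A heuristic like this will miss breaks that happen to follow punctuation, so the count is a lower bound, and any automatic rejoin should be spot-checked by eye.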
07-09-2018, 10:34 PM | #2 |
null operator (he/him)
Posts: 20,459
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Hitch, Long time ago I saw instances of upper-case 'I' always breaking to a new paragraph. I can't be certain but I think it only happened if the 'I' followed punctuation i.e. .,!? then whitespace, but maybe not dialogue quotes then whitespace.
The source was newspaper and magazine articles, circa late 1980's early 90's, scanned for reading on Palm Pilots - PQA? We fixed them with sed, awk, etc. Squarks, that's about 30 years ago! BR Last edited by BetterRed; 07-09-2018 at 10:38 PM. |
07-09-2018, 11:25 PM | #3 | |
Grand Sorcerer
Posts: 11,306
Karma: 43993832
Join Date: Feb 2010
Location: Monroe Wisconsin
Device: K3, Kindle Paperwhite, Calibre, and Mobipocket for Pc (netbook)
|
Quote:
|
|
07-10-2018, 12:48 AM | #4 | |
Bookmaker & Cat Slave
Posts: 11,447
Karma: 157030631
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Spoiler:
Hmmmmm...not me. We built our first desktop in the early 80's. Wow. That feels like ages ago. Hell, that's because it WAS ages ago! Hitch |
|
07-10-2018, 01:21 AM | #5 | |
null operator (he/him)
Posts: 20,459
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Spoiler:
BR |
|
07-10-2018, 09:10 AM | #6 |
Bookmaker & Cat Slave
Posts: 11,447
Karma: 157030631
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
|
07-11-2018, 02:33 AM | #7 |
Connoisseur
Posts: 57
Karma: 600000
Join Date: Jan 2018
Device: Galaxy Tab S2
|
If you think this Word file wasn't created manually but by scanning software, did you have a look at the metadata?
|
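One way to act on the metadata suggestion without opening Word is sketched below in Python. It assumes the file is a .docx (a zip package), not an older binary .doc, and that the properties sit at the conventional part names `docProps/core.xml` and `docProps/app.xml`. The function name `read_docx_properties` is hypothetical. The "application" field is the one most likely to hint at a converter or OCR tool, though it only records the last program that saved the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML core and extended property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

CORE_FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_docx_properties(path_or_file):
    """Return a dict of basic properties; missing values are None."""
    props = {name: None for name in CORE_FIELDS}
    props["application"] = None
    with zipfile.ZipFile(path_or_file) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for name, tag in CORE_FIELDS.items():
                props[name] = core.findtext(tag, namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            props["application"] = app.findtext("ep:Application", namespaces=NS)
    return props
```

Usage would be along the lines of `read_docx_properties("manuscript.docx")`, where the filename is a placeholder. Metadata is easy to overwrite, so a clean result proves little, but an unexpected application name or author would be a useful lead.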
07-11-2018, 09:26 AM | #8 | |
Bookmaker & Cat Slave
Posts: 11,447
Karma: 157030631
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
But it's still hinky. I don't care how much she protests--it's beyond the reasonable edge of probability that EVERY broken para comes at an "a" or an "I." That can't be natural causes. It simply can't. She claims that she had hired two other people to format it, and then SHE tried (don't get me started); one of those first two or perhaps both must have tried to S&R it, in Word, not HTML. (Or, maybe, exported it to HTML, then regexed it, then imported it back to Word, but...nyah, if they do that routinely, they'd have found these and regexed them out.) (sigh). Hitch |
|
07-11-2018, 08:45 PM | #9 |
Grand Sorcerer
Posts: 6,171
Karma: 16228536
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Is it a coincidence that 'a' and 'I' are the only two single-letter words in English?
|
07-11-2018, 09:01 PM | #10 |
Bookmaker & Cat Slave
Posts: 11,447
Karma: 157030631
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
|
07-12-2018, 03:06 AM | #11 |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
There's a regex-based LibreOffice/OpenOffice add-on: Pepito Cleaner.
Maybe your client used LibreOffice/OpenOffice like a typewriter and then tried to clean up the document with Pepito Cleaner. |
07-12-2018, 09:23 AM | #12 | |
Bookmaker & Cat Slave
Posts: 11,447
Karma: 157030631
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
You are so smart. Unfortunately, I'd guess, based on my last 59 (not kidding) conversations with her that she has never even heard of LO/OO. But thank you, and BTW, I'm going to keep that in mind. Hitch |
|