View Single Post
05-22-2014, 08:31 PM #14
Tex2002ans
Wizard
Quote:
Originally Posted by rraod
Acrobat Professional will allow you to Save a good pdf in [...] JPG formats using the SAVE AS command.
Ugh... just don't save images of TEXT DOCUMENTS as JPG. (This is one of my huge pet peeves.)

I showed off an example of JPG haloing (the blotchy compression artifacts JPG leaves around sharp text edges) that made me pull my hair out:

https://www.mobileread.com/forums/sho...3&postcount=30

Quote:
Originally Posted by Hitch
You must have been extraordinarily fortunate, or don't mind expending a LOT of time doing clean-up in HTML. I wouldn't use Acrobat Pro's export to ANYTHING feature for anything. The HTML it outputs is filthy. The Word files are just as bad. We have the entire suite of Acrobat programs--everything from InDesign to Acrobat Pro, etc., and nothing in Acrobat exports to html, Word, etc., worth a damn, in my fairly experienced opinion.
Thanks for the info... I am ALWAYS leery of these conversion programs (ESPECIALLY Adobe's: I know they love their bloat, and they design their programs to work in THEIR ecosystem, not to play nice with others).

I hunted down a few videos and write-ups trying to see how well the conversion ACTUALLY works, but they were either not as technically in-depth as I would like, or just the typical generic marketing fluff that didn't say anything of substance.

I wish I knew of some trustworthy, technically minded review sites.

Quote:
Originally Posted by Toxaris
Well, the Word export of ABBYY gets most footnotes right... It misses some, but that is actually rare.
The HTML/EPUB export MANGLES footnotes.

Finereader tries to create links back and forth, but it:
  • May or may not toss out the actual footnote numbers (no rhyme or reason that I can figure out).
    • I believe this depends on some heuristic: a superscript number/symbol, plus whether Finereader has marked the paragraph with a "footnote" style.
  • May or may not "combine" two footnotes into "one".
    • So Finereader inserts 1 auto-numbered link, but includes the text of footnotes 1 and 2 in a single endnote.
  • May make whole footnote paragraphs just go poof (again, no rhyme or reason that I can figure out).
    • This is especially common when the footnote is split across pages.
  • Has a very annoying bug in Finereader 12 that 11 did not have.
    • In certain books, if there are, say, 5 footnotes on a page, it will insert all five links at the END of the page, instead of where the superscripts actually are in the text.
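None of this can be fully automated, but a quick check can at least flag back/forth links that point nowhere before the manual pass. Here is a minimal sketch using only Python's standard-library `html.parser`; the class and function names are made up for illustration. It only catches links whose target `id` is missing. Links that exist but sit in the wrong place (like the Finereader 12 end-of-page bug) will pass this check.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect every id="..." and every internal href="#..." in one HTML file."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.targets = []  # internal link targets, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#"):
            self.targets.append(href[1:])


def dangling_links(html):
    """Return internal link targets that have no matching id in the same file."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [t for t in parser.targets if t not in parser.ids]


# A fragment shaped like Finereader's export, with the bookmark id mangled:
sample = (
    '<p>war.<a id="footnote1"></a><sup><a href="#bookmark0">1</a></sup></p>'
    '<p><a id="bookmark1"></a><a href="#footnote1">1</a></p>'
)
print(dangling_links(sample))  # ['bookmark0']
```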

Here is a real-life example from a book I worked on earlier this month:

[Attached images: pg031.png and pg032.png (pages 31 and 32 of the scanned book)]

These two pages get morphed into this on EPUB export:
  • Marked in BLUE: Finereader tries to auto-insert endnotes and renumber them, but mangles it completely.
  • Marked in RED are footnotes that Finereader missed (Footnote 1 on page 31 and Footnote 1 on page 32 just went poof).
  • Marked in GREEN: the second half of Footnote 2 on page 31 just went poof into thin air.
  • Marked in ORANGE: the original superscript went into thin air too (the link in blue is Finereader's auto-numbering, not the book's own number).
    • Most of the time the superscript number is removed, but other times it is STILL left there.
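Two of these symptoms leave recognizable traces in the exported HTML: a bare `<sup>1</sup>` with no link inside it (the RED cases) and a footnote paragraph that stops on a hyphenated fragment (the GREEN "Com-" case). Below is a rough regex-based sketch that lists both for a human to check. The patterns are assumptions based on this one export, regexes are only good enough for flagging here (not for rewriting arbitrary HTML), and the ORANGE case (a superscript that simply vanished) leaves nothing to search for.

```python
import re

# A superscript holding only a number or note symbol, with no link inside,
# e.g. <sup>1</sup>. In the export above these are footnote calls that
# Finereader never turned into links.
ORPHAN_SUP = re.compile(r"<sup>\s*(\d+|[*†‡§])\s*</sup>")

# A paragraph that stops on a hyphenated fragment, e.g. "(1821). Com-</p>".
# Often a footnote whose continuation on the next page was dropped, though
# it also catches words merely split across paragraphs ("Batchel-").
HYPHEN_END = re.compile(r"(\w+-)\s*</p>")


def flag_footnote_problems(html):
    """Return (orphan superscripts, paragraph endings that stop on a hyphen)."""
    return ORPHAN_SUP.findall(html), HYPHEN_END.findall(html)


sample = (
    "<p>adopted after 1816.<sup>1</sup> From these various causes"
    '<sup><a href="#bookmark0">1</a></sup></p>'
    "<p> The following passage ... (1821). Com-</p>"
)
orphans, cut_off = flag_footnote_problems(sample)
print(orphans)  # ['1']
print(cut_off)  # ['Com-']
```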

EPUB/HTML Exported from Finereader:

Spoiler:
Quote:
<p>During the years immediately after the war, the aid given in the tariff of 1816 was not sufficient to prevent severe depression in the cotton manufacture. Reference has already been made to the disadvantages which, under the circumstances of the years 1815-18, existed for all manufacturers who had to meet competition from abroad. But when the crisis of 1818-19 had brought about a rearrangement of prices more advantageous for manufacturers, matters began to mend. The minimum duty became more effective in handicapping foreign competitors. At the same time the power-loom was generally introduced. Looms made after an English model were introduced in the factories of Rhode Island, the first going into operation in 1817; while in Massachusetts and New Hampshire the loom invented by Lowell was generally adopted after 1816.<sup>1</sup> From these various causes the manufacture soon became profitable. There is abundant evidence to show that shortly after the crisis the cotton manufacture had fully recovered from the depression that followed the war.<a id="footnote1"></a><sup><a href="#bookmark0">1</a></sup> The profits made were such as to cause a rapid extension of the industry. The beginning of those man-ufacturing villages which now form the characteristic economic feature of New England falls in this period. Nashua was founded in 1823. Fall River, which had grown into some importance during the war of 1814, grew rapidly from 1820 to 1830.<sup>1</sup> By far the most important and the best known of the new ventures in cotton manufacturing was the foundation of the town of Lowell, which was undertaken by the same persons who had been engaged in the establishment of the first power-loom factory at Waltham. The new town was named after the inventor of the power-loom. The scheme of utilizing the falls of the Merrimac, at the point where Lowell now stands, had been suggested as early as 1821, and in the following year the Merrimac Manufacturing Company was incorporated. 
In 1823 manufacturing began, and was profitable from the beginning; and in 1824 the future growth of Lowell was clearly foreseen.<a id="footnote2"></a><sup><a href="#bookmark1">2</a></sup></p>

<p><a id="bookmark0"></a><a href="#footnote1">1</a></p>

<p> The following passage, referring to the general revival of manufactures, may be quoted: “The manufacture of cotton now yields a moderate profit to those who conduct the business with the requisite skill and economy. The extensive factories at Pawtucket are still in operation. ... In Philadelphia it is said that about 4,000 looms have been put in operation within the last six months, which are chiefly engaged in making cotton goods, and that in all probability they will, within six months more, be increased to four times that number. In Paterson, N. J., where, two years ago, only three out of sixteen of its extensive factories were in operation ... all are now in vigorous employment.”—“Niles’s Register,” XXI., 39 (1821). Com-</p>

<p><a id="bookmark1"></a><a href="#footnote2">2</a></p>

<p> See the account in Appleton, pp. 17-25. One of the originators of the enterprise said in 1824: “If our business succeeds, as we have reason to expect, we shall have here [at Lowell] as large a population in twenty</p>

<p>years from this time as there was in Boston twenty years ago.”—Batchel-</p>

<p>der, p. 69.</p>

<p>In Bishop, II., 309, is a list of the manufacturing villages of 1826. in which some twenty places are enumerated.</p>


If you export a large book, the footnote situation gets much worse because of Finereader's horrible chapter splitting: the missing footnotes plus Finereader's auto-numbering create a huge mess.
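Part of that mess, the auto-numbered links themselves, is regular enough to script. Below is a sketch of one way to do it; this is not something Finereader or Sigil provides, and the function name and `start` parameter are hypothetical. It assumes each chapter is an HTML string, each footnote call and its note sit in the same chapter, and both follow exactly the two Finereader patterns in the export above. It rewrites them into one continuous `[N]` sequence using the `fnN`/`ftN` convention of the completed EPUB, and merges the number-only note paragraph into the note text. Footnotes Finereader missed (bare `<sup>1</sup>`) and combined notes are left untouched for the manual pass.

```python
import re

# Finereader's footnote call, as in the export above:
#   <a id="footnote1"></a><sup><a href="#bookmark0">1</a></sup>
CALL = re.compile(
    r'<a id="footnote(\d+)"></a><sup><a href="#bookmark(\d+)">[^<]*</a></sup>'
)
# Finereader's note: a number-only paragraph, then the note text paragraph.
#   <p><a id="bookmark0"></a><a href="#footnote1">1</a></p>  <p> The following...
NOTE = re.compile(
    r'<p><a id="bookmark(\d+)"></a><a href="#footnote(\d+)">[^<]*</a></p>\s*<p>\s*'
)


def renumber_chapters(chapters, start=1):
    """Rewrite Finereader footnote links into one continuous fnN/ftN sequence.

    Returns (new_chapters, next_unused_number).
    """
    number = start
    result = []
    for html in chapters:
        mapping = {}  # Finereader footnote id in this chapter -> global number

        def new_call(m):
            nonlocal number
            mapping[m.group(1)] = number
            number += 1
            n = mapping[m.group(1)]
            return f'<a href="#fn{n}" id="ft{n}">[{n}]</a>'

        html = CALL.sub(new_call, html)

        def new_note(m):
            n = mapping.get(m.group(2))
            if n is None:  # note with no surviving call: leave it for a human
                return m.group(0)
            return f'<p><a href="#ft{n}" id="fn{n}">[{n}]</a> '

        result.append(NOTE.sub(new_note, html))
    return result, number


chapter = (
    '<p>war.<a id="footnote1"></a><sup><a href="#bookmark0">1</a></sup> The</p>\n\n'
    '<p><a id="bookmark0"></a><a href="#footnote1">1</a></p>\n\n'
    "<p> The following passage</p>"
)
new, next_number = renumber_chapters([chapter, chapter], start=23)
print(next_number)  # 25
print(new[0])
```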

My current method is to go through the book and do a manual pass over all of the footnotes. While I am double-checking that all of the text is there, I also do the rest of the formatting (such as blockquotes).
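One repetitive part of that pass is rejoining note text that a page break split into separate paragraphs ("in twenty" / "years from this time", "Batchel-" / "der, p. 69."). Below is a minimal sketch of a helper that could speed it up. The function name and the merge rule are assumptions, not part of any tool: a paragraph is joined to the previous one when it starts lowercase and the previous one ends in a hyphen or lacks sentence-ending punctuation. It also assumes the input is only a run of `<p>` blocks, like the notes section above.

```python
import re

PARA = re.compile(r"<p>(.*?)</p>", re.S)
SENTENCE_END = (".", "?", "!", ":", ";", '"', "”")


def merge_split_paragraphs(html):
    """Rejoin <p> blocks that a page break split in the middle of a sentence.

    Assumes the input is only a run of <p>...</p> blocks; anything outside
    them is dropped.
    """
    merged = []
    for text in (p.strip() for p in PARA.findall(html)):
        if merged and text[:1].islower():
            prev = merged[-1]
            if prev.endswith("-"):
                # "Batchel-" + "der" -> "Batchelder". A genuine compound
                # broken at a page end would lose its hyphen: check by eye.
                merged[-1] = prev[:-1] + text
                continue
            if not prev.endswith(SENTENCE_END):
                merged[-1] = prev + " " + text
                continue
        merged.append(text)
    return "\n\n".join(f"<p>{t}</p>" for t in merged)


notes = (
    "<p> See the account in Appleton, pp. 17-25. ... as large a population in twenty</p>\n\n"
    "<p>years from this time as there was in Boston twenty years ago. Batchel-</p>\n\n"
    "<p>der, p. 69.</p>\n\n"
    "<p>In Bishop, II., 309, is a list of the manufacturing villages of 1826.</p>"
)
print(merge_split_paragraphs(notes))
```

The "In Bishop" paragraph starts with a capital, so it stays separate. The rule is deliberately conservative and still needs the human double-check described above.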

Anyway, from what I gather, the DOC/ODT export doesn't make much text magically go poof, but those two formats come with their own host of problems and bloat (and I don't have much experience with them, since my workflow is OCR -> EPUB/HTML -> Sigil -> completed EPUB).

This is what the text from those two pages looks like in the completed EPUB:

Spoiler:
Quote:
<p>During the years immediately after the war, the aid given in the tariff of 1816 was not sufficient to prevent severe depression in the cotton manufacture. Reference has already been made to the disadvantages which, under the circumstances of the years 1815–18, existed for all manufacturers who had to meet competition from abroad. But when the crisis of 1818–19 had brought about a rearrangement of prices more advantageous for manufacturers, matters began to mend. The minimum duty became more effective in handicapping foreign competitors. At the same time the power-loom was generally introduced. Looms made after an English model were introduced in the factories of Rhode Island, the first going into operation in 1817; while in Massachusetts and New Hampshire the loom invented by Lowell was generally adopted after 1816.<a href="#fn22" id="ft22">[22]</a> From these various causes the manufacture soon became profitable. There is abundant evidence to show that shortly after the crisis the cotton manufacture had fully recovered from the depression that followed the war.<a href="#fn23" id="ft23">[23]</a> The profits made were such as to cause a rapid extension of the industry. The beginning of those manufacturing villages which now form the characteristic economic feature of New England falls in this period. Nashua was founded in 1823. Fall River, which had grown into some importance during the war of 1814, grew rapidly from 1820 to 1830.<a href="#fn24" id="ft24">[24]</a> By far the most important and the best known of the new ventures in cotton manufacturing was the foundation of the town of Lowell, which was undertaken by the same persons who had been engaged in the establishment of the first power-loom factory at Waltham. The new town was named after the inventor of the power-loom. 
The scheme of utilizing the falls of the Merrimac, at the point where Lowell now stands, had been suggested as early as 1821, and in the following year the Merrimac Manufacturing Company was incorporated. In 1823 manufacturing began, and was profitable from the beginning; and in 1824 the future growth of Lowell was clearly foreseen.<a href="#fn25" id="ft25">[25]</a></p>

[...]

<p><a href="#ft22" id="fn22">[22]</a> Appleton, p. 13; Batchelder, pp. 70–73.</p>

<p><a href="#ft23" id="fn23">[23]</a> The following passage, referring to the general revival of manufactures, may be quoted: “The manufacture of cotton now yields a moderate profit to those who conduct the business with the requisite skill and economy. The extensive factories at Pawtucket are still in operation. . . . In Philadelphia it is said that about 4,000 looms have been put in operation within the last six months, which are chiefly engaged in making cotton goods, and that in all probability they will, within six months more, be increased to four times that number. In Paterson, N.J., where, two years ago, only three out of sixteen of its extensive factories were in operation ... all are now in vigorous employment.”—“Niles’s Register,” XXI., 39 (1821). Compare <i>Ibid</i>., XXII., 225, 250 (1822); XXIII., 35, 88 (1823); and <i>passim</i>. In Woodbury’s cotton report, cited above, it is said (p. 57) that “there was a great increase [in cotton manufacturing] in 1806 and 1807; again during the war of 1812; again from 1820 to 1825; and in 1831–32.”</p>

<p><a href="#ft24" id="fn24">[24]</a> Fox’s “History of Dunstable”; Earl’s “History of Fall River.” p. 20 <i>seq</i>.</p>

<p><a href="#ft25" id="fn25">[25]</a> See the account in Appleton, pp. 17–25. One of the originators of the enterprise said in 1824: “If our business succeeds, as we have reason to expect, we shall have here [at Lowell] as large a population in twenty years from this time as there was in Boston twenty years ago.”—Batchelder, p. 69.</p>

<p>In Bishop, II., 309, is a list of the manufacturing villages of 1826. in which some twenty places are enumerated.</p>
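The completed EPUB uses one fixed convention: the call is `<a href="#fnN" id="ftN">[N]</a>` and the note starts with `<a href="#ftN" id="fnN">[N]</a>`. Because it is that regular, a final pass can be checked mechanically. Here is a minimal sketch, assuming all calls and notes for a chapter are in the same file and that call numbers should run without gaps; the function name is hypothetical.

```python
import re

CALL = re.compile(r'<a href="#fn(\d+)" id="ft(\d+)">\[(\d+)\]</a>')
NOTE = re.compile(r'<a href="#ft(\d+)" id="fn(\d+)">\[(\d+)\]</a>')


def check_footnotes(html):
    """Return a list of problems with ftN/fnN footnote pairs in one file."""
    problems = []
    calls, notes = [], []
    for pattern, found in ((CALL, calls), (NOTE, notes)):
        for m in pattern.finditer(html):
            # href, id, and visible number should all agree.
            if len(set(m.groups())) != 1:
                problems.append(f"mismatched numbers in {m.group(0)}")
            found.append(int(m.group(1)))
    for n in sorted(set(calls) - set(notes)):
        problems.append(f"call {n} has no note")
    for n in sorted(set(notes) - set(calls)):
        problems.append(f"note {n} has no call")
    for a, b in zip(calls, calls[1:]):
        if b != a + 1:
            problems.append(f"numbering jumps from {a} to {b}")
    return problems


bad = (
    '<p>x.<a href="#fn22" id="ft22">[22]</a> y.<a href="#fn24" id="ft24">[24]</a></p>'
    '<p><a href="#ft22" id="fn22">[22]</a> A.</p>'
)
print(check_footnotes(bad))  # ['call 24 has no note', 'numbering jumps from 22 to 24']
```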


Anyway, as you can see, PDFs cause a whole host of formatting problems when converting PDF -> XYZ (particularly split paragraphs, hard/soft hyphens, footnotes, headers/footers, numbered lists, tables, captions, etc.).
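Of those problems, leftover hyphens ("man-ufacturing" in the Finereader export vs. the real compound "power-loom") are among the easier ones to semi-automate. One common approach, assumed here rather than taken from any particular tool, is to compare both forms against a word list, for example one built from the rest of the book. The hyphen is dropped only when the joined word is known and the hyphenated form is not. The `vocabulary` set is something you would supply yourself, and the function name is hypothetical.

```python
import re

# A hyphen between two lowercase fragments, e.g. "man-ufacturing" or "power-loom".
# Digits ("1815-18") and capitalized words are deliberately not matched.
HYPHENATED = re.compile(r"\b([a-z]+)-([a-z]+)\b")


def dehyphenate(text, vocabulary):
    """Drop a leftover line-end hyphen when the joined word is in `vocabulary`
    and the hyphenated form is not; otherwise keep the hyphen."""

    def fix(m):
        joined = m.group(1) + m.group(2)
        if joined in vocabulary and m.group(0) not in vocabulary:
            return joined
        return m.group(0)

    return HYPHENATED.sub(fix, text)


vocab = {"manufacturing", "power-loom"}
print(dehyphenate("those man-ufacturing villages; the power-loom", vocab))
# those manufacturing villages; the power-loom
```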

Last edited by Tex2002ans; 05-22-2014 at 08:48 PM. Reason: Added some Spoiler Tags for the code.