Quote:
Originally Posted by gg4u
Hi Jps,
whose option is asciidoc? tesseract? pandoc?
Sorry for not being clear and not having time to elaborate until now.
asciidoc is a standalone Python script that converts a very lightly marked-up plain text file straight to HTML, EPUB, or PDF, with a single command for each.
Basically, you put an "=" character at the front of the line with the title, "==" in front of each chapter heading, "===" in front of section titles, etc. Links, references, an index, and embedding or linking to images are all easy. A table of contents, if desired, is generated automatically.
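To make the heading markup concrete, here is a rough Python sketch that writes out a tiny asciidoc source as a string. The function name make_asciidoc, the sample titles, and the choice to switch on the table of contents with a ":toc:" header attribute are my own assumptions for illustration, not something from the asciidoc docs quoted here.

```python
def make_asciidoc(title, chapters):
    """Build a minimal AsciiDoc document as a string.

    chapters is a list of (chapter_heading, sections) pairs, where
    sections is a list of (section_title, body_text) pairs.
    """
    # "=" marks the document title. The ":toc:" attribute line is an
    # assumption here: it asks the converter for a table of contents.
    lines = [f"= {title}", ":toc:", ""]
    for heading, sections in chapters:
        # "==" marks a chapter heading.
        lines += [f"== {heading}", ""]
        for section_title, body in sections:
            # "===" marks a section title inside the chapter.
            lines += [f"=== {section_title}", "", body, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample content, only to show the shape of the output.
    print(make_asciidoc(
        "My Book",
        [("Chapter One", [("First Section", "Some plain text.")])],
    ))
```

If you save the output as, say, book.txt, then "asciidoc book.txt" should give you HTML, and "a2x -f epub book.txt" or "a2x -f pdf book.txt" the other formats, assuming the asciidoc tools and their dependencies are installed.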
The rationale for asciidoc is at:
https://asciidoctor.org/docs/what-is-asciidoc/
A reference for asciidoc markup is at:
https://asciidoctor.org/docs/asciido...ick-reference/
I think the above is also suitable as a tutorial, but I have also just found
http://www.vogella.com/tutorials/AsciiDoc/article.html which I think is relatively new; I had not seen it before.
The asciidoc writer's guide is at:
https://asciidoctor.org/docs/asciidoc-writers-guide/
(asciidoctor is a Ruby utility that converts asciidoc markup. I use whichever I prefer at the moment and sometimes switch back and forth. asciidoctor has pretty much taken over stewardship of the asciidoc syntax.)
If you have a PDF with a text layer, extract that without using OCR. If there is no text layer, then you just need OCR to get plain text. Formatting would just get in the way.
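One way to check for a text layer before reaching for OCR is sketched below. It assumes the pdftotext command-line tool (from poppler or xpdf) is installed, which is my choice and not something mentioned earlier in the thread; the function names and the 20-character threshold are also just illustrative guesses.

```python
import subprocess


def extract_text_layer(pdf_path):
    """Return the text layer of a PDF as plain text, without OCR.

    Assumes the external "pdftotext" tool is on the PATH. Passing "-"
    as the output file makes it write to standard output.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def has_text_layer(text, min_chars=20):
    """Guess whether extracted text is real content.

    Whitespace (including the form feeds emitted between pages) is
    dropped before counting. The min_chars threshold is an arbitrary
    assumption; a scanned PDF usually yields nothing at all.
    """
    visible = "".join(text.split())
    return len(visible) >= min_chars
```

If has_text_layer(extract_text_layer("scan.pdf")) comes back False for your file (scan.pdf being a made-up name), that is the case where OCR is needed to get the plain text.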