#1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2019
Device: Voice Dream Reader
|
Paragraph breaks being eaten in Markdown to EPUB conversion
I've got a collection of books in Markdown format, which is a good choice for them, since these books need little formatting: chapter heads, maybe a little boldface and italic markup, maybe a horizontal rule to break up a section...that's enough.
However, I'd also like to have EPUB versions of these because that adds a few features I want, such as remembering the last-read position, binding the author name into the book, etc.
The problem is that I end up with a wall-of-text no matter how I format the source Markdown files or what adjustments I make to the various paragraph styling options in the dialog Calibre presents when you click its Convert Books button. I won't claim to have exhaustively tested all possible combinations, but I've got to be up to about 20 combinations so far. No matter what, all the paragraphs get run together until the next header break, horizontal rule, etc.
Is this a bug, or is there some secret to getting proper paragraph breaks?
In case it matters, my ideal Markdown flavor is UTF-8, LF-only line endings, and no hard line breaks in the source text. That is, each paragraph is on a single line, soft-wrapped to the window width in my Markdown editor, with at least two LFs separating each paragraph. (Sometimes I put in extra vertical space between major sections.) I want the resulting EPUB to show a blank line between paragraphs, with no first-line indent, so that it looks approximately like the source Markdown.
You can correctly infer from my double LFs that I'm working on a POSIX type platform, macOS 10.14, specifically.
These Markdown files render just fine in all the other Markdown renderers I've tried, so I'm confident that they're well-formed. I've even run them through "od -c" to make sure there aren't some odd hidden characters causing problems, but no, they're pretty much plain ASCII with the occasional UTF-8 character (em dashes and curly quotes, mainly). I've got the text input option set to utf-8 in the Calibre conversion dialog. I've also gone through these files to strip trailing spaces from lines, except in those rare cases where I put in 2 spaces at the end to force a line break.
I've tried hard-wrapping these paragraphs to 72 columns, but that doesn't help, and I don't want to format these docs that way anyway.
After fighting with Calibre conversion settings, I've gone and reset all the settings to the defaults to make sure it isn't some kind of configuration error on my part, and it still gives me wall-of-text EPUB output.
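
For anyone wanting to repeat the source-file checks described above without reading "od -c" output by eye, here is a minimal Python sketch. The function name `inspect_markdown` and the report keys are made up for illustration; it only looks at the text and knows nothing about Calibre.

```python
import re


def _has_stray_trailing_space(line: str) -> bool:
    """True if the line ends in whitespace other than exactly two spaces."""
    stripped = line.rstrip(" \t")
    trailing = line[len(stripped):]
    return trailing not in ("", "  ")


def inspect_markdown(text: str) -> dict:
    """Report a few facts that matter for paragraph detection."""
    lines = text.split("\n")
    # A paragraph is a run of text separated from the next by a blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip("\n")) if p.strip()]
    return {
        "paragraphs": len(paragraphs),
        "has_cr": "\r" in text,  # CR or CRLF line endings present
        "indented_lines": sum(
            1 for ln in lines if ln[:1] in (" ", "\t") and ln.strip()
        ),
        "stray_trailing_space_lines": sum(
            1 for ln in lines if _has_stray_trailing_space(ln)
        ),
    }


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\n\nThird.\n"
    print(inspect_markdown(sample))
```

A file in the flavor described above should report no CRs, no indented lines, and one paragraph per blank-line-separated block.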
#2 |
Grand Sorcerer
Posts: 13,505
Karma: 78910112
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Why not attach an example file so we can easily see what's going on?
Also, what version of calibre is being used?
#3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2019
Device: Voice Dream Reader
|
I didn't attach a sample because I'm not doing anything fancy here. The symptom is not particular to the input text.
As presented on this forum, both of our messages are valid Markdown, so you can cut and paste either message text, save it to a plain text file called x.md, drag that to Calibre, and click the "Convert books" button in the toolbar. It'll have the "Input format" set to "MD" due to the file name extension; set the "Output format" to EPUB if necessary. When the conversion completes, the EPUB will take precedence over the MD, so just double-click the book entry in Calibre to view the EPUB version.
Here's what I see in the Calibre E-Book Reader for the first page of my own rendered text:
[screenshot]
So, wall-of-text, as I said.
EDIT: As for the Calibre version, the above was produced with the latest, 3.39.1, but it's not a new symptom in that version. I'm only posting now because I've given up trying to hack my way around it from the end user side.
Last edited by wyoung; 02-06-2019 at 06:31 PM.
#4 |
Grand Sorcerer
Posts: 6,251
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
@wyoung,
What calibre conversion settings have you used? Specifically those in TXT Input:
- 'Paragraph style'
- 'Formatting style'
I used 'single' and 'markdown' respectively and it seems to work OK for me.
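
The same two settings can also be tried from the command line, which takes the saved GUI preferences out of the picture. A hedged sketch in Python that only builds the `ebook-convert` command: the helper name `build_convert_command` is made up, and the option names are the TXT input options as I understand them from the calibre manual, so check them against `ebook-convert --help` for your version.

```python
def build_convert_command(src, dst, paragraph_type="single",
                          formatting_type="markdown"):
    """Build an ebook-convert command line for a TXT/Markdown input file.

    Assumption: --paragraph-type, --formatting-type and --input-encoding
    are the command-line equivalents of the settings in the TXT input tab.
    """
    return [
        "ebook-convert", src, dst,
        "--paragraph-type", paragraph_type,
        "--formatting-type", formatting_type,
        "--input-encoding", "utf-8",
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is converted
    # by accident.
    print(" ".join(build_convert_command("x.md", "x.epub")))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the conversion, assuming `ebook-convert` is on the PATH.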
#5 |
Grand Sorcerer
Posts: 13,505
Karma: 78910112
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Sorry, I have no further interest in helping. Had you been prepared to post a sample MD file, I would have experimented.
Good luck.
#6 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2019
Device: Voice Dream Reader
|
As I wrote in the original message, I've tried a lot of settings. The above screenshot was produced with the default settings.
I've previously tried the settings you give, but just for completeness, I've tried them again, with those two changes being the only differences from the defaults. Same result.
That suggests a platform-specific bug, so I decided to try the Debug option in the conversion dialog. Immediately I see that the input/*.html file is being produced incorrectly: the whole document text is inside a single HTML <p> tag.
I found a log by clicking the Jobs button in the lower right corner, but it doesn't tell me what I really want to know, which is who produces that HTML file, and according to what rules? I guess the "input plugin" it refers to is whatever's behind the "TXT input" tab in the Calibre Convert dialog, so is it entirely internal to Calibre? It isn't doing something like calling out to pandoc or similar, which would open us to platform-specific behavior?
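
The single-<p> symptom in the debug output can be confirmed without reading the HTML by eye. A minimal sketch using only Python's standard library; `count_paragraphs` is a made-up helper name, and you would feed it the contents of whichever input/*.html file the Debug option wrote.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Count opening <p> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "p":
            self.count += 1


def count_paragraphs(html: str) -> int:
    parser = ParagraphCounter()
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    broken = "<html><body><p>first\nsecond</p></body></html>"
    print(count_paragraphs(broken))
```

A count of 1 for a multi-paragraph source means the paragraphs were merged before any EPUB styling was applied, so the CSS-related conversion options cannot be the cause.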
#7 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2019
Device: Voice Dream Reader
|
Quote:
But if you must have one pre-prepared, I've attached the one I've been testing with.
EDIT: ...which this forum appears to have eaten, perhaps because I'm too new to be trusted to create attachments? No matter, I can fake it. Cut and paste the following text into a file called x.md:
Code:
This is a markdown file...

...with multiple paragraphs.
[image]
Last edited by wyoung; 02-06-2019 at 07:24 PM.
|
#8 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
And you should be able to attach a file as a new user. But the forum doesn't allow .md files, so that might be it. There would have been a message when you tried to attach it, but I know I missed it now. And I usually forget to hit the "Upload" button on the attachments dialog and wonder why the attachment isn't there later.
In any case, I did what you said and put the three lines into a file. I added that to calibre (3.39.1 on a Linux box) and ran the conversion without changing any options. I looked at the generated epub and I have two paragraphs. Then I repeated the test using a .txt file, as .md cannot be attached here. Same results.
I have attached the input .txt file and the generated epub for you to see. Below is the output log from the conversion. This includes the options used for the conversion. Comparing yours to this might give a hint as to what is going on.
Spoiler:
As for how calibre does the conversion, it uses internal libraries. For Markdown, it looks to be a Python library from elsewhere, but it is included in the calibre codebase for all platforms.
|
#9 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2019
Device: Voice Dream Reader
|
By resetting my Calibre settings per these instructions and diffing my old settings against that default set, I've managed to narrow the problem down to a single setting: Prefs > Input options > TXT input > Remove indents at the beginning of lines.
Apparently the "Restore defaults" button is page-specific, and I didn't manage to reset this particular page in my less drastic settings resets earlier.
Anyway, with this setting enabled, you get the symptom I've shown above. This has got to be a bug: there are no "indents at the beginning of lines" to remove!
I've placed my example text above on a public web server in case someone feels the need to have a byte-for-byte perfect input source to test this with. But I really need to stress this: the bug affects pretty much any Markdown input. I've been seeing this for quite a while now, and I've got hundreds of Markdown files in my Calibre library from many sources.
I've got my solution, but I hope someone fixes this problem so it doesn't bite anyone else in the future.
#10 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2019
Device: Voice Dream Reader
|
Incidentally, if there's a setting one of the developers of Calibre can enable that will sort the keys of the JSON settings objects, that'd make it a lot easier to do this sort of settings directory diffing.
I get that the JSON is written out from Python dictionaries, and that the keys come out of the dictionary in a semi-unpredictable order, but it's an option in some JSON serialization libraries to sort the keys for this very sort of reason. Perl's JSON module calls this "canonical form", for instance.
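
Until then, the files can be normalized on the user side before diffing. A minimal sketch with Python's standard `json` and `difflib` modules; the helper names and the setting keys in the example are made up, and it assumes the settings files are plain JSON.

```python
import difflib
import json


def canonical_json(text: str) -> str:
    """Re-serialize JSON with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2,
                      ensure_ascii=False)


def diff_settings(old_text: str, new_text: str) -> list:
    """Unified diff of two JSON documents, ignoring key order."""
    old_lines = canonical_json(old_text).splitlines()
    new_lines = canonical_json(new_text).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile="old", tofile="new",
                                     lineterm=""))


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    old = '{"b": 1, "a": {"y": false, "x": 2}}'
    new = '{"a": {"x": 2, "y": true}, "b": 1}'
    print("\n".join(diff_settings(old, new)))
```

Because both sides are sorted the same way, the only lines the diff reports are the ones whose values actually changed.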
#11 | ||
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
|
||