#1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2013
Device: ipad, kindle
|
Any suggestions for auto paragraphing?
Hi guys, the weekend is coming and I want something new to read.
Calibre is good, and the book creator is good too. But there's one little problem when I try to convert a PDF into a MOBI: I need to regroup the sentences that are separated by newlines into paragraphs. I tried Acrobat, and I tried copying the text into text editors and word processors (notepad, writepad, word, whatever), but the newlines were still there. I tried saving as txt; this time it was better, but it added some unknown characters that show up as squares, and some sentences are broken (most aren't, but some are). I tried saving as HTML, but the lines all sit inside a <p> and are separated by <br>, and again some sentences are broken. I tried saving as Word, which didn't work. So, does anyone have any suggestions for tidying up the broken sentences? Good night, everyone. I hope to get some advice tomorrow.
#2 |
Connoisseur
Posts: 98
Karma: 122982
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
|
Try this utility: PDFReflow. You have to start with the original PDF, which will be converted into an HTML file. From there you can move the HTML into another word processor.
Last edited by Pranananda; 02-21-2013 at 11:47 AM. |
#3 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2013
Device: ipad, kindle
|
OK, I tried pdfreflow. The pdftohtml step is fine, but something is wrong with pdfreflow. First, I can't use the program provided in your post; it keeps saying it can't find pdfreflow (is that because it's a link?). So I built one from source, and after that, when I run
./pdfreflow -r --first=8 --last=15 xxx.xml
I get a lot of "unknown input at line xxx" messages, and nothing shows up in the current directory. The Windows version of pdfreflow behaves the same way. If I run
./pdfreflow xxx.xml >out.html
it produces an out.html, but it's empty. Why does that happen? I also tried to generate a log in case you need it, but when I ran
./pdfreflow xxx.xml &2>out.html
I only got an empty error.log. My system is Xubuntu 12 something; I don't remember whether it's 04 or 10.
Last edited by xiongmao86; 02-22-2013 at 05:32 PM. Reason: add some new and probably relevant information.
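A note on the logging attempt in this post: in a Bourne-style shell, `&2>file` does not redirect the error stream. The `&` puts the command in the background, and the `2>file` that follows just creates an empty file. The usual form is `2>error.log`. As a minimal sketch, the same capture can be done from Python. The helper name `run_and_capture` and the file names are hypothetical, and nothing here is part of pdfreflow itself:

```python
import subprocess

def run_and_capture(cmd, stdout_path, stderr_path):
    """Run cmd, sending stdout and stderr to separate files.

    Returns the exit code of the process. `cmd` is a list such as
    ["./pdfreflow", "xxx.xml"] (the invocation quoted in the post above).
    """
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        # stdout and stderr go to different files, so error messages such as
        # "unknown input at line ..." stay out of the HTML output.
        return subprocess.run(cmd, stdout=out, stderr=err).returncode
```

For example, `run_and_capture(["./pdfreflow", "xxx.xml"], "out.html", "error.log")` would leave any diagnostics in error.log, assuming the program writes them to its standard error stream.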
#4 |
Connoisseur
Posts: 98
Karma: 122982
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
|
xiongmao86, can you send me via PM (private message) the .xml file that was the output of pdftohtml? Also, when you use pdftohtml, you have to use the following syntax:
pdftohtml -xml mybook.pdf
Prana
#5 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2013
Device: ipad, kindle
|
OK, I think I have PMed you. If you didn't get the message, please reply.
|
#6 |
Connoisseur
Posts: 98
Karma: 122982
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
|
xiongmao86,
It looks like pdftohtml has changed the XML format. For now, you can simply change this line:
<pdf2xml producer="poppler" version="0.20.3">
on line 4 of your .xml file to this:
<pdf2xml>
and things will start working again.
Prana
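For anyone who would rather not edit the file by hand, the workaround above can be sketched as a small script. This is only an illustration under assumptions: the function names are hypothetical, the file is assumed to be UTF-8, and the regular expression simply strips whatever attributes appear on the opening `<pdf2xml ...>` tag rather than relying on it being on line 4.

```python
import re

# Matches the opening <pdf2xml ...> tag together with any attributes.
# The closing </pdf2xml> tag is not matched because of the leading "</".
PDF2XML_OPEN = re.compile(r"<pdf2xml\b[^>]*>")

def strip_pdf2xml_attributes(xml_text: str) -> str:
    """Replace the first <pdf2xml ...> opening tag with a bare <pdf2xml>."""
    return PDF2XML_OPEN.sub("<pdf2xml>", xml_text, count=1)

def patch_file(src_path: str, dst_path: str) -> None:
    """Write a patched copy of src_path to dst_path (UTF-8 assumed)."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(strip_pdf2xml_attributes(text))
```

Writing to a separate output file keeps the original .xml untouched, so the step can be repeated or undone.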
#7 |
Connoisseur
Posts: 98
Karma: 122982
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
|
xiongmao86,
It looks like the version of pdftohtml that I coded against was 0.36. When I use pdftohtml to create an HTML file, it leaves this marker in the HTML output:
<META name="generator" content="pdftohtml 0.36">
At some point I will fix the pdfreflow command to read this newer format.
Prana
#8 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2013
Device: ipad, kindle
|
Yes, I checked the version. It's 0.20.3, and it was released on 2012-8-10. But I can't find any 0.36 on the poppler utils homepage.
You are probably using an older version; I mean, 12-8 is a recent date. Thanks for the help. I am interested in your algorithm for auto paragraphing. I have seen some programs do it for Chinese, and I always wonder how they do it. Would you like to talk about your algorithm? Do you do it by checking the ending character of every line, or by counting the maximum line length?
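The two ideas raised in this question (looking at the last character of each line, and comparing each line's length with the longest line) can be combined into a simple heuristic. The sketch below is not PDFReflow's algorithm, which is not described in this thread; the function name, the terminator set, and the 0.8 ratio are all assumptions chosen for illustration.

```python
def join_paragraphs(lines, terminators=".!?\"'”。！？", short_ratio=0.8):
    """Group hard-wrapped lines into paragraphs.

    A line is treated as the end of a paragraph when it ends with a
    sentence terminator AND is clearly shorter than the longest line,
    i.e. the text stopped before reaching the wrap margin.
    A blank line always ends the current paragraph.
    """
    stripped = [ln.strip() for ln in lines]
    max_len = max((len(s) for s in stripped), default=0)
    paragraphs, current = [], []
    for s in stripped:
        if not s:
            # Blank line: flush whatever has been collected so far.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(s)
        ends_sentence = s[-1] in terminators
        is_short = len(s) < short_ratio * max_len
        if ends_sentence and is_short:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

A full-width line that happens to end in a period is not treated as a paragraph break, which is why both tests are required. Lines are joined with a space here; for Chinese text one would presumably join with an empty string instead.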