Old 02-21-2013, 10:58 AM   #1
xiongmao86
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2013
Device: ipad, kindle
Any suggestions for auto paragraphing?

Hi guys, the weekend is coming and I want something new to read.
Calibre is good, and the book creator is good too. But there's one little problem when I try to convert a PDF into a MOBI: I need to regroup the sentences that are separated by newlines into paragraphs.
I tried Acrobat, and I tried copying the text into various editors and word processors (notepad, writepad, word, whatever), but the newlines were still there.
I tried saving as TXT. This time it was better, but it added some unknown characters that show up as squares, and some sentences are broken; most aren't, but some are.
I also tried saving as HTML, but the lines are placed in <p> elements and separated by <br>, and again some sentences are broken.
I tried saving as Word, which didn't work.

So, does anyone have any suggestions for tidying up the broken sentences?

And good night, everyone. I hope I get some advice tomorrow.
Old 02-21-2013, 11:38 AM   #2
Pranananda
Connoisseur
Posts: 97
Karma: 115862
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
Try this utility: PDFReflow. You have to start with the original PDF, which will be converted into an HTML file. From there you can move the HTML into a word processor.

Last edited by Pranananda; 02-21-2013 at 11:47 AM.
Old 02-22-2013, 05:22 PM   #3
xiongmao86
OK, I tried PDFReflow. The pdftohtml step is fine, but something is wrong with pdfreflow. First, I can't use the program provided in your post; it keeps reporting that it can't find pdfreflow (is that because it's a link?). So I built one from source, and when I then try
./pdfreflow -r --first=8 --last=15 xxx.xml
I get a lot of "unknown input at line xxx" messages, and nothing shows up in the current directory.
The Windows version of pdfreflow behaves the same way.
If I run
./pdfreflow xxx.xml >out.html
it produces an out.html, but it's empty.

Why does that happen?

I also tried to generate a log in case you need it, but when I ran
./pdfreflow xxx.xml &2>out.html
I only got an empty error.log.
My system is Xubuntu 12-something; I don't remember whether it's 12.04 or 12.10.
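
A note on the empty log: in a POSIX shell, "&2>" is not a redirection of the error stream. The "&" sends the command to the background, and the trailing "2>out.html" only creates an empty file. Writing "2>error.log" after the command is the usual form. The same capture can be sketched in Python, which keeps the two streams in separate files. This is only a minimal sketch: the function name run_and_log is made up for illustration, and it assumes (not confirmed in this thread) that pdfreflow prints its "unknown input" messages to standard error.

```python
import subprocess


def run_and_log(cmd, stdout_path, stderr_path):
    """Run cmd (a list of arguments) and return its exit code.

    Standard output goes to stdout_path and standard error to stderr_path,
    so diagnostics do not get mixed into the generated HTML.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err)
    return completed.returncode
```

With the binary from this post, a call would look like run_and_log(["./pdfreflow", "xxx.xml"], "out.html", "error.log"), after which error.log can be attached to a reply.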

Last edited by xiongmao86; 02-22-2013 at 05:32 PM. Reason: added some new and probably relevant information.
Old 02-22-2013, 08:59 PM   #4
Pranananda
xiongmao86, can you send me, via PM (private message), the .xml file that pdftohtml produced? Also, when you run pdftohtml, you have to use the following syntax:

pdftohtml -xml mybook.pdf

Prana
Old 02-24-2013, 01:20 AM   #5
xiongmao86
OK, I think I have PM'd you. If you didn't get a message, please reply.
Old 02-24-2013, 02:13 PM   #6
Pranananda
xiongmao86,

It looks like pdftohtml has changed its XML format. For now, you can simply change this line:

<pdf2xml producer="poppler" version="0.20.3">

on line 4 of your .xml file to this:

<pdf2xml>

and things will start working again.
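
For anyone who would rather not edit the file by hand, the same change can be sketched in Python. This is only an illustration of the workaround described above, not part of PDFReflow; the function name strip_pdf2xml_attributes is made up, and it assumes the opening tag sits on a single line as in the example.

```python
import re

# Matches an opening <pdf2xml ...> tag that carries attributes.
# A bare <pdf2xml> and the closing </pdf2xml> do not match.
_PDF2XML_OPEN = re.compile(r"<pdf2xml\s[^>]*>")


def strip_pdf2xml_attributes(xml_text):
    """Replace the first attributed <pdf2xml ...> tag with a bare <pdf2xml>."""
    return _PDF2XML_OPEN.sub("<pdf2xml>", xml_text, count=1)
```

Read the .xml file into a string, pass it through strip_pdf2xml_attributes, and write the result back before running pdfreflow on it.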

Prana
Old 02-24-2013, 02:27 PM   #7
Pranananda
xiongmao86,

It looks like the version of pdftohtml that I coded against was 0.36. When I use pdftohtml to create an HTML file, it leaves this marker in the HTML output:

<META name="generator" content="pdftohtml 0.36">

At some point I will fix the pdfreflow command to read this newer format.

Prana
Old 02-25-2013, 04:24 AM   #8
xiongmao86
Yes, I checked the version. It's 0.20.3, and it was released on 2012-8-10. But I can't find any 0.36 on the poppler utils homepage.

You are probably using an older version; I mean, 2012-8 is a recent date.

Thanks for the help.

I am interested in your algorithm for auto paragraphing. I have seen some programs do it for Chinese, and I have always wondered how they do it.

Would you like to talk about your algorithm?
Are you doing it by checking the ending character of every line, or are you counting the maximum line length?
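
To make the two ideas in this question concrete, here is a minimal sketch that combines them. It is not PDFReflow's algorithm, which is not described in this thread. It is only a guess at one simple heuristic: a line ends a paragraph when it finishes with sentence punctuation and is also clearly shorter than the longest line. The names join_paragraphs, SENTENCE_END and fill_ratio are made up, and the sketch assumes space-separated text, so it would need changes for Chinese and does not repair hyphenated words.

```python
# Characters (or character pairs) treated as the end of a sentence.
SENTENCE_END = (".", "!", "?", '."', '!"', '?"')


def join_paragraphs(text, fill_ratio=0.8):
    """Group hard-wrapped lines into paragraphs and return them as a list.

    A blank line always ends a paragraph. Otherwise a line ends a paragraph
    only if it ends with sentence punctuation AND is shorter than
    fill_ratio times the longest line in the text.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    nonblank = [ln for ln in lines if ln]
    if not nonblank:
        return []
    max_len = max(len(ln) for ln in nonblank)

    paragraphs = []
    current = []
    for ln in lines:
        if not ln:
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        ends_sentence = ln.endswith(SENTENCE_END)
        is_short = len(ln) < fill_ratio * max_len
        if ends_sentence and is_short:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

A full-width line that happens to end with a period is ambiguous under this rule and is treated as continuing, so some paragraph breaks will be missed; lowering fill_ratio makes the length test stricter and merges more lines, while raising it splits more often.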
All times are GMT -4.