Old 02-19-2011, 04:07 AM   #1
gmarco
Junior Member
gmarco began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: Kindle
Converting problem...

Hi, I am using Calibre 0.7.46 to convert some PDFs to MOBI for my Kindle.

Everything is fine, except that words hyphenated at the end of a line come out broken in two.

See this screenshot to get the idea:
http://i.imgur.com/GmeY2.png

[skipped words in the first part of the line] giardi-
no [rest of the line]

should be read as "giardino" ("garden" in Italian).

I have tried a simple regex to strip out the "-<br>" sequence, since Calibre's intermediate output looks like this:

Code:
La signorina Jane Marple sedeva vicino alla finestra che dava sul giardi-<br>
no, una volta fonte d'orgoglio per lei. Ma adesso non lo era più. Ora, guar-<br>
The regex removes the "-<br>" but leaves a blank space behind, so I get "giardi no".

What am I missing?

Mobipocket Creator converts the file correctly, but I'd like to use only Calibre if possible.

Thanks very much for your attention.
Old 02-19-2011, 04:36 AM   #2
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Your pdf source must be using a type of hyphenation that isn't commonly used. Open a bug with the book attached at bugs.calibre-ebook.com and it should be simple enough to resolve.
Old 02-19-2011, 04:58 AM   #3
gmarco
Hi,
Thanks for your reply.

I have opened the ticket and attached the pdf book.

It is here:
http://bugs.calibre-ebook.com/ticket/9047

Just out of curiosity: would it be possible to fix this by changing the regular expression?
As I said, I was not able to simply delete the "-<br>" pattern; it always ended up replaced by a space.


Thanks again for your attention.
Old 02-19-2011, 05:02 AM   #4
ldolse
You could try -<br>\s*

I'll look at the file to see why Calibre wasn't fixing it in the first place.
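
For anyone wondering why the extra \s* matters, here is a minimal sketch in plain Python, using the sample lines from the first post. It assumes the leftover space comes from the newline that follows each <br> in the intermediate HTML; the variable names are only for illustration, and this is not Calibre's own code.

```python
import re

# Two lines of intermediate output, copied from the first post.
html = (
    "La signorina Jane Marple sedeva vicino alla finestra che dava sul giardi-<br>\n"
    "no, una volta fonte d'orgoglio per lei."
)

# Removing only "-<br>" leaves the newline behind, and a newline in HTML
# is rendered as a space: "giardi no".
naive = re.sub(r"-<br>", "", html)

# Adding \s* also consumes the trailing newline, so the halves join up.
joined = re.sub(r"-<br>\s*", "", html)

print(repr(naive))
print(repr(joined))
```

Note that this pattern removes every end-of-line hyphen, including ones that belong to genuinely hyphenated words.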
Old 02-19-2011, 09:20 AM   #5
gmarco
Code:
-<br>\s*
As a fix it seems to work like a charm :-)
Thanks very much for now.
Old 02-19-2011, 09:54 AM   #6
gmarco
I see that you closed the ticket, saying it doesn't happen for you.

I was not able to reply on the ticket, so I am writing here.

I tried again with the default conversion settings, but I got a lot of words with a hyphen in the middle, at the point where they were hyphenated in the source PDF.

Not "giardino" anymore, but a lot of others.

As you can see in this screenshot:
http://i.imgur.com/Ka098.png

I used the same file I sent you, with the default conversion settings.
Did you succeed in converting it without these issues by using different settings?

Thanks again.
Old 02-19-2011, 10:18 AM   #7
gmarco
Looking further at the conversion, I found other issues.

For example, the PDF contains this paragraph:
Quote:
L'attrice si rivolse di nuovo a Heather con molta cortesia, benché con fa-
re più meccanico. «Davvero interessante quello che mi avete raccontato. E
ora, non volete qualcosa da bere? Jason? Un cocktail?»

The XML produced by the conversion is:

Quote:
<p class="calibre_20">L’attrice si rivolse di nuovo a Heather con molta cortesia, benché con fare più meccanico. «Davvero interessante quello che mi avete raccontato. E</p><p class="calibre_20">ora, non volete qualcosa da bere? Jason? Un cocktail?»</p>
"...che mi avete raccontato. E" is the last part of the line.

which continues on the next line:
"ora, non volete qualcosa da bere? Jason? Un cocktail?»"

but Calibre splits the paragraph at the "E", which IMHO is wrong.

Just to let you know; I don't want to bother you too much. :-)
Calibre is almost perfect for my needs, and it works great even with these little issues, which I think are down to the PDF format.

Thanks
Old 02-19-2011, 10:32 AM   #8
ldolse
You can still add info to a ticket even when it's closed, and the people subscribed to the bug will get updates.

Based on your original report I thought the hyphens were being retained in all cases. The behavior you're seeing is by design. Hyphens can't be universally eliminated, as some words/phrases are always supposed to be hyphenated. The only way to decide which hyphens can be safely removed is to use a dictionary. However, users all over the world use Calibre in many languages, and most books also use proper names, made-up words, or scientific words which won't appear in any dictionary. To work around all that, Calibre uses the book itself as its dictionary. Hyphens are only removed if the word appears in the book without a hyphen. So wherever a hyphen remains, it means the joined word didn't occur anywhere else in the book. (alla may be an exception, a side effect of some recent work on reducing false positives)
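
To make the idea concrete, here is a rough sketch of "the book as its own dictionary" in plain Python. This is not Calibre's actual implementation: the function name dehyphenate and the "-<br>" pattern it looks for are assumptions taken from the sample in this thread, and the real heuristics are certainly more involved.

```python
import re

def dehyphenate(text):
    """Join words split as 'giardi-<br>\\nno' only when the joined form
    also appears unhyphenated somewhere else in the same text."""
    # The "dictionary" is every run of letters in the book, lowercased.
    # A split word contributes only its two halves, never the joined form.
    words = {w.lower() for w in re.findall(r"[^\W\d_]+", text)}

    def join_if_known(match):
        first, second = match.group(1), match.group(2)
        if (first + second).lower() in words:
            return first + second        # seen elsewhere: drop the hyphen
        return first + "-" + second      # unknown: keep the hyphen

    return re.sub(r"([^\W\d_]+)-<br>\s*([^\W\d_]+)", join_if_known, text)
```

With this sketch, "giardi-<br>no" becomes "giardino" only if "giardino" occurs elsewhere in the text; a word such as "guar-<br>dando" with no second occurrence comes out as "guar-dando", which matches the leftover hyphens described above.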

Last edited by ldolse; 02-19-2011 at 10:39 AM.
Old 02-19-2011, 10:33 AM   #9
ldolse
Quote:
Originally Posted by gmarco
Looking further at the conversion, I found other issues.

"...che mi avete raccontato. E" is the last part of the line.

which continues on the next line:
"ora, non volete qualcosa da bere? Jason? Un cocktail?»"

but Calibre splits the paragraph at the "E", which IMHO is wrong.

Just to let you know; I don't want to bother you too much. :-)
Calibre is almost perfect for my needs, and it works great even with these little issues, which I think are down to the PDF format.

Read the PDF FAQ:
https://www.mobileread.com/forums/sho...d.php?t=118605
Old 02-19-2011, 12:03 PM   #10
gmarco
Quote:
Originally Posted by ldolse
Based on your original report I thought the hyphens were being retained in all cases. The behavior you're seeing is by design. Hyphens can't be universally eliminated, as some words/phrases are always supposed to be hyphenated. The only way to decide which hyphens can be safely removed is to use a dictionary. However, users all over the world use Calibre in many languages, and most books also use proper names, made-up words, or scientific words which won't appear in any dictionary. To work around all that, Calibre uses the book itself as its dictionary. Hyphens are only removed if the word appears in the book without a hyphen. So wherever a hyphen remains, it means the joined word didn't occur anywhere else in the book. (alla may be an exception, a side effect of some recent work on reducing false positives)

Understood. This is a very smart modus operandi...
BTW, I also used your regular expression to remove the -<br>, and I have to say the final result is perfect.

Thanks for the much-appreciated explanation.

P.S.
I am also reading the PDF FAQ :-)