12-14-2011, 10:46 AM | #1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2011
Device: Sony PRS-T1, Kobo-Glo
|
Hyphens are not deleted
Hi everyone,
I am trying to convert a PDF file into EPUB. I used the heuristic method, but all the hyphens are still there. So I tried search and replace. I learned a little about regular expressions and then searched for hyphens before the line break with -<br> and replaced them with nothing. That deletes the hyphen, but when I look at the EPUB there is still a space where the hyphen was. Any idea how I can get it to work? minorum |
12-15-2011, 09:39 AM | #2 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
The best advice I can give you is to do the pdf->epub conversion as cleanly as possible, preserving the text. Then take the epub and open it in Sigil to do the regex work. You can use the latest Sigil beta, which has a nice new regex engine.
There should not be a space left over from that replacement. However, if you are replacing with a space, or the following line starts with a space character, something like -<br(\s*/?)>\s* may match better. In either case I would suggest doing work like this outside of Calibre itself. While it may seem like a bit of extra work, it often saves a lot of time in the long run and will get you the results you're looking for. |
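As a rough illustration of the pattern suggested above, here is a minimal Python sketch that applies it outside any ebook tool. The function name and the sample string are made up for the example; only the regular expression itself comes from the post.

```python
import re

# Pattern from the post: a hyphen, a <br> tag in any common spelling
# (<br>, <br/>, <br />), then any whitespace that follows the tag.
HYPHEN_BREAK = re.compile(r"-<br(\s*/?)>\s*")

def join_hyphenated(html: str) -> str:
    """Remove end-of-line hyphens together with the line break after them."""
    return HYPHEN_BREAK.sub("", html)

# Hypothetical sample text, not taken from the poster's PDF.
sample = "conver-<br/>\n  sion of a docu-<br>ment"
print(join_hyphenated(sample))
```

Because the trailing \s* also consumes the newline and indentation after the tag, no stray space is left behind. Note that this joins every hyphen that precedes a line break, so a genuine compound split at the hyphen (for example "well-<br>known") would lose its hyphen too.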
12-15-2011, 09:44 AM | #3 | |
Well trained by Cats
Posts: 29,912
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
With Sigil, you get to see the results of your missteps. Those hyphens could be en dashes or minus signs, and different search terms are needed for each. In Sigil, you can copy and paste the character and never worry about what flavor it really was. |
|
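To check which "flavor" of hyphen a file actually contains before writing a search term, something along these lines may help. It is only a sketch: the list of look-alike characters is an assumption chosen for illustration and is not exhaustive, and the function name is hypothetical.

```python
import unicodedata

# Characters that often look like a plain hyphen in text extracted from PDFs.
# Illustrative assumption only; other dash-like characters exist.
HYPHEN_LIKE = "\u002d\u00ad\u2010\u2011\u2013\u2212"

def report_hyphens(text):
    """Return (code point, Unicode name, count) for each hyphen-like character found."""
    found = []
    for ch in HYPHEN_LIKE:
        n = text.count(ch)
        if n:
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch), n))
    return found

# Hypothetical sample: one en dash and two ordinary hyphen-minus characters.
for row in report_hyphens("li\u2013ke this and li-ke that-"):
    print(row)
```

If the report shows something other than HYPHEN-MINUS, the search expression needs that exact character in place of the plain "-".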
12-15-2011, 10:58 AM | #4 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Was the behavior that some were removed and some weren't? Hyphens are intentionally preserved unless Calibre can determine definitively that they should be removed.
If, on the other hand, every single hyphen from the source doc is still in the converted doc, this sounds like a bug, and you can open a bug report with the PDF attached. |
12-16-2011, 04:03 AM | #5 |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2011
Device: Sony PRS-T1, Kobo-Glo
|
Thanks for your answers.
@idolse In the automatic process of calibre no hyphens were removed. The line break was removed, and the hyphen then sits inside the word, li-ke this. Overall, I think these PDFs are non-standard in some way; it worked automatically with others. When I tried the expression -<br>.* and tested it (great praise for the regular expression assistant), nothing was marked.
@all I tried it with Sigil, but it is a lot of work. So I started at the roots and ran OCR on the document and then used Sigil, which was a bit easier. I also tried some of the commercial PDF to EPUB converters, but I have to say that most of the time the result is no better than calibre's, and often worse. Then I found that the newest version of FineReader converts scans and PDFs directly to EPUB. I got the trial and tested it. It worked marvellously! All hyphens were removed, the page numbers were gone, and even the large initials at the beginning of each chapter were recognised and put into the flowing text. If you are willing to pay money for EPUB conversion, FineReader 11 is definitely worth it, and you get one of the best OCR programs.
I would also like to say thank you to the developers of calibre. It is definitely the best program for ebooks that is available. I got a reader only a short time ago and am still working to get all my digital assets to work on that thing. Calibre helps a lot - so I can leave my laptop at home more often. A donation for this great project will follow. Greetings from Germany, minorum |