05-26-2022, 09:42 AM   #524
xxyzz
Hi, jhowell:

The `content` string from `convert_to_json_content()` contains footnote numbers, which cause spaCy to mark the words around a number as a single named entity. For example, in "Viktor Chebrikov.69 Gorbachev", 69 is the footnote number and "Gorbachev" is the first word of the next sentence.

Could you please split any paragraph that contains a footnote number into separate paragraphs at that number (and also remove the number), if this feature doesn't take too much time to implement?
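
To illustrate the idea, here is a rough sketch of the kind of split I mean, done on the plain text with a regex. It is only a heuristic and not how `convert_to_json_content()` works: the `split_at_footnotes` name and the pattern are my own assumptions, and it treats digits glued to sentence-ending punctuation and followed by a capitalized word as a footnote number. Doing this from the book's markup would probably be more reliable.

```python
import re
from typing import List

# Assumed heuristic (not taken from the plugin): a footnote number is a run of
# digits directly after ".", "!" or "?", followed by whitespace and a capital
# letter. The negative lookbehind skips decimals such as "3.14".
FOOTNOTE_MARK = re.compile(r"(?<!\d\.)(?<=[.!?])\d+\s+(?=[A-Z])")


def split_at_footnotes(paragraph: str) -> List[str]:
    """Split a paragraph at footnote numbers and drop the numbers."""
    return [part for part in FOOTNOTE_MARK.split(paragraph) if part]


parts = split_at_footnotes("He met Viktor Chebrikov.69 Gorbachev agreed.")
print(parts)
```

With the example above, "Viktor Chebrikov." and "Gorbachev agreed." end up in separate strings, so spaCy would no longer see them as one span.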

Last edited by xxyzz; 05-26-2022 at 09:48 AM.