Hi, jhowell:
The `content` string from `convert_to_json_content()` contains footnote numbers, which cause spaCy to mark the words around a number as a single named entity. For example, in "Viktor Chebrikov.69 Gorbachev", 69 is the footnote number and "Gorbachev" is the first word of the next sentence.
Could you please split any paragraph that contains a footnote number into separate paragraphs (and remove the number), if this doesn't take too much time to implement?
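To make the request concrete, here is a rough sketch of the behavior I have in mind. It is only a heuristic, not a proposed implementation: `split_on_footnotes` and `FOOTNOTE_RE` are hypothetical names, and it assumes a footnote number is a run of digits glued to sentence-ending punctuation and followed by whitespace and a capital letter. Text such as "$3.50 More" would be split incorrectly, so the real footnote markup is probably a more reliable signal if it is available.

```python
import re

# Assumption: a footnote number directly follows sentence-ending punctuation
# (or a closing quote) and is followed by whitespace and an uppercase letter.
FOOTNOTE_RE = re.compile(r'(?<=[.!?"”])\d+\s+(?=[A-Z])')


def split_on_footnotes(paragraph: str) -> list[str]:
    """Split a paragraph at footnote numbers and drop the numbers (hypothetical helper)."""
    parts = FOOTNOTE_RE.split(paragraph)
    # Discard empty pieces and surrounding whitespace left over from the split.
    return [part.strip() for part in parts if part.strip()]


print(split_on_footnotes("Viktor Chebrikov.69 Gorbachev spoke."))
# ['Viktor Chebrikov.', 'Gorbachev spoke.']
```

With the number removed and the text split, spaCy would see "Viktor Chebrikov." and "Gorbachev spoke." as separate inputs.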
Last edited by xxyzz; 05-26-2022 at 09:48 AM.