Score:0

Selection with mouse from PDF produces weird characters


I am trying to select text with the mouse in this Slovak document: https://fphil.uniba.sk/fileadmin/fif/katedry_pracoviska/sas/Publikacie/Foneticka_prirucka.pdf

In a browser (Chromium) and in Okular, the selection contains weird characters.

When I extract the text from this document in Okular, I also get unrecognized characters, though different ones.

EDIT: I have found this library/tool, https://pypi.org/project/multilingual-pdf2text/, which would probably help me, but I don't know how to use it.

Is there a way to extract text from this document with correctly recognized characters?

vanadium
You do not indicate which text to select, so other users cannot reproduce the issue. I see no issues copying and pasting text using Evince, but I may be testing the wrong spot.
vanadium
I can confirm this with Chromium: no characters are recognized. With Firefox, it works as well as with Evince, but now I see your problem: some characters, like Ľ and Ž, are not properly recognized.
user545
I'm concerned about the misrecognition of letters "Ľ" and "Ž", for example, in the name Ľudmila Žigová on the front page. I figured out how the program multilingual-pdf2text works. It gives good results. I'm waiting for my friend's confirmation of results.
user545
The result is mediocre, as my friend told me when he looked at the generated text file.
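
One way to put a number on "mediocre" is to scan the generated text file for characters that usually signal a broken font-to-Unicode mapping, and to check whether a known string such as the name on the front page survived. The following is a minimal sketch using only the Python standard library. The function names and the damaged sample string are made up for illustration, and the choice of "suspect" categories (private use, unassigned, control characters, plus U+FFFD) is an assumption, not a rule from any of the tools mentioned in this thread.

```python
import unicodedata
from collections import Counter

# Unicode general categories that rarely belong in ordinary extracted prose
# (an assumption for this sketch): Co = private use, Cn = unassigned,
# Cc = control characters.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cc"}
ALLOWED_CONTROLS = {"\n", "\r", "\t", "\f"}


def find_suspect_chars(text: str) -> Counter:
    """Count characters that hint at a broken font-to-Unicode mapping."""
    suspects = Counter()
    for ch in text:
        if ch in ALLOWED_CONTROLS:
            continue
        # U+FFFD (replacement character) is category "So", so test it explicitly.
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPECT_CATEGORIES:
            suspects[f"U+{ord(ch):04X}"] += 1
    return suspects


def contains_expected(text: str, expected: str) -> bool:
    """Check for a known string, treating composed and decomposed accents alike."""
    normalized_text = unicodedata.normalize("NFC", text)
    normalized_expected = unicodedata.normalize("NFC", expected)
    return normalized_expected in normalized_text


if __name__ == "__main__":
    # Hypothetical damaged output, not taken from the actual PDF.
    sample = "\uf0bcudmila \ufffdigov\u00e1"
    print(find_suspect_chars(sample))
    print(contains_expected(sample, "Ľudmila Žigová"))
```

To use it on a real extraction, read the generated text file with `open(path, encoding="utf-8").read()` and pass the result to both functions. A non-empty counter, or a missing "Ľudmila Žigová", suggests the extraction still mangles characters such as Ľ and Ž.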