Problems with text extracting #108

Open
KaradasV3 opened this issue Apr 12, 2022 · 1 comment

Comments

@KaradasV3

I have a problem extracting parts of the text from some PDF files. The output looks as if several threads were returning their results at the same time. There may also be problems with recognizing images in the file, although this does not always happen. Has anyone else had this problem? Are there any options I can add to fix it?

This is what it looks like after extracting with Cermine:
finally we selected the best learning model to classify the samples into the three groups.
and finally we selected the best learning model to classify the samples into the three groups.
analysis and a feature selection procedure. Then, we developed a Cluster analysis to evaluate dataset homogeneity, and
22..33..DDaattaaPPrree--PPrroocceessssiinngg
TToo rereaaddanadnpdrepprroecpersoscreasws draatwa, ddifafetare,ndtpifafeckreangtes pinacthkea gfreasmeinwotrhkeR f(rhatmtpesw:/o/rwkwRw.
(rh-tptprosj:e//cwt.owrwg/.r,-apcrcoejesscet.dorogn/,2a4ccMesasyed20o2n1)2[415M] awyer2e02u1s)ed[1.5T] hweefroeuursdeadt.aTsehtes fwoeurreddaetraisveetds
wfreorme dtweroivdeidffefrreonmt ttewchondoilfofegrieens:t Atefcfyhmnoeltorgixieasn:dAAffygmileentrt.ixToanpdroAcegsisleAntf.fyTmoeptrrioxceCsEsLAfifflyes-,
mweetruisxeCdEthLef‘iolelisg, ow’ e[1u6s]epdacthkeag‘oe.liTghoe
And then the correct text

The same part of the text extracted with another tool:

Figure 1. Flowchart of the proposed approach. After merging the four datasets, we implemented a differential expression
analysis and a feature selection procedure. Then, we developed a Cluster analysis to evaluate dataset homogeneity, and
finally we selected the best learning model to classify the samples into the three groups.
2.3. Data Pre-Processing
To read and preprocess raw data, different packages in the framework R
(https://www.r-project.org/, accessed on 24 May 2021) [15] were used. The four datasets were derived from two different technologies: Affymetrix and Agilent. To process Affy-metrix CEL files, we used the ‘oligo’ [16] package.
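A possible post-processing workaround for the simplest case above, where every character is emitted twice (as in `22..33..DDaattaaPPrree--PPrroocceessssiinngg`), is sketched below. This is not a Cermine option; the function names and the 0.9 threshold are assumptions chosen for illustration, and the sketch does nothing for lines where two different text fragments are interleaved.

```python
def doubled_ratio(text: str) -> float:
    """Fraction of non-overlapping character pairs whose two characters match."""
    pairs = [text[i:i + 2] for i in range(0, len(text) - 1, 2)]
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p[0] == p[1]) / len(pairs)


def collapse_if_doubled(line: str, threshold: float = 0.9) -> str:
    """Return the line with every second character dropped if it looks doubled.

    The threshold is a hypothetical tuning value, not something taken from
    Cermine. Lines that do not look doubled are returned unchanged.
    """
    stripped = line.strip()
    looks_doubled = (
        len(stripped) >= 4
        and len(stripped) % 2 == 0
        and doubled_ratio(stripped) >= threshold
    )
    return stripped[::2] if looks_doubled else line


if __name__ == "__main__":
    print(collapse_if_doubled("22..33..DDaattaaPPrree--PPrroocceessssiinngg"))
```

Because spaces are lost in the doubled line, the collapsed heading still lacks the spaces of the original and would need a separate fix.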

@kwhkim

kwhkim commented Sep 6, 2022

@KaradasV3 I am looking for an alternative for the same reason. Can you tell me which other tool you used?
