Swas99/TA_FleschKincaidScore

NLP PROJECT: Calculates content word (adjective, noun, verb, adverb) counts, lexical diversity through the MLTD index score, and content diversity.
Flesch Kincaid PROJECT: Calculates the FK/readability score.

Results of Flesch Kincaid Experiment

| Country | Grade | Subject | FK_1_FK_SCORE | FK_2_FK_SCORE | FK_3_FK_SCORE |
|---|---|---|---|---|---|
| IN | 1 | English | 78.7564 | 78.6004938 | 85.6429158 |
| IN | 2 | English | 78.75034 | 79.16481397 | 87.25040766 |
| IN | 3 | English | 81.13185 | 80.12721863 | 87.4707646 |
| IN | 3 | EVS | 79.773674 | 76.68253131 | 79.89474519 |
| IN | 4 | English | 75.55459 | 78.5118645 | 87.1746998 |
| IN | 4 | EVS | 77.393875 | 76.97076949 | 81.55813931 |
| IN | 5 | English | 72.50561 | 86.85849504 | 99.87443113 |
| IN | 5 | EVS | 78.00329 | 77.89676758 | 83.27548355 |
| IN | 6 | English | 73.184364 | 72.40202305 | 82.76578546 |
| IN | 6 | Civics | 61.238235 | 61.62235488 | 65.06695025 |
| IN | 6 | Geography | 74.89401 | 67.694 | 72.63284615 |
| IN | 6 | History | 62.610245 | 63.35347744 | 70.45513174 |
| IN | 6 | Science | 69.58013 | 70.45118193 | 75.51216795 |
| IN | 7 | English | 70.3413 | 83.57279795 | 108.7149643 |
| IN | 7 | Civics | 57.874634 | 57.09877892 | 63.30081326 |
| IN | 7 | Geography | 58.7731 | 107.3990657 | 27.59568804 |
| IN | 7 | Science | 60.962463 | 62.62396532 | 70.4113148 |
| IN | 8 | English | 69.28932 | 69.68952256 | 78.99708142 |
| IN | 8 | Civics | 45.01239 | 46.46991349 | 49.88488596 |
| IN | 8 | Geography | NaN | NaN | NaN |
| IN | 8 | Science | NaN | NaN | NaN |
| IN | 9 | English | 65.769165 | 65.05337103 | 75.26529366 |
| IN | 9 | Civics | 50.020126 | 50.11871466 | 54.29702858 |
| IN | 9 | Geography | 77.905014 | 119.19 | 139.325 |
| IN | 9 | Science | NaN | NaN | NaN |
| IN | 9 | Economics | 47.612244 | 49.75904821 | 55.49940093 |
| IN | 9 | History | 49.506897 | 50.79922903 | 57.16043564 |
| IN | 10 | English | 56.620834 | 52.90289317 | 63.54353948 |
| IN | 10 | Civics | 48.900223 | 50.1559527 | 54.75700409 |
| IN | 10 | Science | NaN | NaN | NaN |
| IN | 10 | Economics | 48.250137 | 51.74598801 | 58.07641945 |
| IN | 10 | History | 44.2041 | 45.63070733 | 52.77843976 |
| US | 1 | clean | 61.82631 | 61.0796291 | 65.97500644 |
| US | 2 | clean | 66.78671 | 65.05292409 | 67.8154142 |
| US | 3 | clean | NaN | NaN | NaN |
| US | 4 | clean | 59.779907 | 58.67125661 | 61.52595357 |
| US | 5 | clean | 55.95436 | 56.30388241 | 61.94939398 |
| US | 6 | Geography | 53.71678 | 53.50150528 | 58.66767653 |
| US | 7 | History | 53.410324 | 53.30423631 | 58.48332796 |
| US | 8 | History | 45.15564 | 45.82171003 | 51.26182277 |
| US | 10 | Civics | 39.93254 | 39.32387567 | 45.54257897 |
| US | 10 | Economics | 45.460953 | 45.01303514 | 49.56973716 |
| US | 10 | US History | 39.814484 | 40.27068718 | 45.68215252 |
| US | 10 | World History | 40.877853 | 42.68998621 | 50.01773736 |
| IN | 1 | ALL | 78.56155 | 78.6004938 | 85.6429158 |
| IN | 2 | ALL | 78.6005 | 79.16481397 | 87.25040766 |
| IN | 3 | ALL | 80.27481 | 77.97066607 | 82.71232665 |
| IN | 4 | ALL | 76.737236 | 77.48481761 | 83.4199491 |
| IN | 5 | ALL | 76.66155 | 81.29792901 | 89.16297183 |
| IN | 6 | ALL | 66.82397 | 67.31020716 | 73.3904691 |
| IN | 7 | ALL | 62.718765 | 79.0550348 | 103.3503081 |
| IN | 8 | ALL | 58.98477 | 59.86673964 | 67.05808622 |
| IN | 9 | ALL | 53.064407 | 53.73544249 | 60.49609918 |
| IN | 10 | ALL | 46.762863 | 48.52968506 | 54.95003176 |
| US | 1 | ALL | 61.82631 | 61.0796291 | 65.97500644 |
| US | 2 | ALL | 66.78671 | 65.05292409 | 67.8154142 |
| US | 3 | ALL | NaN | NaN | NaN |
| US | 4 | ALL | 59.779907 | 58.67125661 | 61.52595357 |
| US | 5 | ALL | 55.95436 | 56.30388241 | 61.94939398 |
| US | 6 | ALL | 53.71678 | 53.50150528 | 58.66767653 |
| US | 7 | ALL | 53.410324 | 53.30423631 | 58.48332796 |
| US | 8 | ALL | 45.15564 | 45.82171003 | 51.26182277 |
| US | 10 | ALL | 41.6243 | 42.19290697 | 48.32529647 |

----------------------------------------------------------------

To run the Flesch Kincaid PROJECT:

1. Download/clone the project.
2. Open it in Eclipse.
3. Change all path variables in "DriverClass.java" and "CleanData.java" (changing the value of 'projectPath' should be sufficient).
4. Run CleanData, if required.
5. Run DriverClass (make sure the output file path exists).

Instructions to run the NLP Project:

To run this project, first download Stanford CoreNLP from https://stanfordnlp.github.io/CoreNLP/download.html

Start the NLP server:

    cd path/stanford-corenlp-full-2018-10-05
    java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000

Python: install Anaconda from https://www.anaconda.com/distribution/

In the Anaconda Prompt, create a virtual environment for Python 3:

    conda create -n yourenvname python=x.x anaconda

Activate it:

    source activate yourenvname

Ref: https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/

Install the SNLP library:

    pip install stanfordcorenlp
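Once the server is running and `stanfordcorenlp` is installed, content words can be counted from the POS tags the server returns. The following is a minimal sketch, not the code in NlpAnalyser.py: the function names `count_content_words` and `tag_with_server` are hypothetical, and mapping Penn Treebank tag prefixes (NN, VB, JJ, RB, plus FW for foreign words) to the word classes is an assumption about how the counts are defined.

```python
# Assumed mapping from word class to Penn Treebank tag prefix.
TAG_PREFIXES = {"noun": "NN", "verb": "VB", "adjective": "JJ", "adverb": "RB"}


def count_content_words(tagged_tokens):
    """Count nouns, verbs, adjectives, adverbs and foreign words in (token, tag) pairs."""
    counts = {name: 0 for name in TAG_PREFIXES}
    counts["foreign"] = 0
    for _token, tag in tagged_tokens:
        if tag == "FW":
            counts["foreign"] += 1
            continue
        for name, prefix in TAG_PREFIXES.items():
            if tag.startswith(prefix):
                counts[name] += 1
                break
    return counts


def tag_with_server(text, host="http://localhost", port=9000):
    """POS-tag text through an already running CoreNLP server."""
    # Imported lazily so count_content_words works without the package.
    from stanfordcorenlp import StanfordCoreNLP

    nlp = StanfordCoreNLP(host, port=port)
    try:
        return nlp.pos_tag(text)  # list of (token, tag) tuples
    finally:
        nlp.close()
```

With the server on port 9000, `count_content_words(tag_with_server("The quick fox jumps very high."))` would return the per-class counts for that sentence.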

N.B. The path passed to os.walk("C:/Users/souro/OneDrive/Desktop/workstation/Text Analytics/CleanTextFiles") in the code is the absolute path of the directory that holds the clean text files to be processed; change it to match your machine. The files can sit in any folder structure inside this path.
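Because os.walk visits every subdirectory, the nesting below the root does not matter. A small sketch of that traversal is shown here; the helper name `find_text_files` and the `.txt` filter are assumptions for illustration, not taken from the project code.

```python
import os


def find_text_files(root_dir):
    """Return the sorted paths of all .txt files anywhere under root_dir."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(".txt"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Calling `find_text_files` on the CleanTextFiles directory would pick up files such as `Class A/Subject A/2.txt` regardless of how deep they are nested.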


Results of NLP:

runfile('C:/Users/souro/Projects/TA_FleschKincaidScore/NLP/NlpAnalyser.py', wdir='C:/Users/souro/Projects/TA_FleschKincaidScore/NLP')
C:/Users/souro/OneDrive/Desktop/workstation/Text Analytics/CleanTextFiles/Class A/Subject A/2.txt
MLTD = 55.90109533994426
Factors = 169.54945054945054, word count = 9478, noun count = 2421, verb count = 1392, adjective count = 805, adverb count = 292, foreign word count = 12, content diversity = 0.29063136456211813
Verb-Noun Ratio = 1.7915632754342432
 
C:/Users/souro/OneDrive/Desktop/workstation/Text Analytics/CleanTextFiles/Class A/Subject B/1.txt
MLTD = 63.73662001017274
Factors = 103.88062622309198, word count = 6621, noun count = 1691, verb count = 941, adjective count = 541, adverb count = 292, foreign word count = 7, content diversity = 0.3217893217893218
Verb-Noun Ratio = 1.9964285714285714
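The figures above are consistent with MLTD = word count / Factors (for the first file, 9478 / 169.549... ≈ 55.901), and the fractional part of each factor count matches a partial factor computed against a type-token-ratio threshold of 0.72. That is how a single forward pass of the MTLD lexical diversity measure works. The content diversity of the first file also equals 1427 / 4910, where 4910 is the sum of the four content word counts; the numerator is presumably the number of distinct content words, but that is an inference. The sketch below illustrates the forward pass under those assumptions. It is not the code in NlpAnalyser.py; the function name `mtld_forward`, the lowercasing, and the `<=` comparison are illustrative choices.

```python
def mtld_forward(tokens, threshold=0.72):
    """Single forward pass of MTLD: number of tokens divided by number of factors."""
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok.lower())
        # A factor is complete once the running type-token ratio
        # drops to the threshold; the window then restarts.
        if len(types) / count <= threshold:
            factors += 1
            types.clear()
            count = 0
    if count > 0:
        # The leftover tokens contribute a partial factor.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    return len(tokens) / factors if factors else float("inf")


# Cross-check against the first result printed above.
reported_mltd = 9478 / 169.54945054945054
```

Here `reported_mltd` reproduces the printed 55.901... for the first file, which supports the word count / Factors reading.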
 
http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests

Results of Flesch Kincaid Experiment:

[_results/results.csv](_results/results.csv)
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\10TH_FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 137258, sentences: 8015, text: 0, blank: 0, syllables: 233625, complex: 26197]
calcFog: 14.484439
calcKincaid: 11.173426
wordsPerSentence: 17.125141
percentComplexWords: 19.085957
syllablesPerWords: 1.7020866
Flesch Score: 45.456467


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 47.92764487521539

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 53.96019238193372

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\10TH_FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 111463, sentences: 6101, text: 0, blank: 0, syllables: 192184, complex: 22524]
calcFog: 15.390892
calcKincaid: 11.880661
wordsPerSentence: 18.269629
percentComplexWords: 20.207602
syllablesPerWords: 1.7241955
Flesch Score: 42.424408


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 43.499227290521006

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 49.34652862764942

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\CIVICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 29955, sentences: 1534, text: 0, blank: 0, syllables: 51845, complex: 6244]
calcFog: 16.148792
calcKincaid: 12.448681
wordsPerSentence: 19.52738
percentComplexWords: 20.844599
syllablesPerWords: 1.7307628
Flesch Score: 40.59218


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 40.67393698419058

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 46.32944858413444

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\ECONOMICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 31271, sentences: 1731, text: 0, blank: 0, syllables: 52613, complex: 5854]
calcFog: 14.7142
calcKincaid: 11.308786
wordsPerSentence: 18.06528
percentComplexWords: 18.72022
syllablesPerWords: 1.6824853
Flesch Score: 46.160492


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 46.38033469265342

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 50.665430765669285

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\WORLD HISTORY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 50237, sentences: 2836, text: 0, blank: 0, syllables: 87726, complex: 10426]
calcFog: 15.387064
calcKincaid: 11.924137
wordsPerSentence: 17.714033
percentComplexWords: 20.753628
syllablesPerWords: 1.7462428
Flesch Score: 41.123123


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 43.27081832649341

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 50.245411561573846

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\CIVICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 36103, sentences: 2258, text: 0, blank: 0, syllables: 62280, complex: 7385]
calcFog: 14.577716
calcKincaid: 11.001442
wordsPerSentence: 15.988928
percentComplexWords: 20.455364
syllablesPerWords: 1.7250644
Flesch Score: 44.665802


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 47.011660348410516

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 51.33794058592589

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\HISTORY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 66324, sentences: 3804, text: 0, blank: 0, syllables: 113595, complex: 12583]
calcFog: 14.56294
calcKincaid: 11.419975
wordsPerSentence: 17.435331
percentComplexWords: 18.972017
syllablesPerWords: 1.7127284
Flesch Score: 44.241333


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 45.847846045732716

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 52.85156426012828

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\ECONOMICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 34831, sentences: 1953, text: 0, blank: 0, syllables: 57750, complex: 6229]
calcFog: 14.287244
calcKincaid: 10.929968
wordsPerSentence: 17.834614
percentComplexWords: 17.883495
syllablesPerWords: 1.6580058
Flesch Score: 48.46559


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 52.59889991492061

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 58.56803755949679

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\6\GEOGRAPHY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 34127, sentences: 2105, text: 0, blank: 0, syllables: 54968, complex: 5428]
calcFog: 12.847058
calcKincaid: 9.738953
wordsPerSentence: 16.21235
percentComplexWords: 15.905295
syllablesPerWords: 1.6106895
Flesch Score: 54.115143


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 54.70419580161195

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 59.01602051006759

***********************

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\6\GEOGRAPHY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 14869, sentences: 949, text: 0, blank: 0, syllables: 22301, complex: 1848]
calcFog: 11.2386465
calcKincaid: 8.218565
wordsPerSentence: 15.668072
percentComplexWords: 12.428543
syllablesPerWords: 1.4998319
Flesch Score: 64.046135


FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 69.23944981644134

FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 76.58229677419358

***********************
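
The Implementation 1 figures in the logs above agree with the standard Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog formulas applied to the printed word, sentence, syllable, and complex-word counts. The sketch below re-derives them for the first log entry (US textbooks, grade 10). The function name `readability` is hypothetical and this is not the project's Java code. How Implementations 2 and 3 count syllables and sentences is not shown in the logs, so their scores are not reproduced here.

```python
def readability(words, sentences, syllables, complex_words):
    """Standard readability formulas computed from raw counts."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    percent_complex = 100.0 * complex_words / words
    return {
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + percent_complex),
    }


# Counts taken from the "Stats" line of the first log entry.
scores = readability(words=137258, sentences=8015, syllables=233625, complex_words=26197)
```

The three values in `scores` match the logged `Flesch Score`, `calcKincaid`, and `calcFog` (45.456467, 11.173426, 14.484439) to about three decimal places; the small remaining difference is consistent with the log using single-precision floats.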
