NLP PROJECT: Calculates content word (adjective, noun, verb, adverb) counts; lexical diversity through the MTLD index score (printed as "MLTD" in the program output); and content diversity
Flesch Kincaid PROJECT: Calculates the Flesch Kincaid (FK) readability score
Country | Grade | Subject | FK_1_FK_SCORE | FK_2_FK_SCORE | FK_3_FK_SCORE |
---|---|---|---|---|---|
IN | 1 | English | 78.7564 | 78.6004938 | 85.6429158 |
IN | 2 | English | 78.75034 | 79.16481397 | 87.25040766 |
IN | 3 | English | 81.13185 | 80.12721863 | 87.4707646 |
IN | 3 | EVS | 79.773674 | 76.68253131 | 79.89474519 |
IN | 4 | English | 75.55459 | 78.5118645 | 87.1746998 |
IN | 4 | EVS | 77.393875 | 76.97076949 | 81.55813931 |
IN | 5 | English | 72.50561 | 86.85849504 | 99.87443113 |
IN | 5 | EVS | 78.00329 | 77.89676758 | 83.27548355 |
IN | 6 | English | 73.184364 | 72.40202305 | 82.76578546 |
IN | 6 | Civics | 61.238235 | 61.62235488 | 65.06695025 |
IN | 6 | Geography | 74.89401 | 67.694 | 72.63284615 |
IN | 6 | History | 62.610245 | 63.35347744 | 70.45513174 |
IN | 6 | Science | 69.58013 | 70.45118193 | 75.51216795 |
IN | 7 | English | 70.3413 | 83.57279795 | 108.7149643 |
IN | 7 | Civics | 57.874634 | 57.09877892 | 63.30081326 |
IN | 7 | Geography | 58.7731 | 107.3990657 | 27.59568804 |
IN | 7 | Science | 60.962463 | 62.62396532 | 70.4113148 |
IN | 8 | English | 69.28932 | 69.68952256 | 78.99708142 |
IN | 8 | Civics | 45.01239 | 46.46991349 | 49.88488596 |
IN | 8 | Geography | NaN | NaN | NaN |
IN | 8 | Science | NaN | NaN | NaN |
IN | 9 | English | 65.769165 | 65.05337103 | 75.26529366 |
IN | 9 | Civics | 50.020126 | 50.11871466 | 54.29702858 |
IN | 9 | Geography | 77.905014 | 119.19 | 139.325 |
IN | 9 | Science | NaN | NaN | NaN |
IN | 9 | Economics | 47.612244 | 49.75904821 | 55.49940093 |
IN | 9 | History | 49.506897 | 50.79922903 | 57.16043564 |
IN | 10 | English | 56.620834 | 52.90289317 | 63.54353948 |
IN | 10 | Civics | 48.900223 | 50.1559527 | 54.75700409 |
IN | 10 | Science | NaN | NaN | NaN |
IN | 10 | Economics | 48.250137 | 51.74598801 | 58.07641945 |
IN | 10 | History | 44.2041 | 45.63070733 | 52.77843976 |
US | 1 | clean | 61.82631 | 61.0796291 | 65.97500644 |
US | 2 | clean | 66.78671 | 65.05292409 | 67.8154142 |
US | 3 | clean | NaN | NaN | NaN |
US | 4 | clean | 59.779907 | 58.67125661 | 61.52595357 |
US | 5 | clean | 55.95436 | 56.30388241 | 61.94939398 |
US | 6 | Geography | 53.71678 | 53.50150528 | 58.66767653 |
US | 7 | History | 53.410324 | 53.30423631 | 58.48332796 |
US | 8 | History | 45.15564 | 45.82171003 | 51.26182277 |
US | 10 | Civics | 39.93254 | 39.32387567 | 45.54257897 |
US | 10 | Economics | 45.460953 | 45.01303514 | 49.56973716 |
US | 10 | US History | 39.814484 | 40.27068718 | 45.68215252 |
US | 10 | World History | 40.877853 | 42.68998621 | 50.01773736 |
IN | 1 | ALL | 78.56155 | 78.6004938 | 85.6429158 |
IN | 2 | ALL | 78.6005 | 79.16481397 | 87.25040766 |
IN | 3 | ALL | 80.27481 | 77.97066607 | 82.71232665 |
IN | 4 | ALL | 76.737236 | 77.48481761 | 83.4199491 |
IN | 5 | ALL | 76.66155 | 81.29792901 | 89.16297183 |
IN | 6 | ALL | 66.82397 | 67.31020716 | 73.3904691 |
IN | 7 | ALL | 62.718765 | 79.0550348 | 103.3503081 |
IN | 8 | ALL | 58.98477 | 59.86673964 | 67.05808622 |
IN | 9 | ALL | 53.064407 | 53.73544249 | 60.49609918 |
IN | 10 | ALL | 46.762863 | 48.52968506 | 54.95003176 |
US | 1 | ALL | 61.82631 | 61.0796291 | 65.97500644 |
US | 2 | ALL | 66.78671 | 65.05292409 | 67.8154142 |
US | 3 | ALL | NaN | NaN | NaN |
US | 4 | ALL | 59.779907 | 58.67125661 | 61.52595357 |
US | 5 | ALL | 55.95436 | 56.30388241 | 61.94939398 |
US | 6 | ALL | 53.71678 | 53.50150528 | 58.66767653 |
US | 7 | ALL | 53.410324 | 53.30423631 | 58.48332796 |
US | 8 | ALL | 45.15564 | 45.82171003 | 51.26182277 |
US | 10 | ALL | 41.6243 | 42.19290697 | 48.32529647 |
----------------------------------------------------------------------------------------------------------------------------------------

To run Flesch Kincaid PROJECT:
1. Download/clone the project
2. Open it in Eclipse
3. Change all path variables in "DriverClass.java" & "CleanData.java" (changing the value of 'projectPath' should be sufficient)
4. Run CleanData, if required
5. Run DriverClass (make sure the output file path exists)
Instructions to run the NLP Project:

To run this project, first download Stanford CoreNLP from https://stanfordnlp.github.io/CoreNLP/download.html

Start the NLP server:

    cd path/stanford-corenlp-full-2018-10-05
    java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000

Python: install Anaconda from https://www.anaconda.com/distribution/

Open the Anaconda Prompt and create a virtual environment for Python 3:

    conda create -n yourenvname python=x.x anaconda

Activate the environment:

    source activate yourenvname

Ref: https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/

Install the SNLP library:

    pip install stanfordcorenlp
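Once the server is running and the library is installed, the tagging and counting step might look like the following. This is a minimal sketch, not the project's own script: the helper names `count_content_words` and `tag_with_server` are hypothetical, the mapping from Penn Treebank tag prefixes to word classes is an assumption, and the server is assumed to be reachable at http://localhost on port 9000.

```python
from collections import Counter

# Assumed mapping from Penn Treebank tag prefixes to the word classes
# that the project reports (noun, verb, adjective, adverb).
TAG_PREFIXES = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


def count_content_words(tagged_tokens):
    """Count word classes in a list of (word, pos_tag) pairs."""
    counts = Counter()
    for _word, tag in tagged_tokens:
        if tag == "FW":  # Penn Treebank tag for a foreign word
            counts["foreign"] += 1
            continue
        for prefix, label in TAG_PREFIXES.items():
            if tag.startswith(prefix):
                counts[label] += 1
                break
    return counts


def tag_with_server(text, host="http://localhost", port=9000):
    """POS-tag text through a running CoreNLP server (hypothetical helper)."""
    # Imported here so the counting function works without the library.
    from stanfordcorenlp import StanfordCoreNLP

    nlp = StanfordCoreNLP(host, port=port)
    try:
        return nlp.pos_tag(text)  # list of (word, tag) tuples
    finally:
        nlp.close()


# Hand-tagged sample so the counting logic can be tried without a server.
sample = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
          ("runs", "VBZ"), ("quickly", "RB"), ("et", "FW")]
sample_counts = count_content_words(sample)
print(dict(sample_counts))
```

With the server up, `count_content_words(tag_with_server(text))` would give the counts for a real document.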
N.B. The path in os.walk("C:/Users/souro/OneDrive/Desktop/workstation/Text Analytics/CleanTextFiles") is the absolute path of the directory that holds the clean text files to be processed. You can arrange the files in any folder structure inside this path.
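Because os.walk descends into every subdirectory, any layout under that root is picked up. Below is a minimal sketch of collecting the files, assuming they all end in `.txt`; `find_text_files` is a hypothetical name, and the demo runs on a temporary directory rather than the real path.

```python
import os
import tempfile


def find_text_files(root):
    """Return sorted paths of all .txt files anywhere under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".txt"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Demo on a throwaway tree shaped like "Class A/Subject A/2.txt".
with tempfile.TemporaryDirectory() as demo_root:
    nested = os.path.join(demo_root, "Class A", "Subject A")
    os.makedirs(nested)
    for name in ("2.txt", "notes.md"):
        with open(os.path.join(nested, name), "w", encoding="utf-8") as handle:
            handle.write("sample text")
    demo_files = [os.path.relpath(p, demo_root) for p in find_text_files(demo_root)]

print(demo_files)
```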
Results of NLP:
runfile('C:/Users/souro/Projects/TA_FleschKincaidScore/NLP/NlpAnalyser.py', wdir='C:/Users/souro/Projects/TA_FleschKincaidScore/NLP')
C:/Users/souro/OneDrive/Desktop/workstation/Text Analytics/CleanTextFiles/Class A/Subject A/2.txt
MLTD = 55.90109533994426
Factors = 169.54945054945054, word count = 9478, noun count = 2421, verb count = 1392, adjective count = 805, adverb count = 292, foreign word count = 12, content diversity = 0.29063136456211813
Verb-Noun Ratio = 1.7915632754342432
C:/Users/souro/OneDrive/Desktop/workstation/Text Analytics/CleanTextFiles/Class A/Subject B/1.txt
MLTD = 63.73662001017274
Factors = 103.88062622309198, word count = 6621, noun count = 1691, verb count = 941, adjective count = 541, adverb count = 292, foreign word count = 7, content diversity = 0.3217893217893218
Verb-Noun Ratio = 1.9964285714285714
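In the output above, each reported MLTD value equals the word count divided by the factor count (for example, 9478 / 169.549... ≈ 55.901). Each content diversity value, multiplied by the sum of the noun, verb, adjective and adverb counts, gives a whole number (0.2906... × 4910 ≈ 1427), which would fit a ratio of distinct to total content words. The sketch below checks that arithmetic and shows one common way to count factors: a single forward pass with a type-token-ratio threshold of 0.72. The threshold, the `<=` comparison, the single pass and the name `mtld_factors` are assumptions; the project's own script may differ.

```python
def mtld_factors(tokens, threshold=0.72):
    """Count factors in one forward pass (illustrative, see assumptions)."""
    factors = 0.0
    seen = set()
    length = 0
    for token in tokens:
        length += 1
        seen.add(token.lower())
        if len(seen) / length <= threshold:
            # TTR dropped to the threshold: close the factor and restart.
            factors += 1.0
            seen.clear()
            length = 0
    if length:
        # The leftover segment contributes a partial factor.
        ttr = len(seen) / length
        factors += (1.0 - ttr) / (1.0 - threshold)
    return factors


# Figures copied from the two result blocks above:
# (word count, factors, nouns, verbs, adjectives, adverbs)
reported = [
    (9478, 169.54945054945054, 2421, 1392, 805, 292),
    (6621, 103.88062622309198, 1691, 941, 541, 292),
]
mtld_values = [words / factors for words, factors, *_ in reported]
content_totals = [sum(counts) for _words, _factors, *counts in reported]

print(mtld_values)     # compare with the reported MLTD lines
print(content_totals)  # total content words per file

# Toy input to exercise the factor counter.
toy = "the cat saw the cat and the cat saw the dog".split()
toy_factors = mtld_factors(toy)
print(len(toy) / toy_factors)
```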
http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
Results of Flesch Kincaid Exp:

C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\10TH_FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 137258, sentences: 8015, text: 0, blank: 0, syllables: 233625, complex: 26197]
calcFog: 14.484439
calcKincaid: 11.173426
wordsPerSentence: 17.125141
percentComplexWords: 19.085957
syllablesPerWords: 1.7020866
Flesch Score: 45.456467
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 47.92764487521539
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 53.96019238193372
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\10TH_FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 111463, sentences: 6101, text: 0, blank: 0, syllables: 192184, complex: 22524]
calcFog: 15.390892
calcKincaid: 11.880661
wordsPerSentence: 18.269629
percentComplexWords: 20.207602
syllablesPerWords: 1.7241955
Flesch Score: 42.424408
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 43.499227290521006
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 49.34652862764942
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\CIVICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 29955, sentences: 1534, text: 0, blank: 0, syllables: 51845, complex: 6244]
calcFog: 16.148792
calcKincaid: 12.448681
wordsPerSentence: 19.52738
percentComplexWords: 20.844599
syllablesPerWords: 1.7307628
Flesch Score: 40.59218
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 40.67393698419058
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 46.32944858413444
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\ECONOMICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 31271, sentences: 1731, text: 0, blank: 0, syllables: 52613, complex: 5854]
calcFog: 14.7142
calcKincaid: 11.308786
wordsPerSentence: 18.06528
percentComplexWords: 18.72022
syllablesPerWords: 1.6824853
Flesch Score: 46.160492
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 46.38033469265342
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 50.665430765669285
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\10\WORLD HISTORY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 50237, sentences: 2836, text: 0, blank: 0, syllables: 87726, complex: 10426]
calcFog: 15.387064
calcKincaid: 11.924137
wordsPerSentence: 17.714033
percentComplexWords: 20.753628
syllablesPerWords: 1.7462428
Flesch Score: 41.123123
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 43.27081832649341
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 50.245411561573846
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\CIVICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 36103, sentences: 2258, text: 0, blank: 0, syllables: 62280, complex: 7385]
calcFog: 14.577716
calcKincaid: 11.001442
wordsPerSentence: 15.988928
percentComplexWords: 20.455364
syllablesPerWords: 1.7250644
Flesch Score: 44.665802
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 47.011660348410516
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 51.33794058592589
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\HISTORY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 66324, sentences: 3804, text: 0, blank: 0, syllables: 113595, complex: 12583]
calcFog: 14.56294
calcKincaid: 11.419975
wordsPerSentence: 17.435331
percentComplexWords: 18.972017
syllablesPerWords: 1.7127284
Flesch Score: 44.241333
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 45.847846045732716
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 52.85156426012828
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\10\ECONOMICS\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 34831, sentences: 1953, text: 0, blank: 0, syllables: 57750, complex: 6229]
calcFog: 14.287244
calcKincaid: 10.929968
wordsPerSentence: 17.834614
percentComplexWords: 17.883495
syllablesPerWords: 1.6580058
Flesch Score: 48.46559
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 52.59889991492061
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 58.56803755949679
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\US TEXTBOOKS\6\GEOGRAPHY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 34127, sentences: 2105, text: 0, blank: 0, syllables: 54968, complex: 5428]
calcFog: 12.847058
calcKincaid: 9.738953
wordsPerSentence: 16.21235
percentComplexWords: 15.905295
syllablesPerWords: 1.6106895
Flesch Score: 54.115143
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 54.70419580161195
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 59.01602051006759
***********************
C:\USERS\SWASTIK\DESKTOP\MASTERSDEGREE_CS\SEMESTER_2\TEXTANALYTICS\PROJECT\INDIANTEXTBOOKS\6\GEOGRAPHY\CLEAN\FULL.TXT
FleschKincaid Implementation 1:
--------------------------------------------------------------
Stats:[words: 14869, sentences: 949, text: 0, blank: 0, syllables: 22301, complex: 1848]
calcFog: 11.2386465
calcKincaid: 8.218565
wordsPerSentence: 15.668072
percentComplexWords: 12.428543
syllablesPerWords: 1.4998319
Flesch Score: 64.046135
FleschKincaid Implementation 2:
----------------------------------
Flesch Score: 69.23944981644134
FleschKincaid Implementation 3(StanfordNLP_Lexer):
----------------------------------
Flesch Score: 76.58229677419358
***********************
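As a cross-check, the Implementation 1 figures above are consistent with the standard published formulas for Flesch Reading Ease, Flesch-Kincaid grade level and the Gunning fog index. The sketch below recomputes them from the `Stats` line of the first result block. It is written in Python for brevity and is not the project's Java code; the function names are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning fog index from sentence length and share of complex words."""
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


# Counts from the first result block (US textbooks, grade 10).
stats = {"words": 137258, "sentences": 8015, "syllables": 233625, "complex": 26197}

ease = flesch_reading_ease(stats["words"], stats["sentences"], stats["syllables"])
grade = flesch_kincaid_grade(stats["words"], stats["sentences"], stats["syllables"])
fog = gunning_fog(stats["words"], stats["sentences"], stats["complex"])

print(f"Flesch Score: {ease:.4f}")
print(f"calcKincaid:  {grade:.4f}")
print(f"calcFog:      {fog:.4f}")
```

The recomputed values can be compared with the `Flesch Score`, `calcKincaid` and `calcFog` lines of that block. Implementations 2 and 3 report different scores for the same file, so their word, sentence or syllable counts presumably differ.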