Authorship Attribution

Text2Vector

Text2Vector generates word embeddings for a given text.

Dependencies

  • FastText - pip3 install fasttext
  • NLTK - pip3 install nltk
  • NumPy - pip3 install numpy

Algorithm

The word-embedding model is a Skip-gram model trained on raw text, producing 100-dimensional vectors.
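
A minimal sketch of training such a model, assuming the official fastText Python bindings; 'corpus.txt' is an illustrative path, and the exact training setup used for the released model is not documented here:

import fasttext

# Train an unsupervised Skip-gram model with 100-dimensional vectors.
# 'corpus.txt' stands in for the raw training text.
model = fasttext.train_unsupervised('corpus.txt', model='skipgram', dim=100)
model.save_model('skipgram-100/skipgram.bin')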

  • For every word in the text, look up its embedding in the pre-trained model
  • Add each word embedding element-wise to a running sum
  • Divide the sum by the word count; the averaged vector is the text embedding (a sketch follows this list)
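
A minimal sketch of this averaging step, assuming the model is loaded with fasttext.load_model and NLTK's word_tokenize splits the text (the helper name text_embedding is illustrative, not part of the library):

import fasttext
import numpy as np
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

def text_embedding(model, text):
    # Look up the 100-dim vector for each token in the pre-trained model.
    tokens = word_tokenize(text)
    vectors = [model.get_word_vector(token) for token in tokens]
    # Element-wise sum divided by the word count, i.e. the mean vector.
    return np.mean(vectors, axis=0)

model = fasttext.load_model('skipgram-100/skipgram.bin')
print(text_embedding(model, 'Any text sample here!!!'))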

Usage

Clone the repo and save the word-embedding model from the "Releases" page under the top-level directory.

from text2vector import Text2Vector

path_to_word_embedding_model = 'skipgram-100/skipgram.bin'
text2vec = Text2Vector(path_to_word_embedding_model)
print('Generating embedding ..')

text = "Any text sample here!!!"

print(text2vec.convert(text))

"""
[ 0.44153227  0.12361106  0.00607821  0.53515722  0.28272    -0.11348521
 -0.22341401  0.16395177 -0.01178203  0.06159557  0.01883419 -0.12652569
 -0.15079305 -0.04731999  0.20319002 -0.05538354 -0.33085034  0.37050098
  0.09624304 -0.42489241 -0.12226875 -0.2626278  -0.06713643 -0.31778053
  0.34330154  0.27109923  0.15488822  0.17079256 -0.15065777  0.1636747
 -0.12178582  0.08251881 -0.21455672  0.47482748 -0.31790714 -0.12989535
  0.22780022 -0.08566074 -0.21855019 -0.31932005 -0.13109301  0.00146381
  0.26255177  0.33713142  0.19220728 -0.10542174 -0.37171204  0.03750945
  0.1633902   0.12948658 -0.19956239 -0.1510046   0.12710889 -0.10108179
 -0.07666321  0.42590444 -0.07935622  0.07618748 -0.11696555 -0.12906784
  0.07358533 -0.34065487  0.8037034  -0.02860275  0.06394197  0.23639533
 -0.2513136  -0.28632619  0.12479368 -0.24286127  0.05824339 -0.17754047
 -0.04253732  0.34774542 -0.10136852  0.44451588  0.07106352  0.62934348
  0.38469637  0.13533326  0.00748966  0.17116917  0.14566171 -0.01778589
  0.22916658  0.22703299  0.17693119 -0.1667371  -0.44423185  0.23692855
 -0.0115481   0.081972    0.04752844  0.45813912 -0.13476718  0.23231126
  0.06251153  0.07475965  0.25681703  0.02036907]
"""