Building word2vec

linuxonz edited this page Jan 9, 2025 · 13 revisions

The instructions below describe how to build word2vec version 0.1c on Linux on IBM Z for the following distributions:

  • RHEL (8.8, 8.10, 9.2, 9.4, 9.5)
  • SLES 15 SP6
  • Ubuntu (20.04, 22.04, 24.04, 24.10)

General notes:

  • When following the steps below, use a non-root user with standard permissions unless otherwise specified.

  • These instructions refer to a directory /<source_root>/. This is a temporary writable directory that you can place anywhere you like; the commands below reference it through the SOURCE_ROOT environment variable.

Build word2vec

1. Install standard utilities, packages, and platform-specific dependencies

  • RHEL (8.8, 8.10, 9.2, 9.4, 9.5)

     sudo yum install -y gcc make wget tar unzip
  • SLES 15 SP6

     sudo zypper install -y gcc make wget tar unzip
  • Ubuntu (20.04, 22.04, 24.04, 24.10)

     sudo apt-get update
     sudo apt-get install -y gcc make wget tar unzip
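
Before moving on, it can help to confirm that every tool installed in this step is actually on the PATH. The following is a minimal sketch, not part of the original instructions; the function name `missing_tools` is hypothetical, and it relies only on the POSIX `command -v` builtin.

```shell
# Print the name of each required tool that cannot be found on PATH.
# Prints nothing when all tools are available.
# (missing_tools is a hypothetical helper name, not part of word2vec.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The same package list that step 1 installs.
missing_tools gcc make wget tar unzip
```

If the last command prints any names, rerun the install command for your distribution before continuing.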

2. Create a working directory and download word2vec source code

 export SOURCE_ROOT=/<source_root>/
 mkdir -p $SOURCE_ROOT
 cd $SOURCE_ROOT
 wget https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip
 unzip source-archive.zip

3. Build word2vec

 cd word2vec/trunk
 make CFLAGS="-lm -pthread -O3 -Wall -funroll-loops"
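
After `make` finishes, a quick check that the expected executables exist can catch a partial build early. The sketch below assumes the makefile produces `word2vec`, `word2phrase`, `distance`, `word-analogy` and `compute-accuracy` in the trunk directory; the helper name `check_built` is hypothetical, and it falls back to the current directory if SOURCE_ROOT is not set.

```shell
# Report, for each expected binary, whether it exists and is executable in a directory.
# Returns non-zero if at least one binary is missing.
# (check_built is a hypothetical helper name.)
check_built() {
  dir=$1
  shift
  rc=0
  for bin in "$@"; do
    if [ -x "$dir/$bin" ]; then
      echo "ok: $bin"
    else
      echo "missing: $bin"
      rc=1
    fi
  done
  return "$rc"
}

# Assumed list of build outputs; adjust it if your makefile differs.
check_built "${SOURCE_ROOT:-.}/word2vec/trunk" \
  word2vec word2phrase distance word-analogy compute-accuracy \
  || echo "build incomplete"
```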

4. Set environment variables

 export PATH=$PATH:$SOURCE_ROOT/word2vec/trunk
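
If this export is placed in a shell profile, it appends the same directory each time the profile is sourced. One possible way to avoid duplicate entries is sketched below; `path_contains` is a hypothetical helper, and SOURCE_ROOT is assumed to be set as in the earlier steps (the sketch falls back to the current directory otherwise).

```shell
# Return success if the given directory is already an entry in PATH.
# (path_contains is a hypothetical helper name.)
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Append the word2vec build directory only if it is not already listed.
w2v_dir="${SOURCE_ROOT:-$PWD}/word2vec/trunk"
if ! path_contains "$w2v_dir"; then
  PATH="$PATH:$w2v_dir"
  export PATH
fi
```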

5. Test word2vec using demo scripts

 ./demo-word.sh
 ./demo-phrases.sh
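_**Note:**_ Each demo script trains word vectors and then prompts for input. Enter a word (or, for demo-phrases.sh, a phrase) to list its closest neighbours in the trained vector space, e.g. france.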

6. Run word2vec binary

 word2vec
_**Note:**_ The word2vec tool takes a text corpus as input and produces the word vectors as output.
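
As an illustration of a direct invocation, the sketch below trains on a tiny generated corpus and writes the vectors in text format. It is only a smoke test under stated assumptions: the corpus, the helper name `make_toy_corpus` and the output file name `vectors.txt` are made up for this example, and a corpus this small does not yield meaningful vectors. The options used (`-train`, `-output`, `-size`, `-window`, `-min-count`, `-binary`) are the ones listed in the usage text that `word2vec` prints when run without arguments; check that output if your build differs.

```shell
# Write a tiny repetitive corpus to the given file.
# Real training needs far more text than this.
# (make_toy_corpus is a hypothetical helper name.)
make_toy_corpus() {
  out=$1
  : > "$out"
  i=0
  while [ "$i" -lt 200 ]; do
    echo "the king rules the land and the queen rules the land" >> "$out"
    i=$((i + 1))
  done
}

corpus=$(mktemp)
make_toy_corpus "$corpus"

if command -v word2vec >/dev/null 2>&1; then
  # -min-count 1 keeps every word; -binary 0 writes a human-readable text file.
  word2vec -train "$corpus" -output vectors.txt -size 20 -window 5 -min-count 1 -binary 0
  # The first line of the text output holds the vocabulary size and the vector dimension.
  head -n 1 vectors.txt
else
  echo "word2vec not found on PATH; complete steps 3 and 4 first"
fi
```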
