From 52601e48dd543db25ce5020aface25368a139765 Mon Sep 17 00:00:00 2001
From: Alexander Chalk
Date: Fri, 28 Jun 2019 11:50:37 -0400
Subject: [PATCH] Add data fetching shellscript for fasttext

---
 .../cnn-text-classification/README.md         |  2 +-
 .../get_fasttext_data.sh                      | 24 +++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100755 contrib/clojure-package/examples/cnn-text-classification/get_fasttext_data.sh

diff --git a/contrib/clojure-package/examples/cnn-text-classification/README.md b/contrib/clojure-package/examples/cnn-text-classification/README.md
index cdae5ff0d308..8f8e6200ec7c 100644
--- a/contrib/clojure-package/examples/cnn-text-classification/README.md
+++ b/contrib/clojure-package/examples/cnn-text-classification/README.md
@@ -54,7 +54,7 @@ Using fastText instead of glove is fairly straightforward, as the pretrained emb
 
 Download the 'Simple English' pretrained wiki word vectors (text) from the fastText
 [site](https://fasttext.cc/docs/en/pretrained-vectors.html) and place them in the
-`data/fasttext` directory.
+`data/fasttext` directory. Alternatively just run `./get_fasttext_data.sh`.
 
 Then you can run training on a subset of examples through the repl using:
 ```
diff --git a/contrib/clojure-package/examples/cnn-text-classification/get_fasttext_data.sh b/contrib/clojure-package/examples/cnn-text-classification/get_fasttext_data.sh
new file mode 100755
index 000000000000..2bfe96659402
--- /dev/null
+++ b/contrib/clojure-package/examples/cnn-text-classification/get_fasttext_data.sh
@@ -0,0 +1,24 @@
+#!/bin/bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -evx
+
+mkdir -p data/fasttext
+cd data/fasttext
+wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.simple.vec
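
Note on the download step: the script as submitted calls `wget` unconditionally, so re-running it fetches the vectors again, and an interrupted transfer leaves a truncated `wiki.simple.vec` in `data/fasttext`. One possible variant is sketched below. The helper name `fetch_if_missing` and the `.part` suffix are assumptions made for illustration and are not part of this patch; only the URL and the `data/fasttext` directory come from it.

```shell
#!/bin/bash

# Hypothetical helper (not part of the patch): download URL into
# DEST_DIR unless a non-empty copy of the file is already there.
fetch_if_missing() {
    local url="$1"
    local dest_dir="$2"
    local target
    target="${dest_dir}/$(basename "${url}")"

    mkdir -p "${dest_dir}"

    # Skip the download when a non-empty file already exists.
    if [ -s "${target}" ]; then
        echo "already present: ${target}"
        return 0
    fi

    # Download under a temporary name (the .part suffix is an
    # assumption) so a failed transfer never leaves a truncated
    # file under the final name.
    if wget -O "${target}.part" "${url}"; then
        mv "${target}.part" "${target}"
    else
        rm -f "${target}.part"
        return 1
    fi
}
```

With this helper, the last three lines of the script could be replaced by a single call such as `fetch_if_missing https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.simple.vec data/fasttext`. Because the call no longer changes directory, the script leaves the working directory untouched.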