
Commit

Initial commit
RunDevelopment committed May 5, 2020
0 parents commit 7596043
Showing 39 changed files with 4,161 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
bin
build
.gradle
.ideaout/
.idea/
.settings/
*.iml
.project
.classpath
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT LICENSE

Copyright (c) 2019 Webis group

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
174 changes: 174 additions & 0 deletions README.md
@@ -0,0 +1,174 @@
# Netspeak 4 indexing

This project contains everything needed to create a new Netspeak 4 index.

This project is mainly intended for developers who want to build a new Netspeak 4 index from a given data set.


---

## Contributors

Michael Schmidt (2018 - 2020)

Martin Trenkmann (2008 - 2013)

Martin Potthast (2008 - 2020)

Benno Stein (2008 - 2020)



---

# Old Notes

% NETSPEAK 4 JAVA NOTES
% [email protected]
% November 22, 2013



Notation
--------

    # <command>   Requires admin permissions (sudo).
    $ <command>   Does not require admin permissions.


Project description
-------------------

<http://www.uni-weimar.de/en/media/chairs/webis/research/projects/netspeak/>


Library dependencies
--------------------

This Java project is a language binding for the C++ project netspeak4-application-cpp,
whose implementation comes in the form of a shared library (.so file). The Java
application loads the library at runtime and invokes its native routines via
the Java Native Interface (JNI). Precompiled libraries for Ubuntu 10.04
and 12.04 can be found in the lib sub-directory of this project. The native
library itself has some dependencies that you need to install as well. To do so,
run the following script:

    # <project>/build/install-dependencies.sh


Build and install the native library
------------------------------------

If no precompiled native library is available for your platform, you need to
compile the corresponding C++ project yourself:

- Check out netspeak4-application-cpp from the Webis CVS.
- Build the target "Library" with the Qt Creator IDE.

Then copy the library to /usr/lib:

    # cp <project>/lib/<arch>/<lib>.so /usr/lib


Load native library
-------------------

Pass `-Djava.library.path=/usr/lib` as a VM argument so that the JVM can find the native library.
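
Before launching the application, it can help to check that the native library is actually visible on `java.library.path`. The following Java sketch scans each directory on that path for the platform-specific file name (via `System.mapLibraryName`, which yields `lib<name>.so` on Linux) without loading anything. The class and method names are hypothetical, and the library name you pass in must be the real name of your .so file, which these notes do not specify.

```java
import java.io.File;
import java.util.Optional;

/** Sketch: locate a native library on java.library.path without loading it. */
final class NativeLibraryLocator {
    private NativeLibraryLocator() {}

    /** Searches each directory of libraryPath for the platform file name of libName. */
    static Optional<File> locate(String libName, String libraryPath) {
        if (libraryPath == null || libraryPath.isEmpty()) {
            return Optional.empty();
        }
        // On Linux, mapLibraryName("foo") yields "libfoo.so"
        String fileName = System.mapLibraryName(libName);
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries, e.g. from a leading or trailing separator
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Uses the path of the running JVM, e.g. as set via -Djava.library.path=/usr/lib. */
    static Optional<File> locate(String libName) {
        return locate(libName, System.getProperty("java.library.path"));
    }
}
```

For example, `NativeLibraryLocator.locate("mylib")` returns the file if `libmylib.so` lies in one of the configured directories; `mylib` is only a placeholder here.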


Build Netspeak from n-gram collection
-------------------------------------

To build Netspeak from a collection of n-grams, you have to provide a dedicated
directory with one or more text files as input. Each of these files has to
list n-grams together with their frequencies, one per line. The format of a
single line is defined as follows:

    word_1 SPACE word_2 SPACE ... word_n TAB frequency

In other words, each line defines an n-gram with its frequency. The delimiter
between the n-gram and the frequency is a single tab character (`\t`). The
words of the n-gram are separated by a single space (`' '`).

Note: Follow this specification strictly to prevent parsing errors. In
particular, make sure that exactly one `\t` separates the n-gram from the frequency.
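
A minimal Java sketch of a strict parser for this line format, useful for validating input files before building an index. The class name `NGramLine`, its error messages, and the rejection of negative frequencies are assumptions for illustration; the actual Netspeak builder may parse and report errors differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: strict parser for "word_1 SPACE ... word_n TAB frequency". */
final class NGramLine {
    final List<String> words;
    final long frequency;

    private NGramLine(List<String> words, long frequency) {
        this.words = Collections.unmodifiableList(words);
        this.frequency = frequency;
    }

    /** Parses one input line; throws IllegalArgumentException if it is malformed. */
    static NGramLine parse(String line) {
        int tab = line.indexOf('\t');
        // Exactly one tab must separate the n-gram from its frequency
        if (tab < 0 || line.indexOf('\t', tab + 1) >= 0) {
            throw new IllegalArgumentException("expected exactly one tab: " + line);
        }
        // Limit -1 keeps empty tokens, so double, leading, or trailing spaces are detected
        String[] words = line.substring(0, tab).split(" ", -1);
        for (String word : words) {
            if (word.isEmpty()) {
                throw new IllegalArgumentException("empty word (extra space?): " + line);
            }
        }
        long frequency;
        try {
            frequency = Long.parseLong(line.substring(tab + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("invalid frequency: " + line, e);
        }
        if (frequency < 0) {
            throw new IllegalArgumentException("negative frequency: " + line);
        }
        return new NGramLine(Arrays.asList(words), frequency);
    }

    /** Writes the n-gram back in the same line format. */
    String format() {
        return String.join(" ", words) + "\t" + frequency;
    }
}
```

Because `parse` rejects rather than repairs malformed lines, a stray double space or a space used instead of the tab surfaces as an error instead of silently producing a different n-gram.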


Getting Started
---------------

- `usage.NetspeakBuilderUsage.java` shows how to build Netspeak from a
collection of n-grams.
- `usage.NetspeakTerminal.java` runs a simple command line to search a Netspeak
instance interactively for testing purposes.
- `usage.NetspeakUsage.java` demonstrates how to search Netspeak in more detail
using the Request and Response objects.

If your local hardware, storage space, or operating system is not suitable
(Netspeak runs only on Linux), it may be necessary to set up Netspeak on a
Linux server and query that instance remotely.

In that case, build your Netspeak application as usual and run it as a Java
servlet, e.g. with Tomcat, using the project `netspeak4-server`. A running
Netspeak server can then be queried from any Java application using the
`netspeak3-client-java` project.


Netspeak query language
-----------------------

The Netspeak query syntax described here should be used as the reference. There
might be other syntax descriptions out there, e.g. at netspeak.org, which
provide some syntactical simplifications in the form of easier-to-use wildcards
or operators. However, these modified syntaxes are just front-ends and do not
work with the original Netspeak interface. Here is the truth:

`?` is a placeholder for exactly one word and can be repeated to search for
exactly two, three, four, ... words.

    Example: how to ? this
    -> how to use this
    -> how to do this
    -> how to cite this

`*` is a placeholder for zero or more words.

    Example: see * works
    -> see how it works
    -> see if it works
    -> see what works
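
To make the semantics of `?` and `*` concrete, the sketch below decides whether a space-separated n-gram matches a query made only of literal words, `?`, and `*`. It fills a small dynamic-programming table over query tokens and n-gram words; this is an illustrative matcher under those assumptions (the class name is hypothetical), not how the Netspeak index evaluates queries.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: word-level matching for queries made of literal words, "?" and "*". */
final class WildcardMatcher {
    private WildcardMatcher() {}

    static boolean matches(List<String> query, List<String> ngram) {
        int q = query.size();
        int n = ngram.size();
        // dp[i][j] is true if the first i query tokens match the first j n-gram words
        boolean[][] dp = new boolean[q + 1][n + 1];
        dp[0][0] = true;
        for (int i = 1; i <= q; i++) {
            String token = query.get(i - 1);
            for (int j = 0; j <= n; j++) {
                if (token.equals("*")) {
                    // "*" matches zero words (dp[i-1][j]) or absorbs word j (dp[i][j-1])
                    dp[i][j] = dp[i - 1][j] || (j > 0 && dp[i][j - 1]);
                } else if (j > 0) {
                    // "?" matches exactly one arbitrary word; a literal must be equal
                    boolean wordOk = token.equals("?") || token.equals(ngram.get(j - 1));
                    dp[i][j] = wordOk && dp[i - 1][j - 1];
                }
            }
        }
        return dp[q][n];
    }

    /** Convenience overload for space-separated strings. */
    static boolean matches(String query, String ngram) {
        return matches(Arrays.asList(query.split(" ")), Arrays.asList(ngram.split(" ")));
    }
}
```

For instance, `see * works` matches both `see how it works` and `see works`, while `how to ? this` does not match `how to this`, because `?` requires exactly one word.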

`[]` compares options, i.e. it checks each word or phrase between the
brackets, plus the so-called empty word, at that position in the query.

    Example: it's [ great well "so good" ]
    -> it's
    -> it's great
    -> it's well
    -> it's so good
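
The `[]` operator can be read as an expansion: each option, and the empty word, is placed at the operator's position. The following sketch assumes the options have already been parsed into word lists, with a quoted phrase as a single option of several words; the `OptionExpansion` class and its `includeEmpty` flag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: expand "prefix [ opt1 opt2 ... ]" into candidate word sequences. */
final class OptionExpansion {
    private OptionExpansion() {}

    /**
     * Returns prefix + option for every option, preceded by the prefix alone
     * (the "empty word" candidate) when includeEmpty is true.
     */
    static List<List<String>> expand(List<String> prefix, List<List<String>> options,
                                     boolean includeEmpty) {
        List<List<String>> result = new ArrayList<>();
        if (includeEmpty) {
            result.add(new ArrayList<>(prefix));
        }
        for (List<String> option : options) {
            List<String> candidate = new ArrayList<>(prefix);
            candidate.addAll(option); // a quoted phrase contributes several words
            result.add(candidate);
        }
        return result;
    }
}
```

With `includeEmpty` set to `true`, the example query above expands to the four listed candidates, with the bare prefix `it's` first.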

`{}` checks the order, i.e. it tries to find each permutation of the given
sequence of words or phrases at that position in the query.

    Example: for { "very important people" only }
    -> for very important people only
    -> for only very important people
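
Similarly, `{}` can be sketched as generating every ordering of its entities, where a quoted phrase counts as one entity. The class below is a hypothetical illustration using swap-based backtracking, not Netspeak's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: all orderings of the entities inside "{ ... }". */
final class OrderPermutations {
    private OrderPermutations() {}

    static List<List<String>> permutations(List<String> entities) {
        List<List<String>> result = new ArrayList<>();
        permute(new ArrayList<>(entities), 0, result);
        return result;
    }

    // Swap-based backtracking: fix each remaining entity at position k, then recurse
    private static void permute(List<String> items, int k, List<List<String>> out) {
        if (k == items.size()) {
            out.add(new ArrayList<>(items));
            return;
        }
        for (int i = k; i < items.size(); i++) {
            Collections.swap(items, k, i);
            permute(items, k + 1, out);
            Collections.swap(items, k, i); // undo the swap before trying the next entity
        }
    }
}
```

The number of orderings grows as n! for n entities, so even a handful of entities inside `{}` produces many candidates.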

`#` searches for alternatives of the following word. This operator requires
the optional Netspeak hash-dictionary component and uses `[]` to compare
each retrieved alternative (except that the empty word is not checked).
The mapping from words to alternatives is entirely up to the user when
building Netspeak; for netspeak.org, we use this operator for a synonym
search based on the WordNet dictionary.

    Example: waiting for #response
    -> waiting for response
    -> waiting for answer
    -> waiting for reply
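
A sketch of the `#` expansion, assuming the user-supplied dictionary is available as a `Map` from a word to its alternatives. Following the example above, where `response` itself appears among the results, the sketch keeps the original word as the first candidate; that detail, like the class name, is an assumption. Unlike `[]`, no empty-word candidate is produced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch: expand "#word" via a word-to-alternatives dictionary. */
final class AlternativeExpansion {
    private AlternativeExpansion() {}

    /** Returns the word itself followed by its distinct alternatives (no empty word). */
    static List<String> expand(String word, Map<String, List<String>> dictionary) {
        List<String> candidates = new ArrayList<>();
        candidates.add(word);
        for (String alt : dictionary.getOrDefault(word, Collections.<String>emptyList())) {
            if (!candidates.contains(alt)) { // skip duplicates, e.g. a self-mapping
                candidates.add(alt);
            }
        }
        return candidates;
    }
}
```

A word without a dictionary entry simply expands to itself.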

You can combine the introduced wildcards and operators as you like, with one
exception: you may not place any wildcard within bracket operators. Nested
brackets are not allowed either. As the examples above show, you can quote
phrases so that they are handled as one entity in `[]` and `{}`.
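
The combination rules above can be checked before a query is submitted. This Java sketch scans the query character by character and reports nested brackets, wildcards (`?`, `*`) inside `[]` or `{}`, and unbalanced brackets. It deliberately ignores quoting, which is a simplifying assumption; the class `QueryRules` is hypothetical.

```java
import java.util.Optional;

/** Sketch: validate the bracket and wildcard combination rules of a query. */
final class QueryRules {
    private QueryRules() {}

    /** Returns an error message if the query breaks a rule, otherwise empty. */
    static Optional<String> check(String query) {
        char open = 0; // the currently open bracket, or 0 if none is open
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '[' || c == '{') {
                if (open != 0) {
                    return Optional.of("nested brackets at index " + i);
                }
                open = c;
            } else if (c == ']' || c == '}') {
                char expected = (c == ']') ? '[' : '{';
                if (open != expected) {
                    return Optional.of("unmatched '" + c + "' at index " + i);
                }
                open = 0;
            } else if ((c == '?' || c == '*') && open != 0) {
                return Optional.of("wildcard inside brackets at index " + i);
            }
        }
        if (open != 0) {
            return Optional.of("unclosed '" + open + "'");
        }
        return Optional.empty();
    }
}
```

A character-level scan is enough here because the rules only concern where brackets and wildcards appear, not what the words between them are.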



% Compile via: pandoc from.txt > to.html
92 changes: 92 additions & 0 deletions artifactory.gradle
@@ -0,0 +1,92 @@
// Fetch Artifactory publishing plugin
buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath 'org.jfrog.buildinfo:build-info-extractor-gradle:4+'
    }
}

// Apply plugins
apply plugin: 'maven-publish'
apply plugin: org.jfrog.gradle.plugin.artifactory.ArtifactoryPlugin

// Determine which repositories to pull from and publish to
def pullRelease = 'libs-release'
def pullSnapshot = 'libs-snapshot'
def pushRelease = 'libs-release-webis-gradle'
def pushSnapshot = 'libs-snapshot-webis-gradle'

if (project.ext.has("nonFree") && project.ext.get("nonFree")) {
    pullRelease += '-nonfree'
    pullSnapshot += '-nonfree'
    pushRelease += '-nonfree'
    pushSnapshot += '-nonfree'
}

repositories {
    maven {
        url = 'https://repo.webis.de/artifactory/' + pullRelease
        credentials {
            username = project.findProperty("artifactoryUsername") ?: ""
            password = project.findProperty("artifactoryPassword") ?: ""
        }
    }
    maven {
        url = 'https://repo.webis.de/artifactory/' + pullSnapshot
        credentials {
            username = project.findProperty("artifactoryUsername") ?: ""
            password = project.findProperty("artifactoryPassword") ?: ""
        }
    }
}

// Configure Artifactory remote
artifactory {
    contextUrl = "https://repo.webis.de/artifactory"
    publish {
        repository {
            repoKey = version.endsWith('SNAPSHOT') ? pushSnapshot : pushRelease
            username = project.findProperty("artifactoryUsername") ?: ""
            password = project.findProperty("artifactoryPassword") ?: ""
            maven = true
        }
        defaults {
            publications('mavenJava')
        }
    }
}

// Create tasks for generating source and JavaDoc JARs
task sourcesJar(type: Jar, dependsOn: classes) {
    classifier = 'sources'
    from sourceSets.main.allSource
}

task javadocJar(type: Jar, dependsOn: javadoc) {
    classifier = 'javadoc'
    from javadoc.destinationDir
}

artifacts {
    archives javadocJar
    archives sourcesJar
}

// Configure Maven Publishing Information
publishing {
    publications {
        mavenJava(MavenPublication) {
            // Publish binary, source, and JavaDoc JARs
            from components.java
            artifact sourcesJar
            artifact javadocJar

            // Set POM definition
            if (project.ext.has("pomDef")) {
                pom project.ext.get("pomDef")
            }
        }
    }
}
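
The repository selection performed by `artifactory.gradle` reduces to a small pure function of the project version and the `nonFree` flag. The following Java sketch restates that logic for illustration only; the class and method names are hypothetical, and Gradle itself does not use this code.

```java
/** Sketch: the repository selection of artifactory.gradle as plain Java. */
final class RepoSelection {
    private RepoSelection() {}

    /** Repository to publish to: SNAPSHOT versions go to the snapshot repository. */
    static String publishRepo(String version, boolean nonFree) {
        String key = version.endsWith("SNAPSHOT")
                ? "libs-snapshot-webis-gradle"
                : "libs-release-webis-gradle";
        return nonFree ? key + "-nonfree" : key;
    }

    /** Repositories to resolve dependencies from, release first. */
    static String[] pullRepos(boolean nonFree) {
        String suffix = nonFree ? "-nonfree" : "";
        return new String[] {"libs-release" + suffix, "libs-snapshot" + suffix};
    }
}
```

With `version = '1.0'` as set in `build.gradle`, artifacts would be published to `libs-release-webis-gradle`.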
53 changes: 53 additions & 0 deletions build.gradle
@@ -0,0 +1,53 @@
// Apply plugins
apply plugin: 'java'
apply plugin: 'jacoco'
apply plugin: 'application'

// Basic configuration and settings for all (sub-)projects
allprojects {
    group = 'org.netspeak'
    version = '1.0'
    mainClassName = 'org.netspeak.usage.NetspeakTerminal'
    sourceCompatibility = 1.8
    targetCompatibility = 1.8

    // Set source file encoding
    compileJava.options.encoding = "UTF-8"
    compileTestJava.options.encoding = "UTF-8"
    javadoc.options.encoding = 'UTF-8'

    // Declare global dependencies
    dependencies {
        compile group: 'org.netspeak', name: 'netspeak4-application-java', version: '1.0'
        compile group: 'org.apache.commons', name: 'commons-compress', version: '1.19'

        testImplementation 'junit:junit:4.12'
    }

    // Set MANIFEST.MF contents
    jar {
        manifest {
            attributes('Main-Class': mainClassName)
        }
    }
}

// Set POM definition
project.ext.pomDef = {
    name = 'Netspeak 4 stuff'
    description = 'An application with lots of miscellaneous functionality related to Netspeak 4'
    url = 'http://netspeak.org'
    //licenses {
    //    license {
    //        name = 'The Apache License, Version 2.0'
    //        url = 'http://www.apache.org/licenses/LICENSE-2.0.txt'
    //    }
    //}
    organization {
        name = 'Netspeak'
        url = 'http://netspeak.org'
    }
}

// Include Artifactory configuration
apply from: 'artifactory.gradle'
Binary file added gradle/wrapper/gradle-wrapper.jar
Binary file not shown.
6 changes: 6 additions & 0 deletions gradle/wrapper/gradle-wrapper.properties
@@ -0,0 +1,6 @@
#Thu Aug 02 10:12:04 CEST 2018
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-4.9-all.zip