Switch Accumulo into multiple modules, one per minor release #1034

Merged: 3 commits, Oct 8, 2017
9 changes: 7 additions & 2 deletions .travis.yml
@@ -24,6 +24,11 @@ jdk:
   - oraclejdk8
   - openjdk7

+addons:
+  hosts:
+    - myshorthost
+  hostname: myshorthost
+
 install: mvn install -q -DskipTests=true

 script: mvn test -q
@@ -35,5 +40,5 @@ services:
 # - riak


-# Use the Container based infrastructure.
-sudo: false
+# Can't use container based infra because of hosts/hostname
+sudo: true
118 changes: 118 additions & 0 deletions accumulo1.6/README.md
@@ -0,0 +1,118 @@
<!--
Copyright (c) 2015 YCSB contributors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You
may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing
permissions and limitations under the License. See accompanying
LICENSE file.
-->

## Quick Start

This section describes how to run YCSB on [Accumulo](https://accumulo.apache.org/).

### 1. Start Accumulo

See the [Accumulo Documentation](https://accumulo.apache.org/1.6/accumulo_user_manual.html#_installation)
for details on installing and running Accumulo.

Before running the YCSB test you must create the Accumulo table. Again, see the
[Accumulo Documentation](https://accumulo.apache.org/1.6/accumulo_user_manual.html#_basic_administration)
for details. The default table name is `usertable`.

### 2. Set Up YCSB

Clone YCSB and build the Accumulo 1.6 binding:

    git clone https://github.com/brianfrankcooper/YCSB.git
    cd YCSB
    mvn -pl com.yahoo.ycsb:accumulo1.6-binding -am clean package

### 3. Create the Accumulo Table

By default, YCSB uses a table named `usertable`. Users must create this table before loading
data into Accumulo. For best performance, the table should be pre-split. A simple
Ruby script, based on the HBase README, can generate adequate split points. Tens of tablets per
TabletServer is a good starting point. Unless otherwise specified, the following commands should run
on any version of Accumulo.

    $ echo 'num_splits = 20; puts (1..num_splits).map {|i| "user#{1000+i*(9999-1000)/num_splits}"}' | ruby > /tmp/splits.txt
    $ accumulo shell -u <user> -p <password> -e "createtable usertable"
    $ accumulo shell -u <user> -p <password> -e "addsplits -t usertable -sf /tmp/splits.txt"
    $ accumulo shell -u <user> -p <password> -e "config -t usertable -s table.cache.block.enable=true"
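Where Ruby is not installed, the same split points can be produced in plain shell. The following is a minimal sketch, not part of YCSB: `gen_splits` is a hypothetical helper name, and it assumes the same `user1000` to `user9999` key range and integer arithmetic as the Ruby one-liner.

```shell
# Print N evenly spaced split points across the key range user1000..user9999.
# gen_splits is a hypothetical helper, not something shipped with YCSB.
gen_splits() {
  n="$1"
  i=1
  while [ "$i" -le "$n" ]; do
    # Multiply before dividing so integer truncation matches the Ruby version.
    echo "user$(( 1000 + i * (9999 - 1000) / n ))"
    i=$(( i + 1 ))
  done
}

# Show a small example; for a real table, redirect the output instead:
#   gen_splits 20 > /tmp/splits.txt
gen_splits 4
```

Raising the argument gives more tablets; the resulting file is then passed to `addsplits -sf` exactly as in the commands above.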

Several other configuration properties can also improve performance. These
can be set via the Accumulo shell after the table is created. Setting the table durability
to `flush` relaxes the constraints on data durability during hard power outages (it avoids calls
to fsync). Accumulo defaults table compression to `gzip`, which is not particularly fast; `snappy`
is a faster and similarly efficient option. The mutation queue property controls how many writes
Accumulo will buffer in memory before performing a flush; this property should be set relative
to the amount of JVM heap the TabletServers are given.

Please note that the `table.durability` and `tserver.total.mutation.queue.max` properties exist
only in Accumulo 1.7 and later. There are no concise replacements for these properties in earlier versions.

    accumulo> config -s table.durability=flush
    accumulo> config -s tserver.total.mutation.queue.max=256M
    accumulo> config -t usertable -s table.file.compress.type=snappy
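Because two of these properties are version-dependent, a setup script may want to emit only the commands that apply. The sketch below is illustrative: `at_least_1_7` and `tuning_commands` are hypothetical helper names, and the version is assumed to be supplied by the caller as a dotted numeric string such as `1.6.6`.

```shell
# Succeeds when the Accumulo version in $1 (e.g. "1.6.6") is 1.7 or newer.
# Assumes a dotted numeric version string; only major and minor are compared.
at_least_1_7() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }
}

# Print the shell commands from this README that apply to the given version.
tuning_commands() {
  echo "config -t usertable -s table.file.compress.type=snappy"
  if at_least_1_7 "$1"; then
    echo "config -s table.durability=flush"
    echo "config -s tserver.total.mutation.queue.max=256M"
  fi
}

# For a 1.6.x cluster only the compression setting is printed.
tuning_commands 1.6.6
```

Each printed line can then be run with `accumulo shell -u <user> -p <password> -e "<command>"`, in the same way as the table-creation commands.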

On repeated data loads, the following commands can quickly reset the state of the table.

    accumulo> createtable tmp --copy-splits usertable --copy-config usertable
    accumulo> deletetable --force usertable
    accumulo> renametable tmp usertable
    accumulo> compact --wait -t accumulo.metadata

### 4. Load Data and Run Tests

Load the data:

    ./bin/ycsb load accumulo1.6 -s -P workloads/workloada \
         -p accumulo.zooKeepers=localhost \
         -p accumulo.columnFamily=ycsb \
         -p accumulo.instanceName=ycsb \
         -p accumulo.username=user \
         -p accumulo.password=supersecret \
         > outputLoad.txt

Run the workload test:

    ./bin/ycsb run accumulo1.6 -s -P workloads/workloada \
         -p accumulo.zooKeepers=localhost \
         -p accumulo.columnFamily=ycsb \
         -p accumulo.instanceName=ycsb \
         -p accumulo.username=user \
         -p accumulo.password=supersecret \
         > outputRun.txt

## Accumulo Configuration Parameters

- `accumulo.zooKeepers`
  - The Accumulo cluster's [zookeeper servers](https://accumulo.apache.org/1.6/accumulo_user_manual.html#_connecting).
  - Should contain a comma-separated list of hostname or hostname:port values.
  - No default value.

- `accumulo.columnFamily`
  - The name of the column family to use to store the data within the table.
  - No default value.

- `accumulo.instanceName`
  - Name of the Accumulo [instance](https://accumulo.apache.org/1.6/accumulo_user_manual.html#_connecting).
  - No default value.

- `accumulo.username`
  - The username to use when connecting to Accumulo.
  - No default value.

- `accumulo.password`
  - The password for the user connecting to Accumulo.
  - No default value.
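These parameters can also be kept in a properties file rather than repeated as `-p` flags on every invocation. The sketch below writes such a file; the path and the values are placeholders taken from the examples in this README, and it relies on the YCSB client accepting more than one `-P` file, which is worth confirming against your YCSB version.

```shell
# Write the connection settings to a properties file.
# The path is a hypothetical choice; the values mirror the examples above.
PROPS_FILE="${TMPDIR:-/tmp}/accumulo-ycsb.properties"
cat > "$PROPS_FILE" <<'EOF'
accumulo.zooKeepers=localhost
accumulo.columnFamily=ycsb
accumulo.instanceName=ycsb
accumulo.username=user
accumulo.password=supersecret
EOF

# The file can then be passed alongside the workload file, for example:
#   ./bin/ycsb load accumulo1.6 -s -P workloads/workloada -P "$PROPS_FILE"
echo "wrote $(grep -c '^accumulo\.' "$PROPS_FILE") properties to $PROPS_FILE"
```

Since the file contains a password, it should be readable only by the user running the benchmark.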

8 changes: 4 additions & 4 deletions accumulo/pom.xml → accumulo1.6/pom.xml
@@ -25,8 +25,8 @@ LICENSE file.
 <version>0.14.0-SNAPSHOT</version>
 <relativePath>../binding-parent</relativePath>
 </parent>
-<artifactId>accumulo-binding</artifactId>
-<name>Accumulo DB Binding</name>
+<artifactId>accumulo1.6-binding</artifactId>
+<name>Accumulo 1.6 DB Binding</name>
 <properties>
 <!-- This should match up to the one from your Accumulo version -->
 <hadoop.version>2.2.0</hadoop.version>
@@ -37,7 +37,7 @@ LICENSE file.
 <dependency>
 <groupId>org.apache.accumulo</groupId>
 <artifactId>accumulo-core</artifactId>
-<version>${accumulo.version}</version>
+<version>${accumulo.1.6.version}</version>
 </dependency>
 <!-- Needed for hadoop.io.Text :( -->
 <dependency>
@@ -66,7 +66,7 @@ LICENSE file.
 <dependency>
 <groupId>org.apache.accumulo</groupId>
 <artifactId>accumulo-minicluster</artifactId>
-<version>${accumulo.version}</version>
+<version>${accumulo.1.6.version}</version>
 <scope>test</scope>
 </dependency>
 <!-- needed directly only in test, but transitive
4 changes: 2 additions & 2 deletions accumulo/README.md → accumulo1.7/README.md
@@ -75,7 +75,7 @@ On repeated data loads, the following commands may be helpful to re-set the stat

 Load the data:

-    ./bin/ycsb load accumulo -s -P workloads/workloada \
+    ./bin/ycsb load accumulo1.7 -s -P workloads/workloada \
          -p accumulo.zooKeepers=localhost \
          -p accumulo.columnFamily=ycsb \
          -p accumulo.instanceName=ycsb \
@@ -85,7 +85,7 @@ Load the data:

 Run the workload test:

-    ./bin/ycsb run accumulo -s -P workloads/workloada \
+    ./bin/ycsb run accumulo1.7 -s -P workloads/workloada \
          -p accumulo.zooKeepers=localhost \
          -p accumulo.columnFamily=ycsb \
          -p accumulo.instanceName=ycsb \
91 changes: 91 additions & 0 deletions accumulo1.7/pom.xml
@@ -0,0 +1,91 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2011 YCSB++ project, 2014 - 2016 YCSB contributors.
All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You
may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing
permissions and limitations under the License. See accompanying
LICENSE file.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.yahoo.ycsb</groupId>
<artifactId>binding-parent</artifactId>
<version>0.14.0-SNAPSHOT</version>
<relativePath>../binding-parent</relativePath>
</parent>
<artifactId>accumulo1.7-binding</artifactId>
<name>Accumulo 1.7 DB Binding</name>
<properties>
<!-- This should match up to the one from your Accumulo version -->
<hadoop.version>2.2.0</hadoop.version>
<!-- Tests do not run on jdk9 -->
<skipJDK9Tests>true</skipJDK9Tests>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.accumulo</groupId>
<artifactId>accumulo-core</artifactId>
<version>${accumulo.1.7.version}</version>
</dependency>
<!-- Needed for hadoop.io.Text :( -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.yahoo.ycsb</groupId>
<artifactId>core</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.accumulo</groupId>
<artifactId>accumulo-minicluster</artifactId>
<version>${accumulo.1.7.version}</version>
<scope>test</scope>
</dependency>
<!-- needed directly only in test, but transitive
at runtime for accumulo, hadoop, and thrift. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.13</version>
</dependency>
</dependencies>
<build>
<testResources>
<testResource>
<directory>../workloads</directory>
<targetPath>workloads</targetPath>
</testResource>
<testResource>
<directory>src/test/resources</directory>
</testResource>
</testResources>
</build>
</project>
44 changes: 44 additions & 0 deletions accumulo1.7/src/main/conf/accumulo.properties
@@ -0,0 +1,44 @@
# Copyright 2014 Cloudera, Inc. or its affiliates. All Rights Reserved.
[Review comment, Contributor] Intentional?

[Reply, Collaborator (Author)] Must be from before we switched to the "YCSB contributors" language. Not sure if copying it into a new directory is a creative act worthy of updating, though.

#
# Licensed under the Apache License, Version 2.0 (the "License"); you
# may not use this file except in compliance with the License. You
# may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License. See accompanying
# LICENSE file.
#
# Sample Accumulo configuration properties
#
# You may either set properties here or via the command line.
#

# This will influence the keys we write
accumulo.columnFamily=YCSB

# This should be set based on your Accumulo cluster
#accumulo.instanceName=ExampleInstance

# Comma separated list of host:port tuples for the ZooKeeper quorum used
# by your Accumulo cluster
#accumulo.zooKeepers=zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181

# This user will need permissions on the table YCSB works against
#accumulo.username=ycsb
#accumulo.password=protectyaneck

# Controls how long our client writer will wait to buffer more data
# measured in milliseconds
accumulo.batchWriterMaxLatency=30000

# Controls how much data our client will attempt to buffer before sending
# measured in bytes
accumulo.batchWriterSize=100000

# Controls how many worker threads our client will use to parallelize writes
accumulo.batchWriterThreads=1