
Commit

Remove all trailing whitespace across the repo.
Graham Christensen committed Mar 28, 2017
1 parent 7bea316 commit 4353730
Showing 31 changed files with 291 additions and 293 deletions.
11 changes: 5 additions & 6 deletions doc/capacity_plan.rdoc
@@ -25,7 +25,7 @@ First you want to run the following create table statement on a database you woul
Next you want to fill out the \Jetpants configuration file (either <tt>/etc/jetpants.yaml</tt> or <tt>~/.jetpants.yaml</tt>). For example, your configuration might look like this:

# ... rest of Jetpants config here

plugins:
capacity_plan:
critical_mount: 0.85
@@ -54,7 +54,7 @@ Next you want to create a cron to capture the historical data
0 * * * * /your_bin_path/jetpants capacity_snapshot 2>&1 > /dev/null

Then you want to create a cron that will email you the report every day (if you want that)

0 10 * * * /your_bin_path/jetpants capacity_plan [email protected] 2>&1 > /dev/null

If you want the hardware stats part of the email, you have to implement Jetpants.topology.machine_status_counts so that it returns a hash that will be used to build the email
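The hash format is not documented here; as a rough sketch (the method name comes from the text above, while the keys and counts are purely illustrative assumptions):

```ruby
# Hypothetical sketch of machine_status_counts. The method name is taken
# from the text above; the hash keys and the hard-coded counts are
# illustrative assumptions -- a real implementation would query your
# asset tracker for live numbers.
module Jetpants
  class Topology
    def machine_status_counts
      {
        'allocated' => 120,  # nodes serving production traffic
        'spare'     => 12,   # unallocated spares
        'repair'    => 3,    # nodes awaiting hardware repair
      }
    end
  end
end
```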
@@ -67,11 +67,10 @@ Also you should have the pony gem installed

== USAGE:

If you want to run the capacity plan, you can do

jetpants capacity_plan

To capture a one-off snapshot of your data usage

jetpants capacity_snapshot

8 changes: 4 additions & 4 deletions doc/faq.rdoc
@@ -40,14 +40,14 @@ A sharding key is a core foreign key column that is present in most of your larg

For example, on a blogging site the sharding key might be <tt>blog_id</tt>. Most tables that contain a <tt>blog_id</tt> column can be sharded, which will mean that all data related to a particular blog (posts, comments on those posts, authors, etc) is found on the same shard. By organizing data this way, you can continue to use relational operations such as JOIN when querying data that lives on the same shard.

Regardless of sharding key, some tables will not be shardable. This includes any "global" table that doesn't contain your sharding key column, as well as any tables that have global lookup patterns. For this reason you might not be able to shard the core table which has your sharding_key as its primary key!

In other words: if your sharding key is <tt>user_id</tt>, you might not actually be able to shard your <tt>users</tt> table because you need to do global lookups (ie, by email address) on this table. Denormalization is a common work-around; you could split your users table into a "global lookup" portion in a global pool and an "extended data" portion that lives on shards.


== What is range-based sharding? Why use it, and what are the alternatives?

Range-based sharding groups data based on ranges of your sharding key. For example, with a sharding key of <tt>user_id</tt>, all sharded data for users 1-1000 may be on the first shard, users 1001-3000 on the second shard, and users 3001-infinity on the third and final shard.

The main benefit of range-based sharding is simplicity. You can express the shard ranges in a language-neutral format like YAML or JSON, and the code to route queries to the correct DB can be implemented in a trivially small amount of code. There's no need for a lookup service, so we avoid a single point of failure. It's also easy for a human to look at the ranges and figure out which DB to query when debugging a problem by hand.
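As an illustration of how small that routing code can be, here is a hypothetical sketch using the <tt>user_id</tt> ranges from the example above (the shard names are invented):

```ruby
# Range-based routing sketch. The ranges mirror the user_id example above;
# the shard names are illustrative, and a real system would load the
# ranges from a language-neutral config file (YAML/JSON).
SHARD_RANGES = [
  { name: 'shard1', min: 1,    max: 1000 },
  { name: 'shard2', min: 1001, max: 3000 },
  { name: 'shard3', min: 3001, max: Float::INFINITY },
]

def shard_for(user_id)
  range = SHARD_RANGES.find { |r| user_id >= r[:min] && user_id <= r[:max] }
  range[:name]
end

puts shard_for(500)     # => shard1
puts shard_for(99_999)  # => shard3
```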

@@ -60,9 +60,9 @@ The main downside to the range-based approach is lack of even distribution of "h
* <b>Modulus or hash</b>: Apply a function to your sharding key to determine which shard the data lives on.

This approach helps to distribute data very evenly. Many sites find that their latest users behave differently than their oldest users, so grouping users together by ranges of ID (essentially ranges of account creation date) can be problematic. Using a modulus or hash avoids this problem.

The main issue with this approach is how to rebalance shards that are too large. A simple modulus can't do this unless you want to simultaneously split all of your shards in half, which leads to painful exponential growth. A hash function can be more versatile but can still lead to great complexity. Worse yet, there's no way to rebalance _quickly_ because data is not stored on disk in sorted order based on the hash function.
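A small sketch of why modulus rebalancing is painful, assuming a plain <tt>user_id % N</tt> scheme: changing the shard count remaps most existing keys.

```ruby
# Modulus routing sketch: shard = user_id % N. Growing from 4 to 5 shards
# changes the routing of the large majority of existing keys, which is why
# a simple modulus scheme tends to force an all-at-once doubling (id % 8
# keeps each key on one of exactly two shards derived from id % 4).
def shard_for(user_id, num_shards)
  user_id % num_shards
end

moved = (1..1000).count { |id| shard_for(id, 4) != shard_for(id, 5) }
puts moved  # 800 of 1000 keys would have to move
```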

* <b>Lookup table</b>: Use a separate service or data store which takes a sharding key value as an input and returns the appropriate shard as an output.

This scheme allows you to very specifically allocate particular data to shards, and works well for sites that have a lot of "hot" data from celebrity users. However, the lookup service is essentially a single point of failure, which counteracts many of the attractive features of sharded architectures. Rebalancing can also be slow and tricky, since you need a notion of "locking" a sharding key value while its rows are being migrated.
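A sketch of the lookup-table approach, with an in-memory Hash standing in for the separate lookup service (the shard names are invented):

```ruby
# Lookup-table routing sketch. A real deployment would back this with a
# separate service or data store; the Hash here is only a stand-in, and
# the shard names are invented.
lookup = Hash.new('shard-general')  # default shard for ordinary users
lookup[101] = 'shard-celebrity'     # hot user explicitly pinned

def route(lookup, user_id)
  lookup[user_id]
end

puts route(lookup, 101)  # => shard-celebrity
puts route(lookup, 7)    # => shard-general
```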
10 changes: 5 additions & 5 deletions doc/jetpants_collins.rdoc
@@ -18,13 +18,13 @@ remote_lookup:: Supply "remoteLookup" parameter for \Collins requests, to search
To enable this plugin, add it to your \Jetpants configuration file (either <tt>/etc/jetpants.yaml</tt> or <tt>~/.jetpants.yaml</tt>). For example, in a single-datacenter environment, your configuration might look like this:

# ... rest of Jetpants config here

plugins:
jetpants_collins:
user: jetpants
password: xxx
url: http://collins.yourdomain.com:8080

# ... other plugins configured here

== ASSUMPTIONS AND REQUIREMENTS:
@@ -53,17 +53,17 @@ Adding functional partitions (global / unsharded pools):

# Create the pool object, specifying pool name and IP of current master
p = Pool.new('my-pool-name', '10.42.3.4')

# Tell Jetpants about IPs of any existing active slaves (read slaves), if any.
# For example, say this pool has 2 active slaves and 2 standby slaves. \Jetpants
# can automatically figure out which slaves exist, but won't automatically know
# which ones are active for reads, so you need to tell it.
p.has_active_slave('10.42.3.30')
p.has_active_slave('10.42.3.32')

# Sync the information to Collins
p.sync_configuration

Repeat this process for each functional partition, if you have more than one.

Adding shard pools:
5 changes: 2 additions & 3 deletions doc/online_schema_change.rdoc
@@ -11,7 +11,7 @@ This plugin has no extra options, just add the name to your plugins section and
To enable this plugin, add it to your \Jetpants configuration file (either <tt>/etc/jetpants.yaml</tt> or <tt>~/.jetpants.yaml</tt>). For example, your configuration might look like this:

# ... rest of Jetpants config here

plugins:
online_schema_change:
# ... other plugins configured here
@@ -25,7 +25,7 @@ Also you should be using \Collins and the jetpants_collins plugin
== EXAMPLES:

dry run of an alter on a single pool
jetpants alter_table --database=allmydata --table=somedata --pool=users --dry-run --alter='ADD COLUMN c1 INT'

alter a single pool
jetpants alter_table --database=allmydata --table=somedata --pool=users --alter='ADD COLUMN c1 INT'
@@ -42,4 +42,3 @@ the alter table does not drop the old table automatically, so to remove the tabl

to drop the tables on all your shards
jetpants alter_table_drop --database=allmydata --table=somedata --all_shards

4 changes: 2 additions & 2 deletions doc/plugins.rdoc
@@ -41,8 +41,8 @@ If you're writing your own asset-tracker plugin, you will need to override the f
* Jetpants::Topology#count_spares
* Returns a count of spare database nodes
* Jetpants::Pool#sync_configuration
* Updates the asset tracker with the current status of a pool.
* This should update the asset tracker's internal knowledge of the database topology immediately, but not necessarily cause the application's config file to be regenerated immediately.

You may also want to override or implement these, though it's not strictly mandatory:

2 changes: 1 addition & 1 deletion doc/requirements.rdoc
@@ -1,6 +1,6 @@
= Jetpants Requirements and Assumptions

The base classes of \Jetpants currently make a number of assumptions about your environment and database topology.

Plugins may freely override these assumptions, and upstream patches are very welcome to incorporate support for alternative configurations. We're especially interested in plugins or pull requests that add support for: Postgres and other relational databases; Redis and other non-relational data stores; non-Redhat Linux distributions or *BSD operating systems; master-master topologies; multi-instance-per-host setups; etc. We have attempted to design \Jetpants in a way that is sufficiently flexible to eventually support a wide range of environments.

8 changes: 4 additions & 4 deletions doc/upgrade_helper.rdoc
@@ -17,14 +17,14 @@ new_version:: major.minor string of MySQL version being upgraded to, for
Example usage:

# ... rest of Jetpants config here

plugins:
jetpants_collins:
# config for jetpants_collins here

upgrade_helper:
new_version: "5.5"

# ... other plugins configured here


@@ -65,4 +65,4 @@ For subsequent shard upgrades, you may optionally use this simplified process.
3. Use "jetpants shard_upgrade --writes" to regenerate your application configuration in a way that moves read AND write queries to the upgraded mirror shard's master.
4. Use "jetpants shard_upgrade --cleanup" to eject all non-upgraded nodes from the pool entirely. This will tear down replication between the new version of the shard and the old version.

Using a custom Ruby script, this process can be automated to perform each step on several shards at once.
30 changes: 15 additions & 15 deletions lib/jetpants/callback.rb
@@ -2,16 +2,16 @@ module Jetpants
# Exception class used to halt further processing in callback chain. See
# description in CallbackHandler.
class CallbackAbortError < StandardError; end

# If you include CallbackHandler as a mix-in, it grants the base class support
# for Jetpants callbacks, as defined here:
#
# If you invoke a method "foo", Jetpants will first
# automatically call any "before_foo" methods that exist in the class or its
# superclasses. You can even define multiple methods named before_foo (in the
# same class!) and they will each be called. In other words, Jetpants
# callbacks "stack" instead of overriding each other.
#
# After calling any/all before_foo methods, the foo method is called, followed
# by all after_foo methods in the same manner.
#
@@ -37,39 +37,39 @@ def method_added(name)
# Intercept before_* and after_* methods and create corresponding Callback objects
if name.to_s.start_with? 'before_', 'after_'
Callback.new self, name.to_s.split('_', 2)[1].to_sym, name.to_s.split('_', 2)[0].to_sym, @callback_priority

# Intercept redefinitions of methods we've already wrapped, so we can
# wrap them again
elsif Callback.wrapped? self, name
Callback.wrap_method self, name
end
end
end

# Default priority for callbacks is 100
@callback_priority = 100
end
end
end

# Generic representation of a before-method or after-method callback.
# Used internally by CallbackHandler; you won't need to interact with Callback directly.
class Callback
@@all_callbacks = {} # hash of class obj -> method_name symbol -> type string -> array of callbacks
@@currently_wrapping = {} # hash of class obj -> method_name symbol -> bool

attr_reader :for_class # class object
attr_reader :method_name # symbol containing method name (the one being callback-wrapped)
attr_reader :type # :before or :after
attr_reader :priority # high numbers get triggered first
attr_reader :my_alias # method name alias OF THE CALLBACK

def initialize(for_class, method_name, type=:after, priority=100)
@for_class = for_class
@method_name = method_name
@type = type
@priority = priority

@@all_callbacks[for_class] ||= {}
@@all_callbacks[for_class][method_name] ||= {}
already_wrapped = Callback.wrapped?(for_class, method_name)
@@ -82,16 +82,16 @@ def initialize(for_class, method_name, type=:after, priority=100)
alias_method new_name, old_name
end
Callback.wrap_method(for_class, method_name) unless already_wrapped

@@all_callbacks[for_class][method_name][type] << self
end

def self.wrap_method(for_class, method_name)
@@currently_wrapping[for_class] ||= {}
@@currently_wrapping[for_class][method_name] ||= false
return if @@currently_wrapping[for_class][method_name] # prevent infinite recursion from the alias_method call
@@currently_wrapping[for_class][method_name] = true

for_class.class_eval do
alias_method "#{method_name}_without_callbacks".to_sym, method_name
define_method method_name do |*args|
@@ -110,10 +110,10 @@ def self.wrap_method(for_class, method_name)
result
end
end

@@currently_wrapping[for_class][method_name] = false
end

def self.trigger(for_object, method_name, type, *args)
my_callbacks = []
for_object.class.ancestors.each do |for_class|
@@ -124,7 +124,7 @@ def self.trigger(for_object, method_name, type, *args)
my_callbacks.sort_by! {|c| -1 * c.priority}
my_callbacks.each {|c| for_object.send(c.my_alias, *args)}
end

def self.wrapped?(for_class, method_name)
return false unless @@all_callbacks[for_class] && @@all_callbacks[for_class][method_name]
@@all_callbacks[for_class][method_name].count > 0
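The before_/after_ contract described in the comments above can be illustrated with a self-contained stand-in (this mimics the behavior only; the real CallbackHandler wraps the target method itself via alias_method, so plain calls trigger the hooks, and it supports stacked, prioritized callbacks):

```ruby
# Self-contained illustration of the callback contract described above.
# NOT the real implementation: CallbackHandler wraps foo itself via
# alias_method, so calling foo directly fires the hooks; here an explicit
# dispatcher is used to keep the sketch short.
module MiniCallbacks
  def call_with_callbacks(name, *args)
    send(:"before_#{name}", *args) if respond_to?(:"before_#{name}")
    result = send(name, *args)
    send(:"after_#{name}", *args) if respond_to?(:"after_#{name}")
    result
  end
end

class Worker
  include MiniCallbacks
  attr_reader :log
  def initialize; @log = []; end
  def before_foo; @log << :before; end
  def foo; @log << :work; 42; end
  def after_foo; @log << :after; end
end

w = Worker.new
w.call_with_callbacks(:foo)  # returns 42
puts w.log.inspect           # [:before, :work, :after]
```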
8 changes: 4 additions & 4 deletions lib/jetpants/db/import_export.rb
@@ -356,7 +356,7 @@ def rebuild!(tables=false, min_id=false, max_id=false)

export_schemata tables
export_data tables, min_id, max_id

# We need to be paranoid and confirm nothing else has restarted mysql (re-enabling binary logging)
# out-of-band. Besides the obvious slowness of importing things while binlogging, this is outright
# dangerous if GTID is in-use. So we check before every method or statement that does writes
@@ -365,7 +365,7 @@ def rebuild!(tables=false, min_id=false, max_id=false)
import_schemata!
if respond_to? :alter_schemata
raise "Binary logging has somehow been re-enabled. Must abort for safety!" if binary_log_enabled?
alter_schemata
# re-retrieve table metadata in the case that we alter the tables
pool.probe_tables
tables = pool.tables.select{|t| pool.tables.map(&:name).include?(t.name)}
@@ -430,7 +430,7 @@ def clone_to!(*targets)
}.reject { |s|
Jetpants.mysql_clone_ignore.include? s
}

# If using GTID, we need to remember the source's gtid_executed from the point-in-time of the copy.
# We also need to ensure that the targets match the same gtid-related variables as the source.
# Ordinarily this should be managed by my.cnf, but while a fleet-wide GTID rollout is still underway,
@@ -467,7 +467,7 @@ def clone_to!(*targets)
t.start_query_killer
t.enable_monitoring
end

# If the source is using GTID, we need to set the targets' gtid_purged to equal the
# source's gtid_executed. This is needed because we do not copy binlogs, which are
# the source of truth for gtid_purged and gtid_executed. (Note, setting gtid_purged
