@@ -6,15 +6,12 @@ Back Up a Sharded Cluster with File System Snapshots
66
77.. default-domain:: mongodb
88
9-
10-
119.. contents:: On this page
1210 :local:
1311 :backlinks: none
1412 :depth: 1
1513 :class: singlecol
1614
17-
1815Overview
1916--------
2017
@@ -40,15 +37,15 @@ Encrypted Storage Engine (MongoDB Enterprise Only)
4037~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4138
4239.. include:: /includes/fact-aes256-backups.rst
43-
40+
4441Balancer
4542~~~~~~~~
4643
4744It is *essential* that you stop the :ref:`balancer
4845<sharding-internals-balancing>` before capturing a backup.
4946
5047If the balancer is active while you capture backups, the backup
51- artifacts may be incomplete and/ or have duplicate data, as :term:`chunks
48+ artifacts may be incomplete or have duplicate data, as :term:`chunks
5249<chunk>` may migrate while recording backups.
5350
5451Precision
@@ -58,28 +55,191 @@ In this procedure, you will stop the cluster balancer and take a backup
5855of the :term:`config database`, and then take backups of each
5956shard in the cluster using a file-system snapshot tool. If you need an
6057exact moment-in-time snapshot of the system, you will need to stop all
61- application writes before taking the file system snapshots; otherwise
62- the snapshot will only approximate a moment in time.
63-
64- For approximate point-in-time snapshots, you can minimize the impact on
65- the cluster by taking the backup from a secondary member of each
66- replica set shard.
58+ writes before taking the file system snapshots; otherwise the snapshot will
59+ only approximate a moment in time.
6760
6861Consistency
6962~~~~~~~~~~~
7063
71- If the journal and data files are on the same logical volume, you can
72- use a single point-in-time snapshot to capture a consistent copy of the
73- data files.
74-
75- If the journal and data files are on different file systems, you must
76- use :method:`db.fsyncLock()` and :method:`db.fsyncUnlock()` to ensure
77- that the data files do not change, providing consistency for the
78- purposes of creating backups.
64+ To back up a sharded cluster, you must use the :dbcommand:`fsync` command or
65+ :method:`db.fsyncLock` method to stop writes on the cluster. This ensures that
66+ data files do not change during the backup.
7967
8068.. include:: /includes/fact-backup-snapshots-with-ebs-in-raid10.rst
8169
82- Procedure
83- ---------
70+ Steps
71+ -----
72+
73+ To take a self-managed backup of a sharded cluster, complete the following
74+ steps:
75+
76+ .. procedure::
77+ :style: normal
78+
79+ .. step:: Find a Backup Window
80+
81+ Chunk migrations, resharding, and schema migration operations can cause
82+ inconsistencies in backups. To find a good time to perform a backup,
83+ monitor your application and database usage and find a time when these
84+ operations are unlikely to occur.
85+
86+ For more information, see :ref:`sharded-schedule-backup`.
87+
88+ .. step:: Stop the Balancer
89+
90+ To prevent chunk migrations from disrupting the backup, use
91+ the :method:`sh.stopBalancer` method to stop the balancer:
92+
93+ .. code-block:: javascript
94+
95+ sh.stopBalancer()
96+
97+ If a balancing round is currently in progress, the operation waits for
98+ balancing to complete.
99+
100+ To confirm that the balancer is stopped, use the
101+ :method:`sh.getBalancerState` method:
102+
103+ .. io-code-block::
104+
105+ .. input::
106+ :language: javascript
107+
108+ sh.getBalancerState()
109+
110+ .. output::
111+ :language: javascript
112+
113+ false
114+
115+ The command returns ``false`` when the balancer is stopped.
116+
117+ .. step:: Lock the Cluster
118+
119+ Writes to the database can cause backup inconsistencies. Lock your
120+ sharded cluster to protect the database from writes.
121+
122+ To lock a sharded cluster, use the :method:`db.fsyncLock` method:
123+
124+ .. code-block:: javascript
125+
126+ db.getSiblingDB("admin").fsyncLock()
127+
128+ Run the following aggregation pipeline on both :program:`mongos` and
129+ the primary :program:`mongod` of the config servers. To confirm the
130+ lock, ensure that the ``fsyncLocked`` field returns ``true`` and the
131+ ``fsyncUnlocked`` field returns ``false``.
132+
133+ .. io-code-block::
134+
135+ .. input::
136+ :language: javascript
137+
138+ db.getSiblingDB("admin").aggregate( [
139+ { $currentOp: { } },
140+ { $facet: {
141+ "locked": [
142+ { $match: { $and: [
143+ { fsyncLock: { $exists: true } },
144+ { fsyncLock: true }
145+ ] } }],
146+ "unlocked": [
147+ { $match: { fsyncLock: { $exists: false } } }
148+ ]
149+ } },
150+ { $project: {
151+ "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
152+ "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
153+ } }
154+ ] )
155+
156+ .. output::
157+ :language: json
158+
159+ [ { fsyncLocked: true, fsyncUnlocked: false } ]
160+
161+ .. step:: Back up the Primary Config Server
162+
163+ .. note::
164+
165+ Backing up a :ref:`config server <sharding-config-server>` backs
166+ up the sharded cluster's metadata. You only need to back up one
167+ config server, as they all hold the same data. Perform this step
168+ against the CSRS primary member.
169+
170+ To create a filesystem snapshot of the config server, follow the
171+ procedure in :ref:`lvm-backup-operation`.
172+
173+ .. step:: Back up the Primary Shards
174+
175+ Perform a filesystem snapshot against the primary member of each shard,
176+ using the procedure found in :ref:`backup-restore-filesystem-snapshots`.
177+
178+ .. step:: Unlock the Cluster
179+
180+ After the backup completes, you can unlock the cluster to allow writes
181+ to resume.
182+
183+ To unlock the cluster, use the :method:`db.fsyncUnlock` method:
184+
185+ .. code-block:: javascript
186+
187+ db.getSiblingDB("admin").fsyncUnlock()
188+
189+ Run the following aggregation pipeline on both :program:`mongos` and
190+ the primary :program:`mongod` of the config servers. To confirm the
191+ unlock, ensure that the ``fsyncLocked`` field returns ``false`` and the
192+ ``fsyncUnlocked`` field returns ``true``.
193+
194+ .. io-code-block::
195+
196+ .. input::
197+ :language: javascript
198+
199+ db.getSiblingDB("admin").aggregate( [
200+ { $currentOp: { } },
201+ { $facet: {
202+ "locked": [
203+ { $match: { $and: [
204+ { fsyncLock: { $exists: true } },
205+ { fsyncLock: true }
206+ ] } }],
207+ "unlocked": [
208+ { $match: { fsyncLock: { $exists: false } } }
209+ ]
210+ } },
211+ { $project: {
212+ "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
213+ "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
214+ } }
215+ ] )
216+
217+ .. output::
218+ :language: json
219+
220+ [ { fsyncLocked: false, fsyncUnlocked: true } ]
221+
222+ .. step:: Restart the Balancer
223+
224+ To restart the balancer, use the :method:`sh.startBalancer` method:
225+
226+ .. code-block:: javascript
227+
228+ sh.startBalancer()
229+
230+ To confirm that the balancer is running, use the
231+ :method:`sh.getBalancerState` method:
232+
233+ .. io-code-block::
234+
235+ .. input::
236+ :language: javascript
237+
238+ sh.getBalancerState()
239+
240+ .. output::
241+ :language: javascript
242+
243+ true
84244
85- .. include:: /includes/steps/backup-sharded-cluster-with-snapshots.rst
245+ The command returns ``true`` when the balancer is running.
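
The lock check used in the "Lock the Cluster" and "Unlock the Cluster" steps reduces to two counts over the ``$currentOp`` documents. As a rough illustration only, the plain JavaScript sketch below mirrors the ``$facet`` and ``$project`` logic on an in-memory array. The ``summarizeFsyncState`` function and the sample documents are hypothetical: they are not part of MongoDB and do not reproduce real ``$currentOp`` output.

```javascript
// Hypothetical helper (not a MongoDB API): mirrors the $facet/$project
// stages of the lock-check pipeline on a plain array of documents.
function summarizeFsyncState(ops) {
  // "locked" facet: documents whose fsyncLock field exists and is true.
  const locked = ops.filter((op) => op.fsyncLock === true);
  // "unlocked" facet: documents that have no fsyncLock field at all.
  const unlocked = ops.filter((op) => !("fsyncLock" in op));
  // $project: each flag is true when its facet holds at least one document.
  return {
    fsyncLocked: locked.length > 0,
    fsyncUnlocked: unlocked.length > 0,
  };
}

// Made-up sample inputs, for illustration only.
const whileLocked = summarizeFsyncState([{ fsyncLock: true }]);
const whileUnlocked = summarizeFsyncState([{ op: "command" }]);

console.log(whileLocked);   // { fsyncLocked: true, fsyncUnlocked: false }
console.log(whileUnlocked); // { fsyncLocked: false, fsyncUnlocked: true }
```

Because each flag depends only on whether its facet is non-empty, an empty input yields ``false`` for both fields.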