Transfer assets and submissions between two projects.
- Clone a copy of this repo somewhere on your local machine:
  git clone https://github.com/kobotoolbox/kobo-transfer
- Install pip packages from requirements.txt. See the detailed steps at the end of this document.
- Copy sample-config.json to config.json and add your configuration details for the source (src) and destination (dest) projects. If both projects are located on the same server, then just duplicate the URLs and token values.
- If only syncing submissions, ensure the destination project is deployed and has the same content as the source project.
- If transferring assets and submissions for the first time, leave the dest.asset_uid field empty in the config file:
{
  ...
  "dest": {
    ...
    "asset_uid": ""
  }
}
Note: Kobo offers two public servers, the Global and EU servers. For each of these, the config URLs are the following:
- EU:
  kc_url: https://kc-eu.kobotoolbox.org
  kf_url: https://eu.kobotoolbox.org
- Global:
  kc_url: https://kc.kobotoolbox.org
  kf_url: https://kf.kobotoolbox.org
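As a rough illustration of how the src and dest blocks fit together, the sketch below builds a config dictionary in Python. The kc_url, kf_url and asset_uid keys come from the text above; the token key name, the helper names and the placeholder values are assumptions, so check sample-config.json for the exact layout.

```python
import json

# Server URLs as listed in the note above.
SERVERS = {
    "global": {
        "kc_url": "https://kc.kobotoolbox.org",
        "kf_url": "https://kf.kobotoolbox.org",
    },
    "eu": {
        "kc_url": "https://kc-eu.kobotoolbox.org",
        "kf_url": "https://eu.kobotoolbox.org",
    },
}


def project_block(server, token, asset_uid=""):
    """Build one side (src or dest) of the config.

    The "token" key name is an assumption; confirm it in sample-config.json.
    """
    block = dict(SERVERS[server])
    block["token"] = token
    block["asset_uid"] = asset_uid
    return block


def build_config(src, dest):
    """Combine the two sides into a single config dictionary."""
    return {"src": src, "dest": dest}


config = build_config(
    src=project_block("global", "<src token>", "<src asset uid>"),
    # First transfer of assets and submissions: dest asset_uid stays empty.
    dest=project_block("eu", "<dest token>"),
)
print(json.dumps(config, indent=2))
```

The printed JSON can be compared against sample-config.json before saving anything as config.json.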
python3 run.py \
[--config-file/-c <file path>] [--asset/-a] [--sync/-s] [--no-validate/-N] \
[--validation-status/-vs] [--analysis-data/-ad] [--keep-media/-k] \
[--src-asset-uid/-sau <uid>] [--limit/-l <limit>] [--chunk-size/-cs <size>] \
[--regenerate-uuids/-R] [--last-failed/-lf] [--quiet/-q]
To transfer the asset, its form media and versions from the src to dest
servers, use the --asset flag, in addition to any other flags described below.
Once the asset has finished transferring, the submissions will be transferred
next. Note that each time this flag is used, a new asset is created on the
dest side.
python3 run.py --asset
The original UUID for each submission is maintained across the transfer,
allowing for duplicate submissions to be rejected at the destination project if
the script is run multiple times. If this behaviour is not desired, pass the
--regenerate-uuids flag to create new UUIDs for each submission. This may be
necessary when transferring submissions to a project located on the same server.
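The effect of keeping or regenerating UUIDs can be pictured with a toy model. This sketch is not the server's or the script's actual logic; Destination and its submit method are hypothetical stand-ins for a project that rejects a UUID it has already seen.

```python
import uuid


class Destination:
    """Hypothetical stand-in for a dest project that rejects known UUIDs."""

    def __init__(self):
        self.seen = set()

    def submit(self, submission_uuid):
        """Return True if accepted, False if rejected as a duplicate."""
        if submission_uuid in self.seen:
            return False
        self.seen.add(submission_uuid)
        return True


def transfer(dest, uuids, regenerate=False):
    """Send each submission, optionally under a fresh UUID; count accepted."""
    accepted = 0
    for original in uuids:
        candidate = str(uuid.uuid4()) if regenerate else original
        if dest.submit(candidate):
            accepted += 1
    return accepted


src_uuids = [str(uuid.uuid4()) for _ in range(3)]
dest = Destination()
first_run = transfer(dest, src_uuids)    # every submission is new
second_run = transfer(dest, src_uuids)   # same UUIDs again, all rejected
regenerated_run = transfer(dest, src_uuids, regenerate=True)  # fresh UUIDs
print(first_run, second_run, regenerated_run)
```

In this model, regenerating UUIDs makes every submission look new to the destination, so a repeated run with fresh UUIDs is accepted in full.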
Use the --sync option to keep the two projects in sync after an initial
transfer. This is useful if you are phasing from one server to the other and
there is still data being collected at the src. Without using --sync in this
case, if the submissions contain media attachments, they will be duplicated at
the dest project and therefore consume unnecessary storage in your account.
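One way to picture a sync is as a set difference over submission UUIDs. The sketch below is an illustration under that assumption, not a description of how run.py is implemented; missing_at_dest is a hypothetical helper.

```python
def missing_at_dest(src_uuids, dest_uuids):
    """Return src UUIDs not yet present at dest, preserving src order."""
    already_there = set(dest_uuids)
    return [u for u in src_uuids if u not in already_there]


src = ["a1", "b2", "c3", "d4"]
dest = ["a1", "b2"]
to_transfer = missing_at_dest(src, dest)
print(to_transfer)  # ['c3', 'd4']
```

Under this model, only the submissions in to_transfer, and their attachments, would need to be sent again.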
Use the --validation-status option to sync the validation statuses from src
submissions to the dest. If used in combination with the --sync option, it
will first transfer missing submissions and then sync the statuses. If used
alone, it will only sync the status and then end script operation -- no
submissions will be transferred. Since the validation statuses are metadata to
the submissions, this requires an additional step to the standard process.
python3 run.py --sync --validation-status
# or
python3 run.py --validation-status
Use the --analysis-data option to transfer analysis data, which may include
transcripts, translations and analysis questions. As with the --validation-status
option, this is an additional step needed once the submissions have already been
transferred. It can be run with the --sync option (and other compatible
options) or on its own.
python3 run.py --sync --analysis-data
# or
python3 run.py --analysis-data
If submissions contain media attachments, all media will be downloaded to a
local attachments/ directory before the transfer between projects begins.
Attachment files will be cleaned up after completion of the transfer unless the
--keep-media flag is passed.
The --limit option can be set to restrict the number of submissions processed
in a batch. For large projects, either in number of submissions or number of
questions or both, it may be necessary to reduce the limit below the default of
30000 to mitigate time-outs from the server.
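The idea of processing submissions in batches can be sketched as follows. The batches helper is hypothetical; the default of 30000 is taken from the text above, and how run.py actually pages through submissions is not shown here.

```python
def batches(items, limit=30000):
    """Yield consecutive slices of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(items), limit):
        yield items[start:start + limit]


submission_ids = list(range(10))
sizes = [len(batch) for batch in batches(submission_ids, limit=4)]
print(sizes)  # [4, 4, 2]
```

A smaller limit means more requests, each carrying less data, which is the trade-off that helps with server time-outs.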
Individual submissions may occasionally fail to transfer. A list of failed UUIDs
is stored in .log/failures.txt after each run. You can run the transfer again
with only these failed submissions by passing the --last-failed flag.
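To inspect the failures before retrying, something like the following could be used. It assumes that .log/failures.txt holds one UUID per line, which the text above does not confirm; read_failed_uuids is a hypothetical helper.

```python
from pathlib import Path


def read_failed_uuids(path=".log/failures.txt"):
    """Return the unique, non-empty lines of the failures file, in order.

    Assumes one UUID per line; returns an empty list if the file is absent.
    """
    file = Path(path)
    if not file.exists():
        return []
    seen = set()
    uuids = []
    for line in file.read_text().splitlines():
        value = line.strip()
        if value and value not in seen:
            seen.add(value)
            uuids.append(value)
    return uuids


failed = read_failed_uuids()
print(f"{len(failed)} failed submission(s) recorded")
```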
To use a configuration file other than config.json, for example when several
configurations are kept in the same directory, specify the file path with
--config-file:
python3 run.py --config-file config-2.json
By default, the configuration file will be validated before the transfer is
attempted. Pass the --no-validate flag to skip this step.
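What such a pre-flight check might look like is sketched below. This is not the validation that run.py performs; the required keys are assumptions based on the fields mentioned in this document, and validate_config is a hypothetical name.

```python
# Keys assumed for each side of the config; "token" in particular is a guess.
REQUIRED_KEYS = ("kc_url", "kf_url", "token")


def validate_config(config):
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    for side in ("src", "dest"):
        block = config.get(side)
        if not isinstance(block, dict):
            problems.append(f"missing '{side}' section")
            continue
        for key in REQUIRED_KEYS:
            if not block.get(key):
                problems.append(f"{side}.{key} is empty or missing")
    return problems


sample = {
    "src": {
        "kc_url": "https://kc.kobotoolbox.org",
        "kf_url": "https://kf.kobotoolbox.org",
        "token": "<src token>",
    },
    "dest": {
        "kc_url": "https://kc-eu.kobotoolbox.org",
        "kf_url": "https://eu.kobotoolbox.org",
    },
}
print(validate_config(sample))  # ['dest.token is empty or missing']
```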
Example usage that syncs submissions, validation statuses and analysis data in one go. Note that these are three separate phases of the transfer and run sequentially:
python3 run.py --config-file config-project-abc.json --sync \
--validation-status --analysis-data \
--keep-media --no-validate
# additionally let's transfer the asset itself with the `--asset` flag
python3 run.py --config-file config-project-abc.json --sync \
--asset --validation-status --analysis-data \
--keep-media --no-validate
Use the --src-asset-uid option to pass an asset UID through the args rather than
in the config file. This allows for iterating through a list of asset UIDs and
transferring them in bulk. Example usage of transferring all assets and submission
data from one user account to another (note that a config file is still required
for configuring URLs and tokens):
TOKEN=<your src token>
# note this requires `jq` to be installed
curl -s 'https://kf.kobotoolbox.org/api/v2/assets.json' \
-H "Authorization: Token $TOKEN" | \
jq '.results[] | select(.asset_type == "survey" and .has_deployment == true) | .uid' | \
xargs -I {} python3 run.py --src-asset-uid "{}" --asset --sync -c <config file>
Media attachments are written to the local attachments/ directory and follow
the tree structure of:
{asset_uid}
├── {submission_uid}
│   ├── {filename}
│   └── {filename}
├── {submission_uid}
│   └── {filename}
├── {submission_uid}
│   └── {filename}
├── {submission_uid}
│   └── {filename}
└── {submission_uid}
    ├── {filename}
    └── {filename}
- Although submissions will generally not be duplicated across multiple runs of the script, if the submissions contain attachment files, they are duplicated on the server unless the --sync option is used.
- The script does not check if the source and destination projects are identical and will transfer submission data regardless.
- The script does not account for multiple versions that the form may have had. Rather, use the --asset flag to fully transfer the src project to the dest side to account for this. It naively uses the latest version of the dest form for the submissions' __version__ attribute. This will be updated at some point to match the version history at the dest project.
- Currently it's not possible to sync the asset versions from src to dest. Once a project has been transferred with all its versions, it's best not to continue updating the form and submitting data to the src project, to avoid complications.
- If the dest form is updated and redeployed, it will have a new version UID. If the script is run again, this will result in duplicates at the dest because the submissions contain the new __version__ value, are therefore no longer unique, and so won't be rejected by the dest project. This will be addressed once transferred submissions have their __version__ value matching the new version UIDs at the dest project.
- Due to a known KoboToolbox issue, projects may contain submissions with duplicate submission UUIDs. Some of these submissions may be full duplicates of one another, while others are unique submissions but contain a duplicate UUID value. If an initial sync between src and dest has been done, only unique submissions will be transferred (or accepted by the dest project). If more submissions are collected at the src after this point and they contain duplicate UUIDs from the previous sync (the UUID already exists at the dest), those submissions will not be transferred.
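To check whether a source project is affected by duplicate UUIDs, a sketch along these lines could be run over exported submission data. The `_uuid` field name is an assumption about the submission format, and find_duplicate_uuids is a hypothetical helper.

```python
from collections import Counter


def find_duplicate_uuids(submissions, key="_uuid"):
    """Return a dict of UUID -> count for UUIDs that occur more than once."""
    counts = Counter(s[key] for s in submissions if key in s)
    return {u: n for u, n in counts.items() if n > 1}


submissions = [
    {"_uuid": "aaa", "name": "first"},
    {"_uuid": "bbb", "name": "second"},
    {"_uuid": "aaa", "name": "third"},  # distinct data, repeated UUID
]
print(find_duplicate_uuids(submissions))  # {'aaa': 2}
```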
To ensure that the necessary Python packages are installed correctly, follow the
steps below to set up a virtual environment and install the packages listed in
the requirements.txt file. These instructions cover both Windows and
macOS/Linux systems.
On Windows:
- Install Python and pip
  Make sure you have Python and pip installed. The installer from the official Python website includes pip by default.
- Create a virtual environment
  Open the Command Prompt and navigate to the directory where your script is located. Then, run the following command to create a virtual environment:
  python -m venv venv
- Activate the virtual environment
  Activate the virtual environment with the following command:
  .\venv\Scripts\activate
- Install the required packages
  With the virtual environment activated, install the required packages by running:
  pip install -r requirements.txt
On macOS/Linux:
- Install Python and pip
  Ensure you have Python and pip installed. Most macOS/Linux systems come with Python pre-installed. If not, you can install Python via a package manager (e.g., brew for macOS or apt for Ubuntu).
- Create a virtual environment
  Open a terminal and navigate to the directory where your script is located. Then, run the following command to create a virtual environment:
  python3 -m venv venv
- Activate the virtual environment
  Activate the virtual environment with the following command:
  source venv/bin/activate
- Install the required packages
  With the virtual environment activated, install the required packages by running:
  pip3 install -r requirements.txt
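The two sets of steps differ mainly in the interpreter name and the activation command. The sketch below picks the matching commands for the current platform, assuming the virtual environment directory is named venv as above. setup_commands is a hypothetical helper that only builds the command strings; it does not run them.

```python
import os


def setup_commands(os_name=os.name):
    """Return the shell commands from the steps above for the given platform.

    `os_name` follows os.name: "nt" for Windows, "posix" for macOS/Linux.
    """
    if os_name == "nt":
        return [
            "python -m venv venv",
            r".\venv\Scripts\activate",
            "pip install -r requirements.txt",
        ]
    return [
        "python3 -m venv venv",
        "source venv/bin/activate",
        "pip3 install -r requirements.txt",
    ]


for command in setup_commands():
    print(command)
```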