This project came about as a means to track video content on concert shows, but is written generically. It recursively scans folders, which can be on mapped network drives or streamed cloud storage like Dropbox, Drive, LucidLink or Suite, and stores a list of the files & folders in AirTable.
```sh
npx @garethnunns/cli-content-tracker
```
Then you just need to edit the `config.json` file that was created where the command was executed and re-run with the config command line option `tracker -c`.
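For example, pointing the tracker at the generated config:

```sh
tracker -c config.json
```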
This requires Node to be installed; on a Mac you can probably install it with Homebrew:

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install node
```
You can duplicate this base or create your own as detailed below. See the section on configuring the AirTable settings for how to link this to the script.
You'll need a base with the following two tables for folders & files.
| Field | Description | Type |
|---|---|---|
| `_path` | Unix path to this folder in the project | Single line text |
| `_fullPath` | Will store the full path of the folder | Single line text |
| `_size` | Size of the folder in bytes | Number - 0 decimal places |
| `_ctime` | Creation time of the folder | Date - include time, Use same time for all collaborators |
| `_mtime` | Last modified time of the folder | Date - include time, Use same time for all collaborators |
| `_items` | Number of items in the folder | Number - 0 decimal places |
| `_parent` | Parent folder of this folder | Link to record - this table |
Only use this smaller table when the `mediaMetadata` setting is disabled.
| Field | Description | Type |
|---|---|---|
| `_path` | Unix path to this file in the project | Single line text |
| `_fullPath` | Will store the full path of the file | Single line text |
| `_size` | Size of the file in bytes | Number - 0 decimal places |
| `_ctime` | Creation time of the file | Date - include time, Use same time for all collaborators |
| `_mtime` | Last modified time of the file | Date - include time, Use same time for all collaborators |
| `_parent` | Parent folder of this file | Link to record - folders table |
All the fields of the files table, plus:
| Field | Description | Type |
|---|---|---|
| `_duration` | Duration of the media | Number - 2 decimal places |
| `_video` | Whether the media has a video stream | Checkbox |
| `_videoStill` | Whether the file is an image | Checkbox |
| `_videoCodec` | Video codec of the media | Single line text |
| `_videoWidth` | Width of the media | Number - 0 decimal places |
| `_videoHeight` | Height of the media | Number - 0 decimal places |
| `_videoFormat` | Pixel format of the media | Single line text |
| `_videoAlpha` | Whether the media has an alpha channel | Checkbox |
| `_videoFPS` | Frames per second of the video, 0 for stills | Number - 2 decimal places |
| `_videoBitRate` | Bit rate of the video (b) | Number - 0 decimal places |
| `_audio` | Whether the media has an audio stream | Checkbox |
| `_audioCodec` | Audio codec of the media | Single line text |
| `_audioSampleRate` | Sample rate of the audio (kHz) | Number - 0 decimal places |
| `_audioChannels` | Number of channels of audio | Number - 0 decimal places |
| `_audioBitRate` | Bit rate of the audio (b) | Number - 0 decimal places |
The fields are named like this so it's clear which fields are managed by the tracker. You can easily add other fields that reference these values, but do not edit the values in these columns or they will just get overwritten/deleted.
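For instance, a hypothetical formula field could present `_size` in megabytes without touching the tracked columns - a sketch in AirTable's formula language:

```
ROUND({_size} / 1000000, 1) & " MB"
```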
If all the records are being updated every time, it is likely because of a mismatch on the created/modified time - ensure the Use same time for all collaborators option is selected on these fields, and you can try setting the timezone to match the computer which is running the script. Despite the dates being sent as UTC strings, AirTable is a bit funky about how it handles them.
Get the latest options by running `tracker -h`, which will return something like:
```
Usage: tracker [options]

CLI File Content Tracking

Options:
  -V, --version          output the version number
  -c, --config <path>    config file path (if none specified template will be created)
  -d, --dry-run          will do everything apart from updating AirTable
  -l, --logging <level>  set the logging level
  -nd, --no-delete       never remove records from AirTable
  -w, --wipe-cache       clear the metadata cache
  -h, --help             display help for command
```
When you run `tracker` for the first time without a config path it will generate the `config.json` file, which you will then need to update. Once updated, run the command again with `tracker -c config.json`; for more info on the contents of the config file, check the section on config files.
Inherently, it's nice to know this isn't going to wreak havoc on your AirTable, so if you run `tracker -c config.json -d` it will show you what it's going to do but stop short of modifying the AirTable. It will still run at the frequency you specify. You will need to run with a [logging level](#logging-option--l---logging) of `verbose` to see all the details.
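For example, a dry run with verbose logging shows exactly what would be pushed without touching AirTable:

```sh
tracker -c config.json -d -l verbose
```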
Specify one of the following levels of logging:
- error
- warn
- info
- http (default)
- verbose
- debug
- silly
By default the script will stop and confirm before it removes more than 10% of the table (if you don't respond within a minute, it will assume no), but when run with `-nc` it will always delete the records in AirTable.
This will still insert and update records in AirTable, however it will not remove them if they have been deleted in the local file system - handy if you are freeing up space on your local disk by deleting media but still want the reference in AirTable.
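Running with the no-delete flag looks like:

```sh
tracker -c config.json -nd
```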
There's a local cache built up in `./db` of the metadata of files, so they don't have to keep getting queried via FFmpeg, which is a tad computationally expensive - but if you need to rescan all files, run with this option. Files will already be automatically rescanned if their size or modified time changes.
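For example, to clear the cache and force a full metadata rescan on the next run:

```sh
tracker -c config.json -w
```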
Create the default config as described above, which will generate something like this:
```json
{
  "settings": {
    "files": {
      "rootPath": "/Users/user/Documents/cli-content-tracker",
      "dirs": [
        "/Users/user/Documents/cli-content-tracker"
      ],
      "frequency": 30,
      "rules": {
        "dirs": {
          "includes": [],
          "excludes": []
        },
        "files": {
          "includes": [],
          "excludes": [
            "/\\.DS_Store$/"
          ]
        }
      },
      "mediaMetadata": true,
      "limitToFirstFile": false,
      "concurrency": 100
    },
    "airtable": {
      "api": "",
      "base": "",
      "foldersID": "",
      "filesID": "",
      "view": "API"
    }
  }
}
```
In general, leave all the parameters in the JSON file; there is some error handling if they're not present, but it's probably for the best to leave everything in there.
This section relates to all the local file scanning. In general the script builds up a list of all the files and folders, and here you get a bit of control over that.
In an effort to make the script less dependent on exactly where the files are stored on your computer/where the folder is mounted, this string will be removed from the start of file paths, e.g.

```json
"rootPath": "/Volumes/Suite/Project"
```

With this root path, a folder at `/Volumes/Suite/Project/Item 1` would have a `_path` of `/Item 1`, while `_fullPath` still keeps the full path.
Array of directories to recursively search through, e.g.

```json
"dirs": [ "/Volumes/Suite/Project/Item 1", "/Volumes/Suite/Project/Item 2" ]
```
How often the directories are scanned (in seconds), e.g. if you wanted to scan every minute:

```json
"frequency": 60
```

You can also set this to `0` and the script will only run once - this is intended for when you want to automate it as part of a cron job.
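If you do go down the cron route, a crontab entry might look something like this (assuming `tracker` is on the cron user's PATH and `/path/to/config.json` is your config - both just examples):

```
# run a one-off scan ("frequency": 0 in the config) at the top of every hour
0 * * * * tracker -c /path/to/config.json
```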
So I'll be honest, this is the only slightly faffy bit... but it definitely beats entering them as command-line arguments, which was my first plan. The thought process here is that you're filtering which paths of files and folders get included in the tables that are pushed to AirTable.
Whatever you specify in these fields, the script will still have to traverse all of the folders in the directory - if you have specified a pattern like `/.*\/Delivery\/.*/`, which would match any folder with `/Delivery/` in the path, by the nature of the task you're still going to have to search through every folder.
Now the bit that makes it faffy is you have to stringify JS regex patterns, which usually just means escaping the slashes - a handy place to make use of the dry run option. Note you're matching the entire path of the file/folder in both `dirs` & `files`.
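For example, the JS regex `/\.DS_Store$/` from the default config becomes the following JSON string, with the backslash doubled up:

```json
"excludes": [
  "/\\.DS_Store$/"
]
```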
In the example below we're limiting the folders which are stored on AirTable to only be ones that include `/05 Delivery/` somewhere in the path, then only including specific image sequence TIFFs:
"rules": {
"dirs": {
"includes": [
"/.*\/05 Delivery\/.*/"
],
"excludes": []
},
"files": {
"includes": [
"/[_.]v\\d{2,3}\\.tif/",
"/[_.]0{4,5}\\.tif/"
],
"excludes": []
}
}
Whether you want to get all the metadata for the media, which will scrape all the fields in the [files with metadata table](#file-table-with-metadata), e.g.

```json
"mediaMetadata": true
```
This works in conjunction with the file rules but limits it to only the first file in each folder, with the intention of finding the first image in sequences, e.g. this just gets the first TIFF file in the deliveries folder:
"rules": {
"dirs": {
"includes": [],
"excludes": []
},
"files": {
"includes": [
"/05_Delivery/.*\\.tif/"
],
"excludes": []
}
},
"mediaMetadata": true,
"limitToFirstFile": true
This limits the number of concurrent file operations, introduced to throttle the load on Suite. If you have a computer with enough RAM and the files are static then this value can be larger; however, if it's a weaker computer and the files are being streamed you might want to limit this to < 10, e.g.

```json
"concurrency": 1
```
Once you've set up your AirTable, please configure the following settings:
Get your API key from the AirTable Tokens page; it will need the following permissions:
- `data.records:read`
- `data.records:write`
Plus access to the workspace where your base is located.
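In the config this goes in the `api` field - personal access tokens typically start with `pat` (redacted here):

```json
"api": "pat**************"
```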
You'll need to go to the API page for your base, get the base ID and enter it here, e.g.

```json
"base": "app**************"
```
Technically you can just put the folders table name here, but on the same page as the base ID you can get the IDs for the tables; then if the table names get updated later it won't affect the script, e.g.

```json
"foldersID": "tbl**************",
"filesID": "tbl**************"
```
This is the view the script compares the local file list against - e.g. you could technically store other items in the table and filter them out in this view, or you could have multiple scripts all writing into the same table and filtered out per view (though this might be better achieved by writing to multiple tables).

This defaults to a view called `API`:

```json
"view": "API"
```
Clone the repo and run it locally like so:

```sh
git clone https://github.com/garethnunns/cli-content-tracker.git
cd cli-content-tracker
npm install
npm link
tracker
```
You can always `npm unlink` this later.
Pull requests are very welcome!