This project came about as a means to track video content on concert shows, but is written generically. It recursively scans folders, which can be on mapped network drives or streamed cloud storage like Dropbox, Drive, LucidLink or Suite, and stores a list of the files & folders in AirTable.
```sh
npx @garethnunns/cli-content-tracker
```
Then you just need to edit the `config.json` file that was created where the command was executed and re-run with the config command line option `tracker -c`.
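For example, pointing the tracker at the generated config:

```sh
tracker -c config.json
```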
This requires Node to be installed; on a Mac you can probably install it with Homebrew:

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install node
```
You can duplicate this base or create your own as detailed below. See the section on configuring the AirTable settings for how to link this to the script.
You'll need a base with the following two tables for folders & files.
| Field | Description | Type |
|---|---|---|
| `_path` | Unix path to this folder in the project | Single line text |
| `_fullPath` | Will store the full path of the folder | Single line text |
| `_size` | Size of the folder in bytes | Number - 0 decimal places |
| `_ctime` | Creation time of the folder | Date - include time, Use same time for all collaborators |
| `_mtime` | Last modified time of the folder | Date - include time, Use same time for all collaborators |
| `_items` | Number of items in the folder | Number - 0 decimal places |
| `_parent` | Parent folder of this folder | Link to record - this table |
Only use this smaller table when the `mediaMetadata` setting is disabled.
| Field | Description | Type |
|---|---|---|
| `_path` | Unix path to this file in the project | Single line text |
| `_fullPath` | Will store the full path of the file | Single line text |
| `_size` | Size of the file in bytes | Number - 0 decimal places |
| `_ctime` | Creation time of the file | Date - include time, Use same time for all collaborators |
| `_mtime` | Last modified time of the file | Date - include time, Use same time for all collaborators |
| `_parent` | Parent folder of this file | Link to record - folders table |
All the fields of the files table, plus:
| Field | Description | Type |
|---|---|---|
| `_duration` | Duration of the media | Number - 2 decimal places |
| `_video` | Whether the media has a video stream | Checkbox |
| `_videoStill` | Whether the file is an image | Checkbox |
| `_videoCodec` | Video codec of the media | Single line text |
| `_videoWidth` | Width of the media | Number - 0 decimal places |
| `_videoHeight` | Height of the media | Number - 0 decimal places |
| `_videoFormat` | Pixel format of the media | Single line text |
| `_videoAlpha` | Whether the media has an alpha channel | Checkbox |
| `_videoFPS` | Frames per second of the video, 0 for stills | Number - 2 decimal places |
| `_videoBitRate` | Bit rate of the video (b) | Number - 0 decimal places |
| `_audio` | Whether the media has an audio stream | Checkbox |
| `_audioCodec` | Audio codec of the media | Single line text |
| `_audioSampleRate` | Sample rate of the audio (kHz) | Number - 0 decimal places |
| `_audioChannels` | Number of channels of audio | Number - 0 decimal places |
| `_audioBitRate` | Bit rate of the audio (b) | Number - 0 decimal places |
The fields are named like this so it's clear which fields are managed by the tracker. You can easily add other fields that reference these values, but do not edit the values in these columns or they will just get overwritten/deleted.
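For instance, a hypothetical formula field could present `_size` in megabytes without touching the tracked columns - a sketch in AirTable's formula language:

```
ROUND({_size} / 1000000, 1) & " MB"
```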
If all the records are being updated every time, it is likely because of a mismatch on the created/modified time - ensure the Use same time for all collaborators option is selected on these fields, and you can try setting the timezone to match the computer which is running the script. Despite the dates being sent as UTC strings, AirTable is a bit funky about how it handles them.
Get the latest options by running `tracker -h`, which will return something like:
```
Usage: tracker [options]

CLI File Content Tracking

Options:
  -V, --version          output the version number
  -c, --config <path>    config file path (if none specified template will be created)
  -d, --dry-run          will do everything apart from updating AirTable
  -l, --logging <level>  set the logging level
  -nd, --no-delete       never remove records from AirTable
  -w, --wipe-cache       clear the metadata cache
  -h, --help             display help for command
```
When you run `tracker` for the first time without a config path it will generate the `config.json` file, which you will then need to update. Once updated, run the command again with `tracker -c config.json`; for more info on the contents of the config file, check the section on config files.
Inherently, it's nice to know this isn't going to wreak havoc on your AirTable, so if you run `tracker -c config.json -d` it will show you what it's going to do but stop short of modifying the AirTable. It will still run at the frequency you specify. You will need to run with a [logging level](#logging-option--l---logging) of `verbose` to see all the details.
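For example, a dry run with verbose logging shows exactly what would be pushed without touching AirTable:

```sh
tracker -c config.json -d -l verbose
```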
Specify one of the following levels of logging:
- error
- warn
- info
- http (default)
- verbose
- debug
- silly
By default the script will stop and confirm before it removes more than 10% of the table (if you don't respond within a minute, it will assume no), but when run with `-nc` it will always delete the records in AirTable.
This will still insert and update records in AirTable, however it will not remove them if they have been deleted in the local file system - handy if you are freeing up space on your local disk by deleting media but still want the reference in AirTable.
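Running with the no-delete flag looks like:

```sh
tracker -c config.json -nd
```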
There's a local cache built up in `./db` of the metadata of files, so they don't have to keep getting queried via FFmpeg, which is a tad computationally expensive - but if you need to rescan all files, run with this option. Files will already be automatically rescanned if their size or modified time changes.
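For example, to clear the cache and force a full metadata rescan on the next run:

```sh
tracker -c config.json -w
```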
Create the default config as described above, which will generate something like this:
```json
{
  "settings": {
    "files": {
      "rootPath": "/Users/user/Documents/cli-content-tracker",
      "dirs": [
        "/Users/user/Documents/cli-content-tracker"
      ],
      "frequency": 30,
      "rules": {
        "dirs": {
          "includes": [],
          "excludes": []
        },
        "files": {
          "includes": [],
          "excludes": [
            "/\\.DS_Store$/"
          ]
        }
      },
      "mediaMetadata": true,
      "limitToFirstFile": false,
      "concurrency": 100
    },
    "airtable": {
      "api": "",
      "base": "",
      "foldersID": "",
      "filesID": "",
      "view": "API"
    }
  }
}
```
In general, leave all the parameters in the JSON file; there is some error handling if they're not present, but it's probably for the best to leave everything in there.
This section relates to all the local file scanning. In general the script builds up a list of all the files and folders, and here you get a bit of control over that.
In an effort to make the script less dependent on exactly where the files are stored on your computer/where the folder is mounted, this string will be removed from the start of file paths, e.g.

```json
"rootPath": "/Volumes/Suite/Project"
```

With this root path, a folder at `/Volumes/Suite/Project/Item 1` would have a `_path` of `/Item 1`, while `_fullPath` still keeps the full path.
Array of directories to recursively search through, e.g.

```json
"dirs": [ "/Volumes/Suite/Project/Item 1", "/Volumes/Suite/Project/Item 2" ]
```
How often the directories are scanned (in seconds), e.g. if you wanted to scan every minute:

```json
"frequency": 60
```

You can also set this to `0` and the script will only run once - this is intended for when you want to automate it as part of a cron job.
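If you do go down the cron route, a crontab entry might look something like this (assuming `tracker` is on the cron user's PATH and `/path/to/config.json` is your config - both just examples):

```
# run a one-off scan ("frequency": 0 in the config) at the top of every hour
0 * * * * tracker -c /path/to/config.json
```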
So I'll be honest, this is the only slightly faffy bit... but it definitely beats entering them as command-line arguments, which was my first plan. The thought process here is that you're filtering which paths of files and folders get included in the tables that are pushed to AirTable.
Whatever you specify in these fields, the script will still have to traverse all of the folders in the directory - if you have specified a pattern like `/.*\/Delivery\/.*/`, which would match any folder with `/Delivery/` in the path, by the nature of the task you're still going to have to search through every folder.
Now the bit that makes it faffy is you have to stringify JS regex patterns, which usually just means escaping the slashes - a handy place to make use of the dry run option. Note you're matching the entire path of the file/folder in both `dirs` & `files`.
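For example, the JS regex `/\.DS_Store$/` from the default config becomes the following JSON string, with the backslash doubled up:

```json
"excludes": [
  "/\\.DS_Store$/"
]
```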
In the example below we're limiting the folders which are stored on AirTable to only be ones that include `/05 Delivery/` somewhere in the path, then only including specific image sequence TIFFs:
"rules": {
"dirs": {
"includes": [
"/.*\/05 Delivery\/.*/"
],
"excludes": []
},
"files": {
"includes": [
"/[_.]v\\d{2,3}\\.tif/",
"/[_.]0{4,5}\\.tif/"
],
"excludes": []
}
}
Whether you want to get all the metadata for the media, which will scrape all the fields in the [files with metadata table](#file-table-with-metadata), e.g.

```json
"mediaMetadata": true
```
This works in conjunction with the file rules but limits it to only the first file in each folder, with the intention of finding the first image in sequences, e.g. this just gets the first TIFF file in the deliveries folder:
"rules": {
"dirs": {
"includes": [],
"excludes": []
},
"files": {
"includes": [
"/05_Delivery/.*\\.tif/"
],
"excludes": []
}
},
"mediaMetadata": true,
"limitToFirstFile": true
This limits the number of concurrent file operations, introduced to throttle the load on Suite. If you have a computer with enough RAM and the files are static then this value can be larger; however, if it's a weaker computer and the files are being streamed you might want to limit this to < 10, e.g.

```json
"concurrency": 1
```
Once you've set up your AirTable, please configure the following settings:
Get your API key from the AirTable Tokens page; it will need the following permissions:
- `data.records:read`
- `data.records:write`
Plus access to the workspace where your base is located.
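In the config this goes in the `api` field - personal access tokens typically start with `pat` (redacted here):

```json
"api": "pat**************"
```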
You'll need to go to the API page for your base, get the base ID and enter it here, e.g.

```json
"base": "app**************"
```
Technically you can just put the folders table name here, but on the same page as the base ID you can get the IDs for the tables; then if the table names get updated later it won't affect the script, e.g.

```json
"foldersID": "tbl**************",
"filesID": "tbl**************"
```
This is the view the script compares the local file list against - e.g. you could technically store other items in the table and filter them out in this view, or you could have multiple scripts all writing into the same table and filtered out per view (though this might be better achieved by writing to multiple tables).

This defaults to a view called `API`:

```json
"view": "API"
```
Clone the repo and run it locally like so:

```sh
git clone https://github.com/garethnunns/cli-content-tracker.git
cd cli-content-tracker
npm install
npm link
tracker
```
You can always `npm unlink` this later.
Pull requests are very welcome!