CLI Content Tracker to AirTable

This project came about as a means to track video content on concert shows, but is written generically: it recursively scans folders (which can be mapped network drives or streamed cloud storage like Dropbox, Drive, LucidLink or Suite) and stores a list of the files & folders in AirTable.

Quick Start

npx @garethnunns/cli-content-tracker

Then you just need to edit the config.json file created in the directory where the command was executed and re-run with the config command-line option: tracker -c config.json.

This requires Node to be installed; on a Mac the easiest route is Homebrew:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install node

AirTable Setup

You can duplicate this base or create your own as detailed below. See the section on configuring the AirTable settings for how to link this to the script.

Manual Base Creation

You'll need a base with the following two tables for folders & files.

Folders Table

| Field | Description | Type |
| --- | --- | --- |
| _path | Unix path to this folder in the project | Single line text |
| _fullPath | Full path of the folder | Single line text |
| _size | Size of the folder in bytes | Number - 0 decimal places |
| _ctime | Creation time of the folder | Date - include time, Use same time for all collaborators |
| _mtime | Last modified time of the folder | Date - include time, Use same time for all collaborators |
| _items | Number of items in the folder | Number - 0 decimal places |
| _parent | Parent folder of this folder | Link to record - this table |

Files Table

Only use this smaller table when the mediaMetadata setting is disabled.

| Field | Description | Type |
| --- | --- | --- |
| _path | Unix path to this file in the project | Single line text |
| _fullPath | Full path of the file | Single line text |
| _size | Size of the file in bytes | Number - 0 decimal places |
| _ctime | Creation time of the file | Date - include time, Use same time for all collaborators |
| _mtime | Last modified time of the file | Date - include time, Use same time for all collaborators |
| _parent | Parent folder of this file | Link to record - folders table |

File Table with Metadata

All the fields of the files table, plus:

| Field | Description | Type |
| --- | --- | --- |
| _duration | Duration of the media | Number - 2 decimal places |
| _video | Whether the media has a video stream | Checkbox |
| _videoStill | Whether the file is an image | Checkbox |
| _videoCodec | Video codec of the media | Single line text |
| _videoWidth | Width of the media | Number - 0 decimal places |
| _videoHeight | Height of the media | Number - 0 decimal places |
| _videoFormat | Pixel format of the media | Single line text |
| _videoAlpha | Whether the media has an alpha channel | Checkbox |
| _videoFPS | Frames per second of the video, 0 for stills | Number - 2 decimal places |
| _videoBitRate | Bit rate of the video (b) | Number - 0 decimal places |
| _audio | Whether the media has an audio stream | Checkbox |
| _audioCodec | Audio codec of the media | Single line text |
| _audioSampleRate | Sample rate of the audio (kHz) | Number - 0 decimal places |
| _audioChannels | Number of channels of audio | Number - 0 decimal places |
| _audioBitRate | Bit rate of the audio (b) | Number - 0 decimal places |

Field Notes

The fields are named like this so it's clear which fields are populated by the tracker. You can easily add other fields that reference these values, but do not edit the values in these columns, or your edits will just get overwritten/deleted.

If all the records are being updated every time, it is likely because of a mismatch on the created/modified time - ensure the Use same time for all collaborators option is selected on these fields, and you can try setting the timezone to match the computer running the script. Despite the tracker sending them as UTC strings, AirTable is a bit funky about how it handles dates.

Command Line Options

Get the latest options by running tracker -h which will return something like:

Usage: tracker [options]

CLI File Content Tracking

Options:
  -V, --version          output the version number
  -c, --config <path>    config file path (if none specified template will be created)
  -d, --dry-run          will do everything apart from updating AirTable
  -l, --logging <level>  set the logging level
  -nd, --no-delete       never remove records from AirTable
  -w, --wipe-cache       clear the metadata cache
  -h, --help             display help for command

Config Option: -c, --config

When you run tracker for the first time without a path option, it will generate the config.json file, which you will then need to update. Once updated, run the command again with tracker -c config.json. For more info on the contents of the config file, check the section on config files.

Dry-run Option: -d, --dry-run

Inherently, it's nice to know this isn't going to wreak havoc on your AirTable, so if you run tracker -c config.json -d it will show you what it's going to do but stop short of modifying the AirTable. It will still run at the frequency you specify. You will need to run with a logging level of verbose to see all the details.

Logging Option: -l, --logging

Specify one of the following levels of logging:

  1. error
  2. warn
  3. info
  4. http (default)
  5. verbose
  6. debug
  7. silly

No Confirm Option: -nc, --no-confirm

By default the script will stop and ask for confirmation before it removes more than 10% of a table (if you don't respond within a minute, it will assume no), but when run with -nc it will always delete the records in AirTable.

No Delete Option: -nd, --no-delete

This will still insert and update records in AirTable, but will not remove records that have been deleted from the local file system - handy if you are freeing up space on your local disk by deleting media but still want the reference in AirTable.

Wipe Cache: -w, --wipe-cache

There's a local cache built up in ./db of the metadata of files so they don't have to keep getting queried via FFmpeg, which is a tad computationally expensive; if you need to rescan all files, run with this option. Files are automatically rescanned if their size or modified time changes.

Config File

Create the default config as described above, which will generate something like this:

{
  "settings": {
    "files": {
      "rootPath": "/Users/user/Documents/cli-content-tracker",
      "dirs": [
        "/Users/user/Documents/cli-content-tracker"
      ],
      "frequency": 30,
      "rules": {
        "dirs": {
          "includes": [],
          "excludes": []
        },
        "files": {
          "includes": [],
          "excludes": [
            "/\\.DS_Store$/"
          ]
        }
      },
      "mediaMetadata": true,
      "limitToFirstFile": false,
      "concurrency": 100
    },
    "airtable": {
      "api": "",
      "base": "",
      "foldersID": "",
      "filesID": "",
      "view": "API"
    }
  }
}

In general, leave all the parameters in the JSON file; there is some error handling if they're not present, but it's probably for the best to leave everything in there.

config.settings

config.settings.files

This section relates to the local file scanning. The script builds up a list of all the files and folders, and here you get a bit of control over that.

config.settings.files.rootPath

In an effort to make the script less dependent on exactly where the files are stored on your computer/where the folder is mounted, this string will be removed from the start of file paths.

"rootPath": "/Volumes/Suite/Project"

config.settings.files.dirs

Array of directories to recursively search through, e.g.

"dirs": [ "/Volumes/Suite/Project/Item 1", "/Volumes/Suite/Project/Item 2" ]

config.settings.files.frequency

How often the directory is scanned (in seconds), e.g. if you wanted to scan it every minute:

"frequency": 60

You can also set this to 0 and the script will only run once - this is intended for automating the script as part of a cron job.
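For instance, with "frequency": 0 you could schedule the scan yourself with cron (the binary and config paths below are illustrative; adjust them to your installation):

```shell
# Illustrative crontab entry: run the tracker once at the top of every hour
0 * * * * /usr/local/bin/tracker -c /path/to/config.json
```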

config.settings.files.rules

So I'll be honest, this is the only slightly faffy bit... but it definitely beats entering them as command-line arguments, which was my first plan. The thought process here is that you're filtering the paths of the files and folders that will get included in the tables pushed to AirTable.

Whatever you specify in these fields, the script will still have to traverse all of the folders in the directory - if you have specified a pattern like /.*\/Delivery\/.*/, which would match any folder with /Delivery/ in the path, by the nature of the task you're still going to have to search through every folder to find those matches.

Now the bit that makes it faffy is that you have to stringify JS regex patterns, which usually just means escaping the slashes - a handy place to make use of the dry-run option. Note you're matching the entire path of the file/folder in both dirs & files.

For example, below we're limiting the folders which are stored on AirTable to only those that include /05 Delivery/ somewhere in the path, then only including specific image-sequence TIFFs:

"rules": {
  "dirs": {
    "includes": [
      "/.*\/05 Delivery\/.*/"
    ],
    "excludes": []
  },
  "files": {
    "includes": [
      "/[_.]v\\d{2,3}\\.tif/",
      "/[_.]0{4,5}\\.tif/"
    ],
    "excludes": []
  }
}

config.settings.files.mediaMetadata

Whether you want to get all the metadata for the media, which will populate all the fields in the [file table with metadata](#file-table-with-metadata), e.g.

"mediaMetadata": true

config.settings.files.limitToFirstFile

This works in conjunction with the file rules but limits the results to only the first matching file in each folder, with the intention of finding the first image in a sequence, e.g. this just gets the first TIFF file in the deliveries folder:

"rules": {
    "dirs": {
      "includes": [],
      "excludes": []
    },
    "files": {
      "includes": [
        "/05_Delivery/.*\\.tif/"
      ],
      "excludes": []
    }
  },
  "mediaMetadata": true,
  "limitToFirstFile": true

config.settings.files.concurrency

This limits the number of concurrent file operations, introduced to throttle the load on Suite. If you have a computer with enough RAM and the files are static, this value can be larger; however, if it's a weaker computer and the files are being streamed, you might want to limit this to < 10, e.g.

"concurrency": 1

config.settings.airtable

Once you've set up your AirTable, please configure the following settings:

config.settings.airtable.api

Get your API key from the AirTable Tokens page, it will need the following permissions:

  • data.records:read
  • data.records:write

Plus access to the workspace where your base is located.

config.settings.airtable.base

You'll need to go to the API page for your base, get the base ID and enter it here, e.g.

"base": "app**************"

config.settings.airtable.foldersID / filesID

Technically you can just put the folders table name here, but on the same page as the base ID you can get the IDs for the tables; then if the table names get updated later, it won't affect the script, e.g.

"foldersID": "tbl**************",
"filesID": "tbl**************"

config.settings.airtable.view

This is the view the script compares the local file list against - e.g. you could technically store other items in the table and filter them out in this view, or you could have multiple scripts all writing into the same table with a separate view per script (though this might be better achieved by writing to multiple tables).

This defaults to a view called API:

"view": "API"

Development

Clone the repo and run it locally like so:

git clone https://github.com/garethnunns/cli-content-tracker.git
cd cli-content-tracker
npm install
npm link
tracker

You can always npm unlink this later.

Pull requests are very welcome!