DEPLOY.md

Introduction

This file covers how to deploy the analysis framework in production. Details are given per analysis, including how to run, monitor, and terminate each one, along with additional requirements such as restrictions on machines, network access, etc.

All analyses except install/dynamic analysis run Celery workers inside Docker containers. Because install/dynamic analysis needs sandboxing and system call isolation, each package is run in a separate container and the Celery workers sit natively outside Docker. This brings high overhead to the filesystem and the Docker daemon.
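
As a rough illustration of this worker model (a sketch only, not the actual maloss code; the task name, broker URL, and credentials below are assumptions), each worker is a Celery app pointed at the RabbitMQ broker on the master node:

```python
# Illustrative sketch: the real tasks live in maloss/main and read
# CELERY_BROKER_URL from the config file described below.
from celery import Celery

app = Celery('maloss', broker='amqp://guest:guest@master-node:5672//')

@app.task
def get_dep(pkg_name):
    # Hypothetical task body: query the registry for metadata, install
    # the package, and record its resolved dependency list.
    raise NotImplementedError
```

Running the workers themselves inside containers (all analyses except install/dynamic) keeps per-host setup simple; for install/dynamic the worker stays on the host so it can launch one fresh, sandboxed container per package.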

Description

  • Dependency analysis

    • input: list of packages from package managers (generated by crawl command)
    • output: the metadata and dependency information for all packages
    • desc: this analysis queries registries for package metadata and installs the packages to obtain their dependency lists. Do the following to deploy it (an illustrative config sketch follows this list).
      • pull the image
        • sudo docker pull malossscan/maloss
      • for all the machines, create maloss/main/config from maloss/main/config.tmpl and customize it
        • comment out TRACING
        • customize CELERY_BROKER_URL and METADATA_DIR
      • for all the machines, create and customize maloss/main/.env
        • PARALLELISM can be set to 2 * num_of_cpus
        • METADATA_DIR should be the same as config
      • run rabbitmq on master node
        • cd main && sudo docker-compose --compatibility -f docker-compose-master.yml up -d && cd ..
        • master node should have port 5672, 15672, 25671 open for rabbitmq
      • run maloss on worker nodes
        • cd main && sudo docker-compose --compatibility up -d && cd ..
      • add jobs to rabbitmq on master node
        • cd main && python3 detector.py get_dep -i ../data/npmjs.with_stats.csv --native
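
For reference, the customized files might look like the following (illustrative values only; the real keys come from maloss/main/config.tmpl, and an INI-style format is assumed):

```
# maloss/main/config (sketch) -- TRACING commented out for dependency analysis
# TRACING = ...
CELERY_BROKER_URL = amqp://guest:guest@master-node:5672//
METADATA_DIR = /data/maloss/metadata
```

```
# maloss/main/.env (sketch) -- PARALLELISM = 2 * num_of_cpus on an 8-core machine
PARALLELISM=16
METADATA_DIR=/data/maloss/metadata
```
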
  • Install analysis

    • input: list of packages from package managers (generated by crawl command)
    • output: the sysdig tracing files during installation process
    • desc: this analysis installs packages and uses sysdig to capture invoked system calls.
      • pull the image
        • sudo docker pull malossscan/maloss
      • for all the machines, create maloss/main/config from maloss/main/config.tmpl and customize it
        • customize CELERY_BROKER_URL and METADATA_DIR
      • for all the machines, customize TRACEPATH in maloss/sysdig/.env so that each package manager writes traces to a different path (see the sketch after this list)
      • run rabbitmq on master node
        • cd main && sudo docker-compose --compatibility -f docker-compose-master.yml up -d && cd ..
        • master node should have port 5672, 15672, 25671 open for rabbitmq
      • run scheduler.py to start install jobs on worker nodes
        • sudo python3 scheduler.py start -p 7 -i 30 -s -u $USER
      • add job to rabbitmq on master node
        • cd main && python3 detector.py install -i ../data/npmjs.with_stats.csv
      • stop scheduler.py and cleanup when needed
        • sudo python3 scheduler.py stop -s -u $USER
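
The per-machine TRACEPATH customization could look like this (the path is a placeholder; the point is that every package manager gets its own trace directory):

```
# maloss/sysdig/.env (sketch) -- this machine traces npm package installs
TRACEPATH=/data/maloss/sysdig/npmjs
```
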
  • Dynamic analysis

    • similar to Install analysis, except that the command is
      • python3 detector.py dynamic -i ../data/npmjs.with_stats.csv
  • AstfilterLocal analysis

    • similar to Dependency analysis, except that RESULT_DIR should be set and the command is
      • python3 detector.py astfilter_local -i ../data/npmjs.with_stats.csv
  • TaintLocal analysis

    • similar to Dependency analysis, except that RESULT_DIR should be set and the command is
      • python3 detector.py taint_local
  • Astfilter analysis

    • input: dependency graph of packages from package managers
    • desc:
      • init docker swarm cluster on master node and copy/log the swarm join command
        • sudo docker swarm init
        • master node should have port 5555 open for flower, 8080 open for webserver
        • master node should have TCP port 2377, TCP and UDP port 7946, UDP port 4789 open for docker swarm
      • join the docker swarm cluster from all other nodes
        • docker swarm join --token $token $ip:$port
        • worker nodes should have TCP port 2377, TCP and UDP port 7946, UDP port 4789 open for docker swarm
      • create and customize maloss/airflow/.env (see the sketch after this list)
        • AIRFLOW__WEBSERVER__BASE_URL should point to webserver on master node
        • customize AIRFLOW_DAGS, METADATA_FOLDER and RESULT_FOLDER
      • deploy the analysis by the following command
        • sudo bash -c "docker stack deploy --with-registry-auth -c <(docker-compose --compatibility -f docker-compose-CeleryExecutor.yml config) rubygems_astfilter"
  • Compare Ast analysis

    • similar to Dependency analysis, except that the command to add jobs is
      • python3 detector.py compare -i ../data/rubygems.with_stats.popular.csv
  • Static analysis

    • similar to Astfilter analysis, except for the environment variables and the service name

Scenarios

  • Add analysis for a new language
    • Add a file $LANG_analyzer.py in src/static_proxy that inherits from and implements static_base.StaticAnalyzer (see the skeleton after this list)
    • Add analyzer script for this language in astgen, astfilter, taint, static jobs
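
A hypothetical skeleton for a new language, e.g. Go (a sketch under assumptions: the file name follows the $LANG_analyzer.py convention, but the import path and method names are illustrative; implement whatever abstract methods static_base.StaticAnalyzer actually declares):

```python
# src/static_proxy/go_analyzer.py (hypothetical skeleton)
from static_base import StaticAnalyzer

class GoAnalyzer(StaticAnalyzer):
    """Static analyzer for Go packages; method names below are illustrative."""

    def astgen(self, inpath, outfile):
        # Parse the Go sources under inpath and serialize their ASTs to outfile.
        raise NotImplementedError

    def astfilter(self, pkg_path, configpath):
        # Flag suspicious API usage in the generated ASTs.
        raise NotImplementedError
```
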
  • Add analysis for a package manager
    • Add a file $PACKAGE_MANAGER.py in src/pm_proxy that inherits from and implements pm_base.PackageManagerProxy (see the skeleton after this scenario)
    • Add crawler for this package manager in crawl job
    • Add analyzer script for this package manager in the get_dep, build_dep, split_graph, install, dynamic jobs
    • Steps
      • Run package crawler crawl to collect all packages of this package manager
      • Run dependency analysis get_dep to get dependencies for all packages
      • Build the dep graph for packages using build_dep and split_graph
      • Run the necessary analyses, such as install, dynamic, astfilter, static
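
Similarly, a hypothetical proxy skeleton (the file name follows the $PACKAGE_MANAGER.py convention; the method names are assumptions, so mirror whatever pm_base.PackageManagerProxy actually requires):

```python
# src/pm_proxy/cargo.py (hypothetical skeleton for a new package manager)
from pm_base import PackageManagerProxy

class CargoProxy(PackageManagerProxy):
    """Proxy for crates.io; method names below are illustrative."""

    def get_metadata(self, pkg_name, pkg_version=None):
        # Query the registry API for the package's metadata.
        raise NotImplementedError

    def install(self, pkg_name, pkg_version=None, install_dir=None):
        # Install the package into install_dir so that its install-time
        # behavior can be traced (e.g. by sysdig).
        raise NotImplementedError
```
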
  • Add analysis for popular packages in a package manager
    • Add selection of popular packages in select_pkg
    • Steps
      • Build the dep graph for popular packages using split_graph with the seedfile flag to separate the subgraph from the whole dep graph (an illustrative invocation follows this list)
      • Run the necessary analyses, such as install, dynamic, astfilter, static
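
An illustrative invocation (the seedfile flag is named in the step above, but the exact CLI syntax and file names are assumptions):

```bash
python3 detector.py split_graph --seedfile ../data/npmjs.popular.csv
```
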
  • Add analysis for versions of popular packages in a package manager
    • Add collection of versions in get_versions
    • Steps
      • Run get_versions to get major versions of popular packages
      • Run dependency analysis get_dep to get dependencies for all versions of the packages
      • Build the dep graph for popular packages using build_dep and split_graph
      • Run the necessary analyses, such as install, dynamic, astfilter, static, compare
  • Add metadata analysis for a package manager
    • Add author information retrieval in get_author
    • Add author package graph building in build_author
    • Optionally add hash comparison of same packages across different package managers in compare_hash
    • Steps
      • Run edit_dist to get packages that typosquat popular packages
      • Run get_author to fetch author information
      • Run build_author to get author package relationship
      • Run compare_hash to identify same-name packages whose contents differ across package managers