FrameworkLauncher (or Launcher for short) is built to enable running Large-Scale Long-Running Services inside YARN Containers without making changes to the Services themselves. It also supports Batch Jobs, such as TensorFlow, CNTK, etc.
High Availability
- All Launcher and Hadoop components are Recoverable and Work Preserving, so User Services have No Down Time by design: they stay uninterrupted when our components shut down, crash, upgrade, or suffer any kind of outage, even for a long time.
- Launcher can tolerate many unexpected errors and has a well-defined Failure Model that covers, for example, dependent component shutdown, machine errors, network errors, configuration errors, environment errors, and corrupted internal data.
- Launcher ensures that User Services Retry on Transient Failures, Migrate to another Node per User's Request, etc.
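The Retry on Transient Failures behavior can be illustrated with a minimal sketch. This is not Launcher's actual Failure Model: the `FailureType` enum, the `run_with_retry` helper, the `classify` callback, and the retry budget are all hypothetical names chosen for illustration. The idea is only that a failure is retried when it is classified as transient and surfaced immediately otherwise.

```python
import enum


class FailureType(enum.Enum):
    # Hypothetical two-way classification, not Launcher's real Failure Model
    TRANSIENT = "transient"          # e.g. network blip: worth retrying
    NON_TRANSIENT = "non_transient"  # e.g. bad configuration: retrying cannot help


def run_with_retry(task, classify, max_transient_retries=3):
    """Run task() until it succeeds, retrying only transient failures.

    task: callable that returns a result or raises an exception.
    classify: callable mapping an exception to a FailureType.
    Returns (result, attempts_used).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception as exc:
            if classify(exc) is not FailureType.TRANSIENT:
                raise  # non-transient: surface immediately
            if attempts > max_transient_retries:
                raise  # retry budget exhausted
```

A caller supplies its own `classify`, for example one that treats `ConnectionError` as transient and everything else as non-transient.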
High Usability
- No User code changes are needed to run an existing executable inside a Container. The User only needs to set up the FrameworkDescription in JSON format.
- Idempotent RestAPI is supported.
- Work Preserving FrameworkDescription Update, such as changing TaskNumber or adding a TaskRole on the fly.
- Migrate running Task per User's Request
- Override default ApplicationProgress per User's Request
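To make the idempotent RestAPI and the Work Preserving update more concrete, here is a small in-memory sketch. It does not call Launcher: `put_framework`, `plan_task_changes`, the `TaskRoles` container key, and the `EntryPoint` key are assumptions made for illustration (only `TaskNumber` and `TaskRole` are named in this document), so consult the User Manual for the real FrameworkDescription schema and endpoints.

```python
import copy
import json


def put_framework(store, name, description):
    """Idempotent PUT: set the desired state for `name`.

    Repeating the same call leaves `store` unchanged, so a client can
    safely retry after a timeout.
    """
    if name not in store:
        store[name] = copy.deepcopy(description)
        return "created"
    if store[name] == description:
        return "unchanged"
    store[name] = copy.deepcopy(description)
    return "updated"


def plan_task_changes(old, new):
    """Per-TaskRole delta between two descriptions.

    Positive means tasks to add, negative means tasks to remove. Roles
    whose count did not change are omitted, i.e. left running as-is.
    """
    roles = set(old["TaskRoles"]) | set(new["TaskRoles"])
    delta = {}
    for role in sorted(roles):
        before = old["TaskRoles"].get(role, {}).get("TaskNumber", 0)
        after = new["TaskRoles"].get(role, {}).get("TaskNumber", 0)
        if after != before:
            delta[role] = after - before
    return delta


# Hypothetical description; the key layout is an assumption, not the real schema
description = json.loads("""
{
  "TaskRoles": {
    "worker": {"TaskNumber": 2, "EntryPoint": "python train.py"}
  }
}
""")
```

Submitting `description` twice to the same store yields `"created"` and then `"unchanged"`; raising `TaskNumber` to 3 and comparing with `plan_task_changes` reports only the one extra `worker` task, leaving the two running tasks untouched.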
Services and Batch Jobs Requirements
- Gpu Scheduling: Dynamic Topology-Aware Gpu Allocation
- Port Scheduling: Static or Dynamic Port Allocation
- Gang Scheduling: Gang Allocation: Start Services in an all-or-nothing fashion
- Antiaffinity Scheduling: Antiaffinity Allocation: Start Services on different Nodes
- Versioned Service Deployment
- ServiceDiscovery
- ApplicationCompletionPolicy
- Framework Tree Management: DeleteOnParentDeleted, StopOnParentStopped
- DataPartition
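Gang Allocation and Antiaffinity Allocation from the list above can be sketched together. The following is a simplified illustration, not Launcher's scheduler: the `gang_allocate` function and the slot-count model of a node are hypothetical. It places either every task of a gang or none of them, and optionally limits placement to one task per node.

```python
def gang_allocate(task_count, free_slots, anti_affinity=False):
    """Place `task_count` tasks all-or-nothing.

    free_slots: dict mapping node name -> number of free container slots.
    anti_affinity: if True, place at most one task per node.
    Returns a list of node names (one per task), or None if the whole
    gang cannot be placed. `free_slots` is modified only on success.
    """
    placement = []
    # Visit nodes with the most free slots first; break ties by node name
    for node, slots in sorted(free_slots.items(), key=lambda kv: (-kv[1], kv[0])):
        usable = min(slots, 1) if anti_affinity else slots
        take = min(usable, task_count - len(placement))
        placement.extend([node] * take)
        if len(placement) == task_count:
            break
    if len(placement) < task_count:
        return None  # all-or-nothing: reserve nothing on a partial fit
    for node in placement:
        free_slots[node] -= 1
    return placement
```

With `{"a": 2, "b": 1}` free, a gang of three fits without Antiaffinity, but with Antiaffinity it is rejected as a whole because only two distinct nodes exist.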
Compile-time dependencies:
- Apache Maven
- JDK 1.8+
Run-time dependencies:
- Hadoop 2.9.0 with YARN-7481 is required to support Gpu Scheduling and Port Scheduling. If you do not need them, any Hadoop 2.7+ is fine.
- Apache ZooKeeper
Launcher Distribution is built into folder .\dist.
Windows cmd line:
.\build.bat
GNU/Linux cmd line:
./build.sh
The Launcher Distribution must be built before starting the Launcher Service.
Windows cmd line:
.\dist\start.bat
GNU/Linux cmd line:
./dist/start.sh
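The build and start steps can be chained in a small wrapper. This is only a convenience sketch: the `launcher_scripts` and `build_then_start` helpers are hypothetical, and building only when the dist folder is missing is an assumption of this example, not documented Launcher behavior. The script paths are the ones listed above.

```python
import os
import subprocess
import sys


def launcher_scripts(platform=sys.platform):
    """Return (build_script, start_script) for the given platform."""
    if platform.startswith("win"):
        return (r".\build.bat", r".\dist\start.bat")
    return ("./build.sh", "./dist/start.sh")


def build_then_start(repo_root=".", platform=sys.platform, run=subprocess.check_call):
    """Build the Launcher Distribution if it is missing, then start the service.

    `run` is injectable so the sequence can be exercised without
    executing the real scripts.
    """
    build, start = launcher_scripts(platform)
    # Assumption of this sketch: an existing dist folder means the build is done
    if not os.path.isdir(os.path.join(repo_root, "dist")):
        run(build, cwd=repo_root, shell=True)
    run(start, cwd=repo_root, shell=True)
```

Calling `build_then_start()` from the repository root runs the platform's build script first when no dist folder exists, and then the start script.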
See the User Manual to learn how to use the Launcher Service to launch a Framework.