
Distributed-Robust-Learning

This project contains an all-reduce implementation written in Scala on top of the Kompics component framework.

Overview

The project has one part:

  • A server library that is responsible for creating the workers (servers)

The bootstrapping procedure for the servers requires one server to be marked as the bootstrap server; the other servers (bootstrap clients) check in with it before the system starts up. The bootstrap server also assigns a predecessor and a successor to each node, including itself.
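
As a rough illustration of that assignment step (this is a sketch only, not code from this repository; NodeAddress and Assignment are made-up names), the bootstrap server can be thought of as arranging the checked-in nodes into a ring and handing each one its two neighbours:

object RingAssignmentSketch {
  // Illustrative types; the repository's actual address and assignment
  // classes will differ.
  final case class NodeAddress(host: String, port: Int)
  final case class Assignment(node: NodeAddress, predecessor: NodeAddress, successor: NodeAddress)

  // Arrange the checked-in nodes into a ring: node i gets node i-1 as
  // predecessor and node i+1 as successor, wrapping around at the ends,
  // so the bootstrap server itself also receives neighbours.
  def assignRing(nodes: Vector[NodeAddress]): Vector[Assignment] = {
    val n = nodes.length
    nodes.zipWithIndex.map { case (node, i) =>
      Assignment(node, nodes((i - 1 + n) % n), nodes((i + 1) % n))
    }
  }
}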

Building

Start sbt with

sbt

In the sbt REPL build the project with

compile

Before running the project, you need to create the assembly JAR for the server:

server/assembly
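
The server/assembly task implies the build uses the sbt-assembly plugin with a server subproject. A minimal sketch of what the relevant build wiring could look like (an assumption based on the commands above; the repository's actual build.sbt and plugin version may differ):

// project/plugins.sbt (sketch)
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// build.sbt (sketch)
ThisBuild / scalaVersion := "2.13.10"   // the jar path below suggests Scala 2.13

lazy val server = (project in file("server"))
  .settings(
    name := "server",
    // Produce server/target/scala-2.13/server.jar, the path used in the Running section.
    assembly / assemblyJarName := "server.jar"
  )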

Running

Bootstrap Server Node

To run a bootstrap server node, execute:

java -jar server/target/scala-2.13/server.jar -p 45678

This will start the bootstrap server on localhost:45678.

Normal Server Node

After you have started a bootstrap server on <bsip>:<bsport>, execute (again from the same directory):

java -jar server/target/scala-2.13/server.jar -p 45679 -s <bsip>:<bsport>

This will start the normal server on localhost:45679, and ask it to connect to the bootstrap server at <bsip>:<bsport>. Make sure you start every node on a different port if they are all running directly on the local machine.

By default you need 15 nodes (including the bootstrap server) before the system will actually generate the assignments (predecessor and successor) and begin the all-reduce by triggering the deep learning model in each worker.
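
Conceptually, once every worker knows its predecessor and successor, the all-reduce leaves each worker holding the element-wise sum (or average) of all workers' gradient vectors. Below is a framework-free sketch of the value it computes (the actual system exchanges the data incrementally around the ring as Kompics messages):

object AllReduceSketch {
  // Element-wise sum of all workers' gradient vectors; this is the result
  // every worker should hold after the all-reduce completes.
  def allReduce(gradients: Seq[Vector[Double]]): Vector[Double] =
    gradients.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })

  def main(args: Array[String]): Unit = {
    // Example: three workers each contribute a 3-element gradient.
    val reduced = allReduce(Seq(Vector(1.0, 2.0, 3.0), Vector(0.5, 0.5, 0.5), Vector(2.0, 1.0, 0.0)))
    println(reduced) // Vector(3.5, 3.5, 3.5)
  }
}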

This number can be changed in the configuration file (see the Kompics documentation for background on Kompics configurations).
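
Kompics configuration is typically backed by Typesafe Config (HOCON) files such as reference.conf or application.conf. A hedged sketch of how such a threshold might be read follows; the key name learning.bootThreshold is hypothetical, not necessarily the one this project uses:

import com.typesafe.config.ConfigFactory

object ThresholdSketch {
  // "learning.bootThreshold" is an illustrative key name only.
  val config = ConfigFactory.load()   // loads application.conf / reference.conf from the classpath
  val bootThreshold: Int = config.getInt("learning.bootThreshold")
  // The configuration file would then contain a line such as:
  //   learning.bootThreshold = 15
}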
