rdma: refactor #14

Merged: 81 commits, merged on Jan 1, 2020
Commits
d8f5a80 add new abstraction for transport (ymjiang, Dec 11, 2019)
e125ad7 remove redundant PackMetaPB (bobzhuyb, Dec 11, 2019)
2d145c9 wip: refactoring (ymjiang, Dec 11, 2019)
a60d46d Fix makefile and compilation error (bobzhuyb, Dec 11, 2019)
c06df60 wip: auto transport type (ymjiang, Dec 12, 2019)
ee533df wip: simplify SendMsg (ymjiang, Dec 12, 2019)
f300a2e Use raw serializer instead of pb (bobzhuyb, Dec 12, 2019)
44e38fa bzero after init serialization buffer (bobzhuyb, Dec 12, 2019)
98de771 reduce copies for PackMeta (bobzhuyb, Dec 12, 2019)
dbf15e1 wip: remove transport.h (ymjiang, Dec 12, 2019)
1ffb37e wip: split the files (ymjiang, Dec 13, 2019)
38a245f nit (ymjiang, Dec 13, 2019)
c97b9eb wip: finish Transport base class (ymjiang, Dec 13, 2019)
b0fccf4 refactor transport to pure virtual class (bobzhuyb, Dec 13, 2019)
7f6d318 resolve conflict with better_rdma branch (bobzhuyb, Dec 13, 2019)
af67581 Merge pull request #15 from bytedance/remove_pb (bobzhuyb, Dec 13, 2019)
d871b30 refactored most parts of IPCTransport (bobzhuyb, Dec 13, 2019)
0d8704c rename transport.h to rdma_transport.h (bobzhuyb, Dec 13, 2019)
60895b9 add RecvPushResponse and RecvPullRequest (ymjiang, Dec 15, 2019)
9c0fa31 add RecvPushRequest and RecvPullResponse (ymjiang, Dec 15, 2019)
2c6e090 can compile (ymjiang, Dec 15, 2019)
834c903 wip: improve compile (ymjiang, Dec 15, 2019)
174d59a fix GetTransport nullptr problem and add check (ymjiang, Dec 15, 2019)
84ee377 fix connection & attempt to fallback original functionalities (ymjiang, Dec 15, 2019)
ed1f4ec wip: do not free memory in mempool (ymjiang, Dec 15, 2019)
a6d5cda add a max bound for meta size (ymjiang, Dec 16, 2019)
5e78f30 basically finish RDMATransport (ymjiang, Dec 16, 2019)
3aca8d4 fix msg_buf size (ymjiang, Dec 16, 2019)
4ac2137 fix Rendezvous recv (ymjiang, Dec 16, 2019)
5513276 release write context (ymjiang, Dec 17, 2019)
696c7fa server: fix receiving push request (ymjiang, Dec 17, 2019)
59bdad2 fix storing worker tensor address (ymjiang, Dec 17, 2019)
3457b0b fix push response redundant rendez (ymjiang, Dec 17, 2019)
b4b4487 can run 1v1 (ymjiang, Dec 17, 2019)
59b85af add push/pull tests (ymjiang, Dec 17, 2019)
b1ec799 use two separate mempools (ymjiang, Dec 18, 2019)
c7e9bb1 nit: a little improvement (ymjiang, Dec 18, 2019)
c1b45b9 tests: use page aligned memory (ymjiang, Dec 18, 2019)
1b5b01c nit: some improvement (ymjiang, Dec 18, 2019)
3c85886 mempool->Alloc returns page aligned memory (ymjiang, Dec 18, 2019)
137d7c8 tests: all keys/lens use page aligned mem (ymjiang, Dec 18, 2019)
a14acfe push request: only write one sge (ymjiang, Dec 18, 2019)
3a9f237 Split one write into two writes (bobzhuyb, Dec 18, 2019)
20b3782 server receives push data with aligned memory (ymjiang, Dec 18, 2019)
50c1bf4 tests: clean testcase and set default value (ymjiang, Dec 19, 2019)
3f8ee6f quick fix (ymjiang, Dec 19, 2019)
fd391bd remove key_addr_map and key_len_map (ymjiang, Dec 19, 2019)
be13a5d some cleaning (ymjiang, Dec 19, 2019)
b0830ca can run 2v2 (ymjiang, Dec 19, 2019)
ab198cd finish IPCTransport (ymjiang, Dec 19, 2019)
30bf1a3 tests: add ipc testcase (ymjiang, Dec 19, 2019)
dacde24 split testcases into two files (ymjiang, Dec 19, 2019)
51a9ac1 can run ipc 2v2 (ymjiang, Dec 19, 2019)
a5bdc8e clean some compile warnings (ymjiang, Dec 19, 2019)
b6109e5 keep the same partition size with worker (ymjiang, Dec 20, 2019)
c01535f default byteps_local_size=1 (ymjiang, Dec 20, 2019)
644f103 kMaxMetaBound: 4MB->4KB (ymjiang, Dec 20, 2019)
1efd8ec kMaxMetaBound: 4KB->2KB (ymjiang, Dec 20, 2019)
1752620 Allocate a whole page for meta (bobzhuyb, Dec 20, 2019)
e4296da pull response: do not add msg_buf->mrs (ymjiang, Dec 21, 2019)
397fee7 [debug] ipc: sync copy (ymjiang, Dec 21, 2019)
ef0bc3b fix transport type for two directions (ymjiang, Dec 22, 2019)
fe42f94 fix async copy (ymjiang, Dec 23, 2019)
e30a44e minimize map_mu_ lock coverage (ymjiang, Dec 23, 2019)
e1286fd non-functional improvement (ymjiang, Dec 23, 2019)
f2dcad1 enforce all num_sge to 1 (ymjiang, Dec 24, 2019)
73fd1ca remove runtime log (ymjiang, Dec 24, 2019)
e7ad975 fix mr_list range (ymjiang, Dec 26, 2019)
8c35b87 reuse local msg_buf address (ymjiang, Dec 27, 2019)
6c7f80b add memory allocator (ymjiang, Dec 27, 2019)
565bd45 improve receiver memory alignment (ymjiang, Dec 27, 2019)
3a7437c quick fix (ymjiang, Dec 27, 2019)
93d4495 fix 1v1 hang (ymjiang, Dec 27, 2019)
84279d8 add debug mode for test case (ymjiang, Dec 27, 2019)
add1d10 tests: disable sum by default (ymjiang, Dec 27, 2019)
d43f37d cleaning: remove useless write ctx (ymjiang, Dec 29, 2019)
962a05c simplify RDMAWriteWithImm (ymjiang, Dec 29, 2019)
e551a42 fix repeated ibv_reg_mr (ymjiang, Dec 30, 2019)
a702ca5 fix broadcast: send missed first message in rendez (ymjiang, Dec 30, 2019)
7d1ec49 log: add send/recv log (ymjiang, Dec 30, 2019)
f7a704a fix log (ymjiang, Dec 31, 2019)
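Several commits above (c1b45b9, 3c85886, 137d7c8, and 1752620 "Allocate a whole page for meta") move buffers onto page-aligned memory. The allocator itself is not part of the diff shown below, so this is only a minimal C++ sketch of the idea, assuming `posix_memalign` and `sysconf(_SC_PAGESIZE)`. The helpers `RoundUpToPage` and `AllocPageAligned` are hypothetical names, not the PR's mempool API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Round `size` up to a multiple of `page`; assumes `page` is a power of two,
// which holds for the page sizes Linux reports.
size_t RoundUpToPage(size_t size, size_t page) {
  return (size + page - 1) & ~(page - 1);
}

// Hypothetical helper: allocate a zero-filled buffer that starts on a page
// boundary and spans whole pages. Release it with std::free().
void* AllocPageAligned(size_t size, size_t* out_len) {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  const size_t len = RoundUpToPage(size == 0 ? 1 : size, page);
  void* p = nullptr;
  if (posix_memalign(&p, page, len) != 0) return nullptr;
  std::memset(p, 0, len);  // start from a clean buffer, in the spirit of 44e38fa
  if (out_len != nullptr) *out_len = len;
  return p;
}
```

With this scheme a 100-byte meta buffer still occupies one full page, so no two buffers ever share a page; whether that was the PR's exact motivation is an assumption.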
12 changes: 4 additions & 8 deletions Makefile
@@ -20,10 +20,10 @@ endif
 
 INCPATH = -I./src -I./include -I$(DEPS_PATH)/include
 CFLAGS = -std=c++14 -msse2 -fPIC -O3 -ggdb -Wall -finline-functions $(INCPATH) $(ADD_CFLAGS)
-LIBS = -pthread
+LIBS = -pthread -lrt
 
 ifeq ($(USE_RDMA), 1)
-LIBS += -lrdmacm -libverbs -lrt
+LIBS += -lrdmacm -libverbs
 CFLAGS += -DDMLC_USE_RDMA
 endif
 
@@ -38,25 +38,21 @@ include make/deps.mk
 
 clean:
 	rm -rf build $(TEST) tests/*.d tests/*.dSYM
-	find src -name "*.pb.[ch]*" -delete
 
 lint:
 	python tests/lint.py ps all include/ps src
 
 ps: build/libps.a
 
-OBJS = $(addprefix build/, customer.o postoffice.o van.o meta.pb.o)
+OBJS = $(addprefix build/, customer.o postoffice.o van.o)
 build/libps.a: $(OBJS)
 	ar crv $@ $(filter %.o, $?)
 
-build/%.o: src/%.cc ${ZMQ} src/meta.pb.h
+build/%.o: src/%.cc ${ZMQ}
 	@mkdir -p $(@D)
 	$(CXX) $(INCPATH) -std=c++0x -MM -MT build/$*.o $< >build/$*.d
 	$(CXX) $(CFLAGS) $(LIBS) -c $< -o $@
 
-src/%.pb.cc src/%.pb.h : src/%.proto ${PROTOBUF}
-	$(PROTOC) --cpp_out=./src --proto_path=./src $<
-
 -include build/*.d
 -include build/*/*.d
 
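In the first hunk, `-lrt` moves from the `USE_RDMA` branch into the base `LIBS`, so it is now linked in non-RDMA builds as well. The diff does not say why. One plausible reason, which is an assumption, is that the IPC transport finished in this PR (ab198cd) relies on POSIX shared memory, and on glibc releases before 2.34 `shm_open`/`shm_unlink` live in librt. A minimal C++ sketch of that mechanism follows; `MapSharedSegment` is a hypothetical helper, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: open (creating if needed) a named POSIX shared-memory
// segment, size it, and map it read/write. Every mapping of the same name
// sees the same bytes. Returns nullptr on failure.
void* MapSharedSegment(const std::string& name, size_t size) {
  int fd = shm_open(name.c_str(), O_CREAT | O_RDWR, 0600);
  if (fd < 0) return nullptr;
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return nullptr;
  }
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  return p == MAP_FAILED ? nullptr : p;
}
```

A co-located worker and server mapping the same name could then exchange tensor bytes without going through the network stack.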
9 changes: 4 additions & 5 deletions include/ps/internal/van.h
@@ -16,7 +16,6 @@
 #include "ps/internal/message.h"
 namespace ps {
 class Resender;
-class PBMeta;
 /**
  * \brief Van sends messages to remote nodes
  *
@@ -107,14 +106,14 @@ class Van {
   virtual int SendMsg(Message &msg) = 0;
 
   /**
-   * \brief pack meta into a string
+   * \brief get the length of pack meta
    */
-  void PackMeta(const Meta &meta, char **meta_buf, int *buf_size);
+  int GetPackMetaLen(const Meta &meta);
 
   /**
-   * \brief pack meta into protobuf
+   * \brief pack meta into a string
    */
-  void PackMetaPB(const Meta &meta, PBMeta *pb);
+  void PackMeta(const Meta &meta, char **meta_buf, int *buf_size);
 
   /**
    * \brief unpack meta from a string
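The header now splits packing into two steps: `GetPackMetaLen` reports the serialized size, and `PackMeta` fills the buffer. Below is a sketch of how such a length could be computed for a raw (non-protobuf) layout, assuming the fixed header is followed by the variable-length body, data-type list and node list noted at the end of `RawMeta` in `src/meta.h`. The trimmed structs and `MetaFitsBound` are hypothetical; the 2048-byte `kMaxMetaBound` comes from commit 1efd8ec ("kMaxMetaBound: 4KB->2KB"), and how the PR actually enforces that bound is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in with the same fields as RawNode in src/meta.h.
struct RawNode {
  int role;
  int id;
  char hostname[64];
  int port;
  bool is_recovery;
  int customer_id;
};

// Hypothetical fixed-size header; fields trimmed for brevity.
struct RawMetaHeader {
  int head;
  int body_size;
  int data_type_size;
  int node_size;
};

// Hypothetical in-memory meta holding the variable-length parts.
struct Meta {
  std::string body;
  std::vector<int> data_type;  // data types stored as ints
  std::vector<RawNode> nodes;
};

// Value taken from commit 1efd8ec ("kMaxMetaBound: 4KB->2KB").
constexpr size_t kMaxMetaBound = 2048;

// Fixed header, then body bytes, then data_type entries, then nodes.
size_t GetPackMetaLen(const Meta& meta) {
  return sizeof(RawMetaHeader) + meta.body.size() +
         meta.data_type.size() * sizeof(int) +
         meta.nodes.size() * sizeof(RawNode);
}

bool MetaFitsBound(const Meta& meta) {
  return GetPackMetaLen(meta) <= kMaxMetaBound;
}
```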
11 changes: 0 additions & 11 deletions make/deps.mk
@@ -1,21 +1,10 @@
 # Install dependencies
 
 URL1=https://raw.githubusercontent.com/mli/deps/master/build
-URL2=https://github.com/google/protobuf/releases/download/v3.5.1
 ifndef WGET
 WGET = wget
 endif
 
-# protobuf
-PROTOBUF = ${DEPS_PATH}/include/google/protobuf/message.h
-${PROTOBUF}:
-	$(eval FILE=protobuf-cpp-3.5.1.tar.gz)
-	$(eval DIR=protobuf-3.5.1)
-	rm -rf $(FILE) $(DIR)
-	$(WGET) $(URL2)/$(FILE) && tar --no-same-owner -zxf $(FILE)
-	cd $(DIR) && export CFLAGS=-fPIC && export CXXFLAGS=-fPIC && ./configure -prefix=$(DEPS_PATH) && $(MAKE) && $(MAKE) install
-	rm -rf $(FILE) $(DIR)
-
 # zmq
 ZMQ = ${DEPS_PATH}/include/zmq.h
 
4 changes: 2 additions & 2 deletions make/ps.mk
@@ -9,5 +9,5 @@ ifeq ($(USE_KEY32), 1)
 ADD_CFLAGS += -DUSE_KEY32=1
 endif
 
-PS_LDFLAGS_SO = -L$(DEPS_PATH)/lib -lprotobuf-lite -lzmq
-PS_LDFLAGS_A = $(addprefix $(DEPS_PATH)/lib/, libprotobuf-lite.a libzmq.a)
+PS_LDFLAGS_SO = -L$(DEPS_PATH)/lib -lzmq
+PS_LDFLAGS_A = $(addprefix $(DEPS_PATH)/lib/, libzmq.a)
76 changes: 76 additions & 0 deletions src/meta.h
@@ -0,0 +1,76 @@
+/**
+ * Copyright (c) 2018-2019 Bytedance Inc.
+ * Author: [email protected] (Yibo Zhu)
+ */
+#ifndef PS_LITE_META_H_
+#define PS_LITE_META_H_
+
+#include<stdint.h>
+
+namespace ps {
+
+struct RawNode {
+  // the node role
+  int role;
+  // node id
+  int id;
+  // hostname or ip
+  char hostname[64];
+  // the port this node is binding
+  int port;
+  // whether this node is created by failover
+  bool is_recovery;
+  // the locally unique id of a customer
+  int customer_id;
+};
+
+// system control info
+struct RawControl {
+  int cmd;
+  int node_size;
+  int barrier_group;
+  uint64_t msg_sig;
+};
+
+// meta information about a message
+struct RawMeta {
+  // message.head
+  int head;
+  // message.body
+  int body_size;
+  // if set, then it is system control task. otherwise, it is for app
+  RawControl control;
+  // true: a request task
+  // false: the response task to the request task with the same *time*
+  bool request;
+  // the unique id of an application
+  int app_id;
+  // the timestamp of this message
+  int timestamp;
+  // data type of message.data[i]
+  int data_type_size;
+  // the locally unique id of a customer
+  int customer_id;
+  // whether or not a push message
+  bool push;
+  // whether or not it's for SimpleApp
+  bool simple_app;
+  // message.data_size
+  int data_size;
+  // message.key
+  uint64_t key;
+  // message.addr
+  uint64_t addr;
+  // the length of the message's value
+  int val_len;
+  // the option field
+  int option;
+
+  // body
+  // data_type
+  // node
+};
+
+} // namespace
+
+#endif
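`RawMeta` is a plain, fixed-size struct, so the raw serializer that replaces protobuf (f300a2e, "Use raw serializer instead of pb") can copy it byte for byte and append the variable-length parts after it. The following is a minimal sketch of such a round trip, not the PR's `PackMeta`/`UnpackMeta`: `RawMetaLite`, `SimpleMeta`, `Pack` and `Unpack` are hypothetical, and only the body is appended after the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical reduced RawMeta: a trivially copyable, fixed-size header.
struct RawMetaLite {
  int head;
  int body_size;
  bool request;
  int timestamp;
  uint64_t key;
};

// Hypothetical in-memory meta with a variable-length body.
struct SimpleMeta {
  int head = 0;
  bool request = false;
  int timestamp = 0;
  uint64_t key = 0;
  std::string body;
};

// Header bytes first, then body_size bytes of body.
std::vector<char> Pack(const SimpleMeta& m) {
  RawMetaLite raw;
  std::memset(&raw, 0, sizeof(raw));  // zero padding so no stale bytes are sent
  raw.head = m.head;
  raw.body_size = static_cast<int>(m.body.size());
  raw.request = m.request;
  raw.timestamp = m.timestamp;
  raw.key = m.key;
  std::vector<char> buf(sizeof(raw) + m.body.size());
  std::memcpy(buf.data(), &raw, sizeof(raw));
  if (!m.body.empty()) {
    std::memcpy(buf.data() + sizeof(raw), m.body.data(), m.body.size());
  }
  return buf;
}

// Returns false if the buffer is too short for the header or for the body it announces.
bool Unpack(const char* buf, size_t len, SimpleMeta* out) {
  RawMetaLite raw;
  if (len < sizeof(raw)) return false;
  std::memcpy(&raw, buf, sizeof(raw));
  if (raw.body_size < 0 ||
      len - sizeof(raw) < static_cast<size_t>(raw.body_size)) {
    return false;
  }
  out->head = raw.head;
  out->request = raw.request;
  out->timestamp = raw.timestamp;
  out->key = raw.key;
  out->body.assign(buf + sizeof(raw), static_cast<size_t>(raw.body_size));
  return true;
}
```

A byte copy like this assumes sender and receiver share the same struct layout and endianness (same compiler and architecture). Protobuf did not need that assumption, so it is a trade-off to keep in mind for heterogeneous clusters.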
64 changes: 0 additions & 64 deletions src/meta.proto

This file was deleted.
