forked from booksbyus/zguide
-
Notifications
You must be signed in to change notification settings - Fork 0
/
notes.txt
331 lines (217 loc) · 12 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
Notes for The Guide
++ Heartbeating
You'll often hit the problem of knowing whether a peer is alive or not. This is not specifically a 0MQ issue. TCP has a long timeout (30 minutes or so) that means that it's impossible sometimes to know whether a peer has died, been disconnected, or gone on a long weekend to Prague with a case of vodka and two redheads.
Heartbeating seems simple at first but isn't. There are quite a few gotchas, but happily I've already made all the mistakes, so hopefully you don't need to.
Let's start with the use cases, i.e. the reasons //why// you might want to do heartbeating in applications:
* You want to know that peers have died and gone away, so you can clean up resources and keep your applications free of entropy build-up, or as we say in the trade, "why the heck does my process keep consuming more and more memory until the machine crashes?" This affects specific types of 0MQ applications, primarily ROUTER-based brokers that track the identities of clients and servers.
* You want to know the difference between the ominous and golden varieties of silence. When my wife doesn't call me, that's good silence. But when I can't reach her, that's bad silence.
* You want to keep the underlying TCP connections alive. Literally, some networks get bored and disconnect if there's no activity for too long. When you use heartbeating to keep a connection alive, this is more often called "keep-alive".
Roughly, there are two types of heartbeating. There's one-way, and round-trip. You would use one-way heartbeating in a PUB-SUB or PUSH-PULL pattern, to tell downstream nodes that the upstream node is still alive and kicking. It's easy in PUB-SUB, simply send out an empty "ping" message every few seconds if there's no other data being published. It's harder with PUSH-PULL since the PUSH socket needs to know how many PULL sockets there are. A typical workaround is to create a separate PUB-SUB flow just for heartbeating.
Round-trip heartbeating consists of PING-PONG messages (i.e. a PING from one peer, and a PONG back in reply). PING-PONGs can be asynchronous, of course. Why bother with a PONG reply message? Well, the first thing you'll notice when you do naive heartbeating is that when a peer really does stop, and then restarts, it can find itself receiving a whole mass of PINGs from other peers. This is at least annoying, and sometimes worse.
0MQ will in most cases queue PINGs and deliver them when the (or a) peer reconnects. So you should not send more than one or two PINGs without getting something back from the other peer.
- one way
- bidirectional
- pubsub ordering
- N-to-N ordering guarantees
- single-stream using a broker
- collection of patterns
Least Recently Used
Asynchronous Client-Server
Suicidal Snail
Lazy Pirate
Simple Pirate
Paranoid Pirate
Clone
- how to make an error logging console
- how to make a TCP-to-0MQ bridge, with examples
http://lists.zeromq.org/pipermail/zeromq-dev/2011-March/010186.html
- FD_EVENT integration
- router trees
(federation)
- cross-over router
- send to, send from
- flips addresses, gives you reply model
// If socket is a ROUTER, get identity first
int socket_type;
size_t type_size = sizeof (socket_type);
zmq_getsockopt (socket, ZMQ_TYPE, &socket_type, &type_size);
if (socket_type == ZMQ_ROUTER) {
zmq_msg_t mesg;
zmq_msg_init (&mesg);
zmq_recv (socket, &mesg, 0);
}
- explain why callbacks are a pita
- what thread do they run in?
- how do you connect them to main app thread
- how do you wait, do you sleep, call something, etc.?
- integrating 0MQ into TCP loops: http://lists.zeromq.org/pipermail/zeromq-dev/2011-March/010186.html
Then we can use this to start building some reusable pieces:
- timer device
- name service
- custom load-balancing
- custom publish-subscribe
- stateful publish-subscribe
- reliable request-reply
- logging device
+++ Presence Detection
- peers come and go
- if we want to route to them explicitly, we need to know if they are present
- heartbeating, liveness, etc.
- fresh data, failover, etc.
- purging old identities from routing tables
- example of eight robots and console
- robots come and go...
+++ A ZeroMQ Name Service
- name service
translate logical name into connect or bind string
service runs locally, connect via ipc://zns
gets name updates asynchronously from central server
also local zns lookup file
using zpl syntax
pubsub state / refresh example
how to map names?
- XXX -> tcp://lo:5050 if I'm on server 1
- XXX -> tcp://somename:5050
-> does ZMQ do host lookup? Doesn't seem like it...
-> resolve host...
+++ Pipelines
- does PUSH block if there are no PULL sockets ready?
- how do maintain a queue?
+++ File Transfer
example of file transfer
+++ Generating Identities
+++ Setting Queue Limits
- setting queue limits prevents nodes from overflowing memory
- by default 0MQ does not set any limits
- example of application that will overflow memory
- publish to subscribe socket but don't read
- now fix this by setting queue limit, ZMQ_HWM
- actual behaviour with automagic LWM
- how this works on different socket types
- 'exception' on each socket type, from man page
- adding capacity for disk offload, ZMQ_SWAP
- creating persistent queues using identity
- example of HWM 1 plus massive SWAP
+++ Reliable Request-Reply
We'll create a reliable request-reply application that uses XREQ and XREP and a simple resend mechanism. When this works between two peers we'll show how it scales across a request-reply broker to effectively create edge-to-edge reliability. We'll also open up the message format for request-reply and explore identities in horrible detail.
+++ Configuration Distribution
We'll look at how to dynamically configure a network of devices using a central configuration broker.
+++ Logging Subsystem
many applications
many subscribers
broker in the middle
persistent logfiles
replay via subscribe socket
+++ Failover and Recovery
We'll look at how to handle crashes in a distributed architecture. For each pattern there is an ideal architecture, and we'll explore each of these with worked examples.
+++ Encrypted Publish-Subscribe
We'll look at how to secure pubsub data against snooping. The actual technique involves out-of-band exchange of keys, symmetric encryption, and a broker that helps the thing along. Hopefully all fairly easy to make, as usual, using 0MQ.
+++ Building a Multicast Bus
We'll now look at how the pgm: and epgm: protocols work. With pgm:, the network switch basically acts as a hardware FORWARDER device.
++++ Customized Publish-Subscribe
- use identity to route message explicitly to A or B
- not using PUBSUB at all but XREP/????
- limitations: no multicast, only TCP
- how to scale with devices...
When a client activates, it chooses a random port that is not in use and creates a SUB socket listening for all traffic on it. The client then sends a message via REQ to the publisher containing the port that it is listening on. The publisher receives this message, acknowledges it, and creates a new pub socket specific to that client. All published events specific to this client go out that socket.
When the client deactivates, it sends a message to the publisher with the port to deactivate and close.
You end up creating a lot more PUB sockets on your server end and doing all of the filtering at the server. This sounds acceptable to you.
I didn't need to do this to avoid network bandwidth bottlenecks; I created this to enforce some security and entitlements.
+++ A Clock Device
We'll look at various ways of building a timer into a network. A clock device sends out a signal (a message) at more or less precise intervals so that other nodes can use these signals for internal timing.
+++ Serializing Data
Examples of using Protocol Buffers and other options.
[!--
- ipc://name
- connects two processes on a single box
- supports all messaging patterns
- typical use case is for multithreading apps
- runs on Unix domain sockets (not available on Windows, OpenVMS)
- permissions issues:
> Since I want to work under /tmp, this all had to be done programatically. My
> server now mkdir -p's a socket subdirectory and chmod 777's it. The server
> creates and binds the socket in that folder, and then chmod 777's it. The
> server must be run as root (which is fine for my project luckily). If it is
> run as a normal user, the client's still timeout.
- tcp://ipaddress:port
- bind to address:port
- bind to *:5555
- localhost
- also bind to interface: lo:port, eth1:port, etc.
- Linux: eth1, eth2, eth3
- Mac OS X: en1, en2, en3
- Solaris: e1000g, etc.
- connect to remote address: host:port
- pgm://address;multicastgroup:port
- address can be interface name
- requires decent hardware support
- means enterprise level switches with IGNP snooping
- some routers also support PGM
- runs over IP, requires root privileges
- more standard
- rate-limited protocol, sender must define bandwidth
- pgm is currently broken
- epgm://address;multicastgroup:port
- encapsulated in UDP packets
- requires decent hardware support
- does not require root access
- non-standard to pgm
- add peer example
- exclusive lock on peer
- for owning other party
- solve reverse connection
- e.g. to cross firewall
- you need to add a bind to allow the client to accept a connection
- could be usecase for EXCLUSIVE socket
XREQ is like PUSH+PULL, XREP is like PUSH+PULL+routing
--]
* How do we tunnel 0MQ connections over 0MQ connections?
- e.g. to get 3-4 ports into DMZ via one single port
- two devices, one reads, one writes
- over other protocols: HTTP, SSH, etc...?
- acts as split device, ...
I highly recommend that you try out the simpler topology and *verifying* that 0mq cannot keep up with your message rates when publishing all data to all clients. With smart topics the client can reject the data *very* fast. A 1 GB or 10 Gb switched network can also move quite a bit of data without a lot of blocking behavior in the hardware too. You may be able to save yourself a lot of unnecessary work, so just try it.
- explain in
- debugging message flow
- using a queue device that prints out message parts
- debugging versions of devices...
- heartbeating
- set HWM to 1 message only
[!--
[!--
+++ The Wire-Level Protocol
- writing a thin client in JavaScript
+++ Building a Language Binding
+++ Tuning 0MQ
High performance multiple i/o threads
- for HP applications, several threads
- then, set affinity on sockets
+++ Contributing to 0MQ
+++ What's Missing from 0MQ
- handling crashing peers
- non-blocking send/recv
EAGAIN
- reliable pub
- pub connects to all subs
- subs send request to pub
- reliable messaging over xreq / xrep
- how to do timer events
- there is no way to disconnect; second connection creates two
- need to destroy the socket
- how to send multipart job and execute / cancel
Here is, I think, how to build an authenticated pubsub service using ØMQ. The design handles authentication, arbitrary routing criteria, and flushing of dead client connections.
You will use ZMQ_XREQ at the client side and ZMQ_XREP at the service side. There is a simple protocol between client and server that is loosely modelled on HTTP:
* Client sends messages containing: identity (ØMQ adds this automatically), authentication credentials, and subscription criteria.
* Server sends messages containing: identity (automatic), error/success code, and content.
The server has two main data structures:
* A connection table that stores authenticated identities and timestamps. Each identity corresponds to a client connection.
* A routing table that maps message keys (e.g. topic value) to identities. There are techniques for doing [http://www.zeromq.org/whitepapers:message-matching high speed message matching].
0MQ Quickstarter
- building & installing
- performance for language
- basic examples
- socket types
- transports
- main patterns
- problem solving
-> translated into different programming languages