============================================
#opendaylight-clustering: clustering_hackers
============================================

Meeting started by moizer_ at 15:59:08 UTC. The full logs are available
at
http://meetings.opendaylight.org/opendaylight-clustering/2015/clustering_hackers/opendaylight-clustering-clustering_hackers.2015-02-10-15.59.log.html
.

Meeting summary
---------------

* Gary Wu presenting information on Unified Secure Channel (moizer_, 16:06:55)
* wants to support call home like netconf call home (moizer_, 16:07:26)
* device needs to make an inbound call to controller (moizer_, 16:07:39)
* device creates a call home connection (moizer_, 16:08:23)
* this allows controller to talk to device (moizer_, 16:08:57)
* assumptions that any node in cluster should be able to respond to a request instead of bouncing it around (moizer_, 16:11:11)
* important that rpc request needs to be routed to the node with the connection (moizer_, 16:11:58)
* Other considerations: scalability; should call home devices be “multi-homed” to multiple controller nodes (tbachman, 16:12:29)
* moizer_ asks gwu if the idea is that the request to controller be bounced; is that so you don’t get a redirect?
  (tbachman, 16:13:08)
* gwu says yes (tbachman, 16:13:12)
* moizer_ says that the routed RPC mechanism should support this (tbachman, 16:14:18)
* uchau asks, in the clustering model, what happens to an OF switch when that node goes down; needs a device ownership model so that the device can work with another node in the controller (tbachman, 16:15:48)
* gwu says when a node goes down, the device needs to reconnect with one of the other nodes (tbachman, 16:16:07)
* uchau asks if USC was going to have openflow also go through the secure channel (tbachman, 16:16:26)
* gwu says yes (tbachman, 16:16:28)
* uchau is interested in developing a device ownership concept, which helps provide failover direction (tbachman, 16:16:58)
* uchau says in this case, if OF connects directly or through the secure channel, the ownership model is the same (tbachman, 16:17:32)
* gwu asks how openflow deals with multihoming/mastership (tbachman, 16:17:43)
* uchau says the openflow team is implementing a message that allows the controller to assert the role (tbachman, 16:17:58)
* uchau says that it can look at the device ownership when a device connects, and assert the role (tbachman, 16:18:16)
* Helen says that clustering already has a supernode concept; asks if this is related (tbachman, 16:19:33)
* moizer_ says for data, there is a concept of leaders and followers, but that does not mean you can go to another node to access inventory (tbachman, 16:20:53)
* Helen asks whether, w/o a load balancer, it is possible for clustering to solve this problem (tbachman, 16:26:00)
* moizer_ recommends using virtual IPs for the controller (tbachman, 16:26:18)
* uchau says one option is to have the device connect to all the controllers in a team, which is similar to the openflow model (tbachman, 16:27:13)
* moizer_ says one problem with using a virtual IP and load balancing is how to do keep-alives (tbachman, 16:30:28)
* gwu asks what the scalability is of that model; how many connections can a node handle
  (tbachman, 16:30:58)
* uchau says that jmedved was maybe targeting 5k, but wasn’t sure whether that was per-node or per-cluster (tbachman, 16:31:51)
* Helen says that their requirement is for 1 million devices (tbachman, 16:32:04)
* moizer_ says with clustering, we can only store what we can fit into memory (i.e. storage can’t exceed the amount of memory available) (tbachman, 16:33:27)
* moizer_ says that’s a lot of operational data (tbachman, 16:33:31)
* Helen says all the other data is stateless (tbachman, 16:33:42)
* moizer_ says 1 million devices, and suspects that’s a lot of data in memory (tbachman, 16:34:02)
* Fabiel Zuniga says that the persistence service may be able to help here (tbachman, 16:34:49)
* markmozolewski says devices could maintain 1 Master / 1-2 Slave (backup) connections and establish new slave connections as failover occurs (vs. maintaining connections to all slaves), for cluster sizes >> 3.
  (tbachman, 16:35:04)
* moizer_ recommends connecting a bunch of devices and seeing how things perform (tbachman, 16:36:09)
* uchau asks if Helen wants the controller to support the load balancing, or to use external load balancers (tbachman, 16:37:32)
* uchau guesses that the 1 million nodes is to be supported by the cluster, not by a single node in the cluster (tbachman, 16:37:57)
* moizer_ says with 64 switches in openflow, it takes about 4-1/2 MB in the data store (tbachman, 16:39:18)
* I need to talk about bugs/patches for 10 mins (moizer_, 16:39:38)
* catohornet asks about timeouts in the cluster; sees an issue with many nodes, and where they’re configured topologically (tbachman, 16:40:08)
* moizer_ says you don’t need to have every node fully replicated; as an example, with routing logic and 5 cluster nodes, you might choose to do replication on only 3 of the nodes (tbachman, 16:40:39)
* gwu asks if the proposal is workable (tbachman, 16:42:16)
* moizer_ says yes (tbachman, 16:42:18)
* gwu was thinking of presenting statistics to the MD-SAL (e.g. bytes transferred); asks about this (e.g. effects on data store as things scale) (tbachman, 16:42:50)
* moizer_ says if the stats collection interval isn’t too low, then it should be okay (e.g. no client will be reading stats every 3 seconds) (tbachman, 16:43:26)

Meeting ended at 17:58:15 UTC.

People present (lines said)
---------------------------

* tbachman (54)
* moizer_ (13)
* odl_meetbot (3)
* markmozolewski (3)

Generated by `MeetBot`_ 0.1.4