16:07:05 #startmeeting
16:07:05 Meeting started Tue Apr 15 16:07:05 2014 UTC. The chair is colindixon. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:07:05 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:07:19 #topic picking up with APIC clustering
16:07:26 #topic sharding
16:10:50 #info review from last time: (1) the core tree is broken into subtrees at "context roots", (2) subtrees under context roots are placed into shards, (3) shards are replicated 3 times with 1 leader and 2 followers, (4) this layout is static and not adjusted except by modifying models; the assumption is that failures are temporary and nodes will come back
16:11:16 #info shard layout algorithm does its best to even out load among nodes
16:11:52 #topic co-locating apps and data
16:12:54 #info again, review from last time: there is an attempt to locate a service's compute close to its data
16:14:22 #info right now, trying not to dictate, but the best practice is that apps should shard their compute in the same manner that they shard their data
16:17:35 #info regardless of what happens, there is an actor/stub process co-located with each shard which proxies remote access if nothing else
16:20:31 #info ken watsen asks what happens when you write, does it get routed to the right node
16:22:47 #info answer is yes, it's routed to the static leader and if they're down, then it goes to the next follower
16:25:37 #topic pub/sub on top of the data store
16:27:07 #info icbts asks: interesting setup for lead/follower failover… so would apps follow a failover protocol similar to https://activemq.apache.org/failover-transport-reference.html?
16:27:27 #info colindixon responds: I think that it works by (1) going to the leader all of the time unless he's marked as down, (2) if he's marked as down, this is recorded and all subsequent reads/writes go to the next follower (who is temporarily the new leader)
16:28:20 #info w.r.t. notifications, mike dvorkin points out that it's important to make notifications idempotent so that you can avoid certain issues, e.g., so that you can jump states in your state machine
16:28:35 colindixon: that’s very much like the AMQ failover protocol: try broker1, then instances 2, 3, .. N
16:29:21 icbts: that's good to know
16:31:53 #topic shard structure
16:32:14 #info each shard has a "doer" that applies changes to an in-memory cache
16:32:34 #info it also has a "persister" which pushes the in-memory cache into some persistent DB
16:33:10 #info a replicator pushes changes to replicas and stores what happens into the commit log (which is also persisted)
16:34:24 #info on the side, there are a set of "dictionaries" which act as DB indices to speed up access
16:34:56 http://hsqldb.org + JPA/JTA - fast in-memory relational SQL, then add transactional persistence if required. Reasonably easy to install on OSGi core.
16:35:38 #info (there is a slide on how write transactions work in leaders and another on replication, which we skim over to review later in detail with Raghu and interested parties, possibly at an MD-SAL offsite)
16:35:45 #topic comparison to other DBs
16:37:40 #info key differences seem to be (1) it works with trees and (2) it tries to be simple and static so that people don't have to be replicated DB admins
16:39:08 #info the part about trees relies on the fact that a whole subtree always falls within a shard, so there's low overhead to transactions, subscriptions, etc. at this subtree level
16:42:31 #info no locking because there is only one writer for a shard, and you just serialize the writes
16:42:40 #topic general Q&A
16:43:07 #info colindixon says the next step would be to figure out how our data models work and whether they are amenable to this kind of subtree-based sharding
16:43:36 #info jan says that so far, yes, but that's only the current topology and inventory
16:50:51 #info colindixon asks about how the cluster management is actually done, but this is mostly tabled for future discussion
16:51:05 #topic going forward
16:51:55 #info Jan proposes that we create some kind of working group around starting to get implementations for the MD-SAL datastore
16:53:46 #info colindixon asks when/if we can expect the APIC cluster/datastore code to be open sourced
16:54:25 #info jan/mike say that it will not be; further, it's all written in C++, so this would be less useful; this presentation is more for design ideas
16:58:46 #info it seems as though the "right" approach might be to use Akka to do cluster membership and node up/down/unknown state tracking and then use the current in-memory MD-SAL DOM data store to store each shard
17:00:21 #info we will pick this up with mlemay next week (people are heads-down on stable release stuff now); however, we really need to make sure that this gets into Helium
17:00:27 #endmeeting
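
Editor's sketch (not part of the meeting log): the write routing described at 16:22:47 and 16:27:27 could look roughly like the Python below. It assumes a static, ordered replica list per shard, with writes going to the first replica not marked down. The names `Shard`, `mark_down`, `mark_up`, and `acting_leader` are hypothetical and do not come from APIC or MD-SAL. Returning to the static leader once it is marked up again is also an assumption, inferred from the follower being described as "temporarily the new leader".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical shard with a static replica order: leader first, then followers."""

    name: str
    replicas: list[str]                       # static layout, e.g. 1 leader + 2 followers
    down: set[str] = field(default_factory=set)
    commit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def mark_down(self, node: str) -> None:
        # Record that a node is unavailable; later requests skip it.
        self.down.add(node)

    def mark_up(self, node: str) -> None:
        # Assumption: a recovered node resumes its static position.
        self.down.discard(node)

    def acting_leader(self) -> str:
        # First replica in the static order that is not marked down.
        for node in self.replicas:
            if node not in self.down:
                return node
        raise RuntimeError(f"no replica available for shard {self.name!r}")

    def write(self, key: str, value: str) -> str:
        # Single writer per shard: writes are simply appended in order,
        # so no locking is needed in this sketch.
        node = self.acting_leader()
        self.commit_log.append((node, key, value))
        return node


if __name__ == "__main__":
    shard = Shard("inventory", ["node1", "node2", "node3"])
    print(shard.write("switch-1", "up"))      # handled by the static leader
    shard.mark_down("node1")
    print(shard.write("switch-2", "up"))      # handled by the next follower
```

The sketch leaves out replication to followers and persistence of the commit log; it only illustrates how a request finds the node that should serialize the write.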