#opendaylight-meeting Meeting

Meeting started by colindixon at 16:07:05 UTC (full logs).

Meeting summary

  1. picking up with APIC clustering (colindixon, 16:07:19)
  2. sharding (colindixon, 16:07:26)
    1. review from last time: (1) the core tree is broken into subtrees at "context roots", (2) subtrees under context roots are placed into shards, (3) shards are replicated 3 times, with 1 leader and 2 followers, (4) this layout is static and not adjusted except by modifying models; the assumption is that failures are temporary and nodes will come back (colindixon, 16:10:50)
    2. shard layout algorithm does its best to even out load among nodes (colindixon, 16:11:16)
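
A minimal sketch of what that static layout could look like; the class, the record, and the round-robin placement are hypothetical, not from the APIC code (Java, assuming at least 3 nodes):

      import java.util.*;

      // Hypothetical static subtree-to-shard placement: each context root owns
      // one shard, and each shard gets 3 replicas (1 leader + 2 followers),
      // rotated round-robin so leadership is evened out across nodes.
      final class StaticShardLayout {
          record Replicas(String leader, List<String> followers) {}

          private final Map<String, Replicas> layout = new HashMap<>();

          StaticShardLayout(List<String> contextRoots, List<String> nodes) {
              int next = 0;
              for (String root : contextRoots) {
                  String leader = nodes.get(next % nodes.size());
                  List<String> followers = List.of(
                          nodes.get((next + 1) % nodes.size()),
                          nodes.get((next + 2) % nodes.size()));
                  layout.put(root, new Replicas(leader, followers));
                  next++;  // rotate the starting node to spread load
              }
          }

          // The layout is static: lookups only read it, nothing rebalances it.
          Replicas replicasFor(String contextRoot) {
              return layout.get(contextRoot);
          }
      }

Because the map is built once, a node failure changes which replica serves requests (see item 3.5 below) but never where a shard lives.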

  3. co-locating apps and data (colindixon, 16:11:52)
    1. again, review from last time: there is an attempt to locate a service's compute close to its data (colindixon, 16:12:54)
    2. right now, trying not to dictate, but the best practice is that apps should shard their compute in the same manner that they shard their data (colindixon, 16:14:22)
    3. regardless of what happens, there is an actor/stub process co-located with each shard which proxies remote access if nothing else (colindixon, 16:17:35)
    4. ken watsen asks what happens when you write: does it get routed to the right node? (colindixon, 16:20:31)
    5. answer is yes, it's routed to the static leader; if the leader is down, it goes to the next follower (colindixon, 16:22:47)
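
A hypothetical sketch of that routing rule (names are illustrative): writes always target the static leader, and once a node is marked down, all subsequent traffic falls through to the next live replica in order:

      import java.util.*;

      // Hypothetical per-shard write router: the replica list is ordered
      // leader-first, and the first replica not marked down handles the write.
      final class WriteRouter {
          private final List<String> replicaOrder;          // leader, then followers
          private final Set<String> markedDown = new HashSet<>();

          WriteRouter(List<String> replicaOrder) {
              this.replicaOrder = replicaOrder;
          }

          void markDown(String node) {
              markedDown.add(node);  // recorded once; later writes skip this node
          }

          String routeWrite() {
              for (String node : replicaOrder) {
                  if (!markedDown.contains(node)) {
                      return node;  // first live replica acts as temporary leader
                  }
              }
              throw new IllegalStateException("no live replica for this shard");
          }
      }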

  4. pub/sub on top of the data store (colindixon, 16:25:37)
    1. icbts asks: interesting setup for leader/follower failover… so would apps follow a failover protocol similar to https://activemq.apache.org/failover-transport-reference.html? (colindixon, 16:27:07)
    2. colindixon responds: I think that it works by (1) going to the leader all of the time unless he's marked as down, (2) if he's marked as down, this is recorded and all subsequent reads/writes go to the next follower (who is temporarily the new leader) (colindixon, 16:27:27)
    3. w.r.t. notifications, mike dvorkin points out that it's important to make notifications idempotent so that you can avoid certain issues, e.g., so that you can safely jump states in your state machine (colindixon, 16:28:20)
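
A small, hypothetical illustration of that point: if each notification carries a version and the full target state, duplicates are no-ops and a consumer can jump straight to the latest state even when intermediate notifications are missed:

      // Hypothetical idempotent notification consumer: state only moves forward,
      // so re-delivered or reordered notifications are safe to apply.
      final class PortStateTracker {
          record PortNotification(long version, String state) {}  // e.g. "UP", "DOWN"

          private long lastVersion = -1;
          private String state = "UNKNOWN";

          synchronized void onNotification(PortNotification n) {
              if (n.version() <= lastVersion) {
                  return;  // duplicate or stale: applying it again changes nothing
              }
              lastVersion = n.version();  // jump states if versions were skipped
              state = n.state();
          }

          synchronized String currentState() {
              return state;
          }
      }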

  5. shard structure (colindixon, 16:31:53)
    1. each shard has a "doer" that applies changes to an in-memory cache (see the sketch after this list) (colindixon, 16:32:14)
    2. it also has a "persister" which pushes the in-memory cache into some persistent DB (colindixon, 16:32:34)
    3. a replicator pushes changes to replicas and stores what happens into the commit log (which is also persisted) (colindixon, 16:33:10)
    4. on the side, there are a set of "dictionaries" which act as DB indices to speed up access (colindixon, 16:34:24)
    5. http://hsqldb.org + JPA/JTA - fast in-memory relational SQL, then add transactional persistence if required. Reasonably easy to install on an OSGi core. (icbts, 16:34:56)
    6. (there is a slide on how write transactions work in leaders and another on replication, which we skim over to review later in detail with Raghu and interested parties, possibly at an MD-SAL offsite) (colindixon, 16:35:38)
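
A hypothetical skeleton tying together the pieces from this item (doer, persister, replicator, dictionaries); every name is illustrative, and the commit log, follower transport, and persistent DB are stubbed out:

      import java.util.*;
      import java.util.function.Consumer;

      final class Shard {
          private final Map<String, String> cache = new HashMap<>();   // in-memory tree, flattened
          private final List<String> commitLog = new ArrayList<>();    // stands in for a persisted log
          private final Consumer<String> replicator;                   // pushes changes to followers
          private final Map<String, List<String>> dictionary = new HashMap<>();  // index by value

          Shard(Consumer<String> replicator) {
              this.replicator = replicator;
          }

          // The "doer": applies a change to the in-memory cache and indices.
          void apply(String path, String value) {
              cache.put(path, value);
              dictionary.computeIfAbsent(value, v -> new ArrayList<>()).add(path);
              String change = path + "=" + value;
              commitLog.add(change);        // the commit log is also persisted
              replicator.accept(change);    // the replicator fans out to followers
          }

          // The "persister": would push the cache into a persistent DB
          // (e.g. HSQLDB, per the suggestion above); stubbed here.
          void persist() {}

          // A "dictionary" acting as an index to speed up value lookups.
          List<String> lookupByValue(String value) {
              return dictionary.getOrDefault(value, List.of());
          }
      }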

  6. comparison to other DBs (colindixon, 16:35:45)
    1. key differences seem to be that (1) it works with trees and (2) it tries to be simple and static so that people don't have to be replicated-DB admins (colindixon, 16:37:40)
    2. the part about trees relies on the fact that a whole subtree always falls within a shard, so there's low overhead to transactions, subscriptions, etc. at this subtree level (colindixon, 16:39:08)
    3. no locking because there is only one writer for a shard, and you just serialize the writes (colindixon, 16:42:31)
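
A minimal illustration of that single-writer rule: funnel every write for a shard through one single-threaded executor, so writes execute in submission order with no locks (the executor choice is an assumption, not from the talk):

      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;

      // Hypothetical per-shard writer: one thread per shard serializes writes.
      final class ShardWriter {
          private final ExecutorService writer = Executors.newSingleThreadExecutor();

          Future<?> submitWrite(Runnable write) {
              return writer.submit(write);  // one write at a time, in order
          }

          void shutdown() {
              writer.shutdown();
          }
      }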

  7. general Q&A (colindixon, 16:42:40)
    1. colindixon says the next step would be to figure out how our data models work and whether they are amenable to this kind of subtree-based sharding (colindixon, 16:43:07)
    2. jan says that so far, yes, but that's just the current topology and inventory models (colindixon, 16:43:36)
    3. colindixon asks about how the cluster management is actually done, but this is mostly tabled for future discussion (colindixon, 16:50:51)

  8. going forward (colindixon, 16:51:05)
    1. Jan proposes that we create some kind of working group around starting to get implementations for the MD-SAL datastore (colindixon, 16:51:55)
    2. colindixon asks when/if we can expect the APIC cluster/datastore code to be open sourced (colindixon, 16:53:46)
    3. jan/mike say that it will not be; further, it's all written in C++, so it would be less useful anyway; this presentation is more for design ideas (colindixon, 16:54:25)
    4. it seems as though the "right" approach might be to use Akka to do cluster membership and node up/down/unknown state tracking, and then use the current in-memory MD-SAL DOM data store to store each shard (see the sketch after this list) (colindixon, 16:58:46)
    5. we will pick this up with mlemay next week (people are heads-down on stable release work now); however, we really need to make sure that this gets into Helium (colindixon, 17:00:21)
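
A sketch of what the Akka half of item 4 above could look like, using Akka's classic cluster-membership events; the listener class and the markUp/markDown hooks are hypothetical, and wiring them into per-shard routing is left out:

      import akka.actor.AbstractActor;
      import akka.cluster.Cluster;
      import akka.cluster.ClusterEvent;

      // Hypothetical membership listener: Akka tracks node up/unreachable/removed
      // state, and each event updates our view of which replicas are live.
      public class MembershipListener extends AbstractActor {
          private final Cluster cluster = Cluster.get(getContext().getSystem());

          @Override
          public void preStart() {
              cluster.subscribe(getSelf(), ClusterEvent.initialStateAsEvents(),
                      ClusterEvent.MemberUp.class,
                      ClusterEvent.UnreachableMember.class,
                      ClusterEvent.MemberRemoved.class);
          }

          @Override
          public void postStop() {
              cluster.unsubscribe(getSelf());
          }

          @Override
          public Receive createReceive() {
              return receiveBuilder()
                      .match(ClusterEvent.MemberUp.class,
                              e -> markUp(e.member().address().toString()))
                      .match(ClusterEvent.UnreachableMember.class,
                              e -> markDown(e.member().address().toString()))
                      .match(ClusterEvent.MemberRemoved.class,
                              e -> markDown(e.member().address().toString()))
                      .build();
          }

          private void markUp(String node)   { /* mark replica live for routing */ }
          private void markDown(String node) { /* route around this replica */ }
      }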


Meeting ended at 17:00:27 UTC (full logs).

Action items

  1. (none)


People present (lines said)

  1. colindixon (37)
  2. icbts (2)
  3. odl_meetbot (2)


Generated by MeetBot 0.1.4.