15:08:52 #startmeeting neutron_northbound
15:08:52 Meeting started Mon Nov 13 15:08:52 2017 UTC. The chair is yamahata. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:08:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:08:52 The meeting name has been set to 'neutron_northbound'
15:08:59 #chair mkolesni rajivk_ mkolesni
15:08:59 Current chairs: mkolesni rajivk_ yamahata
15:09:03 #chair mpeterson
15:09:03 Current chairs: mkolesni mpeterson rajivk_ yamahata
15:09:10 #topic agenda bashing and roll call
15:09:14 #info mkolesni
15:09:16 #info yamahata
15:09:17 #info rajivk
15:09:30 #info mpeterson
15:09:33 today any additional topics?
15:09:44 there's a ton of stuff on the stable/pike branch waiting for review
15:09:50 most of it +1 by us
15:09:55 yes, I have two topics
15:10:08 how would we approach this wrt stable maint team?
15:10:11 #info ODL cleanup between tests
15:10:33 #info Neutron/Infra/etc updates during their meetings
15:11:04 this week we have three extra topics
15:11:14 any other topics?
15:11:42 no
15:12:07 none from me
15:12:16 none from me
15:12:19 okay, move on
15:12:21 #topic Announcements
15:12:36 Last week there was the openstack summit, including the forum.
15:12:50 #link https://wiki.openstack.org/wiki/Forum/Sydney2017 openstack forum
15:12:56 #link https://etherpad.openstack.org/p/Neutron-pain-points-Sydney
15:13:18 We'd like to check what was discussed there.
15:13:51 #link https://www.openstack.org/ptg/
15:14:06 openstack PTG is planned for the week of Feb 26, 2018 in Dublin.
15:14:15 For now the details aren't announced yet.
15:14:16 i'm still waiting on an answer from you guys :)
15:14:54 #info openstack PTG is planned for the week of Feb 26, 2018 in Dublin. No more details at the moment.
15:14:59 Without a concrete plan, it's difficult for me to get travel budget.
15:15:04 if you're going to attend or not
15:15:19 tell them the rest of the n-odl team will be there :)
15:15:22 I need to wait for more details to be announced.
15:16:16 any other announcement?
15:17:07 seems nothing. let's move on
15:17:11 #topic action items from last meeting
15:17:27 There is only timeslot and patch review stuff.
15:17:41 okay, now let's move on to the proposed topics
15:17:48 #topic approach for stable branch
15:18:04 mkolesni: please go ahead.
15:18:29 basically they don't monitor backport patches. we have to add them as reviewers explicitly.
15:18:34 yeah i just said there's a few things we backported there
15:18:36 ok
15:18:46 i thought they have some weekly review or something?
15:19:02 we need to explicitly add the individuals to the patches?
15:19:27 the latter.
15:19:45 We need to add the maintenance team explicitly to the patch review.
15:19:45 well that's unfortunate..
15:20:00 They have many patches to review, they don't actively poll our patches.
15:20:09 ok will add them
15:20:22 Once they are added, usually they will review within several days.
15:20:34 i think maybe we can raise this on neutron level
15:20:40 yamahata: mmm and who would they be? I usually see they are added, however the patches just linger without +2 +W
15:20:48 like why don't we moderate stable ourselves?
15:21:19 For now that's the neutron team consensus.
15:21:34 So far do you have any issues?
15:21:44 except patch review without adding reviewers?
15:22:11 #link https://review.openstack.org/#/admin/groups/539,members neutron stable maint team
15:22:36 i'm not sure if it's even relevant to have them moderating it
15:22:38 #info Regarding stable patches: in order for them to be reviewed, the neutron stable maint team has to be manually added to the patch. Link above.
15:23:23 maybe we can have a "stable liaison" that will moderate so that they do fewer checks on each patch
15:23:59 also probably some of the checks can be done by CI, i.e. don't backport db migration scripts, that stuff
15:25:19 If so, we need to raise it to change the neutron stadium governance.
15:26:11 So far do we have an issue big enough to change the neutron governance?
15:26:20 no
15:27:31 If we have issues getting backport patches reviewed even after adding them as reviewers, let's discuss it again.
15:28:00 Is it okay for you? mkolesni
15:28:27 ok
15:28:46 thanks. then next topic
15:28:50 #topic ODL cleanup between tests
15:28:56 mpeterson: you're on stage
15:29:00 cool
15:29:04 so the thing is the following
15:29:09 sorry have to go
15:29:12 see you guys next week
15:29:25 neutron has a guideline of not cleaning up after tests and instead it deletes the db and creates it again, right?
15:29:26 mkolesni: see you next week. It's 17:00UTC.
15:29:58 well, because of that for example bgpvpn doesn't clean up on failures
15:30:20 You mean test=unit tests.
15:30:22 which means, since we are interacting with ODL, that ODL ends up in a dirty state
15:30:31 unit and functional are affected
15:30:46 #info https://bugs.launchpad.net/bgpvpn/+bug/1723725 reference on the issue
15:30:54 #link https://bugs.launchpad.net/bgpvpn/+bug/1723725 reference on the issue
15:31:06 Usually yes because it causes non-determinism depending on the order of running test cases
15:31:49 so basically we need to trigger a cleanup of ODL between tests
15:32:31 in a similar way as neutron does it by dropping the DB
15:32:40 I see. So that's the reason why we're seeing intermittent errors with bgpvpn.
15:32:58 yamahata: very possibly
15:33:48 #info Because of a neutron design decision, tests don't clean up after themselves; the DB is dropped and recreated instead. As a result ODL ends up in a dirty state. We need to clean up ODL explicitly between tests.
15:34:27 #action mpeterson to create a task or bug to clean up ODL between tests
15:34:51 yamahata: could also be part of the reason why tempest has intermittent errors too
15:35:11 mpeterson: that sounds very plausible.
15:35:25 now that we have grafana you can see that there is around a 50% failure ratio
15:35:52 * yamahata opening grafana page
15:36:00 and I've found this problem by chance :)
15:36:39 I'm very glad to see grafana back.
15:37:23 great finding.
15:37:25 just a clarification: no datapoints means there were no failures or there were no executions
15:38:23 that's the conclusion of this topic, if you want to continue
15:38:34 yamahata: ^^
15:38:56 anything else to add?
15:39:40 yamahata: nope, I think it's pretty self-explanatory, unless someone has questions
15:39:48 okay, next topic
15:39:59 #topic Neutron/Infra/etc updates during their meetings
15:40:07 my stage again
15:40:09 mpeterson: you're still on stage. :-)
15:40:40 basically there are updates happening in neutron and infra IRC meetings that we have no idea about...
15:40:57 e.g.: the effort to freely receive patches of the neutron-lib rehoming
15:41:16 e.g.: incompatible changes to Zuul v3 that will be introduced
15:42:29 I've read for example, regarding the first one, that there is a ML thread where they recommend that a representative of each team attend their meetings
15:42:37 we should consider this, right?
15:43:03 Basically neutron stuff is discussed at the neutron meeting
15:43:13 http://eavesdrop.openstack.org/#Neutron_Team_Meeting
15:44:07 Also there are additional specific neutron meetings.
15:44:21 e.g. drivers/CI/L3/QoS/upgrades.
15:44:35 I think neutron-lib is discussed at the neutron team meeting.
15:45:08 For zuul I suppose it's discussed at the zuul meeting, but I'm not very sure about this.
15:45:22 #link http://eavesdrop.openstack.org/#Zuul_Meeting
15:45:50 yamahata: okay, but do we have a vested interest to participate and can we?
15:46:01 regarding neutron, we should attend it.
15:46:38 #info there are updates that only happen on IRC that could have an impact on this project. Unless a representative of this project participates, we might find ourselves in a bad position.
15:47:13 Personally I sometimes attend the neutron meeting. But not very regularly recently.
15:47:54 okay, personally I can't attend in their timeslot :/ (I don't work all days of the week)
15:48:32 #link https://wiki.openstack.org/wiki/Network/Meetings neutron meeting agenda
15:48:40 We can see neutron-lib updates
15:48:46 #action to decide if we should and who should participate in the different meetings
15:48:59 okay, that's it for the topic
15:49:09 Timeslot is rotated biweekly.
15:49:44 yes, I can't in either
15:49:50 I see.
15:50:00 I need to check the time, maybe i can
15:50:36 the action has been created, we can follow up next week on this perhaps and continue since we only have 10 minutes left?
15:50:46 At worst we can check meeting minutes/logs.
15:51:12 Sure.
15:51:21 great
15:51:25 what's the next topic?
15:51:37 Okay, now we can move on to the usual patches/bugs
15:51:43 #topic patches/bugs
15:52:15 I would like to discuss https://review.openstack.org/#/c/516857/
15:53:08 how should we proceed on this one?
15:53:42 Should we target the scenarios which lead us to this situation?
15:54:16 yamahata, what do you mean by covering systematically?
15:54:58 My guess is that it's due to a bug, maybe in journaling.
15:55:05 So it's a bug workaround.
15:55:25 If we get an HTTP error from ODL, there are several possibilities.
15:55:36 rajivk_: I'm reluctant to accept this patch, as currently there is a bug which we haven't found which is causing those 404s in several situations. This is a workaround that would complicate things.
15:55:50 i.e. operation=create/update/delete, and http error=404, etc.
15:56:17 If it makes sense, we should address the reasonable combinations.
15:56:20 yamahata, rajivk_: currently at redhat we are running a scale test and the 404s and Read Timeouts to REST are all over the place
15:56:33 But it's arguable if we should address it or not.
15:56:47 yamahata, rajivk_: so it seems there is an underlying cause
15:57:04 yes, it is a temporary fix but it will cover the original issue.
15:57:05 Do you observe other error cases?
15:57:26 yamahata, rajivk_: we haven't identified the root cause yet
15:57:36 yamahata, no, i saw this one only. mpeterson, what about you?
15:58:07 mpeterson, yamahata, i saw them in the logs of gate jobs
15:58:18 just to give a sense of the magnitude... we have seen cases where we get more than 3000 Read Timeouts per minute
15:58:19 i did not encounter them on my env
15:58:49 read timeout from ODL?
15:58:53 yes
15:58:58 from the REST interface
15:59:22 does odl get stuck?
15:59:26 with 10 sec timeout?
15:59:53 yes
16:00:07 With Openstack CI, I increased the timeout from the default 10 sec to 60 sec in the past.
16:00:18 yamahata: but that's not the solution
16:00:34 yamahata: there is an underlying cause that needs to be found
16:00:38 mpeterson: right.
16:01:07 yamahata: we are working our way through the logs, if we find anything you'll be updated on it
16:01:17 Even with a single ODL deployment, ODL MD-SAL transactions sometimes abort and are retried internally.
16:01:38 hey, i noticed some failures on the odl side
16:01:44 yamahata: correct
16:01:59 it was something like optimistic locking. and it is continuous
16:02:10 yamahata: in these tests it's even more complex because there are 3 controllers and it includes HA
16:02:26 You can see it in the ODL log. something like Got OptimisticLockFailedException
16:02:34 rajivk_: yes, that's what mkolesni found today and he'll discuss with Josh
16:02:38 Yeah. with ODL HA, things are more complex.
16:02:57 With that I don't know what timeout is appropriate.
16:02:59 rajivk_: good to see you found the same, it could be a solid clue
16:03:39 the default value of 10 sec was just picked arbitrarily. It's not based on measurement.
16:03:58 yamahata: anyways 10 sec is a huge timeout frame... it should be way way smaller
16:04:04 yamahata, this is how long our rest client waits for ODL to respond?
16:04:25 rajivk_: correct
16:04:33 10s is too much
16:04:38 rajivk_: yes.
16:05:00 I heard ODL operations are async
16:05:23 there is something wrong with odl. mpeterson, are you seeing these logs with all releases?
16:05:31 or carbon or neutron specific?
16:05:37 neutron -> newton
16:06:17 rajivk_: this is in carbon IIRC
16:07:32 i saw those error logs in latest, AFAIK
16:07:51 In my past experience, a 10 sec timeout caused errors and with 60 sec, tests became much more stable.
16:08:10 Maybe we can experiment by decreasing the timeout to 10 sec again.
16:09:27 Now we're over by 9 mins.
16:09:34 I want to know, what happens if we disable a router: does connectivity between different subnets stay or is it lost?
16:09:38 Do we have any other urgent patches/bugs?
16:09:58 yamahata, mpeterson ^^^
16:10:18 yamahata: https://review.openstack.org/#/c/519384/1
16:10:56 Wow. why didn't we notice it...
16:11:23 yamahata: because there were no UT
16:12:09 any other patches?
16:12:29 yamahata: not for now
16:12:34 I think the recovery patch is close to merging.
16:13:06 https://review.openstack.org/#/c/500366/
16:13:08 yamahata: yes, but I've posted a big comment section last time that rajivk_ hasn't addressed yet :)
16:13:29 Okay.
16:13:30 mpeterson, now i want to introduce changes by small patches
16:13:34 yamahata, rajivk_: nothing too big though
16:13:50 mpeterson, ok then
16:13:56 it is the db one right?
16:14:14 i will finish it tomorrow
16:14:53 yamahata: another thing before you close the meeting... mkolesni just called me and says we can leave the meeting at the time it was today
16:15:25 you mean 15:00UTC?
16:15:48 yamahata: if we started 15 UTC today, then yes
16:15:51 how about rajivk_, mpeterson ? 15:00UTC works for you two?
16:16:02 mpeterson: it's preferable for me
16:16:09 Right, today we started at 15:00UTC.
16:16:10 i am ok
16:16:18 Okay, then let's continue at 15:00UTC
16:16:25 #agree we will continue these meetings at 15:00 UTC
16:16:33 #action yamahata update timeslot to 15:00UTC on wiki.
16:16:48 anything else?
16:16:54 #topic open mike
16:17:13 okay, thank you everyone.
16:17:20 thanks!
16:17:22 see you next week
16:17:24 take care
16:17:40 #topic cookies
16:17:44 #endmeeting
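
Note on the "ODL cleanup between tests" topic (15:28 to 15:35): the idea of wiping the external controller after every test, in the same spirit as neutron recreating its DB, could look roughly like the sketch below. Everything in it is an assumption made for illustration: `FakeBackend`, `BACKEND`, `BackendTestCase` and `run_suite` are hypothetical names, and the in-memory backend only stands in for ODL. The log does not say how a real ODL reset would be performed, so no ODL calls are shown.

```python
import unittest


class FakeBackend:
    """Hypothetical in-memory stand-in for an external controller such as ODL."""

    def __init__(self):
        self.resources = {}

    def create(self, kind, name):
        key = (kind, name)
        if key in self.resources:
            raise ValueError("already exists: %s/%s" % key)
        self.resources[key] = {"kind": kind, "name": name}

    def reset(self):
        """Drop all state, analogous to recreating the test database."""
        self.resources.clear()


# One backend shared by every test, like a single controller shared by a run.
BACKEND = FakeBackend()


class BackendTestCase(unittest.TestCase):
    """Base class that wipes the shared backend after every test."""

    def setUp(self):
        super().setUp()
        # addCleanup callbacks run even when the test body fails or errors,
        # so a broken test cannot leave state behind for the next one.
        self.addCleanup(BACKEND.reset)


class TestNetworks(BackendTestCase):
    def test_leaves_resource_behind(self):
        # This test never deletes what it creates.
        BACKEND.create("network", "net1")
        self.assertIn(("network", "net1"), BACKEND.resources)

    def test_same_name_can_be_created_again(self):
        # Would raise ValueError if another test had left "net1" behind.
        BACKEND.create("network", "net1")
        self.assertEqual(len(BACKEND.resources), 1)


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestNetworks)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs both tests; they pass in either order because the reset is registered in the base class rather than left to each test. Registering it with `addCleanup` instead of placing it at the end of a test body is what covers the "doesn't clean up on failures" case raised in the discussion.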