Deploying a stable/mitaka OpenStack with tripleo-docs (and grep, git-blame and git-log).

This post is about how I was able to mostly successfully follow the tripleo-docs, to deploy a stable/mitaka 3-control 1-compute development (virt) setup so I can ultimately test upgrading this to Newton.

I wasn’t sure there was something worth writing here, but then the same tools I used to address the two issues I hit deploying mitaka kept coming up during the week when trying to upgrade that environment. I’ve had to use a lot of grep and git blame/log to get to the bottom of issues I’m seeing trying to upgrade the undercloud from stable/mitaka to latest/newton.

The Newton upgrade work is ongoing and possibly worthy of a future post.

I guess this post is mostly about git blame, and using URI munging using the change-id to get to actual gerrit code reviews from an error/issue you are seeing.

For the record I deployed stable/mitaka following the instructions at tripleo-docs and setting stable/mitaka repos in appropriate places. For example, during the virt-setup and the undercloud installation I followed the ‘Stable Branch’ admonition and enabled mitaka repos like:

sudo curl -o /etc/yum.repos.d/delorean-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/current/delorean.repo
sudo curl -o /etc/yum.repos.d/delorean-deps-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/delorean-deps.repo

Then when building images I enabled the mitaka repo like:

export NODE_DIST=centos7
export USE_DELOREAN_TRUNK=1
export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/centos7-mitaka/current/"
export DELOREAN_REPO_FILE="delorean.repo"

The two issues I hit:


The pebcak issue.

This issue is the pebcak issue because whilst there is indeed a bona-fide bug that I hit here, I only hit that because I had a nit in my deployment command.

My deployment command looked like this:

openstack overcloud deploy --templates --control-scale 3 --compute-scale 1
  --libvirt-type qemu
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml
-e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml
-e network_env.yaml --ntp-server "pool.ntp.org"

Deploying like that ^^^ got me this:

The files ('overcloud-without-mergepy.yaml', 'overcloud.yaml') not found
in the /usr/share/openstack-tripleo-heat-templates/ directory

Err.. no I’m pretty sure those files are there (!)

# [stack@instack ~]$ ls -l /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
  lrwxrwxrwx. 1 root root 14 Jun 17 08:55 /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml -> overcloud.yaml

So I know that message is very likely from the tripleoclient so I traced it. The code has actually already been fixed on master so grep gave me nothing there. However when I also tried against stable/mitaka:

[m@m python-tripleoclient]$ git checkout stable/mitaka
Switched to branch 'stable/mitaka'
[m@m python-tripleoclient]$ grep -rni "not found in the" ./*
./tripleoclient/v1/overcloud_deploy.py:414:  message = "The files {0} not
found in the {1} directory".format(

So then we can now use git blame to get to the code review that fixed it. Since we now know the file that error message comes from, we can use git blame against master branch. Since it is fixed on master, something must have fixed it:

[m@m python-tripleoclient]$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
[m@m python-tripleoclient]$ git blame tripleoclient/v1/overcloud_deploy.py

1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  382)     def _try_overcloud_deploy_with_compat_yaml(self, tht_root, stack,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  383)                                                stack_name, parameters,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  384)                                                environments, timeout):
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  385)         messages = ['The following errors occurred:']
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  386)         for overcloud_yaml_name in constants.OVERCLOUD_YAML_NAMES:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  387)             overcloud_yaml = os.path.join(tht_root, overcloud_yaml_name)
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  388)             try:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  389)                 self._heat_deploy(stack, stack_name, overcloud_yaml,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  390)                                   parameters, environments, timeout)
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  391)             except six.moves.urllib.error.URLError as e:
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  393)             else:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  394)                 return
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  395)         raise ValueError('\n'.join(messages))

So the git blame may not display great above, but I see the following line as particularly interesting since it is different to stable/mitaka:

7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))

So now we can use git log to see the actual commit and check it is the one we are looking for:

[m@m python-tripleoclient]$ git log 7a05679e
commit 7a05679ebc944e3bec6f20c194c40fae1cf39d8d
Author: James Slagle <jslagle@redhat.com>
Date:   Fri Apr 1 08:57:41 2016 -0400

Show correct missing files when an error occurs

This function was swallowing all missing file exceptions, and then
printing a message saying overcloud.yaml or
overcloud-without-mergepy.yaml were not found.

The problem is that the URLError could occur for any missing file, such
as a missing environment file, typo in a relative patch or filename,
etc. And in those cases, the error message is actually quite misleading,
especially if the overcloud.yaml does exist at the exact shown path.

This change makes it such that the actual missing file paths are shown
in the output.

Closes-Bug: 1584792
Change-Id: Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff

Now that sounds promising! So not only do we have the actual bug number, but we have the Change-Id. We can use that to get to the gerrit code review:

[m@m ~]$ gimmeGerrit Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff

Where gimmeGerrit is a bash alias in my .profile:

  2  gimme_gerrit() {$
  3      gerrit_url="http://review.openstack.org/#q,$1,n,z"$
  4      firefox $gerrit_url$
  5  }$
  93 alias gimmeGerrit=gimme_gerrit$

So from the review to master I just made a cherry-pick to stable/mitaka.

Now the reason I was seeing this issue in the first place, was because my deploy command was indeed wrong (it’s just that the error message was eaten by this particular bug). I was using ‘network_env.yaml’ but I had actually created network-env.yaml. Yes, much palmface, but if I hadn’t I wouldn’t have backported the fix so meh.


The overcloud needs moar memory bug.

It is more or less well known in the tripleo community that 4GB overcloud nodes will no longer cut it even in a virt environment, which is why we default to 5GB on current master instack-undercloud.

I was seeing OOM issues on the overcloud nodes with current stable/mitaka like:

16021:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]: u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Constraint::Base[storage_mgmt_vip-then-haproxy]/Exec[Creating order constraint storage_mgmt_vip-then-haproxy]: Could not evaluate: Cannot allocate memory - fork(2)\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Resource::Service[openstack-nova-novncproxy]/Pacemaker::Resource::Systemd[openstack-nova-novncproxy]/Pcmk_resource[openstack-nova-novncproxy]: Could not evaluate: Cannot allocate memory - /usr/sbin/pcs resource show openstack-nova-novncproxy > /dev/null 2>&1 2>&1\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-vncproxy-then-nova-api-constraint]/Exec[Creating order constraint nova-vncproxy-then-nova-api-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-api-with-nova-vncproxy-colocation]/Pcmk_constraint[colo-openstack-nova-api-clone-openstack-nova-novncproxy-clone]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-consoleauth-then-nova-vncproxy-constraint]/Exec[Creating order constraint nova-consoleauth-then-nova-vncproxy-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-vncproxy-with-nova-consoleauth-colocation]/Pcmk_constraint[

16313:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Sahara::Service::Api/Service[sahara-api]: Could not
evaluate: Cannot allocate memory - fork(2)
16314:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]:
Could not evaluate: Cannot allocate memory - fork(2)

Suspecting from previous experience this would be defaulted in instack-undercloud:

[m@m instack-undercloud]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
[m@m instack-undercloud]$ grep -rni 'NODE_MEM' ./*
./scripts/instack-virt-setup:89:export NODE_MEM=${NODE_MEM:-5120}

[m@m instack-undercloud]$ git blame scripts/instack-virt-setup | grep  NODE_MEM
2dec7d75 (Carlos Camacho  2016-03-30 09:17:44 +0000  89) export NODE_MEM=${NODE_MEM:-5120}

So using git log to see more about 2dec7d75:

[m@m instack-undercloud]$ git log 2dec7d75
commit 2dec7d7521799c0323d076cd66ba71ebb444c706
Author: Carlos Camacho <ccamacho@redhat.com>
Date:   Wed Mar 30 09:17:44 2016 +0000

    Overcloud is not able to deploy with the default 4GB of RAM using instack-undercloud

    When deploying the overcloud with the default value of 4GB of RAM the overcloud fails throwing "Cannot allocate memory" errors.
    By increasing the default memory to 5GB the error is solved in instack-undercloud

    Change-Id: I29036edeebefc1959643a04c5396e72863fdca5f
    Closes-Bug: #1563750

So as in the case of the pebcak issue, gimmeGerrit yields the review so I then just cherrypicked that to stable/mitaka too.



blog comments powered by Disqus
RSS Feed Icon site.xml
RSS Feed Icon tripleo.xml