marios@ ~]$ /var/log
This post is about how I (mostly successfully) followed the tripleo-docs to deploy a stable/mitaka 3-control, 1-compute development (virt) setup, so that I can ultimately test upgrading it to Newton.
I wasn’t sure there was anything worth writing here, but the same tools I used to address the two issues I hit deploying mitaka kept coming up during the week while trying to upgrade that environment. I’ve had to use a lot of grep and git blame/log to get to the bottom of the issues I’m seeing while trying to upgrade the undercloud from stable/mitaka to latest/newton.
The Newton upgrade work is ongoing and possibly worthy of a future post.
I guess this post is mostly about git blame, and about using the Change-Id plus a bit of URL munging to get from an error you are seeing to the actual gerrit code review.
For the record, I deployed stable/mitaka following the instructions at tripleo-docs and setting the stable/mitaka repos in the appropriate places. For example, during the virt-setup and the undercloud installation I followed the ‘Stable Branch’ admonition and enabled the mitaka repos like:
sudo curl -o /etc/yum.repos.d/delorean-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/current/delorean.repo
sudo curl -o /etc/yum.repos.d/delorean-deps-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/delorean-deps.repo
Then when building images I enabled the mitaka repo like:
export NODE_DIST=centos7
export USE_DELOREAN_TRUNK=1
export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/centos7-mitaka/current/"
export DELOREAN_REPO_FILE="delorean.repo"
The two issues I hit:
The pebcak issue.
This issue is the pebcak issue because whilst there is indeed a bona-fide bug that I hit here, I only hit that because I had a nit in my deployment command.
My deployment command looked like this:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
  --libvirt-type qemu \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml --ntp-server "pool.ntp.org"
Deploying like that ^^^ got me this:
The files ('overcloud-without-mergepy.yaml', 'overcloud.yaml') not found
in the /usr/share/openstack-tripleo-heat-templates/ directory
Err.. no I’m pretty sure those files are there (!)
[stack@instack ~]$ ls -l /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
lrwxrwxrwx. 1 root root 14 Jun 17 08:55 /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml -> overcloud.yaml
I know that message is very likely from the tripleoclient, so I traced it. The code has already been fixed on master, so grep gave me nothing there. However, when I tried against stable/mitaka:
[m@m python-tripleoclient]$ git checkout stable/mitaka
Switched to branch 'stable/mitaka'
[m@m python-tripleoclient]$ grep -rni "not found in the" ./*
./tripleoclient/v1/overcloud_deploy.py:414: message = "The files {0} not found in the {1} directory".format(
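As an aside, the checkout step is not strictly needed for the search: git grep can search a named branch directly. A minimal sketch, where grep_branch is a hypothetical helper name and origin/stable/mitaka is assumed to be how the branch is named in your clone:

```shell
# Search a branch for a literal string without switching branches.
# grep_branch is a hypothetical helper name, not part of git.
grep_branch() (
    branch="$1"
    pattern="$2"
    # -n: show line numbers, -i: ignore case, -F: treat the pattern literally
    git grep -n -i -F -e "$pattern" "$branch" --
)

# Example (run inside a python-tripleoclient clone):
#   grep_branch origin/stable/mitaka "not found in the"
```

Each match is printed as branch:path:line:text, so the file and line number are the same ones the plain grep finds after a checkout.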
Now that we know the file the error message comes from, we can use git blame against the master branch to get to the code review that fixed it. Since it is fixed on master, some commit must have changed those lines:
[m@m python-tripleoclient]$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
[m@m python-tripleoclient]$ git blame tripleoclient/v1/overcloud_deploy.py
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 382) def _try_overcloud_deploy_with_compat_yaml(self, tht_root, stack,
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 383) stack_name, parameters,
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 384) environments, timeout):
7a05679e tripleoclient/v1/overcloud_deploy.py (James Slagle 2016-04-01 08:57:41 -0400 385) messages = ['The following errors occurred:']
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 386) for overcloud_yaml_name in constants.OVERCLOUD_YAML_NAMES:
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 387) overcloud_yaml = os.path.join(tht_root, overcloud_yaml_name)
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 388) try:
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 389) self._heat_deploy(stack, stack_name, overcloud_yaml,
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 390) parameters, environments, timeout)
7a05679e tripleoclient/v1/overcloud_deploy.py (James Slagle 2016-04-01 08:57:41 -0400 391) except six.moves.urllib.error.URLError as e:
7a05679e tripleoclient/v1/overcloud_deploy.py (James Slagle 2016-04-01 08:57:41 -0400 392) messages.append(str(e.reason))
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 393) else:
1077cf13 tripleoclient/v1/overcloud_deploy.py (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200 394) return
7a05679e tripleoclient/v1/overcloud_deploy.py (James Slagle 2016-04-01 08:57:41 -0400 395) raise ValueError('\n'.join(messages))
The git blame output may not display well above, but the following line is particularly interesting, since it differs from stable/mitaka:
7a05679e tripleoclient/v1/overcloud_deploy.py (James Slagle 2016-04-01 08:57:41 -0400 392) messages.append(str(e.reason))
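Reading blame output by eye works, but when the string you are chasing no longer exists on master, git log -S (the "pickaxe") is another way in: it lists the commits that added or removed a given string. A sketch, with string_history as a hypothetical helper name; I have not checked its output against the real python-tripleoclient history:

```shell
# List commits that added or removed a literal string in a given file.
# string_history is a hypothetical helper name.
string_history() (
    needle="$1"
    path="$2"
    # -S selects commits that change the number of occurrences of the string
    git log --oneline -S"$needle" -- "$path"
)

# Example (run on master inside a python-tripleoclient clone):
#   string_history "not found in the" tripleoclient/v1/overcloud_deploy.py
```

The newest commit is listed first, so if the message really was removed on master, the commit that removed it should be at the top.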
So now we can use git log to see the actual commit and check it is the one we are looking for:
[m@m python-tripleoclient]$ git log 7a05679e
commit 7a05679ebc944e3bec6f20c194c40fae1cf39d8d
Author: James Slagle <jslagle@redhat.com>
Date: Fri Apr 1 08:57:41 2016 -0400
Show correct missing files when an error occurs
This function was swallowing all missing file exceptions, and then
printing a message saying overcloud.yaml or
overcloud-without-mergepy.yaml were not found.
The problem is that the URLError could occur for any missing file, such
as a missing environment file, typo in a relative patch or filename,
etc. And in those cases, the error message is actually quite misleading,
especially if the overcloud.yaml does exist at the exact shown path.
This change makes it such that the actual missing file paths are shown
in the output.
Closes-Bug: 1584792
Change-Id: Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff
Now that sounds promising! Not only do we have the actual bug number, we also have the Change-Id. We can use that to get to the gerrit code review:
[m@m ~]$ gimmeGerrit Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff
Where gimmeGerrit is a bash alias in my .profile:
# Open the gerrit search results for a given Change-Id in firefox
gimme_gerrit() {
    gerrit_url="http://review.openstack.org/#q,$1,n,z"
    firefox "$gerrit_url"
}
alias gimmeGerrit=gimme_gerrit
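To skip the copy and paste of the Change-Id, the two steps can be chained. A small sketch, assuming the same URL pattern as gimme_gerrit above; extract_change_id and gerrit_query_url are hypothetical helper names, and this version only prints the URL rather than launching a browser:

```shell
# Pull the first Change-Id trailer out of a commit message read from stdin.
# Leading whitespace is allowed so indented "git log" output also works.
extract_change_id() {
    sed -n 's/^[[:space:]]*Change-Id:[[:space:]]*//p' | head -n 1
}

# Build the gerrit search URL, using the same pattern as gimme_gerrit.
gerrit_query_url() {
    printf 'http://review.openstack.org/#q,%s,n,z\n' "$1"
}

# Example (run inside a git clone):
#   gerrit_query_url "$(git log -1 --format=%B 7a05679e | extract_change_id)"
```

Here git log -1 --format=%B prints just the raw message of that one commit, which keeps the sed expression simple.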
So from the review to master I just made a cherry-pick to stable/mitaka.
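For illustration, a local cherry-pick of a single commit onto a stable branch can look roughly like the following. This is a sketch of the generic git steps only, not the exact commands I ran; backport_commit is a hypothetical helper name and pushing the result for review is left out:

```shell
# Cherry-pick one commit onto a target branch, recording where it came from.
# backport_commit is a hypothetical helper name.
backport_commit() (
    target_branch="$1"
    commit="$2"
    # -x appends "(cherry picked from commit <sha>)" to the new commit message
    git checkout -q "$target_branch" && git cherry-pick -x "$commit"
)

# Example (run inside a python-tripleoclient clone):
#   backport_commit stable/mitaka 7a05679e
```

The -x note makes it easy to trace the backport to the original commit later, again with git log.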
Now, the reason I was seeing this issue in the first place was that my deploy command was indeed wrong (it’s just that the real error message was eaten by this particular bug). I was passing ‘network_env.yaml’ but the file I had actually created was network-env.yaml. Yes, much palmface, but if I hadn’t made that mistake I wouldn’t have backported the fix, so meh.
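A cheap guard against this kind of typo is to check that every file passed with -e exists before running the deploy. A minimal sketch; check_env_files is a hypothetical helper, it only looks at -e arguments, and it knows nothing else about the deploy command:

```shell
# Report every "-e <file>" argument whose file does not exist.
# check_env_files is a hypothetical helper; it returns non-zero
# if at least one environment file is missing.
check_env_files() {
    missing=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -e)
                # Only test when a value actually follows the flag
                if [ "$#" -ge 2 ] && [ ! -f "$2" ]; then
                    echo "missing environment file: $2" >&2
                    missing=1
                fi
                ;;
        esac
        shift
    done
    return "$missing"
}

# Example: pass it the same arguments you intend to give the deploy command.
#   check_env_files --templates -e network_env.yaml --ntp-server "pool.ntp.org"
```

With my original arguments this would have complained about network_env.yaml straight away, instead of leaving me with the misleading overcloud.yaml message.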
The overcloud needs moar memory bug.
It is more or less well known in the tripleo community that 4GB overcloud nodes no longer cut it, even in a virt environment, which is why current master instack-undercloud defaults to 5GB.
I was seeing OOM issues on the overcloud nodes with current stable/mitaka like:
16021:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]: u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Constraint::Base[storage_mgmt_vip-then-haproxy]/Exec[Creating order constraint storage_mgmt_vip-then-haproxy]: Could not evaluate: Cannot allocate memory - fork(2)\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Resource::Service[openstack-nova-novncproxy]/Pacemaker::Resource::Systemd[openstack-nova-novncproxy]/Pcmk_resource[openstack-nova-novncproxy]: Could not evaluate: Cannot allocate memory - /usr/sbin/pcs resource show openstack-nova-novncproxy > /dev/null 2>&1 2>&1\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-vncproxy-then-nova-api-constraint]/Exec[Creating order constraint nova-vncproxy-then-nova-api-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-api-with-nova-vncproxy-colocation]/Pcmk_constraint[colo-openstack-nova-api-clone-openstack-nova-novncproxy-clone]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-consoleauth-then-nova-vncproxy-constraint]/Exec[Creating order constraint nova-consoleauth-then-nova-vncproxy-constraint]: Skipping because of failed 
dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-vncproxy-with-nova-consoleauth-colocation]/Pcmk_constraint[
16313:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Sahara::Service::Api/Service[sahara-api]: Could not
evaluate: Cannot allocate memory - fork(2)
16314:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]:
Could not evaluate: Cannot allocate memory - fork(2)
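To get a quick feel for how widespread these failures are on a node, counting the occurrences is enough. A sketch that reads log text on stdin; count_oom_errors is a hypothetical helper name, and reading the log with journalctl -u os-collect-config is an assumption about where it lives on your nodes:

```shell
# Count occurrences of "Cannot allocate memory" in log text read from stdin.
# -o prints each match on its own line, so several hits on one long log
# line are all counted.
count_oom_errors() {
    grep -o 'Cannot allocate memory' | wc -l | tr -d ' '
}

# Example (on an overcloud node; the unit name is an assumption):
#   sudo journalctl -u os-collect-config | count_oom_errors
```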
From previous experience I suspected the memory default would be set in instack-undercloud:
[m@m instack-undercloud]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
[m@m instack-undercloud]$ grep -rni 'NODE_MEM' ./*
./scripts/instack-virt-setup:89:export NODE_MEM=${NODE_MEM:-5120}
[m@m instack-undercloud]$ git blame scripts/instack-virt-setup | grep NODE_MEM
2dec7d75 (Carlos Camacho 2016-03-30 09:17:44 +0000 89) export NODE_MEM=${NODE_MEM:-5120}
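The ${NODE_MEM:-5120} form means "use NODE_MEM if it is set and non-empty, otherwise 5120", so the value can be overridden from the environment. A small sketch of that expansion on its own; effective_node_mem is a hypothetical name used only for the demonstration:

```shell
# Mirror the default expansion used in instack-virt-setup:
# use $NODE_MEM if it is set and non-empty, otherwise fall back to 5120.
effective_node_mem() {
    echo "${NODE_MEM:-5120}"
}

# Example:
#   unset NODE_MEM;       effective_node_mem   # falls back to the default
#   export NODE_MEM=8192; effective_node_mem   # uses the exported value
```

So, assuming the script on your branch reads NODE_MEM the same way, exporting NODE_MEM before running instack-virt-setup should give the overcloud nodes more memory without patching anything.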
So using git log to see more about 2dec7d75:
[m@m instack-undercloud]$ git log 2dec7d75
commit 2dec7d7521799c0323d076cd66ba71ebb444c706
Author: Carlos Camacho <ccamacho@redhat.com>
Date: Wed Mar 30 09:17:44 2016 +0000
Overcloud is not able to deploy with the default 4GB of RAM using instack-undercloud
When deploying the overcloud with the default value of 4GB of RAM the overcloud fails throwing "Cannot allocate memory" errors.
By increasing the default memory to 5GB the error is solved in instack-undercloud
Change-Id: I29036edeebefc1959643a04c5396e72863fdca5f
Closes-Bug: #1563750
As with the pebcak issue, gimmeGerrit yields the review, so I then just cherry-picked that to stable/mitaka too.