Openstack-ansible is another public deployment tool for OpenStack. Similar to Kolla, it segregates OpenStack services/daemons into LXC containers. The following tips are based on version 5.0 only.

Same External/Internal IP

When deploying via Ansible, if the external and internal VIPs use the same IP, the SSL feature needs to be disabled; otherwise pip installs will fail:

------------------------------------------------------------

FAILED - RETRYING: TASK: pip_install : Install pip packages (fall back mode) (2 retries left).
FAILED - RETRYING: TASK: pip_install : Install pip packages (fall back mode) (1 retries left).
fatal: [infra01_galera_container-ff9ac443]: FAILED! => {"attempts": 5, "changed": false, "cmd": "/usr/local/bin/pip2 install -U --isolated --constraint http://10.240.169.102:8181/os-releases/master/requirements_absolute_requirements.txt ", "failed": true, "msg": "\n:stderr: Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', BadStatusLine(\"''\",))': /os-releases/master/requirements_absolute_requirements.txt\nRetrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', BadStatusLine(\"''\",))': /os-releases/master/requirements_absolute_requirements.txt\nRetrying (Retry(total=2, .......... Max retries exceeded with url: /os-releases/master/requirements_absolute_requirements.txt (Caused by ProtocolError('Connection aborted.', BadStatusLine(\"''\",)))\n"}

The resolution is to edit /etc/openstack_deploy/user_variables.yml:

openstack_service_publicuri_proto: http
openstack_external_ssl: false
haproxy_ssl: false

Openstack-ansible playbook

To make openstack-ansible build OpenStack, first generate random passwords for the components, then run the playbooks under /opt/openstack-ansible/scripts. To generate the passwords:

python pw-token-gen.py --file /etc/openstack_deploy/user_secrets.yml

After openstack-ansible is fully deployed, lxc-attach to the utility container and source the /root/openrc file to get the OpenStack environment variables.
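For example (the container name below is hypothetical; list the real one with lxc-ls):

lxc-ls | grep utility
lxc-attach -n infra01_utility_container-xxxxxxxx
source /root/openrc
openstack endpoint list    # verify the credentials work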

In the case of a reboot, or when all galera servers are down and cannot be started, the MySQL service may report a failure. Run openstack-ansible galera-install.yml --tags galera-bootstrap to recover it.
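After the bootstrap, a quick sanity check (run inside any galera container, assuming the root MySQL credentials are already configured there) is to confirm the cluster size and state:

mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status';"   # should report Primary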

Openstack ceph-ansible

To make OpenStack use Cinder with a Ceph backend, we need to manually install Ceph via ceph-ansible from git first.

git clone https://github.com/ceph/ceph-ansible/
cd ceph-ansible/
cp site.yml.sample site.yml
cp group_vars/all.yml.sample group_vars/all.yml
cp group_vars/mons.yml.sample group_vars/mons.yml
cp group_vars/osds.yml.sample group_vars/osds.yml

Edit the inventory hosts file:

# vi inventory_hosts
[mons]
10.240.173.102
10.240.173.103
10.240.173.104

[osds]
10.240.173.102
10.240.173.103
10.240.173.104
10.240.173.105
10.240.173.106

[rgws]
10.240.173.102

Verify they are reachable via SSH:

ansible -m ping -i inventory_hosts all

Edit site.yml and comment out anything not needed, then edit group_vars/all.yml:

ceph_origin: upstream
ceph_stable: true
ceph_stable_release: jewel
monitor_interface: br-storage
journal_size: 1024
public_network: 10.240.173.0/24

Edit group_vars/osds.yml to indicate which disks are used for OSDs and journals, then run the playbook: ansible-playbook site.yml -i inventory_hosts
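As a minimal sketch (option names differ between ceph-ansible releases, so check group_vars/osds.yml.sample for the exact keys in your copy), collocated journals on two data disks might look like:

devices:
  - /dev/sdb
  - /dev/sdc
journal_collocation: true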

Make sure ceph health is OK; if PGs are stuck in an inactive state, check the MTU on the Ethernet ports.

If ceph -s complains about too few PGs per OSD, increase the pool's PG count. For 10-50 OSDs, use 1024 PGs:

ceph osd pool set rbd pg_num 1024
ceph osd pool set rbd pgp_num 1024

After installation, generate the keyrings on all ceph-mon nodes; otherwise openstack-ansible will complain about missing keyrings:

ceph auth get-or-create client.cinder
ceph auth get-or-create client.glance
ceph auth get-or-create client.cinder-backup

On a mon node, you can check the keyrings with ceph auth list.

You also need to add permissions for each client, e.g. ceph auth caps client.cinder mon 'allow *' osd 'allow *'; otherwise cinder-volume will fail.
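The same broad caps can be set for the other clients (tighten them to your security policy as needed):

ceph auth caps client.glance mon 'allow *' osd 'allow *'
ceph auth caps client.cinder-backup mon 'allow *' osd 'allow *'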

Another option is to uncomment the related settings inside ceph-ansible/group_vars/mons.yml.

Access rbd image from ceph mon

To directly access a Ceph disk (which is the actual mounted VM/image/volume disk), use rbd map to map the image into the mon's system, then mount it to a folder. Use rbd -p poolname ls to list the images inside a pool, and rbd -p poolname info imagename to see the details.

On Ubuntu 16.04 with Ceph Jewel, some new image features are enabled by default but not supported by the kernel client, so they need to be disabled per volume mapping with rbd feature disable imagename deep-flatten fast-diff object-map exclusive-lock; after that rbd map pool/image will work. A /dev/rbdX device will be created, and if it is the right image it will have sub-partitions that can be mounted.
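A typical sequence looks like this (pool and image names are placeholders):

rbd -p volumes ls                                   # list images in the pool
rbd feature disable volumes/volume-0001 deep-flatten fast-diff object-map exclusive-lock
rbd map volumes/volume-0001                         # prints the device, e.g. /dev/rbd0
mount /dev/rbd0p1 /mnt/inspect                      # mount the first partition
umount /mnt/inspect
rbd unmap /dev/rbd0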

Nova access cinder-volume

To make Nova able to attach or mount a Cinder volume, an rbd_secret_uuid needs to be added to both cinder.conf and nova.conf; otherwise the attach will fail because libvirt cannot authenticate to Ceph.
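A minimal sketch of the relevant pieces (the UUID is a placeholder and must match the libvirt secret defined on the compute nodes; the cinder backend section name depends on your enabled_backends):

# cinder.conf, in the RBD backend section
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337

# nova.conf, in the [libvirt] section on each compute node
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337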

Horizon Issue

To fix the URL option missing under the image tab on the OpenStack dashboard, add this line to /etc/horizon/local_settings.py in the Horizon container on all three controllers:

IMAGES_ALLOW_LOCATION = True

then restart apache2, or add it inside /etc/ansible/roles/os_horizon/templates/horizon_local_settings.py.j2, since newer versions of OpenStack omit it by default for security reasons.

To make the original location visible in glance image-list, change /etc/glance/glance-api.conf inside each glance container:

#display URL address
show_image_direct_url = True
#display available multiple locations
show_multiple_locations = True

then restart the glance-api service.

Seeing Cannot read property 'data' of undefined while creating new images can have multiple causes; check the available stores defined for your input field. If file upload is enabled in Horizon, check HORIZON_IMAGES_UPLOAD_MODE in Horizon's local_settings.py; if URL upload is enabled, check your Glance store settings. If a URL link is used, every instance will download the image from that URL when it first boots.
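For reference, a hedged example of the upload-mode setting in local_settings.py (valid values in this era were roughly 'off', 'legacy', and 'direct'; check your Horizon release):

HORIZON_IMAGES_UPLOAD_MODE = 'legacy'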

Glance Issue

To add an extra image location and path to an existing image, we also need a token to authenticate the API call, so:

$ keystone token-get
+-----------+----------------------------------+
| Property  | Value                            |
+-----------+----------------------------------+
| expires   | 2015-05-06T14:22:16Z             |
| id        | 2602709084d64417b7f3480fccfa1785 |
| tenant_id | 486ab7509bfd46c386d4a8353b80a08d |
| user_id   | 0b78d6793b1c4305ad6e76fa232b5a74 |
+-----------+----------------------------------+

and then reuse this token to make the API call:

$ curl -i -X PATCH -H 'Content-Type: application/openstack-images-v2.1-json-patch' \
-H "X-Auth-Token: 2602709084d64417b7f3480fccfa1785" \
http://192.168.0.60:9292/v2/images/90674766-dbaa-4a6e-a344-2a4116af9fab \
-d '[{"op": "add", "path": "/locations/-", "value": {"url": "rbd://5de961fb-2368-4f77-8725-7b002732e214/images/7bb0484c-cb6b-4700-88bb-0a18b8f3a8f5/snap", "metadata": {}}}]'

HTTP/1.1 200 OK
Content-Length: 955
Content-Type: application/json; charset=UTF-8
X-Openstack-Request-Id: req-req-29faba33-657e-4959-b508-fcffe8081d8f
Date: Wed, 06 May 2015 14:21:21 GMT

{"status": "active", "virtual_size": null, "name": "CirrOS-0.3.3", "tags": [], "container_format": "bare", "created_at": "2015-05-06T09:29:40Z", "size": 13200896, "disk_format": "qcow2", "updated_at": "2015-05-06T14:21:20Z", "visibility": "private", "locations": [{"url": "rbd://5de961fb-2368-4f77-8725-7b002732e214/images/90674766-dbaa-4a6e-a344-2a4116af9fab/snap", "metadata": {}}, {"url": "rbd://5de961fb-2368-4f77-8725-7b002732e214/images/7bb0484c-cb6b-4700-88bb-0a18b8f3a8f5/snap", "metadata": {}}], "self": "/v2/images/90674766-dbaa-4a6e-a344-2a4116af9fab", "min_disk": 0, "protected": false, "id": "90674766-dbaa-4a6e-a344-2a4116af9fab", "file": "/v2/images/90674766-dbaa-4a6e-a344-2a4116af9fab/file", "checksum": "133eae9fb1c98f45894a4e60d8736619", "owner": "486ab7509bfd46c386d4a8353b80a08d", "direct_url": "rbd://5de961fb-2368-4f77-8725-7b002732e214/images/90674766-dbaa-4a6e-a344

Ceilometer issue

Ceilometer only works with MongoDB, and openstack-ansible does not have a MongoDB role, so we need to install it manually:

apt-get install mongodb-server mongodb-clients python-pymongo

Add smallfiles = true to /etc/mongodb.conf, restart the service, and add the ceilometer user:

mongo --host 127.0.0.1 --eval 'db = db.getSiblingDB("ceilometer"); db.addUser({user: "ceilometer", pwd: "CEILOMETER_DBPASS", roles: [ "readWrite", "dbAdmin" ]})'

Then add the following to user_variables.yml:

ceilometer_db_type: mongodb
ceilometer_db_ip: localhost
ceilometer_db_port: 27017

This way, each Ceilometer instance uses its own local MongoDB database.

Nova boot process illustration

Rabbitmq

To have a general view of what is going on with AMQP traffic, we need access to the RabbitMQ management GUI.

Enable the management GUI plugin with rabbitmq-plugins enable rabbitmq_management, then create a user for it:

rabbitmqctl add_user test test
rabbitmqctl set_user_tags test administrator
rabbitmqctl set_permissions -p / test ".*" ".*" ".*"
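The GUI and its HTTP API then listen on the plugin's default port 15672; a quick check from the deployment host, assuming the container IP is reachable:

curl -u test:test http://<rabbitmq-container-ip>:15672/api/overview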

SRIOV config

  1. Change /etc/default/grub:

    GRUB_CMDLINE_LINUX_DEFAULT="nomdmonddf nomdmonisw intel_iommu=on"
    

Run update-grub and reboot, then create the VFs (7 in this example):

    echo '7' > /sys/class/net/eth6/device/sriov_numvfs

  2. Change /etc/nova/nova.conf on the compute nodes to enable VF passthrough:

[DEFAULT]
pci_passthrough_whitelist = { "devname": "eth6", "physical_network": "sriov"}

then service nova-compute restart

  3. Change the neutron server nodes to support SR-IOV:
    #/etc/neutron/plugins/ml2/ml2_conf.ini
    mechanism_drivers = sriovnicswitch
    

(Optional) Add supported_pci_vendor_devs = 8086:10ed to /etc/neutron/plugins/ml2/ml2_conf_sriov.ini, then service neutron-server restart.

  4. On each nova-scheduler node, add:
    [DEFAULT]
    scheduler_default_filters = PciPassthroughFilter
    

then service nova-scheduler restart

  5. On each compute node, run:
    apt-get install neutron-plugin-sriov-agent
    

And modify /etc/neutron/plugins/ml2/sriov_agent.ini:

    [securitygroup]
    firewall_driver = neutron.agent.firewall.NoopFirewallDriver
    [sriov_nic]
    physical_device_mappings = sriov:eth6
    exclude_devices =

Then apply the new ini to the agent:

    neutron-sriov-nic-agent \
      --config-file /etc/neutron/neutron.conf \
      --config-file /etc/neutron/plugins/ml2/sriov_agent.ini

Then change neutron.conf to enable TLSv1.2, as the default TLSv1.0 is no longer supported by RabbitMQ:

kombu_ssl_version = SSLv23

Then service neutron-sriov-agent restart
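Once the agent is up, a quick sanity check is to confirm the VFs exist on the compute node and then boot a VM on a direct (SR-IOV) port; the network name sriov-net, image, and flavor below are placeholders:

# on the compute node: the PF should now list vf 0..6
ip link show eth6 | grep vf

# create a direct-mode port and boot a VM attached to it
neutron port-create sriov-net --binding:vnic_type direct --name sriov-port1
nova boot --flavor m1.small --image cirros \
  --nic port-id=$(neutron port-show sriov-port1 -f value -c id) sriov-vm1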

OVS traffic capture

OVS traffic flow illustration (Kolla example):

  1. traffic to go out of cloud via provider network
    VM –> tap+qbr(linuxbridge)+qvb –> qvo+br-int+int-br-ex –> phy-br-ex+br-ex+br_vlan –> external network
  2. traffic to go to vxlan tenant
    VM –> tap+qbr(linuxbridge)+qvb –> qvo+br-int+patch-tun –> patch-int+br-tun+port vxlan# –> remote host vxlan if ip

If DVR is not used, all traffic goes from the compute nodes to the neutron nodes and then egresses through the neutron nodes' ports.

If DVR is used, every host has a qrouter (same MAC + IP). When a VM has no floating IP, traffic can go out directly from the compute node and does not need to pass through the neutron node. If there is a floating IP, it resides on the neutron node, so traffic goes from the VM to the neutron node first, is NATed, and is then sent to the external network; traffic initiated from outside first hits the floating IP on the neutron node, then is filtered and NATed to the VM.
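To see where these routers actually live, inspect the network namespaces on a compute or neutron node (the router UUID is a placeholder):

ip netns list | grep qrouter
ip netns exec qrouter-<router-uuid> ip addr              # router interfaces and IPs
ip netns exec qrouter-<router-uuid> iptables -t nat -S   # NAT rules for floating IPs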

A regular tcpdump can be done on the host's ports, but that is only usable down to qvo. For patch-tun –> patch-int, you need to do the following:

$ ip link add name snooper0 type dummy
$ ip link set dev snooper0 up

$ ovs-vsctl add-port br-int snooper0

$ ovs-vsctl -- set Bridge br-int mirrors=@m \
  -- --id=@snooper0 get Port snooper0 \
  -- --id=@patch-tun get Port patch-tun \
  -- --id=@m create Mirror name=mymirror select-dst-port=@patch-tun \
  select-src-port=@patch-tun output-port=@snooper0 select_all=1

You can then run tcpdump on the mirror port:

$ tcpdump -i snooper0

To clear it:

$ ovs-vsctl clear Bridge br-int mirrors
$ ovs-vsctl del-port br-int snooper0
$ ip link delete dev snooper0