Use HyperFIDO HYPERSECU on Linux

To use HyperFIDO HYPERSECU on Linux you must have access to the created device. So you need to add a udev rule to set up the permissions. There are two examples on the support page, but for the systemd-uacces method, one information is missing.

You can use the HyperFIDO-k5-1.rules approach:

ACTION!="add|change", GOTO="u2f_end"

KERNEL=="hidraw*", SUBSYSTEM=="hidraw", ATTRS{idVendor}=="096e", ATTRS{idProduct}=="0880", TAG+="uaccess"

LABEL="u2f_end"

But you need to rename it to something smaller than 70- in /etc/udev/rules.d/, because the uaccess-Tag is processed in 70-uaccess.

E.g. /etc/udev/rules.d/69-hyperfido.rules.

Flattr this!

Starting from scratch

This post is part of a series how we as SysEleven manage a large number of nodes. We deploy OpenStack on those nodes, but this could be basically everything.

For sure, this is not our first attempt to deploy a managable OpenStack platform. We not only deployed this platform, we also deployed a platform based on Virtuozzo, which is still in heavy use for our managed customers. We have a whole bunch of learnings from deploying and managing the mentioned platforms, which leds us to the decision to start from scratch.

In the beginning of this year, there is basically nothing, but a rough plan. Not even a single line of code, neither software, nor some kind of automation. With this project, we were able to break with everything we run in the company. We were (and still are) allowed to question each software and hardware decission. We started on a greenfield.

There may be much things you will put into question. Most of what we did is inpired by what others did. Nothing is new, or kind of rocket science. Sometimes we needed to take the shortest path, because we had a deadline to mid of the year to deploy a stable high available OpenStack platform.

So, where to start over?

First of all, we needed to drill down the issues we encountered with the previous deployments. We went from vague ideas to concret problems with soft- and hardware solutions we used. But the larges problem was a lack of formal definition of what we want to build, and how we want to do that.

So the very first step was to change this. We wrote a whole bunch of blueprints, capturing a high level view of the cluster. Some of them were very precise, like the decision for Ansible as our primary configuration management system, although the former cluster was build with Puppet. Or the decission for network equipment, how we plug them together and how we operate it. Some other very specific blueprints described that we use a single mono-repo, how we manage the review process, everything we script has to be done in Python of Go, styleguides for those language and Ansible, how we handle dependencies in Ansible, that we are going to solve our current problem, and not every thinkable future problem, and so on and so forth. There were some vague blueprints about: We need a way to get an overview of the current cluster state. Not just what we expect. We need a system to initially configure and track our nodes.

All blueprints are meant to change. Not in a whole. But each time, someone diggs into the next topic, the blueprint is extended with the current knowledge and needs.

So we had a rough overview of what we need. We split the team into two groups. As we knew that we had to deploy a working OpenStack, one group started to deploy OpenStack, with all the components needed via Ansible. The primary goal still is to provide a high available OpenStack. The group I belong to works on the individual software we need to provide a basic configuration for our nodes, keep track of them and keep them up to date.

I joined the team after 8 month off, to take care of my boy. At that point, the basic blueprints were written and the team was consolidated. And to be honest, I am not sad to miss this process, and espacially not the path leading to this.

Where do we get from there?

We dedicated three nodes to be our bootstrap nodes, so called bootinfra. They are mostly installed and configured by hand.

Within half of the year, we were able to plug a new node into a rack. This shows up in our MachineDB where we configure the location of the node. After this, a very basic configuration happens automagically. The node configures its designated fixed ip addresses, hostname and fqdn, some bgp configuration, setup of repos, setup of ssh configuration. Just very basic stuff to reach the node. The next step is still very manual. We assign the node to a group in our Ansible Inventory File, and run Ansible by hand.

From the Ansible point of view, there is no change since then, even there was much process in detail. But we were able to deploy two other cluster in the same manner. One of them is our lab, which shares the production bootinfra. That tends to be easier, but it threw up a whole bunch of problems. For the other cluster, we needed to automate much of our (so far handcrafted) bootinfra. Which now pays out, since we are about to replace the initial bootinfra nodes by automated ones.

Next time, I will write about our bootinfra. Not too much about the configuration, but which services we run, and for what reasons.

Flattr this!

Managing a large number of nodes

Imagine following setup. As surly mentions somewhen in this blog, we are running a cloud based on OpenStack. The team, I am working in, is responsible to provision new nodes to the cluster, and manage the lifecycle of the existing ones. We also deploy OpenStack and related services to our fleet. One hugh task is running OpenStack high available. This seems not too difficult, but also means, we have to make each component OpenStack depends on, HA as well. So we use Galera as Database, Quobyte as a distributed storage, clustered RabbitMQ, clustered Cassandra, Zookeeper, and things I may have forgotten.

I will write about some aspects of our setup, how we deploy our nodes, and how we keep them up to date.

Flattr this!

Dropping weekly format

Turns out. Like expected I am not able to write one post per week, and I didn't manage to pick up after two weeks of vacation. When reading the weekly posts again, they seem kind of weird. Which is not bad, but nut worse the time. So I call the my first attempt to write a series to be failed.

Anyway. Natually, not each week is equally interresting. I will try to pick up interresting topic here, and try to go a bit into detail.

Flattr this!

Weekly 2016/37

No such meeting Monday. This week, our all-hands-meeting was moved to Tuesday. Because most people in my team were on vacation, we did a short daily instead of a 1 hour weekly. More time to do real work :) I returned to the cluster wide unattended update. I worked at least half of the day on this topic. Another look into the networking setup of our lab and some more documenation did the rest of it.

Back in office on Wednesday after a day of sickness. From Tuesday on, the production cloud was kind of slow. In terms of: Creating a bare network took 6 seconds in average, instead of 2 seconds in the lab. Digging into the problem showed a problem in filling up one rabbitmq queue. Usually this should be consumed by a telemetry solution, which is not deployed correctly so far. I removed all the messages in the queue and everything went back to normal again.

On Thursday, I had not too much time, to digg into unattended updates again. Just an hour in the morning. This week is strange, with many little disturbances, and I literally forgot all the small stuff. After lunch we did a complete wipe and reinstall of our lab. I left after 2 hours of installation for 10 hardware nodes. Does not seem to be pretty fast, but in fact this is ok. At least we had to do the process on our first 3 nodes (those form our consul cluster, now) twice, because of an expired pgp key. My colleagues Continues with running Ansible on the nodes, without huge problems.

Although, the installation on the day before was pretty ok overall, I cleaned up some stuff on Friday. With some minor changes, the installation is smoother than Thursday. I did not run an entire installation. But shut off the consul cluster to simulate the "nothing is there"-situation. This worked pretty well. I did a second reinstallation, but with consul turned on. This is the usual scenario if we grow our existing cluster by new machines. I finalized the installation by applying Ansible. Because the node, I selected, was a gateway of our SDN, there was a little problem with a new UUID for this node, but easily to resolve by hand. This sceanrio is not usual, so I just documented this issue and did not automate it away. Furthermore, I extended the existing documentation by things, I saw yesterday, but not yet in the documentation. Somewhen in between, we had a call with our storage vendor about the next steps to happen. The afternoon I prepared a update of consul 0.6.4 to 0.7. Its going to be bad:

"Consul's default Raft timing is now set to work more reliably on lower-performance servers, which allows small clusters to use lower cost compute at the expense of reduced performance for failed leader detection and leader elections. You will need to configure Consul to get the same performance as before. See the new Server Performance guide for more details. [GH-2303]"

In other words: It did not work well on raspberry pi, we fixed this, but destroyed it for everyone else. And for sure, 0.6er consul refuse to start because of an unkown key, if I already would place the new paremeter to all config files :(

Oh. Almost forgotten. Last week I wrote about my plans of cleaning up my vim modules. And with this, a switch to neovim. Turned out. Cleaning up stuff went well. The neovim experiment not. It felt ok in most points, but not better then good old vim. And one thing let me scream several times, and go back vim. In some/most cases a hit on ESC took a while to complete. That long, that I typed some more characters before leaving insert mode. An I'm escaping a lot!

Flattr this!

Weekly 2016/36

Starting with meeting Monday. The first half of the day was gone for this. After lunch, I had to add another authentication mechanism to our MachineDB. In fact this is a Flask-app which uses ldap as an authentication backend. We have to put this part of the software into a place without any access to our ldap. Since we use postgres as the DB backend, I created another authentication backend, using postgres. In the afternoon I helped a fellow with some scripts to setup floating ip networking to our lab cluster.

On Tuesday, I continued my work on the authentication backend and got it merge ready. More debugging was done at the floating ip creation stuff. Because the script is more complex than I hoped, I created the networks by hand and was able to configure our integration test to run against the lab.

The next day was a short day for me. I figured out some problems detected by the tempest run. Some strange errors happen due to the ordering of openstack endpoints. The config parameter v3_endpoint_type has been ignored, and adminURL was used by accident. By accident, since it really depended on the sorting of endpoints. I fixed this by updating the id of the endpoint in the keystone database. Beside this, I did some daywork.

Thursday, was kind of boring. First of all, it was a short day again. So primarily I did some daywork, and prepared a backup-presentation for the Berlin Ops Summit on Friday. Turned out, we didn't need it.

Friday was Conference day. I liked the Talk of Jeff Sussna. Most other talks were ok, but not new to me. The most entertaining talk was UI engineering by Dennis Reimann and Jan Persiel. The closing party was great :)

Btw: I am on my way to re-create my vimrc in terms of throwing away modules and settings, I don't use any more. I take the chance to switch to neovim in this step. Lets see how far I come with this.

Flattr this!

Linux Kernel Features…

… mit dessen Hilfe Container-Runtimes gebaut werden. Keine Garantie für die Vollständigkeit dieser Liste.

chroot

z.B. Alpine Linux in chroot

Um den gestarteten Prozessen ein anderes / vorzugeben. Noch nicht mal ein Linux-Feature und wurde bereits 1979 in Unix implementiert.

# Ich habe da mal was vorbereitet.
export mirror=http://dl-3.alpinelinux.org/alpine/
export chroot_dir=/root/alpine
export version=2.6.7-r0
wget ${mirror}/latest-stable/main/x86_64/apk-tools-static-${version}.apk
tar -xzf apk-tools-static-*.apk
./sbin/apk.static -X ${mirror}/latest-stable/main -U --allow-untrusted --root ${chroot_dir} --initdb add alpine-base
mknod -m 666 ${chroot_dir}/dev/full c 1 7
mknod -m 666 ${chroot_dir}/dev/ptmx c 5 2
mknod -m 644 ${chroot_dir}/dev/random c 1 8
mknod -m 644 ${chroot_dir}/dev/urandom c 1 9
mknod -m 666 ${chroot_dir}/dev/zero c 1 5
mknod -m 666 ${chroot_dir}/dev/tty c 5 0
cp /etc/resolv.conf ${chroot_dir}/etc/
mkdir -p ${chroot_dir}/root
mkdir -p ${chroot_dir}/etc/apk
echo "${mirror}/${branch}/main" > ${chroot_dir}/etc/apk/repositories

mount -t proc none ${chroot_dir}/proc
mount -o bind /sys ${chroot_dir}/sys

chroot ${chroot_dir} /bin/sh -l

Namespaces

Definiert was Prozesse sehen und machen können.

Mount

unshare -m /bin/bash
mount -t tmpfs tmpfs /tmp

touch /tmp/blah
ls -l /tmp ### nur blah


Anderes Terminal
ls -l /tmp ### orig-stuff aber kein blah

PID

unshare --pid /bin/bash
ps -ef ## segmentation fault, da fehlt wohl noch was

unshare --pid --mount-proc -f /bin/bash
ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 08:26 pts/1    00:00:00 /bin/bash
root         4     1  0 08:26 pts/1    00:00:00 ps -ef

User

Wichtig, als != root starten

id -u ## 500
unshare --user --map-root-user /bin/bash
id -u ## 0

Trotzdem hat man keine root-Berechtigung!

mount -o tmpfs tmpfs /tmp  ## permission denied

Network

man ip-netns

ip netns list
ip netns add testing
ip netns exec testing ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1

ip link add veth0 type veth peer name veth1
ip link set veth1 netns testing
ip a ## kein veth1

ip addr add 192.168.0.1/32 dev veth0
ip route add 192.168.0.0/24 dev veth0
ip netns exec testing ip link set veth1 up
ip netns exec testing ip addr add 192.168.0.2/32 dev veth1
ip netns exec testing ip route add 192.168.0.0/24 dev veth1

ping 192.168.0.2 ## works

ip netns exec testing ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
7: veth1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
   inet 192.168.0.2/32 scope global veth1

Capabilities

Gezielt Fähigkeiten von Prozessen beschränken. Auch wenn der Prozess als effective user 0 (root) läuft.

man capabilities

mount -t tmpfs tmpfs /mnt
umount /mnt
capsh --drop=CAP_SYS_ADMIN --
id -u  ## 0
mount -t tmpfs tmpfs /mnt  ## permission denied

cgroups

Resourcen von Prozessen einschränken. Beispiel Memory.

cd /sys/fs/cgroup/memory
mkdir testing
cd testing
ls -l  # zeigt u.a. tasks

# 2. Console auf und dort die PID ermitteln
ls -l  # funktioniert
echo $$

# Zurück in Console 1 und bash der Console 2 in die cgroup testing aufnehmen
echo PID > tasks
cat tasks
PID

echo 1 > memory.limit_in_bytes

# Console 2
ls -l
Killed

Wie man sieht hängen alle Child Prozesse auch in dieser cgroup!

Weitere Resourenlimits:

  • CPU
  • blkio
  • devices
  • pids (Process numbers)

Siehe die Kernel Dokumentation zu cgroups.

seccomp

secure computing mode. Gezieltes filtern von system calls. Leider habe ich hier kein einfaches Beispiel.

AppArmor / SELinux / SMACK

Mandatory Access Control im Linux Kernel. Es gibt verschiedene Implementierungen. Ubuntu verwendet z.B. AppArmor.

Auf Ubuntu

vi /etc/apparmor.d/usr.sbin.rsyslogd
...
/usr/sbin/rsyslogd {
  # rsyslog configuration
  /etc/rsyslog.conf r,
  /etc/rsyslog.d/ r,
  /etc/rsyslog.d/** r,
  /{,var/}run/rsyslogd.pid rwk,
  /var/spool/rsyslog/ r,
  /var/spool/rsyslog/** rwk
 ...
}

AppArmor wird unter Ubuntu auch dazu verwendet den Zugriff von VMs-Prozessen einzuschränken. Damit wird der Schaden potentielle Schaden bei einem Ausbruch aus der VM verringert.

Zum Schluss

Alle der genannten Features existieren schon einige Jahre im Linux-Kernel. Vorläufer davon existieren in Form von OpenVZ / Parallels Virtuozzo bereits seit ca. 2000. Allerdings außerhalb des Mainline Kernels gepflegt. Die Features können nicht nur dafür benutzt werden um Container Runtimes wie Docker oder CoreOS rkt zu erstellen, man kann auch "regulär im System" laufende Anwendungen damit einpferchen. Auch systemd bietet Flags, die einige der angesprochenen Techniken verweden um Prozesse besser voneinander zu trennen. Siehe dazu Security Features in systemd.

Flattr this!

OpenStack-Tempest No nw_info cache associated with instance

When using Tempest to test an OpenStack-Cloud, it is easy to run into this error

tempest.lib.exceptions.BadRequest: Bad request
Details: {u'message': u'No nw_info cache associated with instance', u'code': 400}

In fact, networks are not configured proper. Tempest needs a full configured network, connected to the external network, to get some things working. This must be available to all tempest projects/tenante, hence it is enough to create it for each, but use the same name. I created a heat stack for this, and called this network needed-for-tempest. .. code:

heat_template_version: 2014-10-16

description: >
  Setting up a private Network with an snat-router

parameters:
  public_net_id:
    type: string
    description: ID of public network for which floating IP addresses will be allocated

resources:

  ### Creates a Neutron network and subnet
  network:
    type: OS::Neutron::Net
    properties:
      name: needed-for-tempest

  subnet:
    type: OS::Neutron::Subnet
    properties:
      name: needed-for-tempest1
      dns_nameservers:
        - 37.123.105.116
        - 37.123.105.117
      network_id: { get_resource: network }
      ip_version: 4
      gateway_ip: 10.0.0.1
      cidr: 10.0.0.0/24
      allocation_pools:
        - { start: 10.0.0.230, end: 10.0.0.240 }

  router:
    type: OS::Neutron::Router
    properties:
      external_gateway_info: { network: { get_param: public_net_id } }

  router_subnet_connect:
    type: OS::Neutron::RouterInterface
    depends_on: [ subnet, router ]
    properties:
      router_id: { get_resource: router }
      subnet: { get_resource: subnet }

Of cource, tempest.conf has to be edited as well:

[network]
fixed_network_name = needed-for-tempest
Flattr this!

Weekly 2016/35

This is the third approach to write a weekly, and the first one, I totally failed. I didn't make any notes of the week :/

But let's start anyway. This week was covered by a re-deployment of our lab. As expected several things went wrong, and we had to fix it iteratively. Nothing special so far. This couldn't be sticked to one special day. Like describe in the last weeks, I digged around in our network setup. This week, I took my time and documented it down into our IPAM nipap. That includes all the transfer nets and all public IPs assigned to us. In case of service IPs, I documented the "function" as well. We are now able to collect new IPs directly from this pools, without having to grep config files, and hope you catched everything.

On Wednesday, one day with fixing lab-stuff, I got the request to benchmark our S3. Which I started on my short Thursday. I decided to write a script in Go to parallelize the work. This would be perfectly doable in python as well, but I wanted to use something new. First results were collected on Friday morning. Over and over again in this week, I helped a fellow to configure the forward proxy corretly.

Beside of the S3-work, I was in our change commission, to discuss what things we can do on our own, and what should be discussed again in this commission. Lucky us, we can do most of the things, I wanted to.

Friday, started with collecting S3 benchmarks. Then an outage of our cinder-volume occured, which I helped to fix. For sure, since one of my changes broke the configuration :( Fortunately, it resulted in a loss of control and running instances were not affected. I wrote the post-mortem about this outage, and filed some tickets to fix missing bits in our monitoring. After this, I fixed an issue with our StorageManager. It creates new logical volumes on cluster bootstrap, but in our configurations they are very likely to be at the same position like before. So xfs restored the filesystem instead of creating a new one. I now wipefs the new created lv.

Currently I am graphing the S3-results with jupyter notebook, which is really cool. Give it a try. I installed mathplotlib and numpy as well. The performance in average is pretty ok, both up and download. But the long tail does really hurt. Some of the requests took around 8 seconds, while the average is around 0.08 seconds.

What did I test? I uploaded around 12500 files from a pool of 25 unique images. Using 5 uploads in parallel, and parallel to this, 10 downloads of random images. Both over and over again. I could even do more in parallel. Anyway. I we got some tickets about slow S3-requests. But like all tickets without any steps to recreate. The result we see here, is exactly what I wanted to get out of this tests. In short-term we are going to update our S3-Layer to the latest version. When the results are still the same, we can file a bug against our vender. Of course with steps to recreate ;) Maybe the latest version will be in lab somewhen next week.

Flattr this!

Weekly 2016/34

Starting this week with minor devel tasks. Two weeks before, I implemented a feature in our machine database, to be able to talk to different consul datacenters. There were some review comments, I had to fix. This enables us to talk to reach the consul cluster of our lab, even though the machine database is located in a different cluster.

My colleagues started to roll out our ansible roles to the lab cluster. Since we changed lots of stuff to support multiple clusters, we expect the rollout to be painful. And it did not last long to run into the first error. The storage manager crashes, due to a wrong usage of pathlib.PosixPath. I fixed those issues, and built debian packages as well. Whereas building debs was the most painful part of this task.

At some point in the rollout, we were unable to reach all of the lab hosts, but only from two hosts, including one ouf our cluster workstations. The rest was working fine. I discovered "strange" routes on the ToR of lab, and was able to track down the issue with the help of our NetOps. I learned to ask bird for learned and exported routes. This quickly showed, two hosts in the lab exported the same ip address like the production cluster. With this information, a fellow reconfigured the canonical ip addresses in his host-vars and re-rolled ansible. After bringing the wrong (duplicate) address down, everything worked fine.

Tuesday, we continued our work on the lab plattform. After some minor changes in our ansible roles, we discovered some inconsistency in ASN numbering. I reverse-engineered the remaining parts of the bgp configurations of both clusters and introduced a new ASN policy. This included a change to all the ASN in our new lab. Which means a temporarily loss of control of all the nodes in the lab. Fortunately the configuration was successful and all nodes re-announced their IPs. The configuration change was done with our NetOps again, because we need some changes on devices, we don't have access to. After all the changes the connection between production and lab were established again, and the loadbalancers in the lab were able to annonce public ips. We have an accessable lab now :)

At some time on Tuesday, I tried to get metrics from Consul. Turned out, Consul is only able to stream changes to something like statsd.

Bring all the stuff together. One of my colleagues mainly changed many of our ansible roles, to make the lab possible. In fact, the rollout of the last days was done by these changes. He started a while ago, and so master diverged a lot, resulting in a hugh, unmaintainable merge request. Since the cluster rollout worked fine, we split the merge request into smaller ones, that we merge into master, without affecting production. Just one large refactoring of the loadbalancer role remaining. After getting all the v4 bgp-sessions alive during the last days, today I fixed issues with the v6 configuration an brought up all the sessions as well. This was the last bit for a fully working underlay-network :) The afternoon I documented things for baremetal deployment and the network configuration.

On Thursday we did some minor fixes to the loadbalancer role and finally merged it into master. Another feature of yet another colleagues became merge ready, and we brought this to master as well. Since it is not yet in any playbook, this will not affect any running services. After those merges, we applied the entire new master on our lab. The next step would be to re-deploy the lab from master. Since another team would like to participate, we scheduled this into next week. I tried to check some network settings with our SDN Midonet, as a prerequisite for running the integration test-suite, and discovered missing configuration of external IPs and BGP sessions. It was barely automated with a shell script, and it was available in our repo, but never called automatically on bootstrap. However, this script does not perform any checks whether the objects it tries to create, are already available. It also works only for our production cloud, even if it should never ever be called against it, due to the lack of checks. Anyway, it is a nice task for our new team member, to rewrite most of the script as an ansible role, that could run regulary. Since we need this script updated for the lab deployment anyway, we just wait for this rewrite.

Friday, and this week one of my short days, and the lab works good enough to test some things. First I helped out Mathias to setup the Loadbalancers with the config of production again. Then we migrated to the new loadbalancer role, without any problems. Yeahie. The rest of the day, I tried a fix to the configuration of all OpenStack-APIs, I did a few weeks ago. Due to a lack of a "not production" environment, I was not able to test it so far. Today I did, and the first shot worked well for all services except the both relevant ones nova and neutron. The latest version of python-openstacksdk does not work against our neutron. We use the traditional OpenStack service ports, but using https. The services http-links in api-json-results. If the client follows those links, it talk http to the https endpoint resulting in a bad request (400). The solution is to configure oslo_middleware.HTTPProxyToWSGI for each service. With this, all the returned json-links are https.

Before:

> curl https://api.cbk.cloud.syseleven.net:9696/ | jq '.'
{
  "versions": [
    {
      "status": "CURRENT",
      "id": "v2.0",
      "links": [
        {
          "href": "http://api.cbk.cloud.syseleven.net:9696/v2.0",  <<===
          "rel": "self"
        }
      ]
    }
  ]
}

After:

> curl https://api.zbk.cloud.syseleven.net:9696/ | jq '.'
{
  "versions": [
    {
      "status": "CURRENT",
      "id": "v2.0",
      "links": [
        {
          "href": "https://api.zbk.cloud.syseleven.net:9696/v2.0", <<==
          "rel": "self"
        }
      ]
    }
  ]
}

The first shot did not work for nova and neutron, because the versions-endpoints had a different pipeline configuration, than all the other stuff. I just needed to add HTTPProxyToWSGI to those endpoints and everything worked fine. Because some oslo_middleware developers decided to disable HTTPProxyToWSGI by default for a strange reason. This flag is not yet available in OpenStack Mitaka, but the default will break our deployment on an update to Newton. So I set enable_proxy_headers_parsing = True in all our OpenStack Services, and let it be ignored in the current deployment.

Flattr this!