Building a DIY SOHO router, Part 4

Building a DIY SOHO router using the Yocto Project build system OpenEmbedded, Part 4

In part three of this series I finished putting together what I wanted to have on my SOHO router and declared it to be done. While I plan to revisit the topic of a SOHO router using the Yocto Project and OpenEmbedded, this is the final part of the series. In this part, I want to focus on some of the things that I learned while doing the project.

The first thing is that I learned a lot about IPv6, specifically how it’s usually implemented within the United States for residential customers, and some of the implications of this implementation. The first thing to note is that I’ve been off-and-on trying to enable IPv6 for general IPv6 connectivity at home for some time now. Long before my ISP offered IPv6 service, I used Hurricane Electric to have a IPv6 tunnel and connectivity. This was great and only sometimes lead to problems, such as when Netflix finally supported IPv6 and began blocking well known tunnels for region-blocking reasons. It wasn’t until I started on this project that I decided to try to make real use of  routable addresses for hosting personal services. My expectations, and that of lots of software designed to manage IPv6 as well, are best described in article from RIPE about understanding IP addressing. In short, my house or “subscriber site” should get at least 256 LAN segments to do with as I want. Docker expects to have it’s own LAN segment to manage as part of configuring network bridges. When you have 256 or more LAN segments to use, that’s not a problem at all.

Unfortunately, my ISP provides only a single LAN segment. This is simultaneously more IPv6 addresses than the whole of IPv4 and something that should not be further subdivided in routing terms. I could subdivide my LAN segment, but this would in turn cause me to have to do a whole lot more work and headaches. That’s because at the routing level IPv6 is designed for my segment to be the smallest unit. Rather than deal with those headaches I switched my plans up from using Docker to using LXC. With LXC it’s easy to dump the container onto my LAN and then it picks up an IPv6 address the same way all of the other machines on my LAN do. This is good enough for my current needs, but it will make other things a lot harder down the line, if I want separation at the routing level for some devices.

But why am I doing that at all? Well, one of the benefits of having a small but still capable router is that I can run my own small services. While I don’t want to get into running my own email I think it makes a whole lot of sense to host my own chat server for example. With closed registration and no (or limited later on perhaps) federation with other servers I don’t need to worry about unauthorized users contacting my family nor do I have to worry about the company deciding it’s time to shutdown the service I use.

Another lesson learned is that while the Yocto Project has great QA, there’s always room to improve things. As part of picking a firewall I found that one of the netfilter logging options had been disabled by accident a while back. As a part of writing this series of articles and testing builds for qemux86-64, I found that one of the sound modules had been disabled. As a result, the instructions I wrote back in part 2 wouldn’t work. Working upstream is always fun and these changes have been merged and will be included in the next release.

I also worked on a few things for this project that I didn’t include directly in the relevant part of the series. For example, while I did include a number of full utilities in the list of packages installed in the router, I didn’t talk about replacing busybox entirely. This is something that OpenEmbedded supports using the PREFERRED_PROVIDERS and VIRTUAL-RUNTIME override mechanisms in the metadata. Prior to this article however, there wasn’t a good example on how to do this in upstream. Furthermore, there wasn’t an easy way to replace all of busybox and instead you had to list a single package and then include the rest of the required packages in your IMAGE_INSTALL or similar mechanism.  I am a fan of using busybox in places where I’m concerned about disk usage. However, on my router I have plenty of disk space so I want to be sure that if I have to go and solve a problem I’m not using my swiss army knife but rather have my full toolbox available. As a result, OpenEmbedded Core Master now has packagegroup-core-base-utils and a documented example of how to use that in local.conf.sample.extended. This means that when I refresh this image to be based on the Warrior branch I can remove a number of things from my IMAGE_INSTALL.

Another lesson is that old habits die hard.  In general, I always try to use the workflow where I make a change outside the device I’m working on, build the change in, and test it, rather than editing things live.  But when it’s “just” a quick one line change I’ll admit I do it live and roll it into my next build sometimes.  And then sometimes I forget to roll all my changes back up.  So while implementing this project I tried even harder than usual to not fall into that “just a quick change” mindset.  For the most part I’ve been successful at sticking to the idea workflow.  I really believe stateless is the right path forward.  And “for the most part” means that, yes, one time I did have to make use of the fact that the old rootfs was still mountable and copied a file over to the new rootfs, and then to the build machine.  I like to think of that as a reminder that A/B updates are more helpful than a “rewrite your disk each time” workflow for those occasional mistakes.

The caveat to the lesson above is because I really did the “git, bitbake, mender” cycle on this project. I didn’t start on it quite as soon as I said in the article, and I spent a lot more time toying with stuff in core-image-minimal instead of following my own advice, too.  I suppose that is the difference between writing a guide on how things should be done compared with how you do things when you just want to test one more thing, then switch over.  I really should have switched earlier however as every time I avoid doing the SD card shuffle it’s a win on a number of levels.

Did I say SD card above?  Yes, I did.  For this project, a 64GB “black box” that’s in the form-factor of a SD card will have as long of a life span as there is in the form-factor of a M.2 SSD or any other common storage format.  While my particular hardware has a SATA port, I don’t want to try to fit the required cabling, let alone the device itself in the case that’s recommended.  I will admit that I’m taking a bit of a risk here, I am putting as much frequent-write files under a ramfs as I can and after all, I did say stateless is a goal.  If everything does really die on me, I can be back up and running fairly quickly.

Last thing I learned is something I knew all along, really. I like the deeper ownership of the router. There’s both the pride and accomplishment in doing it and that “old school” fun of being the admin again, for real.

Building a DIY SOHO router, Part 3

Building a DIY SOHO router using the Yocto Project build system OpenEmbedded, Part 3

In part two of this series I created a local configuration layer for OpenEmbedded, and had the build target core-image-minimal producing an image. The image that was produced wasn’t really a router, but did let us bring up our board and look around. In this article, I’m going to create a custom image and populate it with additional software packages configured to my  requirements. I’m also going to get started using Over-The-Air (OTA) software updates on the device.

Now that I’ve proven that the image works on the hardware, I can really get down to implementing the project of making a router.  While I could continue to add things to core-image-minimal, it really makes sense at this point to stop and create my own image. Since I want something relatively small, I will still start with core-image-minimal as the base.  Moving back over to meta-local-soho, I’m creating the recipes-core/images directory and then populating core-image-minimal-router.bb with:

require recipes-core/images/core-image-minimal.bb

DESCRIPTION = "Small image for use as a router"

IMAGE_FEATURES += "ssh-server-openssh"
IMAGE_FEATURES += "empty-root-password allow-empty-password allow-root-login"

IMAGE_INSTALL += "\
    "
MENDER_STORAGE_TOTAL_SIZE_MB = "4096"

This tells bitbake that it must have core-image-minimal.bb available and to include it. I then provide a new DESCRIPTION to describe the new image. Next, I include a number of new features in the image. First, I’ll use the normal hook for adding a SSH server. Then I’ll add a line of features for development mode that I’ll remove later.  These features, as their names imply, allow for root to login without a password. This is quite handy for development and quite unwise for production. I’ll circle back and remove these development features later. Next, I give myself an empty list of additional packages to be filled out later. Finally, I tell Mender that it has 4096 megabytes of disk space to work with.  I’m going to hide space from Mender so that I can entirely control that part myself instead. At this point I can build core-image-minimal-router, and it will complete very quickly as I’ve not yet added any packages that have not been previously built. So it’s time to once again git add, git commit, and bitbake these changes.

At this point, I want to flash the new image onto the device and boot it up. The reason for this is that the new image can be used with Mender to test any subsequent image builds. The system is now functional enough to support delivering new image updates via Mender, so it’s good to get into the habit of using the OTA update workflow. It also forces me to treat the device as if it’s really stateless. I’ll talk about how to apply an OTA update when I make the next set of changes.

Now it’s time to begin adding content to the custom image. The first thing I’m going to do is borrow some logic from packagegroup-machine-base. I don’t want to use this packagegroup directly because it will cause bitbake to build a lot of extra stuff that I don’t end up installing. This is due to the fact that it’s part of packagegroup-base.bb (because it’s needed to resolve dependencies of other parts of the packagegroup). Instead, I’m going to add:

    ${MACHINE_EXTRA_RDEPENDS} \
    ${MACHINE_EXTRA_RRECOMMENDS} \

to IMAGE_INSTALL so that any additional machine-specific functionality that’s been specified is installed to the image. Next, I’ll add in kernel-modules to the list so that all of the modules that have been built for the kernel are installed to the image. This will be a lot easier than listing out every module I may need, especially for later on when it comes to various firewall rules I want to use.  On top of all of this, I also want to drop in a bunch of full-versions of common packages I use, and then let busybox fill in the rest.

    bind-utils \
    coreutils \
    findutils \
    iputils-ping \
    iputils-tracepath \
    iputils-traceroute6 \
    iproute2 \
    less \
    ncurses-terminfo \
    net-tools \
    procps \
    util-linux \

Almost everything in this list can be tweaked as desired. There are a couple items that serve a critical purpose and deserve an explanation:

  1. systemd calls out to $PAGER for many functions, including browsing logs with journalctl. If I don’t have the full version of less available, I won’t have a fully functional pager and browsing the output is extremely difficult.
  2. I don’t use xterm for my terminal emulator anymore so I want ncurses-terminfo installed. This ensures that the right terminfo is available and terminal output is correct.

At this point it’s time for a git add, git commit, and then a bitbake of our image too.

Now that I have a new image with additional content to try out, I want to put it on the device and confirm things work. As mentioned before, I’m using Mender in standalone mode since I have a single deployed device.  It’s very simple to serve the new image and then apply it. On the build machine, I do the following (change qemux86-64 to match the machine in use):

$ (cd tmp-glibc/deploy/images/qemux86-64; python3 -m http.server)

And then on the device:

# mender -rootfs http://build-server.local:8000/core-image-minimal-router-qemux86-64.mender
... wait while it downloads and applies ...
# reboot

Once the device comes back up, I’ve logged back in, and confirmed I’m satisfied with my changes, I do:

# mender -commit

This will mark what I am now running as the valid rootfs. However, if the device didn’t boot up or I couldn’t log in, I would simply not commit the changes. To do that I would then just reboot or otherwise power-cycle the device. If I don’t commit the changes to Mender then I get an automatic rollback to the previous install.  Of course, it’s also possible to use any HTTP server on the build machine.

At this point, it’s time to iterate over adding a number of different features that require little more than adding to IMAGE_INSTALL. Since I’ve talked about LXC, I need to add in lxc and gnupg (for verification of containers used from the download template). Once that’s added, I do the git add, git commit, bitbake, and then mender -rootfs cycle again and confirm LXC is working. One thing I noticed when doing this was that containers didn’t autostart because the service isn’t enabled by default.  Since I’m keeping this stateless, I changed that behavior with a bbappend file.  I also ended up installing e2fsprogs-mke2fs to be able to further partition my device to give LXC some room to work with.  This also means that I needed to have base-files provide the fstab that matches my setup, rather than the stock one.  Another small thing to cover is if your hardware does, or does not have a hardware random number genreator available.  If you do have one, you should pull in rng-tools on the image.  If you don’t have one however, you should install haveged to help feed the entropy pool instead.

Now I need to enable a functional access point. This is the first case where it’s really non-trivial to write up the config file to use, so it’s done a little bit differently.  The first step is to install hostapd and iw and boot that.  Now, on the device, edit /etc/hostapd.conf and iterate on editing and testing it on the device until everything is set up as desired. The iw tool can be helpful here to do things like perform a site scan to see what frequencies are already in use.  Once I’m done with the config, I copy the file out from the target and over to my build server with scp as /tmp/hostapd.conf. Then it’s time to make it stateless:

$ mkdir -p recipes-connectivity/hostapd/hostapd
$ cp /tmp/hostapd.conf recipes-connectivity/hostapd/hostapd/

And then I edit recipes-connectivity/hostapd/hostapd_%.bbappend to look like this:

FILESEXTRAPATHS_prepend := ":${THISDIR}/${PN}"

SRC_URI += "file://hostapd.conf"

do_install_append() {
    install -m 0644 ${WORKDIR}/hostapd.conf ${D}${sysconfdir}
}

SYSTEMD_AUTO_ENABLE_${PN} = "enable"

This will do two things. Everything except that last line is to tell bitbake to look in my layer for hostapd.conf and then to install it. The last thing is that now that we have a configured AP we want to start it automatically so have it be an enabled systemd service. Now it’s time once again for the git add, git commit, and so forth cycle.

The next step is to do the same kind of thing to dnsmasq. The good news that this time, the dnsmasq_%.bbappend file only needs one line:

FILESEXTRAPATHS_prepend := ":${THISDIR}/${PN}"

This is because the rest of the recipe already knows to grab dnsmasq.conf from a local file. In the case of my network, I need to pass in a few special options to some DHCP clients and have certain clients be given certain IP addresses, so I’ve gone with dnsmasq as my light-weight, but still fully featured IPv4 configuration server. I could have just as easily gone with ISC DHCPD instead, and it would look much the same as the above.  Conversely, if I didn’t need those few extra rules, I could just let systemd handle DHCP serving.  I left out IPv6 from my statement there as I am letting systemd handle that.

The only thing missing at this point from a router, aside from turning off developer mode features, is to add in a firewall. There are a few ways to go about this.  I already have systemd handling one of the aspects that is often associated with a firewall, setting up IPv4 NAT.  If the only other thing I needed on top of this is to shut the rest of the world out, I can use ufw and potentially even leverage its features that allow for adding iptables commands directly for slight enhancements.  While I have gone that direction for some projects, it’s not a good fit for this one. Instead, I chose to go with arno-iptables-firewall because I’m going to have a more complex setup. The process of customizing the firewall configuration is similar to how I customized hostapd and dnsmasq. That is, I iteratively configure it on the device, test for functionality, and copy the configuration files to my host.  This time, however, the arno-iptables-firewall_%.bbappend will look a little different:

FILESEXTRAPATHS_append := ":${THISDIR}/files"

SRC_URI += "file://firewall.conf \
            file://custom-rules \
"

do_install_append() {
    install -m 0644 ${WORKDIR}/firewall.conf \
    ${D}${sysconfdir}/arno-iptables-firewall/
    install -m 0644 ${WORKDIR}/custom-rules \
    ${D}${sysconfdir}/arno-iptables-firewall/
}

I have two files this time. The first one is the main config file, and the second one is the file that contains my custom rules. This is only necessary because I have a number of custom rules, otherwise it could be omitted.

At this point, looking back at the feature list I laid out in part one, I believe I can check all of my items off now.  I have the following all operational:

  1. access point
  2. firewall
  3. IPv4 and IPv6 network configuration
  4. containers 
  5. OTA software update

I’m building all of my software as hardened as my compiler will allow.  There’s very little state on the router itself to worry about backing up, and everything else is handled by my build server being backed up.  I’m confident in my OTA configuration as I’ve been using it for some time now in the development workflow. I’ve also tweaked the installed package list so that all of my favorite sysadmin tools are available.

At this point, it’s time to lock things down. First up, it’s time to go back to core-image-minimal-router.bb and remove that second line worth of IMAGE_FEATURES. Instead, I’m going to create a new local-user.bb recipe with my own user and SSH key. After listing local-user in IMAGE_INSTALL, I copy meta-skeleton/recipes-skeleton/useradd/useradd-example.bb to somewhere in meta-local-soho, and change it to look like this:

SUMMARY = "SOHO router user"
DESCRIPTION = "Add our own user to the image"
SECTION = "examples"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://${COREBASE}/meta/COPYING.MIT;md5=3da9cfbcb788c80a0384361b4de20420"

SRC_URI = "file://authorized_keys"

S = "${WORKDIR}"

inherit useradd

# You must set USERADD_PACKAGES when you inherit useradd. This
# lists which output packages will include the user/group
# creation code.
USERADD_PACKAGES = "${PN}"

USERADD_PARAM_${PN} = "-u 1200 -d /data/trini -r -s /bin/bash trini"

do_install () {
install -d -m 0755 ${D}/data/trini
install -d -m 0700 ${D}/data/trini/.ssh

install -m 0600 ${WORKDIR}/authorized_keys ${D}/data/trini/.ssh/

# The new users and groups are created before the do_install
# step, so you are now free to make use of them:
chown -R trini ${D}/data/trini
chgrp -R trini ${D}/data/trini
}

FILES_${PN} = "/data/trini"

# Prevents do_package failures with:
# debugsources.list: No such file or directory:
INHIBIT_PACKAGE_DEBUG_SPLIT = "1"

Now, there’s one slight problem. I added myself with a user under /data which is excluded from Mender updates. The good thing is I get persistent history and so forth.  The bad thing is I’m not installed there yet.  So I either need to re-flash one last time or manually copy the files over from the filesystem image to the device before I reboot.  Finally, I need to enable myself to use sudo. In addition to adding sudo to IMAGE_INSTALL I also need to either tweak the sudo recipe so that /etc/sudoers.d/ is looked under, tweak it so that anyone in the wheel group can use sudo and add a wheel group, or borrow the example from meta/recipes-core/images/build-appliance-image_15.0.0.bb and do the following in core-image-minimal-router.bb:

# Take the example from recipes-core/images/build-appliance-image_15.0.0.bb
# on adding more sudoers
fakeroot do_populate_poky_src () {
    echo "trini ALL=(ALL) NOPASSWD: ALL" >> ${IMAGE_ROOTFS}/etc/sudoers
}
IMAGE_PREPROCESS_COMMAND += "do_populate_poky_src; "

With all of that built, deployed, and unit tested, it’s time to go live.  My SOHO router is done and ready for production.  It’s now on me to make sure this stays up to date, which in some ways is a lot better than the alternative.  With my previous router, I only had an non-volatile RAM dump specific to the model of router as a backup. I now have my complete configuration containing firewall rules, DHCP options, and more saved. Since starting on the project I have even braved a few OTA updates and had minimal downtime.

This concludes the walk through of building a SOHO router with OpenEmbedded. In the final part of this series, I will describe some of the lessons I learned while designing and implementing this project.

[Go to Part Four of the series.]

Building a DIY SOHO router, Part 2

Building a DIY SOHO router using the Yocto Project build system OpenEmbedded, Part 2

In part one of this series I explained some of my motivations for this project. Now it’s time to start on implementing the project itself. At this point I’m going to assume the reader has basic familiarity with using OpenEmbedded.  Otherwise, the Yocto Project (YP) quickstart guide and OpenEmbedded getting started pages are useful to bring yourself up to speed.  I assume that you’ve followed these instructions on how to prepare your build host and gone so far as to have completed a build for some target previous to this.  In this guide, I’m going to do my best to follow common best practices. Whenever I deviate from best practices, I’ll explain why we want something a bit different. After all, best practices are supposed to be taken as guidelines, not absolutes. Finally, I’m going to be working against the Yocto Project thud code-name release as that’s what is current as of this writing.

The first thing I’m going to do in my project is create a build directory now so that I can easily get access to various tools.  The next thing I’m going to do is use those tools to make a layer to store our configuration in.

$ . oe-core/oe-init-build-env
You had no conf/local.conf file. This configuration file has therefore been
created for you with some default values. You may wish to edit it to, for
example, select a different MACHINE (target hardware). See conf/local.conf
for more information as common configuration options are commented.
...
You can also run generated qemu images with a command like 'runqemu qemux86'
$ bitbake-layers create-layer ../meta-local-soho NOTE: Starting bitbake server... Add your new layer with 'bitbake-layers add-layer ../meta-local-soho' $ bitbake-layers add-layer ../meta-local-soho NOTE: Starting bitbake server... $

Be sure to change oe-core to wherever the core layer was checked out.  Now that the layer has been created, let’s go ahead and start by putting it into git. Why? I am a firm believer in “commit early and commit often” as well as “cleanup and rebase once you’re done”. To me, one of the big selling points of git is that you can track your work incrementally, and when you notice unexpected breakage later on you can easily go back in time and locate the bugs.

$ cd ../meta-local-soho
$ git init .
$ git add *
$ git commit -s -m "Initial layer creation"
$ git branch -m thud

With the first commit in place, it’s time to start customizing the layer. We don’t need the example recipe, so let’s remove it.

$ git rm -r recipes-example/example
$ git commit -s -m "Remove example recipe"

Next, it’s time for a more substantive set of customizations. I’ll start by editing the README. Why? To start with, I’m going to list all of the layers I know of that are dependencies at this point. The README will also be a handy place to note which physical port is for WAN, which port(s) are for LAN, and the network devices that each port is associated with. For now, I add the following to the README:

  URI: git://git.openembedded.org/meta-openembedded
  branch: thud
  layers: meta-oe, meta-python, meta-networking, meta-filesystems

  URI: git://git.yoctoproject.org/meta-virtualization
  branch: thud

  URI: https://github.com/mendersoftware/meta-mender
  branch: thud

Now that I’ve documented these requirements, I’ll also enforce them in code. I edit conf/layer.conf and document these requirements too. The end of the file should look like:

LAYERDEPENDS_meta-local-soho = "core"
LAYERDEPENDS_meta-local-soho += "openembedded-layer meta-python"
LAYERDEPENDS_meta-local-soho += "networking-layer filesystems-layer"
LAYERDEPENDS_meta-local-soho += "virtualization-layer mender"
LAYERSERIES_COMPAT_meta-local-soho = "thud"

Since there are so many layers in use, I will make use of the TEMPLATECONF functionality so that the build directory will be populated correctly to start with. Next, I copy over meta/conf/bblayers.conf.sample, meta/conf/local.conf.sample and meta/conf/conf-notes.txt from the core layer over to the conf directory and commit them without change. Why? This will isolate my local edits later on and make my life easier next year when I decide it’s time to update to a current release. After I’ve committed those files, I the edit conf/bblayers.conf.sample and change it to look like:

BBLAYERS ?= " \
  ##OEROOT##/meta \
  ##OEROOT##/../meta-openembedded/meta-oe \
  ##OEROOT##/../meta-openembedded/meta-python \
  ##OEROOT##/../meta-openembedded/meta-networking \
  ##OEROOT##/../meta-openembedded/meta-filesystems \
  ##OEROOT##/../meta-virtualization \
  ##OEROOT##/../meta-mender/meta-mender-core \
  ##OEROOT##/../meta-local-soho \

Note that this assumes a directory structure where I’ve put all of the layers I will use in the same base directory. If I didn’t do that then I would need to adjust all of the paths to match how to get from the core layer to where the layers are stored. Now I want to use git add and commit all of these changes, so I can move on to the next step.

I will enable systemd next, and while there’s a number of places I could do this, including going so far as to create my own “distro” policy file, for this article I’ll just use conf/local.conf.sample to store these changes. While the core layer’s conf/local.conf.sample.extended has an example on switching to systemd, I’ll do it slightly differently. At the end of the conf/local.conf.sample insert the following, git add and then commit:

# Switch to systemd
DISTRO_FEATURES_append = " systemd"
VIRTUAL-RUNTIME_init_manager = "systemd"
VIRTUAL-RUNTIME_initscripts = ""
VIRTUAL-RUNTIME_syslog = ""
VIRTUAL-RUNTIME_login_manager = "shadow-base"
DISTRO_FEATURES_BACKFILL_CONSIDERED = "sysvinit"

This differs from the core example in a few ways. First, I blank out pulling in an initscripts compat package. While this can be useful, for these images it’s going to end up being redundant. Next, I blank out a syslog provider as I’ll be letting systemd handle all of the logging. Finally, while busybox can be available and provide the login manager, I’ll be using shadow-base instead. This particular change is something now done upstream and will be in later releases, so keep that in mind for the future.

The next functional chunk for conf/local.conf.sample is enabling support for virtualization through meta-virtualization. That layer has its own well documented README file, and taking the time to go over it is a great idea. In my configuration, I’m just going to enable a few things that give me the minimal functionality. So add, git add and git commit:

# Add virtualization support
DISTRO_FEATURES_append = " virtualization aufs kvm"

When talking about system security there are many aspects. One aspect is to harden the system as much as possible by having the compiler apply various build-time safety measures that turn certain classes of attack from “exploit the system” to “Denial of Service by crashing the application”, or even “this is wildly unsafe code, fail to build”. To enable those checks in the build, add, git add and git commit:

# Security flags
require conf/distro/include/security_flags.inc

There’s one last bit of functionality I want in the conf/local.conf.sample file, and that’s support for Mender. How do I go about that? That’s going to depend a bit on what hardware you’re going to do this project on, as it’s a little bit different for something like the APU2 than on an ARM platform. Fortunately, there’s great documentation on how to integrate Mender here. While reading over all of the information there is important, it’s best to focus on the Configuring the build section to understand all of the required variables. In fact, now is a good time to talk more about how I’m going to use Mender in this particular setup. Looking at the overall Mender documentation, it’s very flexible. For this very small deployment scenario, I’ll use standalone mode rather than managed mode. This lets me skip all of the things about setting up a host of other services to make upgrades happen automatically. So, we’ll follow the instructions for configuring Mender in standalone mode. The next thing to touch on is how to handle persistent data. One way to do this would be to make sure that anything which needs to be manually customized or that will be persistently changed at runtime is written somewhere under /data on the device rather than at its normal location. This is because the normal location is going to change when I apply a new update but /data will always be the same. For a lot of deployments, this idea works best, as there are likely many users, and beyond our initial configuration the end user will make changes I know nothing about. For my use case, I am the user, and making the system as stateless as possible will in turn make backing the system up as easy as possible. So instead I will be modifying recipes to include our local configuration when needed. Having an easy to access and modify backup of the various configuration files and so forth will make my life easier in the long run if, for example, something happens to the hardware and it needs to be replaced.

Now that I have everything configured, it’s time to create the build directory. This will let me take advantage of all of the configuration work I just did. At this point, I can go back to the shell, leave meta-local-soho and do this:

$ TEMPLATECONF=../meta-local-soho/conf . oe-core/oe-init-build-env build-router

Again, be sure to change oe-core to wherever the core layer was checked out. Running cat conf/bblayers.conf will show how the changes that were made were now expanded.

What to do next? Well, that depends a little on what I already know about the system. The next step in this project is to configure systemd to take care of creating the networks. If I knew what all of the devices will be named, I could move on to that step. But if I didn’t know, or wasn’t sure, the next step is to build core-image-minimal. That’s as easy as doing:

$ bitbake core-image-minimal

and waiting for the final result, assuming that local.conf.sample file was configured to default to the appropriate target MACHINE. Otherwise I’ll need to pass that in on the command line above. Once that completes, take the appropriate image and boot it. The reward should be a root login prompt.  In this configuration, there’s no root password right now. Login and do:

# ip link

and this will show me what the interface names are. Assuming there is a DHCP server somewhere already, I can then use udhcpc -i NAME after plugging in an Ethernet cable to see which interface is which and note it down for the next step.

Now that I have my network interface names, I can configure systemd to handle them. In my specific case I have 4 Ethernet ports, and coincidentally the one closest to the physical console port (enp1s0) is the one I wanted to call the WAN port. This lets me put the other 3 ports into a bridge for my LAN. Now, to configure these interfaces, I’m going to dump some files under /etc/systemd/network/, and I will use the base-files recipe to own these files, rather than systemd itself. Why? Whenever a change forces a rebuild of systemd that forces a rebuild of a lot of other packages, so this allows me to isolate that kind of build churn. Now I go to meta-local-soho and enter the following:

$ mkdir -p recipes-core/base-files/base-files
$ cd recipes-core/base-files/base-files

I’m going to create four different files. Note that all of the filenames are arbitrary and are meant to help poor humans figure out what file does what. If another naming scheme is helpful, it can be used just as easily. First I’ll take care of the WAN port by creating wan-ethernet.network with the following content:

# Take the eth port closest to the console port for WAN
[Match]
Name=enp1s0

[Network]
DHCP=yes

Next, I’ll create our bridge for the other ports and this is done with two files. First I need a lan-bridge.network with:

# Bridge the other 3 remaining ports into one.
[Match]
Name=enp2s0 enp3s0 enp4s0

[Network]
Bridge=br0

Second I create br0.netdev with:

[NetDev]
Name=br0
Kind=bridge

And now I have a bridge device that I can use to manage all of our LAN ethernet ports. Later I’ll even put the AP on this bridge for simplicity. Now that I have a bridge, I need to configure it, so create bridge-ethernet.network and add:

[Match]
Name=br0

[Network]
Address=192.168.0.1
IPForward=yes
IPMasquerade=yes
IPv6AcceptRA=false
IPv6PrefixDelegation=dhcpv6

[IPv6PrefixDelegation]
RouterLifetimeSec=1800

There’s a lot in there, but it can be broken down pretty easily. Working my way up from the bottom, the first portion is needed to have the IPv6 address on the WAN port pass along what’s needed to the LAN to have devices configure themselves. Since we still live in a world with IPv4, I need to enable masquerade and forwarding. Finally, I configure a static IP for the interface that matches to br0. I haven’t talked about configuring the LAN for IPv4 yet. This is going to be a bit more complex and while systemd supports a trivial DHCP server, I want to do more complex things like assign specific addresses, so that will come later.

Finally, it’s time to make use of these new files. I’ll head up a level and back to recipes-core/base-files and create base-file_%.bbappend with the following:

FILESEXTRAPATHS_prepend := ":${THISDIR}/${PN}"

SRC_URI += "file://wan-ethernet.network \
            file://lan-bridge.network \
            file://bridge-ethernet.network \
            file://br0.netdev \
" do_install_append() { # Add custom systemd conf network files install -d ${D}${sysconfdir}/systemd/network # Add custom systemd conf network files install -m 0644 ${WORKDIR}/*.network ${D}${sysconfdir}/systemd/network/ install -m 0644 ${WORKDIR}/*.netdev ${D}${sysconfdir}/systemd/network/ }

What this does is to look in that directory where I created those 4 files, add them to the recipe and then finally install them on the target where systemd will want them. At this point, I can git add the whole of recipes-core/base-files, and git commit.

There’s one last thing I want to configure right now. I’m going to have systemd handle things like turning on NAT, and doing some other basic iptables work. As a result, I need to enable that part of systemd. Another thing I’ll want to do here is work around a current systemd issue with respect to bridges. To do so, I need to make the directory recipes-core/systemd/ and create the file systemd_%.bbappend in that directory with the following:

do_install_append() {
    # There are problems with bridges and this service, see
    # https://github.com/systemd/systemd/issues/2154
    rm -f ${D}${sysconfdir}/systemd/system/network-online.target.wants/systemd-networkd-wait-online.service
}

PACKAGECONFIG_append = " iptc"

The first part of these changes is to work around the github issue mentioned in the comment. While it’s quite frustrating that the issue in question has been open since the end of 2015, I can easily work around it by just deleting the service for my use case. The second part is to add iptc to the PACKAGECONFIG options, and in turn that part of systemd will be enabled and it can support iptables so the IPMasquerade flag above will work.

At this point, I can now build core-image-minimal again and have a system that will automatically bring up the network. However, it is not a router yet. In the next part of the series I will create a new image that is intended to be a router and customize that.

[Go to Part Three of the series.]

Building a DIY SOHO router, Part 1

Building a DIY SOHO router using the Yocto Project build system OpenEmbedded, Part 1

I spend my days working on embedding Linux in devices where the end user doesn’t typically think about what’s running inside. As a result, I became motivated to embed Linux in a device that’s a little more visible, even if only to myself. To that end, in this series of articles I will discuss how to build your own SOHO router using the Yocto Project build system, OpenEmbedded.

I realize that many people may be asking the question, “Why build our own router when there are any number of SOHO routers available on the market that are specifically built using good Open Source technologies?”. Commonly, when this question is answered in other guides the major reasons for Roll Your Own (RYO) tend to be better performance and enhanced security. While these are both true, that’s not the primary motivation behind this series. While projects like pfSense and OpenWrt are wonderful for a “turn-key solution”, they don’t meet one of my primary requirements. My requirement is to use my favorite Linux distribution creation tools, and gain a stronger understanding of the inner working of these tools when building a system from the ground up. This is why I’ve chosen to build my router with OpenEmbedded.

What are the Yocto Project and OpenEmbedded? To quote from the former’s website:

The Yocto Project (YP) is an open source collaboration project that helps developers create custom Linux-based systems regardless of the hardware architecture.

The OpenEmbedded Project (OE) is the maintainer of the core metadata used to create highly flexible and customized Linux distributions and a member of the Yocto Project. As a long time developer and user of YP and OE, these projects have become my favorite tools for creating customized distributions. It’s also something that we frequently use and support commercially at Konsulko Group.

Let’s consider what software is required to meet the functional requirements of the SOHO router. My high-level requirements are:

  • Wired/Wireless networking – basic network connectivity
  • Firewall – necessary when a device is on the Internet
  • Wireless access point – required to support the end user’s many WiFi devices
  • Over-The-Air (OTA) software update – My initial software load will be improved and bug fixed in the same manner as any commercial product and requires a simple update path

OpenEmbedded provides all of these features as well as other advanced features I’d like to leverage such as container virtualization.

With the question of software features settled, the next issue is hardware selection. One of the major advantages of OpenEmbedded is that it runs on a wide range of processors and boards, so there are many options. Depending on one’s specific use case, one option is to use any x86-64 based system with two or more ethernet ports. For example, the SolidPC Q4 or the apu2 platform. Another direction would be to use an ARM CPU and look at the HummingBoard Pulse, NXP i.MX6UL EVK, or even the venerable Raspberry Pi 3 B, and making use of a USB ethernet adapter for secondary ethernet. If you have a specific piece of hardware in mind, so long as there’s Linux support for it, you can use it. I did pick something from the above list for my specific installation. However, this guide is intended to be hardware agnostic and so I will highlight any hardware-dependent portions along the way.

Now it’s necessary to further develop detailed requirements and expectations that we have for the router. First, the device is expected to be able to perform all of the standard duties of a modern SOHO router. It’s not just a WiFi access point and IPv4 NAT. It also must handle complicated firewall rules, and support public IPv6 configuration of the LAN (when the ISP supports IPv6). This is an advanced router, so it’s not enough to simply pass along the ISP-defined DNS servers. I may like to push my outbound traffic, with a few exceptions for streaming services, perhaps, through a VPN. OpenEmbedded, via various metadata collections stored in layers, supports all of these features with its core layer and a few of the main additional layers.

I’m selecting Mender for self-managed A/B style OTA upgrades. Why? I don’t have redundant routers, and I’d like to be able to both experiment with enabling new features and updates (such as a new Linux kernel version or other core component), firewall changes, and minimize downtime if anything goes wrong. Neither myself nor my users (my family) are tolerant of much in the way of Internet outages, so the ability to fall back to a known good state with just a reboot is a major feature. I want the router to be kept as current as possible, and with A/B style updates there’s much less state to maintain from release to release. As a result, the router installation will be made as stateless as possible. If a component parameter is configured by the router administration, it comes in the installation image. This will not only help with rollback, but also make it much easier to back up the router. Having the exact config file for something like dnsmasq is a lot more portable than an nvram export that’s tied to a specific hardware model, just in case of lightning strikes.

As a framework component for enabling some advanced router features, LXC will be included via the meta-virtualization layer to support containers. I realize that many people will be asking, “Why would anybody want containers on a router?”. The answer is, quite simply, that there are many good examples of router-appropriate software that’s well managed by containers rather than directly on the installation itself. In this guide, I use the example of pi-hole as an advanced router feature that I want to enable. A container allows us keep it current in the manner which the project itself recommends. The container also supports the goal of keeping a minimum amount of state on the device that must be backed up elsewhere. Could Docker have been used here instead? Potentially, yes. Due to some peculiarities of the IPv6 deployment by my ISP, LXC is much easier to deploy than Docker.

Finally, lets talk about security. For this first guide, what that means is that YP does its best to keep software up to date with respect to known security issues as well as making it easy to enable compiler-generated safeguards such as -fstack-protector-strong and string format protections. On top of that there are layers available which support various forms of owner-controlled measured boot and various Linux Security Models such as Integrity Measurement Architecture (IMA). As the former is hardware dependent we’re going to leave that to a follow-on series as the specific hardware I’m using lacks certain hardware components to make that useful. As for IMA, it can be difficult to combine that with containers so it too will be covered in another series.

[Go to Part Two of the series.]

2 new articles help engineers new to Automotive Grade Linux

Konsulko Group’s Leon Anavi has made a couple of recent contributions to the Automotive Linux (AGL) wiki:
* ConnMan (how to use the cli tool to connect to a network)
* Media Browser and Player (playing songs from a USB stick)

We hope that this technical information will help beginners to get started easier and faster with AGL. We encourage you to participate in the AGL wiki. Feel free to update and improve either or both articles.

Supporting Flame Graphs on production kernels

Background

Perf is an amazing tool for observing system performance in Linux. Using perf on production kernels can be filled with pitfalls, due to the rapid pace at which new features are being added. In my case, I support a production kernel team that expects every feature they read about on the web to work on their older production kernel. A good example of a downstream use case of perf is Brendan Gregg’s very nice Flame Graphs tool for visualizing frequently used code paths in a system.

Example mysql Flame Graph

Example mysql Flame Graph

Recording call frame information with perf

Generation of Flame Graphs depends on perf capturing call frames. As documented in the Flame Graph tools, one records perf data on a x86-64 system by enabling DWARF call graph support with a command line like:

$ perf record -F 99 -a --call-graph dwarf -- sleep 60

That, of course, produces the raw perf.data file. The call frames we need are there. However, we need to process this data with a reporting tool.

Problems generating Flame Graphs

Now we start running into the problem with our production kernel. In our case, we are on a 4.1 kernel. Users are happily running perf report, seeing the complete set of call frame information throughout the system components under observation. The interesting thing is that if we generate a Flame Graph using this same data, then the users no longer have visibility into the complete calling tree information. That is, the Flame Graph will simply show time spent in a given library. So what’s wrong? Let’s take a look at how Flame Graphs are generated:

$ perf script > out.perf
$ stackcollapse-perf.pl out.perf > out.folded
$ flamegraph.pl out.folded > out.svg

The key here is that we are no longer parsing the perf data using perf report, but rather using perf script to do the heavy lifting and feeding the result into the Flame Graph generation tools. Doing a bit of git detective work, we can see that perf report added callchain sampling all the way back in 3.18:

$ git describe --contains 0cdccac6fe4b1316f04f0dbfcc4efab51932014a
v3.18-rc1~8^2~2^2~6
$ git log -1 -p 0cdccac6fe4b1316f04f0dbfcc4efab51932014a
commit 0cdccac6fe4b1316f04f0dbfcc4efab51932014a
Author: Namhyung Kim <[email protected]>
Date:   Mon Oct 6 09:45:59 2014 +0900

    perf report: Set callchain_param.record_mode for future use

    Normally the callchain_param.record_mode is used only for record path.
    But as it might need to prepare something for dwarf unwinding, setup
    this info for perf report too.

    Signed-off-by: Namhyung Kim <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Cc: David Ahern <[email protected]>
    Cc: Frederic Weisbecker <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Jean Pihet <[email protected]>
    Cc: Jiri Olsa <[email protected]>
    Cc: Namhyung Kim <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2cfc4b93..140a6cd 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -257,6 +257,13 @@ static int report__setup_sample_type(struct report *rep)
                }
        }

+       if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) {
+               if ((sample_type & PERF_SAMPLE_REGS_USER) &&
+                   (sample_type & PERF_SAMPLE_STACK_USER))
+                       callchain_param.record_mode = CALLCHAIN_DWARF;
+               else
+                       callchain_param.record_mode = CALLCHAIN_FP;
+       }
        return 0;
 }

diff --git a/tools/perf/tests/dwarf-unwind.c b/tools/perf/tests/dwarf-unwind.c
index 96adb73..fc25e57 100644
--- a/tools/perf/tests/dwarf-unwind.c
+++ b/tools/perf/tests/dwarf-unwind.c
@@ -9,6 +9,7 @@
 #include "perf_regs.h"
 #include "map.h"
 #include "thread.h"
+#include "callchain.h"

 static int mmap_handler(struct perf_tool *tool __maybe_unused,
                        union perf_event *event,
@@ -120,6 +121,8 @@ int test__dwarf_unwind(void)
                return -1;
        }

+       callchain_param.record_mode = CALLCHAIN_DWARF;
+
        if (init_live_machine(machine)) {
                pr_err("Could not init machinen");
                goto out;

Making Flame Graphs work with our kernel

Knowing that this worked on newer versions of perf in at least the 4.6 kernel, we were then able to spot that it wasn’t until 4.3 that perf script gained callchain support. Notice the addition of the analogous code to what was already in perf report:

$ git describe --contains 7322d6c98dd214252bd697f8dde64a3576977fab
v4.3-rc1~138^2~5^2~10
$ git log -1 -p 7322d6c98dd214252bd697f8dde64a3576977fab
commit 7322d6c98dd214252bd697f8dde64a3576977fab
Author: Jiri Olsa <[email protected]>
Date:   Thu Aug 13 09:17:24 2015 +0200

    perf script: Initialize callchain_param.record_mode

    Milian Wolff reported non functional DWARF unwind under perf script. The
    reason is that perf script does not properly configure
    callchain_param.record_mode, which is needed by unwind code.

    Stealing the code from report and leaving the place for more
    initialization code in a hope we could merge it with
    report__setup_sample_type one day.

    Reported-by: Milian Wolff <[email protected]>
    Signed-off-by: Jiri Olsa <[email protected]>
    Tested-by: Milian Wolff <[email protected]>
    Cc: David Ahern <[email protected]>
    Cc: Namhyung Kim <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7b376d2..105332e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1561,6 +1561,22 @@ static int have_cmd(int argc, const char **argv)
        return 0;
 }

+static void script__setup_sample_type(struct perf_script *script)
+{
+       struct perf_session *session = script->session;
+       u64 sample_type = perf_evlist__combined_sample_type(session->evlist);
+
+       if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) {
+               if ((sample_type & PERF_SAMPLE_REGS_USER) &&
+                   (sample_type & PERF_SAMPLE_STACK_USER))
+                       callchain_param.record_mode = CALLCHAIN_DWARF;
+               else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+                       callchain_param.record_mode = CALLCHAIN_LBR;
+               else
+                       callchain_param.record_mode = CALLCHAIN_FP;
+       }
+}
+
 int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 {
        bool show_full_info = false;
@@ -1849,6 +1865,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
                goto out_delete;

        script.session = session;
+       script__setup_sample_type(&script);

        session->itrace_synth_opts = &itrace_synth_opts;

By backporting this support from the 4.3 version of perf, we were able to support generation of Flame Graphs with our 4.1 production kernel tooling.

Conclusion

The moral of the story is: don’t count on well publicized perf features working on your older kernel. It is just as important to backport updates to the userspace perf tools as it is to backport updates for the production kernel itself.

Git workflow for upstreaming patches from a vendor kernel

Background

Upstreaming patches from a vendor kernel is a constant process of trying to get on board a train, falling behind, and hopping back on again.

Typically, a vendor kernel is based on an older version of the kernel. As a result, one has to forward port a series of patches against the latest kernel mainline. This is seldom a pain free affair, since the patches may not apply without manual edits. For example, the kernel interfaces may have changed, patches merged into mainline could have conflicting changes, and so on. On top of all of this, you have the reality that this task will need to be performed a number of times while you track mainline.

Basics

I usually perform cherry-picking of each patch on top of the mainline branch from the vendor branch, which is usually based on some old stable point release. Let’s assume that there are tags pointing to the vendor and stable commits, this reports the number of patches on top of that stable release.

panto@dev:~/linux (master)$ git describe stable
v4.1.15

panto@dev:~/linux (master)$ git describe vendor
v4.1.15-926-gea8c225

In this case, the patches are against stable v4.1.15 and there are 926 patches
on top of it; the format of the describe label is <tag>-<#-of-patches>-g<commit>

Workflow

First we get our needed information on the master branch.

panto@dev:~/linux (master)$ git describe master
v4.8-rc8-13-g53061af

Our master is today’s mainline kernel (v4.8-rc8) with just 13 patches on top of it. The problem with cherry-picking and manual editing is that when you edit the patch the commit id changes since the contents of the patch changes. We need a way to have a list of patches to cherry-pick, iteratively apply them, and manually fix any problems.

panto@dev:~/linux (master)$ git checkout --track -b work master
Checking out files: 100% (33262/33262), done.
Branch work set up to track local branch master.
Switched to a new branch 'work'

panto@dev:~/linux (work)$ git log --reverse --oneline stable..vendor >patchlist.txt

We create file with a list of the commits we want to apply on the work branch.

panto@dev:~/linux (work)$ cat patchlist.txt | head -n 2
ff24250 ppc: Make number of GPIO pins configurable
e4d443c pci/pciehp: Allow polling/irq mode to be decided on a per-port basis

This is the standard oneline format of git log (in reverse since we want the list to be in chronological order). If we were to do this manually, we’d have to do it like this:

panto@dev:~/linux (work)$ git cherry-pick ff24250 

If the cherry-pick is successful, we can proceed with the next one and so on. Otherwise, we have to manually fix it and issue git cherry-pick –continue Why not automate this by picking out the commit from the list and work iteratively? We can’t simply use commit IDs because the commit ID changes after every edit. The following cherry-pick-list.sh script does the heavy lifting of picking out the commits for us. Given the patchlist file, it will git cherry-pick each commit in the list. However, it will skip the already applied commits which match the description. It does not consider commit IDs since those might have changed.

#!/bin/bash

# get top
top=`git log --oneline HEAD^..HEAD | head -n1`
ctop=`echo ${top} | cut -d' ' -f1`
dtop=`echo ${top} | cut -d' ' -f2-`
l="$ctop $dtop"
if [ "$l" != "$top" ] ; then
        echo "Reconstructed top failure"
        echo $top
        echo $l
        exit 5
fi

# get list of commits and descriptions
old_IFS=${IFS}
IFS=$'n'

j=0
for i in `grep -v '^#' $1`; do
        c[${j}]=`echo ${i} | cut -d' ' -f1`
        d[${j}]=`echo ${i} | cut -d' ' -f2-`
        l="${c[${j}]} ${d[${j}]}"
        if [ $l != $i ] ; then
                echo "Reconstructed changeset failure $i"
                exit 5
        fi
        ((j++))
done
last=$((j - 1))
IFS=${old_IFS}

# skip over patches that are applied (checking description only)
match=0
for i in `seq 0 $last`; do
        ct=${c[${i}]}
        dt=${d[${i}]}
        if [ "${dt}" == "${dtop}" ]; then
                echo "Match found at $i: $dt"
                match=$(($i + 1))
                break;
        fi
        # echo "$i: $ct $dt"
done

for i in `seq $match $last`; do
        ct=${c[${i}]}
        dt=${d[${i}]}
        echo "cherry-picking: $i: $ct $dt"
        git cherry-pick $ct
        if [ $? -ne 0 ] ; then
                exit 5;
        fi
done

It makes sense to work on a simplified example using the kernel’s README file.

panto@dev:~/linux (work)$ git checkout --track -b foo master
Switched to branch 'foo'
Your branch is up-to-date with 'master'.

Edit the README file resulting to the following patch:

panto@dev:~/linux (foo)$ git diff
diff --git a/README b/README
index a24ec89..947fe6c 100644
--- a/README
+++ b/README
@@ -6,6 +6,8 @@ kernel, and what to do if something goes wrong.

WHAT IS LINUX?

+  foo
+
Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across
the Net. It aims towards POSIX and Single UNIX Specification compliance.

Commit the change:

panto@dev:~/linux (foo)$ git commit -m 'foo description'
[foo 4b5f122b] foo description
1 file changed, 2 insertions(+)

List the patches on top of master in sequence:

panto@dev:~/linux (foo)$ git log --oneline --reverse master..foo
4b5f122b foo description

Let’s create a new bar branch:

panto@dev:~/linux (foo)$ git checkout --track -b bar master
Switched to branch 'bar'
Your branch is up-to-date with 'master'.

Edit the README file resulting to the following patch:

panto@dev:~/linux (bar)$ git diff
diff --git a/README b/README
index a24ec89..4e7043c 100644
--- a/README
+++ b/README
@@ -6,6 +6,8 @@ kernel, and what to do if something goes wrong.

WHAT IS LINUX?

+  bar
+
Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across
the Net. It aims towards POSIX and Single UNIX Specification compliance.

Note that this conflicts with the foo patch, we will need to manually fix it later.

Commit the change:

panto@dev:~/linux (bar)$ git commit -m 'bar description'
[bar aba1679] bar description
 1 file changed, 2 insertions(+)

Make another commit that is conflict free:

panto@dev:~/linux (bar)$ git diff

diff --git a/README b/README
index 2788bfc..fbdf488 100644 
--- a/README
+++ b/README 
@@ -412,3 +412,4 @@ IF SOMETHING GOES WRONG:
gdb'ing a non-running kernel currently fails because gdb (wrongly)
disregards the starting offset for which the kernel is compiled.

+   more bar

Switch back to the foo branch to apply the changes in the bar branch.

panto@dev:~/linux (bar)$ git checkout foo
Switched to branch 'foo'
Your branch is ahead of 'master' by 1 commit.
  (use "git push" to publish your local commits)

Generate the patchlist file:

panto@dev:~/linux (foo)$ git log --oneline --reverse master..bar | tee patchlist.txt
22d7ac6 bar description
2be1bbb more bar description

Apply them using the script:

panto@dev:~/linux (foo)$ cherry-pick-list.sh patchlist.txt  
cherry-picking: 0: 22d7ac6 bar description
error: could not apply 22d7ac6... bar description
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
Recorded preimage for 'README'

panto@dev:~/linux (foo)$ git diff
diff --cc README
index 947fe6c,4e7043c..0000000
--- a/README
+++ b/README
@@@ -6,7 -6,7 +6,11 @@@ kernel, and what to do if something goe

WHAT IS LINUX?

++<<<<<<< HEAD
+  foo
++=======
+   bar
++>>>>>>> 22d7ac6... bar description

Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across

Edit and fix it to look like this:

panto@dev:~/linux (foo)$ git diff
diff --cc README
index 947fe6c,4e7043c..0000000
--- a/README
+++ b/README
@@@ -6,7 -6,7 +6,8 @@@ kernel, and what to do if something goe

WHAT IS LINUX?

+  foo
+   bar

Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across

panto@dev:~/linux (foo)$ git add README

Edit the commit message (leaving the conflict or removing the Conflicts: tag)

panto@dev:~/linux (foo)$ git cherry-pick continue
Recorded resolution for 'README'.
[foo cc9dc34] bar description
1 file changed, 1 insertion(+)

Note the Recorded resolution line. Next time we will perform the same operation so we don’t have to repeat the manual fix. Run the script again to pick up the rest of the patchlist.

panto@dev:~/linux (foo)$ git cherry-pick continue
Match found at 0: bar description
cherry-picking: 1: 2be1bbb more bar description
[foo 63c1973] more bar description
 1 file changed, 1 insertion(+)

Note the message Match found at 0:. The script picked up that the first commit has been applied (albeit manually edited) and continued with the rest, which apply without problems. List the patches on top of master on the foo branch. Note that the commit ids of the patch sequence have changed.

panto@dev:~/linux (foo)$ git log --oneline --reverse master..
4b5f122b foo description 
cc9dc34 bar description
63c1973 more bar description

Now if we reset the foo branch back to the starting point:

panto@dev:~/linux (foo)$ git reset --hard HEAD^^
HEAD is now at 4b5f122b foo description

Try to apply the patchlist again to see what happens:

panto@dev:~/linux (foo)$ cherry-pick-list.sh patchlist.txt
cherry-picking: 0: 22d7ac6 bar description
error: could not apply 22d7ac6... bar description
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit' 
Resolved 'README' using previous resolution.

Note the Resolved ‘README’ using previous resolution. This means that git determined that we are trying to perform the same edit and already made the change for us. Of course, it didn’t commit the change so that we have a chance to verify that it is correct.

panto@dev:~/linux (foo)$ diff --cc README
index 947fe6c,4e7043c..0000000
--- a/README
+++ b/README
@@@ -6,7 -6,7 +6,8 @@@ kernel, and what to do if something goe

WHAT IS LINUX?

+  foo
+   bar

Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across

Just add the changed file as earlier:

panto@dev:~/linux (foo)$ git add README 
panto@hp800z:~/juniper/linux-medatom.git (foo)$ git cherry-pick --continue
[foo 2586dba] bar description
 1 file changed, 1 insertion(+)

Use the script again and end up at the same result:

panto@dev:~/linux (foo)$ ./cherry-pick-list.sh patchlist.txt 
Match found at 0: bar description
cherry-picking: 1: 2be1bbb more bar description
[foo 6641dd4] more bar description
 1 file changed, 1 insertion(+)

I’m not a regular git guru but I found out that this small script ended up saving me a large amount of repetitive work. I hope someone else will find this useful.