Modern Linux service isolation

Introduction

Application security in Linux has changed a lot over the last few years. More fine-grained restrictions became available and there’s more attention to locking down unnecessary access. While not many of the options described later are completely new, most of them got much easier to activate. Many effective restrictions don’t require creating a full system SELinux profile anymore. And that’s progress.

Here are a few simple changes that you can apply to services and that can make a big difference to the system overall.

Typical automated behaviour

For an effective defence of a generic service, we first need to identify what we are defending against. Specifically, what do exploits actually do, either to get control of the system or afterwards? Some ideas can be taken from a list of exploit payloads - for example those published in Metasploit. The most common operations are:

  1. Connect out to download another payload.
  2. Listen for connection.
  3. Execute another application.
  4. Bind shell to the current connection.

Additionally, exploits trying to obtain elevated local privileges may try to:

  5. Add/modify files read by other services.
  6. Execute local applications giving admin privileges.

Let’s look at a few ways to limit or eliminate those actions. This should effectively disable most of the automated (not targeted) attacks.

Separate users

Running services as a separate user is a very common protection for a number of reasons. First, it ensures that no other local applications can be affected by the service you’re trying to protect: it can’t access other processes’ memory, send them signals, etc. It also helps avoid creating files that are accidentally shared between unrelated services.

Sometimes you could be tempted to keep root privileges in your application because of some common requirement like opening low-numbered ports or running helper commands which require an admin account. This is almost never necessary and the workarounds usually fall into one of the following categories:

  • do the operation as a remote procedure call to another service (IPC, or network)
  • configure a temporary privilege elevation system like sudo to allow just the required operations, only for the specific user
  • receive resources like sockets from another privileged service at startup
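
As a rough illustration of the sudo option, a rule limited to a single helper command could look like the sketch below. The helper path /usr/local/sbin/reload-proxy is a made-up example, and service_user is simply the account name used in the firewall rules later on. The rule is staged in a temporary file so it can be checked before it goes anywhere near /etc/sudoers.d/.

```shell
# Sketch only: stage a sudoers drop-in that lets service_user run ONE helper
# as root and nothing else. The helper path is a hypothetical example.
staged=$(mktemp)
cat > "$staged" <<'EOF'
# No password prompt, but only for this exact command.
service_user ALL=(root) NOPASSWD: /usr/local/sbin/reload-proxy
EOF
chmod 0440 "$staged"

# Next steps, left to the admin:
#   visudo -cf "$staged"        # syntax check
#   install the file as /etc/sudoers.d/service_user
echo "staged sudoers rule in $staged"
```

Listing the full path of one command, rather than a directory or ALL, is what keeps the elevation narrow.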

Having a dedicated user also makes implementing other protections easier, since there’s a clear way to match the right process.

Services don’t usually talk to the internet

Allowing only the expected inbound connections is pretty much standard these days; however, outbound connections are still very rarely controlled. Fortunately, it’s not hard to do, since iptables can filter on the process owner and, together with ipset, on a list of destinations. If your service needs access to (for example) a local MySQL at 10.1.2.3 and some remote service, you can do:

ipset create service_out hash:ip,port
ipset add service_out 10.1.2.3,3306
ipset add service_out '[endpoint.hostname],443'

iptables -A OUTPUT -m owner --uid-owner service_user \
    -m set --match-set service_out dst,dst \
    -p tcp -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner service_user \
    -p tcp -j REJECT

Of course, since the addresses behind a hostname can change over time, it’s important to verify with the service provider that this is safe to do. If the service is very dynamic, then an enforced, filtering web proxy may be a better option, or a script which queries the DNS and refreshes the table periodically.
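
A periodic refresh could be sketched roughly as follows. The helper names run and refresh_set, the temporary set name and the hostname endpoint.example.com are all assumptions made for this example. The idea is to fill a second set with the current addresses and swap it in, so the rules never see a half-built list.

```shell
# Sketch: rebuild the destination set from fresh DNS answers, then swap it in.
# Set DRY_RUN=1 to print the ipset commands instead of running them.
DRY_RUN="${DRY_RUN:-}"
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# refresh_set SET PORT: reads one IPv4 address per line on stdin.
refresh_set() {
    set_name=$1
    port=$2
    tmp="${set_name}_new"
    run ipset create "$tmp" hash:ip,port -exist
    run ipset flush "$tmp"
    # Static entries have to be re-added, since the swap replaces everything.
    run ipset add "$tmp" 10.1.2.3,3306
    while read -r ip; do
        [ -n "$ip" ] && run ipset add "$tmp" "$ip,$port" -exist
    done
    run ipset swap "$tmp" "$set_name"
    run ipset destroy "$tmp"
}

# Example use as root from cron or a timer (hostname is a placeholder):
#   getent ahostsv4 endpoint.example.com | awk '{print $1}' | sort -u \
#       | refresh_set service_out 443
```

The swap assumes that service_out already exists with the same type, as created in the commands above.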

Restricting both the incoming and outgoing traffic this way allows us to get rid of a large number of automated attackers. Since attackers often either download extra software after a successful break-in, or require another inbound connection for the second stage, blocking both stops such attacks before the second stage. Additionally, logging the packets before rejecting them provides real-time reporting that something bad happened.
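
One way to add the logging is to put a LOG rule between the ACCEPT and REJECT rules. The sketch below writes the three rules in iptables-restore format into a temporary file for review; the log prefix and the rate limit are arbitrary choices, not something the setup above requires. It is meant as a replacement for the two iptables commands shown earlier, not an addition to them.

```shell
# Sketch: ACCEPT known destinations, LOG everything else, then REJECT it.
# The LOG rule is rate limited so a noisy process cannot flood the logs.
rules=$(mktemp)
cat > "$rules" <<'EOF'
*filter
-A OUTPUT -m owner --uid-owner service_user -p tcp -m set --match-set service_out dst,dst -j ACCEPT
-A OUTPUT -m owner --uid-owner service_user -p tcp -m limit --limit 6/minute -j LOG --log-prefix "service_out denied: " --log-uid
-A OUTPUT -m owner --uid-owner service_user -p tcp -j REJECT
COMMIT
EOF
echo "review $rules, then load it with: iptables-restore --noflush < $rules"
```

Any line with that prefix in the kernel log then means the service tried to reach a destination outside the list.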

Separate file owner and service user

Once there’s a separate user dedicated to running the service, it’s important to separate the file and process ownership. It’s very common for scripted malware, especially malware attacking popular PHP frameworks or CMSes like WordPress, to inject new code into the existing scripts. There’s almost never any reason to allow this.

Things get a bit more complicated when the framework expects this behaviour. For example Symfony expects to create and execute cached PHP files from its directory. Usually, the issue can be avoided by warming up the cache and then locking down the access rights. If no further modifications are expected, this is a safer solution than granting extra permissions as described in their deployment documentation.
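
The locking-down step might look something like this sketch. The function name lock_down is made up here, and it assumes the service user is a member of the given group but is not the owner, so the service can read its code without being able to change it.

```shell
# lock_down DIR OWNER GROUP
# Sketch: hand the code tree to a deploy account and leave the service's
# group with read-only access.
lock_down() {
    dir=$1
    owner=$2
    group=$3
    chown -R "$owner:$group" "$dir"
    # Owner may write; group may read and enter directories; others get nothing.
    chmod -R u=rwX,g=rX,o= "$dir"
}

# Example for a Symfony application (path and account names are placeholders):
#   php bin/console cache:warmup --env=prod
#   lock_down /var/www/app deploy_user service_group
```

Anything the application still has to write at runtime, such as logs or uploads, needs its own writable location outside the locked tree.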

No new privileges, ever

Now it’s time for a more recent protection - from 2012 and Linux 3.5, but still not widely used. Processes can declare that they do not need any new privileges and almost all requests for them should be denied. This means that even executing suid binaries is locked - the application itself will be run, but only with the current service privileges. Applications can severely limit the scope for exploits this way. Even if there exists an exploit for your application which can execute arbitrary local commands, it cannot gain elevated privileges this way.

Usually this protection is activated by calling prctl with the PR_SET_NO_NEW_PRIVS option, but there are easier ways too. If your application is started by systemd for example, this restriction can be activated by adding the following option to the unit:

[Service]
NoNewPrivileges=true

This option doesn’t force the process to drop any privileges and capabilities it already has, so if you’re currently using specific capabilities, check the systemd documentation for how those options interact.
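
For services that are not started by systemd, the same flag can be set from a wrapper script. The sketch below assumes that setpriv from util-linux is installed; run_nnp is just a name picked for this example.

```shell
# run_nnp CMD...: start a command with the no_new_privs flag already set.
# Assumes setpriv (util-linux) is available on the system.
run_nnp() {
    setpriv --no-new-privs "$@"
}

# The kernel reports the flag in the NoNewPrivs field of /proc/<pid>/status
# (Linux 4.10 and later), so this should show a value of 1:
#   run_nnp grep NoNewPrivs /proc/self/status
```

The flag is inherited by every child process and cannot be cleared again, which is what makes it useful as a one-line wrapper.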

In most cases, this change gets rid of issue 6 on the list (executing local applications that give admin privileges) - no more elevated privileges.

Limited writable paths

Many local exploits depend on races of file creation - very often temporary files. This is often an interesting attack vector, because even with separate users and groups, many services use a common directory for writing their temporary files. Most of the time that behaviour is unnecessary. What you want to achieve is to write into a temporary location, not the shared temporary location.

Filesystem namespaces help here because you can give each service its own /tmp location. This is done pretty much by default for all services running in popular container environments - they get a completely separate filesystem. But standard services can be partially restricted as well. You can either do this by creating the right namespace yourself or if you’re starting from systemd, you can add the following line to your configuration:

[Service]
PrivateTmp=yes
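
Creating the namespace yourself could look roughly like the sketch below. It needs root (or CAP_SYS_ADMIN) and a reasonably recent util-linux, where unshare keeps mounts made in the new namespace private by default. The function name with_private_tmp is an assumption for this example, and unlike PrivateTmp it only covers /tmp.

```shell
# with_private_tmp CMD...: run a command in its own mount namespace with an
# empty tmpfs mounted over /tmp. Other processes keep seeing the shared /tmp.
with_private_tmp() {
    unshare --mount -- sh -c '
        mount -t tmpfs -o mode=1777,nosuid,nodev tmpfs /tmp || exit 1
        exec "$@"
    ' sh "$@"
}

# Example (as root): the listing should be empty, whatever the real /tmp holds.
#   with_private_tmp ls -A /tmp
```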

For similar reasons restricting writes to other parts of the filesystem may be beneficial. No service should ever surprise you with where it tries to write its files. If it does, it should be stopped. Again, filesystem namespaces can help here. You can set the right mapping and restrictions yourself via unshare and mount. Alternatively, in systemd you can set the list of writable paths using:

ReadWritePaths=...
ReadOnlyPaths=...
CapabilityBoundingSet=~CAP_SYS_ADMIN
SystemCallFilter=~@mount

The last two options ensure that no further modifications to the mounts can be made after the process is started. Putting those together, we can get rid of issue 5 (adding or modifying files read by other services).

Limited memory mapping

Just as with restricting files, restricting memory access can be very helpful. While most distributions will make sure that applications have a non-executable stack area and stop a lot of trivial issues that way, recent systems go further and try to stop all situations where a memory region is both writable and executable. This may be enforced in processes by creating a seccomp filter which prevents marking memory as writable and executable at the same time. While it can be done manually, systemd again provides a simple flag for it:

MemoryDenyWriteExecute=yes

Unfortunately, not every application will be compatible with this option. Many dynamic runtimes and JIT-compiled languages will break. Still - whenever possible, it’s a good safety net against heap-based exploits.

A similar protection (and many more) is available from the kernels with grsecurity patches. These are a great idea on their own, but require extra time for the maintenance of your custom kernel updates.

Resource limits

Let’s look at some very old settings for a change, but ones that are still not very common in production. I’m talking about resource limits and other related process settings. You can see the list of available limits in the getrlimit/setrlimit man pages. They can catch weird behaviour which may be an attempt to exploit some system facility, or simply your service breaking in an unusual way. Either way, they’re a very nice and simple safety net.

For example, your service usually won’t try to spawn hundreds of new processes or threads; a limit on RLIMIT_NPROC stops that (and also protects you from fork bombs). Your service is also unlikely to write huge files, so setting an RLIMIT_FSIZE limit could be useful. Check out the man pages for all the details.
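
From a start script, the file size limit can be applied with the shell’s ulimit builtin, as in the sketch below. The function name run_limited and the tiny limit are only for illustration. The flag for the process-count limit differs between shells, so the sketch sticks to the file size limit.

```shell
# run_limited CMD...: run a command with a cap on the size of any file it
# writes. The function body is a subshell, so the caller keeps its own limits.
run_limited() (
    # RLIMIT_FSIZE in blocks: 512 bytes each in POSIX sh, 1024 in bash.
    ulimit -f 8
    exec "$@"
)

# Example: the write is cut off once the file reaches the limit, and the
# command fails instead of filling the disk.
#   run_limited dd if=/dev/zero of=/tmp/big bs=1024 count=100
```

A real service would get a limit a few times larger than the biggest file it is ever expected to produce.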

Special protection via LSM

Finally, there are many more restrictions provided by the LSM framework. This means either AppArmor or SELinux for the common Linux distributions. This topic is too long to describe here properly, but Linux Security Modules allow checking each interaction between your app and the system. Each action like opening a file, or sending a message to another thread, or executing a new process can be captured and either approved or denied. This allows a much finer control than the coarse options described before.

The choice of the implementation will likely depend on your system. Red Hat-like systems usually integrate SELinux, while Debian-like systems usually go with AppArmor. SELinux tries to handle the protection of the whole system, while AppArmor allows process-by-process profiles. This is a big simplification, so make sure to read more about them before making a choice.
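
To see what a given machine already has enabled, the kernel’s own list can be read directly. This small sketch assumes securityfs is mounted at /sys/kernel/security, which is common but not guaranteed; active_lsms is a name chosen for this example.

```shell
# active_lsms: print the LSMs enabled in the running kernel, one per line.
active_lsms() {
    lsm_file=/sys/kernel/security/lsm
    if [ -r "$lsm_file" ]; then
        tr ',' '\n' < "$lsm_file"
        echo
    else
        echo "unknown (cannot read $lsm_file)"
    fi
}
active_lsms
```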

Summary

There are a lot of really simple modifications that can be applied to your services to prevent common attack mechanisms. They are more likely to stop automated exploitation than attackers specifically targeting you, but they still provide a reasonable defence. On modern systems they can be activated with simple init system flags or short scripts, with next to no performance impact.
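
Put together, the systemd options mentioned in this article could end up in a single drop-in along the lines of the sketch below. The path /var/lib/myservice and the Limit values are placeholders, and LimitNPROC and LimitFSIZE are the systemd equivalents of the resource limits discussed above rather than something shown earlier. The file is staged in a temporary location for review.

```shell
# Sketch: one drop-in collecting the hardening options from the article.
dropin=$(mktemp)
cat > "$dropin" <<'EOF'
[Service]
User=service_user
NoNewPrivileges=true
PrivateTmp=yes
# Everything read-only, except the one directory the service writes to.
ReadOnlyPaths=/
ReadWritePaths=/var/lib/myservice
CapabilityBoundingSet=~CAP_SYS_ADMIN
SystemCallFilter=~@mount
MemoryDenyWriteExecute=yes
LimitNPROC=64
LimitFSIZE=1G
EOF
echo "staged drop-in in $dropin"
```

Installed as a .conf file in the service’s drop-in directory under /etc/systemd/system/ and followed by a systemctl daemon-reload and a restart, it can be tightened or loosened one line at a time while watching for breakage.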

While a lot of those protections are packaged together into the concept of Linux containers, it’s worth remembering that you can pick and choose. You don’t have to opt into purely container-based deployment to get most of their security-related features.
