Cray Aprun/Apinit Privilege Escalation

MWR have identified a vulnerability which allows users to escalate their privileges to root on Cray supercomputers. This advisory details the vulnerability and the patches which Cray customers can apply in order to mitigate this issue.

Description

Apinit and aprun are utilities used to schedule tasks on Cray supercomputers. Apinit runs as a service on compute nodes, and aprun is used to communicate with these nodes.

The apinit service does not adequately validate the messages that aprun sends to it. Users of Cray systems can exploit this weakness to execute commands on the compute nodes of a Cray supercomputer as arbitrary users, including root (UID 0).

Impact

Successful exploitation allows code execution as root on a compute node.

Cause

The vulnerability is caused by a failure to appropriately validate the content of launch messages sent from the aprun utility.

Interim Workaround

N/A. Cray have provided patches for this issue.

Solution

Cray have addressed this issue in CLE 5.1.UP00 and CLE 4.2.UP02. Applying these updates will mitigate this issue. The Cray ID for this issue is FN5912.

Technical Details

On Cray supercomputers, the aprun command provides an interface for users to submit jobs for execution on compute nodes. An example of this is as follows:

gibson$ aprun <command>

When aprun is executed, it receives a placement list from the Application Level Placement Scheduler (ALPS) detailing the compute nodes available for execution of the job. On receiving this list, aprun sends a launch message to the apinit daemon running on the first compute node in the list. The launch message contains various pieces of information, including the user ID (UID) under which the job will be executed. However, it was found that apinit did not validate the UID contained in this message against the trusted UID received over the privileged alpsauth connection. As a result, when apinit forks its child process (referred to as the apshepherd or just shepherd process) to launch and manage the application, the application is run under the UID specified in the launch message.
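The missing check can be illustrated with a minimal sketch. This is not Cray's code or the actual patch; the names LaunchMessage, LaunchRejected and validate_launch are hypothetical, and trusted_uid simply stands in for the UID obtained over the privileged connection described above.

```python
from dataclasses import dataclass


class LaunchRejected(Exception):
    """Raised when a launch message fails validation (hypothetical name)."""


@dataclass(frozen=True)
class LaunchMessage:
    # Hypothetical subset of the fields a launch message might carry.
    uid: int
    command: str


def validate_launch(msg: LaunchMessage, trusted_uid: int) -> int:
    """Return the UID to run the job under, or raise LaunchRejected.

    trusted_uid stands in for the UID learned over the privileged,
    authenticated connection; msg.uid is whatever the client claimed.
    """
    if msg.uid != trusted_uid:
        raise LaunchRejected(
            f"launch message claims UID {msg.uid}, "
            f"authenticated UID is {trusted_uid}"
        )
    # Use the trusted value, never the client-supplied one.
    return trusted_uid
```

In this sketch, a message whose claimed UID differs from the authenticated UID is rejected before any child process would be forked, and the value returned is always the trusted one.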

The UID within the launch message is determined by a call to getuid() made inside the aprun process, and is therefore controllable by the calling user. For example, an attacker could patch the return value of this call at runtime as aprun executes. Any user of the system can perform this attack to escalate privileges to any other system user.
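A toy example shows why a UID field filled in by the client cannot be trusted. The wire layout here (a 4-byte UID, a 2-byte length, then the command) is an assumption made purely for illustration and is not the real ALPS message format; pack_launch and unpack_launch are hypothetical helpers.

```python
import struct

# Hypothetical wire layout: 4-byte unsigned UID, 2-byte command length,
# then the command bytes. This is NOT the real ALPS launch message format.
HEADER = struct.Struct("!IH")


def pack_launch(uid: int, command: str) -> bytes:
    """Build a toy launch message carrying a client-chosen UID."""
    body = command.encode("utf-8")
    return HEADER.pack(uid, len(body)) + body


def unpack_launch(data: bytes) -> tuple[int, str]:
    """Parse a toy launch message into (claimed UID, command)."""
    uid, length = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + length]
    return uid, body.decode("utf-8")


# Both messages are equally well-formed; the parser has no way to tell
# that the second one carries a UID the sender is not entitled to.
honest = pack_launch(1000, "hostname")
forged = pack_launch(0, "hostname")
```

Both messages parse cleanly, so the receiver can only detect the forged one by comparing the claimed UID with a value obtained through a channel the client does not control.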

Timeline

2013-07-19	Issue reported to Cray
2013-07-19	Acknowledgement by Cray and further details provided
2013-07-20	Issue corrected and testing underway
2013-07-25	Testing completed, patch distributed to Cray customers