Sectegrity - Allow No Harm


Sectegrity Corporation

Securing AWS from a compromised Lambda Function
by Adam Goode

We’ve done a great job locking down OUR code, sanitizing inputs/outputs, using safe functions instead of shelled system calls, parametrizing all our SQL queries, privilege isolating our DB reads and writes, and so on... However, buried in one of the external libraries we use in our Lambda package is a vulnerability which allows for the execution of arbitrary system commands (I know, I should have read my article “Don’t be afraid to re-invent the wheel” regarding external libraries).

Unfortunately, the vulnerability has been found and exploited by an attacker. Here's how we limit the blast radius of this type of event.

Accept that which we cannot change...

There are certain things we will not be able to stop our attacker from doing:

a) Accessing files in our application package

b) Accessing environment variables

c) Accessing temporary files in /tmp

d) Adding/Changing/Removing files in /tmp

e) Executing system libraries/interpreters/commands for which our application has privileges

Don't live in a barn...

Unless there is a strong business driver for our Lambda to have outbound Internet access, we need to close that door. Without direct outbound Internet access, our attacker will need to find a different method (e.g., public S3 buckets, SES, SNS, etc.) to gauge progress, exfiltrate our Lambda's package/environment/temp files, or bring in additional toolsets. The cleanest way to do this is to put the function in a VPC subnet that does not have an external route through a NAT gateway. For functions that have no dependencies on VPC resources (e.g., RDS, EC2, etc.), a separate VPC makes for a clean environment.

The following high level subnet definitions work well and cover many common scenarios (all of which should be controlled with both routing and security policies):

a) Dead-End Subnet: No access to the Internet or VPC resources

b) VPC-Only Subnet: Access to VPC resources only

c) Internet-Only Subnet: Access to the Internet without access to VPC resources - This scenario does allow for security groups to limit outbound access. If the target addresses are undefinable, the default non-VPC config of a Lambda function can be used.

d) VPC+Internet Subnet: Access to Internet and VPC resources
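A quick way to sanity-check which bucket a subnet falls into is to look at its route table. The following is a minimal sketch, assuming the routes are supplied as dictionaries shaped like the `Routes` entries returned by EC2's DescribeRouteTables. The function name and the sample IDs are made up for illustration, and the check only covers Internet, NAT, and egress-only gateways (a default route through a transit gateway, peering connection, or appliance would need its own check).

```python
from typing import Iterable, Mapping

DEFAULT_ROUTES = {"0.0.0.0/0", "::/0"}


def has_internet_route(routes: Iterable[Mapping[str, str]]) -> bool:
    """Return True if a default route targets an Internet, NAT, or egress-only gateway."""
    for route in routes:
        destination = route.get("DestinationCidrBlock") or route.get("DestinationIpv6CidrBlock")
        if destination not in DEFAULT_ROUTES:
            continue  # not a default route, so it cannot lead to the Internet at large
        if route.get("NatGatewayId") or route.get("EgressOnlyInternetGatewayId"):
            return True
        if route.get("GatewayId", "").startswith("igw-"):
            return True
    return False


# Hypothetical route tables: a dead-end subnet and one with a NAT gateway.
dead_end = [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]
internet = dead_end + [
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"}
]

print(has_internet_route(dead_end))
print(has_internet_route(internet))
```

Routing is only half of the control. Security groups still need to be reviewed separately.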

It should be noted that many/most AWS services require the caller to have Internet access (even when calling from within AWS), which kinda throws a wrench in the works for our network isolation scheme. However, AWS provides VPC endpoints for many of its services. For example, to access DynamoDB, you will need to create a DynamoDB VPC endpoint accessible to the subnets that don't have Internet access.
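As a rough sketch of what that endpoint request could look like, the helper below builds the keyword arguments for EC2's CreateVpcEndpoint call. The helper name, the VPC and route table IDs, the region, and the account number are all placeholders, and attaching an endpoint policy limited to `dynamodb:GetItem` is our own assumption about how tightly the endpoint might be scoped.

```python
import json


def dynamodb_endpoint_request(vpc_id, region, route_table_ids, table_arn):
    """Build keyword arguments for a DynamoDB gateway endpoint (CreateVpcEndpoint)."""
    endpoint_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "dynamodb:GetItem",  # assumption: read-only use of one table
                "Resource": table_arn,
            }
        ],
    }
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.dynamodb",
        "RouteTableIds": list(route_table_ids),
        "PolicyDocument": json.dumps(endpoint_policy),
    }


# Placeholder identifiers, for illustration only.
request = dynamodb_endpoint_request(
    vpc_id="vpc-0example",
    region="us-east-1",
    route_table_ids=["rtb-0example"],
    table_arn="arn:aws:dynamodb:us-east-1:111122223333:table/configuration-store",
)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ec2").create_vpc_endpoint(**request)
print(request["ServiceName"])
```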

Let's say we find ourselves in the situation where our function requires relatively unrestricted outbound Internet access. As such, our attacker will be able to exfiltrate our Lambda package/environment/temp files. While having the confidentiality of our source code compromised is terrible (unless we've open-sourced it), there is far greater harm that can be inflicted upon us. For this reason, it is here, with the exfiltration of the package/environment, that the damage must end.

Limit secondary targets...

Ensuring our Lambda function is only privileged to do those tasks required for it to do its job will GREATLY reduce our attacker’s opportunities to inflict further damage. This begins and ends with IAM (AWS’ Identity and Access Management tool).

1) Every Lambda function must have its own role and be set to execute as that role.

2) Every Lambda role must have its own policy.

3) Policies must be white-list based (explicit allows only).

4) Policies must only allow the minimum required privileges for the Lambda to function. Explicit actions, resources and conditions must be defined.

5) Unless there is an incredibly strong and clear business driver, Lambda functions should never be given access to IAM.
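Rules 3 through 5 lend themselves to an automated check before a policy ever gets attached to a role. The following is a minimal sketch of such a check, not a complete policy linter: the function names are ours, the sample ARN is a placeholder, and the rules it enforces are simply our reading of the list above (explicit Allow statements only, no wildcards in actions or resources, a condition on every statement, and no `iam:` actions).

```python
def _as_list(value):
    """IAM allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def audit_policy(policy):
    """Return a list of human-readable findings; an empty list means no rule was tripped."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        label = f"statement {index}"
        if statement.get("Effect") != "Allow":
            findings.append(f"{label}: not an explicit Allow")
        if "NotAction" in statement or "NotResource" in statement:
            findings.append(f"{label}: uses NotAction/NotResource instead of a white list")
        for action in _as_list(statement.get("Action")):
            if "*" in action:
                findings.append(f"{label}: wildcard action {action!r}")
            if action.lower().startswith("iam:"):
                findings.append(f"{label}: grants IAM access via {action!r}")
        for resource in _as_list(statement.get("Resource")):
            if "*" in resource:
                findings.append(f"{label}: wildcard resource {resource!r}")
        if not statement.get("Condition"):
            findings.append(f"{label}: no condition defined")
    return findings


# Placeholder policy: one action, one resource, one condition.
tight = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dynamodb:GetItem",
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/configuration-store",
            "Condition": {
                "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": ["example-app-id"]}
            },
        }
    ],
}
loose = {"Version": "2012-10-17", "Statement": {"Effect": "Allow", "Action": "*", "Resource": "*"}}

print(audit_policy(tight))
for finding in audit_policy(loose):
    print(finding)
```

Some legitimate policies will trip the resource or condition checks, so the findings are best treated as prompts for a human review.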

Once the above is done and our attackers have a limited set of target opportunities, we need to keep our attackers from exploiting those targets.

Make 'em fly blind...

Now is a good time to elaborate on item #5 above... If our attacker had access to, say, view the IAM policy for the running context, they would have a complete target list of available resources and their allowed methods. Taking this a step further, if the IAM policy allows for managing IAM policies, the entire AWS account and possibly any trusting accounts can be taken over, as our attacker would be able to define any access permission they wish. This clearly cannot be allowed!

Moving along... We all know that source code must never contain “secret” data (e.g., usernames/passwords, API keys, secret strings, signing certs, database connection data, etc.). Given that we know our attacker will have access to our application package and environment, we need to extend this practice to the entire Lambda package and environment (we'll come back to this one in a bit).

I know, it sounds like an easier said than done scenario. Fortunately, it isn’t.

Enter The Secure Configuration Store...

DynamoDB can be made to be a pretty darn secure configuration store. When a Lambda function first executes (cold start) it pulls its configuration data from the configuration store and loads it into memory (under its currently running scope, which is not available to subprocesses). After that, all subsequent warm starts will already have their configurations set and will not need to call to the configuration store.

At its simplest, the “secure configuration store” table only needs two columns:

‘App-ID’ # Hash / Partition Key

‘Config’ # Attribute storing the configuration data (e.g., JSON or other)

As such, every application using this store must have a strong, unique app_id.
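The cold start / warm start behaviour can be sketched with a module-level cache. This is a minimal sketch under a few assumptions: the `Config` attribute holds a JSON string, the function names are ours, and the fetcher is passed in so the caching logic can be tried without AWS. The `get_item` call is the standard boto3 DynamoDB client method, and `boto3` is only imported if the real fetcher runs.

```python
import json

_CONFIG = None  # module scope survives across warm starts of the same instance


def fetch_from_dynamodb(table_name, app_id):
    """Read one item from the store; attribute names follow the two-column table above."""
    import boto3  # imported lazily so the demo below runs without AWS

    response = boto3.client("dynamodb").get_item(
        TableName=table_name,
        Key={"App-ID": {"S": app_id}},
    )
    # Assumption: 'Config' is stored as a JSON string.
    return json.loads(response["Item"]["Config"]["S"])


def get_config(table_name, app_id, fetch=fetch_from_dynamodb):
    """Fetch on the first call (cold start), then serve the in-memory copy."""
    global _CONFIG
    if _CONFIG is None:
        _CONFIG = fetch(table_name, app_id)
    return _CONFIG


# Demo with a stand-in fetcher that records how often the store is hit.
calls = []


def fake_fetch(table_name, app_id):
    calls.append(app_id)
    return {"db_host": "db.internal.example"}


first = get_config("configuration-store", "example-app-id", fetch=fake_fetch)
second = get_config("configuration-store", "example-app-id", fetch=fake_fetch)
print(len(calls))
```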

Every application’s policy is permitted only GetItem access to this table. This means that the application must know its app_id (primary key) to retrieve the item. To further limit access, each application has a condition applied to its policy limiting its access to only its own entry (primary key) in the store. As such, our compromised application cannot retrieve the configuration information of any other app, even if our attacker knew their keys.
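Both pieces, the strong app_id and the per-application statement, are easy to generate. Here is a minimal sketch: the function names, the ID length, and the sample ARN are our own choices for illustration, while the statement uses the `dynamodb:LeadingKeys` condition key that DynamoDB provides for restricting access by partition key.

```python
import secrets


def new_app_id(num_bytes=24):
    """Generate a random, URL-safe identifier (24 random bytes by default)."""
    return secrets.token_urlsafe(num_bytes)


def config_store_statement(table_arn, app_id):
    """Build a statement allowing GetItem only on the item keyed by app_id."""
    return {
        "Effect": "Allow",
        "Action": "dynamodb:GetItem",
        "Resource": table_arn,
        "Condition": {
            "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [app_id]}
        },
    }


app_id = new_app_id()
statement = config_store_statement(
    "arn:aws:dynamodb:us-east-1:111122223333:table/configuration-store",  # placeholder ARN
    app_id,
)
print(statement["Condition"])
```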

Example Policy Statement:

{
 "Effect": "Allow",
 "Action": "dynamodb:GetItem",
 "Resource": "arn:aws:dynamodb:us-aaaa-y:xxxxxxxxxxxx:table/configuration-store",
 "Condition": {
      "ForAllValues:StringEquals": {
           "dynamodb:LeadingKeys": "mvqxuu15bpigjfyqolo7wlq_example_id"
      }
 }
}

Transient Mutation of the Immutable...

So, you are probably asking yourself: the above sounds great, but how do we securely pass the Lambda function the name of the configuration store and its application ID? Easy, we set it as an environment variable. What???? Didn’t you just say that was bad???? Here’s the thing: within the context of the running Lambda function, environment variables are indeed mutable, and changes last for all subsequent warm starts of the instance! The next cold start will again load the original environment variables.

As one of the first orders of business when the function cold starts, we read the environment variable and immediately overwrite it. From there we call the configuration store and get to work.
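The read-then-overwrite step can be wrapped in a small helper. This is a minimal sketch: the helper names are ours, and the `table-name,app-id` value format is an assumption taken from the sample output in this article. The demo also spawns a child process to show that, once scrubbed, the original value is no longer inherited by subprocesses.

```python
import os
import subprocess
import sys


def take_secret(name, placeholder="Nothing to See Here"):
    """Return the variable's value and overwrite it in the process environment."""
    value = os.environ[name]
    os.environ[name] = placeholder
    return value


def parse_store_reference(value):
    """Split 'table-name,app-id' into its two parts (assumed format)."""
    table_name, app_id = value.split(",", 1)
    return table_name, app_id


# Simulate what Lambda would inject at cold start.
os.environ["Some_Secret"] = "Configuration-Store,mvqxuu15bpigjfyqolo7wlq_example_id"

table_name, app_id = parse_store_reference(take_secret("Some_Secret"))

# A child process inherits the scrubbed environment, not the original value.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['Some_Secret'])"],
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```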

Lambda example (Python):

import json
import os

def config_runner(secret):
    # Stand-in for the application's own loader, which pulls the config data
    print('Did Stuff')

# Module scope runs once per cold start: read the secret, then overwrite it
before = os.environ['Some_Secret']
os.environ['Some_Secret'] = 'Nothing to See Here'
after = os.environ['Some_Secret']
config_data = config_runner(before) # pulls the config data

def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps(f"""Before: {before}; After: {after}""")
    }

Response:

{
  "statusCode": 200,
  "body": "\"Before: Configuration-Store,mvqxuu15bpigjfyqolo7wlq_example_id; After: Nothing to See Here\""
}

Clean up before yourself...

/tmp is the only writable space available to our application and therefore to our attacker. With every cold start, /tmp is clean; however, data will persist from warm start to warm start. As such, /tmp should be cleaned from execution to execution. Unfortunately, we cannot count on our Lambda running its code to completion, so we must ensure that we are cleaning the environment prior to running any business logic.

Lambda example snippet (Python):

import subprocess

def lambda_handler(event, context):
    subprocess.call(["rm", "-R", "-f", "/tmp/"])  # clean /tmp before any business logic
    return {
        'statusCode': 200,
        'body': json.dumps(f"""Before: {before}; After: {after}""")
    }

While an excellent control, it should be noted that it is not fool-proof, as there may be files created during the current execution, but prior to the exploit being executed, which would be available to our attacker (with highly sensitive temp data, it is best to clean it the moment it is no longer needed). With that said, this control greatly limits the scope of what’s available to our attacker and completely removes any ability for our attacker to persist code/data from one execution to the next.
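In keeping with the earlier point about preferring safe functions over shelled system calls, the same cleanup can be done without invoking `rm`. The following is a minimal sketch using only the standard library. The function name is ours, and the demo runs against a throwaway directory rather than the real /tmp.

```python
import os
import shutil
import tempfile


def clean_directory(path="/tmp"):
    """Delete everything inside `path` but keep the directory itself."""
    removed = 0
    with os.scandir(path) as iterator:
        entries = list(iterator)  # snapshot first, then delete
    for entry in entries:
        try:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.unlink(entry.path)  # files and symlinks
            removed += 1
        except OSError:
            # Best effort: one stubborn entry should not stop the rest of the cleanup.
            pass
    return removed


# Demo against a scratch directory standing in for /tmp.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "leftover.txt"), "w") as handle:
    handle.write("data from a previous execution")
os.makedirs(os.path.join(scratch, "tools", "nested"))

removed_count = clean_directory(scratch)
print(removed_count, os.listdir(scratch))
```

In a handler, `clean_directory()` would be the first call, before any business logic runs.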

Keep your ear to the ground...

Obviously, the sooner we are alerted to the presence of our intruder, the sooner we can take action. CloudWatch and CloudTrail are our windows into AWS and must be configured to capture events and alert on the important ones. Some of the bells we'd expect our attacker to ring include:

a) Lambda function errors (both exceptions and timeouts)

b) Failed access attempts to IAM

c) User errors in DynamoDB

d) Failed login attempts in RDS

e) API Gateway 5xx errors

f) Failed access attempts to SES / SNS

g) etc...
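As an illustration of item (a), the sketch below builds the parameters for a CloudWatch alarm on a function's `Errors` metric. The helper name, alarm name, function name, and SNS topic ARN are placeholders, and the threshold of one error per minute is an assumption to be tuned to your own traffic. The keys themselves are the standard parameters of CloudWatch's PutMetricAlarm.

```python
def lambda_error_alarm(function_name, sns_topic_arn, threshold=1, period_seconds=60):
    """Build keyword arguments for a PutMetricAlarm call on a Lambda's Errors metric."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations is not an alarm condition
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder names, for illustration only.
alarm = lambda_error_alarm(
    "example-function",
    "arn:aws:sns:us-east-1:111122223333:security-alerts",
)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
print(alarm["AlarmName"])
```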

Weak Links

It cannot be stressed enough that weaknesses in your organization’s user and/or change management controls have the potential to lead to systemic compromise of your AWS account.

For expediency's sake, an over-privileged developer quickly created the below policy for a Lambda function he is building:

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": "*:*",
    "Resource": "*"
  }
}

If this function were to be compromised, it's game over for your AWS account.

Wrap Up

We must be prepared for the real possibility that our code may not be as bullet-proof as we aspired to make it. As such, we must have plans and controls in place to limit the damage an attacker is capable of inflicting upon us.

First, we reduce the communication channels available to our attacker for gauging progress and removing data.

Second, we reduce viable secondary targets our attacker has by limiting the privileges of the Lambda function’s role our attacker will be inheriting.

Third, we deny our attacker any workable knowledge of the viable secondary targets.

Fourth, we deny our attacker the ability to persist code and/or data between warm executions, thereby forcing square one buildup on every request.

Our attacker will have to hunt around in the dark. This will be extremely noisy and pretty darn easy to spot.

Until next time. Be safe, Allow No Harm!



Don’t be afraid to re-invent the wheel
by Adam Goode

We are in a golden age of frameworks, libraries, modules, plugins, etc… They are everywhere, each more buzztastic than the last. Businesses flock to them in droves hoping to minimize time to market and development costs. Developers love tinkering with the newest, shiniest library to hit their preferred platform. Yet a surprisingly large number of these businesses and developers fail to understand or appreciate the real costs and risks of leveraging these external codebases.

The most fundamental truth is: YOU ARE RESPONSIBLE FOR EVERY LINE OF CODE IN YOUR PROJECT, even if you didn’t write it.

In the end, it makes no difference if your organization lost all its high value data due to an insecure external code base or an insecure internal code base. The effects are the same: devastating! However, the probabilities of occurrence can be quite different.

Instead of trying to be all things to all people, your internal code will be far more task focused and limited, thereby substantially reducing the code’s attack surface. This smaller, more focused code base makes it much easier to reason through the security posture of the code, resulting in more efficient upfront risk mitigation, as well as downstream risk remediation.

Your internally developed code will not be targeted as a vehicle for mass exploitation endeavors. Meaning, in this context, your code will not be scoured by the entire underworld in search of vulnerabilities. Nor will it be targeted for source code tampering. It will not be targeted for socially engineered name squatting.

Of course, your organization will still be a target, both directly and indirectly. A poor organizational security posture can still result in a loss of source code, or in source code tampering, not to mention all the other very, very bad things that can be done.

When calculating development effort, include the FTEs to perform a code review of the 35k-line external codebase vs. the development and review of the 800-line internally developed code base.
[Figure: internal vs external codebase]

Every external code base utilized represents a project dependency over which you may have little control, translating into potentially very large risks for the lifecycle of your project. As such, every time external code is considered, a substantial upfront investment of time and effort is required to determine the most adequate and appropriate codebase for your needs.

Of course, what appeared adequate and appropriate 6 months ago may be something very different today.

A library may be under aggressive development with fast moving adding/changing/removing of features and methods, resulting in excessive, continued integration work for your project. Not to mention the increased likelihood of significant vulnerabilities and other bugs being introduced into the codebase.

A library may become abandoned as the next new thing gets all the attention, leaving you to either internally maintain the code or replace the code.

A library may be moderately maintained with fairly significant or even unacceptable intervals between bug discovery and remediation. Depending upon the severity of the issue, you may be forced to either shut-down the features within your project that rely on the library or attempt to fix the code yourself.

A library may be very well maintained, yet becomes constraining as the project matures and new features are needed that the library is simply not capable of providing, thereby slowing down the project and possibly forcing a major re-write.

When we set out to build URGIENT, one of our primary objectives was to limit our dependencies. As such, we chose not to use a web framework, but rather to build our own. We chose not to use any client side libraries, but rather to use the tools native to all modern web browsers. We were very selective in the non-core backend modules we leveraged, and we limited our use of SDKs to only a select few.

Like in the 'Serenity Prayer', we all hope to have the wisdom to determine what we can change and what we can’t. Things like processor instruction sets are clearly beyond our control. Things like the newest, trendiest JavaScript library... those are completely in our control.

Until next time. Be safe, Allow No Harm!