Easy Testing of Amazon (AWS) Lambda Micro-Services

(See the GitHub repo to clone the code)

AWS Lambda is a service (currently in Preview) that allows us to easily run code in response to certain events. What that means is we can easily create small Micro-Services that do one thing and do it well in response to any given event. Good examples include:

  • Automatically optimizing images that are dropped into an S3 bucket
  • Data triggers (DynamoDB or Kinesis stream processing)
  • Large volume, parallel task running
  • Scheduled tasks

Currently Lambda only supports Node.js... Luckily I love Node! They also have a very reasonable free tier so we can try it out.

How to Test Locally?

Since Lambda is a hosted PaaS (Platform as a Service) offering from AWS you get a lot for your money (great for lazy coding) but you also lose some control. Testing is one of those things you lose some control of when it's not easy to remote into the machine.

Figure: Example of Lambda logs in AWS CloudWatch.

Creating a Simple Local Test Harness

Luckily for us, it's easy to create a test harness locally to simulate the Lambda environment events and run our code. This allows us to use our existing debugging toolset, e.g. node inspector, WebStorm, Visual Studio etc.

For example, in this sample, we simulate an S3 file Put event and call our code in the same way Lambda would.

The test.js file:

// Our Lambda function file is required
var importify = require('./importify.js');

// The Lambda context "done" function is called when complete with/without error
var context = {
    done: function (err, result) {
        console.log('------------');
        console.log('Context done');
        console.log('   error:', err);
        console.log('   result:', result);
    }
};

// Simulated S3 bucket event
var event = {
    Records: [
        {
            s3: {
                bucket: {
                    name: 'hotlunch-west-2'
                },
                object: {
                    key: 'importing/org_create_table_test/4-25-mytestfile.docx'
                }
            }
        }
    ]
};

// Call the Lambda function
importify.handler(event, context);
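
As the number of test cases grows, the simulated event can be produced by a small helper rather than written out by hand. The sketch below assumes a hypothetical makeS3PutEvent function and made-up bucket and key names. It builds only the fields the handler reads, whereas real S3 events carry more.

```javascript
// Hypothetical helper: builds the minimal S3 event shape used in test.js.
// Only the fields the handler reads are included.
function makeS3PutEvent(bucket, key) {
    return {
        Records: [
            {
                s3: {
                    bucket: { name: bucket },
                    object: { key: key }
                }
            }
        ]
    };
}

// Example: generate one simulated event per key (names are made up)
var keys = ['importing/a.docx', 'importing/b.docx'];
var events = keys.map(function (key) {
    return makeS3PutEvent('my-test-bucket', key);
});

console.log(JSON.stringify(events[0], null, '  '));
```

Each generated event can then be passed to importify.handler with the same context object as before.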

The importify stub - your Lambda function entry point:

var AWS = require('aws-sdk');

// Your exported Lambda entry point
exports.handler = function(event, context) {
    console.log('Received event:');
    console.log(JSON.stringify(event, null, '  '));

    // S3 information from the event
    var bucket = event.Records[0].s3.bucket.name;
    var key = event.Records[0].s3.object.key;

    // Do something with the S3 information
    doSomething(bucket, key, function (err, result) {
        console.log('error:', err);
        console.log('result:', result);

        // Notify Lambda done with error or result
        context.done(err, result);
    });
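};

// Hypothetical placeholder so the stub runs as-is; replace with your real work
function doSomething(bucket, key, callback) {
    callback(null, 'Processed ' + bucket + '/' + key);
}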

Running the Test Locally

Running the test locally is easy, from the command line (or terminal on Mac):

> node test.js

Why this isn't Perfect

Of course what we're doing here isn't perfect - we're not simulating the Lambda environment precisely (including memory limits). But we do now have a way to quickly develop and test Lambda micro-services locally and then deploy them to Lambda for integration testing. This workflow minimizes the "change-zip-upload-run-check logs" cycle that is otherwise required.
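
One gap that can be narrowed locally is the time-out. The sketch below is an assumption about how you might approximate it, not how Lambda enforces it: a hypothetical runWithTimeout helper wraps the handler and reports an error if context.done has not been called within the given number of milliseconds.

```javascript
// Hypothetical helper: runs a Lambda-style handler and reports a time-out
// if context.done is not called within timeoutMs.
function runWithTimeout(handler, event, timeoutMs, callback) {
    var finished = false;

    var timer = setTimeout(function () {
        if (finished) { return; }
        finished = true;
        callback(new Error('Timed out after ' + timeoutMs + ' ms'));
    }, timeoutMs);

    var context = {
        done: function (err, result) {
            if (finished) { return; }   // ignore late or duplicate calls
            finished = true;
            clearTimeout(timer);
            callback(err, result);
        }
    };

    handler(event, context);
}

// Example: a handler that finishes immediately, run with a 60 second limit
runWithTimeout(function (event, context) {
    context.done(null, 'ok: ' + event.name);
}, { name: 'demo' }, 60 * 1000, function (err, result) {
    console.log('error:', err);
    console.log('result:', result);
});
```

This only stops waiting for the handler. It does not halt code that is still running, so treat it as a warning signal rather than a faithful simulation.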

How to Test Remotely?

I'm working on this one... hopefully I'll have an update soon!

Lambda Limitations in Preview

Since Lambda is in preview, there are some published limitations and also some that we've found out through using it on a daily basis:

  • Only a limited set of events can trigger Lambda code (currently S3, DynamoDB and Kinesis events)
  • Node.js is the only Lambda platform currently supported
  • Limited RAM (1 GB) and time-out (60 seconds max) settings
  • Difficulty debugging using the logs created in CloudWatch (requires verbosity in logging)
  • Lambda may re-use the same environment for multiple runs, which can cause issues - write your code so that each run is isolated
  • Requires a better way to control retries on failure (retry count, signal error but no retry etc)
  • Slightly unpredictable things occur sometimes - make sure you over-log your code to determine what's up
  • Vendor lock-in is a real problem but the benefits for us outweigh the cons so far
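
To illustrate the point about environment re-use, the sketch below contrasts state kept at module level with state created inside the handler. The handler names and the invoke helper are hypothetical, and calling a handler twice in one process is only a rough stand-in for Lambda re-using an environment.

```javascript
// Module-level state lives as long as the module stays loaded,
// so it can leak between runs if the environment is re-used.
var sharedItems = [];

function leakyHandler(event, context) {
    sharedItems.push(event.id);
    context.done(null, sharedItems.length);
}

// Per-run state: everything the run needs is created inside the handler.
function isolatedHandler(event, context) {
    var items = [];
    items.push(event.id);
    context.done(null, items.length);
}

// Hypothetical helper: invokes a handler and returns what it passed to done()
function invoke(handler, event) {
    var output;
    handler(event, { done: function (err, result) { output = result; } });
    return output;
}

console.log(invoke(leakyHandler, { id: 1 }), invoke(leakyHandler, { id: 2 }));       // 1 2
console.log(invoke(isolatedHandler, { id: 1 }), invoke(isolatedHandler, { id: 2 })); // 1 1
```

The second run of leakyHandler sees the item left behind by the first, while isolatedHandler always starts from a clean slate.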

Having said this, Lambda is an awesome service that I strongly recommend you try.

Creating a Lightweight Blog with GitHub Pages and Jekyll

As a developer I find the typical flow and tools used for blogging to be less than ideal. I want a flow that embraces the tools I use on a daily basis and makes it as easy as possible to blog...

What I want is an optimized blogging platform and experience targeted to developers.

What about WordPress?

I could have used WordPress and WP Engine, an excellent solution to be sure. But it doesn't really meet the developer flow or simplicity requirements.

GitHub to the Rescue

GitHub has a great feature called GitHub Pages that allows us devs to host static content as a web site - they even allow us to associate a custom domain name (e.g. example.com) with the web site. Could this be suitable?

GitHub Pages offers the following capabilities that are relevant to us:

  • Pre-processing and templating support via Jekyll
  • Quick start blogging templates like JekyllNow and Poole (this blog is based on Poole / Hyde)
  • Markdown support as the blogging format
  • Git commit publishing (even publish directly from GitHub)
  • Custom domain names can be pointed to your static pages
  • Supports commenting, categories, tags etc easily or with little change
  • Easy to set up and get going with
  • It's free and fast!

Building the GitHub Pages Blog

So with this information, I decided to build an MVP blog using GitHub Pages and see how it went.

I decided to use the Hyde repository as a starting point as it looked quite beautiful and was simple to understand and enhance.

Cloning the Base Repository

If you don't already have a GitHub account, jump over there now and create one for free.

Once you're logged into your GitHub account, the quickest way to get started is to clone an existing Jekyll repository to modify. As stated, I decided to use Hyde, so to do the same, visit the Hyde repository and click the Fork button.

There are other Jekyll repos you can clone to get started quickly. Check out JekyllNow and Poole for starters.

Rename your Cloned Repository

This step is critical! It's also easy :)

In the cloned repository, click the Settings button and then rename the repository so it has exactly the same name as your GitHub home. For me that's "mikestokes.github.com"; for you, it'll follow the pattern "yourgithubname.github.io".

Visit your Cloned Blog

You should now be able to visit your blog website at "yourgithubname.github.io".

Customising your Blog

Now that you have a static blog, we'll customise it over the next few blog posts.

Things we'll be looking at customising:

  • Title, blog description and other basics (hint: look at the _config.yml file)
  • Links in the left hand navigation (hint: sidebar.html and head.html templates)
  • Adding new posts (/_posts folder)
  • Adding Disqus comments to your blog posts
  • Changing the Favicon (look in the /public folder)
  • Adding Google Analytics (put it in the default.html template)
  • Ideal Mac and Windows setups (I use Haroopad on Windows)

Using Lists with Entity Framework

By default Entity Framework (as of version 6.x) does not support serialization of Lists to a database field. We could always go relational with this, creating another database table for the List values and using a foreign-key relationship with the parent entity. But sometimes, for simplicity's sake, we want to store the List values within the same entity.

Out of the box, the following is not possible in Entity Framework:

[Table("YourTableName")]
public class YourEntity
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }

    public List<string> YourList { get; set; }
}

Serializing Lists using Entity Framework Complex Types

After searching, I came upon a rather simple (and re-usable) solution from Bernhard Kircker that we have put into production and has been working very well for us. I'll outline here what we did, with thanks to Bernhard for the solution.

With Bernhard's solution, the above Entity class can simply be changed and Entity Framework will take care of the rest for you:

[Table("YourTableName")]
public class YourEntity
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }

    [Column("YourColumnName", TypeName = "ntext")]
    public virtual PersistableStringCollection YourList { get; set; }
}

Using the Collection

You can use the collection very simply - just like you would a normal .NET Framework enumerable, for example:

Adding a value: YourEntity.YourList.Add(value);

Clearing the collection: YourEntity.YourList.Clear();

Creating a Custom ICollection

So, what is needed to achieve this?

We need a simple base ICollection class which can be extended for each type we wish to serialize (string, int etc). Luckily Bernhard has done most of the heavy lifting for us here and I've included this below for completeness:

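using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations.Schema;
using System.Linq;

/// <summary>
/// Base class that allows persisting of scalar values as a collection (which is not supported by EF 4.3)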
/// </summary>
/// <typeparam name="T">Type of the single collection entry that should be persisted.</typeparam>
[ComplexType]
public abstract class PersistableScalarCollection<T> : ICollection<T>
{
    // Use a character that will not occur in the collection.
    // This can be overridden using the virtual properties below (e.g. for a list of strings).
    const string DefaultValueSeperator = "|";

    readonly string[] DefaultValueSeperators = new string[] { DefaultValueSeperator };

    /// <summary>
    /// The internal data container for the list data.
    /// </summary>
    private List<T> Data { get; set; }

    public PersistableScalarCollection()
    {
        Data = new List<T>();
    }

    /// <summary>
    /// Implementors have to convert the given raw value to the correct runtime-type.
    /// </summary>
    /// <param name="rawValue">the already separated raw value from the database</param>
    /// <returns></returns>
    protected abstract T ConvertSingleValueToRuntime(string rawValue);

    /// <summary>
    /// Implementors should convert the given runtime value to a persistable form.
    /// </summary>
    /// <param name="value"></param>
    /// <returns></returns>
    protected abstract string ConvertSingleValueToPersistable(T value);

    /// <summary>
    /// Deriving classes can override the string that is used to separate single values
    /// </summary>
    protected virtual string ValueSeperator
    {
        get
        {
            return DefaultValueSeperator;
        }
    }

    /// <summary>
    /// Deriving classes can override the string that is used to separate single values
    /// </summary>
    protected virtual string[] ValueSeperators
    {
        get
        {
            return DefaultValueSeperators;
        }
    }

    /// <summary>
    /// DO NOT Modify manually! This is only used to store/load the data.
    /// </summary>
    public string SerializedValue
    {
        get
        {
            var serializedValue = string.Join(ValueSeperator.ToString(),
                Data.Select(x => ConvertSingleValueToPersistable(x))
                .ToArray());
            return serializedValue;
        }
        set
        {
            Data.Clear();

            if (string.IsNullOrEmpty(value))
            {
                return;
            }

            Data = new List<T>(value.Split(ValueSeperators, StringSplitOptions.None)
                .Select(x => ConvertSingleValueToRuntime(x)));
        }
    }

    #region ICollection<T> Members

    public void Add(T item)
    {
        Data.Add(item);
    }

    public void Clear()
    {
        Data.Clear();
    }

    public bool Contains(T item)
    {
        return Data.Contains(item);
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        Data.CopyTo(array, arrayIndex);
    }

    public int Count
    {
        get { return Data.Count; }
    }

    public bool IsReadOnly
    {
        get { return false; }
    }

    public bool Remove(T item)
    {
        return Data.Remove(item);
    }

    #endregion

    #region IEnumerable<T> Members

    public IEnumerator<T> GetEnumerator()
    {
        return Data.GetEnumerator();
    }

    #endregion

    #region IEnumerable Members

    IEnumerator IEnumerable.GetEnumerator()
    {
        return Data.GetEnumerator();
    }

    #endregion
}
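
The join/split round trip performed by SerializedValue can be sketched outside of Entity Framework. The snippet below uses JavaScript (the language of the earlier Lambda samples) purely as an illustration; serialize, deserialize and SEPARATOR are hypothetical names, not part of Bernhard's code. It also shows why the separator must not occur in the values themselves.

```javascript
// Same default separator as the C# base class
var SEPARATOR = '|';

// Join the values into one string, as the SerializedValue getter does
function serialize(values) {
    return values.map(String).join(SEPARATOR);
}

// Split the stored string and convert each piece, as the setter does
function deserialize(raw, convert) {
    if (!raw) { return []; }
    return raw.split(SEPARATOR).map(convert);
}

var ints = deserialize(serialize([1, 2, 3]), function (s) {
    return parseInt(s, 10);
});
console.log(ints); // [ 1, 2, 3 ]

// A value containing the separator does not survive the round trip:
// two strings go in, three come out.
var strings = deserialize(serialize(['a|b', 'c']), String);
console.log(strings.length); // 3
```

If your values might contain the default separator, overriding ValueSeperator and ValueSeperators in the derived class with a string that cannot occur in the data is one way to avoid this.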

List of Strings

For example, if we need to serialize a list of strings, we simply create the following class:

[ComplexType]
public class PersistableStringCollection : PersistableScalarCollection<string>
{
    protected override string ConvertSingleValueToRuntime(string rawValue)
    {
        return rawValue;
    }

    protected override string ConvertSingleValueToPersistable(string value)
    {
        // Return the string as-is (calling ToString() on a null entry would throw)
        return value;
    }
}

List of Ints

Similarly, if we need to serialize a list of ints, we simply create the following class:

[ComplexType]
public class PersistableIntCollection : PersistableScalarCollection<int>
{
    protected override int ConvertSingleValueToRuntime(string rawValue)
    {
        return int.Parse(rawValue);
    }

    protected override string ConvertSingleValueToPersistable(int value)
    {
        return value.ToString();
    }
}

That's it, once again thanks Bernhard for this little gem.