Beginnings in Golang and AWS – Part VII – Events, Lambda and Transcribe (cont’d)

Introduction

In today’s post, we’re going to be doing the fun part of putting everything together to get our project into action. We’ll be uploading our code to S3, creating our Lambda function, and then creating an event subscription that will trigger the function when we’ve uploaded an mp4 file to the bucket being used. As we’re already aware, this should then begin processing of the file and create a transcription using Transcribe.

Uploading our Code to S3

When creating a Lambda function that runs Go code, we need to provide a zipped file of the Go code. This can either be carried out via an upload of the file on your development system, or alternatively by uploading the zip file to S3 and providing the location information.

Of the two options, the latter is the most flexible for us since we can update our code and upload a new zip file to this location without needing to change any configuration.

Note that I’m doing these steps on OSX and using Bash. For Windows systems you may need to use slightly different commands.

Ensure you’re using the current Github release

Change the current directory to ./src/transcribe within the project folder.

The Lambda code runs on Linux, so we need to ensure when compiling the code that the compiler knows this (via GOOS=linux). We also specify the output file to be main
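A minimal sketch of the build command (assuming the source file sits in the current directory):

```bash
GOOS=linux go build -o main
```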


Now, we can create the zip file
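For example:

```bash
zip main.zip main
```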

An inspection of the contents of the directory should show two new files: main, and the zip file of it, main.zip.

Uploading the File

Now we want to upload the file. You can do this manually from the AWS console, with the utility we created earlier in the series, or with the AWS CLI (as below).
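If you go the AWS CLI route, something along these lines does the job (substitute your own bucket name):

```bash
aws s3 cp main.zip s3://<your-bucket-name>/main.zip
```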

Now, if you log in to the AWS console and take a look at the S3 bucket, main.zip will be there. Click on the file to get its properties and copy the URL under Link to the clipboard. We’re going to be using this in the next step.

Creating the Lambda Function

From the AWS console:

Click Services on the black bar, and then Lambda

Click Create Function

Now, in the Author from scratch section, enter the following values.

  • Name : transcribe
  • Runtime: Go 1.x
  • Role: Create new role from template(s)
  • Role name: transcribe_role

Click Create function

The Designer window appears, featuring the transcribe function, and with a role already defined to allow access to Cloudwatch.

Next, we tell Lambda where to get the code.

In the Function code section, select:

  • Code entry type : Upload a file from Amazon S3
  • Runtime: Go 1.x
  • S3 link URL : <paste the link that you copied into the clipboard  in the previous steps>
  • Handler : main

Create an S3 Trigger

Now that the basic function is in place, we need to configure it for our specific needs. As previously mentioned, we want it to run when a new .mp4 file is created in our S3 bucket.

On the left hand side, click S3

This will add it to the designer, and a Configure triggers dialog will appear.

Change Suffix to .mp4 and select Add

Click Save

Give the Lambda function access to Transcribe

We need to give transcribe_role permissions to access Amazon Transcribe, in addition to S3 and CloudWatch.

  • Click Services, IAM
  • Click Roles
  • Click transcribe_role

  • Click Attach policies
  • Find and put a check next to AmazonTranscribeFullAccess
  • Attach Policy

With this complete, the policy is now attached to the role.

A return to the Lambda function will also now show Amazon Transcribe on the right hand side, indicating it has permission to access this service.

Upload our movie

We’re ready to test the functionality out! Let’s find an mp4 video file and upload it. In my case, I’ve a file, movie.mp4, which is going to be used.

As before, you can choose either to manually upload the file via the AWS console, via the AWS CLI, or using the Upload program we created earlier.

The Results

The function should kick in pretty much as soon as the file has finished copying to S3. Let’s have a look from the console by going to Machine Learning, Amazon Transcribe.

And it’s there. We’ve now got an end-to-end mp4 to transcript file.

Conclusion

In this post, we’ve created our Go package, uploaded it to S3, set up the Lambda function, made an event subscription for when an MP4 file is uploaded to our bucket, configured the role associated with the function to allow it to use Transcribe, and verified its operation.

At this point there are further steps we could consider:

  • During the time of putting this series of blogs together, AWS added CloudWatch events for Transcribe. We could write another Lambda function, which ran once either a Transcribe completed or failed event occurred. Using this, we could do things like notify us when a job has completed, or even do something like converting the output to .srt format.
  • Add an endpoint and some code to allow us to query the status of one of the jobs.
  • We could even look into using a completely different way of getting a file into S3, such as passing a link to an MP4 file on a website and getting the Lambda function to download the file and store directly in S3 prior to creating the job.
  • Via the above, we could also look at adding in additional event sources, such as via API Gateway.

There’s lots and lots of possibilities, and a forthcoming blog series will cover one or more of these.

thanks for reading! Feedback always welcome. 🙂

cheersy,

Tim


Beginnings in Golang and AWS – Part VI – Events, Lambda and Transcribe (cont’d)

Introduction

In today’s post, we’re going to be looking at the code within the handler function. As part of this, we’ll be covering using structs and JSON together, logging to CloudWatch, marshaling, processing of S3 event data, and how to start a Transcribe job. It’s a bit of a longer post today as we’ll go through the entire code in the function.

The Story So Far…

At this point, our handler has been triggered by a file being placed in our S3 bucket to which there is an event subscription for CreateObject (more about this in the next blog). We’ve received the event information, which is placed in our variable S3Event, a struct. We have the information we need for further processing of S3Event and can proceed immediately with it. However, it’s worthwhile spending a couple of minutes looking at how Go processes the information received to place it into the variable.

A Bit About JSON & Go

Go does not have a native parsing mechanism for JSON data that allows dynamic generation of a struct based on the content (think PowerShell’s ConvertFrom-JSON cmdlet, for example). Instead (unless you want to go into the murky world of reflection and creating maps), you are expected to have some degree of awareness of the schema of the data being received. Go still handles the conversion process, but it looks to you for information on how to map content. This is done in struct definitions simply by tagging each field with the location of the data to map to, in the format json:"location".

In our situation, the Records field of an S3Event (an array of type S3EventRecord) maps to the “Records” section of the event data (see below). When information is nested (i.e. a subsection of another), we just make sure our struct matches this. An example of this is the EventVersion string, contained in the S3EventRecord struct, which is mapped to “eventVersion” in the JSON file.

An extract of the struct configuration and JSON data is below. You can examine the complete definition of an S3Event from the aws-lambda-go sdk, here:

Here’s the first two levels of our struct…
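This is an abridged extract of the actual types in the events package; see the package source for the full set of fields.

```go
type S3Event struct {
	Records []S3EventRecord `json:"Records"`
}

type S3EventRecord struct {
	EventVersion string    `json:"eventVersion"`
	EventSource  string    `json:"eventSource"`
	AWSRegion    string    `json:"awsRegion"`
	EventTime    time.Time `json:"eventTime"`
	EventName    string    `json:"eventName"`
	S3           S3Entity  `json:"s3"`
	// ... other fields omitted
}
```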

…and an extract from the S3Data JSON
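This is an illustrative extract only; the bucket and key values are placeholders rather than real event data.

```json
{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "eu-west-1",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "example-bucket" },
        "object": { "key": "movie.mp4" }
      }
    }
  ]
}
```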

Notice how they match up. The parser uses this information to populate the struct properties.

One aspect that is quite nice about the mapping process is that provided the top-level outline structure matches the data being received, the entire substructure does not need to be present. You have complete control over which properties should be mandatory or not.

Naturally, it makes sense to have your struct defined to represent the “full” content schema of the JSON data, but content being received into this struct does not need to be as complete. This can help when JSON content may contain fewer fields, yet still come from the same event source.

If you’d like to spend some more time looking at how a Go struct can be designed from a JSON source, I’d recommend taking a look at:

https://mholt.github.io/json-to-go/

Code Description

Let’s move on to the code itself now. Here’s our entire function as a reminder, before we burrow down into what it’s doing.
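Below is a condensed sketch of the function; the region, profile name, media URI format and the simplified GUID stand-in are assumptions for illustration rather than the exact code in the repo.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"strconv"
	"time"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/transcribeservice"
)

// GUID is a simplified stand-in here; the real implementation is covered in Part IV.
func GUID() string {
	return strconv.FormatInt(time.Now().UnixNano(), 10)
}

// Handler logs the incoming S3 event and starts a Transcribe job for each record.
func Handler(ctx context.Context, s3Event events.S3Event) {
	// Log the raw event data to CloudWatch.
	eventJSON, _ := json.Marshal(s3Event)
	log.Println(string(eventJSON))

	for _, record := range s3Event.Records {
		s3 := record.S3
		log.Printf("Processing key: %s", s3.Object.Key)

		// Create a session and a Transcribe client. Region and profile are assumptions.
		sess, _ := session.NewSessionWithOptions(session.Options{
			Config:  aws.Config{Region: aws.String("eu-west-1")},
			Profile: "development",
		})
		transcriber := transcribeservice.New(sess)
		if transcriber == nil {
			log.Println("Unable to create Transcribe client")
			return
		}
		log.Println("Transcribe client created")

		// Job parameters. The media file URI format is an assumption.
		jobname := GUID()
		mediafileuri := fmt.Sprintf("https://s3-eu-west-1.amazonaws.com/%s/%s", s3.Bucket.Name, s3.Object.Key)
		mediaformat := "mp4"
		languagecode := "en-US"

		StrucMedia := transcribeservice.Media{MediaFileUri: &mediafileuri}

		_, err := transcriber.StartTranscriptionJob(&transcribeservice.StartTranscriptionJobInput{
			TranscriptionJobName: &jobname,
			LanguageCode:         &languagecode,
			MediaFormat:          &mediaformat,
			Media:                &StrucMedia,
		})
		if err != nil {
			log.Println(err.Error())
			return
		}
		log.Println("Transcription job started")
	}
}

func main() {
	lambda.Start(Handler)
}
```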

As mentioned previously, we’re going to use Cloudwatch for logging, using the aptly named log package. The first logging we’ll do is that of the S3Event data.

Initially, we need to marshal the data. Marshaling takes an interface (our struct in this case) and returns a JSON encoding of it.

With this done, we use the string function, which in turn converts our byte data to a string.

We then log this information to CloudWatch. A nice touch of CloudWatch is that it picks up that the string data is JSON and formats it nicely for us. You’ll see this firsthand in the final part of this series.

We then need to iterate through each record entry in the event data. We use for to do this, assigning the record variable on each iteration from s3Event.Records. We do not need the index value that range also returns, hence the “_”.

Inside the loop, we set s3 to the value of the s3 branch, and from this we log the key referred to in the event. We will use this later as a parameter for our Transcribe job.

Any time we want to perform operations with another AWS service, a session needs to be created. We define the parameters that the session will use (a region of eu-west-1, and using my own development profile), then create it using transcribeservice.New, a function from the github.com/aws/aws-sdk-go/service/transcribeservice package.

Next, a check is made to ensure that a successful session was established. This is easily verified by ensuring that a non-nil value was returned to our variable, transcriber. If a nil value was returned, we exit the function. Either way, we log the result to CloudWatch.

Now we want to get our parameters set before starting a transcription job.

  • A random job name needs to be created, so the GUID function we created earlier is used to populate the variable jobname.
  • We set mediafileuri, using string expansion with the bucket name and key name that we got from the S3EventData
  • Mediaformat is set to mp4
  • Lastly, we set a language code of en-US for the languagecode variable.

 

We define StrucMedia, which is of type transcribeservice.Media. One thing to be mentioned here is that we pass in a pointer to mediafileuri, not a string. This is because the MediaFileUri definition in the transcribeservice.Media struct specifies via *string that it expects to receive a pointer.

As such, our definition is as below.

 

Then, we invoke the StartTranscriptionJob function. This takes as its parameter a pointer to a StartTranscriptionJobInput struct, whose properties we set within it.  Lastly, a completion message is logged to CloudWatch.

Conclusion

In this post, we’ve covered the code within our Lambda function and in doing so have covered how structs and JSON interoperate, logging to CloudWatch, marshaling, processing of S3 event data and finally how to create a Transcribe job.

We’re nearly there. In the next blog, we’ll run through the entire process of getting our code in S3, creating the lambda function, creating the event subscription, and triggering our function.

thanks for reading! Feedback always welcome. 🙂

cheersy,

Tim


Beginnings in Golang and AWS – Part V – Events, Lambda and Transcribe (cont’d)

Introduction

In today’s post, we’ll cover the event handler that our Lambda function is going to use when it receives notification of an MP4 file being dropped into our S3 bucket via a subscribed S3 event. This will in turn cover the Context and Event objects. Lastly, we’ll look at the one specific to our function, S3Event.

Our Code

Because we’re only covering the handler itself and background information on events, the code within the function body has been removed for this post.
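A minimal sketch of the shape of the file at this stage (the handler body is intentionally empty):

```go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// Handler receives the Lambda context and the S3 event data.
// The body is omitted here; it is covered in the next post.
func Handler(ctx context.Context, s3Event events.S3Event) {
}

func main() {
	lambda.Start(Handler)
}
```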

 

Lambda Function Handlers for Go

For building a Lambda function handler in Go, you have a degree of flexibility with regard to the input and output parameters you use, provided they conform to, per the latest documentation, the following rules.

  • The handler may take between 0 and 2 arguments. If there are two arguments, the first argument must implement context.Context.
  • The handler may return between 0 and 2 values. If there is a single return value, it must implement error. If there are two return values, the second value must implement error.

Although not strictly required for our function, Handler, we are using two parameters. The first, per the requirements above, will be the implementation of context.Context. The second is the actual event data.

Context Object

The service which calls your Lambda function carries metadata, which the developer may find useful to view or use. This is where the Context object comes into play. When your function signature contains a parameter for this, the metadata is passed into it. There’s a plethora of information that can be available, some of it service specific and some standard. An example of the latter is the AwsRequestID, a unique identifier that can be used as a reference later should AWS support be required. The complete documentation for the Context object is available here:

Event Data

This is the core information passed from the service to the function. Its format is entirely determined by said service. In order to manage this, the Go SDK features definitions for most event sources. In our case, this is the events.S3Event one.

If you wish to look at its construction in more detail, you can find it in the s3.go file, located within the events directory of the aws-lambda-go package.

We’ll be setting up an event subscription so that once an MP4 file is dropped into our S3 bucket, it invokes the Lambda function. What does the typical S3 event data our function would be passed look like? Look below.

Here’s the type of information we could expect to see once we have our Lambda function fully in place and an event subscription created to our S3 bucket. More on the latter later.
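The example below is illustrative of the shape of the data only; the bucket, key and other values are placeholders, and several fields have been omitted.

```json
{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "eu-west-1",
      "eventTime": "1970-01-01T00:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "s3SchemaVersion": "1.0",
        "bucket": {
          "name": "example-bucket",
          "arn": "arn:aws:s3:::example-bucket"
        },
        "object": {
          "key": "movie.mp4",
          "size": 1024,
          "eTag": "0123456789abcdef0123456789abcdef"
        }
      }
    }
  ]
}
```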

There is a lot of information there, but the key piece of information passed that we’ll be using is contained within the object section.

Conclusion
In this post, we’ve covered the basics of a Lambda event handler for Go, and the valid signatures that can be used with it and their purpose. We’ve also looked at the typical information that we can expect to be passed in from an S3 event.

In the next blog, we’ll dig deeper into the function and the code within.

thanks for reading! Feedback always welcome. 🙂

cheersy,

Tim


Beginnings in Golang and AWS – Part IV- Events, Lambda and Transcribe

Introduction

The previous posts have taken us through the process of creating a Go executable for uploading a file to S3. We’ll now focus on the next stage of our project. Namely, creating a Transcribe job automatically when an mp4 file is dropped into an S3 bucket.

During these posts, we’ll be covering our code, S3 Events, Lambda, Cloudwatch and Transcribe. These areas will include, amongst others, the CreateObject event, subscriptions, handlers, marshalling, creating a reusable package, logging, reference date format, string slices, and a bit of a deeper look into structs.

Goal

Let’s recap our target by the end of this group of blogs. We want to set up a configuration that responds to an mp4 file being placed into an S3 bucket and runs code that will take the information, including the key, and from this create a job in Amazon Transcribe. Because our code will be running remotely, we also want to have some way to log information during execution, such as an action being undertaken or an error if one has occurred.

Our Code

As before, let’s start with our code, and then break it down.

Imports

We’re using several other packages in this code, some of which we’ve already used.

  • context
    • We will be using the context package, and particularly the Context type, as part of our Lambda function. This allows our Lambda function to obtain metadata from AWS Lambda. Although not per se required, it’s interesting to cover the type of information available.
  • json
    • implements encoding and decoding of JSON.
  • fmt
    • input and output functions, such as Printf
  • log
    • we use log to provide formatted output which will be used by Cloudwatch
  • strconv
    • is used in this project to allow us to perform some formatting on time and date information
  • time
    • for displaying and measuring date and time information
  • github.com/aws/aws-lambda-go/events
    • this package is split into separate Go files, representing the various AWS services which support events.
  • github.com/aws/aws-lambda-go/lambda
    • functions, primarily for dealing with lambda handlers
  • github.com/aws/aws-sdk-go/aws
    • the generic aws package
  • github.com/aws/aws-sdk-go/aws/session
    • used for creating session clients and storing configuration information about the session
  • github.com/aws/aws-sdk-go/service/transcribeservice
    • this package is used for our operations involving the Transcribe service.

GUID function

The purpose of this function is to generate a unique identifier that can be used for our Transcribe job name. I chose an arbitrary format for this.

The function introduces us for the first time to the time package and two of its functions, Parse and Since.
From an operational point of view, Parse is used to decode a string and cast it into a time object. Since provides information on the period of time that has elapsed since a given date/time. These on their own are fairly straightforward to understand. Then we get onto the reference date/time format…

Reference Date/Time Format

One area where Go differs from any other language I’ve worked with to date is in how it deals with parsing and formatting dates and times. Instead of using classic identifiers (such as hh, mmm, ss), it uses an actual reference-based format to indicate how a value should be interpreted. Confused? I was!
If we look at the code for the time package’s format.go, we can see a set of constants that are used to define these reference points. The comments on the right hand side are the actual values associated with them.
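An abridged sketch of those constants is below; the comment on each line shows the value it stands for.

```go
stdLongMonth   // "January"
stdMonth       // "Jan"
stdNumMonth    // "01"
stdLongWeekDay // "Monday"
stdDay         // "2"
stdZeroDay     // "02"
stdHour        // "15"
stdZeroMinute  // "04"
stdZeroSecond  // "05"
stdLongYear    // "2006"
stdYear        // "06"
```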

Let’s say we have a string 01-01-1970, aka 1 January 1970. We want Go to take this string and convert it to a Time object. The interpreter needs to know what represents what though.
Looking at the list above:

01 (our day) uses as its indicator 02
01 (our month) uses as its indicator 01
1970 (our year) uses as its indicator 2006

So our parsing string (including the dashes) for 01-01-1970 is 02-01-2006

Back to the remainder of our GUID function code :-
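A sketch of that remainder is below; the variable names follow the description in the next few paragraphs, though the exact code in the repo may differ slightly.

```go
// Parse 01-01-1970 using the reference layout 02-01-2006,
// then convert the nanoseconds elapsed since then into a base-10 string.
ad, _ := time.Parse("02-01-2006", "01-01-1970")
strsince := strconv.FormatInt(time.Since(ad).Nanoseconds(), 10)
```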

The time.Parse function takes as input the layout format and the string to be parsed. Now when we look at the code above again, it starts to make sense.

Then, we use the ad variable as a parameter in the function time.Since, assigning strsince to the value of the number of nanoseconds since that moment.

When converting the result to a string, we specify that the number should be represented as base 10 (aka decimal)

String Slices

Now we’re going to format the results of strsince into a “Windowsesque” GUID format. To do this we’re going to be using substrings with additional formatting characters.

Here’s what’s happening:

  • The value of strsince will be a 19 digit number. In my code I wanted to make it four blocks of five characters (i.e. 20 characters).
  • For the above, a zero is added onto the beginning of the string.
  • We now get into how Go deals with creating a string slice (aka substring). Go is different from archetypal formats you might have seen for creating a substring.
  • There is no direct substring function; we refer to the string within square brackets, like the array format.
    • BUT instead of a [startindex:lastindex] format (with 0 being the first item), Go uses [startindex:lastcharacternumber], i.e. the second index is one past the last index included.
  • For example:
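Here an illustrative stand-in string is used rather than strsince:

```go
s := "0123456789"
fmt.Println(s[5:8]) // prints 567, not 5678
```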

This does not give us a substring of 5678; it produces the string 567.

Index 5 is the number 5.
Character 8 (i.e. the character just before index 8) is 7, the last one included.

When we use the concatenation above, it will result in our forthcoming Transcribe jobs having a name of the following type:-
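A sketch of that concatenation is below; this is an assumption of how the repo formats it rather than the exact code.

```go
// Pad to 20 characters, then split into four blocks of five.
strsince = "0" + strsince
guid := strsince[0:5] + "-" + strsince[5:10] + "-" + strsince[10:15] + "-" + strsince[15:20]
// e.g. a job name of the form 01234-56789-01234-56789
```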

Conclusion
In this post, we’ve covered the various packages that we’ll be using, the reference date/time format, string slices, and string formatting.

In the next blog, we’re going to kick into S3 events and Lambda.

thanks for reading! Feedback always welcome. 🙂

cheersy,

Tim


Beginnings in Golang and AWS – Part III – Uploading to S3 (cont’d)

Introduction

In the previous post, we covered areas in Go, such as pointers, packages, and variables. We also closed off with using the flag package for parsing of command line parameters.

Reminder: You can find the repo for this entire project at https://github.com/tim-pringle/aws-lambda-go-transcribe2srt

The specific code for the Upload package is located within src/upload

Today’s post will begin to use AWS-specific commands, and in doing so, introduce further areas such as returned values, blank identifiers, obtaining the value from a pointer location, nil, and also conditional statements. By the end of this, we’ll be able to compile and run the code, achieving our goal of being able to upload a file to an S3 bucket.

Create a New Upload Session

Now we’re also getting into the AWS side of things. In this section, we create a new session, storing it in the sess variable. We then also use this variable to create a new uploader object.
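A sketch of this step is below; the region and profile values are assumptions, and the uploader is assumed to come from the s3manager package.

```go
// Create a session (error discarded here) and build an uploader from it.
sess, _ := session.NewSessionWithOptions(session.Options{
	Config:  aws.Config{Region: aws.String("eu-west-1")},
	Profile: "development",
})
uploader := s3manager.NewUploader(sess)
```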

There’s quite a few things going on here in this part as well, despite it only being two commands.

  • You’ll probably have already noticed the := operator, mentioned in the previous section of code. What’s different this time though is that there is a comma and _ character on the left hand side as well.
  • In Go, the output from a function is carried out via the return command. Unlike some other languages, in Go if you wish to return more than one value, it does not need to be ‘packaged’ up into an object you later have to parse. Instead, you define one or more names (solely for use within the function), with types to be returned in your function header. At the exit point of the function, you simply use the return statement along with the variables being returned that match up with the declaration. A comma is used to separate these. e.g. return x, y
  • In some circumstances you may not be interested in a specific return value from a function. In Go we can use _, also known as a blank identifier, when the program logic requires a value to be returned, but we do not want to use this value.
  • With reference to the above code, a quick look at the documentation for the session.NewSessionWithOptions function tells us it returns both a session object and an error object. So in the code above, we are simply receiving, but discarding, the error details returned.

Now we define uploader, which will allow us to use its upload functions for uploading to S3.

Validate the File Exists

We want to make sure that the filename being referred to actually exists before attempting any upload. If the file does not exist, then we want to display the error message, and then exit the program. We use the os.Open function to test this.
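A sketch of this check is below; the file variable name is an assumption, while filename is the pointer returned by the flag parsing covered previously.

```go
file, err := os.Open(*filename)
if err != nil {
	fmt.Println(err.Error())
	os.Exit(1)
}
```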

 

  • We now use both variables returned by os.Open
  • What does *filename mean? Well, remember when we assigned this variable, it was a pointer that was returned, not a value. If we were to pass it in directly, all we would be passing would be a memory address. To tell Go to pass in the value at the memory address, we prefix it with a *
  • Next, we check what err is set to. We do this via the if err != nil condition
  • The equivalent of is not equal to in go is !=
  • An uninitialized value in Go is referred to as nil, mostly akin to null in other programming languages.

Thus, our condition could read “if the value of err is not uninitialized”

The actions to be undertaken if the condition above is true are carried out within the {….} block

  • Use fmt.Println to output to the console the err.Error, which contains the error text
  • Exit the program using os.Exit, returning the error code of 1 back.

Upload the File

The final part is to carry out the upload of the object to an S3 bucket and check whether the task has completed successfully; a sketch of this follows the list below.

  • First, we define the value of key. Remember that in S3, there is no such thing as either a file, or a directory. However, we are able to define a key, which will be used for referencing it. On this occasion, we simply set the value of key to the name of the file.
  • On this occasion, we’re only interested if there’s an error occurred, as opposed to the other output of the function.
  • We use the uploader.Upload function, supplying it with a pointer to a memory location holding a value of the type UploadInput, which is in turn a struct.
  • A struct is quite simply a collection of names and values, akin to what we sometimes call hashtables in other languages.
  • In our case, we are submitting values in the struct for Bucket, Key, and Body.
  • What does & mean? In Go, prefixing a variable with an ampersand instructs it to use the memory location of it, as opposed to the value. uploader.Upload expects a pointer as the parameter.
  • Finally, we check err in exactly the same manner as previous in the code, outputting the error, if one occurs.
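A sketch of the upload step, assuming the s3manager package and that the bucket name comes from a bucket flag (the flag name is an assumption):

```go
key := *filename
_, err = uploader.Upload(&s3manager.UploadInput{
	Bucket: aws.String(*bucket),
	Key:    aws.String(key),
	Body:   file,
})
if err != nil {
	fmt.Println(err.Error())
	os.Exit(1)
}
```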

Compiling the code

That’s us finished our first program for AWS in Go! The next step is to compile the program itself.

Start a terminal session and change your current directory to the one in which the .go file is located.
To carry out the compile action, generating the executable, enter the following :-
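In its simplest form:

```bash
go build
```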

  • Building a Go package requires compiling the .go files in a directory structure. We use the go build command for that.
  • By default, go build uses an output name that is the same as the .go file without the suffix.
  • This name can be overridden using -o xxxx, where xxxx is the name of the file you wish to be generated.

You should see output similar to that of below:

Checking the Help Text

Forgotten how we use the command? If we want to get the help text for the package we’ve just compiled, we can just use:
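Assuming the binary was built with the name upload, the flag package wires up -h for us:

```bash
./upload -h
```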

Giving us the following:

Seem familiar to some code from a blog or two ago?

Running the Code

Let’s run our executable now, using a file I’ve got on my desktop.

Validating Upload

Finally, let’s double check that the file has indeed successfully uploaded.
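You can do this from the console, or from the terminal with the AWS CLI (substitute your own bucket name):

```bash
aws s3 ls s3://<your-bucket-name>/
```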

Conclusion

In this post we’ve seen how values are returned from functions and how we can use them, the use of blank identifiers to ignore information we don’t need returned, obtaining values from a pointer location, the use of nil, conditional statements, and how to compile a package. Lastly, we found out how to get help on a compiled package, and run it with parameters.

This is the first part of our three stage project out of the way. In the next part, and similar to the PowerShell blog post, we’re going to be developing code which will create a Transcribe job, using a media file we’ve uploaded to S3.

However, we’re going to make it much more funky and automagic. So in addition to Transcribe, we’ll be using S3 events, and Lambda. By the end of it, we’ll have a system in place that just requires us to drop an mp4 file into a bucket and, through the wonders of Lambda, a Transcribe job will be automatically created for us.

Thanks for reading!
