When Marvin Gaye Met Amazon Transcribe & PowerShell – Automating Subtitle Creation – Part III

“I Heard it Through the Grape Vernon”

Part two of this series saw us put the code in place to allow us to upload the media file, create the transcription job, and download the results. With this complete, it’s time to move on to the next, and final, stage. That is, processing the JSON file, and creating an SRT file from it.

NB. You can find the code used in this blog, and additional documentation, at my GitHub project, aws-powershell-transcribe2srt.

Initially, we’ll read the contents of the JSON file into a variable and specifically use the items section (found under results), which contains the word-by-word breakdown. At the same time, some other variables are set with defaults.
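As a rough sketch of that first step, something like the following would do. The function name Get-TranscribeItem is made up for this example and is not necessarily what the project uses; the only thing taken from the Transcribe output is the results.items path.

```powershell
# Hypothetical helper (name is an assumption): returns the word-by-word
# items from a downloaded Amazon Transcribe result file.
function Get-TranscribeItem {
    param([Parameter(Mandatory)][string]$Path)

    # -Raw reads the file as one string, so ConvertFrom-Json sees a single document.
    $json = Get-Content -Path $Path -Raw | ConvertFrom-Json

    # results.items holds one entry per word or punctuation mark.
    return @($json.results.items)
}
```

Wrapping the call as @(Get-TranscribeItem -Path $file) keeps the result an array even if the transcript only contains one item.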

Next, the Transcription variable needs to be processed. There are several things that need to be taken into account:

  • The obvious one is that we need to parse through each item in the object, requiring a loop.
  • We also want to ensure that the number of characters displayed per line does not exceed recommendations.
  • In relation to the above, the end time of each sequence needs to be set to the end time of its last word.
  • Lastly, punctuation needs to be taken into account.

Outer Loop – Beginning

The outer loop is a while loop. At the start of each pass, the variable $strlen is set to zero. This variable will contain a count of the number of characters in the current sequence being processed.

Process the Start Time Attribute

Then the time at which the word was said, start_time, is read. In the original JSON, this is a string holding the number of seconds (with a fractional part) since the start of the media.
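To give a feel for the shape of a single item, here is an illustrative one parsed in PowerShell. The values are invented for the example and are not taken from a real transcription job.

```powershell
# Illustrative item only: the field names follow the Transcribe output,
# the values are made up.
$sampleItem = @'
{
  "start_time": "12.34",
  "end_time": "12.81",
  "alternatives": [ { "confidence": "0.98", "content": "grapevine" } ],
  "type": "pronunciation"
}
'@ | ConvertFrom-Json

# start_time arrives as a string, not a number.
$sampleItem.start_time
```

Note that punctuation items carry no start_time or end_time of their own, which is why the end time of a sequence has to come from its last word.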

However, a time element in an SRT file consists of the number of hours, minutes, seconds, and milliseconds, in the form hh:mm:ss,mmm. All but the last of these use a fixed two-character zero-padded digit format. The last uses three digits and is separated from the seconds by a comma.

We’ll convert this string into the required style by first converting it to a timespan object, and then using string formatting to produce the SRT layout. Variables are also set for the subtitle text and the sequence number, along with a flag indicating that we are on the first line of this subtitle sequence. A variable is set for the end time too. Items are then added to the sequence until its length exceeds 64 characters.
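A minimal sketch of that conversion is below. The function name ConvertTo-SrtTimestamp is an assumption for this example. It goes via decimal and ticks rather than a double, purely to avoid floating point rounding surprises; the original code may well do it differently.

```powershell
# Hypothetical helper (name is an assumption): turns a Transcribe time
# string such as "3661.5" into the SRT form hh:mm:ss,mmm.
function ConvertTo-SrtTimestamp {
    param([Parameter(Mandatory)][string]$Seconds)

    # Parse with the invariant culture so "." is always the decimal separator.
    $invariant = [System.Globalization.CultureInfo]::InvariantCulture
    $value = [decimal]::Parse($Seconds, $invariant)

    # Build the timespan from ticks (1 second = 10,000,000 ticks).
    $ticks = [long]($value * [timespan]::TicksPerSecond)
    $span = [timespan]::FromTicks($ticks)

    # Literal characters in a custom timespan format must be escaped with "\".
    return $span.ToString('hh\:mm\:ss\,fff')
}

ConvertTo-SrtTimestamp -Seconds '3661.5'   # 01:01:01,500
```

The hh specifier is the hours component rather than total hours, so this sketch assumes the media is under 24 hours long.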

Inner Loop

An inner loop is also required, since we want the subtitles to be refreshed after two lines and with a maximum of 64 characters. Whether the item is a pronunciation (aka word) or punctuation needs to be taken into account, as well as setting the end time marker for the sequence when two lines have been occupied.

The item's type (pronunciation or punctuation) and its content are read, and the running character count is increased by the length of the content. Based on the type of item, the content is appended to the subtitle string accordingly: words are preceded by a space, punctuation is not. When the length of the string exceeds 32 characters and we are still on the first row, a new line character is added, and the flag is updated to indicate that the subsequent content will be on the second line.
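The inner loop could be sketched along these lines. The function name, parameter names and returned property names are all assumptions for illustration. It also takes two small liberties with the description above: the line break is only inserted just before a word, and trailing punctuation is kept with the word it follows, so that a line or sequence never starts with a comma or full stop.

```powershell
# Hypothetical helper (all names are assumptions): builds one two-line
# subtitle sequence starting at item index $Start.
function Get-SubtitleSequence {
    param(
        [Parameter(Mandatory)][object[]]$Items,
        [int]$Start = 0,
        [int]$MaxChars = 64,   # limit for the whole sequence
        [int]$LineChars = 32   # limit for the first line
    )
    $text = ''
    $strlen = 0          # characters of content so far (spaces not counted)
    $firstLine = $true
    $endTime = $null
    $i = $Start

    # Keep going until the limit is exceeded; trailing punctuation stays with its word.
    while ($i -lt $Items.Count -and
           ($strlen -le $MaxChars -or $Items[$i].type -eq 'punctuation')) {
        $item = $Items[$i]
        $content = $item.alternatives[0].content

        if ($item.type -eq 'punctuation') {
            # Punctuation attaches directly to the preceding word.
            $text += $content
        } else {
            if ($firstLine -and $strlen -gt $LineChars) {
                # First line is full: move to the second line.
                $text += "`n"
                $firstLine = $false
            } elseif ($text.Length -gt 0) {
                $text += ' '
            }
            $text += $content
            # Only words carry timings, so the last word sets the end time.
            $endTime = $item.end_time
        }
        $strlen += $content.Length
        $i++
    }
    [pscustomobject]@{ Text = $text; EndTime = $endTime; NextIndex = $i }
}
```

The outer loop would then call this repeatedly, feeding NextIndex back in as -Start until every item has been consumed.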

Outer Loop – End

When the inner loop is complete, it signifies a new sequence is ready. This part simply creates the appropriate representation of the sequence as a string, and appends it to the variable holding the entire contents of what will become the SRT file.
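For reference, an SRT entry is the sequence number, a line with the start and end times separated by -->, the subtitle text, and then a blank line. A sketch of building one entry and appending it might look like this. Format-SrtEntry and its parameter names are made up for the example; $srtinfo is the variable name used in this post.

```powershell
# Hypothetical helper (name is an assumption): renders one SRT entry as a string.
function Format-SrtEntry {
    param(
        [int]$Sequence,
        [string]$StartTime,
        [string]$EndTime,
        [string]$Text
    )
    # Number, timing line, text, then a blank line to close the entry.
    "{0}`n{1} --> {2}`n{3}`n`n" -f $Sequence, $StartTime, $EndTime, $Text
}

# Append the entry to the string that will become the SRT file.
$srtinfo = ''
$srtinfo += Format-SrtEntry -Sequence 1 -StartTime '00:00:00,000' -EndTime '00:00:01,500' -Text 'Hello, world.'
```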

Writing the SRT File

Lastly, the contents of the $srtinfo variable are written to file.
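A small sketch of that last step follows. Save-SrtFile is a hypothetical name, and deriving the SRT path from the media file name is my own assumption here, chosen so that the two files share a name apart from the extension.

```powershell
# Hypothetical helper (name is an assumption): writes the SRT text next to
# the media file, using the same base name with an .srt extension.
function Save-SrtFile {
    param(
        [Parameter(Mandatory)][string]$MediaPath,
        [Parameter(Mandatory)][string]$Content
    )
    # e.g. C:\video\marvin.mp4 becomes C:\video\marvin.srt
    $srtPath = [System.IO.Path]::ChangeExtension($MediaPath, '.srt')

    # UTF8 keeps any non-ASCII characters in the transcript intact.
    Set-Content -Path $srtPath -Value $Content -Encoding UTF8
    return $srtPath
}
```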

Viewing the Results

At this point, it really depends on what you want to do with the SRT file and the accompanying media file. Media players like VLC allow you to manually add an SRT file to a playing video, and most current televisions with USB will be happy to display the subtitles provided the filenames (without extension) match.
If you really want to go full (non-pron) hardcore, you could add the SRT info as a stream directly into the media file, using a tool like FFmpeg, which allows you to multiplex, or even to “burn” the subtitles onto the video. Using this method, vloggers really wanting to reach their audience could make subtitles in multiple languages available in the one video file.
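If you fancy trying the FFmpeg route from PowerShell, the sketch below builds the argument list for either approach. It is not part of the project; the function name is made up, and it assumes an MP4 output (mov_text is the subtitle codec for that container) and an FFmpeg build that includes the subtitles filter for burning.

```powershell
# Hypothetical helper (name is an assumption): returns FFmpeg arguments to
# either multiplex an SRT file as a subtitle stream or burn it into the video.
function Get-FfmpegSubtitleArgs {
    param(
        [Parameter(Mandatory)][string]$MediaPath,
        [Parameter(Mandatory)][string]$SrtPath,
        [Parameter(Mandatory)][string]$OutputPath,
        [switch]$Burn
    )
    if ($Burn) {
        # Re-encodes the video with the text drawn onto every frame.
        return @('-i', $MediaPath, '-vf', "subtitles=$SrtPath", $OutputPath)
    }
    # Copies audio and video untouched and adds the SRT as a selectable stream.
    return @('-i', $MediaPath, '-i', $SrtPath, '-c', 'copy', '-c:s', 'mov_text', $OutputPath)
}
```

With FFmpeg on the path, the result can be splatted onto the command, for example $ffArgs = Get-FfmpegSubtitleArgs -MediaPath 'in.mp4' -SrtPath 'in.srt' -OutputPath 'out.mp4' followed by & ffmpeg @ffArgs. Paths containing colons or quotes need extra escaping inside the subtitles filter, which this sketch does not attempt.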


The combination of the AWS services S3 and Transcribe, coupled with PowerShell and the AWS module for it, makes it a relatively straightforward process to obtain a transcription of a media file which can be converted to SRT format for later use.

Also, as a final word, bear in mind that as a service meant for transcribing relatively quiet environments with spoken (not sung) words, sometimes the results for a media file can be a little bit…misunderstood. Often with humorous results… 🙂

Marvin’s on his own

Let’s help him with some words

That’s Marvin SRT’d

Wrong words, but let’s karaoke anyway

Thanks for reading, and feedback always welcome.

Coming soon…a slight departure from the norm with an advanced version of this but using Golang.

Think I need to see about getting me another domain name…
