AWS SDK Exploration in Node

Installing AWS SDK for JavaScript

In this post, I am going to show you how to install and use the AWS SDK for JavaScript. We will use it with Node.js, although it can also be used in plain JavaScript in the browser. As you may already know, everything in AWS is an API call. Each service has its own set of APIs, and you invoke these APIs to interact with AWS resources. Every API call must be signed and authenticated in order to succeed. Luckily, the AWS SDK handles much of the complexity of calling these APIs for you.

In Cloud9, it is really easy to install the AWS SDK for Node.js. First, verify the version of Node by running the following command in the Cloud9 terminal:

node -v

Next, use npm to install the AWS SDK for JavaScript:

npm install aws-sdk

The code we are going to look at reads dragon data from an S3 bucket using Amazon S3 Select. S3 Select is a feature of S3 that lets you read data out of an S3 object using SQL queries. Being able to query a subset of the data in an S3 object is really powerful. As you may know, S3 charges you based on the number of requests, the amount of data stored, and the data transferred. You might be looking for ways to reduce the amount of data you read, and therefore transfer, from the bucket in order to save cost or processing time.

Often, when working with larger datasets, you might download the entire object from S3 and then go through it line by line, searching for the relevant data. With S3 Select, you submit an SQL query and S3 returns only the relevant data. This is cheaper and faster than downloading the entire object when you don't need all of it. All of the processing and filtering happens on the S3 side, not in your code. We'll use this API throughout this example to query our dragon data, which is stored in S3. S3 Select only supports certain data formats. In our case, the data is in JSON format, which S3 Select does support.

The other AWS API we call in the code is AWS Systems Manager Parameter Store. Parameter Store lets you store user-defined key-value pairs, and we use it to store the name of the S3 bucket where the dragon data is located, as well as the file name, or key. This way, if the bucket name or the key ever changes, we won't need to change the code. Instead, the parameter can be updated in Parameter Store, and the code will pull the latest value every time it runs.

That's all the code does for now, and we will be adding to and modifying it over time. I recommend that you go through the AWS SDK for JavaScript documentation here: 'docs.aws.amazon.com/sdk-for-javascript/v3/d.. and read through the services, see what exists, and try incorporating different AWS services into your code.

Now, let's look at our file, which we named listDragons.js. You can open it in the Cloud9 environment.

var AWS = require("aws-sdk")

const s3 = new AWS.S3({
    region:"us-east-1"
})
const ssm = new AWS.SSM({
    region:"us-east-1"
})

async function readDragons(){
     var fileName = await getFileName()
     var bucketName  = await getBucketName()
     return  readDragonsFromS3(bucketName, fileName)
}

async function getFileName(){
   var fileNameParams = {
     Name: 'dragon_data_file_name',
     WithDecryption:false
   };
    var promise = await ssm.getParameter(fileNameParams).promise();
    return promise.Parameter.Value;
}

async function getBucketName(){
    var bucketNameParams ={
      Name:'dragon_data_bucket_name',
      WithDecryption:false

    };
   var promise = await ssm.getParameter(bucketNameParams).promise();
   return promise.Parameter.Value;

}

function readDragonsFromS3(bucketName, fileName){
  // SDK method names are camelCase: selectObjectContent, not SelectObjectContent
  s3.selectObjectContent({
     Bucket: bucketName,
     Expression: 'select * from s3object s',
     ExpressionType: 'SQL',
     Key: fileName, // parameter names are case-sensitive: Key, not key
     InputSerialization: {
        JSON: {
           Type: 'DOCUMENT',
        }
     },
     OutputSerialization: {
        JSON: {
           RecordDelimiter: ','
        }
     }
  }, function(err, data){
      if(err){
         console.log(err)
      }else{
         handleData(data)
      }
  })
}


function handleData(data) {
   data.Payload.on('data', (event)=>{
     if(event.Records){
         console.log(event.Records.Payload.toString())
     }
   })
}

readDragons()

Step-by-Step Code Explanation

First, we load the AWS package into the Node app using require so that we can use the AWS SDK in our script. The variable AWS now holds the aws-sdk module.

var AWS = require("aws-sdk")

Next, we create the service objects for S3 and Parameter Store. The SDK provides access to AWS services through what are called client classes. From these client classes, you create service objects. Each AWS service has one or more client classes that offer low-level APIs for using the service's features and resources.

const s3 = new AWS.S3({
    region:"us-east-1"
});

const ssm = new AWS.SSM({
    region:"us-east-1"
});

For example, in our code here, Amazon S3 APIs are available through the AWS.S3 class. Once you have a service object created, you can then invoke the methods on that client which will call the AWS APIs for you.
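Both service objects above use the same region. As a minimal sketch, assuming you would rather set the region in one place, the two constructors could be wrapped in a small helper. The name createClients is hypothetical, and the sdk argument is assumed to be the object returned by require("aws-sdk"):

```javascript
// Sketch: build both service objects with the same region.
// `createClients` is a hypothetical helper, not part of the AWS SDK.
// `sdk` is assumed to be the object returned by require("aws-sdk").
function createClients(sdk, region) {
  const resolvedRegion = region || "us-east-1"; // default taken from the post
  return {
    s3: new sdk.S3({ region: resolvedRegion }),
    ssm: new sdk.SSM({ region: resolvedRegion }),
  };
}

// Usage (requires the aws-sdk package to be installed):
//   const { s3, ssm } = createClients(require("aws-sdk"), "us-east-1");
```

Passing the SDK in as an argument is only a convenience for this sketch; it keeps the helper easy to try out without real AWS credentials.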

Just know that whatever you are trying to do with AWS, always read the SDK documentation as you work through and try to figure out what methods exist on the clients and how to use them for whatever AWS service you are trying to interact with.

I've organized the code around a function called readDragons(), which calls three other functions.

async function readDragons(){
     var fileName = await getFileName()
     var bucketName  = await getBucketName()
     return  readDragonsFromS3(bucketName, fileName)
}

The first function it calls is getFileName(), shown below:

async function getFileName(){
   var fileNameParams = {
     Name: 'dragon_data_file_name',
     WithDecryption:false
   };
    var promise = await ssm.getParameter(fileNameParams).promise();
    return promise.Parameter.Value;
}

This function sets up the request parameters for Parameter Store: the name of the parameter we want to read, and whether it needs to be decrypted. This particular parameter is not encrypted, so WithDecryption is set to false. We then call ssm.getParameter() with those parameters, call .promise() on the request, and await the result. The variable named promise therefore holds the resolved response, not a pending promise, and we return promise.Parameter.Value, which pulls the value off the parameter. In real life, you probably want some error handling here, but to keep the demonstration clear, we're keeping it simple. Back in readDragons(), the returned file name is saved in the fileName variable.
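The error handling mentioned above could look something like the following. This is a minimal sketch, not the post's actual code: getParameterValue is a hypothetical helper, and ssmClient is assumed to be an AWS.SSM service object like the ssm constant created earlier.

```javascript
// Sketch: read one Parameter Store value with basic error handling.
// `getParameterValue` is a hypothetical helper name.
// `ssmClient` is assumed to be an AWS.SSM service object.
async function getParameterValue(ssmClient, name) {
  try {
    const response = await ssmClient
      .getParameter({ Name: name, WithDecryption: false })
      .promise();
    return response.Parameter.Value;
  } catch (err) {
    // The SDK sets err.code, e.g. "ParameterNotFound" when the name does not exist.
    throw new Error(`Could not read parameter "${name}": ${err.code || err.message}`);
  }
}

// Usage: const fileName = await getParameterValue(ssm, "dragon_data_file_name");
```

Rethrowing with the parameter name in the message makes it easier to tell which lookup failed when several parameters are read in a row.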

Next, we call getBucketName().

async function getBucketName(){
    var bucketNameParams ={
      Name:'dragon_data_bucket_name',
      WithDecryption:false

    };
   var promise = await ssm.getParameter(bucketNameParams).promise();
   return promise.Parameter.Value;

}

This does exactly the same thing as getFileName(), but reads a different parameter from Parameter Store. The parameter is called "dragon_data_bucket_name", and it is not encrypted. We call ssm.getParameter() with those parameters, call .promise(), await the result, and return promise.Parameter.Value. Once we have our file name and our bucket name, we call the readDragonsFromS3() function, passing in both bucketName and fileName.
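Since the two functions differ only in the parameter name, they could share one helper, and the two lookups could run concurrently because neither depends on the other. The following is a sketch under those assumptions. The names readParameter and getDragonLocation are hypothetical, and ssmClient is assumed to be an AWS.SSM service object:

```javascript
// Sketch: one shared lookup helper, with both lookups started together.
// `readParameter` and `getDragonLocation` are hypothetical names.
async function readParameter(ssmClient, name) {
  const response = await ssmClient
    .getParameter({ Name: name, WithDecryption: false })
    .promise();
  return response.Parameter.Value;
}

async function getDragonLocation(ssmClient) {
  // Promise.all starts both requests before waiting for either to finish.
  const [fileName, bucketName] = await Promise.all([
    readParameter(ssmClient, "dragon_data_file_name"),
    readParameter(ssmClient, "dragon_data_bucket_name"),
  ]);
  return { bucketName, fileName };
}

// Usage: const { bucketName, fileName } = await getDragonLocation(ssm);
```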

The readDragonsFromS3() function is shown below:

function readDragonsFromS3(bucketName, fileName){
  // SDK method names are camelCase: selectObjectContent, not SelectObjectContent
  s3.selectObjectContent({
     Bucket: bucketName,
     Expression: 'select * from s3object s',
     ExpressionType: 'SQL',
     Key: fileName, // parameter names are case-sensitive: Key, not key
     InputSerialization: {
        JSON: {
           Type: 'DOCUMENT',
        }
     },
     OutputSerialization: {
        JSON: {
           RecordDelimiter: ','
        }
     }
  }, function(err, data){
      if(err){
         console.log(err)
      }else{
         handleData(data)
      }
  })
}

In this function, we call selectObjectContent() on the s3 client. selectObjectContent() is the SDK method for the S3 Select API call, SelectObjectContent. We pass in the bucket name and the expression. In this case, the expression is a select * from the S3 object, so we're just pulling back all of the information in the JSON object stored in S3. In practice, you would probably use S3 Select with SQL queries that do some filtering, typically with a WHERE clause, because the real power of S3 Select is picking specific pieces of information out of a large JSON object.
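To illustrate the filtering idea, here is a minimal sketch that builds the request parameters with a WHERE clause. The helper name buildSelectParams and the family field are assumptions made for illustration; the post does not show the structure of the dragon data, so whether this query matches anything depends on how the JSON document is laid out.

```javascript
// Sketch: build S3 Select parameters that filter with a WHERE clause.
// `buildSelectParams` and the `family` field are hypothetical; the real
// field names depend on the structure of the dragon JSON data.
function buildSelectParams(bucketName, fileName, family) {
  // Accept only simple values so the string is safe inside the SQL literal.
  if (!/^[A-Za-z0-9_ -]+$/.test(family)) {
    throw new Error("Unsupported characters in filter value");
  }
  return {
    Bucket: bucketName,
    Key: fileName,
    ExpressionType: "SQL",
    Expression: `select * from s3object s where s.family = '${family}'`,
    InputSerialization: { JSON: { Type: "DOCUMENT" } },
    OutputSerialization: { JSON: { RecordDelimiter: "," } },
  };
}

// Usage: s3.selectObjectContent(buildSelectParams(bucketName, fileName, "red"), callback);
```

Rejecting unexpected characters is a deliberately blunt choice for this sketch: it avoids having to reason about quoting rules when a value is spliced into the query string.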

The expression type is SQL, the input serialization and output serialization are both set to JSON, and the key is the fileName that we grabbed from Parameter Store. On error, we just log the error. Again, in real life you probably want more robust error handling. If the call succeeds, we handle the data by calling the handleData() function:


function handleData(data) {
   data.Payload.on('data', (event)=>{
     if(event.Records){
         console.log(event.Records.Payload.toString())
     }
   })
}

We pass in the data, and for each 'data' event on the payload that contains records, we simply log them. We aren't doing anything with this data yet; we're just logging it so that we can see the output. Finally, we call the readDragons() function:

readDragons()

Calling the readDragons() function kicks off this whole process.
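The handleData() function above logs each chunk as it arrives. If you later want the results as JavaScript values, one possible approach is to collect the chunks and parse them once the stream ends. This is a sketch, not the post's code: collectRecords and parseRecords are hypothetical helpers, and the parsing step assumes the ',' RecordDelimiter used in this post.

```javascript
// Sketch: collect the S3 Select output instead of logging each chunk.
// `collectRecords` and `parseRecords` are hypothetical helpers.
// `payload` is assumed to be the data.Payload event stream from selectObjectContent.
function collectRecords(payload) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    payload.on("data", (event) => {
      // Only Records events carry result bytes; other events (such as Stats) are skipped.
      if (event.Records) {
        chunks.push(event.Records.Payload);
      }
    });
    payload.on("error", reject);
    // Join the buffers before decoding so multi-byte characters are not split.
    payload.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
  });
}

// With RecordDelimiter ',' the output is a run of JSON values, each followed
// by a comma, so drop the trailing comma and wrap the rest in brackets.
function parseRecords(text) {
  return JSON.parse("[" + text.replace(/,\s*$/, "") + "]");
}

// Usage, inside the selectObjectContent callback:
//   collectRecords(data.Payload).then(parseRecords).then(console.log).catch(console.error);
```

Because readDragons() is an async function, calling it as readDragons().catch(console.error) would also surface failures from the Parameter Store lookups rather than leaving a rejected promise unhandled.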

Summary

In summary, we discussed how to install the AWS SDK for JavaScript and use it with Node.js, although it can also be used in plain JavaScript in the browser. We saw that everything in AWS is an API call: each service has its own set of APIs, and you invoke these APIs to interact with AWS resources. Every API call must be signed and authenticated in order to succeed.