Amazon Web Services Simple Storage Service (S3) Operations

In this blog, we discuss S3 operations and key functions.

AWS S3 has buckets and objects. Let’s discuss the key operations of buckets and objects which can be utilized to perform more sophisticated data transformations. There are also some data management functions which can be used for uploading and managing large files.

AWS Bucket and Objects Operations

Listing

  • S3 allows listing of all the keys within a bucket
  • A single listing request returns at most 1,000 object keys; the response includes an indicator that shows whether it was truncated, so the remaining keys can be fetched with follow-up (paginated) requests
  • Keys within a bucket can be listed using Prefix and Delimiter
  • Prefix limits the results to keys that begin with the specified prefix (a form of filtering), while Delimiter causes the list to roll up all keys that share a common prefix into a single summary entry
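
The Prefix and Delimiter behavior described above can be sketched locally without calling AWS. The following is a minimal illustration, assuming an in-memory list of keys; the function name `list_page` and the integer `offset` used for pagination are hypothetical (the real service returns an opaque continuation token instead).

```python
from typing import List, Tuple


def list_page(all_keys: List[str], prefix: str = "", delimiter: str = "",
              max_keys: int = 1000, offset: int = 0) -> Tuple[List[str], List[str], bool]:
    """Return (keys, common_prefixes, is_truncated) for one page of a listing."""
    entries = []            # ordered listing entries: ("key", ...) or ("prefix", ...)
    seen_prefixes = set()   # each rolled-up prefix is reported only once
    for key in sorted(all_keys):
        if not key.startswith(prefix):
            continue  # Prefix acts as a filter
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll up everything up to and including the first delimiter after the prefix.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in seen_prefixes:
                seen_prefixes.add(rolled)
                entries.append(("prefix", rolled))
        else:
            entries.append(("key", key))
    page = entries[offset:offset + max_keys]
    keys = [value for kind, value in page if kind == "key"]
    common_prefixes = [value for kind, value in page if kind == "prefix"]
    is_truncated = offset + max_keys < len(entries)
    return keys, common_prefixes, is_truncated
```

For example, listing with prefix "logs/" and delimiter "/" over keys such as "logs/2023/a.txt" and "logs/readme.txt" would report "logs/readme.txt" as a key and "logs/2023/" as a rolled-up common prefix.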

Retrieval

  • An object can be retrieved as a whole
  • An object can be retrieved in parts or partially (specific range of bytes) by using the Range HTTP header
  • The Range HTTP header is helpful when only part of an object is needed (e.g. when multiple files were uploaded as a single archive) and for fault-tolerant downloads where the network connectivity is poor
  • Objects can also be downloaded by sharing pre-signed URLs
  • Metadata of the object is returned in the response headers
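
As a rough sketch of how a partial or chunked download could be planned, the helper below computes the Range header values needed to fetch an object piece by piece. The function name `range_headers` is hypothetical, and no request is actually sent; note that the end offset in an HTTP byte range is inclusive.

```python
from typing import List


def range_headers(object_size: int, chunk_size: int) -> List[str]:
    """Return one HTTP Range header value per chunk of the object."""
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size must be > 0")
    headers = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1  # byte ranges are inclusive
        headers.append(f"bytes={start}-{end}")
    return headers
```

If one chunk fails on a poor connection, only that byte range needs to be requested again.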

Object Uploads

  • Single operation – Objects up to 5GB in size can be uploaded in a single PUT operation
  • Multipart upload – Must be used for objects larger than 5GB and supports a max size of 5TB. It is recommended for objects larger than 100MB
  • Pre-signed URLs can also be shared for uploading objects
  • A successful response confirms that the object was uploaded. Additionally, the returned ETag can be compared with the locally calculated MD5 value of the uploaded object; this applies to objects uploaded in a single PUT without SSE-KMS or SSE-C encryption, because in the other cases the ETag is not a plain MD5 digest
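
The ETag comparison mentioned above could look like the following minimal sketch. The function name `etag_matches_md5` is hypothetical; the check assumes a single-PUT upload whose ETag is the hex MD5 of the body, and treats an ETag containing a hyphen (as multipart uploads produce) as not comparable.

```python
import hashlib


def etag_matches_md5(data: bytes, etag: str) -> bool:
    """Compare a returned ETag with the locally computed MD5 of the uploaded bytes."""
    cleaned = etag.strip('"').lower()  # ETag header values are wrapped in double quotes
    if "-" in cleaned:
        # Multipart-style ETag: not a plain MD5 digest, so this check does not apply.
        return False
    return cleaned == hashlib.md5(data).hexdigest()
```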

Copying Objects

  • Objects up to 5GB can be copied in a single operation; larger objects, up to 5TB, must be copied using the multipart upload API
  • When an object is copied, user-controlled system metadata (e.g. storage class) and user-defined metadata are also copied
  • System-controlled metadata, e.g. the creation date, is reset
  • Copying objects can be needed to:
    • Create multiple copies of an object
    • Copy objects across locations or regions
    • Rename objects
    • Change object metadata, e.g. storage class, encryption, etc
  • Updating any metadata for an object requires all the metadata fields to be specified again
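
Because a metadata update replaces the whole set of fields, a caller typically merges the existing metadata with the changes before copying the object onto itself. The sketch below only builds the parameters and does not call AWS; the function name `copy_with_new_metadata` is hypothetical, while the dictionary keys follow the parameter names of boto3's `copy_object` call.

```python
from typing import Dict


def copy_with_new_metadata(bucket: str, key: str, current: Dict[str, str],
                           changes: Dict[str, str]) -> dict:
    """Build copy parameters that keep existing metadata and apply the changes."""
    merged = dict(current)   # start from every existing field...
    merged.update(changes)   # ...then overwrite or add the changed ones
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},  # copy the object onto itself
        "Metadata": merged,
        "MetadataDirective": "REPLACE",  # metadata is replaced as a whole, not patched
    }
```

Passing only the changed field instead of the merged dictionary would drop every other user-defined field from the object.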

Deleting Objects

  • S3 allows deletion of a single object or multiple objects (max 1000) in a single call
  • For non-versioned buckets, the object key needs to be provided and the object is permanently deleted
  • For versioned buckets, if only an object key is provided, S3 inserts a delete marker, and the previously current version becomes a non-current version
  • If an object key with a version ID is provided, that specific version is permanently deleted
  • If the version ID is that of a delete marker, the delete marker is removed and the most recent non-current version becomes the current version again
  • MFA Delete can be enabled on a versioned bucket to add extra security for deletions
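
The delete rules for versioned buckets can be illustrated with a small in-memory model. This is a toy sketch, not how S3 is implemented; the class `VersionedBucket` and its version ID format ("v1", "v2", ...) are hypothetical.

```python
import itertools
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy model of delete semantics in a versioning-enabled bucket."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[dict]] = {}  # key -> versions, newest last
        self._ids = itertools.count(1)

    def put(self, key: str, body: bytes) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append(
            {"id": version_id, "body": body, "delete_marker": False})
        return version_id

    def delete(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        versions = self._versions.setdefault(key, [])
        if version_id is None:
            # Key only: insert a delete marker, older versions stay as non-current.
            marker_id = f"v{next(self._ids)}"
            versions.append({"id": marker_id, "body": None, "delete_marker": True})
            return marker_id
        # Key plus version ID: permanently remove that version (or delete marker).
        self._versions[key] = [v for v in versions if v["id"] != version_id]
        return None

    def get(self, key: str) -> Optional[bytes]:
        versions = self._versions.get(key, [])
        if not versions or versions[-1]["delete_marker"]:
            return None  # the real service would answer with a 404
        return versions[-1]["body"]
```

In this model, deleting by key hides the object, deleting the delete marker by its version ID brings the latest version back, and deleting a data version by ID removes it for good.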

Restoring Objects from Glacier

  • Archived objects must be restored before they can be accessed
  • Restoration of an object can take about 3 to 5 hours for standard retrievals. S3 Glacier also offers expedited retrievals, which complete within minutes
  • The restoration request also needs to specify the number of days for which the restored copy should be kept
  • During this period, storage cost applies for both the archive and the copy
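
A restore request therefore carries two pieces of information: how long to keep the temporary copy and which retrieval tier to use. The sketch below builds such a request body without calling AWS; the function name `restore_request` is hypothetical, the dictionary shape follows the `RestoreRequest` parameter of boto3's `restore_object`, and "Bulk" is a third tier not covered in this post.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def restore_request(days: int, tier: str = "Standard") -> dict:
    """Build the body of a restore request for an archived object."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {sorted(VALID_TIERS)}")
    return {
        "Days": days,                             # lifetime of the restored copy
        "GlacierJobParameters": {"Tier": tier},   # retrieval speed
    }
```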

AWS S3 Key Functions

S3 also provides some key functions which will be handy when working on large data sets and data migration. Let’s discuss them below:

Pre-Signed URLs

  • All buckets and objects are by default private
  • Pre-signed URLs allow a user to download or upload a specific object without requiring AWS security credentials or permissions
  • Pre-signed URLs allow anyone access to the object identified in the URL, provided the creator of the URL has permissions to access that object
  • Creating a pre-signed URL requires the creator to provide their security credentials and to specify a bucket name, an object key, an HTTP method (GET to download an object, PUT to upload one), and an expiration date and time
  • Pre-signed URLs are valid only until the expiration date and time
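
To make the idea concrete, here is a simplified sketch of a signed, expiring URL. It is not AWS Signature Version 4: it uses a plain HMAC-SHA256 over the method, path, and expiry, and the host `storage.example.com`, the query parameter names, and the functions `presign` and `verify` are all hypothetical. It only shows why the URL is tied to one object, one HTTP method, and one expiration time.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def presign(secret: bytes, method: str, bucket: str, key: str, expires_at: int) -> str:
    """Create a URL whose signature covers the method, object path, and expiry."""
    path = f"/{bucket}/{quote(key)}"
    to_sign = f"{method}\n{path}\n{expires_at}".encode()
    signature = hmac.new(secret, to_sign, hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expires_at, "Signature": signature})
    return f"https://storage.example.com{path}?{query}"


def verify(secret: bytes, method: str, url: str, now: Optional[float] = None) -> bool:
    """Server-side check: signature must match and the URL must not be expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["Expires"][0])
        signature = query["Signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires_at:
        return False  # valid only until the expiration time
    to_sign = f"{method}\n{parts.path}\n{expires_at}".encode()
    expected = hmac.new(secret, to_sign, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

A URL signed for GET fails verification when used with PUT, and any URL fails once the expiry has passed, which mirrors the properties listed above.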

Multipart Upload

  • Multipart upload allows the user to upload a single large object as a set of parts. Each part is a contiguous portion of the object’s data
  • Multipart uploads support 1 to 10000 parts, and each part can be from 5MB to 5GB with last part size allowed to be less than 5MB
  • Multipart uploads allow max upload size of 5TB
  • Object parts can be uploaded independently and in any order. If transmission of any part fails, it can be retransmitted without affecting other parts
  • After all parts of the object are uploaded and the completion request is sent, S3 assembles these parts and creates the object
  • Using multipart upload provides the following advantages:
    • Improved throughput – Parallel upload of parts to improve throughput
    • Quick recovery from any network issues – Smaller part size minimizes the impact of restarting a failed upload due to a network error
    • Pause and resume object uploads – Object parts can be uploaded over time. Once a multipart upload is initiated there is no expiry; you must explicitly complete or abort the multipart upload
    • Begin an upload before the final object size is known – an object can be uploaded as it is being created
  • Three step process:
    • Multipart Upload Initiation
      • Initiation of a Multipart upload request to S3 returns a unique ID for each multipart upload
      • This ID needs to be provided with each part upload, with the completion or abort request, and with the call that lists parts
      • All the object metadata required needs to be provided during the Initiation call
    • Parts Upload
      • Parts upload of objects can be performed using the unique upload ID
      • A part number (between 1 – 10000) needs to be specified with each request which identifies each part and its position in the object
      • If a part with the same part number is uploaded, the previous part would be overwritten
      • After the part upload is successful, S3 returns an ETag header in the response which must be recorded along with the part number to be provided during the multipart completion request
    • Multipart Upload Completion or Abort
      • On Multipart Upload Completion request, S3 creates an object by concatenating the parts in ascending order based on the part number and associates the metadata with the object
      • Multipart upload completion request should include the unique upload ID with all the parts and the ETag information
      • S3 response includes an ETag that uniquely identifies the combined object data
      • On a multipart upload abort request, the upload is aborted and the uploaded parts are removed. Any new part upload for that upload ID would fail. However, part uploads that are already in progress might still complete, so the abort request should be sent after all in-progress part uploads have finished (or be repeated) to make sure every part is removed
      • S3 must receive a multipart upload completion or abort request; otherwise, it does not delete the uploaded parts, and their storage continues to be charged
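
The three-step process and the size limits above can be sketched with a part planner and a toy in-memory store. Everything here is illustrative and runs without AWS: the names `plan_parts` and `InMemoryMultipartStore` are hypothetical, the store does not enforce the 5MB minimum part size (so it can be tried with tiny payloads), and the MD5 hex digest is used only as a stand-in for the ETag that S3 returns for each part.

```python
import hashlib
import uuid
from typing import Dict, List, Tuple

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB          # applies to every part except the last
MAX_PART_SIZE = 5 * 1024 ** 3    # 5GB
MAX_PARTS = 10_000


def plan_parts(object_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Split an object into (part_number, offset, length) tuples."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5MB and 5GB")
    count = -(-object_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("more than 10000 parts; choose a larger part_size")
    return [(n + 1, n * part_size, min(part_size, object_size - n * part_size))
            for n in range(count)]


class InMemoryMultipartStore:
    """Toy stand-in for the service side of a multipart upload."""

    def __init__(self) -> None:
        self._uploads: Dict[str, dict] = {}
        self.objects: Dict[str, bytes] = {}

    def initiate(self, key: str) -> str:
        upload_id = uuid.uuid4().hex  # unique ID for this multipart upload
        self._uploads[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> str:
        if not 1 <= part_number <= MAX_PARTS:
            raise ValueError("part_number must be between 1 and 10000")
        etag = hashlib.md5(data).hexdigest()
        # Re-uploading the same part number overwrites the previous part.
        self._uploads[upload_id]["parts"][part_number] = (etag, data)
        return etag

    def complete(self, upload_id: str, parts: List[Tuple[int, str]]) -> None:
        upload = self._uploads[upload_id]
        body = b""
        for part_number, etag in sorted(parts):  # ascending part number
            stored_etag, data = upload["parts"][part_number]
            if stored_etag != etag:
                raise ValueError(f"ETag mismatch for part {part_number}")
            body += data
        self.objects[upload["key"]] = body
        del self._uploads[upload_id]

    def abort(self, upload_id: str) -> None:
        self._uploads.pop(upload_id, None)  # discard all stored parts
```

Parts can be sent in any order, for example part 2 before part 1; the completion call sorts them by part number before assembling the object.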

S3 Access Points

  • S3 Access Points simplify data access for any AWS service or customer application that stores data in S3
  • Access Points are named network endpoints that are attached to buckets and can be used to perform S3 object operations, such as GetObject and PutObject
  • Each Access Point has distinct permissions and network controls that S3 applies for any request that is made through that Access Point
  • Each Access Point enforces a customized access point policy that works in conjunction with the bucket policy that is attached to the underlying bucket
  • An Access Point can be configured to accept requests only from a VPC to restrict S3 data access to a private network
  • Custom block public access settings can be configured for each Access Point

S3 Transfer Acceleration

  • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket
  • Transfer Acceleration takes advantage of CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to S3 over an optimized network path
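
Clients opt in to acceleration by sending requests to the bucket's accelerate endpoint instead of the regular one. A minimal sketch of building that endpoint is shown below; the function name `accelerate_endpoint` is hypothetical, and it assumes acceleration has already been enabled on the bucket. Bucket names containing periods are rejected because they cannot be used with Transfer Acceleration.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if not bucket:
        raise ValueError("bucket name must not be empty")
    if "." in bucket:
        raise ValueError("bucket names with periods cannot use Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"
```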

S3 Batch Operations

  • S3 Batch Operations help perform large-scale batch operations on S3 objects and can perform a single operation on lists of specified S3 objects
  • A single job can perform a specified operation on billions of objects containing exabytes of data
  • S3 tracks progress, sends notifications, and stores a detailed completion report of all actions, providing a fully managed, auditable, and serverless experience
  • S3 Batch Operations can take the object list from an S3 Inventory report, and S3 Select can be used to filter the objects
  • S3 Batch Operations can be used for copying objects, modifying object metadata, applying ACLs, encrypting objects, transforming objects, invoking a custom Lambda function, etc
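
Besides an S3 Inventory report, the list of objects for a job can be supplied as a CSV manifest with one "bucket,key" row per object. The sketch below builds such a manifest in memory; the function name `build_manifest` is hypothetical, and leaving "/" unencoded while URL-encoding the rest of the key is an assumption that should be checked against the Batch Operations documentation.

```python
import csv
import io
from typing import Iterable
from urllib.parse import quote


def build_manifest(bucket: str, keys: Iterable[str]) -> str:
    """Return CSV manifest text with one 'bucket,key' row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Keys are URL-encoded; "/" is kept as-is here (assumption).
        writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()
```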

AWS S3 for Seamless Data Operations

As you can see, S3 provides multiple functions for data transformation, data management, large file uploads, and batch operations.
