Amazon Web Services Simple Storage Service (S3) Operations

Venkateshwar Rao Nagapuri

Manager

Nov 29, 2022

Posted in Cloud

In this blog, we discuss S3 operations and key functions.

AWS S3 stores data as objects inside buckets. Let’s discuss the key bucket and object operations, which can be combined to perform more sophisticated data transformations. There are also data management functions that can be used for uploading and managing large files.

AWS Bucket and Object Operations

Listing

S3 allows listing of all the keys within a bucket
A single listing request returns at most 1,000 object keys; the response carries an indicator (IsTruncated) that shows whether more keys remain, which supports pagination
Keys within a bucket can be listed using Prefix and Delimiter
Prefix limits the results to keys that begin with the specified prefix (a kind of filtering), and Delimiter causes the list to roll up all keys that share a common prefix into a single summary entry
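
The Prefix and Delimiter behaviour can be illustrated with a small local sketch. It does not call AWS; the function name `list_keys` and the sample keys are made up for illustration, and the logic only approximates how a listing response is grouped.

```python
def list_keys(keys, prefix="", delimiter=""):
    """Mimic how a listing groups keys by Prefix and Delimiter (local sketch only)."""
    contents = []          # keys returned individually
    common_prefixes = []   # rolled-up summary entries
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # Prefix acts as a filter
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll up everything through the first delimiter after the prefix
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


sample_keys = [
    "logs/2022/11/a.log",
    "logs/2022/12/b.log",
    "logs/readme.txt",
    "images/cat.png",
]
contents, prefixes = list_keys(sample_keys, prefix="logs/", delimiter="/")
print(contents)   # ['logs/readme.txt']
print(prefixes)   # ['logs/2022/']
```

Both log files under `logs/2022/` collapse into one summary entry, which is what makes a flat key space browsable like folders.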

Retrieval

An object can be retrieved as a whole
An object can be retrieved in parts or partially (specific range of bytes) by using the Range HTTP header
The Range HTTP header is helpful when only part of an object is needed, e.g. when multiple files were uploaded as a single archive, or for fault-tolerant downloads where network connectivity is poor
Objects can also be downloaded by sharing pre-signed URLs
Metadata of the object is returned in the response headers
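
As a rough sketch of ranged retrieval, the helper below splits an object of known size into Range header values that could be requested one at a time and retried individually. The function name and the sizes are illustrative assumptions; no request is actually sent.

```python
def byte_ranges(object_size, chunk_size):
    """Return HTTP Range header values that together cover the whole object."""
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # Range ends are inclusive, hence the "- 1"
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(2500, 1000))
# ['bytes=0-999', 'bytes=1000-1999', 'bytes=2000-2499']
```

On a poor connection, a failed chunk can be requested again without downloading the earlier chunks a second time.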

Object Uploads

Single operation – Objects up to 5GB in size can be uploaded in a single PUT operation
Multipart upload – Required for objects larger than 5GB and supports a maximum object size of 5TB. It is recommended for objects larger than 100MB
Pre-signed URLs can also be shared for uploading objects
A successful response confirms that the object was uploaded. Additionally, the returned ETag can be compared to the locally calculated MD5 value of the uploaded object. This comparison applies to single-operation uploads without SSE-KMS or SSE-C encryption; the ETag of a multipart upload is not a plain MD5 of the object
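
A minimal sketch of the ETag check, assuming a single-operation upload where the ETag is the hex MD5 of the body. The helper names are hypothetical and the ETag string is passed in by hand rather than read from a real response.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the bytes that were uploaded."""
    return hashlib.md5(data).hexdigest()


def etag_matches(etag: str, data: bytes) -> bool:
    """Compare a returned ETag with the local MD5 (ETags arrive wrapped in double quotes)."""
    return etag.strip('"').lower() == md5_hex(data)


body = b"hello world"
print(md5_hex(body))  # 5eb63bbbe01eeed093cb22bb8f5acdc3
print(etag_matches('"5eb63bbbe01eeed093cb22bb8f5acdc3"', body))  # True
```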

Copying Objects

Objects up to 5GB can be copied in a single operation; larger objects, up to 5TB, are copied using the multipart upload API
When an object is copied, user-controlled system metadata (e.g. storage class) and user-defined metadata are also copied
System-controlled metadata, e.g. the creation date, is reset
Copying objects is useful to:
- Create multiple copies of an object
- Copy objects across locations or regions
- Rename objects (copy to the new key, then delete the original)
- Change object metadata, e.g. storage class, encryption, etc
Updating any metadata for an object requires all the metadata fields to be specified again
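
Because a metadata update replaces the whole set, one approach is to merge the existing metadata with the changes before copying the object onto itself. The sketch below only builds the arguments; the helper name, bucket, and key are hypothetical, and the argument names are assumed to follow boto3's `copy_object` call.

```python
def replace_metadata_args(bucket, key, current_metadata, changes):
    """Build copy arguments that rewrite an object's user-defined metadata in place."""
    # All fields must be sent again, so start from the current metadata
    merged = {**current_metadata, **changes}
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},  # copy the object onto itself
        "Metadata": merged,
        "MetadataDirective": "REPLACE",  # use the supplied metadata instead of copying it
    }


args = replace_metadata_args(
    "example-bucket",
    "reports/q3.csv",
    current_metadata={"owner": "finance", "reviewed": "no"},
    changes={"reviewed": "yes"},
)
print(args["Metadata"])  # {'owner': 'finance', 'reviewed': 'yes'}
# With boto3 installed and credentials configured, this could be passed as:
# boto3.client("s3").copy_object(**args)
```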

Deleting Objects

S3 allows deletion of a single object or multiple objects (max 1000) in a single call
For non-versioned buckets, the object key needs to be provided and the object is permanently deleted
For versioned buckets, if only an object key is provided, S3 inserts a delete marker, and the previously current version becomes a non-current version
If an object key with a version ID is provided, that version of the object is permanently deleted
If the version ID is that of a delete marker, the delete marker is removed and the most recent non-current version becomes the current version again
MFA Delete can be enabled on versioned buckets for extra security
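
The delete rules for versioned buckets can be followed with a toy in-memory model of a single key. This is not S3 and makes no AWS calls; the class and its version IDs are invented purely to trace the behaviour described above.

```python
import itertools


class VersionedKey:
    """Toy model of one key in a versioning-enabled bucket."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.versions = []  # oldest first; the last entry is the current version

    def put(self, data):
        version_id = f"v{next(self._ids)}"
        self.versions.append({"id": version_id, "data": data, "delete_marker": False})
        return version_id

    def delete(self, version_id=None):
        if version_id is None:
            # Key only: add a delete marker, older versions are kept
            marker_id = f"v{next(self._ids)}"
            self.versions.append({"id": marker_id, "data": None, "delete_marker": True})
            return marker_id
        # Key plus version ID: permanently remove that version (or delete marker)
        self.versions = [v for v in self.versions if v["id"] != version_id]
        return version_id

    def get(self):
        if not self.versions or self.versions[-1]["delete_marker"]:
            return None  # the real service would answer with "not found"
        return self.versions[-1]["data"]


obj = VersionedKey()
first = obj.put("report draft")
marker = obj.delete()   # inserts a delete marker
print(obj.get())        # None
obj.delete(marker)      # removing the marker restores the previous version
print(obj.get())        # report draft
obj.delete(first)       # permanent delete of that version
print(obj.get())        # None
```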

Restoring Objects from Glacier

Archived objects must be restored before they can be accessed
Standard retrievals typically take about 3 to 5 hours. S3 Glacier also offers expedited retrievals that complete within minutes
The restoration request must also specify the number of days for which the restored copy is kept
During this period, storage cost applies for both the archive and the copy
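
A restore request therefore carries two choices: how long to keep the copy and how fast to retrieve it. The sketch below builds such a request as a plain dictionary; the helper name is hypothetical, and the field names are assumed to match the `RestoreRequest` argument of boto3's `restore_object`.

```python
ALLOWED_TIERS = {"Standard", "Bulk", "Expedited"}


def build_restore_request(days, tier="Standard"):
    """Build the body of a restore request for an archived object."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"tier must be one of {sorted(ALLOWED_TIERS)}")
    return {
        "Days": days,  # how long the restored copy is kept
        "GlacierJobParameters": {"Tier": tier},  # retrieval speed
    }


request = build_restore_request(days=7, tier="Expedited")
print(request)
# {'Days': 7, 'GlacierJobParameters': {'Tier': 'Expedited'}}
# With boto3 installed and credentials configured, this could be sent as:
# boto3.client("s3").restore_object(
#     Bucket="example-bucket", Key="archive.zip", RestoreRequest=request)
```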

AWS S3 Key Functions

S3 also provides some key functions which will be handy when working on large data sets and data migration. Let’s discuss them below:

Pre-Signed URLs

All buckets and objects are by default private
Pre-signed URLs allow a user to download or upload a specific object without requiring AWS security credentials or permissions
Pre-signed URLs allow anyone access to the object identified in the URL, provided the creator of the URL has permissions to access that object
Creating a pre-signed URL requires the creator to provide their security credentials and to specify a bucket name, an object key, an HTTP method (GET for downloading objects and PUT for uploading objects), and an expiration date and time
Pre-signed URLs are valid only until the expiration date and time

Multipart Upload

Multipart upload allows the user to upload a single large object as a set of parts. Each part is a contiguous portion of the object’s data
Multipart uploads support 1 to 10000 parts, and each part can be from 5MB to 5GB with last part size allowed to be less than 5MB
Multipart uploads allow max upload size of 5TB
Object parts can be uploaded independently and in any order. If transmission of any part fails, it can be retransmitted without affecting other parts
After all parts of the object are uploaded and the completion request is sent, S3 assembles these parts and creates the object
Using multipart upload provides the following advantages:
- Improved throughput – Parallel upload of parts to improve throughput
- Quick recovery from any network issues – Smaller part size minimizes the impact of restarting a failed upload due to a network error
- Pause and resume object uploads – Object parts can be uploaded over time. Once a multipart upload is initiated there is no expiry; you must explicitly complete or abort the multipart upload
- Begin an upload before the final object size is known – an object can be uploaded as it is being created
Three step process:
- Multipart Upload Initiation
  - Initiation of a Multipart upload request to S3 returns a unique ID for each multipart upload
  - This ID needs to be provided with every part upload, list parts, completion, and abort request
  - All the object metadata required needs to be provided during the Initiation call
- Parts Upload
  - Parts upload of objects can be performed using the unique upload ID
  - A part number (between 1 – 10000) needs to be specified with each request which identifies each part and its position in the object
  - If a part with the same part number is uploaded, the previous part would be overwritten
  - After the part upload is successful, S3 returns an ETag header in the response which must be recorded along with the part number to be provided during the multipart completion request
- Multipart Upload Completion or Abort
  - On Multipart Upload Completion request, S3 creates an object by concatenating the parts in ascending order based on the part number and associates the metadata with the object
  - Multipart upload completion request should include the unique upload ID with all the parts and the ETag information
  - S3 response includes an ETag that uniquely identifies the combined object data
  - On a multipart upload abort request, the upload is aborted and all parts are removed. Any new part upload would fail. However, part uploads that are already in progress may still complete, so the abort request may need to be sent again after they finish to free all the storage used by the parts
  - S3 must receive a multipart upload completion or abort request; otherwise it does not delete the uploaded parts and their storage continues to be charged
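
The part limits above can be turned into a small planning helper that decides part numbers, offsets, and lengths before any upload starts. This is a local sketch only: the function name and the 8MB default part size are assumptions, and the limits are treated as binary units.

```python
import math

MIB = 1024 ** 2
MIN_PART = 5 * MIB            # minimum size for every part except the last
MAX_PARTS = 10_000            # part numbers run from 1 to 10000
MAX_OBJECT = 5 * 1024 ** 4    # 5TB maximum object size


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_number, offset, length) tuples covering the whole object."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object_size must be between 1 byte and 5TB")
    # Grow the part size when needed so the object fits in 10000 parts
    part_size = max(part_size, MIN_PART, math.ceil(object_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)  # the last part may be smaller
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


for part in plan_parts(20 * MIB):
    print(part)
# (1, 0, 8388608)
# (2, 8388608, 8388608)
# (3, 16777216, 4194304)
```

Each tuple maps to one part upload request; the ETag returned for that part would be recorded next to its part number for the completion request.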

S3 Access Points

S3 Access Points simplify data access for any AWS service or customer application that stores data in S3
Access Points are named network endpoints that are attached to buckets and can be used to perform S3 object operations, such as GetObject and PutObject
Each Access Point has distinct permissions and network controls that S3 applies for any request that is made through that Access Point
Each Access Point enforces a customized access point policy that works in conjunction with the bucket policy that is attached to the underlying bucket
An Access Point can be configured to accept requests only from a VPC to restrict S3 data access to a private network
Custom block public access settings can be configured for each Access Point

S3 Transfer Acceleration

S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket
Transfer Acceleration takes advantage of CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to S3 over an optimized network path

S3 Batch Operations

S3 Batch Operations help perform large-scale batch operations on S3 objects and can perform a single operation on lists of specified S3 objects
A single job can perform a specified operation on billions of objects containing exabytes of data
S3 tracks progress, sends notifications, and stores a detailed completion report of all actions, providing a fully managed, auditable, and serverless experience
S3 Batch Operations can be used with S3 Inventory to get the object list and with S3 Select to filter the objects
S3 Batch Operations can be used for copying objects, modifying object metadata, applying ACLs, encrypting objects, transforming objects, invoking a custom Lambda function, etc

AWS S3 for Seamless Data Operations

As you can see, S3 provides multiple functions for data transformation, data management, large file uploads, and batch operations.
