S3 URI Parsing is now available in AWS SDK for Java 2.x

May 8, 2023 By Mark Otto Off

The AWS SDK for Java team is pleased to announce the general availability of Amazon Simple Storage Service (Amazon S3) URI parsing in the AWS SDK for Java 2.x. You can now parse path-style and virtual-hosted-style S3 URIs to easily retrieve the bucket, key, region, style, and query parameters. The new parseUri() API and S3Uri class provide the highly-requested parsing features that many customers miss from the AWS SDK for Java 1.x. Please note that Amazon S3 AccessPoints and Amazon S3 on Outposts URI parsing are not supported.

Motivation

Users often need to extract important components like bucket and key from stored S3 URIs to use in S3Client operations. The new parsing APIs allow users to conveniently do so, bypassing the need for manual parsing or storing the components separately.

Getting Started

To begin, first add the dependency for S3 to your project.

<dependency> <groupId>software.amazon.awssdk</groupId> <artifactId>s3</artifactId> <version>${s3.version}</version>
</dependency>

Next, instantiate S3Client and S3Utilities objects.

S3Client s3Client = S3Client.create();
S3Utilities s3Utilities = s3Client.utilities();

Parsing an S3 URI

To parse your S3 URI, call parseUri() from S3Utilities, passing in the URI. This will return a parsed S3Uri object. If you have a String of the URI, you’ll need to convert it into an URI object first.

String url = "https://s3.us-west-1.amazonaws.com/myBucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88";
URI uri = URI.create(url);
S3Uri s3Uri = s3Utilities.parseUri(uri);

With the S3Uri, you can call the appropriate getter methods to retrieve the bucket, key, region, style, and query parameters. If the bucket, key, or region is not specified in the URI, an empty Optional will be returned. If query parameters are not specified in the URI, an empty map will be returned. If the field is encoded in the URI, it will be returned decoded.

Region region = s3Uri.region().orElse(null); // Region.US_WEST_1
String bucket = s3Uri.bucket().orElse(null); // "myBucket"
String key = s3Uri.key().orElse(null); // "resources/doc.txt"
boolean isPathStyle = s3Uri.isPathStyle(); // true

Retrieving query parameters

There are several APIs for retrieving the query parameters. You can return a Map<String, List<String>> of the query parameters. Alternatively, you can specify a query parameter to return the first value for the given query, or return the list of values for the given query.

Map<String, List<String>> queryParams = s3Uri.rawQueryParameters(); // {versionId=["abc123"], partNumber=["77", "88"]}
String versionId = s3Uri.firstMatchingRawQueryParameter("versionId").orElse(null); // "abc123"
String partNumber = s3Uri.firstMatchingRawQueryParameter("partNumber").orElse(null); // "77"
List<String> partNumbers = s3Uri.firstMatchingRawQueryParameters("partNumber"); // ["77", "88"]

Caveats

Special Characters

If you work with object keys or query parameters with reserved or unsafe characters, they must be URL-encoded, e.g., replace whitespace " " with "%20".

Valid:
"https://s3.us-west-1.amazonaws.com/myBucket/object%20key?query=%5Bbrackets%5D"

Invalid:
"https://s3.us-west-1.amazonaws.com/myBucket/object key?query=[brackets]"

Virtual-hosted-style URIs

If you work with virtual-hosted-style URIs with bucket names that contain a dot, i.e., ".", the dot must not be URL-encoded.

Valid:
"https://my.Bucket.s3.us-west-1.amazonaws.com/key"

Invalid:
"https://my%2EBucket.s3.us-west-1.amazonaws.com/key"

Conclusion

In this post, I discussed parsing S3 URIs in the AWS SDK for Java 2.x and provided code examples for retrieving the bucket, key, region, style, and query parameters. To learn more about how to set up and begin using the feature, visit our Developer Guide. If you are curious about how it is implemented, check out the source code on GitHub. As always, the AWS SDK for Java team welcomes bug reports, feature requests, and pull requests on the aws-sdk-java-v2 GitHub repository.