RDS Snapshot Restoration with S3
Both automated and manually created DB snapshots work here. If you'd like to export DB cluster data to S3, follow these steps.
1. Create an AWS KMS (Key Management Service) key
Create a KMS key at https://aws.amazon.com/kms/. The export to S3 is encrypted with this key.
2. Prepare an IAM role for accessing the bucket
Go to your IAM Role > Policy
and set the policies you need, like the one below.
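If you prefer the command line over the console, the key can probably be created with the AWS CLI as well. This is a minimal sketch, not the only way to do it: `create_export_key` is a hypothetical helper name, and it assumes the AWS CLI is installed and configured with permission to create keys.

```shell
# Hypothetical helper: create a KMS key (symmetric by default) and print its ARN.
# $1 = free-text description for the key
create_export_key() {
  aws kms create-key \
    --description "$1" \
    --query 'KeyMetadata.Arn' \
    --output text
}

# Usage (runs against your AWS account):
#   KMS_KEY_ARN=$(create_export_key "RDS snapshot export key")
```

Keep the printed ARN around; the export step needs it.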
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ExportPolicy",
"Effect": "Allow",
"Action": [
"s3:PutObject*",
"s3:ListBucket",
"s3:GetObject*",
"s3:DeleteObject*",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::<YOUR_BUCKET_NAME>",
"arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
]
}
]
}
Then, go to your IAM Role > Trust Relationship and set the following.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "export.rds.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Here, "Service": "export.rds.amazonaws.com" is the key part: it lets the RDS export service assume this role.
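The role setup can also be scripted. The sketch below assumes you saved the trust relationship as trust-policy.json and the permissions policy as export-policy.json in the current directory; both file names and the helper name `create_export_role` are made up for illustration.

```shell
# Hypothetical helper: create the role with the trust relationship,
# then attach the S3 permissions as an inline policy.
# $1 = role name
create_export_role() {
  aws iam create-role \
    --role-name "$1" \
    --assume-role-policy-document file://trust-policy.json &&
  aws iam put-role-policy \
    --role-name "$1" \
    --policy-name ExportPolicy \
    --policy-document file://export-policy.json
}

# Usage (runs against your AWS account):
#   create_export_role rds-s3-export-role
```

An inline policy keeps the example short; a managed policy attached to the role should work just as well.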
Then you can export your snapshots to the S3 bucket.
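The export itself can be started from the console or, as sketched here, with `aws rds start-export-task`. The helper names and the argument order are assumptions of this example; the ARNs are the ones for your snapshot, the IAM role from step 2, and the KMS key from step 1.

```shell
# Hypothetical helper: start exporting a snapshot to S3.
# $1 = export task identifier (a name you choose)
# $2 = snapshot ARN, $3 = bucket name, $4 = IAM role ARN, $5 = KMS key ARN or ID
start_snapshot_export() {
  aws rds start-export-task \
    --export-task-identifier "$1" \
    --source-arn "$2" \
    --s3-bucket-name "$3" \
    --iam-role-arn "$4" \
    --kms-key-id "$5"
}

# Hypothetical helper: print the current status of an export task.
export_status() {
  aws rds describe-export-tasks \
    --export-task-identifier "$1" \
    --query 'ExportTasks[0].Status' \
    --output text
}
```

Exports can take a while on larger snapshots, so polling `export_status` until the task finishes is usually more convenient than refreshing the console.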
3. Check your data
Now you can download table data from the .parquet files in your S3 bucket.
Let's take a quick look at the data, either from the terminal with the AWS CLI (S3 Select) or with parquet-cli. From the terminal:
$ aws s3api select-object-content \
--bucket <YOUR_BUCKET_NAME> \
--key "<YOUR_PARQUET_OBJECT_KEY>" \
--expression "select * from s3object limit 10" \
--expression-type 'SQL' \
--input-serialization '{"Parquet": {}, "CompressionType": "NONE"}' \
--output-serialization '{"JSON": {}}' "output.json"
Here, <YOUR_PARQUET_OBJECT_KEY> looks like /path/to/your/exported/parquet/file/page-num/some-uid.000.gz.parquet
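If you are not sure what the object keys are, listing the bucket is one way to find them. A rough sketch, assuming the default `aws s3 ls` output layout (date, time, size, key) and keys without spaces; `list_parquet_keys` is a hypothetical helper name.

```shell
# Hypothetical helper: print the keys of all .parquet objects under a prefix.
# $1 = bucket name, $2 = key prefix (may be empty)
list_parquet_keys() {
  aws s3 ls "s3://$1/$2" --recursive |
    awk '$4 ~ /\.parquet$/ {print $4}'
}

# Usage (runs against your AWS account):
#   list_parquet_keys <YOUR_BUCKET_NAME> ""
```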
With parquet-cli, download a .parquet file first, then print its rows as a JSON array:
$ brew install parquet-cli
$ parquet cat <YOUR_PARQUET_FILE> | awk 'BEGIN {print "["} {print (NR==1 ? "" : ","), $0} END {print "]"}'
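The awk part is what turns one-record-per-line output into a single JSON array: it prints an opening bracket, prefixes every line after the first with a comma, and closes the bracket at the end. To see that in isolation, here is the same filter wrapped in a function (the name `to_json_array` is made up) and fed two sample lines instead of real parquet output.

```shell
# Wrap newline-delimited JSON records in a JSON array.
to_json_array() {
  awk 'BEGIN {print "["} {print (NR==1 ? "" : ","), $0} END {print "]"}'
}

# Sample input standing in for the output of `parquet cat`.
printf '%s\n' '{"id": 1}' '{"id": 2}' | to_json_array
```

The result is spread over several lines but is valid JSON, so it can be piped straight into a tool like jq.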