Amazon S3: Awesome usage of the s3cmd tool

s3cmd is an awesome command-line tool for managing buckets with tons of files and folders. One of my recent projects had 5.27 lakh (527,000) image files spread across 2,234 folders.

From what I have seen, the AWS S3 management console makes it hard:

a. To check whether a file exists among tons of files.
b. To list the files inside a folder.
c. To publish folders to the public.
d. To move huge files/folders to Amazon S3.
e. To download files from S3.
f. To set metadata values on lots of folders and files.

To the best of my limited knowledge, I have not seen a tool as powerful as s3cmd anywhere on the web. CloudBerry Explorer and S3bucket Explorer are useful, but not good for managing huge numbers of files, as they take a long time for each update.

How do I install the s3cmd tool on Linux?

a. Download the yum repo file for your distribution from http://s3tools.org/repositories

-sh-3.2# wget http://s3tools.org/repo/RHEL_5/s3tools.repo
-sh-3.2# mv s3tools.repo /etc/yum.repos.d/
-sh-3.2# yum install s3cmd
Loaded plugins: downloadonly, fastestmirror
Loading mirror speeds from cached hostfile
 * addons: mirror.isoc.org.il
 * base: centos.secrel.com.br
 * epel: mirror.nexcess.net
 * extras: centos.secrel.com.br
 * rpmforge: mirror.rit.edu
 * updates: centos.secrel.com.br
s3tools                                                  | 1.3 kB     00:00
s3tools/primary                                          | 1.0 kB     00:00
s3tools                                                                     3/3
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package s3cmd.i386 0:1.0.0-4.1 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package          Arch            Version              Repository          Size
================================================================================
Installing:
 s3cmd            i386            1.0.0-4.1            s3tools             92 k
Transaction Summary
================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)
Total download size: 92 k
Is this ok [y/N]:
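Once the install completes, a quick sanity check confirms the binary is in place (the exact version string will vary with what the repo ships):

-sh-3.2# s3cmd --version
s3cmd version 1.0.0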

b. Installing from the source code

Alternatively, download the source code, extract the tarball (the file name will match the version you downloaded), and install:

#wget http://sourceforge.net/projects/s3tools/files/s3cmd
#tar xzf s3cmd-1.5.0-alpha1.tar.gz
#cd /home/installation/s3cmd-1.5.0-alpha1
#python setup.py install

That’s it !

Providing the S3 Access Key and Secret Key to the s3cmd config file

-sh-3.2# s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3
Access Key: AKsdfsdfdsfdfTG4XQ
Secret Key: 7Kl+2OdfdfdfsdfsMfV2m+isM3K
Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]:
On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't conect to S3 directly
HTTP Proxy server name:
New settings:
  Access Key: AKIAdsfdfdG4XQ
  Secret Key: 7Kl+2O2X/usdfdsfdfsdfsdsMfV2m+isM3K
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name:
  HTTP Proxy server port: 0
Test access with supplied credentials? [Y/n] y

You're almost done!! Let's start exploring 🙂

S3cmd commands and examples

a. Creating New Bucket

-sh-3.2# s3cmd mb  s3://s3test_bucket
Bucket 's3://s3test_bucket/' created
-sh-3.2#

b. Listing buckets

-sh-3.2# s3cmd ls
2013-11-11 04:00  s3://s3test_bucket
2013-11-06 04:36  s3://testrain
2013-11-05 08:23  s3://buckjet1
2013-10-10 08:27  s3://bucket2

c. Deleting a bucket

-sh-3.2# s3cmd rb s3://s3test_bucket
Bucket 's3://s3test_bucket/' removed
-sh-3.2#

Note: You need to delete all the objects from the bucket before deleting it. Otherwise it won't work:

-sh-3.2# s3cmd rb s3://s3test_bucket
ERROR: S3 error: 409 (BucketNotEmpty): The bucket you tried to delete is not empty

Solution: Delete all the files from the bucket first, e.g. s3cmd del s3://bucketname/*
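For example, assuming the test bucket above, you would first empty the bucket and then remove it:

-sh-3.2# s3cmd del --recursive --force s3://s3test_bucket/
-sh-3.2# s3cmd rb s3://s3test_bucket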

d. How to upload files to an S3 bucket

s3cmd does not create new folders in S3 by itself, but it will copy your local folders to S3 as-is. So you need to include the parent folder name if you want the same folder created there. I have a few files (1.jpg, 2.jpg and 3.jpg) in a folder "images" that I want to upload to S3:

-sh-3.2# s3cmd put --recursive images  s3://s3test_bucket/
images/1.jpg -> s3://s3test_bucket/images/1.jpg  [1 of 3]
 3513491 of 3513491   100% in    6s   542.84 kB/s  done
images/2.jpg -> s3://s3test_bucket/images/2.jpg  [2 of 3]
 3155440 of 3155440   100% in    3s   838.35 kB/s  done
images/3.jpg -> s3://s3test_bucket/images/3.jpg  [3 of 3]
 3240736 of 3240736   100% in    3s   920.91 kB/s  done
-sh-3.2#

e. Uploading a file to S3

-sh-3.2# s3cmd put images/1.jpg  s3://s3test_bucket/


If you want to upload the files directly into the bucket, without the parent folder:

-sh-3.2# s3cmd put --recursive images/*  s3://s3test_bucket/


Note: You cannot easily modify metadata on folders or files after upload, so always use --add-header along with the put command. It is also tedious to make all folders public afterwards. You can do it all in one command:

-sh-3.2# s3cmd put --recursive --acl-public --add-header="Expires:`date -u +"%a, %d %b %Y %H:%M:%S GMT" --date "+5 years"`" --add-header='Cache-Control:max-age=31536000, public' /root/test/img/  s3://s3test_bucket/
/root/test/img/4.jpg -> s3://s3test_bucket/4.jpg  [1 of 3]
 3513491 of 3513491   100% in    3s  1033.40 kB/s  done
Public URL of the object is: http://s3.amazonaws.com/s3test_bucket/4.jpg
/root/test/img/5.jpg -> s3://s3test_bucket/5.jpg  [2 of 3]
 3155440 of 3155440   100% in    4s   710.08 kB/s  done
Public URL of the object is: http://s3.amazonaws.com/s3test_bucket/5.jpg

f. How do I list the S3 files

-sh-3.2# s3cmd ls  s3://s3test_bucket/
                       DIR   s3://s3test_bucket/images/
2013-11-11 04:26   3513491   s3://s3test_bucket/1.jpg
2013-11-11 04:26   3155440   s3://s3test_bucket/2.jpg
2013-11-11 04:26   3240736   s3://s3test_bucket/3.jpg
-sh-3.2#
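To hunt for a single file among tons of objects (pain point (a) above), a recursive listing piped through grep works well. A minimal sketch using the bucket above:

-sh-3.2# s3cmd ls --recursive s3://s3test_bucket/ | grep 2.jpg
2013-11-11 04:26   3155440   s3://s3test_bucket/2.jpg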

g. How do I delete a folder from S3 bucket

If you want to delete a folder "images" from s3test_bucket, apply the following command:

-sh-3.2# s3cmd del  --recursive  s3://s3test_bucket/images
File s3://s3test_bucket/images/1.jpg deleted
File s3://s3test_bucket/images/2.jpg deleted
File s3://s3test_bucket/images/3.jpg deleted
-sh-3.2#

h. Delete a file from S3 bucket

-sh-3.2# s3cmd del s3://s3test_bucket/1.jpg
File s3://s3test_bucket/1.jpg deleted
-sh-3.2#

Another example: s3cmd del --recursive --force s3://s3test_bucket/
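Downloading (pain point (e) above) works the same way in reverse with the get command. A quick sketch, using the same bucket and folder as above:

-sh-3.2# s3cmd get s3://s3test_bucket/1.jpg
-sh-3.2# s3cmd get --recursive s3://s3test_bucket/images/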

i. How do I publish a folder globally

You can use setacl --acl-public --recursive to publish S3 objects:

-sh-3.2# s3cmd  setacl --acl-public --recursive  s3://s3test_bucket/img/
s3://s3test_bucket/img/4.jpg: ACL set to Public  [1 of 3]
s3://s3test_bucket/img/5.jpg: ACL set to Public  [2 of 3]
s3://s3test_bucket/img/6.jpg: ACL set to Public  [3 of 3]
-sh-3.2#
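If you want to double-check the ACL that ended up on an object, s3cmd info prints its details, including the ACL grants (output omitted here):

-sh-3.2# s3cmd info s3://s3test_bucket/img/4.jpg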

j. How do I synchronize a local folder to S3 storage

You can use the "sync" command to push local changes up to an Amazon S3 bucket.
Command: sync --acl-public --recursive
Local folder: /root/test/img
Destination: s3://s3test_bucket/img

Usage :

-sh-3.2# s3cmd sync  --acl-public --recursive /root/test/img/ s3://s3test_bucket/img/
-sh-3.2# touch /root/test/img/test3.txt
-sh-3.2# s3cmd sync  --acl-public --recursive /root/test/img/ s3://s3test_bucket/img/
/root/test/img/test3.txt -> s3://s3test_bucket/img/test3.txt  [1 of 1]
 0 of 0     0% in    0s     0.00 B/s  done
-sh-3.2#

k. How do I synchronize an S3 bucket to a local folder (S3 incremental backup)

The command "sync --skip-existing --delete-removed" will check both the local and remote locations.
Source: s3://s3test_bucket/
Destination: /root/test

 [root@host1 webfiles]# /usr/bin/s3cmd  sync   --skip-existing   --delete-removed     s3://s3test_bucket/    /root/test
s3://s3test_bucket/SSL-adhi.JPG -> <fdopen>  [1 of 1]
 36217 of 36217   100% in    0s   805.32 kB/s  done
Done. Downloaded 36217 bytes in 0.0 seconds, 718.89 kB/s
[root@host1 webfiles]# /usr/bin/s3cmd  sync   --skip-existing   --delete-removed     s3://s3test_bucket/    /root/test
deleted: /root/test/SSL-adhi.JPG
s3://s3test_bucket/Hub_2_node_added.jpg -> <fdopen>  [1 of 3]
 31495 of 31495   100% in    0s    67.24 kB/s  done
s3://s3test_bucket/Hub_node_added.jpg -> <fdopen>  [2 of 3]
 145180 of 145180   100% in    0s   181.13 kB/s  done
s3://s3test_bucket/SE_standalone.jpg -> <fdopen>  [3 of 3]
 16248 of 16248   100% in    0s    41.15 kB/s  done
Done. Downloaded 268946 bytes in 2.8 seconds, 93.20 kB/s

Please note the file SSL-adhi.JPG, which was downloaded the first time and deleted the next time, since it had been removed from the S3 bucket. Now I'm creating a file sample.txt locally to confirm that the one-way mirroring really happens:

[root@host1 webfiles]# /usr/bin/s3cmd  sync   --skip-existing   --delete-removed     s3://s3test_bucket/    /root/test/
[root@host1 webfiles]# touch /root/test/sample.txt
[root@host1 webfiles]# /usr/bin/s3cmd  sync   --skip-existing   --delete-removed     s3://s3test_bucket/    /root/test/
deleted: /root/test/sample.txt
[root@host1 webfiles]#

The file sample.txt was deleted the next time the synchronization ran.
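To turn this into a nightly incremental backup, the same command can go into cron. A sketch assuming the paths above (the schedule, 2 AM daily, is arbitrary):

0 2 * * * /usr/bin/s3cmd sync --skip-existing --delete-removed s3://s3test_bucket/ /root/test/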

l. How do I set metadata for certain types of files inside a folder

I have a few folders in an S3 bucket where images and style sheets are copied. Now I want to set a different Expires header value for the JS and style files, as these files are updated frequently. You may want to create separate folders per file type to make this easier.

a. Setting an Expires header: --add-header="Expires:`date -u +"%a, %d %b %Y %H:%M:%S GMT" --date "+5 years"`"
b. Setting cache expiration: --add-header='Cache-Control:max-age=31536000, public'
c. Flagging gzip compression: --add-header='Content-Encoding: gzip'

The following s3cmd command may suit common CSS and JS objects in Amazon S3:

-sh-3.2# s3cmd put --recursive --acl-public --add-header="Expires:`date -u +"%a, %d %b %Y %H:%M:%S GMT" --date "+5 years"`" --add-header='Cache-Control:max-age=31536000, public'  --add-header='Content-Encoding: gzip'  /root/test/img/  s3://s3test_bucket/js-scripts/
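Keep in mind that the Content-Encoding: gzip header only tells the browser the content is compressed; it does not compress anything itself. The files must actually be gzipped before upload, e.g. for a hypothetical style.css:

-sh-3.2# gzip -9 -c style.css > style.css.gz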

m. How do I check the size of a bucket or of folders in a bucket

Usage : s3cmd du -H s3://bucketname

-sh-3.2# s3cmd du  -H  s3://s3test_bucket/js-scripts
112k     s3://s3test_bucket/js-scripts
-sh-3.2# s3cmd du  -H  s3://s3test_bucket/
112k     s3://s3test_bucket/
-sh-3.2#

n. How to remove public access from objects in a bucket

Eg: s3cmd setacl s3://s3test_bucket/secret_files --acl-private --recursive

root@cc1606 [~/test]# s3cmd setacl s3://mydomain.us/test/ --acl-private --recursive                  
s3://mydomain.us/test/asd.txt: ACL set to Private  [1 of 4]
s3://mydomain.us/test/asd1.txt: ACL set to Private  [2 of 4]
s3://mydomain.us/test/asd2.txt: ACL set to Private  [3 of 4]
s3://mydomain.us/test/sample.txt: ACL set to Private  [4 of 4]

o. Configuring multiple S3 accounts in the s3cmd tool

You can set bash aliases to associate different accounts with s3cmd:
a. Open the ~/.bashrc file.
b. Add lines like these for your different S3 access credentials:
alias s3cmd-office='s3cmd -c ~/.s3cfg-office'
alias s3cmd-personal='s3cmd -c ~/.s3cfg-personal'
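Each per-account config file can be generated with the same configure wizard by pointing it at a different path; the file names here are simply the ones used in the aliases above:

-sh-3.2# s3cmd --configure -c ~/.s3cfg-office
-sh-3.2# s3cmd-office ls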

p. How do I upload video files to Amazon S3

If you want to stream video files from S3 storage, you may need to set the proper MIME-type metadata on each uploaded S3 object and also make the objects public. Otherwise, whenever you try to access the URL, the browser will download the file rather than playing it directly from the web. You need to set --acl-public and add the MP4 MIME type with -m video/mp4:

s3cmd sync --acl-public -m video/mp4 --recursive /home/mydomain/public_html/   s3://myvideofiles_bucket/

Please note: .s3cfg-office and .s3cfg-personal are the config files created through the s3cmd tool, as shown in section o.

~njoy 🙂 I spent around 9 days exploring most of the s3cmd features and happily familiarizing myself with most of the S3-related services 🙂
