ggtakec@gmail.com 3274f58948 Changes codes for performance(part 3)
* Summay
   This revision includes big change about temporary file and local cache file. 
   By this big change, s3fs works with good performance when s3fs opens/
   closes/syncs/reads object.
   I made a big change about the handling about temporary file and local cache
   file to do this implementation.

* Detail
1) About temporary file(local file)
   s3fs uses a temporary file on local file system when s3fs does download/
   upload/open/seek object on S3.
   After this revision, s3fs calls ftruncate() function when s3fs makes the 
   temporary file.
   In this way s3fs can set a file size of precisely length without downloading.
   (Notice - ftruncate function is for XSI-compliant systems, so that possibly
    you have a problem on non-XSI-compliant systems.)

   By this change, s3fs can download a part of a object by requesting with 
   "Range" http header. It seems like downloading by each block unit.
   The default block(part) size is 50MB, it is caused the result which is default 
   parallel requests count(5) by default multipart upload size(10MB).
   If you need to change this block size, you can change by new option 
   "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes.

   So that, you have to take care about that fdcache.cpp(and fdcache.h) were 
   changed a lot.

2) About local cache
   Local cache files which are in directory specified by "use_cache" option do 
   not have always all of object data.
   This cause is that s3fs uses ftruncate function and reads(writes) each block 
   unit of a temporary file.
   s3fs manages each block unit's status which are "downloaded area" or "not".
   For this status, s3fs makes new temporary file in cache directory which is 
   specified by "use_cache" option. This status files is in a directory which is 
   named "<use_cache sirectory>/.<bucket_name>/".

   When s3fs opens this status file, s3fs locks this file for exclusive control by 
   calling flock function. You need to take care about this, the status files can 
   not be laid on network drive(like NFS).

   This revision changes about file open mode, s3fs always opens a local cache 
   file and each status file with writable mode.
   Last, this revision adds new option "del_cache", this option means that s3fs 
   deletes all local cache file when s3fs starts and exits.

3) Uploading
   When s3fs writes data to file descriptor through FUSE request, old s3fs 
   revision downloads all of the object. But new revision does not download all, 
   it downloads only small percial area(some block units) including writing data 
   area.
   And when s3fs closes or flushes the file descriptor, s3fs downloads other area 
   which is not downloaded from server. After that,  s3fs uploads all of data.
   Already r456 revision has parallel upload function, then this revision with 
   r456 and r457 are very big change for performance.

4) Downloading
   By changing a temporary file and a local cache file, when s3fs downloads a 
   object, it downloads only the required range(some block units). 
   And s3fs downloads units by parallel GET request, it is same as a case of 
   uploading. (Maximum parallel request count and each download size are 
   specified same parameters for uploading.)

   In the new revision, when s3fs opens file, s3fs returns file descriptor soon.
   Because s3fs only opens(makes) the file descriptor with no downloading 
   data. And when s3fs reads a data, s3fs downloads only some block unit 
   including specified area.
   This result is good for performance.

5) Changes option name
   The option "parallel_upload" which added at r456 is changed to new option 
   name as "parallel_count". This reason is this option value is not only used by 
   uploading object, but a uploading object also uses this option. (For a while, 
   you can use old option name "parallel_upload" for compatibility.)



git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
2013-07-23 16:01:48 +00:00
2013-07-23 16:01:48 +00:00
2013-03-23 14:16:18 +00:00
2013-02-08 15:52:44 +00:00
2011-08-25 16:34:10 +00:00
2013-06-15 17:21:21 +00:00

THIS README CONTAINS OUTDATED INFORMATION - please refer to the wiki or --help

S3FS-Fuse

S3FS is FUSE (File System in User Space) based solution to mount/unmount an Amazon S3 storage buckets and use system commands with S3 just like it was another Hard Disk.

In order to compile s3fs, You'll need the following requirements:

* Kernel-devel packages (or kernel source) installed that is the SAME version of your running kernel
* LibXML2-devel packages
* CURL-devel packages (or compile curl from sources at: curl.haxx.se/ use 7.15.X)
* GCC, GCC-C++
* pkgconfig
* FUSE (>= 2.8.4)
* FUSE Kernel module installed and running (RHEL 4.x/CentOS 4.x users - read below)
* OpenSSL-devel (0.9.8)
* Subversion

If you're using YUM or APT to install those packages, then it might require additional packaging, allow it to be installed.

Downloading & Compiling:
------------------------
In order to download s3fs, user the following command:
svn checkout http://s3fs.googlecode.com/svn/trunk/ s3fs-read-only

Go inside the directory that has been created (s3fs-read-only/s3fs) and run: ./autogen.sh
This will generate a number of scripts in the project directory, including a configure script which you should run with: ./configure
If configure succeeded, you can now run: make. If it didn't, make sure you meet the dependencies above.
This should compile the code. If everything goes OK, you'll be greated with "ok!" at the end and you'll have a binary file called "s3fs"
in the src/ directory.

As root (you can use su, su -, sudo) do: "make install" -this will copy the "s3fs" binary to /usr/local/bin.

Congratulations. S3fs is now compiled and installed.

Usage:
------
In order to use s3fs, make sure you have the Access Key and the Secret Key handy. (refer to the wiki)
First, create a directory where to mount the S3 bucket you want to use.
Example (as root): mkdir -p /mnt/s3
Then run: s3fs mybucket[:path] /mnt/s3

This will mount your bucket to /mnt/s3. You can do a simple "ls -l /mnt/s3" to see the content of your bucket.

If you want to allow other people access the same bucket in the same machine, you can add "-o allow _other" to read/write/delete content of the bucket.

You can add a fixed mount point in /etc/fstab, here's an example:

s3fs#mybucket /mnt/s3 fuse allow_other 0 0

This will mount upon reboot (or by launching: mount -a) your bucket on your machine.

All other options can be read at: http://code.google.com/p/s3fs/wiki/FuseOverAmazon

Known Issues:
-------------
s3fs should be working fine with S3 storage. However, There are couple of limitations:

* There is no full UID/GID support yet, everything looks as "root" and if you allow others to access the bucket, others can erase files. There is, however, permissions support built in.
* Currently s3fs could hang the CPU if you have lots of time-outs. This is *NOT* a fault of s3fs but rather libcurl. This happends when you try to copy thousands of files in 1 session, it doesn't happend when you upload hundreds of files or less.
* CentOS 4.x/RHEL 4.x users - if you use the kernel that shipped with your distribution and didn't upgrade to the latest kernel RedHat/CentOS gives, you might have a problem loading the "fuse" kernel. Please upgrade to the latest kernel (2.6.16 or above) and make sure "fuse" kernel module is compiled and loadable since FUSE requires this kernel module and s3fs requires it as well.
* Moving/renaming/erasing files takes time since the whole file needs to be accessed first. A workaround could be to use s3fs's cache support with the use_cache option.

Description
FUSE-based file system backed by Amazon S3
Readme 12 MiB
Languages
C++ 85.2%
Shell 12.2%
M4 1%
C 0.7%
Makefile 0.6%
Other 0.3%