s3fs-fuse/doc/man/s3fs.1
ggtakec@gmail.com 3274f58948 Changes codes for performance(part 3)
* Summay
   This revision includes big change about temporary file and local cache file. 
   By this big change, s3fs works with good performance when s3fs opens/
   closes/syncs/reads object.
   I made a big change about the handling about temporary file and local cache
   file to do this implementation.

* Detail
1) About temporary file(local file)
   s3fs uses a temporary file on local file system when s3fs does download/
   upload/open/seek object on S3.
   After this revision, s3fs calls ftruncate() function when s3fs makes the 
   temporary file.
   In this way s3fs can set a file size of precisely length without downloading.
   (Notice - ftruncate function is for XSI-compliant systems, so that possibly
    you have a problem on non-XSI-compliant systems.)

   By this change, s3fs can download a part of a object by requesting with 
   "Range" http header. It seems like downloading by each block unit.
   The default block(part) size is 50MB, it is caused the result which is default 
   parallel requests count(5) by default multipart upload size(10MB).
   If you need to change this block size, you can change by new option 
   "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes.

   So that, you have to take care about that fdcache.cpp(and fdcache.h) were 
   changed a lot.

2) About local cache
   Local cache files which are in directory specified by "use_cache" option do 
   not have always all of object data.
   This cause is that s3fs uses ftruncate function and reads(writes) each block 
   unit of a temporary file.
   s3fs manages each block unit's status which are "downloaded area" or "not".
   For this status, s3fs makes new temporary file in cache directory which is 
   specified by "use_cache" option. This status files is in a directory which is 
   named "<use_cache sirectory>/.<bucket_name>/".

   When s3fs opens this status file, s3fs locks this file for exclusive control by 
   calling flock function. You need to take care about this, the status files can 
   not be laid on network drive(like NFS).

   This revision changes about file open mode, s3fs always opens a local cache 
   file and each status file with writable mode.
   Last, this revision adds new option "del_cache", this option means that s3fs 
   deletes all local cache file when s3fs starts and exits.

3) Uploading
   When s3fs writes data to file descriptor through FUSE request, old s3fs 
   revision downloads all of the object. But new revision does not download all, 
   it downloads only small percial area(some block units) including writing data 
   area.
   And when s3fs closes or flushes the file descriptor, s3fs downloads other area 
   which is not downloaded from server. After that,  s3fs uploads all of data.
   Already r456 revision has parallel upload function, then this revision with 
   r456 and r457 are very big change for performance.

4) Downloading
   By changing a temporary file and a local cache file, when s3fs downloads a 
   object, it downloads only the required range(some block units). 
   And s3fs downloads units by parallel GET request, it is same as a case of 
   uploading. (Maximum parallel request count and each download size are 
   specified same parameters for uploading.)

   In the new revision, when s3fs opens file, s3fs returns file descriptor soon.
   Because s3fs only opens(makes) the file descriptor with no downloading 
   data. And when s3fs reads a data, s3fs downloads only some block unit 
   including specified area.
   This result is good for performance.

5) Changes option name
   The option "parallel_upload" which added at r456 is changed to new option 
   name as "parallel_count". This reason is this option value is not only used by 
   uploading object, but a uploading object also uses this option. (For a while, 
   you can use old option name "parallel_upload" for compatibility.)



git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00

159 lines
7.9 KiB
Groff

.TH S3FS "1" "February 2011" "S3FS" "User Commands"
.SH NAME
S3FS \- FUSE-based file system backed by Amazon S3
.SH SYNOPSIS
.SS mounting
.TP
\fBs3fs bucket[:/path] mountpoint \fP [options]
.SS unmounting
.TP
\fBumount mountpoint
.SH DESCRIPTION
s3fs is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. It stores files natively and transparently in S3 (i.e., you can use other programs to access the same files).
.SH AUTHENTICATION
The s3fs password file has this format (use this format if you have only one set of credentials):
.RS 4
\fBaccessKeyId\fP:\fBsecretAccessKey\fP
.RE
If you have more than one set of credentials, this syntax is also recognized:
.RS 4
\fBbucketName\fP:\fBaccessKeyId\fP:\fBsecretAccessKey\fP
.RE
.PP
Password files can be stored in two locations:
.RS 4
\fB/etc/passwd-s3fs\fP [0640]
\fB$HOME/.passwd-s3fs\fP [0600]
.RE
.SH OPTIONS
.SS "general options"
.TP
\fB\-h\fR \fB\-\-help\fR
print help
.TP
\fB\ \fR \fB\-\-version\fR
print version
.TP
\fB\-f\fR
FUSE foreground option - do not run as daemon.
.TP
\fB\-s\fR
FUSE singlethreaded option (disables multi-threaded operation)
.SS "mount options"
.TP
All s3fs options must given in the form where "opt" is:
<option_name>=<option_value>
.TP
\fB\-o\fR default_acl (default="private")
the default canned acl to apply to all written S3 objects, e.g., "public-read".
Any created files will have this canned acl.
Any updated files will also have this canned acl applied!
.TP
\fB\-o\fR prefix (default="") (coming soon!)
a prefix to append to all S3 objects.
.TP
\fB\-o\fR retries (default="2")
number of times to retry a failed S3 transaction.
.TP
\fB\-o\fR use_cache (default="" which means disabled)
local folder to use for local file cache.
.TP
\fB\-o\fR del_cache - delete local file cache
delete local file cache when s3fs starts and exits.
.TP
\fB\-o\fR use_rrs (default is disable)
use Amazon's Reduced Redundancy Storage.
this option can not be specified with use_sse.
(can specify use_rrs=1 for old version)
.TP
\fB\-o\fR use_sse (default is disable)
use Amazon's Server Site Encryption.
this option can not be specified with use_rrs.
(can specify use_sse=1 for old version)
.TP
\fB\-o\fR passwd_file (default="")
specify the path to the password file, which which takes precedence over the password in $HOME/.passwd-s3fs and /etc/passwd-s3fs
.TP
\fB\-o\fR public_bucket (default="" which means disabled)
anonymously mount a public bucket when set to 1, ignores the $HOME/.passwd-s3fs and /etc/passwd-s3fs files.
.TP
\fB\-o\fR connect_timeout (default="10" seconds)
time to wait for connection before giving up.
.TP
\fB\-o\fR readwrite_timeout (default="30" seconds)
time to wait between read/write activity before giving up.
.TP
\fB\-o\fR max_stat_cache_size (default="10000" entries (about 4MB))
maximum number of entries in the stat cache
.TP
\fB\-o\fR stat_cache_expire (default is no expire)
specify expire time(seconds) for entries in the stat cache
.TP
\fB\-o\fR enable_noobj_cache (default is disable)
enable cache entries for the object which does not exist.
s3fs always has to check whether file(or sub directory) exists under object(path) when s3fs does some command, since s3fs has recognized a directory which does not exist and has files or sub directories under itself.
It increases ListBucket request and makes performance bad.
You can specify this option for performance, s3fs memorizes in stat cache that the object(file or directory) does not exist.
.TP
\fB\-o\fR nodnscache - disable dns cache.
s3fs is always using dns cache, this option make dns cache disable.
.TP
\fB\-o\fR multireq_max (default="500")
maximum number of parallel request for listing objects.
.TP
\fB\-o\fR parallel_count (default="5")
number of parallel request for uploading big objects.
s3fs uploads large object(over 20MB) by multipart post request, and sends parallel requests.
This option limits parallel request count which s3fs requests at once.
It is necessary to set this value depending on a CPU and a network band.
.TP
\fB\-o\fR fd_page_size(default="52428800"(50MB))
number of internal management page size for each file discriptor.
For delayed reading and writing by s3fs, s3fs manages pages which is separated from object. Each pages has a status that data is already loaded(or not loaded yet).
This option should not be changed when you don't have a trouble with performance.
.TP
\fB\-o\fR url (default="http://s3.amazonaws.com")
sets the url to use to access Amazon S3. If you want to use HTTPS, then you can set url=https://s3.amazonaws.com
.TP
\fB\-o\fR nomultipart - disable multipart uploads
.TP
\fB\-o\fR enable_content_md5 ( default is disable )
verifying uploaded data without multipart by content-md5 header.
Enable to send "Content-MD5" header when uploading a object without multipart posting.
If this option is enabled, it has some influences on a performance of s3fs when uploading small object.
Because s3fs always checks MD5 when uploading large object, this option does not affect on large object.
.TP
\fB\-o\fR noxmlns - disable registing xml name space.
disable registing xml name space for response of ListBucketResult and ListVersionsResult etc. Default name space is looked up from "http://s3.amazonaws.com/doc/2006-03-01".
This option should not be specified now, because s3fs looks up xmlns automatically after v1.66.
.TP
\fB\-o\fR nocopyapi - for other incomplete compatibility object storage.
For a distributed object storage which is compatibility S3 API without PUT(copy api).
If you set this option, s3fs do not use PUT with "x-amz-copy-source"(copy api). Because traffic is increased 2-3 times by this option, we do not recommend this.
.TP
\fB\-o\fR norenameapi - for other incomplete compatibility object storage.
For a distributed object storage which is compatibility S3 API without PUT(copy api).
This option is a subset of nocopyapi option. The nocopyapi option does not use copy-api for all command(ex. chmod, chown, touch, mv, etc), but this option does not use copy-api for only rename command(ex. mv).
If this option is specified with nocopapi, the s3fs ignores it.
.SH FUSE/MOUNT OPTIONS
.TP
Most of the generic mount options described in 'man mount' are supported (ro, rw, suid, nosuid, dev, nodev, exec, noexec, atime, noatime, sync async, dirsync). Filesystems are mounted with '-onodev,nosuid' by default, which can only be overridden by a privileged user.
.TP
There are many FUSE specific mount options that can be specified. e.g. allow_other. See the FUSE README for the full set.
.SH NOTES
.TP
Maximum file size=64GB (limited by s3fs, not Amazon).
.TP
If enabled via the "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on S3, it first downloads the entire file locally to the folder specified by use_cache and operates on it. When fuse_release() is called, s3fs will re-upload the file to S3 if it has been changed. s3fs uses md5 checksums to minimize downloads from S3.
.TP
The folder specified by use_cache is just a local cache. It can be deleted at any time. s3fs rebuilds it on demand.
.TP
Local file caching works by calculating and comparing md5 checksums (ETag HTTP header).
.TP
s3fs leverages /etc/mime.types to "guess" the "correct" content-type based on file name extension. This means that you can copy a website to S3 and serve it up directly from S3 with correct content-types!
.SH BUGS
Due to S3's "eventual consistency" limitations, file creation can and will occasionally fail. Even after a successful create, subsequent reads can fail for an indeterminate time, even after one or more successful reads. Create and read enough files and you will eventually encounter this failure. This is not a flaw in s3fs and it is not something a FUSE wrapper like s3fs can work around. The retries option does not address this issue. Your application must either tolerate or compensate for these failures, for example by retrying creates or reads.
.SH AUTHOR
s3fs has been written by Randy Rizun <rrizun@gmail.com>.