s3fs-fuse/src/fdcache.cpp

1937 lines
52 KiB
C++
Raw Normal View History

/*
* s3fs - FUSE-based file system backed by Amazon S3
*
* Copyright 2007-2013 Takeshi Nakatani <ggtakec.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
#include <sys/time.h>
#include <sys/file.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
#include <syslog.h>
#include <errno.h>
#include <string.h>
#include <assert.h>
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
#include <curl/curl.h>
#include <string>
#include <iostream>
#include <sstream>
#include <map>
#include <list>
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
#include <vector>
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
#include "common.h"
#include "fdcache.h"
#include "s3fs.h"
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
#include "s3fs_util.h"
#include "string_util.h"
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
#include "curl.h"
using namespace std;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//------------------------------------------------
// Symbols
//------------------------------------------------
2014-03-30 07:53:41 +00:00
#define MAX_MULTIPART_CNT 10000 // S3 multipart max count
//
// For cache directory top path
//
#if defined(P_tmpdir)
#define TMPFILE_DIR_0PATH P_tmpdir
#else
#define TMPFILE_DIR_0PATH "/tmp"
#endif
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//------------------------------------------------
// CacheFileStat class methods
//------------------------------------------------
bool CacheFileStat::MakeCacheFileStatPath(const char* path, string& sfile_path, bool is_create_dir)
{
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// make stat dir top path( "/<cache_dir>/.<bucket_name>.stat" )
string top_path = FdManager::GetCacheDir();
top_path += "/.";
top_path += bucket;
top_path += ".stat";
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(is_create_dir){
int result;
if(0 != (result = mkdirp(top_path + mydirname(path), 0777))){
S3FS_PRN_ERR("failed to create dir(%s) by errno(%d).", path, result);
return false;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(!path || '\0' == path[0]){
sfile_path = top_path;
}else{
sfile_path = top_path + SAFESTRPTR(path);
}
return true;
}
bool CacheFileStat::CheckCacheFileStatTopDir(void)
{
if(!FdManager::IsCacheDir()){
return true;
}
// make stat dir top path( "/<cache_dir>/.<bucket_name>.stat" )
string top_path = FdManager::GetCacheDir();
top_path += "/.";
top_path += bucket;
top_path += ".stat";
return check_exist_dir_permission(top_path.c_str());
}
bool CacheFileStat::DeleteCacheFileStat(const char* path)
{
if(!path || '\0' == path[0]){
return false;
}
// stat path
string sfile_path;
if(!CacheFileStat::MakeCacheFileStatPath(path, sfile_path, false)){
S3FS_PRN_ERR("failed to create cache stat file path(%s)", path);
return false;
}
if(0 != unlink(sfile_path.c_str())){
if(ENOENT == errno){
S3FS_PRN_DBG("failed to delete file(%s): errno=%d", path, errno);
}else{
S3FS_PRN_ERR("failed to delete file(%s): errno=%d", path, errno);
}
return false;
}
return true;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//------------------------------------------------
// CacheFileStat methods
//------------------------------------------------
CacheFileStat::CacheFileStat(const char* tpath) : path(""), fd(-1)
{
if(tpath && '\0' != tpath[0]){
SetPath(tpath, true);
}
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
CacheFileStat::~CacheFileStat()
{
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
Release();
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
bool CacheFileStat::SetPath(const char* tpath, bool is_open)
{
if(!tpath || '\0' == tpath[0]){
return false;
}
if(!Release()){
// could not close old stat file.
return false;
}
if(tpath){
path = tpath;
}
if(!is_open){
return true;
}
return Open();
}
bool CacheFileStat::Open(void)
{
if(0 == path.size()){
return false;
}
if(-1 != fd){
// already opened
return true;
}
// stat path
string sfile_path;
if(!CacheFileStat::MakeCacheFileStatPath(path.c_str(), sfile_path, true)){
S3FS_PRN_ERR("failed to create cache stat file path(%s)", path.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
// open
if(-1 == (fd = open(sfile_path.c_str(), O_CREAT|O_RDWR, 0600))){
S3FS_PRN_ERR("failed to open cache stat file path(%s) - errno(%d)", path.c_str(), errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
// lock
if(-1 == flock(fd, LOCK_EX)){
S3FS_PRN_ERR("failed to lock cache stat file(%s) - errno(%d)", path.c_str(), errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
close(fd);
fd = -1;
return false;
}
// seek top
if(0 != lseek(fd, 0, SEEK_SET)){
S3FS_PRN_ERR("failed to lseek cache stat file(%s) - errno(%d)", path.c_str(), errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
flock(fd, LOCK_UN);
close(fd);
fd = -1;
return false;
}
S3FS_PRN_DBG("file locked(%s - %s)", path.c_str(), sfile_path.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return true;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
bool CacheFileStat::Release(void)
{
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == fd){
// already release
return true;
}
// unlock
if(-1 == flock(fd, LOCK_UN)){
S3FS_PRN_ERR("failed to unlock cache stat file(%s) - errno(%d)", path.c_str(), errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
S3FS_PRN_DBG("file unlocked(%s)", path.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == close(fd)){
S3FS_PRN_ERR("failed to close cache stat file(%s) - errno(%d)", path.c_str(), errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
fd = -1;
return true;
}
//------------------------------------------------
// PageList methods
//------------------------------------------------
void PageList::FreeList(fdpage_list_t& list)
{
for(fdpage_list_t::iterator iter = list.begin(); iter != list.end(); iter = list.erase(iter)){
delete (*iter);
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
list.clear();
}
PageList::PageList(size_t size, bool is_loaded)
{
Init(size, is_loaded);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
PageList::~PageList()
{
Clear();
}
void PageList::Clear(void)
{
PageList::FreeList(pages);
}
bool PageList::Init(size_t size, bool is_loaded)
{
Clear();
fdpage* page = new fdpage(0, size, is_loaded);
pages.push_back(page);
return true;
}
size_t PageList::Size(void) const
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
2015-08-12 15:04:16 +00:00
if(pages.empty()){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return 0;
}
fdpage_list_t::const_reverse_iterator riter = pages.rbegin();
return static_cast<size_t>((*riter)->next());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
bool PageList::Compress(void)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
bool is_first = true;
bool is_last_loaded = false;
for(fdpage_list_t::iterator iter = pages.begin(); iter != pages.end(); ){
if(is_first){
is_first = false;
is_last_loaded = (*iter)->loaded;
++iter;
}else{
if(is_last_loaded == (*iter)->loaded){
fdpage_list_t::iterator biter = iter;
--biter;
(*biter)->bytes += (*iter)->bytes;
2016-01-28 03:16:06 +00:00
delete *iter;
iter = pages.erase(iter);
}else{
is_last_loaded = (*iter)->loaded;
++iter;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
}
return true;
}
bool PageList::Parse(off_t new_pos)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
for(fdpage_list_t::iterator iter = pages.begin(); iter != pages.end(); ++iter){
if(new_pos == (*iter)->offset){
// nothing to do
return true;
}else if((*iter)->offset < new_pos && new_pos < (*iter)->next()){
fdpage* page = new fdpage((*iter)->offset, static_cast<size_t>(new_pos - (*iter)->offset), (*iter)->loaded);
(*iter)->bytes -= (new_pos - (*iter)->offset);
(*iter)->offset = new_pos;
pages.insert(iter, page);
return true;
}
}
return false;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
bool PageList::Resize(size_t size, bool is_loaded)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
size_t total = Size();
if(0 == total){
Init(size, is_loaded);
}else if(total < size){
// add new area
fdpage* page = new fdpage(static_cast<off_t>(total), (size - total), is_loaded);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
pages.push_back(page);
}else if(size < total){
// cut area
for(fdpage_list_t::iterator iter = pages.begin(); iter != pages.end(); ){
if(static_cast<size_t>((*iter)->next()) <= size){
++iter;
}else{
if(size <= static_cast<size_t>((*iter)->offset)){
2016-01-28 03:16:06 +00:00
delete *iter;
iter = pages.erase(iter);
}else{
(*iter)->bytes = size - static_cast<size_t>((*iter)->offset);
}
}
}
}else{ // total == size
// nothing to do
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
// compress area
return Compress();
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
bool PageList::IsPageLoaded(off_t start, size_t size) const
{
for(fdpage_list_t::const_iterator iter = pages.begin(); iter != pages.end(); ++iter){
if((*iter)->end() < start){
continue;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(!(*iter)->loaded){
return false;
}
if(0 != size && static_cast<size_t>(start + size) <= static_cast<size_t>((*iter)->next())){
break;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
return true;
}
bool PageList::SetPageLoadedStatus(off_t start, size_t size, bool is_loaded, bool is_compress)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
size_t now_size = Size();
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(now_size <= static_cast<size_t>(start)){
if(now_size < static_cast<size_t>(start)){
// add
Resize(static_cast<size_t>(start), false);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
Resize(static_cast<size_t>(start + size), is_loaded);
}else if(now_size <= static_cast<size_t>(start + size)){
// cut
Resize(static_cast<size_t>(start), false);
// add
Resize(static_cast<size_t>(start + size), is_loaded);
}else{
// start-size are inner pages area
// parse "start", and "start + size" position
Parse(start);
Parse(start + size);
// set loaded flag
for(fdpage_list_t::iterator iter = pages.begin(); iter != pages.end(); ++iter){
if((*iter)->end() < start){
continue;
}else if(static_cast<off_t>(start + size) <= (*iter)->offset){
break;
}else{
(*iter)->loaded = is_loaded;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
// compress area
return (is_compress ? Compress() : true);
}
bool PageList::FindUnloadedPage(off_t start, off_t& resstart, size_t& ressize) const
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
for(fdpage_list_t::const_iterator iter = pages.begin(); iter != pages.end(); ++iter){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(start <= (*iter)->end()){
if(!(*iter)->loaded){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
resstart = (*iter)->offset;
ressize = (*iter)->bytes;
return true;
}
}
}
return false;
}
size_t PageList::GetTotalUnloadedPageSize(off_t start, size_t size) const
{
size_t restsize = 0;
off_t next = static_cast<off_t>(start + size);
for(fdpage_list_t::const_iterator iter = pages.begin(); iter != pages.end(); ++iter){
if((*iter)->next() <= start){
continue;
}
if(next <= (*iter)->offset){
break;
}
if((*iter)->loaded){
continue;
}
size_t tmpsize;
if((*iter)->offset <= start){
if((*iter)->next() <= next){
tmpsize = static_cast<size_t>((*iter)->next() - start);
}else{
tmpsize = static_cast<size_t>(next - start); // = size
}
}else{
if((*iter)->next() <= next){
tmpsize = static_cast<size_t>((*iter)->next() - (*iter)->offset); // = (*iter)->bytes
}else{
tmpsize = static_cast<size_t>(next - (*iter)->offset);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
restsize += tmpsize;
}
return restsize;
}
int PageList::GetUnloadedPages(fdpage_list_t& unloaded_list, off_t start, size_t size) const
{
// If size is 0, it means loading to end.
if(0 == size){
if(static_cast<size_t>(start) < Size()){
size = static_cast<size_t>(Size() - start);
}
}
off_t next = static_cast<off_t>(start + size);
for(fdpage_list_t::const_iterator iter = pages.begin(); iter != pages.end(); ++iter){
if((*iter)->next() <= start){
continue;
}
if(next <= (*iter)->offset){
break;
}
if((*iter)->loaded){
continue; // already loaded
}
// page area
off_t page_start = max((*iter)->offset, start);
off_t page_next = min((*iter)->next(), next);
size_t page_size = static_cast<size_t>(page_next - page_start);
// add list
fdpage_list_t::reverse_iterator riter = unloaded_list.rbegin();
if(riter != unloaded_list.rend() && (*riter)->next() == page_start){
// merge to before page
(*riter)->bytes += page_size;
}else{
fdpage* page = new fdpage(page_start, page_size, false);
unloaded_list.push_back(page);
}
}
return unloaded_list.size();
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
bool PageList::Serialize(CacheFileStat& file, bool is_output)
{
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(!file.Open()){
return false;
}
if(is_output){
//
// put to file
//
stringstream ssall;
ssall << Size();
2015-08-12 15:04:16 +00:00
for(fdpage_list_t::iterator iter = pages.begin(); iter != pages.end(); ++iter){
ssall << "\n" << (*iter)->offset << ":" << (*iter)->bytes << ":" << ((*iter)->loaded ? "1" : "0");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
string strall = ssall.str();
if(0 >= pwrite(file.GetFd(), strall.c_str(), strall.length(), 0)){
S3FS_PRN_ERR("failed to write stats(%d)", errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}else{
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//
// loading from file
//
struct stat st;
memset(&st, 0, sizeof(struct stat));
if(-1 == fstat(file.GetFd(), &st)){
S3FS_PRN_ERR("fstat is failed. errno(%d)", errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
if(0 >= st.st_size){
// nothing
Init(0, false);
return true;
}
char* ptmp;
if(NULL == (ptmp = (char*)calloc(st.st_size + 1, sizeof(char)))){
S3FS_PRN_CRIT("could not allocate memory.");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
S3FS_FUSE_EXIT();
return false;
}
// read from file
if(0 >= pread(file.GetFd(), ptmp, st.st_size, 0)){
S3FS_PRN_ERR("failed to read stats(%d)", errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
free(ptmp);
return false;
}
string oneline;
stringstream ssall(ptmp);
// loaded
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
Clear();
// load(size)
if(!getline(ssall, oneline, '\n')){
S3FS_PRN_ERR("failed to parse stats.");
free(ptmp);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
size_t total = s3fs_strtoofft(oneline.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// load each part
bool is_err = false;
while(getline(ssall, oneline, '\n')){
string part;
stringstream ssparts(oneline);
// offset
if(!getline(ssparts, part, ':')){
is_err = true;
break;
}
off_t offset = s3fs_strtoofft(part.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// size
if(!getline(ssparts, part, ':')){
is_err = true;
break;
}
off_t size = s3fs_strtoofft(part.c_str());
// loaded
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(!getline(ssparts, part, ':')){
is_err = true;
break;
}
bool is_loaded = (1 == s3fs_strtoofft(part.c_str()) ? true : false);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// add new area
SetPageLoadedStatus(offset, size, is_loaded);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
free(ptmp);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(is_err){
S3FS_PRN_ERR("failed to parse stats.");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
Clear();
return false;
}
// check size
if(total != Size()){
S3FS_PRN_ERR("different size(%jd - %jd).", (intmax_t)total, (intmax_t)Size());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
Clear();
return false;
}
}
return true;
}
void PageList::Dump(void)
{
int cnt = 0;
S3FS_PRN_DBG("pages = {");
2015-08-12 15:04:16 +00:00
for(fdpage_list_t::iterator iter = pages.begin(); iter != pages.end(); ++iter, ++cnt){
S3FS_PRN_DBG(" [%08d] -> {%014jd - %014zu : %s}", cnt, (intmax_t)((*iter)->offset), (*iter)->bytes, (*iter)->loaded ? "true" : "false");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
S3FS_PRN_DBG("}");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
//------------------------------------------------
// FdEntity class methods
//------------------------------------------------
int FdEntity::FillFile(int fd, unsigned char byte, size_t size, off_t start)
{
unsigned char bytes[1024 * 32]; // 32kb
memset(bytes, byte, min(sizeof(bytes), size));
for(ssize_t total = 0, onewrote = 0; static_cast<size_t>(total) < size; total += onewrote){
if(-1 == (onewrote = pwrite(fd, bytes, min(sizeof(bytes), (size - static_cast<size_t>(total))), start + total))){
S3FS_PRN_ERR("pwrite failed. errno(%d)", errno);
return -errno;
}
}
return 0;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//------------------------------------------------
// FdEntity methods
//------------------------------------------------
FdEntity::FdEntity(const char* tpath, const char* cpath)
: is_lock_init(false), refcnt(0), path(SAFESTRPTR(tpath)), cachepath(SAFESTRPTR(cpath)),
fd(-1), pfile(NULL), is_modify(false), size_orgmeta(0), upload_id(""), mp_start(0), mp_size(0)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
try{
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_settype(&attr, S3FS_MUTEX_RECURSIVE); // recursive mutex
pthread_mutex_init(&fdent_lock, &attr);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
is_lock_init = true;
}catch(exception& e){
S3FS_PRN_CRIT("failed to init mutex");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
FdEntity::~FdEntity()
{
Clear();
if(is_lock_init){
try{
pthread_mutex_destroy(&fdent_lock);
}catch(exception& e){
S3FS_PRN_CRIT("failed to destroy mutex");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
is_lock_init = false;
}
}
void FdEntity::Clear(void)
{
AutoLock auto_lock(&fdent_lock);
if(pfile){
if(0 != cachepath.size()){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
CacheFileStat cfstat(path.c_str());
if(!pagelist.Serialize(cfstat, true)){
S3FS_PRN_WARN("failed to save cache stat file(%s).", path.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
fclose(pfile);
pfile = NULL;
fd = -1;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
pagelist.Init(0, false);
refcnt = 0;
path = "";
cachepath = "";
is_modify = false;
}
void FdEntity::Close(void)
{
S3FS_PRN_DBG("[path=%s][fd=%d][refcnt=%d]", path.c_str(), fd, (-1 != fd ? refcnt - 1 : refcnt));
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 != fd){
AutoLock auto_lock(&fdent_lock);
if(0 < refcnt){
refcnt--;
}
if(0 == refcnt){
if(0 != cachepath.size()){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
CacheFileStat cfstat(path.c_str());
if(!pagelist.Serialize(cfstat, true)){
S3FS_PRN_WARN("failed to save cache stat file(%s).", path.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
fclose(pfile);
pfile = NULL;
fd = -1;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
}
int FdEntity::Dup(void)
{
S3FS_PRN_DBG("[path=%s][fd=%d][refcnt=%d]", path.c_str(), fd, (-1 != fd ? refcnt + 1 : refcnt));
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 != fd){
AutoLock auto_lock(&fdent_lock);
refcnt++;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return fd;
}
// [NOTE]
// This method does not lock fdent_lock, because FdManager::fd_manager_lock
// is locked before calling.
//
int FdEntity::Open(headers_t* pmeta, ssize_t size, time_t time)
{
S3FS_PRN_DBG("[path=%s][fd=%d][size=%jd][time=%jd]", path.c_str(), fd, (intmax_t)size, (intmax_t)time);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 != fd){
// already opened, needs to increment refcnt.
Dup();
return 0;
}
bool need_save_csf = false; // need to save(reset) cache stat file
bool is_truncate = false; // need to truncate
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(0 != cachepath.size()){
// using cache
// open cache and cache stat file, load page info.
CacheFileStat cfstat(path.c_str());
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(pagelist.Serialize(cfstat, false) && -1 != (fd = open(cachepath.c_str(), O_RDWR))){
// success to open cache file
struct stat st;
memset(&st, 0, sizeof(struct stat));
if(-1 == fstat(fd, &st)){
S3FS_PRN_ERR("fstat is failed. errno(%d)", errno);
fd = -1;
return (0 == errno ? -EIO : -errno);
}
// check size, st_size, loading stat file
if(-1 == size){
if(static_cast<size_t>(st.st_size) != pagelist.Size()){
pagelist.Resize(st.st_size, false);
need_save_csf = true; // need to update page info
}
size = static_cast<ssize_t>(st.st_size);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}else{
if(static_cast<size_t>(size) != pagelist.Size()){
pagelist.Resize(static_cast<size_t>(size), false);
need_save_csf = true; // need to update page info
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(static_cast<size_t>(size) != static_cast<size_t>(st.st_size)){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
is_truncate = true;
}
}
}else{
// could not load stat file or open file
if(-1 == (fd = open(cachepath.c_str(), O_CREAT|O_RDWR|O_TRUNC, 0600))){
S3FS_PRN_ERR("failed to open file(%s). errno(%d)", cachepath.c_str(), errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return (0 == errno ? -EIO : -errno);
}
need_save_csf = true; // need to update page info
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == size){
size = 0;
pagelist.Init(0, false);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}else{
pagelist.Resize(static_cast<size_t>(size), false);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
is_truncate = true;
}
}
// make file pointer(for being same tmpfile)
if(NULL == (pfile = fdopen(fd, "wb"))){
S3FS_PRN_ERR("failed to get fileno(%s). errno(%d)", cachepath.c_str(), errno);
close(fd);
fd = -1;
return (0 == errno ? -EIO : -errno);
}
}else{
// not using cache
// open temporary file
if(NULL == (pfile = tmpfile()) || -1 ==(fd = fileno(pfile))){
S3FS_PRN_ERR("failed to open tmp file. err(%d)", errno);
if(pfile){
fclose(pfile);
pfile = NULL;
}
return (0 == errno ? -EIO : -errno);
}
if(-1 == size){
size = 0;
pagelist.Init(0, false);
}else{
pagelist.Resize(static_cast<size_t>(size), false);
is_truncate = true;
}
}
// truncate cache(tmp) file
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(is_truncate){
if(0 != ftruncate(fd, static_cast<off_t>(size)) || 0 != fsync(fd)){
S3FS_PRN_ERR("ftruncate(%s) or fsync returned err(%d)", cachepath.c_str(), errno);
fclose(pfile);
pfile = NULL;
fd = -1;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return (0 == errno ? -EIO : -errno);
}
}
// reset cache stat file
if(need_save_csf){
CacheFileStat cfstat(path.c_str());
if(!pagelist.Serialize(cfstat, true)){
S3FS_PRN_WARN("failed to save cache stat file(%s), but continue...", path.c_str());
}
}
// init internal data
refcnt = 1;
is_modify = false;
// set original headers and size in it.
if(pmeta){
orgmeta = *pmeta;
size_orgmeta = static_cast<size_t>(get_size(orgmeta));
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}else{
orgmeta.clear();
size_orgmeta = 0;
}
2015-11-29 15:53:53 +00:00
// set mtime(set "x-amz-meta-mtime" in orgmeta)
if(-1 != time){
if(0 != SetMtime(time)){
S3FS_PRN_ERR("failed to set mtime. errno(%d)", errno);
fclose(pfile);
pfile = NULL;
fd = -1;
return (0 == errno ? -EIO : -errno);
}
}
return 0;
}
// [NOTE]
// This method is called from olny nocopapi functions.
// So we do not check disk space for this option mode, if there is no enough
// disk space this method will be failed.
//
bool FdEntity::OpenAndLoadAll(headers_t* pmeta, size_t* size, bool force_load)
{
int result;
S3FS_PRN_INFO3("[path=%s][fd=%d]", path.c_str(), fd);
if(-1 == fd){
if(0 != Open(pmeta)){
return false;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
AutoLock auto_lock(&fdent_lock);
if(force_load){
SetAllStatusUnloaded();
}
//
// TODO: possibly do background for delay loading
//
if(0 != (result = Load())){
S3FS_PRN_ERR("could not download, result(%d)", result);
return false;
}
if(is_modify){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
is_modify = false;
}
if(size){
*size = pagelist.Size();
}
return true;
}
bool FdEntity::GetStats(struct stat& st)
{
if(-1 == fd){
return false;
}
AutoLock auto_lock(&fdent_lock);
memset(&st, 0, sizeof(struct stat));
if(-1 == fstat(fd, &st)){
S3FS_PRN_ERR("fstat failed. errno(%d)", errno);
return false;
}
return true;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
int FdEntity::SetMtime(time_t time)
{
S3FS_PRN_INFO3("[path=%s][fd=%d][time=%jd]", path.c_str(), fd, (intmax_t)time);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == time){
return 0;
}
if(-1 != fd){
AutoLock auto_lock(&fdent_lock);
struct timeval tv[2];
tv[0].tv_sec = time;
tv[0].tv_usec= 0L;
tv[1].tv_sec = tv[0].tv_sec;
tv[1].tv_usec= 0L;
if(-1 == futimes(fd, tv)){
S3FS_PRN_ERR("futimes failed. errno(%d)", errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return -errno;
}
}else if(0 < cachepath.size()){
// not opened file yet.
struct utimbuf n_mtime;
n_mtime.modtime = time;
n_mtime.actime = time;
if(-1 == utime(cachepath.c_str(), &n_mtime)){
S3FS_PRN_ERR("utime failed. errno(%d)", errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return -errno;
}
}
orgmeta["x-amz-meta-mtime"] = str(time);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return 0;
}
bool FdEntity::UpdateMtime(void)
{
struct stat st;
if(!GetStats(st)){
return false;
}
orgmeta["x-amz-meta-mtime"] = str(st.st_mtime);
return true;
}
bool FdEntity::GetSize(size_t& size)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
if(-1 == fd){
return false;
}
AutoLock auto_lock(&fdent_lock);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
size = pagelist.Size();
return true;
}
bool FdEntity::SetMode(mode_t mode)
{
orgmeta["x-amz-meta-mode"] = str(mode);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return true;
}
bool FdEntity::SetUId(uid_t uid)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
orgmeta["x-amz-meta-uid"] = str(uid);
return true;
}
bool FdEntity::SetGId(gid_t gid)
{
orgmeta["x-amz-meta-gid"] = str(gid);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return true;
2015-10-20 15:47:07 +00:00
}
2015-10-20 15:47:07 +00:00
bool FdEntity::SetContentType(const char* path)
{
if(!path){
return false;
}
orgmeta["Content-Type"] = S3fsCurl::LookupMimeType(string(path));
return true;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
bool FdEntity::SetAllStatus(bool is_loaded)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
S3FS_PRN_INFO3("[path=%s][fd=%d][%s]", path.c_str(), fd, is_loaded ? "loaded" : "unloaded");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == fd){
return false;
}
// [NOTE]
// this method is only internal use, and calling after locking.
// so do not lock now.
//
//AutoLock auto_lock(&fdent_lock);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// get file size
struct stat st;
memset(&st, 0, sizeof(struct stat));
if(-1 == fstat(fd, &st)){
S3FS_PRN_ERR("fstat is failed. errno(%d)", errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return false;
}
// Reinit
pagelist.Init(st.st_size, is_loaded);
return true;
}
int FdEntity::Load(off_t start, size_t size)
{
S3FS_PRN_DBG("[path=%s][fd=%d][offset=%jd][size=%jd]", path.c_str(), fd, (intmax_t)start, (intmax_t)size);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == fd){
return -EBADF;
}
AutoLock auto_lock(&fdent_lock);
int result = 0;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// check loaded area & load
fdpage_list_t unloaded_list;
if(0 < pagelist.GetUnloadedPages(unloaded_list, start, size)){
for(fdpage_list_t::iterator iter = unloaded_list.begin(); iter != unloaded_list.end(); ++iter){
if(0 != size && static_cast<size_t>(start + size) <= static_cast<size_t>((*iter)->offset)){
// reached end
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
break;
}
// check loading size
size_t need_load_size = 0;
if(static_cast<size_t>((*iter)->offset) < size_orgmeta){
// original file size(on S3) is smaller than request.
need_load_size = (static_cast<size_t>((*iter)->next()) <= size_orgmeta ? (*iter)->bytes : (size_orgmeta - (*iter)->offset));
}
size_t over_size = (*iter)->bytes - need_load_size;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// download
if(static_cast<size_t>(2 * S3fsCurl::GetMultipartSize()) < need_load_size && !nomultipart){ // default 20MB
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// parallel request
// Additional time is needed for large files
time_t backup = 0;
if(120 > S3fsCurl::GetReadwriteTimeout()){
backup = S3fsCurl::SetReadwriteTimeout(120);
}
result = S3fsCurl::ParallelGetObjectRequest(path.c_str(), fd, (*iter)->offset, need_load_size);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(0 != backup){
S3fsCurl::SetReadwriteTimeout(backup);
}
}else{
// single request
S3fsCurl s3fscurl;
result = s3fscurl.GetObjectRequest(path.c_str(), fd, (*iter)->offset, need_load_size);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(0 != result){
break;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// initialize for the area of over original size
if(0 < over_size){
if(0 != (result = FdEntity::FillFile(fd, 0, over_size, (*iter)->offset + need_load_size))){
S3FS_PRN_ERR("failed to fill rest bytes for fd(%d). errno(%d)", fd, result);
break;
}
// set modify flag
is_modify = false;
}
// Set loaded flag
pagelist.SetPageLoadedStatus((*iter)->offset, static_cast<off_t>((*iter)->bytes), true);
}
PageList::FreeList(unloaded_list);
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return result;
}
// [NOTE]
// At no disk space for caching object.
// This method is downloading by dividing an object of the specified range
// and uploading by multipart after finishing downloading it.
//
// [NOTICE]
// Need to lock before calling this method.
//
int FdEntity::NoCacheLoadAndPost(off_t start, size_t size)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
int result = 0;
S3FS_PRN_INFO3("[path=%s][fd=%d][offset=%jd][size=%jd]", path.c_str(), fd, (intmax_t)start, (intmax_t)size);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == fd){
return -EBADF;
}
// [NOTE]
// This method calling means that the cache file is never used no more.
//
if(0 != cachepath.size()){
// remove cache files(and cache stat file)
FdManager::DeleteCacheFile(path.c_str());
// cache file path does not use no more.
cachepath.erase();
}
// Change entity key in manager mapping
FdManager::get()->ChangeEntityToTempPath(this, path.c_str());
// open temporary file
FILE* ptmpfp;
int tmpfd;
if(NULL == (ptmpfp = tmpfile()) || -1 ==(tmpfd = fileno(ptmpfp))){
S3FS_PRN_ERR("failed to open tmp file. err(%d)", errno);
if(ptmpfp){
fclose(ptmpfp);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
return (0 == errno ? -EIO : -errno);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
// loop uploading by multipart
for(fdpage_list_t::iterator iter = pagelist.pages.begin(); iter != pagelist.pages.end(); ++iter){
if((*iter)->end() < start){
continue;
}
if(0 != size && static_cast<size_t>(start + size) <= static_cast<size_t>((*iter)->offset)){
break;
}
// download earch multipart size(default 10MB) in unit
for(size_t oneread = 0, totalread = ((*iter)->offset < start ? start : 0); totalread < (*iter)->bytes; totalread += oneread){
int upload_fd = fd;
off_t offset = (*iter)->offset + totalread;
oneread = min(((*iter)->bytes - totalread), static_cast<size_t>(S3fsCurl::GetMultipartSize()));
// check rest size is over minimum part size
//
// [NOTE]
// If the final part size is smaller than 5MB, it is not allowed by S3 API.
// For this case, if the previous part of the final part is not over 5GB,
// we incorporate the final part to the previous part. If the previous part
// is over 5GB, we want to even out the last part and the previous part.
//
if(((*iter)->bytes - totalread - oneread) < MIN_MULTIPART_SIZE){
if(FIVE_GB < ((*iter)->bytes - totalread)){
oneread = ((*iter)->bytes - totalread) / 2;
}else{
oneread = ((*iter)->bytes - totalread);
}
}
if(!(*iter)->loaded){
//
// loading or initializing
//
upload_fd = tmpfd;
// load offset & size
size_t need_load_size = 0;
if(size_orgmeta <= static_cast<size_t>(offset)){
// all area is over of original size
need_load_size = 0;
}else{
if(size_orgmeta < (offset + oneread)){
// original file size(on S3) is smaller than request.
need_load_size = size_orgmeta - offset;
}else{
need_load_size = oneread;
}
}
size_t over_size = oneread - need_load_size;
// [NOTE]
// truncate file to zero and set length to part offset + size
// after this, file length is (offset + size), but file does not use any disk space.
//
if(-1 == ftruncate(tmpfd, 0) || -1 == ftruncate(tmpfd, (offset + oneread))){
S3FS_PRN_ERR("failed to tatic_cast<size_t>runcate temporary file(%d).", tmpfd);
result = -EIO;
break;
}
// single area get request
if(0 < need_load_size){
S3fsCurl s3fscurl;
if(0 != (result = s3fscurl.GetObjectRequest(path.c_str(), tmpfd, offset, oneread))){
S3FS_PRN_ERR("failed to get object(start=%zd, size=%zu) for file(%d).", offset, oneread, tmpfd);
break;
}
}
// initialize fd without loading
if(0 < over_size){
if(0 != (result = FdEntity::FillFile(tmpfd, 0, over_size, offset + need_load_size))){
S3FS_PRN_ERR("failed to fill rest bytes for fd(%d). errno(%d)", tmpfd, result);
break;
}
// set modify flag
is_modify = false;
}
}else{
// already loaded area
}
// single area upload by multipart post
if(0 != (result = NoCacheMultipartPost(upload_fd, offset, oneread))){
S3FS_PRN_ERR("failed to multipart post(start=%zd, size=%zu) for file(%d).", offset, oneread, upload_fd);
break;
}
}
if(0 != result){
break;
}
// set loaded flag
if(!(*iter)->loaded){
if((*iter)->offset < start){
fdpage* page = new fdpage((*iter)->offset, static_cast<size_t>(start - (*iter)->offset), (*iter)->loaded);
(*iter)->bytes -= (start - (*iter)->offset);
(*iter)->offset = start;
pagelist.pages.insert(iter, page);
}
if(0 != size && static_cast<size_t>(start + size) < static_cast<size_t>((*iter)->next())){
fdpage* page = new fdpage((*iter)->offset, static_cast<size_t>((start + size) - (*iter)->offset), true);
(*iter)->bytes -= static_cast<size_t>((start + size) - (*iter)->offset);
(*iter)->offset = start + size;
pagelist.pages.insert(iter, page);
}else{
(*iter)->loaded = true;
}
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(0 == result){
// compress pagelist
pagelist.Compress();
// fd data do empty
if(-1 == ftruncate(fd, 0)){
S3FS_PRN_ERR("failed to truncate file(%d), but continue...", fd);
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
// close temporary
fclose(ptmpfp);
return result;
}
// [NOTE]
// At no disk space for caching object.
// This method is starting multipart uploading.
//
int FdEntity::NoCachePreMultipartPost(void)
{
// initialize multipart upload values
upload_id.erase();
etaglist.clear();
S3fsCurl s3fscurl(true);
int result;
if(0 != (result = s3fscurl.PreMultipartPostRequest(path.c_str(), orgmeta, upload_id, false))){
return result;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
s3fscurl.DestroyCurlHandle();
return 0;
}
// [NOTE]
// At no disk space for caching object.
// This method is uploading one part of multipart.
//
int FdEntity::NoCacheMultipartPost(int tgfd, off_t start, size_t size)
{
if(-1 == tgfd || upload_id.empty()){
S3FS_PRN_ERR("Need to initialize for multipart post.");
return -EIO;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
S3fsCurl s3fscurl(true);
return s3fscurl.MultipartUploadRequest(upload_id, path.c_str(), tgfd, start, size, etaglist);
}
// [NOTE]
// At no disk space for caching object.
// This method is finishing multipart uploading.
//
int FdEntity::NoCacheCompleteMultipartPost(void)
{
if(upload_id.empty() || etaglist.empty()){
S3FS_PRN_ERR("There is no upload id or etag list.");
return -EIO;
}
S3fsCurl s3fscurl(true);
int result;
if(0 != (result = s3fscurl.CompleteMultipartPostRequest(path.c_str(), upload_id, etaglist))){
return result;
}
s3fscurl.DestroyCurlHandle();
// reset values
upload_id.erase();
etaglist.clear();
mp_start = 0;
mp_size = 0;
return 0;
}
int FdEntity::RowFlush(const char* tpath, bool force_sync)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
int result;
S3FS_PRN_INFO3("[tpath=%s][path=%s][fd=%d]", SAFESTRPTR(tpath), path.c_str(), fd);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == fd){
return -EBADF;
}
AutoLock auto_lock(&fdent_lock);
if(!force_sync && !is_modify){
// nothing to update.
return 0;
}
// If there is no loading all of the area, loading all area.
size_t restsize = pagelist.GetTotalUnloadedPageSize();
if(0 < restsize){
if(0 == upload_id.length()){
// check disk space
if(FdManager::IsSafeDiskSpace(NULL, restsize)){
// enough disk space
// Load all unitialized area
if(0 != (result = Load())){
S3FS_PRN_ERR("failed to upload all area(errno=%d)", result);
return static_cast<ssize_t>(result);
}
}else{
// no enough disk space
// upload all by multipart uploading
if(0 != (result = NoCacheLoadAndPost())){
S3FS_PRN_ERR("failed to upload all area by multipart uploading(errno=%d)", result);
return static_cast<ssize_t>(result);
}
}
}else{
// alreay start miltipart uploading
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(0 == upload_id.length()){
// normal uploading
/*
* Make decision to do multi upload (or not) based upon file size
*
* According to the AWS spec:
* - 1 to 10,000 parts are allowed
* - minimum size of parts is 5MB (expect for the last part)
*
* For our application, we will define minimum part size to be 10MB (10 * 2^20 Bytes)
* minimum file size will be 64 GB - 2 ** 36
*
* Initially uploads will be done serially
*
* If file is > 20MB, then multipart will kick in
*/
if(pagelist.Size() > static_cast<size_t>(MAX_MULTIPART_CNT * S3fsCurl::GetMultipartSize())){
// close f ?
return -ENOTSUP;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
// seek to head of file.
if(0 != lseek(fd, 0, SEEK_SET)){
S3FS_PRN_ERR("lseek error(%d)", errno);
return -errno;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(pagelist.Size() >= static_cast<size_t>(2 * S3fsCurl::GetMultipartSize()) && !nomultipart){ // default 20MB
// Additional time is needed for large files
time_t backup = 0;
if(120 > S3fsCurl::GetReadwriteTimeout()){
backup = S3fsCurl::SetReadwriteTimeout(120);
}
result = S3fsCurl::ParallelMultipartUploadRequest(tpath ? tpath : path.c_str(), orgmeta, fd);
if(0 != backup){
S3fsCurl::SetReadwriteTimeout(backup);
}
}else{
S3fsCurl s3fscurl(true);
result = s3fscurl.PutRequest(tpath ? tpath : path.c_str(), orgmeta, fd);
}
// seek to head of file.
if(0 == result && 0 != lseek(fd, 0, SEEK_SET)){
S3FS_PRN_ERR("lseek error(%d)", errno);
return -errno;
}
}else{
// upload rest data
if(0 < mp_size){
if(0 != (result = NoCacheMultipartPost(fd, mp_start, mp_size))){
S3FS_PRN_ERR("failed to multipart post(start=%zd, size=%zu) for file(%d).", mp_start, mp_size, fd);
return result;
}
mp_start = 0;
mp_size = 0;
}
// complete multipart uploading.
if(0 != (result = NoCacheCompleteMultipartPost())){
S3FS_PRN_ERR("failed to complete(finish) multipart post for file(%d).", fd);
return result;
}
// truncate file to zero
if(-1 == ftruncate(fd, 0)){
// So the file has already been removed, skip error.
S3FS_PRN_ERR("failed to truncate file(%d) to zero, but continue...", fd);
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(0 == result){
is_modify = false;
}
return result;
}
ssize_t FdEntity::Read(char* bytes, off_t start, size_t size, bool force_load)
{
S3FS_PRN_DBG("[path=%s][fd=%d][offset=%jd][size=%zu]", path.c_str(), fd, (intmax_t)start, size);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == fd){
return -EBADF;
}
AutoLock auto_lock(&fdent_lock);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(force_load){
pagelist.SetPageLoadedStatus(start, size, false);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
int result;
ssize_t rsize;
// check disk space
if(0 < pagelist.GetTotalUnloadedPageSize(start, size)){
if(!FdManager::IsSafeDiskSpace(NULL, size)){
// [NOTE]
// If the area of this entity fd used can be released, try to do it.
// But If file data is updated, we can not even release of fd.
// Fundamentally, this method will fail as long as the disk capacity
// is not ensured.
//
if(!is_modify){
// try to clear all cache for this fd.
pagelist.Init(pagelist.Size(), false);
if(-1 == ftruncate(fd, 0) || -1 == ftruncate(fd, pagelist.Size())){
S3FS_PRN_ERR("failed to truncate temporary file(%d).", fd);
return -ENOSPC;
}
}
}
// load size(for prefetch)
size_t load_size = size;
if(static_cast<size_t>(start + size) < pagelist.Size()){
size_t prefetch_max_size = max(size, static_cast<size_t>(S3fsCurl::GetMultipartSize() * S3fsCurl::GetMaxParallelCount()));
if(static_cast<size_t>(start + prefetch_max_size) < pagelist.Size()){
load_size = prefetch_max_size;
}else{
load_size = static_cast<size_t>(pagelist.Size() - start);
}
}
// Loading
if(0 < size && 0 != (result = Load(start, load_size))){
S3FS_PRN_ERR("could not download. start(%jd), size(%zu), errno(%d)", (intmax_t)start, size, result);
return -EIO;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
// Reading
if(-1 == (rsize = pread(fd, bytes, size, start))){
S3FS_PRN_ERR("pread failed. errno(%d)", errno);
return -errno;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
return rsize;
}
ssize_t FdEntity::Write(const char* bytes, off_t start, size_t size)
{
S3FS_PRN_DBG("[path=%s][fd=%d][offset=%jd][size=%zu]", path.c_str(), fd, (intmax_t)start, size);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(-1 == fd){
return -EBADF;
}
AutoLock auto_lock(&fdent_lock);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
int result;
ssize_t wsize;
if(0 == upload_id.length()){
// check disk space
size_t restsize = pagelist.GetTotalUnloadedPageSize(0, start) + size;
if(FdManager::IsSafeDiskSpace(NULL, restsize)){
// enough disk space
// Load unitialized area which starts from 0 to (start + size) before writing.
if(0 < start && 0 != (result = Load(0, static_cast<size_t>(start)))){
S3FS_PRN_ERR("failed to load uninitialized area before writing(errno=%d)", result);
return static_cast<ssize_t>(result);
}
}else{
// no enough disk space
if(0 != (result = NoCachePreMultipartPost())){
S3FS_PRN_ERR("failed to switch multipart uploading with no cache(errno=%d)", result);
return static_cast<ssize_t>(result);
}
// start multipart uploading
if(0 != (result = NoCacheLoadAndPost(0, start))){
S3FS_PRN_ERR("failed to load uninitialized area and multipart uploading it(errno=%d)", result);
return static_cast<ssize_t>(result);
}
mp_start = start;
mp_size = 0;
}
}else{
// alreay start miltipart uploading
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
// Writing
if(-1 == (wsize = pwrite(fd, bytes, size, start))){
S3FS_PRN_ERR("pwrite failed. errno(%d)", errno);
return -errno;
}
if(!is_modify){
is_modify = true;
}
if(0 < wsize){
pagelist.SetPageLoadedStatus(start, static_cast<size_t>(wsize), true);
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// check multipart uploading
if(0 < upload_id.length()){
mp_size += static_cast<size_t>(wsize);
if(static_cast<size_t>(S3fsCurl::GetMultipartSize()) <= mp_size){
// over one multipart size
if(0 != (result = NoCacheMultipartPost(fd, mp_start, mp_size))){
S3FS_PRN_ERR("failed to multipart post(start=%zd, size=%zu) for file(%d).", mp_start, mp_size, fd);
return result;
}
// [NOTE]
// truncate file to zero and set length to part offset + size
// after this, file length is (offset + size), but file does not use any disk space.
//
if(-1 == ftruncate(fd, 0) || -1 == ftruncate(fd, (mp_start + mp_size))){
S3FS_PRN_ERR("failed to truncate file(%d).", fd);
return -EIO;
}
mp_start += mp_size;
mp_size = 0;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}
return wsize;
}
//------------------------------------------------
// FdManager symbol
//------------------------------------------------
// [NOTE]
// NOCACHE_PATH_PREFIX symbol needs for not using cache mode.
// Now s3fs I/F functions in s3fs.cpp has left the processing
// to FdManager and FdEntity class. FdManager class manages
2015-07-10 18:50:40 +00:00
// the list of local file stat and file descriptor in conjunction
// with the FdEntity class.
// When s3fs is not using local cache, it means FdManager must
2015-07-10 18:50:40 +00:00
// return new temporary file descriptor at each opening it.
// Then FdManager caches fd by key which is dummy file path
// instead of real file path.
// This process may not be complete, but it is easy way can
// be realized.
//
#define NOCACHE_PATH_PREFIX_FORM " __S3FS_UNEXISTED_PATH_%lx__ / " // important space words for simply
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//------------------------------------------------
// FdManager class valiable
//------------------------------------------------
FdManager FdManager::singleton;
pthread_mutex_t FdManager::fd_manager_lock;
bool FdManager::is_lock_init(false);
string FdManager::cache_dir("");
size_t FdManager::free_disk_space = 0;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//------------------------------------------------
// FdManager class methods
//------------------------------------------------
bool FdManager::SetCacheDir(const char* dir)
{
if(!dir || '\0' == dir[0]){
cache_dir = "";
}else{
cache_dir = dir;
}
return true;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
bool FdManager::DeleteCacheDirectory(void)
{
if(0 == FdManager::cache_dir.size()){
return true;
}
string cache_dir;
if(!FdManager::MakeCachePath(NULL, cache_dir, false)){
return false;
}
return delete_files_in_dir(cache_dir.c_str(), true);
}
int FdManager::DeleteCacheFile(const char* path)
{
S3FS_PRN_INFO3("[path=%s]", SAFESTRPTR(path));
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(!path){
return -EIO;
}
if(0 == FdManager::cache_dir.size()){
return 0;
}
string cache_path = "";
if(!FdManager::MakeCachePath(path, cache_path, false)){
return 0;
}
int result = 0;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(0 != unlink(cache_path.c_str())){
if(ENOENT == errno){
S3FS_PRN_DBG("failed to delete file(%s): errno=%d", path, errno);
}else{
S3FS_PRN_ERR("failed to delete file(%s): errno=%d", path, errno);
}
result = -errno;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(!CacheFileStat::DeleteCacheFileStat(path)){
if(ENOENT == errno){
S3FS_PRN_DBG("failed to delete stat file(%s): errno=%d", path, errno);
}else{
S3FS_PRN_ERR("failed to delete stat file(%s): errno=%d", path, errno);
}
if(0 != errno){
result = -errno;
}else{
result = -EIO;
}
}
return result;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
bool FdManager::MakeCachePath(const char* path, string& cache_path, bool is_create_dir)
{
if(0 == FdManager::cache_dir.size()){
cache_path = "";
return true;
}
string resolved_path(FdManager::cache_dir + "/" + bucket);
if(is_create_dir){
int result;
if(0 != (result = mkdirp(resolved_path + mydirname(path), 0777))){
S3FS_PRN_ERR("failed to create dir(%s) by errno(%d).", path, result);
return false;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(!path || '\0' == path[0]){
cache_path = resolved_path;
}else{
cache_path = resolved_path + SAFESTRPTR(path);
}
return true;
}
bool FdManager::CheckCacheTopDir(void)
{
if(0 == FdManager::cache_dir.size()){
return true;
}
string toppath(FdManager::cache_dir + "/" + bucket);
return check_exist_dir_permission(toppath.c_str());
}
bool FdManager::MakeRandomTempPath(const char* path, string& tmppath)
{
char szBuff[64];
sprintf(szBuff, NOCACHE_PATH_PREFIX_FORM, random()); // warry for performance, but maybe don't warry.
tmppath = szBuff;
tmppath += path ? path : "";
return true;
}
size_t FdManager::SetEnsureFreeDiskSpace(size_t size)
{
size_t old = FdManager::free_disk_space;
if(0 == size){
if(0 == FdManager::free_disk_space){
FdManager::free_disk_space = static_cast<size_t>(S3fsCurl::GetMultipartSize() * S3fsCurl::GetMaxParallelCount());
}
}else{
if(0 == FdManager::free_disk_space){
FdManager::free_disk_space = max(size, static_cast<size_t>(S3fsCurl::GetMultipartSize() * S3fsCurl::GetMaxParallelCount()));
}else{
if(static_cast<size_t>(S3fsCurl::GetMultipartSize() * S3fsCurl::GetMaxParallelCount()) <= size){
FdManager::free_disk_space = size;
}
}
}
return old;
}
fsblkcnt_t FdManager::GetFreeDiskSpace(const char* path)
{
struct statvfs vfsbuf;
string ctoppath;
if(0 < FdManager::cache_dir.size()){
ctoppath = FdManager::cache_dir + "/";
}else{
ctoppath = TMPFILE_DIR_0PATH "/";
}
if(path && '\0' != *path){
ctoppath += path;
}else{
ctoppath += ".";
}
if(-1 == statvfs(ctoppath.c_str(), &vfsbuf)){
S3FS_PRN_ERR("could not get vfs stat by errno(%d)", errno);
return 0;
}
return (vfsbuf.f_bavail * vfsbuf.f_bsize);
}
bool FdManager::IsSafeDiskSpace(const char* path, size_t size)
{
fsblkcnt_t fsize = FdManager::GetFreeDiskSpace(path);
return ((size + FdManager::GetEnsureFreeDiskSpace()) <= fsize);
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
//------------------------------------------------
// FdManager methods
//------------------------------------------------
FdManager::FdManager()
{
if(this == FdManager::get()){
try{
pthread_mutex_init(&FdManager::fd_manager_lock, NULL);
FdManager::is_lock_init = true;
}catch(exception& e){
FdManager::is_lock_init = false;
S3FS_PRN_CRIT("failed to init mutex");
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
}else{
assert(false);
}
}
FdManager::~FdManager()
{
if(this == FdManager::get()){
2015-08-12 15:04:16 +00:00
for(fdent_map_t::iterator iter = fent.begin(); fent.end() != iter; ++iter){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
FdEntity* ent = (*iter).second;
delete ent;
}
fent.clear();
if(FdManager::is_lock_init){
try{
pthread_mutex_destroy(&FdManager::fd_manager_lock);
}catch(exception& e){
S3FS_PRN_CRIT("failed to init mutex");
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
FdManager::is_lock_init = false;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}else{
assert(false);
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
FdEntity* FdManager::GetFdEntity(const char* path, int existfd)
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
{
S3FS_PRN_INFO3("[path=%s][fd=%d]", SAFESTRPTR(path), existfd);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(!path || '\0' == path[0]){
return NULL;
}
AutoLock auto_lock(&FdManager::fd_manager_lock);
fdent_map_t::iterator iter = fent.find(string(path));
if(fent.end() != iter && (-1 == existfd || (*iter).second->GetFd() == existfd)){
return (*iter).second;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}
if(-1 != existfd){
2015-08-12 15:04:16 +00:00
for(iter = fent.begin(); iter != fent.end(); ++iter){
if((*iter).second && (*iter).second->GetFd() == existfd){
// found opend fd in map
if(0 == strcmp((*iter).second->GetPath(), path)){
return (*iter).second;
}
2015-07-10 18:50:40 +00:00
// found fd, but it is used another file(file descriptor is recycled)
// so returns NULL.
break;
}
}
}
return NULL;
}
FdEntity* FdManager::Open(const char* path, headers_t* pmeta, ssize_t size, time_t time, bool force_tmpfile, bool is_create)
{
S3FS_PRN_DBG("[path=%s][size=%jd][time=%jd]", SAFESTRPTR(path), (intmax_t)size, (intmax_t)time);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(!path || '\0' == path[0]){
return NULL;
}
AutoLock auto_lock(&FdManager::fd_manager_lock);
fdent_map_t::iterator iter = fent.find(string(path));
FdEntity* ent;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if(fent.end() != iter){
// found
ent = (*iter).second;
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}else if(is_create){
// not found
string cache_path = "";
if(!force_tmpfile && !FdManager::MakeCachePath(path, cache_path, true)){
S3FS_PRN_ERR("failed to make cache path for object(%s).", path);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return NULL;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// make new obj
ent = new FdEntity(path, cache_path.c_str());
if(0 < cache_path.size()){
// using cache
fent[string(path)] = ent;
}else{
// not using cache, so the key of fdentity is set not really existsing path.
// (but not strictly unexisting path.)
//
// [NOTE]
// The reason why this process here, please look at the definition of the
// comments of NOCACHE_PATH_PREFIX_FORM symbol.
//
string tmppath("");
FdManager::MakeRandomTempPath(path, tmppath);
fent[tmppath] = ent;
}
}else{
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return NULL;
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
// open
if(-1 == ent->Open(pmeta, size, time)){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
return NULL;
}
return ent;
}
2015-11-29 15:53:53 +00:00
FdEntity* FdManager::ExistOpen(const char* path, int existfd, bool ignore_existfd)
{
2015-11-29 15:53:53 +00:00
S3FS_PRN_DBG("[path=%s][fd=%d][ignore_existfd=%s]", SAFESTRPTR(path), existfd, ignore_existfd ? "true" : "false");
// search by real path
FdEntity* ent = Open(path, NULL, -1, -1, false, false);
2015-11-29 15:53:53 +00:00
if(!ent && (ignore_existfd || (-1 != existfd))){
// search from all fdentity because of not using cache.
AutoLock auto_lock(&FdManager::fd_manager_lock);
2015-08-12 15:04:16 +00:00
for(fdent_map_t::iterator iter = fent.begin(); iter != fent.end(); ++iter){
2015-11-29 15:53:53 +00:00
if((*iter).second && (*iter).second->IsOpen() && (ignore_existfd || ((*iter).second->GetFd() == existfd))){
// found opend fd in map
if(0 == strcmp((*iter).second->GetPath(), path)){
ent = (*iter).second;
ent->Dup();
}else{
2015-07-10 18:50:40 +00:00
// found fd, but it is used another file(file descriptor is recycled)
// so returns NULL.
}
break;
}
}
}
return ent;
}
void FdManager::Rename(const std::string &from, const std::string &to)
{
fdent_map_t::iterator iter = fent.find(from);
if(fent.end() != iter){
// found
S3FS_PRN_DBG("[from=%s][to=%s]", from.c_str(), to.c_str());
FdEntity* ent = (*iter).second;
fent.erase(iter);
ent->SetPath(to);
fent[to] = ent;
}
}
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
bool FdManager::Close(FdEntity* ent)
{
S3FS_PRN_DBG("[ent->file=%s][ent->fd=%d]", ent ? ent->GetPath() : "", ent ? ent->GetFd() : -1);
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
AutoLock auto_lock(&FdManager::fd_manager_lock);
2015-08-12 15:04:16 +00:00
for(fdent_map_t::iterator iter = fent.begin(); iter != fent.end(); ++iter){
Changes codes for performance(part 3) * Summay This revision includes big change about temporary file and local cache file. By this big change, s3fs works with good performance when s3fs opens/ closes/syncs/reads object. I made a big change about the handling about temporary file and local cache file to do this implementation. * Detail 1) About temporary file(local file) s3fs uses a temporary file on local file system when s3fs does download/ upload/open/seek object on S3. After this revision, s3fs calls ftruncate() function when s3fs makes the temporary file. In this way s3fs can set a file size of precisely length without downloading. (Notice - ftruncate function is for XSI-compliant systems, so that possibly you have a problem on non-XSI-compliant systems.) By this change, s3fs can download a part of a object by requesting with "Range" http header. It seems like downloading by each block unit. The default block(part) size is 50MB, it is caused the result which is default parallel requests count(5) by default multipart upload size(10MB). If you need to change this block size, you can change by new option "fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes. So that, you have to take care about that fdcache.cpp(and fdcache.h) were changed a lot. 2) About local cache Local cache files which are in directory specified by "use_cache" option do not have always all of object data. This cause is that s3fs uses ftruncate function and reads(writes) each block unit of a temporary file. s3fs manages each block unit's status which are "downloaded area" or "not". For this status, s3fs makes new temporary file in cache directory which is specified by "use_cache" option. This status files is in a directory which is named "<use_cache sirectory>/.<bucket_name>/". When s3fs opens this status file, s3fs locks this file for exclusive control by calling flock function. You need to take care about this, the status files can not be laid on network drive(like NFS). This revision changes about file open mode, s3fs always opens a local cache file and each status file with writable mode. Last, this revision adds new option "del_cache", this option means that s3fs deletes all local cache file when s3fs starts and exits. 3) Uploading When s3fs writes data to file descriptor through FUSE request, old s3fs revision downloads all of the object. But new revision does not download all, it downloads only small percial area(some block units) including writing data area. And when s3fs closes or flushes the file descriptor, s3fs downloads other area which is not downloaded from server. After that, s3fs uploads all of data. Already r456 revision has parallel upload function, then this revision with r456 and r457 are very big change for performance. 4) Downloading By changing a temporary file and a local cache file, when s3fs downloads a object, it downloads only the required range(some block units). And s3fs downloads units by parallel GET request, it is same as a case of uploading. (Maximum parallel request count and each download size are specified same parameters for uploading.) In the new revision, when s3fs opens file, s3fs returns file descriptor soon. Because s3fs only opens(makes) the file descriptor with no downloading data. And when s3fs reads a data, s3fs downloads only some block unit including specified area. This result is good for performance. 5) Changes option name The option "parallel_upload" which added at r456 is changed to new option name as "parallel_count". This reason is this option value is not only used by uploading object, but a uploading object also uses this option. (For a while, you can use old option name "parallel_upload" for compatibility.) git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
if((*iter).second == ent){
ent->Close();
if(!ent->IsOpen()){
delete (*iter).second;
fent.erase(iter);
return true;
}
}
}
return false;
}
bool FdManager::ChangeEntityToTempPath(FdEntity* ent, const char* path)
{
AutoLock auto_lock(&FdManager::fd_manager_lock);
2015-10-20 15:47:07 +00:00
for(fdent_map_t::iterator iter = fent.begin(); iter != fent.end(); ){
if((*iter).second == ent){
2015-10-20 15:47:07 +00:00
fent.erase(iter++);
string tmppath("");
FdManager::MakeRandomTempPath(path, tmppath);
fent[tmppath] = ent;
2015-10-20 15:47:07 +00:00
}else{
++iter;
}
}
return false;
}
2014-09-07 15:08:27 +00:00
/*
* Local variables:
* tab-width: 4
* c-basic-offset: 4
* End:
* vim600: noet sw=4 ts=4 fdm=marker
* vim<600: noet sw=4 ts=4
*/