/*
 * s3fs - FUSE-based file system backed by Amazon S3
*
 * Copyright (C) 2007 Takeshi Nakatani <ggtakec.com>
*
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include <cstdio>
#include <cstdlib>
#include <cerrno>
#include <unistd.h>
#include <sys/types.h>
#include <dirent.h>
#include "common.h"
#include "s3fs.h"
#include "fdcache.h"
#include "fdcache_pseudofd.h"
#include "s3fs_util.h"
#include "s3fs_logger.h"
#include "string_util.h"
#include "autolock.h"
/*
 * Changes codes for performance (part 3)
 *
 * Summary
 * This revision makes a big change to how the temporary file and the
 * local cache file are handled. With it, s3fs performs well when
 * opening, closing, syncing, and reading objects.
 *
 * Detail
 * 1) Temporary file (local file)
 *    s3fs uses a temporary file on the local file system when it
 *    downloads, uploads, opens, or seeks an object on S3. As of this
 *    revision, s3fs calls ftruncate() when it creates the temporary
 *    file, so it can set the exact file length without downloading
 *    anything. (Notice: ftruncate() is specified for XSI-compliant
 *    systems, so you may have problems on non-XSI-compliant systems.)
 *    With this change, s3fs can download part of an object by sending
 *    a "Range" HTTP header, effectively downloading block by block.
 *    The default block (part) size is 50MB, which is the default
 *    parallel request count (5) multiplied by the default multipart
 *    upload size (10MB). To change the block size, use the new
 *    "fd_page_size" option, which accepts any value from 1MB
 *    (1024 * 1024) upward. Note that fdcache.cpp (and fdcache.h)
 *    changed substantially for this.
 * 2) Local cache
 *    Local cache files in the directory specified by the "use_cache"
 *    option do not always hold all of an object's data, because s3fs
 *    uses ftruncate() and reads/writes the temporary file block by
 *    block. s3fs tracks each block's status ("downloaded" or not) in
 *    a new status file in the cache directory, under a directory named
 *    "<use_cache directory>/.<bucket_name>/". When s3fs opens a status
 *    file, it locks it for exclusive control by calling flock(); be
 *    aware that the status files therefore cannot be placed on a
 *    network drive (such as NFS).
 *    This revision also changes the file open mode: s3fs now always
 *    opens local cache files and status files in writable mode.
 *    Finally, this revision adds the new "del_cache" option, which
 *    makes s3fs delete all local cache files at startup and exit.
 * 3) Uploading
 *    When s3fs writes data to a file descriptor through a FUSE
 *    request, older revisions downloaded the whole object. The new
 *    revision downloads only the small partial area (some block units)
 *    covering the written data. When s3fs closes or flushes the file
 *    descriptor, it downloads the remaining areas from the server and
 *    then uploads all of the data. Revision r456 already added
 *    parallel upload, so this revision together with r456 and r457 is
 *    a very big change for performance.
 * 4) Downloading
 *    With the new temporary file and local cache file handling, s3fs
 *    downloads only the required range (some block units) of an
 *    object, using parallel GET requests just as for uploading. (The
 *    maximum parallel request count and per-request download size use
 *    the same parameters as for uploading.) In the new revision,
 *    opening a file returns a file descriptor immediately, because
 *    s3fs only creates the descriptor without downloading any data;
 *    when data is read, s3fs downloads only the block units covering
 *    the requested area. This is good for performance.
 * 5) Option rename
 *    The "parallel_upload" option added in r456 is renamed to
 *    "parallel_count", because its value is used not only for
 *    uploading objects but also for downloading them. (The old name
 *    "parallel_upload" remains usable for compatibility for a while.)
 *
 * git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
 */
//------------------------------------------------
// Symbols
//------------------------------------------------
#define TMPFILE_FOR_CHECK_HOLE "/tmp/.s3fs_hole_check.tmp"
//
// For cache directory top path
//
#if defined(P_tmpdir)
#define TMPFILE_DIR_0PATH   P_tmpdir
#else
#define TMPFILE_DIR_0PATH   "/tmp"
#endif
//
// The following symbols are used by FdManager::RawCheckAllCache().
//
#define CACHEDBG_FMT_HEAD       "---------------------------------------------------------------------------\n" \
                                "Check cache file and its stats file consistency at %s\n" \
                                "---------------------------------------------------------------------------"
#define CACHEDBG_FMT_FOOT       "---------------------------------------------------------------------------\n" \
                                "Summary - Total files: %d\n" \
                                "          Detected error files: %d\n" \
                                "          Detected error directories: %d\n" \
                                "---------------------------------------------------------------------------"
#define CACHEDBG_FMT_FILE_OK    "File: %s%s -> [OK] no problem"
#define CACHEDBG_FMT_FILE_PROB  "File: %s%s"
#define CACHEDBG_FMT_DIR_PROB   "Directory: %s"
#define CACHEDBG_FMT_ERR_HEAD   " -> [E] there is a mark that data exists in stats, but there is no data in the cache file."
#define CACHEDBG_FMT_WARN_HEAD  " -> [W] These show no data in stats, but there is evidence of data in the cache file(no problem)."
#define CACHEDBG_FMT_WARN_OPEN  "\n -> [W] This file is currently open and may not provide accurate analysis results."
#define CACHEDBG_FMT_CRIT_HEAD  " -> [C] %s"
#define CACHEDBG_FMT_CRIT_HEAD2 " -> [C] "
#define CACHEDBG_FMT_PROB_BLOCK " 0x%016zx(0x%016zx bytes)"
// [NOTE]
// The NOCACHE_PATH_PREFIX symbol is needed for the no-cache mode.
// The s3fs I/F functions in s3fs.cpp leave the processing to the
// FdManager and FdEntity classes. The FdManager class manages the
// list of local file stats and file descriptors in conjunction with
// the FdEntity class.
// When s3fs is not using a local cache, FdManager must return a new
// temporary file descriptor each time a file is opened. FdManager
// therefore caches the fd under a key which is a dummy file path
// instead of the real file path.
// This process may not be complete, but it is an easy way to realize it.
//
#define NOCACHE_PATH_PREFIX_FORM " __S3FS_UNEXISTED_PATH_%lx__ / "      // the leading/trailing spaces are intentional
//------------------------------------------------
// FdManager class variable
//------------------------------------------------
FdManager       FdManager::singleton;
pthread_mutex_t FdManager::fd_manager_lock;
pthread_mutex_t FdManager::cache_cleanup_lock;
pthread_mutex_t FdManager::reserved_diskspace_lock;
bool            FdManager::is_lock_init(false);
std::string     FdManager::cache_dir;
bool            FdManager::check_cache_dir_exist(false);
off_t           FdManager::free_disk_space = 0;
std::string     FdManager::check_cache_output;
bool            FdManager::checked_lseek(false);
bool            FdManager::have_lseek_hole(false);
//------------------------------------------------
// FdManager class methods
//------------------------------------------------
bool FdManager::SetCacheDir(const char* dir)
{
    if(!dir || '\0' == dir[0]){
        cache_dir = "";
    }else{
        cache_dir = dir;
    }
    return true;
}
bool FdManager::SetCacheCheckOutput(const char* path)
{
    if(!path || '\0' == path[0]){
        check_cache_output.erase();
    }else{
        check_cache_output = path;
    }
    return true;
}
bool FdManager::DeleteCacheDirectory()
{
    if(FdManager::cache_dir.empty()){
        return true;
    }

    std::string cache_path;
    if(!FdManager::MakeCachePath(NULL, cache_path, false)){
        return false;
    }
    if(!delete_files_in_dir(cache_path.c_str(), true)){
        return false;
    }

    std::string mirror_path = FdManager::cache_dir + "/." + bucket + ".mirror";
    if(!delete_files_in_dir(mirror_path.c_str(), true)){
        return false;
    }

    return true;
}
int FdManager::DeleteCacheFile(const char* path)
{
    S3FS_PRN_INFO3("[path=%s]", SAFESTRPTR(path));

    if(!path){
        return -EIO;
    }
    if(FdManager::cache_dir.empty()){
        return 0;
    }

    std::string cache_path;
    if(!FdManager::MakeCachePath(path, cache_path, false)){
        return 0;
    }

    int result = 0;
    if(0 != unlink(cache_path.c_str())){
        if(ENOENT == errno){
            S3FS_PRN_DBG("failed to delete file(%s): errno=%d", path, errno);
        }else{
            S3FS_PRN_ERR("failed to delete file(%s): errno=%d", path, errno);
        }
        result = -errno;
    }
    if(!CacheFileStat::DeleteCacheFileStat(path)){
        if(ENOENT == errno){
            S3FS_PRN_DBG("failed to delete stat file(%s): errno=%d", path, errno);
        }else{
            S3FS_PRN_ERR("failed to delete stat file(%s): errno=%d", path, errno);
        }
        if(0 != errno){
            result = -errno;
        }else{
            result = -EIO;
        }
    }
    return result;
}
bool FdManager::MakeCachePath(const char* path, std::string& cache_path, bool is_create_dir, bool is_mirror_path)
{
    if(FdManager::cache_dir.empty()){
        cache_path = "";
        return true;
    }

    std::string resolved_path(FdManager::cache_dir);
    if(!is_mirror_path){
        resolved_path += "/";
        resolved_path += bucket;
    }else{
        resolved_path += "/.";
        resolved_path += bucket;
        resolved_path += ".mirror";
    }

    if(is_create_dir){
        int result;
        if(0 != (result = mkdirp(resolved_path + mydirname(path), 0777))){
            S3FS_PRN_ERR("failed to create dir(%s) by errno(%d).", path, result);
            return false;
        }
    }
    if(!path || '\0' == path[0]){
        cache_path = resolved_path;
    }else{
        cache_path = resolved_path + SAFESTRPTR(path);
    }
    return true;
}
bool FdManager::CheckCacheTopDir()
{
    if(FdManager::cache_dir.empty()){
        return true;
    }
    std::string toppath(FdManager::cache_dir + "/" + bucket);

    return check_exist_dir_permission(toppath.c_str());
}
bool FdManager::MakeRandomTempPath(const char* path, std::string& tmppath)
{
    char szBuff[64];

    // snprintf guards the fixed-size buffer (original used sprintf)
    snprintf(szBuff, sizeof(szBuff), NOCACHE_PATH_PREFIX_FORM, random());    // worry for performance, but maybe don't worry.
    tmppath  = szBuff;
    tmppath += path ? path : "";
    return true;
}
bool FdManager::SetCheckCacheDirExist(bool is_check)
{
    bool old = FdManager::check_cache_dir_exist;
    FdManager::check_cache_dir_exist = is_check;
    return old;
}
bool FdManager::CheckCacheDirExist()
{
    if(!FdManager::check_cache_dir_exist){
        return true;
    }
    if(FdManager::cache_dir.empty()){
        return true;
    }
    // check the directory
    struct stat st;
    if(0 != stat(cache_dir.c_str(), &st)){
        S3FS_PRN_ERR("could not access to cache directory(%s) by errno(%d).", cache_dir.c_str(), errno);
        return false;
    }
    if(!S_ISDIR(st.st_mode)){
        S3FS_PRN_ERR("the cache directory(%s) is not a directory.", cache_dir.c_str());
        return false;
    }
    return true;
}
off_t FdManager::GetEnsureFreeDiskSpace()
{
    AutoLock auto_lock(&FdManager::reserved_diskspace_lock);
    return FdManager::free_disk_space;
}
off_t FdManager::SetEnsureFreeDiskSpace(off_t size)
{
    AutoLock auto_lock(&FdManager::reserved_diskspace_lock);
    off_t old = FdManager::free_disk_space;
    FdManager::free_disk_space = size;
    return old;
}
off_t FdManager::GetFreeDiskSpace(const char* path)
{
    struct statvfs vfsbuf;
    std::string   ctoppath;

    if(!FdManager::cache_dir.empty()){
        ctoppath = FdManager::cache_dir + "/";
        ctoppath = get_exist_directory_path(ctoppath);    // existing directory
        if(ctoppath != "/"){
            ctoppath += "/";
        }
    }else{
        ctoppath = TMPFILE_DIR_0PATH "/";
    }
    if(path && '\0' != *path){
        ctoppath += path;
    }else{
        ctoppath += ".";
    }
    if(-1 == statvfs(ctoppath.c_str(), &vfsbuf)){
        S3FS_PRN_ERR("could not get vfs stat by errno(%d)", errno);
        return 0;
    }
    return (vfsbuf.f_bavail * vfsbuf.f_frsize);
}
bool FdManager::IsSafeDiskSpace(const char* path, off_t size)
{
    off_t fsize = FdManager::GetFreeDiskSpace(path);
    return size + FdManager::GetEnsureFreeDiskSpace() <= fsize;
}
bool FdManager::HaveLseekHole()
{
    if(FdManager::checked_lseek){
        return FdManager::have_lseek_hole;
    }

    // create temporary file
    int fd;
    if(-1 == (fd = open(TMPFILE_FOR_CHECK_HOLE, O_CREAT|O_RDWR, 0600))){
        S3FS_PRN_ERR("failed to open temporary file(%s) - errno(%d)", TMPFILE_FOR_CHECK_HOLE, errno);
        FdManager::checked_lseek   = true;
        FdManager::have_lseek_hole = false;
        return FdManager::have_lseek_hole;
    }

    // check SEEK_DATA/SEEK_HOLE options
    bool result = true;
    if(-1 == lseek(fd, 0, SEEK_DATA)){
        if(EINVAL == errno){
            S3FS_PRN_ERR("lseek does not support SEEK_DATA");
            result = false;
        }
    }
    if(result && -1 == lseek(fd, 0, SEEK_HOLE)){
        if(EINVAL == errno){
            S3FS_PRN_ERR("lseek does not support SEEK_HOLE");
            result = false;
        }
    }
    close(fd);
    unlink(TMPFILE_FOR_CHECK_HOLE);

    FdManager::checked_lseek   = true;
    FdManager::have_lseek_hole = result;
    return FdManager::have_lseek_hole;
}
bool FdManager::HasOpenEntityFd(const char* path)
{
    AutoLock auto_lock(&FdManager::fd_manager_lock);

    FdEntity* ent;
    int       fd = -1;
    if(NULL == (ent = FdManager::singleton.GetFdEntity(path, fd, false, true))){
        return false;
    }
    return (0 < ent->GetOpenCount());
}
//------------------------------------------------
// FdManager methods
//------------------------------------------------
FdManager::FdManager()
{
    if(this == FdManager::get()){
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
#if S3FS_PTHREAD_ERRORCHECK
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
#endif
        int result;
        if(0 != (result = pthread_mutex_init(&FdManager::fd_manager_lock, &attr))){
            S3FS_PRN_CRIT("failed to init fd_manager_lock: %d", result);
            abort();
        }
        if(0 != (result = pthread_mutex_init(&FdManager::cache_cleanup_lock, &attr))){
            S3FS_PRN_CRIT("failed to init cache_cleanup_lock: %d", result);
            abort();
        }
        if(0 != (result = pthread_mutex_init(&FdManager::reserved_diskspace_lock, &attr))){
            S3FS_PRN_CRIT("failed to init reserved_diskspace_lock: %d", result);
            abort();
        }
        FdManager::is_lock_init = true;
    }else{
        abort();
    }
Changes codes for performance(part 3)
* Summay
This revision includes big change about temporary file and local cache file.
By this big change, s3fs works with good performance when s3fs opens/
closes/syncs/reads object.
I made a big change about the handling about temporary file and local cache
file to do this implementation.
* Detail
1) About temporary file(local file)
s3fs uses a temporary file on local file system when s3fs does download/
upload/open/seek object on S3.
After this revision, s3fs calls ftruncate() function when s3fs makes the
temporary file.
In this way s3fs can set a file size of precisely length without downloading.
(Notice - ftruncate function is for XSI-compliant systems, so that possibly
you have a problem on non-XSI-compliant systems.)
By this change, s3fs can download a part of a object by requesting with
"Range" http header. It seems like downloading by each block unit.
The default block(part) size is 50MB, it is caused the result which is default
parallel requests count(5) by default multipart upload size(10MB).
If you need to change this block size, you can change by new option
"fd_page_size". This option can take from 1MB(1024 * 1024) to any bytes.
So that, you have to take care about that fdcache.cpp(and fdcache.h) were
changed a lot.
2) About local cache
Local cache files which are in directory specified by "use_cache" option do
not have always all of object data.
This cause is that s3fs uses ftruncate function and reads(writes) each block
unit of a temporary file.
s3fs manages each block unit's status which are "downloaded area" or "not".
For this status, s3fs makes new temporary file in cache directory which is
specified by "use_cache" option. This status files is in a directory which is
named "<use_cache sirectory>/.<bucket_name>/".
When s3fs opens this status file, s3fs locks this file for exclusive control by
calling flock function. You need to take care about this, the status files can
not be laid on network drive(like NFS).
This revision changes about file open mode, s3fs always opens a local cache
file and each status file with writable mode.
Last, this revision adds new option "del_cache", this option means that s3fs
deletes all local cache file when s3fs starts and exits.
3) Uploading
When s3fs writes data to file descriptor through FUSE request, old s3fs
revision downloads all of the object. But new revision does not download all,
it downloads only small percial area(some block units) including writing data
area.
And when s3fs closes or flushes the file descriptor, s3fs downloads other area
which is not downloaded from server. After that, s3fs uploads all of data.
Already r456 revision has parallel upload function, then this revision with
r456 and r457 are very big change for performance.
4) Downloading
With the changes to the temporary file and the local cache file, when s3fs
downloads an object it fetches only the required range (some block units).
s3fs downloads these units with parallel GET requests, the same as in the
upload case. (The maximum parallel request count and each download size are
controlled by the same parameters as uploading.)
In the new revision, when s3fs opens a file it returns a file descriptor
immediately, because it only opens (creates) the file descriptor without
downloading any data. When s3fs reads data, it downloads only the block
units covering the requested range.
This is good for performance.
5) Changes option name
The option "parallel_upload" added in r456 is renamed to "parallel_count",
because its value is used not only for uploading objects but also for
downloading them. (For a while, the old option name "parallel_upload" still
works for compatibility.)
git-svn-id: http://s3fs.googlecode.com/svn/trunk@458 df820570-a93a-0410-bd06-b72b767a4274
2013-07-23 16:01:48 +00:00
}

FdManager::~FdManager()
{
    if(this == FdManager::get()){
        for(fdent_map_t::iterator iter = fent.begin(); fent.end() != iter; ++iter){
            FdEntity* ent = (*iter).second;

            S3FS_PRN_WARN("To exit with the cache file opened: path=%s, refcnt=%d", ent->GetPath(), ent->GetOpenCount());
            delete ent;
        }
        fent.clear();

        if(FdManager::is_lock_init){
            int result;
            if(0 != (result = pthread_mutex_destroy(&FdManager::fd_manager_lock))){
                S3FS_PRN_CRIT("failed to destroy fd_manager_lock: %d", result);
                abort();
            }
            if(0 != (result = pthread_mutex_destroy(&FdManager::cache_cleanup_lock))){
                S3FS_PRN_CRIT("failed to destroy cache_cleanup_lock: %d", result);
                abort();
            }
            if(0 != (result = pthread_mutex_destroy(&FdManager::reserved_diskspace_lock))){
                S3FS_PRN_CRIT("failed to destroy reserved_diskspace_lock: %d", result);
                abort();
            }
            FdManager::is_lock_init = false;
        }
    }else{
        abort();
    }
}

FdEntity* FdManager::GetFdEntity(const char* path, int& existfd, bool newfd, bool lock_already_held)
{
    S3FS_PRN_INFO3("[path=%s][fd=%d]", SAFESTRPTR(path), existfd);
    if(!path || '\0' == path[0]){
        return NULL;
    }

    AutoLock auto_lock(&FdManager::fd_manager_lock, lock_already_held ? AutoLock::ALREADY_LOCKED : AutoLock::NONE);

    fdent_map_t::iterator iter = fent.find(std::string(path));
    if(fent.end() != iter && iter->second){
        if(-1 == existfd){
            if(newfd){
                existfd = iter->second->OpenPseudoFd(O_RDWR);    // [NOTE] O_RDWR flags
            }
            return iter->second;
        }else if(iter->second->FindPseudoFd(existfd)){
            if(newfd){
                existfd = iter->second->Dup(existfd);
            }
            return iter->second;
        }
    }

    if(-1 != existfd){
        for(iter = fent.begin(); iter != fent.end(); ++iter){
            if(iter->second && iter->second->FindPseudoFd(existfd)){
                // found opened fd in map
                if(0 == strcmp(iter->second->GetPath(), path)){
                    if(newfd){
                        existfd = iter->second->Dup(existfd);
                    }
                    return iter->second;
                }
                // found fd, but it is used by another file (the file descriptor was recycled),
                // so return NULL.
                break;
            }
        }
    }

    // If the cache directory is not specified, s3fs opens a temporary file
    // when the file is opened.
    if(!FdManager::IsCacheDir()){
        for(iter = fent.begin(); iter != fent.end(); ++iter){
            if(iter->second && iter->second->IsOpen() && 0 == strcmp(iter->second->GetPath(), path)){
                return iter->second;
            }
        }
    }
    return NULL;
}

FdEntity* FdManager::Open(int& fd, const char* path, headers_t* pmeta, off_t size, time_t time, int flags, bool force_tmpfile, bool is_create, bool no_fd_lock_wait)
{
    S3FS_PRN_DBG("[path=%s][size=%lld][time=%lld][flags=0x%x]", SAFESTRPTR(path), static_cast<long long>(size), static_cast<long long>(time), flags);
    if(!path || '\0' == path[0]){
        return NULL;
    }

    AutoLock auto_lock(&FdManager::fd_manager_lock);

    // search the mapping by key(path)
    fdent_map_t::iterator iter = fent.find(std::string(path));
    if(fent.end() == iter && !force_tmpfile && !FdManager::IsCacheDir()){
        // If the cache directory is not specified, s3fs opens a temporary file
        // when the file is opened.
        // Then if it could not find an entity in the map for the file, s3fs should
        // search for an entity among all those that opened the temporary file.
        //
        for(iter = fent.begin(); iter != fent.end(); ++iter){
            if(iter->second && iter->second->IsOpen() && 0 == strcmp(iter->second->GetPath(), path)){
                break;      // found opened fd in mapping
            }
        }
    }

    FdEntity* ent;
    if(fent.end() != iter){
        // found
        ent = iter->second;

        if(ent->IsModified()){
            // If the file is being modified and its size is larger than the size parameter, it will not be resized.
            off_t cur_size = 0;
            if(ent->GetSize(cur_size) && size <= cur_size){
                size = -1;
            }
        }

        // (re)open
        if(-1 == (fd = ent->Open(pmeta, size, time, flags, no_fd_lock_wait ? AutoLock::NO_WAIT : AutoLock::NONE))){
            S3FS_PRN_ERR("failed to (re)open and create new pseudo fd for path(%s).", path);
            return NULL;
        }
    }else if(is_create){
        // not found
        std::string cache_path;
        if(!force_tmpfile && !FdManager::MakeCachePath(path, cache_path, true)){
            S3FS_PRN_ERR("failed to make cache path for object(%s).", path);
            return NULL;
        }
        // make new obj
        ent = new FdEntity(path, cache_path.c_str());

        // open
        if(-1 == (fd = ent->Open(pmeta, size, time, flags, no_fd_lock_wait ? AutoLock::NO_WAIT : AutoLock::NONE))){
            delete ent;
            return NULL;
        }

        if(!cache_path.empty()){
            // using cache
            fent[std::string(path)] = ent;
        }else{
            // not using cache, so the key of the fdentity is set to a path that does
            // not really exist (but not a strictly non-existing path).
            //
            // [NOTE]
            // For the reason why this processing is here, please look at the
            // comments for the NOCACHE_PATH_PREFIX_FORM symbol.
            //
            std::string tmppath;
            FdManager::MakeRandomTempPath(path, tmppath);
            fent[tmppath] = ent;
        }
    }else{
        return NULL;
    }
    return ent;
}

// [NOTE]
// This method does not create a new pseudo fd.
// It just finds existfd and returns the corresponding entity.
//
FdEntity* FdManager::GetExistFdEntity(const char* path, int existfd)
{
2021-05-23 16:28:50 +00:00
S3FS_PRN_DBG ( " [path=%s][existfd=%d] " , SAFESTRPTR ( path ) , existfd ) ;
2015-03-04 08:48:37 +00:00
2021-05-23 16:28:50 +00:00
AutoLock auto_lock ( & FdManager : : fd_manager_lock ) ;
2015-03-04 08:48:37 +00:00
2021-05-23 16:28:50 +00:00
// search from all entity.
for ( fdent_map_t : : iterator iter = fent . begin ( ) ; iter ! = fent . end ( ) ; + + iter ) {
if ( iter - > second & & iter - > second - > FindPseudoFd ( existfd ) ) {
// found existfd in entity
return iter - > second ;
2015-03-04 08:48:37 +00:00
}
}
2021-05-23 16:28:50 +00:00
// not found entity
return NULL ;
}
FdEntity* FdManager::OpenExistFdEntiy(const char* path, int& fd, int flags)
{
    S3FS_PRN_DBG("[path=%s][flags=0x%x]", SAFESTRPTR(path), flags);

    // search entity by path, and create pseudo fd
    FdEntity* ent = Open(fd, path, NULL, -1, -1, flags, false, false);
    if(!ent){
        // entity not found
        return NULL;
    }
    return ent;
}
void FdManager::Rename(const std::string &from, const std::string &to)
{
    AutoLock auto_lock(&FdManager::fd_manager_lock);

    fdent_map_t::iterator iter = fent.find(from);
    if(fent.end() == iter && !FdManager::IsCacheDir()){
        // If the cache directory is not specified, s3fs opens a temporary file
        // when the file is opened.
        // Then, if an entity for the file could not be found in the map, s3fs
        // should search all entities for one which opened the temporary file.
        //
        for(iter = fent.begin(); iter != fent.end(); ++iter){
            if(iter->second && iter->second->IsOpen() && 0 == strcmp(iter->second->GetPath(), from.c_str())){
                break;      // found opened fd in mapping
            }
        }
    }

    if(fent.end() != iter){
        // found
        S3FS_PRN_DBG("[from=%s][to=%s]", from.c_str(), to.c_str());

        FdEntity* ent = iter->second;

        // remove old fd entity from map
        fent.erase(iter);

        // rename path and caches in fd entity
        std::string fentmapkey;
        if(!ent->RenamePath(to, fentmapkey)){
            S3FS_PRN_ERR("Failed to rename FdEntity object for %s to %s", from.c_str(), to.c_str());
            return;
        }

        // set new fd entity to map
        fent[fentmapkey] = ent;
    }
}
bool FdManager::Close(FdEntity* ent, int fd)
{
    S3FS_PRN_DBG("[ent->file=%s][pseudo_fd=%d]", ent ? ent->GetPath() : "", fd);

    if(!ent || -1 == fd){
        return true;  // returns success
    }
    AutoLock auto_lock(&FdManager::fd_manager_lock);

    for(fdent_map_t::iterator iter = fent.begin(); iter != fent.end(); ++iter){
        if(iter->second == ent){
            ent->Close(fd);

            if(!ent->IsOpen()){
                // remove found entity from map.
                fent.erase(iter++);

                // check another key name for entity value to be on the safe side
                for(; iter != fent.end(); ){
                    if(iter->second == ent){
                        fent.erase(iter++);
                    }else{
                        ++iter;
                    }
                }
                delete ent;
            }
            return true;
        }
    }
    return false;
}
bool FdManager::ChangeEntityToTempPath(FdEntity* ent, const char* path)
{
    AutoLock auto_lock(&FdManager::fd_manager_lock);

    for(fdent_map_t::iterator iter = fent.begin(); iter != fent.end(); ){
        if(iter->second == ent){
            fent.erase(iter++);

            std::string tmppath;
            FdManager::MakeRandomTempPath(path, tmppath);
            fent[tmppath] = ent;
        }else{
            ++iter;
        }
    }
    return false;
}
void FdManager::CleanupCacheDir()
{
    //S3FS_PRN_DBG("cache cleanup requested");

    if(!FdManager::IsCacheDir()){
        return;
    }

    AutoLock auto_lock_no_wait(&FdManager::cache_cleanup_lock, AutoLock::NO_WAIT);

    if(auto_lock_no_wait.isLockAcquired()){
        //S3FS_PRN_DBG("cache cleanup started");
        CleanupCacheDirInternal("");
        //S3FS_PRN_DBG("cache cleanup ended");
    }else{
        // wait for other thread to finish cache cleanup
        AutoLock auto_lock(&FdManager::cache_cleanup_lock);
    }
}
void FdManager::CleanupCacheDirInternal(const std::string &path)
{
    DIR*           dp;
    struct dirent* dent;
    std::string    abs_path = cache_dir + "/" + bucket + path;

    if(NULL == (dp = opendir(abs_path.c_str()))){
        S3FS_PRN_ERR("could not open cache dir(%s) - errno(%d)", abs_path.c_str(), errno);
        return;
    }

    for(dent = readdir(dp); dent; dent = readdir(dp)){
        if(0 == strcmp(dent->d_name, "..") || 0 == strcmp(dent->d_name, ".")){
            continue;
        }
        std::string fullpath = abs_path;
        fullpath += "/";
        fullpath += dent->d_name;

        struct stat st;
        if(0 != lstat(fullpath.c_str(), &st)){
            S3FS_PRN_ERR("could not get stats of file(%s) - errno(%d)", fullpath.c_str(), errno);
            closedir(dp);
            return;
        }
        std::string next_path = path + "/" + dent->d_name;
        if(S_ISDIR(st.st_mode)){
            CleanupCacheDirInternal(next_path);
        }else{
            AutoLock auto_lock(&FdManager::fd_manager_lock, AutoLock::NO_WAIT);
            if(!auto_lock.isLockAcquired()){
                S3FS_PRN_ERR("could not get fd_manager_lock when clean up file(%s)", next_path.c_str());
                continue;
            }
            fdent_map_t::iterator iter = fent.find(next_path);
            if(fent.end() == iter){
                S3FS_PRN_DBG("cleaned up: %s", next_path.c_str());
                FdManager::DeleteCacheFile(next_path.c_str());
            }
        }
    }
    closedir(dp);
}
bool FdManager::ReserveDiskSpace(off_t size)
{
    if(IsSafeDiskSpace(NULL, size)){
        AutoLock auto_lock(&FdManager::reserved_diskspace_lock);
        free_disk_space += size;
        return true;
    }
    return false;
}
void FdManager::FreeReservedDiskSpace(off_t size)
{
    AutoLock auto_lock(&FdManager::reserved_diskspace_lock);
    free_disk_space -= size;
}
//
// Inspect all cache files and their stats files for consistency
//
// [NOTE]
// The minimum sub_path parameter is "/".
// The sub_path is a directory path starting from "/" and ending with "/".
//
// This method produces the following output.
//
// * Header
//   ------------------------------------------------------------
//   Check cache file and its stats file consistency
//   ------------------------------------------------------------
// * When the cache file and its stats information match
//   File path: <file path> -> [OK] no problem
//
// * If there is a problem with the cache file and its stats information
//   File path: <file path>
//     -> [P] <If the problem is that parsing is not possible in the first place, the message is output here with this prefix.>
//     -> [E] there is a mark that data exists in stats, but there is no data in the cache file.
//            <offset address>(bytes)
//            ...
//            ...
//     -> [W] These show no data in stats, but there is evidence of data in the cache file.(no problem.)
//            <offset address>(bytes)
//            ...
//            ...
//
bool FdManager::RawCheckAllCache(FILE* fp, const char* cache_stat_top_dir, const char* sub_path, int& total_file_cnt, int& err_file_cnt, int& err_dir_cnt)
{
    if(!cache_stat_top_dir || '\0' == cache_stat_top_dir[0] || !sub_path || '\0' == sub_path[0]){
        S3FS_PRN_ERR("Parameter cache_stat_top_dir or sub_path is empty.");
        return false;
    }
    // open directory of cache file's stats
    DIR* statsdir;
    std::string target_dir = cache_stat_top_dir;
    target_dir += sub_path;
    if(NULL == (statsdir = opendir(target_dir.c_str()))){
        S3FS_PRN_ERR("Could not open directory(%s) by errno(%d)", target_dir.c_str(), errno);
        return false;
    }
    // loop in directory of cache file's stats
    struct dirent* pdirent = NULL;
    while(NULL != (pdirent = readdir(statsdir))){
        if(DT_DIR == pdirent->d_type){
            // found directory
            if(0 == strcmp(pdirent->d_name, ".") || 0 == strcmp(pdirent->d_name, "..")){
                continue;
            }
            // reentrant for sub directory
            std::string subdir_path = sub_path;
            subdir_path += pdirent->d_name;
            subdir_path += '/';
            if(!RawCheckAllCache(fp, cache_stat_top_dir, subdir_path.c_str(), total_file_cnt, err_file_cnt, err_dir_cnt)){
                // put error message for this dir.
                ++err_dir_cnt;
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_DIR_PROB, subdir_path.c_str());
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_CRIT_HEAD, "Something error is occurred in checking this directory");
            }

        }else{
            ++total_file_cnt;

            // make cache file path
            std::string strOpenedWarn;
            std::string cache_path;
            std::string object_file_path = sub_path;
            object_file_path += pdirent->d_name;
            if(!FdManager::MakeCachePath(object_file_path.c_str(), cache_path, false, false) || cache_path.empty()){
                ++err_file_cnt;
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FILE_PROB, object_file_path.c_str(), strOpenedWarn.c_str());
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_CRIT_HEAD, "Could not make cache file path");
                continue;
            }

            // check if the target file is currently in operation.
            {
                AutoLock auto_lock(&FdManager::fd_manager_lock);

                fdent_map_t::iterator iter = fent.find(object_file_path);
                if(fent.end() != iter){
                    // This file is opened now, then we need to put warning message.
                    strOpenedWarn = CACHEDBG_FMT_WARN_OPEN;
                }
            }

            // open cache file
            int cache_file_fd;
            if(-1 == (cache_file_fd = open(cache_path.c_str(), O_RDONLY))){
                ++err_file_cnt;
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FILE_PROB, object_file_path.c_str(), strOpenedWarn.c_str());
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_CRIT_HEAD, "Could not open cache file");
                continue;
            }

            // get inode number for cache file
            struct stat st;
            if(0 != fstat(cache_file_fd, &st)){
                ++err_file_cnt;
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FILE_PROB, object_file_path.c_str(), strOpenedWarn.c_str());
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_CRIT_HEAD, "Could not get file inode number for cache file");
                close(cache_file_fd);
                continue;
            }
            ino_t cache_file_inode = st.st_ino;

            // open cache stat file and load page info.
            PageList      pagelist;
            CacheFileStat cfstat(object_file_path.c_str());
            if(!cfstat.ReadOnlyOpen() || !pagelist.Serialize(cfstat, false, cache_file_inode)){
                ++err_file_cnt;
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FILE_PROB, object_file_path.c_str(), strOpenedWarn.c_str());
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_CRIT_HEAD, "Could not load cache file stats information");
                close(cache_file_fd);
                continue;
            }
            cfstat.Release();

            // compare cache file size and stats information
            if(st.st_size != pagelist.Size()){
                ++err_file_cnt;
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FILE_PROB, object_file_path.c_str(), strOpenedWarn.c_str());
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_CRIT_HEAD2 "The cache file size(%lld) and the value(%lld) from cache file stats are different", static_cast<long long int>(st.st_size), static_cast<long long int>(pagelist.Size()));
                close(cache_file_fd);
                continue;
            }

            // compare cache file stats and cache file blocks
            fdpage_list_t err_area_list;
            fdpage_list_t warn_area_list;
            if(!pagelist.CompareSparseFile(cache_file_fd, st.st_size, err_area_list, warn_area_list)){
                // Found some error or warning
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FILE_PROB, object_file_path.c_str(), strOpenedWarn.c_str());
                if(!warn_area_list.empty()){
                    S3FS_PRN_CACHE(fp, CACHEDBG_FMT_WARN_HEAD);
                    for(fdpage_list_t::const_iterator witer = warn_area_list.begin(); witer != warn_area_list.end(); ++witer){
                        S3FS_PRN_CACHE(fp, CACHEDBG_FMT_PROB_BLOCK, static_cast<size_t>(witer->offset), static_cast<size_t>(witer->bytes));
                    }
                }
                if(!err_area_list.empty()){
                    ++err_file_cnt;
                    S3FS_PRN_CACHE(fp, CACHEDBG_FMT_ERR_HEAD);
                    for(fdpage_list_t::const_iterator eiter = err_area_list.begin(); eiter != err_area_list.end(); ++eiter){
                        S3FS_PRN_CACHE(fp, CACHEDBG_FMT_PROB_BLOCK, static_cast<size_t>(eiter->offset), static_cast<size_t>(eiter->bytes));
                    }
                }
            }else{
                // There is no problem!
                if(!strOpenedWarn.empty()){
                    strOpenedWarn += "\n ";
                }
                S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FILE_OK, object_file_path.c_str(), strOpenedWarn.c_str());
            }
            err_area_list.clear();
            warn_area_list.clear();
            close(cache_file_fd);
        }
    }
    closedir(statsdir);

    return true;
}
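// [NOTE]
// The sparse-file comparison above depends on lseek() with SEEK_DATA and
// SEEK_HOLE (see HaveLseekHole() in CheckAllCache) to discover which byte
// ranges of the cache file really hold data. Roughly, a hypothetical sketch
// of walking one data run:
//
//     off_t data = lseek(fd, 0, SEEK_DATA);     // start of next data run,
//                                               // or -1 with ENXIO if none
//     off_t hole = lseek(fd, data, SEEK_HOLE);  // end of that data run
//     // [data, hole) is a region that physically holds data on disk.
//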
bool FdManager::CheckAllCache()
{
    if(!FdManager::HaveLseekHole()){
        S3FS_PRN_ERR("lseek does not support SEEK_DATA/SEEK_HOLE, then could not check cache.");
        return false;
    }

    FILE* fp;
    if(FdManager::check_cache_output.empty()){
        fp = stdout;
    }else{
        if(NULL == (fp = fopen(FdManager::check_cache_output.c_str(), "a+"))){
            S3FS_PRN_ERR("Could not open(create) output file(%s) for checking all cache by errno(%d)", FdManager::check_cache_output.c_str(), errno);
            return false;
        }
    }

    // print head message
    S3FS_PRN_CACHE(fp, CACHEDBG_FMT_HEAD, S3fsLog::GetCurrentTime().c_str());

    // Loop in directory of cache file's stats
    std::string top_path       = CacheFileStat::GetCacheFileStatTopDir();
    int         total_file_cnt = 0;
    int         err_file_cnt   = 0;
    int         err_dir_cnt    = 0;
    bool        result         = RawCheckAllCache(fp, top_path.c_str(), "/", total_file_cnt, err_file_cnt, err_dir_cnt);
    if(!result){
        S3FS_PRN_ERR("Processing failed due to some problem.");
    }

    // print foot message
    S3FS_PRN_CACHE(fp, CACHEDBG_FMT_FOOT, total_file_cnt, err_file_cnt, err_dir_cnt);

    if(stdout != fp){
        fclose(fp);
    }

    return result;
}
/*
* Local variables:
* tab-width: 4
* c-basic-offset: 4
* End:
* vim600: expandtab sw=4 ts=4 fdm=marker
* vim<600: expandtab sw=4 ts=4
*/