s3fs-fuse/src/fdcache.h

/*
* s3fs - FUSE-based file system backed by Amazon S3
*
* Copyright(C) 2007 Randy Rizun <rrizun@gmail.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#ifndef S3FS_FDCACHE_H_
#define S3FS_FDCACHE_H_
#include "fdcache_entity.h"
//------------------------------------------------
// class FdManager
//------------------------------------------------
class FdManager
{
    private:
        static FdManager       singleton;
        static pthread_mutex_t fd_manager_lock;
        static pthread_mutex_t cache_cleanup_lock;
        static pthread_mutex_t reserved_diskspace_lock;
        static bool            is_lock_init;
        static std::string     cache_dir;
        static bool            check_cache_dir_exist;
        static off_t           free_disk_space;       // limit free disk space
        static off_t           fake_used_disk_space;  // difference between fake free disk space and actual at startup (for test/debug)
        static std::string     check_cache_output;
        static bool            checked_lseek;
        static bool            have_lseek_hole;
        static std::string     tmp_dir;

        fdent_map_t            fent;

    private:
        static off_t GetFreeDiskSpace(const char* path);
        static bool IsDir(const std::string* dir);

        int GetPseudoFdCount(const char* path);
        void CleanupCacheDirInternal(const std::string &path = "");
        bool RawCheckAllCache(FILE* fp, const char* cache_stat_top_dir, const char* sub_path, int& total_file_cnt, int& err_file_cnt, int& err_dir_cnt);

    public:
        FdManager();
        ~FdManager();

        // Reference singleton
        static FdManager* get() { return &singleton; }

        static bool DeleteCacheDirectory();
        static int DeleteCacheFile(const char* path);
        static bool SetCacheDir(const char* dir);
        static bool IsCacheDir() { return !FdManager::cache_dir.empty(); }
        static const char* GetCacheDir() { return FdManager::cache_dir.c_str(); }
        static bool SetCacheCheckOutput(const char* path);
        static const char* GetCacheCheckOutput() { return FdManager::check_cache_output.c_str(); }
        static bool MakeCachePath(const char* path, std::string& cache_path, bool is_create_dir = true, bool is_mirror_path = false);
        static bool CheckCacheTopDir();
        static bool MakeRandomTempPath(const char* path, std::string& tmppath);
        static bool SetCheckCacheDirExist(bool is_check);
        static bool CheckCacheDirExist();
        static bool HasOpenEntityFd(const char* path);
        static int GetOpenFdCount(const char* path);
        static off_t GetEnsureFreeDiskSpace();
        static off_t SetEnsureFreeDiskSpace(off_t size);
        static bool InitFakeUsedDiskSize(off_t fake_freesize);
        static bool IsSafeDiskSpace(const char* path, off_t size);
        static void FreeReservedDiskSpace(off_t size);
        static bool ReserveDiskSpace(off_t size);
        static bool HaveLseekHole();
        static bool SetTmpDir(const char* dir);
        static bool CheckTmpDirExist();
        static FILE* MakeTempFile();

        // Return FdEntity associated with path, returning nullptr on error.
        // This operation increments the reference count; callers must decrement via Close after use.
        FdEntity* GetFdEntity(const char* path, int& existfd, bool newfd = true, AutoLock::Type locktype = AutoLock::NONE);
        FdEntity* Open(int& fd, const char* path, const headers_t* pmeta, off_t size, const struct timespec& ts_mctime, int flags, bool force_tmpfile, bool is_create, bool ignore_modify, AutoLock::Type type);
        FdEntity* GetExistFdEntity(const char* path, int existfd = -1);
        FdEntity* OpenExistFdEntity(const char* path, int& fd, int flags = O_RDONLY);
        void Rename(const std::string &from, const std::string &to);
        bool Close(FdEntity* ent, int fd);
        bool ChangeEntityToTempPath(FdEntity* ent, const char* path);
        void CleanupCacheDir();

        bool CheckAllCache();
};
#endif // S3FS_FDCACHE_H_
/*
* Local variables:
* tab-width: 4
* c-basic-offset: 4
* End:
* vim600: expandtab sw=4 ts=4 fdm=marker
* vim<600: expandtab sw=4 ts=4
*/
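
The comment on `GetFdEntity` says that it increments a reference count and that callers must decrement it via `Close` after use. The following is a minimal, self-contained sketch of that acquire/close contract. It does not use the real s3fs types: `SketchEntity`, `SketchManager`, `Acquire`, `RefCount`, and `EntityGuard` are hypothetical names invented for illustration, and the real `FdManager` may track references quite differently. The RAII guard is one possible way for a caller to make sure the matching `Close` always happens.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an entity tracked per path (NOT the real FdEntity).
struct SketchEntity {
    std::string path;
    int         refcount = 0;
};

// Hypothetical stand-in for a manager that hands out counted references
// (NOT the real FdManager).
class SketchManager {
    public:
        // Return the entity for path, creating it if needed, and
        // increment its reference count.
        SketchEntity* Acquire(const std::string& path) {
            SketchEntity& ent = entities_[path];
            ent.path = path;
            ++ent.refcount;
            return &ent;
        }

        // Decrement the reference count; drop the entity when it reaches zero.
        // Returns false if ent is null or was not held.
        bool Close(SketchEntity* ent) {
            if(!ent || ent->refcount <= 0){
                return false;
            }
            if(--ent->refcount == 0){
                const std::string key = ent->path;  // copy before erasing the owner
                entities_.erase(key);
            }
            return true;
        }

        // Current reference count for path (0 if not tracked).
        int RefCount(const std::string& path) const {
            auto it = entities_.find(path);
            return it == entities_.end() ? 0 : it->second.refcount;
        }

    private:
        std::map<std::string, SketchEntity> entities_;
};

// RAII helper: acquires in the constructor, closes in the destructor,
// so the decrement cannot be forgotten on early return.
class EntityGuard {
    public:
        EntityGuard(SketchManager& mgr, const std::string& path)
            : mgr_(mgr), ent_(mgr.Acquire(path)) {}
        ~EntityGuard() { mgr_.Close(ent_); }

        EntityGuard(const EntityGuard&) = delete;
        EntityGuard& operator=(const EntityGuard&) = delete;

        SketchEntity* get() const { return ent_; }

    private:
        SketchManager& mgr_;
        SketchEntity*  ent_;
};
```

A caller would construct an `EntityGuard` for the duration of an operation and let scope exit release the reference, rather than pairing the acquire and close calls by hand.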