Designing an Efficient Data Deduplication Algorithm for Audio Files in Cloud Storage

Ammar Zakzouk; Hasan Hasan

doi:10.5281/zenodo.19347954

Authors

Prof. Dr. Ammar Zakzouk, Al-Wataniya Private University
Hasan Hasan, Homs University

DOI:

https://doi.org/10.5281/zenodo.19347954

Keywords:

Deduplication, Hash Table, MD6, Audio Files, Cloud Storage

Abstract

Duplicate data is a significant challenge in big data storage systems: it consumes storage space and complicates data organization, management, and processing. To solve this problem, hash algorithms are used to generate hash keys for files. However, as the amount of data stored in the cloud grows, searching for and matching hash keys takes longer. In addition, different files can produce the same hash key, an event known as a collision, whose probability depends on the key length: the longer the key, the less likely a collision. In this paper, we present a file-level deduplication technique that reduces the storage of duplicate audio data in cloud storage systems. The proposed technique aims to reduce the search time for hash values by creating a reduction table with multiple indexes. These indexes are designed around the audio file format, so the hash table contains multiple indexes, each dedicated to a specific format. To minimize the probability of collisions, the MD6 algorithm is used, producing a key 512 bits long.
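
The idea of a hash table split into per-format indexes can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `FormatIndexedDedupStore`, the use of the file extension to identify the format, and the in-memory dictionaries are all assumptions made for illustration. Python's standard library does not include MD6, so SHA-512 (also a 512-bit digest) is used here as a stand-in.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePath


class FormatIndexedDedupStore:
    """File-level deduplication with one hash index per audio format (illustrative)."""

    def __init__(self):
        # format (e.g. "mp3") -> {hex digest -> name of the first file stored}
        self.indexes = defaultdict(dict)

    @staticmethod
    def digest(data: bytes) -> str:
        # Stand-in for MD6-512: SHA-512 also yields a 512-bit digest.
        return hashlib.sha512(data).hexdigest()

    def add(self, filename: str, data: bytes) -> bool:
        """Return True if the file is stored as new, False if it is a duplicate."""
        # Assumption: the format is taken from the file extension.
        fmt = PurePath(filename).suffix.lower().lstrip(".")
        key = self.digest(data)
        index = self.indexes[fmt]  # only this format's index is searched
        if key in index:
            return False  # same digest already present: treat as duplicate
        index[key] = filename
        return True


store = FormatIndexedDedupStore()
print(store.add("song.mp3", b"audio-bytes-1"))   # first copy is stored
print(store.add("copy.MP3", b"audio-bytes-1"))   # same content, same format
print(store.add("voice.wav", b"audio-bytes-2"))  # different format and content
```

Because each lookup touches only the index for the file's own format, the number of keys examined scales with the number of files of that format rather than with the whole store.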