
CGIAR Eyes on the Ground Challenge

Helping Africa
$10 000 USD
Completed (over 2 years ago)
Prediction
871 joined
137 active
Start: Jul 21, 23
Close: Nov 03, 23
Reveal: Nov 03, 23
Data Downloading is taking too long
Help · 1 Oct 2023, 17:13 · 3

I am downloading the data from Colab, but the zip files are taking a long time to download. I can see the test and train folders, but they are not downloadable until the zip files have finished downloading, which is really slow. Any help? I really do not know how to access the data using AWS.

Discussion 3 answers
Muhamed_Tuo
Inveniam

Hey,

You can download it using aws-cli:

# In Colab each `!` command runs in its own shell, so `!export` does not persist; `%env` does.
%env AWS_ACCESS_KEY_ID=your_access_key
%env AWS_SECRET_ACCESS_KEY=your_secret_key
%env AWS_DEFAULT_REGION=us-west-2

!aws s3 sync s3://eyes-on-the-ground /path_to_data_folder/cgiar-eyes-on-the-ground/

PS: Make sure to have aws-cli installed
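
If you would rather drive the sync from Python, here is a minimal sketch that passes the credentials to the command explicitly and checks that aws-cli is available first. The helper names `build_sync_command` and `run_sync` are hypothetical, made up for this illustration, and the `pip install awscli` hint in the error message is just one possible way to install the CLI.

```python
import os
import shutil
import subprocess


def build_sync_command(bucket, dest):
    """Return the argument list for `aws s3 sync s3://<bucket> <dest>`."""
    return ["aws", "s3", "sync", f"s3://{bucket}", dest]


def run_sync(bucket, dest, access_key, secret_key, region="us-west-2"):
    """Run the sync with credentials supplied through the environment."""
    # Fail early with a readable message if aws-cli is missing.
    if shutil.which("aws") is None:
        raise RuntimeError("aws-cli not found; install it first, e.g. pip install awscli")
    # Copy the current environment and add the AWS variables for this call only.
    env = dict(
        os.environ,
        AWS_ACCESS_KEY_ID=access_key,
        AWS_SECRET_ACCESS_KEY=secret_key,
        AWS_DEFAULT_REGION=region,
    )
    return subprocess.run(build_sync_command(bucket, dest), env=env, check=True)
```

For example, `run_sync("eyes-on-the-ground", "/path_to_data_folder/cgiar-eyes-on-the-ground/", "your_access_key", "your_secret_key")` would run the same command as above.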

3 Oct 2023, 15:20
Upvotes 0
Muhamed_Tuo
Inveniam

It shouldn't take longer than a few minutes

Instead of using the download_files function, you can run the following code snippet:

import os
from multiprocessing import Pool
from pathlib import Path

from tqdm.notebook import tqdm as T

# Assumes s3_client, bucket_name, local_path, folders and file_names
# are already defined earlier in the notebook.

def create_paths(folder, local_path=local_path):
    folder_path = Path(os.path.join(local_path, folder))
    # Create all folders in the path
    folder_path.mkdir(parents=True, exist_ok=True)

def download_and_save_files(file_name, bucket_name=bucket_name, local_path=local_path):
    file_path = Path(os.path.join(local_path, file_name))
    # Make sure the parent folder exists before downloading
    file_path.parent.mkdir(parents=True, exist_ok=True)
    s3_client.download_file(
        bucket_name,
        file_name,
        str(file_path)
    )

if __name__ == '__main__':
    # Create the folder tree first, then download the files, 8 processes at a time
    with Pool(8) as p:
        list(T(p.imap(create_paths, folders), total=len(folders), colour='red'))
    with Pool(8) as p:
        list(T(p.imap(download_and_save_files, file_names), total=len(file_names), colour='red'))

It does essentially the same task but spreads the downloads across 8 worker processes, so it can be up to around 8 times faster, depending on your network bandwidth.
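
Since each download spends most of its time waiting on the network, a thread pool is a possible alternative to multiprocessing. The following is only a sketch under that assumption, not the method used above: `download_all` and its `download_one` callback are hypothetical names, and the callback stands in for whatever actually fetches a file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def download_all(download_one, file_names, local_path, workers=8):
    """Call download_one(file_name, target_path) for every file using threads.

    Returns the list of local target paths, in the same order as file_names.
    """
    def task(file_name):
        target = Path(local_path) / file_name
        # Make sure the parent folder exists before downloading
        target.parent.mkdir(parents=True, exist_ok=True)
        download_one(file_name, str(target))
        return target

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order and re-raises any worker exception
        return list(pool.map(task, file_names))
```

With the names from the snippet above, a call might look like `download_all(lambda name, dest: s3_client.download_file(bucket_name, name, dest), file_names, local_path)`, assuming the S3 client can safely be shared across threads.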

5 Oct 2023, 17:04
Upvotes 0