Loading the data with the code below takes a long time. Is that normal, or is there something I'm doing wrong? My Google Colab session also crashed while running it. Any help would be appreciated, thanks!
```python
import numpy as np
import rasterio

n_obs = 5
X = np.empty((0, 2 * (n_obs - 1)))   # 2 bands (VV, VH) per observation
y = np.empty(0)
field_ids = np.empty(0)

for tile_id in tile_ids_train:
    # Avoid using this specific tile for the Hackathon as it might have a missing file.
    if tile_id != '1951':
        tile_df = competition_train_df[competition_train_df['tile_id'] == tile_id]

        # Labels
        label_src = rasterio.open(tile_df[tile_df['asset'] == 'labels']['file_path'].values[0])
        label_array = label_src.read(1)
        y = np.append(y, label_array.flatten())

        # Field IDs
        field_id_src = rasterio.open(tile_df[tile_df['asset'] == 'field_ids']['file_path'].values[0])
        field_id_array = field_id_src.read(1)
        field_ids = np.append(field_ids, field_id_array.flatten())

        # Sentinel-1 observations: take every n_obs-th datetime
        tile_date_times = tile_df[tile_df['satellite_platform'] == 's1']['datetime'].unique()
        X_tile = np.empty((256 * 256, 0))
        for date_time in tile_date_times[: 4 * n_obs : n_obs]:
            vv_src = rasterio.open(tile_df[(tile_df['datetime'] == date_time) & (tile_df['asset'] == 'VV')]['file_path'].values[0])
            vv_array = np.expand_dims(vv_src.read(1).flatten(), axis=1)
            vh_src = rasterio.open(tile_df[(tile_df['datetime'] == date_time) & (tile_df['asset'] == 'VH')]['file_path'].values[0])
            vh_array = np.expand_dims(vh_src.read(1).flatten(), axis=1)
            X_tile = np.append(X_tile, vv_array, axis=1)
            X_tile = np.append(X_tile, vh_array, axis=1)
        X = np.append(X, X_tile, axis=0)
```
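One likely culprit for the slowness is the repeated `np.append` calls: each call copies the entire accumulated array, so total work grows quadratically with the number of tiles. A common fix is to collect the per-tile arrays in a Python list and concatenate once at the end. A minimal sketch with synthetic stand-ins for the per-tile `X_tile` arrays (shapes assumed from the snippet above):

```python
import numpy as np

def build_quadratic(chunks):
    # Pattern from the snippet: np.append copies everything on every call.
    X = np.empty((0, chunks[0].shape[1]))
    for c in chunks:
        X = np.append(X, c, axis=0)  # full copy of X each iteration
    return X

def build_linear(chunks):
    # Collect references in a list, then copy once at the end.
    return np.concatenate(chunks, axis=0)

# Synthetic per-tile arrays: 256*256 pixel rows, 8 feature columns.
rng = np.random.default_rng(0)
chunks = [rng.random((256 * 256, 8)) for _ in range(3)]

# Both approaches produce the same result; the second is far faster.
assert np.array_equal(build_quadratic(chunks), build_linear(chunks))
```

The same list-then-`np.concatenate` pattern applies to `y`, `field_ids`, and the inner `X_tile` loop as well.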
Mine didn't work completely either; I had to remove the `axis=0` part, but then it took forever to load. I'm afraid I'll spend more time loading the data than implementing the solution.
It's the data size: it crashes due to memory issues.
Probably insufficient RAM. I'm not sure of the most efficient way to handle this; for now, I load the data in batches and append.

Edit: I also use `del` to delete variables I no longer need, to free memory.
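To make the batch-and-free idea concrete, here's a hedged sketch (the function and variable names are illustrative, not from the competition code): process one tile at a time, keep only the small per-tile results in a list, and `del` the large intermediates so memory can be reclaimed before the next iteration. Using `float32` instead of the default `float64` also halves the footprint of the feature matrix.

```python
import gc
import numpy as np

def load_in_batches(tile_loaders):
    # tile_loaders: callables that each return one tile's (X_tile, y_tile).
    X_parts, y_parts = [], []
    for load_tile in tile_loaders:
        X_tile, y_tile = load_tile()
        X_parts.append(X_tile.astype(np.float32))  # float32 halves memory vs float64
        y_parts.append(y_tile)
        del X_tile, y_tile  # drop references to the large intermediates
        gc.collect()        # prompt collection before loading the next tile
    # Single copy at the end instead of repeated np.append.
    return np.concatenate(X_parts, axis=0), np.concatenate(y_parts, axis=0)

# Toy usage with synthetic tiles standing in for the rasterio reads.
fake_tiles = [lambda: (np.ones((4, 8)), np.zeros(4)) for _ in range(2)]
X, y = load_in_batches(fake_tiles)
```

In Colab this keeps peak memory close to one tile's worth of intermediates plus the accumulated (float32) parts, rather than multiple full copies of `X`.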