Greetings Zidians, better late than never. Can someone give me a hint on how to merge the road survey data with the Train set? Thanks!
Hi Acalo!
You can find the road segment nearest to each accident location in the train set and get the index of that segment in the road data. Then, using that index, add the corresponding values from the road data to train.
import numpy as np
import pandas as pd
import geopandas as gpd # For loading the map of road segments
from shapely.geometry import Point, LineString
from tqdm.notebook import tqdm
train = pd.read_csv('Data/Train.csv', parse_dates=['datetime'])
road_surveys = pd.read_csv('Data/Segment_info.csv')
road_segment_locs = gpd.read_file('Data/segments_geometry.geojson')
geo_data = pd.merge(road_surveys, road_segment_locs, on="segment_id", how="right")
train["point"]=gpd.points_from_xy(train["longitude"], train["latitude"])
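def segment_finder(segments, point):
    # Return the position of the segment closest to the given point
    distances = [line.distance(point) for line in segments]
    return np.argmin(distances)
# For each accident, find the position of its nearest road segment
indices = [segment_finder(geo_data["geometry"], point) for point in tqdm(train["point"])]
# Pull the matching road rows and realign their index with train
additional_data = geo_data.iloc[indices].reset_index(drop=True)
def add_data(col):
    return additional_data[col]
# Copy every road-data column into train
for col in tqdm(geo_data.columns):
    train[col] = add_data(col)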
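If you want to check the idea without the competition files, here is a minimal pure-Python sketch of the same nearest-segment lookup on made-up data. Everything in it is an assumption for illustration: the toy `segments`, `segment_info` and `accidents` values and the helper names `point_segment_distance` and `nearest_segment_index` are hypothetical, and each road is simplified to a single straight segment with planar distances, whereas the real geometries are multi-point lines.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the straight segment a-b ((x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both endpoints coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped so it stays on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_segment_index(point, segments):
    """Position of the segment closest to point (same role as segment_finder)."""
    distances = [point_segment_distance(point, a, b) for a, b in segments]
    return min(range(len(distances)), key=distances.__getitem__)

# Hypothetical toy data: two parallel roads and two accident locations
segments = [((0, 0), (10, 0)), ((0, 5), (10, 5))]
segment_info = [{"segment_id": "A", "lanes": 2}, {"segment_id": "B", "lanes": 4}]
accidents = [{"id": 1, "x": 3, "y": 1}, {"id": 2, "x": 7, "y": 4.5}]

# Attach the attributes of the nearest segment to each accident row
merged = []
for row in accidents:
    idx = nearest_segment_index((row["x"], row["y"]), segments)
    merged.append({**row, **segment_info[idx]})
```

Like the code above, this compares every point against every segment, so it is fine for a quick check but slow on large data.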