
Spatio-Temporal Beam-Level Traffic Forecasting Challenge by ITU

12 000 CHF
Completed (over 1 year ago)
Forecast
716 joined
171 active
Start: Jul 24, 24
Close: Oct 11, 24
Reveal: Oct 11, 24
skaak
Ferra Solutions
Dummy baseline
Notebooks · 31 Aug 2024, 12:25 · 7

Since the right hand side variables are not available for the test period, it is debatable whether one can actually gain by using them ... because they are not known in the future. The same goes for the left hand side variables ... even if you can model the autocorrelation, you have to predict so far into the unknown that I am not sure it is going to be useful ... so I did a little test ...

... below is an interesting, useful and extremely simple baseline. The host may also find it useful ... on the public leaderboard it scores around 0.2 (putting you firmly into the top 20 at the moment) ... and ... wait for it ... it only uses dummy variables (hour of day and day of week)! It simply predicts a repeating weekly cycle for each beam.

There are of course many ways in which to improve this ... that is left as an exercise to the reader :-)

# Simple model for each beam
import numpy  as np
import pandas as pd
from   sklearn.ensemble import HistGradientBoostingRegressor
# Read data
ss     = pd.read_csv ( "SampleSubmission.csv" )
ty     = pd.read_csv ( "traffic_DLThpVol.csv" )
sub_fn = "simple dummy sub.csv"
# Configuration
n_base =        30
n_cell =         3
n_beam =        32
rs     =  123

# Dummies
def make_dummies ( n ) :

    i  = np.arange ( n, dtype = int )
    df = pd.DataFrame ( index = i )

    # Hour
    for j in range ( 24 - 1 ) :

        df [ f"h{ j }" ] = 1.0 * ( ( i % 24 ) == j )

    # Week
    for j in range ( 7 - 1 ) :

        df [ f"w{ j }" ] = 1.0 * ( ( ( i // 24 ) % 7 ) == j )

    return df

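# Training rows are hours 0 .. len ( ty ) - 1; the test rows continue the same
# hourly index for another 1008 hours ( 6 weeks ), so the cycle stays aligned
x_train = make_dummies ( len ( ty ) )
x_test  = make_dummies ( len ( ty ) + 1008 ).iloc [ len ( ty ) : ]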

# Prepare sample submission
ss = ss.set_index ( "ID", drop = False )
ss [ "Target" ] = ss [ "Target" ].astype ( "float16" )
# Fit one model for each beam
for base in range ( n_base ) :

    for cell in range ( n_cell ) :

        for beam in range ( n_beam ) :

            rs     += 1
            mod_col = f"{ base }_{ cell }_{ beam }"
            model   = HistGradientBoostingRegressor ( random_state = rs, loss = "absolute_error" )

            model.fit ( x_train, ty [ mod_col ] )

            pred    = np.clip ( model.predict ( x_test ), 0, 255 )

            # Load predictions into sample submission: the first 168 predicted hours
            # fill week 5w-6w, the last 168 predicted hours fill week 10w-11w
            for k in range ( 168 ) :

                ss.at [ "traffic_DLThpVol_test_5w-6w_"   + str (       k     ) + "_" + mod_col, "Target" ] = pred [   k     ]
                ss.at [ "traffic_DLThpVol_test_10w-11w_" + str ( 168 - k - 1 ) + "_" + mod_col, "Target" ] = pred [ - k - 1 ]

# Save submission
ss.to_csv ( sub_fn, index = False )
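To see why this predicts a repetitive cycle, here is a minimal sketch that rebuilds the same dummy features and checks that they repeat every 168 hours. The value `n_train = 840` (5 weeks of hourly rows) is an assumption made for illustration; the real length comes from `len ( ty )`.

```python
import numpy as np
import pandas as pd

def make_dummies(n):
    # Same construction as in the post: 23 hour-of-day and 6 day-of-week indicators
    i = np.arange(n, dtype=int)
    df = pd.DataFrame(index=i)
    for j in range(24 - 1):
        df[f"h{j}"] = 1.0 * ((i % 24) == j)
    for j in range(7 - 1):
        df[f"w{j}"] = 1.0 * (((i // 24) % 7) == j)
    return df

n_train = 840  # assumption: 5 weeks of hourly training rows
x_all = make_dummies(n_train + 1008)
x_test = x_all.iloc[n_train:]

# The first and last test weeks are 840 hours apart, a multiple of 168,
# so their feature rows should be identical
first_week = x_test.iloc[:168].to_numpy()
last_week = x_test.iloc[-168:].to_numpy()
weeks_match = bool(np.array_equal(first_week, last_week))

# Number of distinct feature rows over the whole horizon
n_distinct_rows = len(x_all.drop_duplicates())

print(weeks_match, n_distinct_rows)
```

Because the model only ever sees these features, it can produce at most one value per hour-of-week slot for each beam, and the two submitted weeks receive the same forecast.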
Discussion 7 answers
marching_learning
Nostalgic Mathematics

Thank you @skaak for sharing. But what is the exact score of this baseline? And how much time does it take to train?

3 Sep 2024, 09:12
skaak
Ferra Solutions

For me, this scored 0.2016, but I used a different seed (I cleaned it up a bit when I copied it in here).

This really does not take long, just a few minutes to run.

marching_learning
Nostalgic Mathematics

Thank you.

Thanks ... just a small clarification on submission:

Are we only supposed to submit one file?

skaak
Ferra Solutions

Yes

skaak
Ferra Solutions

Well, one at a time. I am not sure I understand your question. You can submit many times, but only one file at a time ...