Given the lack of right-hand-side variables for this in the test set, it is debatable whether one can actually gain by using them ... because they are not known in the future. The same goes for the left-hand-side variables ... even if you can model the autocorrelation, you have to predict so far into the unknown that I am not sure it is going to be useful ... so I did a little test ...
... below is an interesting, useful and extremely simple baseline. The host may find this useful too ... on the public leaderboard it scores around 0.2 (putting you firmly into the top 20 atm) ... and ... wait for it ... it only uses dummy variables! It simply predicts a repetitive weekly cycle for each beam.
There are of course many ways in which to improve this ... that is left as an exercise to the reader :-)
# Simple model for each beam
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

# Read data
ss = pd.read_csv("SampleSubmission.csv")
ty = pd.read_csv("traffic_DLThpVol.csv")
sub_fn = "simple dummy sub.csv"

# Configuration
n_base = 30
n_cell = 3
n_beam = 32
rs = 123

# Hour-of-day and day-of-week dummies (last level of each dropped)
def make_dummies(n):
    i = np.arange(n, dtype=int)
    df = pd.DataFrame(index=i)
    # Hour
    for j in range(24 - 1):
        df[f"h{j}"] = 1.0 * ((i % 24) == j)
    # Week
    for j in range(7 - 1):
        df[f"w{j}"] = 1.0 * (((i // 24) % 7) == j)
    return df

x_train = make_dummies(len(ty))
x_test = make_dummies(len(ty) + 1008).iloc[len(ty):]

# Prepare sample submission
ss = ss.set_index("ID", drop=False)
ss["Target"] = ss["Target"].astype("float16")

# Fit one model for each beam
for base in range(n_base):
    for cell in range(n_cell):
        for beam in range(n_beam):
            rs += 1
            mod_col = f"{base}_{cell}_{beam}"
            model = HistGradientBoostingRegressor(random_state=rs, loss="absolute_error")
            model.fit(x_train, ty[mod_col])
            pred = np.clip(model.predict(x_test), 0, 255)
            # Load predictions into the sample submission
            for k in range(168):
                ss.at[f"traffic_DLThpVol_test_5w-6w_{k}_{mod_col}", "Target"] = pred[k]
                ss.at[f"traffic_DLThpVol_test_10w-11w_{168 - k - 1}_{mod_col}", "Target"] = pred[-k - 1]

# Save submission
ss.to_csv(sub_fn, index=False)
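To see why this predicts a repetitive cycle: the dummy features repeat with a weekly (168-hour) period, so anything the model learns from them repeats too. A quick standalone check (reproducing the same make_dummies encoding as above):

import numpy as np
import pandas as pd

def make_dummies(n):
    # Same encoding as the baseline: 23 hour-of-day dummies and 6 day-of-week dummies
    i = np.arange(n, dtype=int)
    df = pd.DataFrame(index=i)
    for j in range(24 - 1):
        df[f"h{j}"] = 1.0 * ((i % 24) == j)
    for j in range(7 - 1):
        df[f"w{j}"] = 1.0 * (((i // 24) % 7) == j)
    return df

x = make_dummies(24 * 7 * 3)  # three weeks of hourly rows
# Every feature row repeats with a 168-hour period ...
print((x.values[:168] == x.values[168:336]).all())  # True

So the 1008-row test block is just the same 168-hour feature pattern repeated six times, which is why only one week of predictions needs to be placed into each test window.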
Thank you @skaak for sharing. But what is the exact score of this baseline? And how much time does it take to train?
For me, this scored 0.2016, but I used a different seed (I cleaned it up a bit when I copied it in here).
This really does not take long, just a few minutes to run.
Thank you.
thanks ... just a small clarification on submission
are we only supposed to submit one file?
Yes
Well, one at a time, not sure I understand your question? You can submit many times, but only one file at a time ...