Tanzania Tourism Prediction by Pycon Tanzania Community
Can you use tourism survey data and ML to predict how much money a tourist will spend when visiting Tanzania?
101 active · 626 enrolled
Good for beginners

The dataset contains 6476 rows of up-to-date information on tourist expenditure collected by the National Bureau of Statistics (NBS) in Tanzania. The dataset was collected to gain a better understanding of the status of the tourism sector and to provide an instrument that will enable sector growth.

Your goal is to accurately predict tourist expenditure when visiting Tanzania.

The majority of visitors in the 25-44 age group came for business (18.5%) or leisure and holidays (53.2%), which is consistent with the fact that they are economically more productive. Those in the 45-64 age group were more prominent in holiday making and visiting friends and relatives. The results further reveal that most visitors in the 18-24 age group came for leisure and holidays (55.3%) as well as volunteering (13.7%). The majority of senior citizens (65 and above) came for leisure and holidays (80.9%) and visiting friends and relatives (9.5%).

The survey covers seven departure points, namely: Julius Nyerere International Airport, Kilimanjaro International Airport, Abeid Amani Karume International Airport, and the Namanga, Tunduma, Mtukula and Manyovu border points.

Files available for download:

  • Train.csv - contains the target. This is the dataset that you will use to train your model.
  • Test.csv - the dataset to which you will apply your model to test how well it performs. Use your model and this dataset to predict tourist expenditure. The test set contains 1619 rows of tourist information. This dataset includes the same fields as Train.csv except for the last column. Note that the target is total_cost.
  • SampleSubmission.csv - shows the submission format for this competition, with the “test_id” column mirroring that of Test.csv and the “total_cost” column containing your predictions. The order of the rows does not matter, but the “test_id” values must be correct.
  • VariableDefintions.csv - provides definitions of the variables found in Test.csv and Train.csv.
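
As a starting point, the files above can be wired together with a constant baseline: predict the median total_cost from the training data for every test row and write the result in the SampleSubmission.csv layout. The sketch below is an illustration only, not an official starter script. It assumes Test.csv carries a “test_id” column (as the submission description suggests); the function name median_baseline, the age_group column and the demo rows are made up for the example. Check VariableDefintions.csv for the actual field names.

```python
import csv
import io
import statistics


def median_baseline(train_file, test_file, out_file,
                    id_col="test_id", target_col="total_cost"):
    """Write a constant-median submission and return the median used.

    id_col is an assumption about the identifier column in Test.csv;
    adjust it if the real file names the column differently.
    """
    # Median of the target over all training rows that have a value.
    costs = [float(row[target_col])
             for row in csv.DictReader(train_file)
             if row[target_col] != ""]
    median_cost = statistics.median(costs)

    # One output row per test row: identifier plus the constant prediction.
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow([id_col, target_col])
    for row in csv.DictReader(test_file):
        writer.writerow([row[id_col], median_cost])
    return median_cost


# Tiny made-up stand-ins for Train.csv and Test.csv (not real survey data).
train_demo = io.StringIO(
    "age_group,total_cost\n25-44,100.0\n45-64,300.0\n18-24,200.0\n")
test_demo = io.StringIO("test_id,age_group\ntour_1,25-44\ntour_2,65+\n")
submission = io.StringIO()

baseline = median_baseline(train_demo, test_demo, submission)
print(submission.getvalue())
```

With the real data, the same function can be called on open file handles, for example open("Train.csv", newline=""), open("Test.csv", newline="") and open("submission.csv", "w", newline=""). Any real model should beat this baseline; if it does not, the feature pipeline is worth checking first.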