
SANSA AWS Informal Settlements in South Africa by #ZindiWeekendz

Helping South Africa
$1 000 USD
Challenge completed over 5 years ago
Classification
Earth Observation
182 joined
77 active
Start: Jun 12, 20
Close: Jun 14, 20
Reveal: Jun 14, 20
Tips and Getting Started
Data · 12 Jun 2020, 13:39 · 8

I thought I'd share some info to help get started. We also just did a live stream, so if you want a video reference for connecting to the VM you can check it out here: https://www.twitch.tv/videos/648634668. But assuming that like me you're a text person, here's the juice:

- The training locations all fall within a single image tile, which I copied to the VM using `aws s3 cp s3://eohackathon-covid19/Hackthon_Data/Gauteng/2528C.tif 2528C.tif` (you need to set secret keys and things - see the README.yaml file for instructions). The test locations all fall within 2930D.tif. SO: you don't need to copy all the imagery from s3 to get going - just these two tiles (4GB each).

- The GP settlement layer is available as a shapefile, which means you can generate more training data if you want. BUT: careful with the class balance. Both train and (spoiler alert) test have ~20-30% positives (informal settlements) - randomly sampling locations will likely get you closer to 2% positives, and thus might give a model that does worse on the test set.

- The test set comes from an entirely different province. Check out some of the imagery and you'll notice it's from a fairly urban area, but with a good mix of land-use classes. If you're doing a random split on the training data, you'll see high accuracy (95%) locally but will get a nasty surprise when your model is scored on the leaderboard (e.g. 75% accuracy and a log_loss of ~0.7, much worse than the ~0.2 seen in training). THINK ABOUT HOW TO MAKE A BETTER LOCAL VALIDATION SET. Maybe split by latitude, or generate a new validation set from a different location in GP...

- Make sure to save your notebook to your local machine after making a submission - the VMs disappear at the end of the weekend, and you don't want to be stuck with no code to submit :)

- Finally, although this is a competition, we're all trying to learn things. If you find a nice way to speed up something like image access, or have a nifty trick for generating more training locations, or you've done an image segmentation model using the shapefile as a mask, or you figured out how to install unrar... SHARE :) It's so nice as a beginner to get help from others and see how they've overcome challenges. Add tips in this thread or start your own.
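
To make two of the tips above a bit more concrete (keeping the positive rate close to the ~20-30% seen in train and test, and validating on a geographically separate slice rather than a random split), here is a rough sketch in plain Python. The `(lat, lon, label)` tuple layout, the function names, the 25% target and the 20% validation fraction are all assumptions for illustration, not anything from the competition data or starter code.

```python
import random

# Assumed layout: each location is a (lat, lon, label) tuple,
# where label 1 = informal settlement and 0 = anything else.


def rebalance(locations, target_pos_rate=0.25, seed=0):
    """Keep all positives and a random subset of negatives so that
    positives make up roughly target_pos_rate of the result."""
    positives = [loc for loc in locations if loc[2] == 1]
    negatives = [loc for loc in locations if loc[2] == 0]
    # Number of negatives that yields the target positive fraction.
    n_neg = int(round(len(positives) * (1 - target_pos_rate) / target_pos_rate))
    n_neg = min(n_neg, len(negatives))
    rng = random.Random(seed)
    sample = positives + rng.sample(negatives, n_neg)
    rng.shuffle(sample)
    return sample


def split_by_latitude(locations, val_fraction=0.2):
    """Hold out the northernmost val_fraction of locations for validation,
    so train and validation do not overlap geographically."""
    ordered = sorted(locations, key=lambda loc: loc[0])  # south -> north
    n_val = max(1, int(len(ordered) * val_fraction))
    return ordered[:-n_val], ordered[-n_val:]
```

You would call `rebalance` on any extra locations sampled from the shapefile, then `split_by_latitude` on the combined set. A latitude band is only a stand-in for "a different place"; it will not fully reproduce the shift to another province, but it should give a more honest score than a random split.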

Good luck, and have fun :)

JW

Discussion 8 answers

Thanks Johnowhitaker! :-)

12 Jun 2020, 13:50

Thanks so much for the excellent session!

12 Jun 2020, 14:35
msamwelmollel
University of Glasgow

Thank you, John. I am using Windows and I successfully connected over SSH with PuTTY. I typed jupyter-notebook in the terminal and it generated a token for the notebook. But I am not able to access localhost in the browser. Could you please help me with this one?

12 Jun 2020, 15:12

You need to set up port forwarding in PuTTY. If I remember correctly it's under Connection -> SSH -> Tunnels.

Something like this: https://www.ccsl.carleton.ca/~falaca/comp4108_w17/ssh_putty/index.html (but the destination is localhost:8888 and the source is 8000 or whatever you choose).

An alternative: AWS has a guide on setting up Jupyter: https://docs.aws.amazon.com/dlami/latest/devguide/setup-jupyter.html
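
For anyone with an OpenSSH client instead of PuTTY, the same tunnel (source port 8000, destination localhost:8888) can be written as a single `ssh` command. The sketch below just builds that command in Python so the port mapping is explicit. The user name `ubuntu`, the host `vm.example.com` and the key file `my-key.pem` are placeholders I made up; use whatever your own VM instructions give you.

```python
def tunnel_command(user, host, local_port=8000, remote_port=8888, key_path=None):
    """Build an OpenSSH command that forwards local_port on your machine
    to remote_port (Jupyter's default is 8888) on the VM."""
    # -N: open the tunnel only, do not run a remote command
    # -L: local_port:destination_host:destination_port
    cmd = ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}"]
    if key_path:
        cmd += ["-i", key_path]  # private key for the VM, if one is needed
    cmd.append(f"{user}@{host}")
    return cmd


# Placeholder values, purely for illustration.
print(" ".join(tunnel_command("ubuntu", "vm.example.com", key_path="my-key.pem")))
# prints: ssh -N -L 8000:localhost:8888 -i my-key.pem ubuntu@vm.example.com
```

With the tunnel running, the notebook should be reachable at http://localhost:8000 in your local browser.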

msamwelmollel
University of Glasgow

Thank you, Johnowhitaker. I am finally able to forward a port. But now I've hit another problem: when I copy in the token, it says "Invalid credentials". How do I deal with this one?

You can set a password instead. On mobile at the moment so you'll have to Google around :)

msamwelmollel
University of Glasgow

Thank you, I set the password and it is working now. Thank you very much!