This is a private hackathon open to UmojaHack Tunisia participants. If you are a university student in Tunisia and would like to participate, contact Zindi Ambassador Mohamed Salem Jedidi.
Protein kinases are enzymes that play a vital role in all eukaryotic cells (such as our own). They switch other proteins on and off by adding phosphate groups, which changes how those proteins behave by modifying properties such as flexibility or reactivity. Protein kinases can be classified by function into multiple classes within several main groups.
All enzymes, including protein kinases, are made of one or more chains of amino acids, which determine their structure, behaviour, and interaction with other enzymes and molecules. That means it should be possible to predict the protein kinase class given just the amino acid sequence - which is the goal of this hackathon. You must predict which class a sequence belongs to.
Since sequences in the same class all perform essentially the same function, usually through the same mechanism, they are likely to share certain defining characteristics. However, these kinases are thought to have evolved via multiple different pathways. Two sequences with the same function might have evolved separately and ended up with very different sequences, and two kinases might look similar due to a shared evolutionary history, but perform very different functions. This makes it harder to tell via simple comparison what function a given sequence will perform.
In addition to kinases from known organisms (which we have from studying their proteomes), there are vast numbers of metagenomic kinase sequences, that is, protein sequence data recovered from environmental samples. Being able to quickly annotate these sequences with a function using a predictive model (i.e. going beyond simple sequence similarity) would be extremely valuable. Models developed in the course of this challenge may contribute to furthering the understanding of the world around us.
How to prepare for UmojaHack
About InstaDeep (www.instadeep.com)
InstaDeep delivers AI-powered decision-making systems for the Enterprise. With expertise in both machine intelligence research and concrete business deployments, we provide a competitive advantage to our customers in an AI-first world. As one of the leading AI companies in Africa, InstaDeep knows first-hand what African talent is truly capable of.
InstaDeep has more than 120 employees spread across its headquarters in London and offices in Paris, Tunis, Dubai, Lagos and Cape Town. In addition to its connections to African educational institutions, the company also has strong ties to elite French schools and top-rated universities in the UK. To apply: instadeep.bamboohr.com/jobs and hello@instadeep.com.
About IEEE Tunisia Section (ieee.tn)
IEEE Tunisia Section is the Association representing IEEE (the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity) in Tunisia with more than 3000 members in about 30 Tunisian universities.
Teams and collaboration
You may participate in this competition as an individual or in a team of up to four people. When creating a team, the team must have a total submission count less than or equal to the maximum allowable submissions as of the formation date. A team will be allowed the maximum number of submissions for the competition, minus the highest number of submissions among team members at team formation. Prizes are transferred only to the individual players or to the team leader.
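The team submission rule above can be sketched as a small calculation. This is only one reading of the rule, not Zindi's actual implementation: the function name and the example counts are hypothetical, and the maximum of 100 assumes the one-day limit stated under "Submissions and winning".

```python
def team_submissions_remaining(max_submissions, member_counts):
    """Return how many submissions a newly formed team may still make.

    max_submissions: maximum allowable submissions as of the formation date.
    member_counts: submissions already made by each prospective member.
    """
    # The team may only form if the members' combined count is within the limit.
    if sum(member_counts) > max_submissions:
        raise ValueError("combined submission count exceeds the allowed maximum")
    # The team's allowance is reduced by the most active member's count.
    return max_submissions - max(member_counts)


# Hypothetical example: three members who have made 12, 30 and 5 submissions.
print(team_submissions_remaining(100, [12, 30, 5]))
```

Under this reading, forming a team early, before any member has made many submissions, leaves the team with the largest allowance.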
Multiple accounts per user are not permitted, and neither is collaboration or membership across multiple teams. Individuals and their submissions originating from multiple accounts will be disqualified.
Code must not be shared privately outside of a team. Any code that is shared must be made available to all competition participants through the platform (i.e. on the discussion boards).
Datasets and packages
The solution must use publicly-available, open-source packages only. Your models should not use any of the metadata provided.
You may use only the datasets provided for this competition. Automated machine learning (AutoML) tools are not permitted.
If the challenge is a computer vision challenge, image metadata (Image size, aspect ratio, pixel count, etc) may not be used in your submission.
You may use pretrained models as long as they are openly available to everyone.
The data used in this competition is the sole property of Zindi and the competition host. You may not transmit, duplicate, publish, redistribute or otherwise provide or make available any competition data to any party not participating in the Competition (this includes uploading the data to any public site such as Kaggle or GitHub). You may upload, store and work with the data on any cloud platform such as Google Colab, AWS or similar, as long as 1) the data remains private and 2) doing so does not contravene Zindi’s rules of use.
You must notify Zindi immediately upon learning of any unauthorised transmission of or unauthorised access to the competition data, and work with Zindi to rectify any unauthorised transmission or access.
Your solution must not infringe the rights of any third party and you must be legally entitled to assign ownership of all rights of copyright in and to the winning solution code to Zindi.
Submissions and winning
You may make a maximum of 100 submissions per day. Your highest-scoring solution on the private leaderboard at the end of the competition will be the one by which you are judged.
Zindi maintains a public leaderboard and a private leaderboard for each competition. The Public Leaderboard includes approximately 50% of the test dataset. While the competition is open, the Public Leaderboard will rank the submitted solutions by the score they achieve on the evaluation metric. Upon close of the competition, the Private Leaderboard, which covers the other 50% of the test dataset, will be made public and will constitute the final ranking for the competition.
If your solution places 1st, 2nd, or 3rd in the final ranking, you will be required to submit your winning solution code to us for verification and you thereby agree to share all worldwide rights of copyright in and to such winning solution to Zindi.
You acknowledge and agree that Zindi may, without any obligation to do so, remove or disqualify an individual, team, or account if Zindi believes that such individual, team, or account is in violation of these rules. Entry into this competition constitutes your acceptance of these official competition rules.
Teams may win in one challenge category, and are encouraged to enter only one.
Please refer to the FAQs and Terms of Use for additional rules that may apply to this competition. We reserve the right to update these rules at any time.
The evaluation metric for this challenge is Log Loss.
This error metric requires probabilities to be submitted, so do not apply thresholds to (or round) your probabilities to improve your place on the leaderboard. To ensure that the client receives the best solution, Zindi needs the raw probabilities, which allow clients to set thresholds to suit their own needs.
The values can be between 0 and 1, inclusive.
Your submission should look like this (numbers for illustrative purposes only).
ID             target_0  target_1  target_2  target_3  target_4  target_5  target_6  target_7
ID_test_70942  0.63      0.11      0.01      0.45      0.89      0.33      0.67      0.29
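To get a feel for the Log Loss metric described above, here is a minimal sketch of a multiclass log loss in plain Python. It is an illustration, not Zindi's scoring code: the function name, the clipping constant `eps`, and the rescaling of each row to sum to 1 are assumptions, and the example predictions are made up.

```python
import math


def multiclass_log_loss(true_labels, prob_rows, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels: one integer class index per sequence (0 for target_0, ...).
    prob_rows: one list of predicted class probabilities per sequence.
    """
    total = 0.0
    for label, row in zip(true_labels, prob_rows):
        # Clip probabilities away from 0 and 1 so that log(0) never occurs.
        clipped = [min(max(p, eps), 1.0 - eps) for p in row]
        # Assumption: each row is rescaled to sum to 1 before scoring.
        row_sum = sum(clipped)
        total += -math.log(clipped[label] / row_sum)
    return total / len(true_labels)


# Hypothetical predictions for two sequences over eight classes.
confident = [0.93] + [0.01] * 7  # most of the mass on target_0
uniform = [0.125] * 8            # no preference between classes
print(multiclass_log_loss([0, 3], [confident, uniform]))
```

Lower values are better. A confident correct prediction contributes a loss close to 0, a uniform guess over eight classes contributes log(8), and a confident wrong prediction is penalised heavily, which is why rounding probabilities to 0 or 1 is risky.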
In order to win, you must:
1st prize: DT 1350 shared between members of the winning team plus DT 3450 for the university to which the winning team is affiliated.
2nd prize: DT 825 shared between members of the winning team plus DT 2750 for the university to which the winning team is affiliated.
3rd prize: DT 550 shared between members of the winning team plus DT 2075 for the university to which the winning team is affiliated.
Important to note: Only one team from each university will be allowed to win a prize. If more than one team from one university places in the top three, only the top team will win a prize and their university will also win only the one prize. The remaining prizes will be awarded to the next top team/university.
09:00 – 09:30 Welcome and orientation (live video conference using Zoom and YouTube - https://www.youtube.com/watch?v=NrIhU4hOCEA)
09:30 – 10:00 Technical orientation to the platform and the challenges (live video conference using Zoom and YouTube - https://www.youtube.com/watch?v=NrIhU4hOCEA)
10:00 Competition opens (note that users can sign up for a competition and join teams before this time)
10:00 – 11:00 Q&A with Mohamed Jedidi. Jedidi will stay on the Zoom call to answer questions about the challenge and the notebooks (https://www.youtube.com/watch?v=NrIhU4hOCEA)
12:00 – 13:00 Modelling live stream and Q&A with Johno Whitaker (https://www.twitch.tv/johnowhitaker)
10:00 – 19:00 Students form teams and work on the challenge. Questions and issues during this time can be addressed by local Zindi reps or via the WhatsApp group
19:00 Submissions close
19:15 Announcement of local winners and prizes (live video conference using Zoom)