How to write great documentation for your next notebook
Data skills · 16 Apr 2024, 10:55 · 3 min read

When creating a machine learning solution, it is vital to include documentation, especially when submitting code for review by the Zindi code team.

Documentation for code solutions is essential for ensuring that hosts understand the purpose, functionality, and usage of the code. It serves as a crucial resource for maintaining, troubleshooting, and incorporating your solution into their pipelines. Effective documentation should include details such as the purpose and scope of the code, usage instructions, design decisions, implementation details, potential issues, and best practices.

By providing clear and comprehensive documentation, you will facilitate collaboration, knowledge transfer, and efficient development, ultimately leading to higher-quality outcomes and better implementation of your solution by the hosts. You can also add any notes or insights that would be useful to the hosts.

In addition, writing good documentation is a critical skill that you will need in any data or software engineering career. Now’s the time to improve your documentation skills!

Here are key elements to include in your documentation:

Overview and objectives:

  • A high-level description of the solution, including its purpose and the problems it addresses.
  • Objectives and expected outcomes of implementing the solution.

Architecture diagram:

  • A visual representation of the entire data flow and components, illustrating how ETL, modeling, and inference are interconnected.

ETL process:

  • Extract: Describe the data sources, data formats, and extraction methods. Include any considerations for data volume and frequency of extraction.
  • Transform: Detail the transformation logic, data cleansing, and preprocessing steps. Explain how the data is transformed to fit the needs of the model.
  • Load: Describe how transformed data is loaded into the storage or system for modeling and inference. Include information on data storage mechanisms and any indexing or optimisation strategies.
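One way to keep the ETL steps self-documenting inside a notebook is to give each stage its own function whose docstring records the source, format, and cleansing rules. The sketch below uses only the Python standard library; the column names (`id`, `area`, `yield_kg`), the inline CSV, the dropping rule, and the in-memory SQLite table are hypothetical placeholders, not a format Zindi prescribes.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a competition CSV file.
RAW_CSV = """id,area,yield_kg
a1,2.5,1200
a2,,900
a3,4.0,2100
"""

def extract(text: str) -> list[dict]:
    """Extract: read rows from a CSV source.

    Source: competition training file (an inline string here for illustration).
    Format: comma-separated, header row, one record per line.
    """
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with a missing area and compute yield per hectare.

    Assumption (hypothetical): rows without an area cannot be used and are discarded.
    """
    cleaned = []
    for row in rows:
        if not row["area"]:
            continue  # documented cleansing rule: skip rows with missing area
        area = float(row["area"])
        cleaned.append((row["id"], area, float(row["yield_kg"]) / area))
    return cleaned

def load(records: list[tuple], conn: sqlite3.Connection) -> int:
    """Load: write cleaned records to a table keyed (and therefore indexed) on id.

    Returns the number of rows loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features "
        "(id TEXT PRIMARY KEY, area REAL, yield_per_ha REAL)"
    )
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)

conn = sqlite3.connect(":memory:")
n_loaded = load(transform(extract(RAW_CSV)), conn)
print(f"Loaded {n_loaded} rows")
```

Because each rule lives in a docstring next to the code that applies it, the documentation is less likely to drift out of date when the pipeline changes.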

Data modeling:

  • Description of the data model(s) used, including any assumptions or theoretical foundations.
  • Details on feature selection, engineering, and normalization processes.
  • Information on model training, including algorithms, hyperparameters, training processes, and evaluation metrics.
  • Explanation of how the model is validated and how its performance is measured.
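A lightweight way to capture the modeling details listed above is to save them as a machine-readable record next to the notebook. This is a minimal sketch; the helper `save_experiment_record`, the file name `experiment_record.json`, and every field and value in the record are hypothetical placeholders to replace with your own setup. No scores are included, since those belong in the performance section.

```python
import json
from datetime import datetime, timezone

def save_experiment_record(path: str, record: dict) -> dict:
    """Write modeling details to a JSON file, adding a UTC timestamp."""
    stamped = {
        **record,
        "created_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stamped, f, indent=2)
    return stamped

# Hypothetical record: every name and value below is a placeholder.
record = save_experiment_record(
    "experiment_record.json",
    {
        "algorithm": "gradient-boosted trees",
        "hyperparameters": {"n_estimators": 500, "learning_rate": 0.05, "max_depth": 6},
        "features": ["area", "rainfall_mm", "soil_ph"],
        "feature_engineering": ["log1p(rainfall_mm)", "min-max scaling of soil_ph"],
        "assumptions": ["rows with missing area are dropped before training"],
        "validation": {"scheme": "5-fold cross-validation", "shuffle": True, "seed": 42},
        "metric": "RMSE",
    },
)
```

Committing this file alongside the notebook lets a reviewer see the training setup, including the random seed, without re-running anything.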

Inference:

  • Description of how the model is deployed for inference, including the infrastructure and services used.
  • Details on how new data is input into the model and how the output is interpreted.
  • Information on handling model updates, versioning, and retraining strategies.
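For inference, it helps to document the expected input schema in code and to tag every prediction with the model version that produced it, so outputs can be traced after retraining. In this sketch a plain linear formula stands in for whatever model you actually deploy; `MODEL_VERSION`, `REQUIRED_FEATURES`, the `Prediction` fields, and the coefficients in the usage line are all hypothetical.

```python
from dataclasses import dataclass

MODEL_VERSION = "v1.2.0"  # hypothetical tag; bump it whenever the model is retrained
REQUIRED_FEATURES = ("area", "rainfall_mm")  # hypothetical input schema

@dataclass
class Prediction:
    record_id: str
    yield_kg: float
    model_version: str

def predict(record: dict, coef: dict, intercept: float) -> Prediction:
    """Score one record with a linear model (a stand-in for the real model).

    Input: dict with an "id" key plus every name in REQUIRED_FEATURES (numeric).
    Output: predicted yield in kg, tagged with the model version that produced it.
    """
    missing = [f for f in REQUIRED_FEATURES if f not in record]
    if missing:
        raise ValueError(f"missing features: {missing}")
    value = intercept + sum(coef[f] * float(record[f]) for f in REQUIRED_FEATURES)
    return Prediction(record_id=record["id"], yield_kg=value, model_version=MODEL_VERSION)

# Usage with placeholder coefficients.
pred = predict(
    {"id": "t1", "area": 2.0, "rainfall_mm": 100.0},
    coef={"area": 400.0, "rainfall_mm": 1.5},
    intercept=50.0,
)
print(pred)
```

Failing loudly on a missing feature, rather than silently filling it, makes the input contract explicit for whoever integrates the model.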

Run time:

  • Indicate the run time of each script, notebook, and model.
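Run times are easiest to report accurately if you measure them as you go. Below is a minimal sketch of a context manager that records wall-clock time per named stage; the stage names are hypothetical, and the list comprehension and `time.sleep` calls merely stand in for real feature engineering and training.

```python
import time
from contextlib import contextmanager

run_times: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    """Record the wall-clock time (seconds) of the enclosed block in run_times."""
    start = time.perf_counter()
    try:
        yield
    finally:
        run_times[stage] = time.perf_counter() - start

with timed("feature_engineering"):
    squares = [i * i for i in range(100_000)]  # stand-in for real work

with timed("training"):
    time.sleep(0.1)  # stand-in for model training

for stage, seconds in run_times.items():
    print(f"{stage:<20} {seconds:8.2f} s")
```

The printed table can be pasted straight into the documentation; it is worth noting the hardware it was measured on alongside it.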

Performance metrics:

  • Metrics and KPIs used to measure the efficiency and effectiveness of the ETL processes, model accuracy, and inference outcomes.
  • Report your public and private leaderboard scores.
  • Comment on any other metrics you used when building your solution, if applicable.
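A small helper can compute your local validation metric and format it next to the leaderboard scores so the report always has the same shape. This sketch assumes RMSE purely for illustration (use your competition's metric); the helper names are hypothetical, the validation values are toy numbers, and the leaderboard scores are left as `None` rather than invented.

```python
import math
from typing import Optional

def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error (the metric assumed here for illustration)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def score_report(cv_score: float, public_lb: Optional[float], private_lb: Optional[float]) -> str:
    """Format local and leaderboard scores; None marks a score not yet available."""
    def fmt(score: Optional[float]) -> str:
        return "n/a" if score is None else f"{score:.4f}"
    return "\n".join([
        f"Local CV RMSE:    {fmt(cv_score)}",
        f"Public LB score:  {fmt(public_lb)}",
        f"Private LB score: {fmt(private_lb)}",
    ])

# Toy validation values; replace None with your real leaderboard scores.
cv = rmse([3.0, 5.0], [2.0, 7.0])
report = score_report(cv, public_lb=None, private_lb=None)
print(report)
```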

Error handling and logging:

  • Description of error handling mechanisms and logging strategies throughout the ETL, modeling, and inference stages.
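As an illustration of documenting error handling, the sketch below configures Python's standard `logging` module once and has a parsing step log, rather than silently drop, values it cannot handle. The logger name `pipeline.etl`, the `parse_rainfall` helper, and the skip-on-error policy are hypothetical choices; whichever policy you pick is what the documentation should state.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
    force=True,  # replace any handlers a notebook kernel may already have installed
)
logger = logging.getLogger("pipeline.etl")  # hypothetical per-stage logger name

def parse_rainfall(values: list[str]) -> list[float]:
    """Convert raw strings to floats, logging and skipping values that cannot be parsed."""
    parsed = []
    for i, raw in enumerate(values):
        try:
            parsed.append(float(raw))
        except ValueError:
            logger.warning("row %d: could not parse rainfall %r; skipping", i, raw)
    logger.info("parsed %d of %d rainfall values", len(parsed), len(values))
    return parsed

rainfall = parse_rainfall(["12.5", "n/a", "30"])
```

Using one named logger per stage (ETL, modeling, inference) makes it easy to tell from the log output where a problem occurred.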

Maintenance and monitoring:

  • Guidelines for ongoing monitoring, maintenance, and updating of the ETL processes, models, and inference systems.
  • Strategies for scaling the solution and managing its lifecycle.
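Monitoring guidelines are more useful when they name a concrete check. The sketch below is a deliberately simple mean-shift test for one numeric feature: it flags new data whose mean lies more than `threshold` standard errors from the training mean. The function name, the threshold of 3, and the toy values are hypothetical, and production setups often use more robust drift measures; treat this only as an example of what a documented check can look like.

```python
from statistics import mean, stdev

def mean_shift_alert(train: list[float], new: list[float], threshold: float = 3.0) -> bool:
    """Return True if the mean of `new` sits more than `threshold` standard errors
    away from the training mean (standard error estimated from the training spread)."""
    spread = stdev(train)
    if spread == 0:
        return mean(new) != mean(train)
    standard_error = spread / len(new) ** 0.5
    return abs(mean(new) - mean(train)) / standard_error > threshold

# Toy values for one numeric feature (hypothetical).
train_rainfall = [10, 12, 11, 9, 13, 10, 11, 12]
similar_batch = [11, 10, 12, 11]
shifted_batch = [18, 19, 17, 20]
print(mean_shift_alert(train_rainfall, similar_batch))
print(mean_shift_alert(train_rainfall, shifted_batch))
```

Documenting the threshold and what happens when it fires (for example, triggering a retraining review) turns a vague "monitor the model" into an actionable procedure.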

Good documentation not only facilitates better understanding and usage of the solution, but also ensures that it can be effectively maintained, scaled, and adapted over time. Improving your documentation will make you a better engineer!

Discussion 4 answers
AdeptSchneider22
Kenyatta University

Nice documentation guideline. I'll definitely start practising the habit of documenting my work using this guideline.

22 May 2024, 15:58
Upvotes 2

So true, documentation helps to guide and fast-track learning.

12 Jun 2024, 15:46
Upvotes 0

Nice documentation. Please, any sample notebook?

14 May 2025, 00:47
Upvotes 0

Thanks for sharing

13 Oct 2025, 21:41
Upvotes 0