
Hi hackers, 👋 I trust you are having a great time in this competition. Gen-AI competitions are new, so you should have as much fun as possible🌟. For folks still wondering how to even get started, I present you with two baseline approaches and some tips on how to do better on the ROUGE-1 metric. I also took the time to explain everything at as high a level as possible.🚀 Cheers.
1. RAG approach (0.47+ public score)📈 — a minimal code sketch follows this list:
2. Fine-tuning approach (~0.32 public score)📊:
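For anyone who can't open the notebook right away, here is a rough, minimal sketch of what the RAG baseline (item 1) looks like. This is not the exact notebook code: the document path, GGUF file name, chunk sizes, embedding model, and sample question below are placeholders/assumptions, and newer LangChain versions import the same classes from `langchain_community`.

```python
# Minimal RAG sketch: chunk documents, index them, retrieve, then generate.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers
from langchain.chains import RetrievalQA

# 1. Load and chunk the source documents (placeholder file name).
docs = PyPDFLoader("malawi_health_docs.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50).split_documents(docs)

# 2. Embed the chunks and build a vector index.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)

# 3. Load a quantized GGUF model that runs on CPU (GPU optional).
llm = CTransformers(
    model="mistral-7b-v1.0.gguf",   # local GGUF file (placeholder name)
    model_type="mistral",
    config={"max_new_tokens": 256, "temperature": 0.1, "context_length": 2048},
)

# 4. Wire retrieval + generation together and answer a question.
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 3}))
print(qa.run("Example question from the test set goes here"))
```

The fine-tuning baseline (item 2) follows the usual question/answer-pair setup; there is a short data-formatting sketch further down the thread where that difference comes up.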
Have fun!! 🥂 Keep learning! Keep Winning! 🏆
Great work @Professor, but we can't use GPUs in this competition, and this RAG notebook uses quantization, which requires GPUs.
I don't see anything in the competition info that says GPUs aren't allowed 🤷🏿‍♂️
https://zindi.africa/competitions/malawi-public-health-systems-llm-challenge/discussions/20000
Hi @Nayal_17, thanks for pointing that out. You are completely right; I just checked out Steve's thread. bitsandbytes requires a GPU, so it makes more sense to switch to the GGUF/GGML equivalent of the model from TheBloke's HF repo. I'll implement that and edit the post. But damn, inference on the entire test set may take years on Kaggle's CPU.😀
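For reference, loading a GGUF quant straight from one of TheBloke's repos with the ctransformers library looks roughly like this. The repo and file names are examples; check the model card for the exact quant file you want (Q4_K_M is a common pick).

```python
# Sketch: CPU inference on a quantized GGUF model pulled from the Hugging Face Hub.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",            # example repo
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",   # example quant file
    model_type="mistral",
    gpu_layers=0,        # 0 = pure CPU; raise this if a GPU is available
)

print(llm("Question: <sample question>\nAnswer:", max_new_tokens=128))
```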
I know, right? Tbf, you can perform inference using a GPU-enabled runtime, but make sure the model can run locally on CPU as well; that's just my opinion. That way you speed up your experimentation time. In my experience, performing inference locally can take over a day.
Yeah, that makes more sense. Damn! You spent over a day doing inference? 😅
Yeah broooo, it's tough 😂😂😂
Not on Kaggle CPUs, though.
On my local laptop.
But maybe others have found a way to run them faster on CPU, I don't know.
@Professor
There's no other way to run inference faster on only a CPU, unless you use smaller models like the tiny Flan-T5, BERT, or GPT-2.
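To make the trade-off concrete, a small seq2seq model like Flan-T5 runs comfortably on CPU with plain transformers (no quantization needed), at the cost of answer quality. A minimal sketch, with the model size and prompt as assumptions:

```python
# CPU-only inference with a small model via the transformers pipeline.
# google/flan-t5-small trades answer quality for speed; flan-t5-base is a
# reasonable step up if the CPU budget allows.
from transformers import pipeline

qa = pipeline("text2text-generation", model="google/flan-t5-small", device=-1)  # -1 = CPU
out = qa("Answer the question: <sample question>", max_new_tokens=64)
print(out[0]["generated_text"])
```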
Yeah @KevinKibe, you are right. But as @Koleshjr said, one idea is to use a GPU to run your experiments for leaderboard submissions; that helps you iterate faster. Most importantly, though, ensure that your solution works on a CPU. Also, when I ran !lscpu on Kaggle, I saw that the processor there has 2 cores and 4 threads, whereas the specification limit for this competition is a Core i9 (8 cores & 16 threads). So there's a high chance a single inference will be about 4 times faster on an i9.
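If you want to sanity-check the compute you're on and match the inference threads to it, something like the snippet below works. The `threads` key shown is ctransformers' config option; other runtimes (e.g. llama.cpp) have their own equivalent setting.

```python
# Check the available logical CPUs and pass the count to the model config.
import os

n_cores = os.cpu_count()   # e.g. 4 logical CPUs on Kaggle, 16 on an i9 with hyper-threading
print(f"Logical CPUs available: {n_cores}")

config = {
    "max_new_tokens": 256,
    "context_length": 2048,
    "threads": n_cores,    # use every available logical core for generation
}
```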
@Koleshjr, are you using CTransformers, llama.cpp, or the original transformers library without quantization?
I am using Ollama models.
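For anyone curious, calling a locally pulled Ollama model from Python looks roughly like this. It assumes the Ollama server is running and the model was pulled beforehand (e.g. `ollama pull mistral`); older client versions return a plain dict, newer ones a response object.

```python
# Sketch of local inference through Ollama's Python client.
import ollama

resp = ollama.generate(
    model="mistral",                               # any model you have pulled locally
    prompt="Question: <sample question>\nAnswer:",
)
print(resp["response"])   # newer client versions: resp.response
```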
Good job! Thanks for sharing.
Thanks, Reacher.🥂
I'm just a starter in all this, sorry if it is a silly question, but the code in the RAG notebook gives an error when I try models like Mixtral, and with other models too.
Hi @GIrum, please feel free to share a screenshot of your error or create a new discussion; I'm sure you'll get help. Also, note that an edited version of the notebook is now available that supports CPU or GPU depending on the available compute.
I know it is a bit late, but I think you need to replace the model_type in the CTransformers class instantiation with 'mistral', i.e.:

    from langchain.llms import CTransformers  # langchain_community.llms in newer versions

    llm = CTransformers(
        model='mistral-7b-v1.0.gguf',  # location of the downloaded GGUF/GGML model
        model_type='mistral',
        batch_size=4,
        config=config)                 # config dict defined earlier in the notebook
Edit: A CPU/GPU-compatible version has been added; the previous notebook was modified. Cheers 🥂
About the fine-tuning approach: it is unusual to train the model to predict the answer from the question alone rather than from (question + relevant passages).
I mean, to my knowledge, you should retrieve the relevant passages before predicting the answer.
Yeah, it depends on your use case. This is a baseline to get started. Conventionally, fine-tuning uses just question/answer pairs; introducing relevant retrieved context turns it into a RAG-style approach.
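To make the difference concrete, here is a rough sketch of the two training-example formats being discussed. The prompt headers and placeholder fields are assumptions for illustration; the baseline uses the plain Q/A format (a).

```python
# Two ways to format a fine-tuning example. The baseline uses (a); adding
# retrieved passages as in (b) moves you toward a RAG-style setup.
def format_plain_qa(question: str, answer: str) -> str:
    # (a) question -> answer, no supporting context
    return f"### Question:\n{question}\n\n### Answer:\n{answer}"

def format_with_context(question: str, passages: list, answer: str) -> str:
    # (b) (question + retrieved passages) -> answer
    context = "\n".join(passages)
    return (f"### Context:\n{context}\n\n"
            f"### Question:\n{question}\n\n### Answer:\n{answer}")

# Placeholder record; substitute real rows from the training set.
example = {
    "question": "<question from the train set>",
    "passages": ["<retrieved passage 1>", "<retrieved passage 2>"],
    "answer": "<reference answer>",
}
print(format_plain_qa(example["question"], example["answer"]))
print(format_with_context(example["question"], example["passages"], example["answer"]))
```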
To rank in the top positions, I assume you would have to combine both RAG and fine-tuning techniques.
I agree. Thanks a lot, @Professor, for sharing your knowledge, especially the RAG notebook.
Welcome @AdeptSchneider22
This is very educational. Thanks a lot.