In my local experiments, I've seen that on the training dataset Falcon-7B underperforms compared to Phi-2 or even RAG (+ semantic similarity). But I'd like to know if anybody is getting good results with Falcon-7B. It seems that Falcon-7B struggles with MCQs in the telecommunications domain according to this paper: https://arxiv.org/pdf/2402.15818. Playing with the first 60 training questions, Falcon is about 30% accurate.
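For reference, here is roughly the kind of evaluation loop I mean; this is a minimal sketch, assuming the training questions live in a `train.json` with `question`, `options`, and `answer` fields (those field names are hypothetical, adapt them to the actual competition format):

```python
import json
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tiiuae/falcon-7b-instruct"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def ask(question: str, options: list[str]) -> str:
    """Generate a short continuation and pull the first option letter out of it."""
    letters = "ABCDE"[: len(options)]
    prompt = (
        "Answer the multiple-choice question with the letter of the correct option only.\n\n"
        + question + "\n"
        + "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
        + "\nAnswer:"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs, max_new_tokens=5, do_sample=False, pad_token_id=tok.eos_token_id
    )
    text = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    m = re.search(rf"[{letters}]", text)  # first option letter in the continuation
    return m.group(0) if m else ""

# hypothetical layout: [{"question": ..., "options": [...], "answer": "A"}, ...]
with open("train.json") as f:
    rows = json.load(f)[:60]

acc = sum(ask(r["question"], r["options"]) == r["answer"] for r in rows) / len(rows)
print(f"accuracy on first 60 training questions: {acc:.0%}")
```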
It's not doing well for me either. It's even difficult for me to get it to follow the instruction to output just the answer choice.
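One workaround that sidesteps the instruction-following problem entirely: don't generate free text at all, just compare the model's next-token logits for each option letter after an "Answer:" cue. A minimal sketch; it assumes " A", " B", etc. each encode to a single token in Falcon's BPE vocabulary, which is worth verifying with `tok.encode(" A")` before trusting the scores:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tiiuae/falcon-7b-instruct"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def pick_letter(question: str, options: list[str]) -> str:
    """Rank the option letters by their logit at the position right after 'Answer:'."""
    letters = "ABCDE"[: len(options)]
    prompt = (
        question + "\n"
        + "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
        + "\nAnswer:"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_logits = model(**inputs).logits[0, -1]
    # assumption: " A", " B", ... are single tokens; check tok.encode(" A") first
    scores = {l: next_logits[tok.encode(" " + l)[0]].item() for l in letters}
    return max(scores, key=scores.get)
```

This way the model can never answer off-format, since we only ever read one forward pass and never let it ramble.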
Same here. It performs poorly, and it seems to struggle with long prompts in particular. Alternatively, one would need to research hyperparameter settings that handle long prompts well.
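On the long-prompt point: Falcon-7B's context window is only 2048 tokens, so before tuning anything else it's worth checking that the prompt even fits, and that truncation drops the start (any prepended context) rather than the end (the question and options). A minimal sketch of the settings I'd try first; the token budget and `repetition_penalty` values are placeholders to tune on the training split:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tiiuae/falcon-7b-instruct"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

long_prompt = "..."  # your assembled MCQ prompt, possibly with retrieved context prepended

# Falcon-7B was trained with a 2048-token context; leave headroom for the answer.
tok.truncation_side = "left"  # drop the oldest tokens, keep the question and options
inputs = tok(
    long_prompt, return_tensors="pt", truncation=True, max_length=2000
).to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=8,          # just enough for an option letter
    do_sample=False,           # greedy decoding; sampling only adds noise on MCQs
    repetition_penalty=1.1,    # placeholder value, tune it on the training split
    pad_token_id=tok.eos_token_id,
)
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```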