
Has anyone tried using their fine-tuned LLM for this task in their RAG system? How are you handling the limited context window? I'm aware you can use LangChain to limit the chunks plus the query that get passed in at query time, but that may reduce the accuracy of the RAG system. How can one increase an LLM's context window? I saw that the Gemma large language model has a 20,000-token context window. If anyone has successfully fine-tuned Gemma for this task, please jump into this discussion.
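For what it's worth, here is roughly how I'm thinking about budgeting the retrieved chunks against the context window myself, instead of leaving it entirely to LangChain. This is just a sketch; the Gemma model ID, the 2048-token window, and retrieved_chunks are placeholders for whatever your own setup uses:

from transformers import AutoTokenizer

# Placeholder model ID; swap in whatever model/tokenizer you fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

CONTEXT_WINDOW = 2048      # whatever the model actually supports
RESERVED_FOR_ANSWER = 256  # leave room for the generated answer

def pack_chunks(query: str, retrieved_chunks: list[str]) -> str:
    """Keep adding retrieved chunks until the prompt would overflow the window."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_ANSWER - len(tokenizer.encode(query))
    kept = []
    for chunk in retrieved_chunks:
        cost = len(tokenizer.encode(chunk))
        if cost > budget:
            break  # dropping the remaining chunks is exactly where the accuracy hit comes from
        kept.append(chunk)
        budget -= cost
    return "\n\n".join(kept) + "\n\nQuestion: " + query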
Even after setting max_seq_length to 2048 during supervised fine-tuning, I'm still unsure. Here is my setup:

from trl import SFTTrainer

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset['train'],
    dataset_text_field = "text",
    max_seq_length = 2048,  # the setting in question
    args = training_args,
)
I'd appreciate it if someone could clarify whether this max_seq_length translates to a model's context window.
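For reference, this is how I've been checking what the base model itself reports, in case it helps frame the question. I'm assuming max_position_embeddings is the right attribute to look at for Gemma; other architectures may name it differently:

from transformers import AutoConfig, AutoTokenizer

# Placeholder model ID; use the checkpoint you are actually fine-tuning.
config = AutoConfig.from_pretrained("google/gemma-2b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

print(config.max_position_embeddings)  # what the architecture was configured for
print(tokenizer.model_max_length)      # what the tokenizer will truncate to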