The challenge description has been updated with some awesome resources. Check out the sections 'Tutorials' and 'Examples of open-source tools and resources' for some helpful links. We've also added links to various databases and libraries in the Data section. These resources can be useful for assembling custom datasets to fine-tune text embedding models in your RAG solutions.
Thanks.
Also, are we allowed to use other available models, such as Llama 3 or Command-R-Plus, for query expansion and similar tasks? Do the models we use need a license that permits commercial use, or is it okay to use models that are open for experimentation but require a paid license to run locally? PS: Command-R-Plus has been reported to improve RAG pipelines.
I don't think Command R is useful for this project, since we are focusing on retrieval rather than the generation part.
Thank you for your question! To maintain fairness in the competition, we kindly request that all participants use the same model (llama2:7B) for query expansion and other query pre-processing tasks (let us know if you experience any technical issues with that specific model). Other tools are permissible, but they must meet the following criteria: they must be open-source, free, and capable of running fully locally without relying on any external APIs.
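If it helps, a query-expansion step with llama2:7B served locally could look roughly like the sketch below. It assumes the model is served through Ollama and uses the `ollama` Python client; the prompt wording, the number of rewrites, and the naive line-based parsing are illustrative choices, not competition requirements.

```python
# Minimal query-expansion sketch, assuming llama2:7B is served locally via Ollama
# and the `ollama` Python client is installed (pip install ollama).
import ollama


def expand_query(query: str, n: int = 3) -> list[str]:
    """Return the original query plus up to n LLM-generated rewrites."""
    prompt = (
        f"Rewrite the following search query in {n} different ways, "
        f"one rewrite per line, keeping the original meaning:\n\n{query}"
    )
    response = ollama.generate(model="llama2:7b", prompt=prompt)
    # Naive parsing: take non-empty lines and strip common list markers.
    rewrites = [
        line.strip("-•* ").strip()
        for line in response["response"].splitlines()
        if line.strip()
    ]
    return [query] + rewrites[:n]


if __name__ == "__main__":
    print(expand_query("effects of drought on maize yields"))
```

The expanded queries can then be embedded and retrieved against in the same way as the original query, with results merged downstream.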
So we aren't even allowed to use Llama 3? It's open-source and available to everyone.

I have been working on building a module that takes text from documents and represents the information in a knowledge graph as nodes with relationships. I had been pulling my hair out trying to figure out how to automate the knowledge graph generation from pieces of text extracted from documents. Yesterday I did some more thorough research, and it turns out there is an LLMGraphTransformer that takes text from documents and generates a knowledge graph. It needs a large language model to generate the Cypher queries used to build the knowledge graph. I'm opting for the second alternative of multi-stage retrieval with a knowledge graph.
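For anyone curious, the piece I'm experimenting with looks roughly like this. It assumes LangChain's langchain_experimental package and a llama2:7B model served locally through Ollama; the sample sentence is just a placeholder, and extraction quality with a 7B model may be rough, since LLMGraphTransformer works best with models that support structured output.

```python
# Rough sketch: turning raw text into graph nodes/relationships with
# LangChain's LLMGraphTransformer and a locally served llama2:7B (via Ollama).
from langchain_community.chat_models import ChatOllama
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm = ChatOllama(model="llama2:7b", temperature=0)
transformer = LLMGraphTransformer(llm=llm)

docs = [Document(page_content="Marie Curie won the Nobel Prize in Physics in 1903.")]
graph_docs = transformer.convert_to_graph_documents(docs)

for gd in graph_docs:
    print(gd.nodes)          # extracted entities
    print(gd.relationships)  # extracted relations between entities
```

Loading the resulting graph documents into a graph database (e.g., Neo4j) is a separate step on top of this.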
Yeah, I realized that. Plus, paying for the license is way too expensive.
Is it okay to use Llama 3 8B then, given that it is open and available to everyone participating in the competition?