Unifi Value Frameworks PDF Lifting Competition

Helping South Africa
$5 000 USD
Challenge completed over 1 year ago
Generative AI
450 joined
73 active
Start: Dec 21, 21
Close: Mar 17, 24
Reveal: Mar 17, 24
Brainiac
A potential strategy for tackling this challenge.
Help · 15 Jan 2024, 10:51 · 3

One approach to this challenge is Retrieval-Augmented Generation (RAG), a technique that combines a large language model with external knowledge retrieval. Essentially, RAG lets the language model retrieve information from a database or collection of documents before it answers.

This way, the model can draw on information beyond what it was trained on, which makes its responses better grounded in the provided data and potentially more accurate. This is particularly useful for questions that need up-to-date or very specific information that the model's training data might not include.

Here is how you can use RAG:

1. PDF Preprocessing:

Convert the PDF reports into text using a PDF parser, which reads through each PDF and extracts its text content. This step is crucial because RAG systems work with text data.
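As a rough illustration of this step, here is a minimal sketch. It assumes the third-party pypdf package as the parser (any PDF parser would do), and the function names `extract_pages` and `clean_text` are made up for this example. The clean-up rules are guesses at common extraction artifacts, not something the competition prescribes.

```python
import re


def extract_pages(pdf_path):
    """Return one raw text string per page (assumes `pip install pypdf`)."""
    from pypdf import PdfReader  # third-party, imported lazily so clean_text works without it

    reader = PdfReader(pdf_path)
    # extract_text() may yield nothing useful for scanned pages; fall back to ""
    return [page.extract_text() or "" for page in reader.pages]


def clean_text(raw):
    """Tidy a few common PDF extraction artifacts."""
    # Re-join words split by a hyphen at a line break.
    # Caveat: this also merges genuinely hyphenated words that happen to wrap.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line in a row
    return text.strip()
```

Scanned reports would need OCR instead, which this sketch does not cover.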

2. Building a Database for Retrieval:

Once you have the text, organize it into a searchable database, meaning you structure it so that it can be queried easily. For example, you might split the text by the sections or topics found in the reports.
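One simple way to picture such a database is a list of overlapping text chunks, each tagged with its source report and page. The sketch below uses only the standard library; the chunk size, the overlap, and the names `chunk_words` and `build_index` are illustrative assumptions. In practice many people would use an embedding model and a vector store instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text, size=120, overlap=20):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def build_index(pages_by_doc, size=120, overlap=20):
    """pages_by_doc maps a report name to its list of page texts."""
    index = []
    for doc, pages in pages_by_doc.items():
        for page_no, page_text in enumerate(pages, start=1):
            for chunk in chunk_words(page_text, size, overlap):
                index.append({
                    "doc": doc,
                    "page": page_no,
                    "text": chunk,
                    "counts": Counter(tokenize(chunk)),  # term counts for later scoring
                })
    return index
```

Keeping the report name and page number with every chunk makes it easier to check later where an extracted value came from.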

3. Setting Up the RAG System:

The next step is to integrate a large language model (LLM) with the database (the parsed report data). The LLM serves as the "brain" of the system, understanding queries and working out what information to retrieve from the database.

4. Query Processing with RAG:

Next, when there is a specific piece of information you want to extract from the PDFs, you ask the language model. For instance: "What were the air emissions of carbon dioxide for Absa in 2022?"

The large language model interprets the query and decides what information is relevant. It then uses the retrieval system to fetch that information from the database, which contains the text extracted from the report PDFs.
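To make the query step concrete, here is a minimal sketch of retrieve-then-ask. The retriever is a plain TF-IDF style keyword score, swapped in here for the embedding search most RAG setups use. The `llm` argument is a hypothetical stand-in for whichever model API you call: any function that takes a prompt string and returns a string. The prompt wording is also just an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, chunks, k=3):
    """Return the k chunks that share the most (rarity-weighted) terms with the query."""
    counts = [Counter(tokenize(c)) for c in chunks]
    n = len(chunks)
    doc_freq = Counter(term for c in counts for term in c)  # chunks containing each term
    query_terms = set(tokenize(query))

    def score(c):
        return sum(c[t] * math.log(1 + n / doc_freq[t]) for t in query_terms if t in c)

    ranked = sorted(zip(chunks, counts), key=lambda pair: score(pair[1]), reverse=True)
    return [text for text, c in ranked[:k] if score(c) > 0]


def build_prompt(question, passages):
    context = "\n---\n".join(passages)
    return (
        "Answer using only the report excerpts below. "
        "If the value is not stated, reply 'not reported'.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


def answer(question, chunks, llm, k=3):
    """llm: hypothetical callable (prompt -> reply string) wrapping your model of choice."""
    return llm(build_prompt(question, retrieve(question, chunks, k)))
```

Telling the model to answer only from the excerpts, and to say so when the value is missing, is one way to reduce made-up numbers.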

The whole process can then be repeated for the other AMKEYs to retrieve their values from the report data.
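Looping over the AMKEYs could look roughly like the sketch below. Everything here is an assumption for illustration: the `ask` callable stands for your own RAG pipeline (question in, free-text answer out), the question wording per AMKEY is yours to write, and the row fields are not the competition's submission format.

```python
import re


def parse_number(reply):
    """Pull the first number out of a free-text reply, or return None.

    Naive on purpose: a reply such as "In 2022 the value was 50" would yield 2022,
    so it helps to ask the model to reply with the number only.
    """
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", reply)
    return float(match.group().replace(",", "")) if match else None


def extract_all(questions_by_amkey, ask):
    """questions_by_amkey maps each AMKEY to the question that asks for its value."""
    rows = []
    for amkey, question in questions_by_amkey.items():
        reply = ask(question)
        rows.append({"amkey": amkey, "value": parse_number(reply), "raw_answer": reply})
    return rows
```

Keeping the raw answer next to the parsed value makes it easier to spot cases where the parsing went wrong.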

N.B. This is just one of many approaches you can use for this kind of challenge.

Here are some resources to get you started:

Discussion 3 answers
Impact_Insights

Thank you, this is helpful.

15 Jan 2024, 11:28
Upvotes 0

Thank you for sharing the roadmap!

16 Jan 2024, 03:36
Upvotes 0

Your insights are truly valuable, thank you.

1 Feb 2024, 10:33
Upvotes 0