Specializing Large Language Models for Telecom Networks 📡

Specializing Large Language Models for Telecom Networks by ITU AI/ML in 5G Challenge

€6 000 EUR

Completed (over 1 year ago)

Skills you will learn

Generative AI

468 joined

131 active

Start

May 07, 24

Jul 26, 24

Reveal

Jul 26, 24

About

The dataset shared with participants consists of multiple-choice questions (MCQs) on 3GPP standards, drawn from TeleQnA. TeleQnA is a comprehensive dataset designed to assess the knowledge of LLMs in the field of telecommunications.

It encompasses 1827 multiple-choice questions distributed across two distinct categories:

Standards overview: This category consists of 318 questions based on summaries of 3GPP standards.
Standards specifications: With 1509 questions, this category explores the technical specifications and practical implementations of telecommunications systems, leveraging information from 3GPP documents.

The 1827 MCQs are also split into 1461 questions for the train set and 366 questions for the test1 set.

For more in-depth information about the dataset and the generation process, please refer to [2].

Each question is represented in JSON format, comprising five distinct fields:

Question: This field consists of a string that presents the question associated with a specific concept within the telecommunications domain.
Options: This field comprises a set of strings representing the various answer options.
Answer: This field contains a string in the format 'option ID: Answer' that gives the correct response to the question. Exactly one option is correct; however, options may include choices like “All of the Above” or “Both options 1 and 2”.
Explanation: This field encompasses a string that clarifies the reasoning behind the correct answer.
Category: This field contains a label identifying the source category (e.g., lexicon or research overview).

The test1 MCQs include neither the correct answer nor the related explanation.
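The field layout described above can be read with a few lines of Python. The following is a minimal sketch, not the organizers' loader: it uses the sample record shown on this page, assumes the wrapper key ("question 2045") is a quoted JSON key, and assumes the options are stored under keys of the form "option N". The helper names (get_options, parse_answer, build_prompt) and the prompt wording are hypothetical.

```python
import json
import re

# Sample record copied from this page. Assumption: in the real file the
# wrapper key "question 2045" is a quoted JSON string.
raw = '''
{
  "question 2045": {
    "question": "What is the maximum number of eigenmodes that the MIMO channel can support? (nt is the number of transmit antennas, nr is the number of receive antennas)",
    "option 1": "nt",
    "option 2": "nr",
    "option 3": "min(nt, nr)",
    "option 4": "max(nt, nr)",
    "answer": "option 3: min(nt, nr)",
    "explanation": "The maximum number of eigenmodes that the MIMO channel can support is min(nt, nr).",
    "category": "Research publications"
  }
}
'''


def get_options(record):
    """Return {option_number: text} for every "option N" key, sorted by N."""
    options = {}
    for key, value in record.items():
        match = re.fullmatch(r"option (\d+)", key)
        if match:
            options[int(match.group(1))] = value
    return dict(sorted(options.items()))


def parse_answer(answer):
    """Split a string like 'option 3: min(nt, nr)' into (3, 'min(nt, nr)')."""
    match = re.fullmatch(r"option (\d+):\s*(.*)", answer, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"unexpected answer format: {answer!r}")
    return int(match.group(1)), match.group(2)


def build_prompt(record):
    """Format the question and its options as plain text (hypothetical wording).

    Only the question and option fields are used, so this also works for
    test1 records, which carry no answer or explanation.
    """
    lines = [record["question"]]
    for number, text in get_options(record).items():
        lines.append(f"{number}. {text}")
    lines.append("Answer with the option number only.")
    return "\n".join(lines)


questions = json.loads(raw)
record = questions["question 2045"]
options = get_options(record)
answer_id, answer_text = parse_answer(record["answer"])
prompt = build_prompt(record)
```

Parsing the answer into a numeric option ID keeps evaluation independent of the answer text, which matters when options such as “All of the Above” repeat across questions.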

To request computational resources for running Falcon-7B, please fill in the following form: https://forms.office.com/r/Dx2jN5SWG8

Here is one example from the dataset:

question 2045: {
    "question": "What is the maximum number of eigenmodes that the MIMO channel can support? (nt is the number of transmit antennas, nr is the number of receive antennas)",
    "option 1": "nt",
    "option 2": "nr",
    "option 3": "min(nt, nr)",
    "option 4": "max(nt, nr)",
    "answer": "option 3: min(nt, nr)",
    "explanation": "The maximum number of eigenmodes that the MIMO channel can support is min(nt, nr).",
    "category": "Research publications"
}

Files

An example of what your submission file should look like. The order of the rows does not matter, but the "ID" values must be correct. Note: this SampleSubmission is longer than testing1.txt because more questions will be added later in the challenge.

This file contains the target for the training.txt file.

Here is additional testing data.

This is the file you will apply your model to.

This is the file you will train your model on.

This is the corpus of technical documents that you can use, e.g., as input for your RAG pipeline to provide additional context to the LLM.
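As an illustration of how a corpus can supply context to the LLM, here is a minimal retrieval sketch. It is a toy keyword-overlap retriever with IDF weighting written in plain Python, not the method prescribed by the challenge and not a substitute for an embedding-based retriever. The function names are hypothetical, and the three passages are placeholders standing in for chunks of the technical documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Return per-chunk token counts and a smoothed IDF weight per token."""
    chunk_tokens = [Counter(tokenize(chunk)) for chunk in chunks]
    document_frequency = Counter()
    for counts in chunk_tokens:
        document_frequency.update(counts.keys())
    n = len(chunks)
    idf = {
        token: math.log((1 + n) / (1 + df)) + 1.0
        for token, df in document_frequency.items()
    }
    return chunk_tokens, idf


def retrieve(query, chunks, index, k=2):
    """Return up to k chunks sharing tokens with the query, best match first."""
    chunk_tokens, idf = index
    query_tokens = set(tokenize(query))
    scored = []
    for i, counts in enumerate(chunk_tokens):
        score = sum(idf[token] for token in query_tokens if token in counts)
        scored.append((score, i))
    # Highest score first; ties are broken by original chunk order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for score, i in scored[:k] if score > 0]


# Placeholder passages (NOT real 3GPP text), used only to exercise the code.
chunks = [
    "Placeholder passage about handover between cells.",
    "Placeholder passage about MIMO antennas and eigenmodes.",
    "Placeholder passage about billing.",
]
index = build_index(chunks)
query = "How many eigenmodes can a MIMO channel support?"
top = retrieve(query, chunks, index)

# The retrieved text is placed in front of the question before calling the LLM.
context_prompt = "Context:\n" + "\n".join(top) + "\n\nQuestion: " + query
```

In practice the chunks would come from splitting the corpus documents into passages, and the chunk size and the value of k would need tuning against the context window of the model being used.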
