I was looking at the demo code submitted by the Lelapa team. How can they use 0, 1, 2 to calculate the log softmax? I checked the Lelapa vocabulary, and tokens 0, 1, 2 do not correspond to positive, negative, or neutral in the respective languages. Can anyone clarify this?
Labels 0, 1, 2 aren't for sentiment analysis but for AfriXNLI. We have 2 languages, each with 3 different sentiments, hence 6 different combinations for sentiment analysis. I hope my understanding does not mislead you.
This is the code; it is used for both the XNLI and sentiment tasks:
if task != "mmt":
    with torch.no_grad():
        logits = model(**batch).logits  # Shape: [batch_size, seq_length, vocab_size]
        log_probs = torch.nn.functional.log_softmax(
            logits, dim=-1
        )  # Shape: [batch_size, seq_length, vocab_size]
        # compute the log-likelihood of the target tokens
        t_labels = (
            torch.tensor([0, 1, 2])
            .unsqueeze(0)
            .unsqueeze(0)
            .expand(batch["input_ids"].size(0), batch["input_ids"].size(1), -1)
            .to(model.device)
        )
        # Gathering log-likelihoods for the labels
        log_likelihoods_per_class = log_probs.gather(
            2, t_labels
        )  # Shape: [batch_size, seq_length, 3]
        # sum or average over the sequence to get a final score
        log_likelihoods_per_class = (
            log_likelihoods_per_class.mean(dim=1).cpu().numpy()
        )  # Shape: [batch_size, 3]
I will get back to this tomorrow. In the meantime, try checking the eval.py function to see how the evaluation is actually performed. I believe the answer is there.
Sure, take your time. I checked the eval function; the Lelapa model is not performing very well. I think they have only done pre-training.
Hi ML-GOD,
Yes, InkubaLM is the pre-trained model.