@ZindiAdmins, with the above statement, is it safe to assume that we CANNOT calibrate our probabilities by any manual process? The predicted probabilities can be stacked/blended, but we cannot set up any manual calibration strategy?
Would appreciate your response.
Correct!
Thank you, and I believe even clipping would not be deemed acceptable? I read somewhere on the forum that someone suggested clipping the obvious cases to higher/lower values.
Hello, Zindi! In my opinion, in this dataset some products can never be held together with another product, and we can detect this from simple statistics on the train set. That is still a prediction; it just comes from statistics rather than from a model. In those cases we should be allowed to round. For example, suppose we see in train that product 'OOO' is 0 for every client who has a 1 for product 'YYY'. That means these products never combine, so we should round 'OOO' to ~0 for every client who has a 1 for 'YYY'.
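A minimal sketch of the check described above, assuming a hypothetical train frame with one row per customer and one 0/1 column per product (column names 'YYY' and 'OOO' are just the abstract names from the example):

```python
import pandas as pd

# Hypothetical toy train data: one 0/1 ownership column per product.
train = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "YYY": [1, 1, 0, 0],
    "OOO": [0, 0, 1, 1],
})

def never_cooccurs(df, a, b):
    """True if product b is 0 for every customer who holds product a."""
    holders = df[df[a] == 1]
    return len(holders) > 0 and bool((holders[b] == 0).all())

# In this toy data, no holder of 'YYY' also holds 'OOO'.
print(never_cooccurs(train, "YYY", "OOO"))  # → True
```

Whether acting on such a statistic counts as "manual calibration" under the rules is exactly the open question in this thread.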
And judging by the p-value, the difference is statistically significant.
What about the probabilities for the "customer_id - product_id" pairs in test for which we know for certain that the customer has that particular product (SampleSubmission.csv has 1.0 for these pairs)? Do we need to make predictions for these pairs too, or can their labels simply be set to 1.0?
Refer to the sample submission. Products that the customer_id already has are reflected with a probability of 1.
@darrel Yes, I see it, I just wanted to clarify :)
OK, so I'll just round my predictions for these values to 1e-53 or 1-(1e-53), and not to exactly 0 or 1 :) Thank you!
Yes, let's wait for them.
You can clip or round the known values; however, please do not clip or round your model's predictions.
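The distinction above can be sketched as follows, assuming hypothetical arrays of model probabilities and a mask for the pairs the sample submission already marks as owned (all names here are illustrative, not from the competition starter code):

```python
import numpy as np

# Hypothetical model outputs and a mask of "known" customer-product pairs
# (those with 1.0 in SampleSubmission.csv).
preds = np.array([0.03, 0.72, 0.41, 0.88])
known_mask = np.array([False, True, False, True])

# Overwrite only the known pairs with 1.0; every other model output
# is left untouched (no clipping or rounding of real predictions).
final = np.where(known_mask, 1.0, preds)
```

This keeps the submission rule intact: the only values touched by hand are the ones whose labels are already given.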
What about statistical values computed from our train data? The rules say nothing about how we must produce predictions (I mean the method). A statistic is a prediction too, and I think that if we use a machine learning model and then statistical values on top, we don't break the rules. We do it properly, because it is our prediction, derived from statistics.
For me, "known values" means these statistical values.