https://colab.research.google.com/drive/18ySAIJ-pfdlaXeLouaT19DSmMFfs3HrE?usp=sharing
Hi all, the link above is a notebook I have been working on, testing improvements to bring inference speed below 100 ms as per the constraints. You can run the whole notebook on Colab and check the results there as well.
The first inference method (321.28 ms) gives a score of 0.38, which means a little accuracy is sacrificed for fast inference. Please review and share your comments.
Also note RAM usage is below 2 GB, at 1.18 GB.
The inference speed constraint of 100 ms is per vignette, not for the entire test set. I don't know if that helps.
In the notebook it's per vignette, only that it's an average (average time per vignette).
Alright
I will check the notebook
Thank you. Great work!! That's a clear notebook.
Great man.
very nice.
No need to share how you do it, but I am curious: are you able to hit that 100 ms?
The inference time should be per vignette, not the average inference time per vignette calculated after running the samples through the model in batches (in parallel), which is what your code does. As it stands, there is no model between 100M and 200M parameters that meets the constraint of generating a full response in under 100 ms per vignette. I have tested this out with all forms of tweaks (except int1-3).
That's what I noted as well. Something that needs checking and improvement.
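To make the distinction above concrete, here is a minimal sketch of the two measurement styles. The `generate` function is a hypothetical stand-in for the model call (simulated with a sleep), not the actual notebook's model; only the timing pattern is the point. Batching amortizes fixed per-call overhead, so the batched average looks much lower than the true single-vignette latency the constraint refers to:

```python
import time

# Hypothetical stand-in for the model's generate call; in the real notebook
# this would be the actual model inference. The sleep simulates a fixed
# per-call overhead plus a per-item cost.
def generate(batch):
    time.sleep(0.002 + 0.001 * len(batch))
    return ["response"] * len(batch)

vignettes = [f"vignette {i}" for i in range(32)]

# Batched measurement: total wall time divided by sample count. This is the
# "average time per vignette" number; batching hides the fixed overhead.
start = time.perf_counter()
generate(vignettes)
batched_avg_ms = (time.perf_counter() - start) / len(vignettes) * 1000

# Per-vignette measurement: one call per sample. This is what a
# "below 100 ms per vignette" constraint actually means.
per_sample_ms = []
for v in vignettes:
    start = time.perf_counter()
    generate([v])
    per_sample_ms.append((time.perf_counter() - start) * 1000)
worst_ms = max(per_sample_ms)

print(f"batched average: {batched_avg_ms:.2f} ms/vignette")
print(f"per-vignette worst case: {worst_ms:.2f} ms")
```

With these illustrative numbers the batched average comes out several times lower than any single per-vignette call, which is exactly the gap being discussed.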