I am wondering if there is a way to run batch inference with Ollama, processing multiple inputs simultaneously to improve efficiency? I am a beginner and it takes so much time to run on my local machine (I checked and it uses both GPU and CPU — how can I make it use the GPU only?)
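Ollama's HTTP API handles one prompt per request, but the server can keep several requests in flight (the `OLLAMA_NUM_PARALLEL` environment variable controls this), so "batch" inference is usually done by firing concurrent requests. Below is a minimal sketch using only the standard library against a local Ollama server; the model name `llama3` is an assumption — use whatever model you have pulled. For GPU offload, Ollama exposes a `num_gpu` option (number of layers to place on the GPU), which you can pass in the request's `options` field.

```python
# Sketch: concurrent "batch" requests to a local Ollama server.
# Assumes Ollama is running at localhost:11434 and the model (here
# "llama3", a placeholder) has already been pulled.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="llama3"):
    # Non-streaming request body for Ollama's /api/generate endpoint.
    # "options": {"num_gpu": 999} would ask Ollama to offload as many
    # layers as possible to the GPU.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt):
    req = Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def batch_generate(prompts, workers=4):
    # A thread pool keeps several requests in flight at once; the server
    # actually runs them in parallel only if OLLAMA_NUM_PARALLEL allows it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate, prompts))

# Usage (with Ollama running locally):
#   answers = batch_generate(["Hi", "What is RAG?"])
```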
Try working on Colab or Kaggle.
Actually, I started using Colab and Kaggle, but since I am storing data to disk with Chroma, every time I rerun the notebook I lose the data and have to redo the preprocessing and indexing stages (which take too much time), so I went back to my local resources. I am also trying to separate each component into modules and classes and use them directly in a notebook in the same repo.
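On the rerun problem: Chroma's `chromadb.PersistentClient(path=...)` writes the index to a directory, so if that path points somewhere durable (e.g. a mounted Google Drive folder on Colab), it survives sessions. The same idea works for any expensive preprocessing step — cache the result to disk and skip recomputation when the file already exists. A minimal standard-library sketch (the path and `compute` function are placeholders):

```python
# Sketch: cache an expensive preprocessing result to disk so that
# notebook reruns load it instead of recomputing it.
import pickle
from pathlib import Path

def cached(path, compute):
    """Return the pickled result at `path` if it exists;
    otherwise call `compute()`, save the result, and return it."""
    p = Path(path)
    if p.exists():
        with p.open("rb") as f:
            return pickle.load(f)
    result = compute()
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("wb") as f:
        pickle.dump(result, f)
    return result

# Usage:
#   chunks = cached("artifacts/chunks.pkl", run_preprocessing)
```

On Colab, pointing `path` (and Chroma's persist directory) at `/content/drive/...` after mounting Drive is a common way to keep both across sessions.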
I think there is a feature called persistence in Kaggle that makes it possible to save output files and variables between sessions.
I faced a problem with SQLite on Kaggle in particular (I didn't face it on Colab, for instance). Anyway, thank you a lot for your guidance! Very appreciated.
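That SQLite error on Kaggle is very likely Chroma's minimum-version requirement: Chroma refuses to start if the environment's SQLite is older than 3.35.0. The commonly documented workaround is to `pip install pysqlite3-binary` and alias it as the stdlib `sqlite3` module *before* importing `chromadb`. A hedged sketch:

```python
# Sketch: work around "unsupported version of sqlite3" from Chroma.
# Chroma requires SQLite >= 3.35.0; check the stdlib module first.
import sqlite3

def sqlite_ok(version=sqlite3.sqlite_version, minimum=(3, 35, 0)):
    # Compare the dotted version string against Chroma's minimum.
    return tuple(int(x) for x in version.split(".")) >= minimum

if not sqlite_ok():
    try:
        # Requires: pip install pysqlite3-binary
        # The swap must happen before `import chromadb`.
        import sys
        __import__("pysqlite3")
        sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
    except ImportError:
        # pysqlite3-binary is not installed; install it first.
        pass
```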