While this is certainly a cool concept, local voice assistants like this are currently a novelty. Cool to play around with, though!
Expect around 5 seconds of processing time before the response to a basic question starts generating, even on a very basic model like Llama 3 8B.
For context, with Moondream2 (as recommended) on a Raspberry Pi 5, it takes around 50 seconds to process an image taken by the camera and start generating a description.
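If you want to check these numbers on your own hardware, the delay quoted above is the time until the first piece of output appears. Here is a minimal sketch of how that could be measured for any streaming response. The `time_to_first_chunk` helper and the `fake_stream` generator are hypothetical names used for illustration and are not part of this project; swap `fake_stream` for whatever iterator your model runner returns.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_chunk(stream: Iterable[T]) -> Tuple[float, T]:
    """Return (seconds waited, first chunk) for a streaming response."""
    start = time.perf_counter()
    # next() blocks until the stream produces its first chunk.
    first = next(iter(stream))
    return time.perf_counter() - start, first


def fake_stream(delay_s: float = 0.05) -> Iterator[str]:
    # Hypothetical stand-in for a model's streaming output.
    # The sleep simulates prompt processing before the first token.
    time.sleep(delay_s)
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    delay, chunk = time_to_first_chunk(fake_stream())
    print(f"first chunk {chunk!r} after {delay:.2f} s")
```

Run it a few times and average the results, since the first run usually also includes model loading time.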