In our previous episode, Boian Mitov said:
> BTW: The CUDA as example simply uses a thread pool. Maintaining a number of
> cores pool will very much eliminate the thread creation.

True, but CUDA is single-threaded. So you would need a thread pool per thread that can do _parallel_