It is possible that you didn't structure your code optimally for multi-GPU training if you distributed it layer-wise. Generally, training should scale roughly linearly with the number of GPUs.
**TL;DR**: Use [`multi_gpu_model()`][6] from Keras.
<hr>
**TS;WM**:
From the [Tensorflow Guide][1]:
>If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default.
If you want to use multiple GPUs, you unfortunately have to specify manually which tensors to place on each GPU, e.g. with `with tf.device('/device:GPU:2'):`.
More info in the [Tensorflow Guide Using Multiple GPUs][2].
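For example, a minimal device-placement sketch using the TF 1.x graph API (the constants and the `allow_soft_placement` fallback here are just for illustration):

```python
import tensorflow as tf

# Keep small tensors on the CPU and pin the actual computation to GPU 2.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0], name='a')

with tf.device('/device:GPU:2'):
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    c = a * b

# allow_soft_placement lets TF fall back to another device if GPU:2 is absent;
# log_device_placement prints where each op actually ran.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))
```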
In terms of how to distribute your network over the multiple GPUs, there are two main ways of doing that.
1. You distribute your network layer-wise over the GPUs. This is easier to implement but will not yield a lot of performance benefit because the GPUs will wait for each other to complete the operation.
2. You create a separate copy of your network, called a "tower", on each GPU. When you feed this network with, say, 8 GPUs, you break up your input batch into 8 parts and distribute one part to each tower. You let each tower forward-propagate, then aggregate the gradients across towers and do the backward pass. This results in an [almost-linear speedup][3] with the number of GPUs. It is much more difficult to implement, however, because you also have to deal with complexities around batch normalization, and it is very advisable to make sure you shuffle your batches properly. There is [a nice tutorial here][4]. You should also look at the [Inception V3 code][5] referenced there for ideas on how to structure such a thing, especially `_tower_loss()`, `_average_gradients()`, and the part of `train()` starting with `for i in range(FLAGS.num_gpus):` (a rough sketch of this pattern follows after this list).
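To make the tower pattern of option 2 concrete, here is a rough, simplified sketch in the style of the Inception code linked above. `build_model`, `images`, `labels`, and `NUM_GPUS` are placeholders you would replace with your own model and input pipeline:

```python
import tensorflow as tf

NUM_GPUS = 8  # one tower per GPU (assumption for this sketch)

def tower_loss(images, labels):
    """Build the model on the current device and return its loss
    (analogous to _tower_loss() in the Inception code)."""
    logits = build_model(images)  # your own model-building function
    return tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

def average_gradients(tower_grads):
    """Average each variable's gradient across all towers
    (analogous to _average_gradients())."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        var = grads_and_vars[0][1]  # the same shared variable in every tower
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
    return averaged

opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)

# Break one large input batch into NUM_GPUS equal parts.
image_splits = tf.split(images, NUM_GPUS)
label_splits = tf.split(labels, NUM_GPUS)

tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device('/device:GPU:%d' % i), tf.name_scope('tower_%d' % i):
            loss = tower_loss(image_splits[i], label_splits[i])
            tf.get_variable_scope().reuse_variables()  # share weights across towers
            tower_grads.append(opt.compute_gradients(loss))

# Apply the averaged gradients once, on the shared variables.
train_op = opt.apply_gradients(average_gradients(tower_grads))
```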
In case you want to give Keras a try, it has now simplified multi-GPU training significantly with [`multi_gpu_model()`][6], which can do all of the heavy lifting for you.
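For illustration, here is a usage sketch closely following the Keras documentation linked below, assuming 8 GPUs and the stock Xception model as a stand-in for your own network:

```python
import tensorflow as tf
from keras.applications import Xception
from keras.utils import multi_gpu_model

# Instantiate the base model on the CPU so its weights live in host memory.
with tf.device('/cpu:0'):
    model = Xception(weights=None, input_shape=(299, 299, 3), classes=1000)

# Replicate the model onto 8 GPUs; each batch is split into 8 sub-batches,
# processed in parallel, and the results are merged back on the CPU.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Train as usual; a batch of 256 is split into 8 sub-batches of 32.
# parallel_model.fit(x, y, epochs=20, batch_size=256)
```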
[1]: https://www.tensorflow.org/programmers_guide/using_gpu
[2]: https://www.tensorflow.org/programmers_guide/using_gpu#using_multiple_gpus
[3]: https://www.researchgate.net/figure/TensorFlow-Inception-v3-Training-Scalable-Performance-on-multi-GPU-node_fig10_322673516
[4]: https://blog.rescale.com/deep-learning-with-multiple-gpus-on-rescale-tensorflow/
[5]: https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/inception/inception/inception_train.py
[6]: https://keras.io/utils/#multi_gpu_model