TensorFlow: optimizing the sparse_tensor_dense_matmul operation on GPU

Is it possible to run sparse_tensor_dense_matmul on the GPU in TensorFlow? I'm using TensorFlow 1.2.1 with CUDA 8. Error example:
```python
import tensorflow as tf

with tf.device('/gpu:0'):
    st = tf.SparseTensor(
        tf.constant([[0, 0], [1, 1]], dtype=tf.int64),
        tf.constant([1.2, 3.4], dtype=tf.float32),
        tf.constant([2, 2], dtype=tf.int64)
    )
    v = tf.Variable([[1.0, 0.0], [0.0, 1.0]], dtype=tf.float32)
    st = tf.sparse_tensor_dense_matmul(st, v)
    st = tf.reduce_min(st)
    optimizer = tf.train.AdamOptimizer()
    trainer = optimizer.minimize(st)

with tf.Session() as sess:
    print(sess.run(trainer))
```
This results in the following error:
```
Traceback (most recent call last):
  File "test_tf3.py", line 18, in <module>
    print(sess.run(trainer))
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to operation 'gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
	 [[Node: gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1 = StridedSlice[Index=DT_INT32, T=DT_INT64, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2, _device="/device:GPU:0"](Const, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack_1, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack_2)]]
```
It may make sense to disable hard device placement:
```python
import tensorflow as tf

with tf.device('/gpu:0'):
    st = tf.SparseTensor(
        tf.constant([[0, 0], [1, 1]], dtype=tf.int64),
        tf.constant([1.2, 3.4], dtype=tf.float32),
        tf.constant([2, 2], dtype=tf.int64)
    )
    v = tf.Variable([[1.0, 0.0], [0.0, 1.0]], dtype=tf.float32)
    st = tf.sparse_tensor_dense_matmul(st, v)
    st = tf.reduce_min(st)
    optimizer = tf.train.AdamOptimizer()
    trainer = optimizer.minimize(st)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(trainer))
```
You can also log device placements, which may be useful for figuring out whether the kernels you care about actually run on the GPU.
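As a sketch of the configuration (TF 1.x API, matching the version in the question), placement logging is just another `ConfigProto` flag alongside soft placement:

```python
import tensorflow as tf

# allow_soft_placement lets TF fall back to the CPU for ops that have no
# GPU kernel; log_device_placement prints the device assigned to each op
# when the session is created, so you can verify where the matmul landed.
config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
sess = tf.Session(config=config)
```

The placement log goes to stderr, one line per op, in the form `op_name: (OpType): /device:...`.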
There are host-memory ("fake GPU") kernels registered for the int32 StridedSlice, but not for int64. Opening a pull request / feature request on GitHub to add int64 host-memory kernels (effectively copying the int32 versions) would be the way to go if you need/want hard device placement.
For background, the strided slice is being used in the gradient of SparseTensorDenseMatMul. There's no benefit to running these kinds of indexing operations on the GPU, so the registered "GPU" kernels actually run on the CPU, precisely to avoid the kind of hard-device-placement bookkeeping issue you've run into.
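For reference, a minimal NumPy sketch (not TensorFlow) of what the example's forward pass computes: `sparse_tensor_dense_matmul(st, v)` is mathematically an ordinary matrix product that skips the zero entries, and `reduce_min` then takes the smallest element of the result.

```python
import numpy as np

# Dense equivalent of the SparseTensor in the example:
# indices [[0, 0], [1, 1]], values [1.2, 3.4], dense shape (2, 2).
indices = np.array([[0, 0], [1, 1]])
values = np.array([1.2, 3.4], dtype=np.float32)
sparse_as_dense = np.zeros((2, 2), dtype=np.float32)
sparse_as_dense[indices[:, 0], indices[:, 1]] = values

v = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)

# v is the identity, so the product is just the densified sparse matrix,
# and the minimum is one of the implicit zeros.
product = sparse_as_dense @ v
print(product.min())  # 0.0
```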