TensorFlow: optimizing the sparse_tensor_dense_matmul operation on GPU -
Is it possible to run the sparse_tensor_dense_matmul operation in TensorFlow on the GPU? I am using TensorFlow 1.2.1 with CUDA 8. Error example:
```python
import tensorflow as tf

with tf.device('/gpu:0'):
    st = tf.SparseTensor(
        tf.constant([[0, 0], [1, 1]], dtype=tf.int64),
        tf.constant([1.2, 3.4], dtype=tf.float32),
        tf.constant([2, 2], dtype=tf.int64)
    )
    v = tf.Variable([[1.0, 0.0], [0.0, 1.0]], dtype=tf.float32)
    st = tf.sparse_tensor_dense_matmul(st, v)
    st = tf.reduce_min(st)
    optimizer = tf.train.AdamOptimizer()
    trainer = optimizer.minimize(st)

with tf.Session() as sess:
    print(sess.run(trainer))
```

This results in the following error:
```
Traceback (most recent call last):
  File "test_tf3.py", line 18, in <module>
    print(sess.run(trainer))
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1 = StridedSlice[Index=DT_INT32, T=DT_INT64, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2, _device="/device:GPU:0"](Const, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack_1, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack_2)]]
```
It may make sense to disable hard device placement:
```python
import tensorflow as tf

with tf.device('/gpu:0'):
    st = tf.SparseTensor(
        tf.constant([[0, 0], [1, 1]], dtype=tf.int64),
        tf.constant([1.2, 3.4], dtype=tf.float32),
        tf.constant([2, 2], dtype=tf.int64)
    )
    v = tf.Variable([[1.0, 0.0], [0.0, 1.0]], dtype=tf.float32)
    st = tf.sparse_tensor_dense_matmul(st, v)
    st = tf.reduce_min(st)
    optimizer = tf.train.AdamOptimizer()
    trainer = optimizer.minimize(st)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(trainer))
```

You can also log device placements, which may be useful for figuring out which kernels actually ran on the GPU.
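Device placement logging is enabled through the same `ConfigProto` used for soft placement. A minimal configuration sketch for the TF 1.x API (the field names `allow_soft_placement` and `log_device_placement` are real `ConfigProto` options; the session body here is just a placeholder):

```python
import tensorflow as tf

# Soft placement lets TensorFlow fall back to the CPU for ops that have
# no GPU kernel instead of raising InvalidArgumentError; device
# placement logging prints each op's final device assignment to stderr.
config = tf.ConfigProto(
    allow_soft_placement=True,   # fall back rather than fail
    log_device_placement=True,   # log "op_name: (OpType)/device:..." lines
)

with tf.Session(config=config) as sess:
    pass  # build and run your graph here
```

With logging on, you can grep the output for `strided_slice` to see which device the gradient's indexing ops were actually assigned to.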
There are host-memory "fake" GPU kernels registered for int32 strided slice, but not for int64. You could open a pull request / feature request on GitHub to add int64 host-memory kernels (effectively copying the int32 versions) if you need/want hard device placement.
For background, the strided slice is being used in the gradient of SparseTensorDenseMatMul. There's no benefit to running these kinds of indexing operations on the GPU, so the registered GPU kernels actually run on the CPU (in host memory) in order to avoid exactly the kind of hard-device-placement bookkeeping issue you've run into.
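If you would rather keep hard device placement for the rest of the graph, another option is to pin just the sparse portion to the CPU explicitly. A sketch under TF 1.x assumptions (`colocate_gradients_with_ops` is a real parameter of `Optimizer.minimize` in 1.x; whether this placement split is worthwhile depends on your graph):

```python
import tensorflow as tf

# Keep the dense variable on the GPU.
with tf.device('/gpu:0'):
    v = tf.Variable([[1.0, 0.0], [0.0, 1.0]], dtype=tf.float32)

# Pin the sparse matmul to the CPU so that its gradient ops (including
# the int64 StridedSlice with no GPU kernel) can be colocated there.
with tf.device('/cpu:0'):
    st = tf.SparseTensor(
        tf.constant([[0, 0], [1, 1]], dtype=tf.int64),
        tf.constant([1.2, 3.4], dtype=tf.float32),
        tf.constant([2, 2], dtype=tf.int64))
    loss = tf.reduce_min(tf.sparse_tensor_dense_matmul(st, v))

# Colocating gradients with their forward ops keeps the problematic
# gradient slice on the CPU alongside the sparse matmul.
trainer = tf.train.AdamOptimizer().minimize(
    loss, colocate_gradients_with_ops=True)
```

This trades a host/device copy of `v` for the ability to keep strict placement elsewhere in the graph.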