TensorFlow - tf.parse_example for examples with sequence-of-sequence data

My TensorFlow model takes in sequence-of-sequence data for each example, namely sequences of character tokens within a sequence of words (e.g., [[3], [4,3], [6,1,20]]). Before, I was able to handle this by padding into a 3D numpy array of shape [batch_size, max_words_len, max_chars_len] and feeding it to a placeholder:

    in_question_chars = tf.placeholder(tf.int32,
                                       [None, None, None],
                                       name="in_question_chars")
    # example of other data
    in_question_words = tf.placeholder(tf.int32,
                                       [None, None],
                                       name="in_question_words")
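For reference, here is a minimal pure-Python sketch of that padding step. The helper name `pad_char_batch` and the use of 0 as the pad id are assumptions for illustration, not part of the original code; the result could be wrapped in `np.array(..., dtype=np.int32)` and fed to `in_question_chars`.

```python
def pad_char_batch(batch, pad_id=0):
    """Pad a batch of sentences (each a list of words, each word a list of
    char ids) into a nested list of shape
    [batch_size, max_words_len, max_chars_len]."""
    max_words_len = max((len(sentence) for sentence in batch), default=0)
    max_chars_len = max(
        (len(word) for sentence in batch for word in sentence), default=0)
    padded = []
    for sentence in batch:
        # Right-pad every word to max_chars_len.
        rows = [list(word) + [pad_id] * (max_chars_len - len(word))
                for word in sentence]
        # Add all-padding "words" so every sentence has max_words_len rows.
        rows += [[pad_id] * max_chars_len
                 for _ in range(max_words_len - len(sentence))]
        padded.append(rows)
    return padded
```

Using 0 as the pad id assumes id 0 is reserved for padding in the char vocabulary.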

But now I want to use Google Cloud Machine Learning Engine for online prediction/deployment. Based on the TensorFlow Serving example: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_saved_model.py

I came up with the following, but I don't know which feature to use to parse the sequence of sequences of char tokens:

    serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
    # tf.parse_example only parses int64, float32 and string, so use tf.int64
    feature_configs = {
        'in_question_chars': tf.FixedLenSequenceFeature(shape=[None],
                                                        allow_missing=True,
                                                        dtype=tf.int64,
                                                        default_value=0),
        'in_question_words': tf.FixedLenSequenceFeature(shape=[],
                                                        allow_missing=True,
                                                        dtype=tf.int64,
                                                        default_value=0),
    }
    tf_example = tf.parse_example(serialized_tf_example, feature_configs)
    in_question_chars = tf.identity(tf_example['in_question_chars'],
                                    name='in_question_chars')
    # example of other data
    in_question_words = tf.identity(tf_example['in_question_words'],
                                    name='in_question_words')

Should I use VarLenFeature, which turns it into a SparseTensor (even though it's not really sparse), and then use tf.sparse_tensor_to_dense to convert it to dense?
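As far as I can tell, that combination works for one level of nesting: with a VarLenFeature, tf.parse_example returns a SparseTensor of shape [batch_size, longest list in the batch], and tf.sparse_tensor_to_dense fills the gaps with default_value (0 unless you pass something else). The catch is that a tf.train.Example feature is a flat list, so a VarLenFeature of char ids alone cannot remember where one word ends and the next begins. The hypothetical pure-Python stand-in below (not TF's implementation) mimics the densify step:

```python
def sparse_to_dense(indices, values, dense_shape, default_value=0):
    """Scatter `values` at `indices` into a nested list of shape
    `dense_shape`, filling every other position with `default_value`
    (the same result tf.sparse_tensor_to_dense produces)."""
    def build(shape):
        if len(shape) == 1:
            return [default_value] * shape[0]
        return [build(shape[1:]) for _ in range(shape[0])]

    dense = build(list(dense_shape))
    for index, value in zip(indices, values):
        target = dense
        for i in index[:-1]:  # walk down to the innermost list
            target = target[i]
        target[index[-1]] = value
    return dense
```

For example, two examples with word-id lists [4, 8, 15] and [16, 23] parse to a SparseTensor with dense_shape [2, 3], and densifying gives [[4, 8, 15], [16, 23, 0]].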

For the next step, I embed each char token:

    in_question_char_repres = tf.nn.embedding_lookup(char_embedding,
                                                     in_question_chars)
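With dense, zero-padded ids this works, but the lookup appends the embedding dimension as a new last axis and also embeds the padding id. The small pure-Python analogue below (a hypothetical helper with a toy 2-dimensional table, not the TF implementation) shows the shape going from [batch, words, chars] to [batch, words, chars, dim]:

```python
def embedding_lookup(table, ids):
    """Replace every integer id in an arbitrarily nested list with a copy of
    the corresponding row of `table`, adding one trailing axis."""
    if isinstance(ids, int):
        return list(table[ids])
    return [embedding_lookup(table, x) for x in ids]
```

With table = [[0, 0], [1, 2], [3, 4]] (row 0 playing the padding row), the ids [[[1, 2], [2, 0]]] map to [[[[1, 2], [3, 4]], [[3, 4], [0, 0]]]], so the padded position still gets a vector and needs masking downstream.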

Or is the better option to keep it as a SparseTensor and use tf.nn.embedding_lookup_sparse?
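One thing worth checking before going that route: embedding_lookup_sparse does not return one vector per character. Per its documentation, sp_ids is an N x M SparseTensor, and the embeddings of all ids in the same row are combined into a single vector ("mean" is the default combiner). On a [words, chars] SparseTensor that yields one averaged vector per word. A pure-Python sketch of that combining behavior, assuming sp_weights=None (every id weighted 1) and using a hypothetical helper name:

```python
def embedding_lookup_sparse_mean(table, indices, values):
    """For a 2-D sparse id tensor given as (indices, values), look up every
    id and average the vectors that share the same row (first index).
    Returns {row: combined_vector}."""
    rows = {}
    for (row, _col), char_id in zip(indices, values):
        rows.setdefault(row, []).append(table[char_id])
    combined = {}
    for row, vectors in rows.items():
        dim = len(vectors[0])
        combined[row] = [sum(v[d] for v in vectors) / len(vectors)
                         for d in range(dim)]
    return combined
```

So if per-character embeddings are needed (e.g. for a char CNN/RNN), densifying and using tf.nn.embedding_lookup seems the more direct fit.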

I wasn't able to find an example of how this should be done. Please let me know the best practice. Thanks!

Edit 8/25/17

It doesn't seem to allow me to set None for the 2nd dimension.

Here's an abridged version of the code:

    def read_dataset(filename, mode=tf.contrib.learn.ModeKeys.TRAIN):

        def _input_fn():
            num_epochs = max_epochs if mode == tf.contrib.learn.ModeKeys.TRAIN else 1

            input_file_names = tf.train.match_filenames_once(str(filename))

            filename_queue = tf.train.string_input_producer(
                input_file_names, num_epochs=num_epochs, shuffle=True)
            reader = tf.TFRecordReader()
            _, serialized = reader.read_up_to(filename_queue, num_records=batch_size)

            features_spec = {
                correct_child_node_idx: tf.FixedLenFeature(shape=[],
                                                           dtype=tf.int64,
                                                           default_value=0),
                question_lengths: tf.FixedLenFeature(shape=[], dtype=tf.int64),
                in_question_words: tf.FixedLenSequenceFeature(shape=[],
                                                              allow_missing=True,
                                                              dtype=tf.int64),
                question_char_lengths: tf.FixedLenSequenceFeature(shape=[],
                                                                  allow_missing=True,
                                                                  dtype=tf.int64),
                in_question_chars: tf.FixedLenSequenceFeature(shape=[None],
                                                              allow_missing=True,
                                                              dtype=tf.int64),
            }
            examples = tf.parse_example(serialized, features=features_spec)

            label = examples[correct_child_node_idx]
            return examples, label   # dict of features, label

        return _input_fn

When I use a None shape, it gives me this error:

    INFO:tensorflow:Using default config.
    INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f57fc309c18>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options {
      per_process_gpu_memory_fraction: 1.0
    }
    , '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'outputdir'}
    WARNING:tensorflow:From /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/monitors.py:269: BaseMonitor.__init__ (from tensorflow.contrib.learn.python.learn.monitors) is deprecated and will be removed after 2016-12-05.
    Instructions for updating:
    Monitors are deprecated. Please use tf.train.SessionRunHook.
    ---------------------------------------------------------------------------
    InvalidArgumentError                      Traceback (most recent call last)
    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, require_shape_fn)
    --> 654           input_tensors_as_shapes, status)

    /home/jupyter-admin/anaconda3/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
    ---> 89                 next(self.gen)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    --> 466           pywrap_tensorflow.TF_GetCode(status))

    InvalidArgumentError: dense_shapes[2] has unknown rank or unknown inner dimensions: [?,?] for 'ParseExample/ParseExample' (op: 'ParseExample') with input shapes: [?], [0], [], [], [], [], [], [], [], [], [], [0], [1], [], [], [0], [], [0], [0], [0].

    During handling of the above exception, another exception occurred:

    ValueError                                Traceback (most recent call last)
    <ipython-input-45-392858a0e7b4> in <module>()
         49 shutil.rmtree('outputdir', ignore_errors=True) # start fresh each time
    ---> 50 learn_runner.run(experiment_fn, 'outputdir')

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py in run(experiment_fn, output_dir, schedule, run_config, hparams)
    --> 209   return _execute_schedule(experiment, schedule)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py in _execute_schedule(experiment, schedule)
    ---> 46   return task()

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in train_and_evaluate(self)
    --> 502       self.train(delay_secs=0)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in train(self, delay_secs)
    --> 280                             hooks=self._train_monitors + extra_hooks)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in _call_train(self, _sentinel, input_fn, steps, hooks, max_steps)
    --> 677                                  monitors=hooks)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    --> 296       return func(*args, **kwargs)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py in fit(self, x, y, input_fn, steps, batch_size, monitors, max_steps)
    --> 458     loss = self._train_model(input_fn=input_fn, hooks=hooks)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py in _train_model(self, input_fn, hooks)
    --> 956       features, labels = input_fn()

    <ipython-input-44-fdb63ed72b90> in _input_fn()
    ---> 37         examples = tf.parse_example(serialized, features=features_spec)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/parsing_ops.py in parse_example(serialized, features, name, example_names)
    --> 575       dense_types, dense_defaults, dense_shapes, name)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/parsing_ops.py in _parse_example_raw(serialized, names, sparse_keys, sparse_types, dense_keys, dense_types, dense_defaults, dense_shapes, name)
    --> 700         name=name)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_parsing_ops.py in _parse_example(serialized, names, sparse_keys, dense_keys, dense_defaults, sparse_types, dense_shapes, name)
    --> 176                                 dense_shapes=dense_shapes, name=name)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py in apply_op(self, op_type_name, name, **keywords)
    --> 767                          op_def=op_def)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in create_op(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_shapes, compute_device)
    -> 2632       set_shapes_for_outputs(ret)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in set_shapes_for_outputs(op)
    -> 1911   shapes = shape_func(op)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in call_with_requiring(op)
    -> 1861     return call_cpp_shape_fn(op, require_shape_fn=True)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in call_cpp_shape_fn(op, require_shape_fn)
    --> 595                                   require_shape_fn)

    /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, require_shape_fn)
    --> 659       raise ValueError(err.message)

    ValueError: dense_shapes[2] has unknown rank or unknown inner dimensions: [?,?] for 'ParseExample/ParseExample' (op: 'ParseExample') with input shapes: [?], [0], [], [], [], [], [], [], [], [], [], [0], [1], [], [], [0], [], [0], [0], [0].

Currently, I'm getting around this by turning the 2D sequence of sequences into a 1D sequence: I set the second dimension to max_char_length and concatenate everything into a 1D array, keeping only the first max_char_length chars of a word if it's longer than max_char_length, or padding it with zeros if it's shorter. This seems to work, but perhaps there's a way to accept variable-length sequences in the second dimension and do the padding in tf.parse_example or tf.train.batch.
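A minimal sketch of that workaround in pure Python, assuming 0 is the pad id (both helper names are hypothetical). The encoded list is what would go into the example's int64_list. On the parsing side, my understanding is that tf.FixedLenSequenceFeature(shape=[max_char_length], dtype=tf.int64, allow_missing=True) reads it back as [num_words, max_char_length] per example, which is the split `decode_words` mimics:

```python
def encode_words(words, max_char_length, pad_id=0):
    """Flatten a sentence (list of words, each a list of char ids) into one
    1-D list where every word occupies exactly max_char_length slots:
    longer words are truncated, shorter ones are right-padded with pad_id."""
    flat = []
    for word in words:
        clipped = list(word[:max_char_length])
        flat.extend(clipped + [pad_id] * (max_char_length - len(clipped)))
    return flat


def decode_words(flat, max_char_length):
    """Split the flat list back into rows of max_char_length, i.e. the
    [-1, max_char_length] layout that a fixed inner dimension implies."""
    if len(flat) % max_char_length:
        raise ValueError("length is not a multiple of max_char_length")
    return [flat[i:i + max_char_length]
            for i in range(0, len(flat), max_char_length)]
```

Storing the true word lengths alongside (as question_char_lengths does in the code above) keeps the truncation and padding recoverable for masking.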

Edit: fixed a confusing/wrong answer =)

So you want a tf.SequenceExample, parsed with tf.parse_single_sequence_example rather than tf.parse_example. This lets each feature in the feature_list within the example be one step of the sequence; in your case, each feature can be a VarLenFeature whose length is the number of characters in that word. Unfortunately, this doesn't work directly when you want to pass multiple sentences, so I had to do some hacking around with higher-order functions and tf.sparse_concat:
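To make the layout concrete: each sentence becomes one SequenceExample whose feature_list holds one Feature (an int64_list of char ids) per word. With a VarLenFeature in sequence_features, the sequence output of tf.parse_single_sequence_example for that key should be a 2-D SparseTensor indexed [word, letter]. The pure-Python sketch below (a hypothetical helper, not TF code) lists the components I would expect for one sentence:

```python
def sentence_to_sparse(words):
    """Return (indices, values, dense_shape) for one sentence, where each
    word is a list of char ids: indices are [word, letter] pairs and
    dense_shape is [num_words, longest word]."""
    indices, values = [], []
    for w, word in enumerate(words):
        for c, char_id in enumerate(word):
            indices.append([w, c])
            values.append(char_id)
    max_chars = max((len(word) for word in words), default=0)
    return indices, values, [len(words), max_chars]
```

For the first sentence below, [[5, 10], [5, 10, 20]], this gives five index pairs, values [5, 10, 5, 10, 20] and dense_shape [2, 3].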

I put together a test program here: https://gist.github.com/elibixby/1c7a2497f96a457130241c59c676ebd4

The input (before serialization into a batch of SequenceExamples) looks like:

    [[[5, 10], [5, 10, 20]],
     [[0, 1, 2], [2, 1, 0], [0, 1, 2, 3]]]

The resulting SparseTensor looks like:

    SparseTensorValue(indices=array([[[0, 0, 0],
            [0, 0, 1],
            [0, 1, 0],
            [0, 1, 1],
            [0, 1, 2],
            [1, 0, 0],
            [1, 0, 1],
            [1, 0, 2],
            [1, 1, 0],
            [1, 1, 1],
            [1, 1, 2],
            [1, 2, 0],
            [1, 2, 1],
            [1, 2, 2],
            [1, 2, 3]]]),
        values=array([[ 5, 10,  5, 10, 20,  0,  1,  2,  2,  1,  0,  0,  1,  2,  3]]),
        dense_shape=array([[2, 3, 4]]))

which appears to be a SparseTensor with index = [sentence, word, letter].
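The batched components can be reproduced without TensorFlow in a short pure-Python sketch. The helper is hypothetical; it ignores the extra outer bracket in the printed arrays above and computes dense_shape as [sentences, longest sentence, longest word]. In TF 1.x, one way to reach the same layout is to give each per-sentence SparseTensor a leading axis of size 1 (e.g. with tf.sparse_reshape) and then call tf.sparse_concat(axis=0, sp_inputs=..., expand_nonconcat_dim=True); I haven't checked that this matches the gist exactly.

```python
def batch_to_sparse(batch):
    """Return (indices, values, dense_shape) for a batch of sentences, with
    indices ordered [sentence, word, letter]."""
    indices, values = [], []
    max_words = max_chars = 0
    for s, sentence in enumerate(batch):
        max_words = max(max_words, len(sentence))
        for w, word in enumerate(sentence):
            max_chars = max(max_chars, len(word))
            for c, char_id in enumerate(word):
                indices.append([s, w, c])
                values.append(char_id)
    return indices, values, [len(batch), max_words, max_chars]
```

Run on the input above, it yields the same 15 index triples, the values [5, 10, 5, 10, 20, 0, 1, 2, 2, 1, 0, 0, 1, 2, 3] and dense_shape [2, 3, 4].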
