TensorFlow - tf.parse_example for examples with sequence of sequence data
My TensorFlow model takes sequence-of-sequence data for each example, namely sequences of character tokens within a sequence of words (e.g., [[3], [4, 3], [6, 1, 20]]). Previously I was able to do this by padding the data into a 3D numpy array of shape [batch_size, max_words_len, max_chars_len] and feeding it into a placeholder.
in_question_chars = tf.placeholder(tf.int32, [None, None, None], name="in_question_chars")
# example of other data
in_question_words = tf.placeholder(tf.int32, [None, None], name="in_question_words")
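For reference, the padding step described above could look roughly like the following. This is a minimal pure-Python sketch, not the asker's actual code; the helper name pad_batch and the pad value 0 are assumptions made for illustration.

```python
def pad_batch(batch, pad_value=0):
    """Pad a ragged [sentence][word][char] batch to a dense 3D nested list.

    The result has shape [batch_size, max_words_len, max_chars_len].
    pad_batch and pad_value are hypothetical names, not from the question.
    """
    max_words = max((len(sentence) for sentence in batch), default=0)
    max_chars = max(
        (len(word) for sentence in batch for word in sentence), default=0
    )
    padded = []
    for sentence in batch:
        # Right-pad every word to max_chars character tokens.
        rows = [list(word) + [pad_value] * (max_chars - len(word))
                for word in sentence]
        # Add all-padding rows for sentences with fewer than max_words words.
        rows += [[pad_value] * max_chars
                 for _ in range(max_words - len(sentence))]
        padded.append(rows)
    return padded


batch = [[[3], [4, 3], [6, 1, 20]], [[7, 7]]]
padded = pad_batch(batch)
```

The nested list can then be converted with numpy.array(padded) and fed to a placeholder of shape [None, None, None].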
But now I want to use Google Cloud Machine Learning Engine for online prediction/deployment. Based on this example from TensorFlow Serving: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_saved_model.py
I came up with the following, but I don't know which feature type to use to parse the sequence-of-sequence char tokens:
serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
feature_configs = {
    'in_question_chars': tf.FixedLenSequenceFeature(shape=[None], allow_missing=True, dtype=tf.int32, default_value=0),
    'in_question_words': tf.FixedLenSequenceFeature(shape=[], allow_missing=True, dtype=tf.int32, default_value=0),
}
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
in_question_chars = tf.identity(tf_example['in_question_chars'], name='in_question_chars')
# example of other data
in_question_words = tf.identity(tf_example['in_question_words'], name='in_question_words')
Should I use VarLenFeature, which turns the data into a SparseTensor (even though it's not really sparse), and then use tf.sparse_tensor_to_dense to convert it back to dense?
For the next step, I'm embedding each char token:
in_question_char_repres = tf.nn.embedding_lookup(char_embedding, in_question_chars)
So another option would be to keep it as a SparseTensor and use tf.nn.embedding_lookup_sparse.
I wasn't able to find an example of how this should be done. Please let me know the best practice. Thanks!
Edit 8/25/17
It doesn't seem to allow me to set None for the 2nd dimension.
Here's an abridged version of my code:
def read_dataset(filename, mode=tf.contrib.learn.ModeKeys.TRAIN):
    def _input_fn():
        num_epochs = max_epochs if mode == tf.contrib.learn.ModeKeys.TRAIN else 1
        input_file_names = tf.train.match_filenames_once(str(filename))
        filename_queue = tf.train.string_input_producer(
            input_file_names, num_epochs=num_epochs, shuffle=True)
        reader = tf.TFRecordReader()
        _, serialized = reader.read_up_to(filename_queue, num_records=batch_size)
        features_spec = {
            correct_child_node_idx: tf.FixedLenFeature(shape=[], dtype=tf.int64, default_value=0),
            question_lengths: tf.FixedLenFeature(shape=[], dtype=tf.int64),
            in_question_words: tf.FixedLenSequenceFeature(shape=[], allow_missing=True, dtype=tf.int64),
            question_char_lengths: tf.FixedLenSequenceFeature(shape=[], allow_missing=True, dtype=tf.int64),
            in_question_chars: tf.FixedLenSequenceFeature(shape=[None], allow_missing=True, dtype=tf.int64),
        }
        examples = tf.parse_example(serialized, features=features_spec)
        label = examples[correct_child_node_idx]
        return examples, label  # dict of features, label
    return _input_fn
When I have None in the shape, it gives me this error:
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f57fc309c18>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options { per_process_gpu_memory_fraction: 1.0 } , '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'outputdir'}
WARNING:tensorflow:From /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/monitors.py:269: BaseMonitor.__init__ (from tensorflow.contrib.learn.python.learn.monitors) is deprecated and will be removed after 2016-12-05.
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, require_shape_fn)
    653 graph_def_version, node_def_str, input_shapes, input_tensors,
--> 654 input_tensors_as_shapes, status)
    655 except errors.InvalidArgumentError as err:

/home/jupyter-admin/anaconda3/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     88 try:
---> 89 next(self.gen)
     90 except StopIteration:

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
    467 finally:

InvalidArgumentError: dense_shapes[2] has unknown rank or unknown inner dimensions: [?,?] for 'ParseExample/ParseExample' (op: 'ParseExample') with input shapes: [?], [0], [], [], [], [], [], [], [], [], [], [0], [1], [], [], [0], [], [0], [0], [0].

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-45-392858a0e7b4> in <module>()
     48
     49 shutil.rmtree('outputdir', ignore_errors=True) # start fresh each time
---> 50 learn_runner.run(experiment_fn, 'outputdir')

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py in run(experiment_fn, output_dir, schedule, run_config, hparams)
    207 schedule = schedule or _get_default_schedule(run_config)
    208
--> 209 return _execute_schedule(experiment, schedule)
    210
    211

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py in _execute_schedule(experiment, schedule)
     44 logging.error('Allowed values for this experiment are: %s', valid_tasks)
     45 raise TypeError('Schedule references non-callable member %s' % schedule)
---> 46 return task()
     47
     48

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in train_and_evaluate(self)
    500 name=eval_dir_suffix, hooks=self._eval_hooks
    501 )]
--> 502 self.train(delay_secs=0)
    503
    504 eval_result = self._call_evaluate(input_fn=self._eval_input_fn,

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in train(self, delay_secs)
    278 return self._call_train(input_fn=self._train_input_fn,
    279 max_steps=self._train_steps,
--> 280 hooks=self._train_monitors + extra_hooks)
    281
    282 def evaluate(self, delay_secs=None, name=None):

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in _call_train(self, _sentinel, input_fn, steps, hooks, max_steps)
    675 steps=steps,
    676 max_steps=max_steps,
--> 677 monitors=hooks)
    678
    679 def _call_evaluate(self, _sentinel=None, # pylint: disable=invalid-name,

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    294 'in a future version' if date is None else ('after %s' % date),
    295 instructions)
--> 296 return func(*args, **kwargs)
    297 return tf_decorator.make_decorator(func, new_func, 'deprecated',
    298 _add_deprecated_arg_notice_to_docstring(

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py in fit(self, x, y, input_fn, steps, batch_size, monitors, max_steps)
    456 hooks.append(basic_session_run_hooks.StopAtStepHook(steps, max_steps))
    457
--> 458 loss = self._train_model(input_fn=input_fn, hooks=hooks)
    459 logging.info('Loss for final step: %s.', loss)
    460 return self

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py in _train_model(self, input_fn, hooks)
    954 random_seed.set_random_seed(self._config.tf_random_seed)
    955 global_step = contrib_framework.create_global_step(g)
--> 956 features, labels = input_fn()
    957 self._check_inputs(features, labels)
    958 model_fn_ops = self._get_train_ops(features, labels)

<ipython-input-44-fdb63ed72b90> in _input_fn()
     35 )
     36 }
---> 37 examples = tf.parse_example(serialized, features=features_spec)
     38
     39 label = examples[correct_child_node_idx]

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/parsing_ops.py in parse_example(serialized, features, name, example_names)
    573 outputs = _parse_example_raw(
    574 serialized, example_names, sparse_keys, sparse_types, dense_keys,
--> 575 dense_types, dense_defaults, dense_shapes, name)
    576 return _construct_sparse_tensors_for_sparse_features(features, outputs)
    577

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/parsing_ops.py in _parse_example_raw(serialized, names, sparse_keys, sparse_types, dense_keys, dense_types, dense_defaults, dense_shapes, name)
    698 dense_keys=dense_keys,
    699 dense_shapes=dense_shapes,
--> 700 name=name)
    701 # pylint: enable=protected-access
    702

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_parsing_ops.py in _parse_example(serialized, names, sparse_keys, dense_keys, dense_defaults, sparse_types, dense_shapes, name)
    174 dense_defaults=dense_defaults,
    175 sparse_types=sparse_types,
--> 176 dense_shapes=dense_shapes, name=name)
    177 return _ParseExampleOutput._make(result)
    178

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py in apply_op(self, op_type_name, name, **keywords)
    765 op = g.create_op(op_type_name, inputs, output_types, name=scope,
    766 input_types=input_types, attrs=attr_protos,
--> 767 op_def=op_def)
    768 if output_structure:
    769 outputs = op.outputs

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in create_op(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_shapes, compute_device)
   2630 original_op=self._default_original_op, op_def=op_def)
   2631 if compute_shapes:
-> 2632 set_shapes_for_outputs(ret)
   2633 self._add_op(ret)
   2634 self._record_op_seen_by_control_dependencies(ret)

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in set_shapes_for_outputs(op)
   1909 shape_func = _call_cpp_shape_fn_and_require_op
   1910
-> 1911 shapes = shape_func(op)
   1912 if shapes is None:
   1913 raise RuntimeError(

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in call_with_requiring(op)
   1859
   1860 def call_with_requiring(op):
-> 1861 return call_cpp_shape_fn(op, require_shape_fn=True)
   1862
   1863 _call_cpp_shape_fn_and_require_op = call_with_requiring

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in call_cpp_shape_fn(op, require_shape_fn)
    593 res = _call_cpp_shape_fn_impl(op, input_tensors_needed,
    594 input_tensors_as_shapes_needed,
--> 595 require_shape_fn)
    596 if not isinstance(res, dict):
    597 # Handles the case where _call_cpp_shape_fn_impl calls unknown_shape(op).

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, require_shape_fn)
    657 missing_shape_fn = True
    658 else:
--> 659 raise ValueError(err.message)
    660
    661 if missing_shape_fn:

ValueError: dense_shapes[2] has unknown rank or unknown inner dimensions: [?,?] for 'ParseExample/ParseExample' (op: 'ParseExample') with input shapes: [?], [0], [], [], [], [], [], [], [], [], [], [0], [1], [], [], [0], [], [0], [0], [0].
Currently, I'm getting around this by turning the 2D sequence of sequences into a 1D sequence: I fix the second dimension at max_char_length and concatenate the words into a 1D array. I keep the first max_char_length chars of a word if it's longer than max_char_length, or pad with zeros if it's shorter. This seems to work, but perhaps there's a way to accept a variable-length sequence in the second dimension and do the padding in tf.parse_example or tf.train.batch.
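The workaround described above can be sketched in plain Python as follows. The function names flatten_words and unflatten_words are hypothetical, and the sketch only illustrates the truncate-or-pad logic, not the asker's actual preprocessing code.

```python
def flatten_words(words, max_char_length, pad_value=0):
    """Concatenate words into one 1D list, each word clipped or zero-padded
    to exactly max_char_length character tokens."""
    flat = []
    for word in words:
        clipped = list(word[:max_char_length])  # keep the first chars only
        flat.extend(clipped + [pad_value] * (max_char_length - len(clipped)))
    return flat


def unflatten_words(flat, max_char_length):
    """Recover the [num_words, max_char_length] layout from the 1D list."""
    return [flat[i:i + max_char_length]
            for i in range(0, len(flat), max_char_length)]


words = [[3], [4, 3], [6, 1, 20, 9]]
flat = flatten_words(words, max_char_length=3)
restored = unflatten_words(flat, max_char_length=3)
```

Because every word occupies exactly max_char_length slots, the flat sequence can be stored as an ordinary 1D variable-length feature and reshaped to [num_words, max_char_length] after parsing. The cost is that characters beyond max_char_length are lost.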
Edit: fixed confusing/wrong answer =)
So what you want is a SequenceExample (tf.train.SequenceExample), which uses tf.parse_single_sequence_example rather than tf.parse_example. This allows each feature in the feature_list within the example to be one step of a sequence; in this case each feature can be a VarLenFeature holding the variable number of characters in a word. Unfortunately, this doesn't work when you want to pass multiple sentences. You have to do some hacking around with higher-order functions and tf.sparse_concat:
I produced a test program here: https://gist.github.com/elibixby/1c7a2497f96a457130241c59c676ebd4
The input (before serialization, a batch of SequenceExamples) looks like:
[[[5, 10], [5, 10, 20]], [[0, 1, 2], [2, 1, 0], [0, 1, 2, 3]]]
The resulting SparseTensor looks like:
SparseTensorValue(indices=array([[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [0, 1, 2], [1, 0, 0], [1, 0, 1], [1, 0, 2], [1, 1, 0], [1, 1, 1], [1, 1, 2], [1, 2, 0], [1, 2, 1], [1, 2, 2], [1, 2, 3]]]), values=array([[ 5, 10, 5, 10, 20, 0, 1, 2, 2, 1, 0, 0, 1, 2, 3]]), dense_shape=array([[2, 3, 4]]))
This appears to be a SparseTensor whose index is [sentence, word, letter].
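To make the [sentence, word, letter] layout concrete, here is a small pure-Python sketch that derives the same indices, values and dense shape from the nested input, and then pads them back into a dense array. It does not use TensorFlow; to_sparse and to_dense are hypothetical helpers that only mimic what a SparseTensor stores and what a sparse-to-dense conversion produces.

```python
def to_sparse(batch):
    """Return (indices, values, dense_shape) for a ragged
    [sentence][word][letter] batch, in SparseTensor-like form."""
    indices, values = [], []
    for s, sentence in enumerate(batch):
        for w, word in enumerate(sentence):
            for c, token in enumerate(word):
                indices.append([s, w, c])
                values.append(token)
    dense_shape = [
        len(batch),
        max((len(sentence) for sentence in batch), default=0),
        max((len(word) for sentence in batch for word in sentence), default=0),
    ]
    return indices, values, dense_shape


def to_dense(indices, values, dense_shape, default_value=0):
    """Scatter the sparse entries into a zero-padded dense nested list."""
    n_sent, n_word, n_char = dense_shape
    dense = [[[default_value] * n_char for _ in range(n_word)]
             for _ in range(n_sent)]
    for (s, w, c), value in zip(indices, values):
        dense[s][w][c] = value
    return dense


batch = [[[5, 10], [5, 10, 20]], [[0, 1, 2], [2, 1, 0], [0, 1, 2, 3]]]
indices, values, dense_shape = to_sparse(batch)
dense = to_dense(indices, values, dense_shape)
```

For the input shown above this reproduces the 15 index triples, the 15 values and the dense shape [2, 3, 4] from the printed result, and the dense form is the zero-padded [batch, words, chars] array the question started from.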