Lecture notes from Week 2 of Intro to TensorFlow, the third course in the Coursera specialization “Machine Learning with TensorFlow on Google Cloud Platform”.



Estimator API


  • Estimators wrap up a large amount of boilerplate code, on top of the model itself.


  • From small to big to prod with the Estimator API
    • Quick model
    • Checkpointing
    • Out-of-memory datasets
    • Train / eval / monitor
    • Distributed training
    • Hyper-parameter tuning on ML-Engine
    • Production: serving predictions from a trained model
  • Pre-made estimators that can all be used in the same way.

tf.estimator.Estimator
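  • All of them share the same interface, so swapping models means changing only the class name. A minimal sketch (the numeric column is a stand-in used for illustration):
import tensorflow as tf

cols = [tf.feature_column.numeric_column("sq_footage")]

# Interchangeable classes: same constructor pattern and the same
# train() / evaluate() / predict() methods
model = tf.estimator.LinearRegressor(cols)
# model = tf.estimator.DNNRegressor(hidden_units=[32, 8], feature_columns=cols)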


Pre-made Estimators


  • Feature columns tell the model what inputs to expect
import tensorflow as tf

featcols = [
    tf.feature_column.numeric_column("sq_footage"),
    tf.feature_column.categorical_column_with_vocabulary_list("type",
                                                              ["house", "apt"])
]

model = tf.estimator.LinearRegressor(featcols)
  • Under the hood: feature columns take care of packing the inputs into the input vector of the model
    • tf.feature_column.bucketized_column
    • tf.feature_column.embedding_column
    • tf.feature_column.crossed_column
    • tf.feature_column.categorical_column_with_hash_bucket
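  • A hedged sketch of those four column types (zip_code is a hypothetical feature added only for illustration):
sq_footage = tf.feature_column.numeric_column("sq_footage")

# Discretize a numeric column into ranges
buckets = tf.feature_column.bucketized_column(
    sq_footage, boundaries=[1000, 2000, 3000])

# Hash a high-cardinality categorical feature into a fixed number of buckets
zip_code = tf.feature_column.categorical_column_with_hash_bucket(
    "zip_code", hash_bucket_size=1000)

# Cross two categorical columns, then learn a dense embedding of the cross
crossed = tf.feature_column.crossed_column([buckets, zip_code],
                                           hash_bucket_size=10000)
embedded = tf.feature_column.embedding_column(crossed, dimension=8)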
  • Training: feed in training input data and train for 100 steps (here each step sees all six examples, so 100 steps = 100 epochs)
def train_input_fn():
    features = {"sq_footage": [1000,    2000,    3000,    1000,  2000,  3000],
                "type":       ["house", "house", "house", "apt", "apt", "apt"]}
    labels   =                [500,     1000,    1500,    700,   1300,  1900]
    return features, labels

model.train(train_input_fn, steps=100)
  • Predictions: once trained, the model can be used for prediction
def predict_input_fn():
	features = {"sq_footage": [1500, 1800],
				"type":       ["house", "apt"]}
	return features

predictions = model.predict(predict_input_fn)
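  • predict() returns a Python generator, yielding one dictionary per input example (the 'predictions' key matches the LinearRegressor output shown further below):
for pred in predictions:
    print(pred['predictions'])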
  • To use a different pre-made estimator, just change the class name and supply appropriate parameters
model = tf.estimator.DNNRegressor(feature_columns=featcols, hidden_units=[3, 2])
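  • One caveat the slide leaves out: deep estimators such as DNNRegressor cannot consume sparse categorical columns directly, so the "type" column must first be wrapped in an indicator (one-hot) or embedding column. A minimal sketch:
type_col = tf.feature_column.categorical_column_with_vocabulary_list(
    "type", ["house", "apt"])

deep_featcols = [
    tf.feature_column.numeric_column("sq_footage"),
    tf.feature_column.indicator_column(type_col),  # one-hot encode for the DNN
]

model = tf.estimator.DNNRegressor(feature_columns=deep_featcols,
                                  hidden_units=[3, 2])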


Checkpointing


  • Model checkpoints
    1. Continue training
    2. Resume on failure
    3. Predict from trained model
  • Estimators automatically checkpoint training
model = tf.estimator.LinearRegressor(featcols, './model_trained') # Where to put the checkpoints
model.train(train_input_fn, steps=100)
%ls model_trained

checkpoint
graph.pbtxt
model.ckpt-1.data-00000-of-00001
model.ckpt-1.index
model.ckpt-1.meta
model.ckpt-100.data-00000-of-00001
model.ckpt-100.index
model.ckpt-100.meta
  • We can now restore and predict with the model
trained_model = tf.estimator.LinearRegressor(featcols, './model_trained')
predictions = trained_model.predict(predict_input_fn)
INFO:tensorflow:Restoring parameters from model_trained/model.ckpt-100

{'predictions': array([855.93], dtype=float32)}
{'predictions': array([859.07], dtype=float32)}
  • Training also resumes from the last checkpoint
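  • A practical consequence: to start training from scratch (e.g., after changing the model), delete the output directory first, otherwise the estimator resumes from the old checkpoint. A minimal sketch:
import shutil

shutil.rmtree('./model_trained', ignore_errors=True)  # discard old checkpoints
model = tf.estimator.LinearRegressor(featcols, './model_trained')
model.train(train_input_fn, steps=100)  # now trains from step 0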


Training on in-memory datasets


  • In-memory data: usually NumPy arrays or Pandas dataframes
    • tf.estimator.inputs.numpy_input_fn
    • tf.estimator.inputs.pandas_input_fn
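  • For example, a minimal numpy_input_fn sketch (the arrays are made up) that mirrors the pandas version below:
import numpy as np

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"sq_footage": np.array([1000, 2000, 3000])},
    y=np.array([500, 1000, 1500]),
    batch_size=128,
    num_epochs=10,
    shuffle=True)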
  • Training happens until input is exhausted or number of steps is reached
def pandas_train_input_fn(df): # a Pandas dataframe
	return tf.estimator.inputs.pandas_input_fn(
			x = df,
			y = df['price'],
			batch_size=128,
			num_epochs=10,
			shuffle=True
	)

# Trains until input exhausted (10 epochs) starting from checkpoint
model.train(pandas_train_input_fn(df))
# 1000 additional steps from checkpoint
model.train(pandas_train_input_fn(df), steps=1000)
# 1000 steps - might be nothing if checkpoint already there
model.train(pandas_train_input_fn(df), max_steps=1000)
  • To add a new feature, add it to the list of feature columns and make sure it is present in data frame
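  • For instance (num_bedrooms is a hypothetical new feature):
featcols = [
    tf.feature_column.numeric_column("sq_footage"),
    tf.feature_column.categorical_column_with_vocabulary_list("type",
                                                              ["house", "apt"]),
    tf.feature_column.numeric_column("num_bedrooms"),  # newly added feature
]
# The dataframe passed to pandas_input_fn must now contain a
# "num_bedrooms" column as well.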


Train on large datasets with Dataset API


  • Real World ML Models


  • Out-of-memory datasets tend to be sharded into multiple files
  • Datasets can be created from different file formats. They generate input functions for Estimators
  • Read one CSV file using TextLineDataset
  • Datasets handle shuffling, epochs, batching, …
  • They support arbitrary transformations with map()
  • Datasets help create input_fn’s for Estimators
def decode_line(row):
    # Parse one CSV line; record_defaults also fixes each column's type
    cols = tf.decode_csv(row, record_defaults=[[0], ['house'], [0]])
    features = {'sq_footage': cols[0], 'type': cols[1]}
    label = cols[2]  # price
    return features, label

dataset = tf.data.TextLineDataset("train_1.csv").map(decode_line)

dataset = dataset.shuffle(1000).repeat(15).batch(128)

def input_fn():
	features, label = dataset.make_one_shot_iterator().get_next()
	return features, label

model.train(input_fn)
  • All the tf.* commands you write in Python do not actually process any data; they just build graphs.
  • Common misconceptions about input_fn (in reality):
    1. Input functions are called only once, at graph-construction time (not once per training step)
    2. Input functions return TensorFlow nodes (not data)
  • The real benefit of Dataset is that you can do more than just ingest data
dataset = tf.data.TextLineDataset(filename)                       \
            .skip(num_header_lines)                               \
            .map(add_key)                                         \
            .map(lambda feats, labels: (preproc(feats), labels))  \
            .filter(is_valid)                                     \
            .cache()
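  • filter() and cache() are not demonstrated above; a self-contained toy sketch (the values are made up) of what they do:
dataset = tf.data.Dataset.from_tensor_slices([1, -2, 3, -4, 5])

dataset = dataset.filter(lambda x: x > 0)  # keep only the valid examples
dataset = dataset.cache()  # cache the result so preprocessing runs only once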


Big jobs, Distributed training


  • estimator.train_and_evaluate is the preferred method for training real-world models.
  • data parallelism = replicate your model on multiple workers

Distributed training using data parallelism

estimator = tf.estimator.LinearRegressor(
                feature_columns=featcols,
                config=run_config)
...
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  • RunConfig tells the estimator where and how often to write Checkpoints and Tensorboard logs (“summaries”)
run_config = tf.estimator.RunConfig(
						model_dir=output_dir,
						save_summary_steps=100,
						save_checkpoints_steps=2000)

estimator = tf.estimator.LinearRegressor(config=run_config, ...)
  • The TrainSpec tells the estimator how to get training data
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=50000)
...
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  • The EvalSpec controls the evaluation and the checkpointing of the model since they happen at the same time
eval_spec = tf.estimator.EvalSpec(
					input_fn=eval_input_fn,
					steps=100, # evals on 100 batches
					throttle_secs=600, # eval no more than every 10 min
					exporters=...)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  • Shuffling is even more important in distributed training
dataset = tf.data.Dataset.list_files("train.csv-*") \
							.shuffle(100)                     \
							.flat_map(tf.data.TextLineDataset)\
							.map(decode_csv)

dataset = dataset.shuffle(1000) \
							.repeat(15)   \
							.batch(128)


Monitoring with TensorBoard


  • Point TensorBoard to your output directory and the dashboards appear in your browser at localhost:6006
  • Pre-made Estimators export relevant metrics, embeddings, histograms, etc. for TensorBoard, so there is nothing more to do

The dashboard for the graph

  • If you are writing a custom Estimator model, you can add summaries for TensorBoard with a single line.
    • Sprinkle appropriate summary ops throughout your code:
    • tf.summary.scalar
    • tf.summary.image
    • tf.summary.audio
    • tf.summary.text
    • tf.summary.histogram
tf.summary.scalar('meanVar1', tf.reduce_mean(var1))
...
tf.summary.text('outClass', stringvar)


Serving Input Function


  • Recap with all the code
run_config = tf.estimator.RunConfig(model_dir=output_dir, ...)

estimator = tf.estimator.LinearRegressor(featcols, config=run_config)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=1000)

export_latest = tf.estimator.LatestExporter(
    'exporter',  # LatestExporter requires a name as its first argument
    serving_input_receiver_fn=serving_input_fn)

eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, exporters=export_latest)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  • Serving and training-time inputs are often very different


  • Serving input function transforms from parsed JSON data to the data your model expects
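  • For the housing model above, the serving input function can be as simple as one placeholder per raw feature, passed through unchanged (a minimal sketch):
def serving_input_fn():
    # Placeholders describe what each prediction request will contain
    feature_placeholders = {
        'sq_footage': tf.placeholder(tf.float32, [None]),
        'type': tf.placeholder(tf.string, [None]),
    }
    # No transformation needed here: serving features == received features
    features = dict(feature_placeholders)
    return tf.estimator.export.ServingInputReceiver(features,
                                                    feature_placeholders)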


  • The exported model is ready to deploy
  • Example serving input function that decodes JPEGs
def serving_input_fn():
    # What the caller will send: a batch of JPEG images as byte strings
    json = {'jpeg_bytes': tf.placeholder(tf.string, [None])}

    def decode(jpeg):
        pixels = tf.image.decode_jpeg(jpeg, channels=3)
        return pixels

    # Decode every image in the batch into uint8 pixel tensors
    pics = tf.map_fn(decode, json['jpeg_bytes'], dtype=tf.uint8)

    features = {'pics': pics}
    return tf.estimator.export.ServingInputReceiver(features, json)