These are my lecture notes for week 2 of Intro to TensorFlow, the third course in the Coursera specialization “Machine Learning with TensorFlow on Google Cloud Platform”.
Estimator API
Estimators wrap up a large amount of boilerplate code, on top of the model itself.
- From small to big to prod with the Estimator API:
- Quick model
- Checkpointing
- Out-of-memory datasets
- Train / eval / monitor
- Distributed training
- Hyper-parameter tuning on ML-Engine
- Production: serving predictions from a trained model
Pre-made Estimators can all be used in the same way: they share the tf.estimator.Estimator base class.
- Feature columns tell the model what inputs to expect
- Under the hood: feature columns take care of packing the inputs into the input vector of the model
- tf.feature_column.bucketized_column
- tf.feature_column.embedding_column
- tf.feature_column.crossed_column
- tf.feature_column.categorical_column_with_hash_bucket
- …
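As a minimal sketch, here is how feature columns feed a pre-made estimator (the column names sq_footage and type are made up for illustration, not from the course data):

```python
import tensorflow as tf

# Feature columns tell the model what inputs to expect.
featcols = [
    tf.feature_column.numeric_column('sq_footage'),
    tf.feature_column.categorical_column_with_vocabulary_list(
        'type', ['house', 'apt'])
]

# The pre-made estimator packs these columns into its input vector.
model = tf.estimator.LinearRegressor(featcols)
```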
Training: feed in training input data and train for 100 epochs.
Predictions: once trained, the model can be used for prediction.
- To use a different pre-made estimator, just change the class name and supply appropriate parameters
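A sketch of the train/predict cycle, continuing the made-up housing example above (steps=100 stands in for the course's 100 epochs):

```python
# Training: feed in training input data.
def train_input_fn():
    features = {'sq_footage': tf.constant([1000., 2000.]),
                'type': tf.constant(['house', 'apt'])}
    labels = tf.constant([500., 1000.])  # e.g. price in $K
    return features, labels

model.train(train_input_fn, steps=100)

# Prediction: the trained model returns a generator of predictions.
def predict_input_fn():
    features = {'sq_footage': tf.constant([1500.]),
                'type': tf.constant(['house'])}
    return features

predictions = model.predict(predict_input_fn)
print(next(predictions))
```

Swapping in, say, tf.estimator.DNNRegressor(hidden_units=[64, 32], feature_columns=featcols) is the whole change needed to try a different pre-made estimator.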
Checkpointing
- Model checkpoints let you continue training, resume on failure, and predict from the trained model
- Estimators automatically checkpoint training
- We can now restore and predict with the model
- Training also resumes from the last checkpoint
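A small sketch of checkpoint-driven restore and resume, assuming the same featcols and input functions as above ('outdir' is an arbitrary directory name):

```python
# Passing model_dir makes the estimator checkpoint there automatically.
model = tf.estimator.LinearRegressor(featcols, model_dir='outdir')
model.train(train_input_fn, steps=100)

# Later, even in a new process: pointing a fresh estimator at the same
# directory restores the latest checkpoint, so predict works immediately
# and train() resumes from where it left off.
model = tf.estimator.LinearRegressor(featcols, model_dir='outdir')
predictions = model.predict(predict_input_fn)
model.train(train_input_fn, steps=100)  # continues, not from scratch
```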
Training on in-memory datasets
- In-memory data is usually numpy arrays or Pandas dataframes; feed them with tf.estimator.inputs.numpy_input_fn or tf.estimator.inputs.pandas_input_fn
- Training happens until input is exhausted or number of steps is reached
- To add a new feature, add it to the list of feature columns and make sure it is present in the data frame
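A sketch with numpy_input_fn, continuing the example above (array values are illustrative; pandas_input_fn works the same way with a dataframe):

```python
import numpy as np

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'sq_footage': np.array([1000., 2000., 3000.]),   # feature name -> array
       'type': np.array(['house', 'apt', 'house'])},
    y=np.array([500., 1000., 1500.]),                    # labels
    batch_size=2,
    num_epochs=100,  # training stops when the input is exhausted...
    shuffle=True)

model.train(train_input_fn)  # ...or when a steps= limit is reached first
```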
Train on large datasets with Dataset API
Real-World ML Models
- Out-of-memory datasets tend to be sharded into multiple files
- Datasets can be created from different file formats. They generate input functions for Estimators
- Read one CSV file using TextLineDataset
- Datasets handle shuffling, epochs, batching, …
- They support arbitrary transformations with map()
- Datasets help create input_fn's for Estimators (see the sketch after this list)
- All the tf.* commands that you write in Python do not actually process any data; they just build graphs
- Common misconceptions about input_fn: in reality, input functions are called only once, and they return tf nodes (not data)
- The real benefit of Dataset is that you can do more than just ingest data
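Here is the sketch promised above: an input_fn built with the Dataset API over sharded CSV files (the file pattern and two-column CSV layout are assumptions for illustration):

```python
import tensorflow as tf

CSV_COLUMNS = ['sq_footage', 'price']
DEFAULTS = [[0.0], [0.0]]

def decode_csv(line):
    # map(): turn one text line into (features, label).
    columns = tf.decode_csv(line, record_defaults=DEFAULTS)
    features = dict(zip(CSV_COLUMNS, columns))
    label = features.pop('price')
    return features, label

def train_input_fn():
    # Out-of-memory datasets are sharded into multiple files.
    dataset = tf.data.Dataset.list_files('data/train-*.csv')
    dataset = dataset.flat_map(tf.data.TextLineDataset)  # read each shard
    dataset = dataset.map(decode_csv)
    # Datasets handle shuffling, epochs, batching, ...
    dataset = dataset.shuffle(buffer_size=1000).repeat().batch(128)
    # Returns tf nodes, not data: the estimator runs the graph later.
    return dataset.make_one_shot_iterator().get_next()
```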
Big jobs, Distributed training
estimator.train_and_evaluate is the preferred method for training real-world models.
- data parallelism = replicate your model on multiple workers
Distributed training using data parallelism
- RunConfig tells the estimator where and how often to write checkpoints and TensorBoard logs (“summaries”)
- The TrainSpec tells the estimator how to get training data
- The EvalSpec controls the evaluation and the checkpointing of the model, since they happen at the same time
- Shuffling is even more important in distributed training
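A sketch wiring RunConfig, TrainSpec, and EvalSpec together (directory names, step counts, and intervals are placeholder values, and eval_input_fn is an assumed evaluation counterpart of train_input_fn):

```python
run_config = tf.estimator.RunConfig(
    model_dir='outdir',           # where checkpoints and summaries go
    save_summary_steps=100,       # how often to write TensorBoard logs
    save_checkpoints_steps=2000)  # how often to checkpoint

estimator = tf.estimator.LinearRegressor(featcols, config=run_config)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=50000)

# Evaluation and checkpointing happen together: each new checkpoint
# (at most every throttle_secs) triggers an evaluation run.
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn,
                                  steps=100,         # eval batches per run
                                  throttle_secs=600)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```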
Monitoring with TensorBoard
- Point TensorBoard to your output directory and the dashboards appear in your browser at localhost:6006
- Pre-made Estimators export relevant metrics, embeddings, histograms, etc. for TensorBoard, so there is nothing more to do
(Screenshot: the TensorBoard dashboard for the graph)
- If you are writing a custom Estimator model, you can add summaries for TensorBoard with a single line.
- Sprinkle appropriate summary ops throughout your code:
tf.summary.scalar
tf.summary.image
tf.summary.audio
tf.summary.text
tf.summary.histogram
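For instance, inside a custom model_fn each of these is a single line (the tensor names below are illustrative):

```python
# Each summary op adds one dashboard entry to TensorBoard.
tf.summary.scalar('mean_loss', loss)            # one number tracked over time
tf.summary.histogram('layer1_weights', weights) # distribution of a tensor
tf.summary.image('input_images', image_batch)   # a sample of input images
```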
Serving-time and training-time inputs are often very different.
- The serving input function transforms from parsed JSON data to the data your model expects
- The exported model is ready to deploy
- Example serving input function that decodes JPEGs
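A sketch of such a function, following the pattern shown in the course (the key names 'jpeg_bytes' and 'image', and the 224x224 resize, are assumptions):

```python
def serving_input_fn():
    # What arrives at serving time: JPEG bytes parsed out of the JSON request.
    receiver_tensors = {'jpeg_bytes': tf.placeholder(tf.string, shape=[None])}

    def decode(jpeg):
        pixels = tf.image.decode_jpeg(jpeg, channels=3)
        # Resize so every image in the batch has the same shape.
        return tf.image.resize_images(pixels, [224, 224])

    # What the model expects: a batch of decoded, resized image tensors.
    features = {'image': tf.map_fn(decode, receiver_tensors['jpeg_bytes'],
                                   dtype=tf.float32)}
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

# Export a SavedModel that is ready to deploy.
estimator.export_savedmodel('export_dir', serving_input_fn)
```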