[Coursera] Art and Science of Machine Learning (2)

Machine Learning with TensorFlow on GCP

Posted by cyc1am3n on October 26, 2018

These are lecture notes for Art and Science of Machine Learning, the fifth course in the Coursera specialization "Machine Learning with TensorFlow on Google Cloud Platform".

Review Embedding

• Creating an embedding column from a feature cross.
• The weights in the embedding column are learned from data.
• The model learns how to embed the feature cross in lower-dimensional space
• Embedding a feature cross in TensorFlow
• Transfer Learning of embeddings from similar ML models
• First layer: the feature cross
• Second layer: a mystery box labeled latent factor
• Third layer: the embedding
• Fourth layer, one side: an image of traffic
• Fourth layer, other side: an image of people watching TV
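The ideas above can be sketched with TensorFlow's feature-column API. This is a minimal sketch, not the course's exact code; the feature names and bucket boundaries are made up for illustration:

```python
import tensorflow as tf

# Bucketize two numeric features, cross the buckets, then learn a
# low-dimensional embedding of the crossed (sparse) feature.
latitude = tf.feature_column.numeric_column('latitude')
longitude = tf.feature_column.numeric_column('longitude')
lat_buckets = tf.feature_column.bucketized_column(
    latitude, boundaries=[33.0, 35.0, 37.0])
lon_buckets = tf.feature_column.bucketized_column(
    longitude, boundaries=[-120.0, -118.0, -116.0])

# The cross is hashed into a fixed number of buckets...
crossed = tf.feature_column.crossed_column(
    [lat_buckets, lon_buckets], hash_bucket_size=100)

# ...and the embedding weights over those buckets are learned from data.
embedded = tf.feature_column.embedding_column(crossed, dimension=3)
```

Passing `embedded` to a model (e.g., a `DNNRegressor`) is what lets the weights be learned by backprop along with the rest of the network.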

Recommendations

• Using a second dimension gives us more freedom in organizing movies by similarity
• A d-dimensional embedding assumes that user interest in movies can be approximated by d aspects (d < N)

Data-driven Embeddings

• We could give the axes names, but it is not essential
• It's easier to train a model with d inputs than a model with N inputs
• Embeddings can be learned from data

Sparse Tensors

• Dense representations are inefficient in space and compute

• So, use a sparse representation to hold the example

• Build a dictionary mapping each feature to an integer from 0 to (# movies - 1)
• Efficiently represent the sparse vector as just the movies the user watched
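The two bullets above can be shown with a framework-free sketch (the movie titles are made up):

```python
# Map each movie to an integer id: 0 .. (#movies - 1).
movies = ['Movie A', 'Movie B', 'Movie C', 'Movie D']  # hypothetical catalog
movie_to_id = {title: i for i, title in enumerate(movies)}

# Instead of a dense 0/1 vector over all movies, store only the ids
# of the movies the user actually watched (the sparse representation).
watched = ['Movie B', 'Movie D']
sparse = sorted(movie_to_id[title] for title in watched)

# Dense equivalent, for comparison: one slot per movie in the catalog.
dense = [1 if i in sparse else 0 for i in range(len(movies))]
```

With millions of movies and only a handful watched per user, the sparse list is dramatically smaller than the dense vector.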
• Representing feature columns as sparse vectors (These are all different ways to create a categorical column)

• If you know the keys beforehand:
• If your data is already indexed: i.e., has integers in [0, N):
• If you don’t have a vocabulary of all possible values:
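The three cases above correspond to three feature-column constructors. A sketch with illustrative feature names (not from the course):

```python
import tensorflow as tf

# Known keys beforehand: enumerate the vocabulary explicitly.
color = tf.feature_column.categorical_column_with_vocabulary_list(
    'color', vocabulary_list=['red', 'green', 'blue'])

# Already-indexed data (integers in [0, N)): use an identity column.
movie_id = tf.feature_column.categorical_column_with_identity(
    'movie_id', num_buckets=1000)

# No vocabulary of all possible values: hash raw values into buckets.
user_agent = tf.feature_column.categorical_column_with_hash_bucket(
    'user_agent', hash_bucket_size=500)
```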

Train an Embedding

• Embeddings are feature columns that function like layers
• The weights in the embedding layer are learned through backprop, just as with other weights
• Embeddings can be thought of as latent features

Similarity Property

• Embeddings provide dimensionality reduction
• You can take advantage of the similarity property of embeddings
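One common way to exploit the similarity property is cosine similarity between embedding vectors; a framework-free sketch with made-up 2-d movie embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical learned embeddings: similar movies end up with
# similar vectors, so their cosine similarity is high.
shrek = [0.9, 0.1]
toy_story = [0.85, 0.2]
horror_film = [-0.7, 0.5]
```

Here `cosine_similarity(shrek, toy_story)` is much larger than `cosine_similarity(shrek, horror_film)`, which is what makes embeddings useful for recommendations.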

• A good starting point for the number of embedding dimensions

• Higher dimensions → more accuracy
• Higher dimensions → overfitting, slow training
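A commonly cited rule of thumb for that starting point (the fourth root of the number of possible values); treat it as a tuning heuristic, not a course-mandated formula:

```python
def embedding_dims(num_categories):
    """Rule-of-thumb starting point for embedding dimensionality:
    the fourth root of the number of possible values. Tune from
    there: higher dimensions can improve accuracy but risk
    overfitting and slow down training."""
    return round(num_categories ** 0.25)
```

For example, a vocabulary of 10,000 movies would start at about 10 embedding dimensions.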

Custom Estimator

• Estimator provides a lot of benefits
• Canned Estimators are sometimes insufficient
• Suppose that you want to use a model structure from a research paper…

• Implement the model using low-level TensorFlow ops
• How do we wrap this custom model into Estimator framework?

• Create train_and_evaluate function with the base-class Estimator

• myfunc (above) returns an EstimatorSpec.

• The 6 things in an EstimatorSpec
1. Mode is pass-through
2. Any tensors you want to return
3. Loss metric
4. Training op
5. Eval ops
6. Export outputs
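A hedged sketch of wrapping a custom model into the Estimator framework: the model here is a trivial linear regressor (not from the course), but the shape of the `model_fn` and the EstimatorSpec fields follow the list above. The name `myfunc` is taken from the notes:

```python
import tensorflow as tf

def myfunc(features, labels, mode):
    """Custom model_fn: build the model with low-level ops, then
    return an EstimatorSpec describing mode, predictions, loss,
    and the training op."""
    # Low-level model: a single dense (linear) layer on feature 'x'.
    predictions = tf.compat.v1.layers.dense(features['x'], 1)

    # 1. Mode is passed through; 2. tensors to return.
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # 3. Loss metric and 4. training op, used in TRAIN/EVAL modes.
    loss = tf.compat.v1.losses.mean_squared_error(labels, predictions)
    train_op = tf.compat.v1.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(
        mode=mode, predictions=predictions, loss=loss, train_op=train_op)

# Hand the function to the base-class Estimator.
estimator = tf.estimator.Estimator(model_fn=myfunc)
```

From here, `tf.estimator.train_and_evaluate(estimator, ...)` works the same as with a canned Estimator.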
• The ops are set up in the appropriate mode

Keras Models

• Keras is a high-level deep neural network library that supports multiple backends
• Keras is easy to use for fast prototyping
• From a compiled Keras model, you can get an Estimator
• You will use this estimator the way you normally use an estimator
• The connection between the input features and Keras is through a naming convention
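A sketch of the conversion using `tf.keras.estimator.model_to_estimator` (TF 1.x/early-2.x API; the model architecture and layer name here are illustrative). For a Sequential model, the naming convention is that the input feature key is the first layer's name with `_input` appended:

```python
import tensorflow as tf

# Build and compile a small Keras model.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(10,),
                          name='first_layer'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# Convert the compiled model to an Estimator. The input_fn must feed
# a features dict keyed by the Keras input name ('first_layer_input').
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)
```

You then train and evaluate `estimator` the way you normally use an Estimator.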