Configuration of Optimization Algorithms and Hyperparameters¶
After a neural network model has been set up, it usually needs to be trained before it can be used for prediction or inference. Training optimizes the parameters of the network, which are usually updated with the back propagation algorithm and a specified optimizer. This article introduces how to set up optimizers and hyperparameters in OneFlow.
Key point summary of this article:
Configuration examples of job functions for training and prediction.
The use of optimizers and learning rate strategies.
Common errors due to misconfiguration and corresponding solutions.
Users can directly use the training and inference configurations described in the Example of Configurations section without knowing the design concepts of OneFlow. For more details, please refer to the optimizer API.
Job Function Configuration¶
In [Recognizing MNIST Handwritten Digits] (... /quick_start/lenet_mnist.md#global_function), we learned about the concept of the `oneflow.global_function` decorator and the job function. The configuration in this article builds on that.
The job function can be configured by passing the `function_config` parameter to the decorator.
Example of Configurations¶
Configuration for prediction/inference¶
Here we define a job function to evaluate the model:
We build the configuration in the `get_eval_config` function and pass it to `@flow.global_function`. At the same time, we set the `type` parameter of `@flow.global_function` to `"predict"` for the evaluation task. This way, OneFlow does not perform back propagation in this job function.
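    import oneflow as flow
    import oneflow.typing as tp


    def get_eval_config():
        config = flow.function_config()
        config.default_data_type(flow.float)
        return config


    # function_config must be passed by keyword because it follows type=...
    @flow.global_function(type="predict", function_config=get_eval_config())
    def eval_job() -> tp.Numpy:
        # build neural network here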
Configuration for training¶
If you set the `type` parameter of `@flow.global_function` to `"train"`, you get a job function for training.
In the following code, `train_job` is the job function used for training. It is configured with the default `function_config` (so no parameter is passed to `function_config`).
You need to specify the optimizer, learning rate, and other hyperparameters in the job function because OneFlow performs back propagation for `train` job functions.
    # lenet and BATCH_SIZE are assumed to be defined as in the MNIST quick start
    @flow.global_function(type="train")
    def train_job(
        images: tp.Numpy.Placeholder((BATCH_SIZE, 1, 28, 28), dtype=flow.float),
        labels: tp.Numpy.Placeholder((BATCH_SIZE,), dtype=flow.int32),
    ) -> tp.Numpy:
        with flow.scope.placement("gpu", "0:0"):
            logits = lenet(images, train=True)
            loss = flow.nn.sparse_softmax_cross_entropy_with_logits(
                labels, logits, name="softmax_loss"
            )

        # No boundaries and a single value: a constant learning rate of 0.1
        lr_scheduler = flow.optimizer.PiecewiseConstantScheduler([], [0.1])
        flow.optimizer.SGD(lr_scheduler, momentum=0).minimize(loss)
        return loss
`PiecewiseConstantScheduler` sets the learning rate (0.1) and the learning rate strategy (`PiecewiseConstantScheduler`, a piecewise scaling strategy). OneFlow has other built-in learning rate strategies, such as `CosineScheduler`, `CustomScheduler`, and `InverseTimeScheduler`.
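To make the idea of a piecewise constant strategy concrete, here is a minimal pure-Python sketch. It is not OneFlow's implementation: the helper name `piecewise_constant_lr` is hypothetical, and the convention that `values[i]` applies while the step is below `boundaries[i]` is an assumption made for illustration. It shows why a scheduler with no boundaries and a single value behaves as a constant learning rate.

```python
from bisect import bisect_right


def piecewise_constant_lr(step, boundaries, values):
    """Illustrative helper (not a OneFlow API): learning rate for a given step."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    # Assumed convention: values[i] is used while step < boundaries[i];
    # the last value is used once every boundary has been passed.
    return values[bisect_right(boundaries, step)]


# No boundaries, one value: the rate stays at 0.1 for every step.
constant = [piecewise_constant_lr(s, [], [0.1]) for s in (0, 1000, 1_000_000)]

# A hypothetical staged schedule that drops the rate at steps 100 and 200.
staged = [
    piecewise_constant_lr(s, [100, 200], [0.1, 0.01, 0.001])
    for s in (0, 99, 100, 199, 200)
]
```

Under these assumptions, `constant` contains 0.1 three times, while `staged` steps down each time a boundary is reached.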
`flow.optimizer.SGD(lr_scheduler, momentum=0).minimize(loss)` sets the optimizer to SGD and specifies `loss` as the optimization target. OneFlow provides multiple optimizers, such as SGD, Adam, AdamW, LazyAdam, LARS, and RMSProp. For more information, please refer to the API documentation.
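As a rough illustration of what minimizing a loss with SGD means, the sketch below runs the update rule on a one-parameter toy problem in plain Python. It is not OneFlow code: `sgd_minimize` and the quadratic loss are hypothetical, and the momentum formulation shown is one common convention, assumed here for illustration. With `momentum=0`, as in the configuration above, each step reduces to `w = w - lr * grad`.

```python
def sgd_minimize(grad_fn, w, lr=0.1, momentum=0.0, steps=100):
    """Illustrative SGD loop (not a OneFlow API) for a single scalar parameter."""
    velocity = 0.0
    for _ in range(steps):
        grad = grad_fn(w)
        # Assumed momentum convention; with momentum=0 this is plain SGD.
        velocity = momentum * velocity - lr * grad
        w = w + velocity
    return w


# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = sgd_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```

Each step shrinks the distance to the minimum by a factor of 0.8 here, so `w_final` ends up very close to 3. In a real job function, the framework computes the gradients by back propagation and applies the update to every trainable parameter.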
Check failed: job().job_conf().train_conf().has_model_update_conf()
If the `type` of the job function is `"train"` but the `optimizer` and the optimization target are not configured, OneFlow reports this error during back propagation because it does not know how to update the parameters. Solution: Configure an `optimizer` for the job function and specify the optimization target.
Check failed: NeedBackwardOp
If the `type` of the job function is `"predict"` but an `optimizer` is configured by mistake, the `optimizer` cannot get the backward data because OneFlow does not generate a backward graph for a `predict` job function. Solution: Remove the `optimizer` statement from the `predict` job function.
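The two errors above come down to one rule: a `train` job function needs an optimizer, and a `predict` job function must not have one. The toy checker below restates that rule in plain Python. It is purely hypothetical: `check_job_config` and its messages are not part of OneFlow, and the real checks happen inside the framework when the job is compiled.

```python
def check_job_config(job_type, has_optimizer):
    """Hypothetical helper (not OneFlow code) summarizing the two rules above."""
    if job_type == "train" and not has_optimizer:
        # Corresponds to the has_model_update_conf() check failure.
        return "configure an optimizer and an optimization target"
    if job_type == "predict" and has_optimizer:
        # Corresponds to the NeedBackwardOp check failure.
        return "remove the optimizer statement"
    return "ok"


results = {
    ("train", True): check_job_config("train", True),
    ("train", False): check_job_config("train", False),
    ("predict", True): check_job_config("predict", True),
    ("predict", False): check_job_config("predict", False),
}
```

Only the two consistent combinations, a `train` job with an optimizer and a `predict` job without one, come back as `"ok"`.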