2D SBP

After reading the Global View and Global Tensor documents, you should be familiar with the basic concepts of SBP and SBP Signature and be able to get started with related tasks. Both of those documents, in fact, cover only 1D SBP.

Building on 1D SBP, this document introduces 2D SBP, which handles more complex distributed training scenarios more flexibly.

2D Devices Array

We are already familiar with the placement configuration of 1D SBP. In the 1D SBP scenario, the cluster is configured through the oneflow.placement interface. For example, to use GPUs 0 to 3 in the cluster:

>>> placement1 = flow.placement("cuda", ranks=[0, 1, 2, 3])

The above "cuda" specifies the device type, and ranks=[0, 1, 2, 3] specifies the computing devices in the cluster. In fact, ranks can be not only a one-dimensional int list, but also a multi-dimensional int array:

>>> placement2 = flow.placement("cuda", ranks=[[0, 1], [2, 3]])

When ranks is in the form of a one-dimensional list like ranks=[0, 1, 2, 3], all devices in the cluster form a 1D device vector, which is where the 1D SBP name comes from.

When ranks is in the form of a multi-dimensional array, the devices in the cluster are grouped into a multi-dimensional array of devices. ranks=[[0, 1], [2, 3]] means that the four computing devices in the cluster are divided into a \(2 \times 2\) device array.
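To make the grouping concrete, the following sketch uses plain Python rather than the OneFlow API to list where each rank sits in a nested ranks list. The helper name `rank_coordinates` is hypothetical and exists only for this illustration.

```python
def rank_coordinates(ranks):
    """Map each rank in a 2D nested list to its (row, column) position."""
    coords = {}
    for row_index, row in enumerate(ranks):
        for col_index, rank in enumerate(row):
            coords[rank] = (row_index, col_index)
    return coords


ranks = [[0, 1], [2, 3]]
coords = rank_coordinates(ranks)
shape = (len(ranks), len(ranks[0]))  # shape of the device array

print(shape)   # (2, 2)
print(coords)  # {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
```

Here the first coordinate selects the inner list (the group) and the second selects the device inside that group.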

2D SBP

When constructing a Global Tensor, we need to specify both placement and SBP. When the cluster in placement is a 2-dimensional device array, the SBP must correspond to it and be a tuple of length 2. The 0th and 1st elements of this tuple describe the distribution of the Global Tensor along the 0th and 1st dimensions of the device array, respectively.

For example, the following code configures a \(2 \times 2\) device array and sets the 2D SBP to (broadcast, split(0)).

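>>> import oneflow as flow
>>> a = flow.Tensor([[1,2],[3,4]])
>>> # 2 x 2 device array: ranks 0 and 1 form one group, ranks 2 and 3 the other
>>> placement = flow.placement("cuda", ranks=[[0, 1], [2, 3]])
>>> # one SBP per dimension of the device array
>>> sbp = (flow.sbp.broadcast, flow.sbp.split(0))
>>> a_to_global = a.to_global(placement=placement, sbp=sbp)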

This means that, logically, the data is broadcast along the 0th dimension of the device array ("viewed vertically") and split(0) along the 1st dimension ("viewed across").

See the following figure:

In the above figure, the left side is the global data, and the right side is the data held by each device in the device array. Viewed along the 0th dimension, the devices are in a broadcast relationship:

The data in (group0, device0) and (group1, device0) are identical; the two devices are in a broadcast relationship.
The data in (group0, device1) and (group1, device1) are identical; the two devices are in a broadcast relationship.

Viewed along the 1st dimension, the devices are in a split(0) relationship:

(group0, device0) and (group0, device1) are in a split(0) relationship.
(group1, device0) and (group1, device1) are in a split(0) relationship.

The correspondence between the logical data and the physical data in the final device array can be hard to see directly. When reasoning about 2D SBP, it helps to imagine an intermediate state (the gray part in the above figure). Take (broadcast, split(0)) as an example:

First, the original logical tensor is broadcast to the 2 groups, which gives the intermediate state.
Then, starting from the intermediate state, split(0) is applied within each group to obtain the physical tensor on each device in the final device array.
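The two steps can be mimicked in plain Python without any GPUs. The sketch below is a simulation, not the OneFlow implementation: the helpers `apply_sbp` and `distribute_2d` are hypothetical, SBP values are written as strings, only broadcast and split are modelled, and sizes are assumed to divide evenly.

```python
def apply_sbp(rows, sbp, parts):
    """Distribute a matrix (list of rows) into `parts` pieces under one SBP."""
    if sbp == "broadcast":
        return [[row[:] for row in rows] for _ in range(parts)]
    if sbp == "split(0)":  # cut along rows
        step = len(rows) // parts
        return [rows[i * step:(i + 1) * step] for i in range(parts)]
    if sbp == "split(1)":  # cut along columns
        step = len(rows[0]) // parts
        return [[row[i * step:(i + 1) * step] for row in rows]
                for i in range(parts)]
    raise ValueError(f"unsupported SBP: {sbp}")


def distribute_2d(rows, sbp_2d, grid_shape):
    """Return {(group, device): local matrix} for a 2D SBP on a 2D device array."""
    # Step 1: apply the 0th SBP across groups to get the intermediate state.
    groups = apply_sbp(rows, sbp_2d[0], grid_shape[0])
    local = {}
    for g, group_data in enumerate(groups):
        # Step 2: apply the 1st SBP across the devices inside each group.
        pieces = apply_sbp(group_data, sbp_2d[1], grid_shape[1])
        for d, piece in enumerate(pieces):
            local[(g, d)] = piece
    return local


a = [[1, 2], [3, 4]]
local = distribute_2d(a, ("broadcast", "split(0)"), (2, 2))
for position, piece in sorted(local.items()):
    print(position, piece)
# (0, 0) [[1, 2]]
# (0, 1) [[3, 4]]
# (1, 0) [[1, 2]]
# (1, 1) [[3, 4]]
```

The printed pieces match the relations listed earlier: device0 in both groups holds the same row, and the two devices inside a group hold the two halves of a split(0).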

2D SBP Signature

Just as 1D SBP has the concept of an SBP signature, operators also have 2D SBP signatures. Once you understand 1D SBP and its signatures, 2D SBP signatures are simple; you only need to follow one principle:

Derive the signature independently in each dimension

Let's take matrix multiplication as an example. First, let's review the case of 1D SBP. Suppose that \(x \times w = y\) can have the following SBP Signature:

\[ broadcast \times split(1) = split(1) \]

and

\[ split(0) \times broadcast = split(0) \]

Now, suppose we set the 2D SBP of \(x\) to \((broadcast, split(0))\) and the 2D SBP of \(w\) to \((split(1), broadcast)\). Deriving each dimension independently, the 0th dimension gives \(broadcast \times split(1) = split(1)\) and the 1st dimension gives \(split(0) \times broadcast = split(0)\), so computing \(x \times w = y\) yields the 2D SBP \((split(1), split(0))\) for \(y\).
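The per-dimension derivation can be written as a small lookup. This is only a sketch of the principle, not how OneFlow selects signatures internally: the table `MATMUL_1D_SIGNATURES` and the function `derive_2d_output` are hypothetical names, and the table contains just the two 1D matmul signatures quoted in this document.

```python
# (SBP of x, SBP of w) -> SBP of y, for the two 1D signatures reviewed above.
MATMUL_1D_SIGNATURES = {
    ("broadcast", "split(1)"): "split(1)",
    ("split(0)", "broadcast"): "split(0)",
}


def derive_2d_output(x_sbp, w_sbp):
    """Derive the 2D SBP of y by looking up each device-array dimension separately."""
    return tuple(
        MATMUL_1D_SIGNATURES[(x_dim, w_dim)]
        for x_dim, w_dim in zip(x_sbp, w_sbp)
    )


y_sbp = derive_2d_output(("broadcast", "split(0)"), ("split(1)", "broadcast"))
print(y_sbp)  # ('split(1)', 'split(0)')
```

A pair of inputs that is not in the table raises a KeyError, which in this toy model stands for "no matching 1D signature in that dimension".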

That is to say, the following 2D SBPs constitute the 2D SBP Signature of matrix multiplication:

\[ (broadcast, split(0)) \times (split(1), broadcast) = (split(1), split(0)) \]
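One way to convince yourself that this signature is valid is to check it numerically on a small example. The sketch below again simulates the distribution in plain Python rather than using OneFlow; `apply_sbp`, `distribute_2d`, and `matmul` are hypothetical helpers, and the \(2 \times 2\) matrices are arbitrary sample values.

```python
def apply_sbp(rows, sbp, parts):
    """Distribute a matrix (list of rows) into `parts` pieces under one SBP."""
    if sbp == "broadcast":
        return [[row[:] for row in rows] for _ in range(parts)]
    if sbp == "split(0)":
        step = len(rows) // parts
        return [rows[i * step:(i + 1) * step] for i in range(parts)]
    if sbp == "split(1)":
        step = len(rows[0]) // parts
        return [[row[i * step:(i + 1) * step] for row in rows]
                for i in range(parts)]
    raise ValueError(f"unsupported SBP: {sbp}")


def distribute_2d(rows, sbp_2d, grid_shape):
    """Return {(group, device): local matrix} for a 2D SBP on a 2D device array."""
    local = {}
    for g, group_data in enumerate(apply_sbp(rows, sbp_2d[0], grid_shape[0])):
        for d, piece in enumerate(apply_sbp(group_data, sbp_2d[1], grid_shape[1])):
            local[(g, d)] = piece
    return local


def matmul(a, b):
    """Plain nested-list matrix multiplication."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


x = [[1, 2], [3, 4]]
w = [[5, 6], [7, 8]]
grid = (2, 2)

x_local = distribute_2d(x, ("broadcast", "split(0)"), grid)
w_local = distribute_2d(w, ("split(1)", "broadcast"), grid)

# Each device multiplies only the pieces it holds; no communication is needed.
y_local = {pos: matmul(x_local[pos], w_local[pos]) for pos in x_local}

# Distributing the full product with the derived 2D SBP gives the same pieces.
expected = distribute_2d(matmul(x, w), ("split(1)", "split(0)"), grid)
print(y_local == expected)  # True
```

Each device ends up with exactly the block of \(y\) that \((split(1), split(0))\) assigns to it, which is what the signature claims.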