This work is a follow-up to Cambricon-F: Machine Learning Computers with Fractal von Neumann Architecture.

Cambricon-F achieves programming scale invariance through fractal execution, which alleviates the programming productivity problem of machine learning computers. However, fractal execution on Cambricon-F is carried out by the hardware controller and supports only a few common basic operators (convolution, pooling, etc.); any other function must be built as a sequence of these operators. We have found that using a limited, fixed instruction set to support complex and variable application payloads leads to inefficiency.

When running regular algorithms such as conventional CNNs, the machine can achieve optimal efficiency. In complex and variable application scenarios, however, inefficiency arises even if the application itself conforms to the definition of a fractal operation. We define this inefficiency phenomenon as suboptimal computational or communication complexity when certain applications are executed on a fractal computer. This paper uses TopK and 3DConv to illustrate it.
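To make the idea of an application that "conforms to the definition of a fractal operation" more concrete, the following Python sketch shows one way TopK can be written so that each level applies the same operation to smaller slices and then reduces the partial results. It is only an illustration under stated assumptions: the names `topk_leaf`, `topk_fractal`, `fanout`, and `leaf_size` are hypothetical, and the decomposition is not taken from the paper or from the Cambricon-F hardware.

```python
import heapq


def topk_leaf(values, k):
    """Leaf-level kernel (hypothetical): the k largest values, descending."""
    return heapq.nlargest(k, values)


def topk_fractal(values, k, fanout=4, leaf_size=8):
    """Top-k expressed as the same operation on smaller slices plus a reduction.

    fanout and leaf_size are illustrative parameters, not hardware values.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2 so that slices shrink")
    if len(values) <= leaf_size:
        return topk_leaf(values, k)

    step = -(-len(values) // fanout)  # ceiling division: slice length per child
    partial = []
    for start in range(0, len(values), step):
        # Each child runs the same top-k operation on its own slice.
        partial.extend(topk_fractal(values[start:start + step], k, fanout, leaf_size))

    # Local reduction: the global top-k is contained in the children's top-k lists.
    return topk_leaf(partial, k)
```

Each level passes at most `fanout * k` candidates to its reduction step rather than the whole input, which is the kind of structure a dedicated fractal instruction could exploit. Whether a fixed set of basic operators can express this as efficiently is the question the inefficiency discussion above raises.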

An intuitive example: a user wants to run a Bayesian network application, which conforms to the definition of a fractal operation and could be executed efficiently in a fractal manner. But because Cambricon-F has no “Bayesian” instruction, the application can only be decomposed into a series of basic operations that are then executed serially. If the instruction set could be extended with a BAYES fractal instruction, fractal execution could be maintained all the way down to the leaf nodes, which would significantly improve computational efficiency.

Based on this observation, we improve the architecture of Cambricon-F and propose Cambricon-FR, which has a fractal reconfigurable instruction set structure. Analytically, Cambricon-F is a Fractal Machine, while Cambricon-FR can be seen as a Universal Fractal Machine: Cambricon-F can achieve optimal efficiency on a specific application payload, while Cambricon-FR can achieve optimal efficiency on complex and variable application payloads.

Published in “IEEE Transactions on Computers”. [DOI]