Microsoft Releases the Updated Version of DeepSpeed
DeepSpeed is a deep learning optimization library that can help train models with over 1 trillion parameters.
Microsoft recently released an updated version of DeepSpeed, its deep learning optimization library for training neural networks far more efficiently than standard approaches allow. Microsoft claims the updated DeepSpeed can train AI models with over a trillion parameters, several times the 175 billion parameters of OpenAI's GPT-3. Parameter count is one of the main drivers of performance in large models like GPT-3.
The updated DeepSpeed should help researchers and deep learning practitioners train and develop their AI models more efficiently, and make better use of their hardware. Data scientists working on high-end GPU machines, as well as researchers on a budget with low-end computers, often find that after putting substantial effort into training their models, the available computing power becomes the bottleneck. DeepSpeed aims to address all of these issues.
Microsoft announced this library powerhouse in February 2020, when it was used to create Turing-NLG (Turing Natural Language Generation), a state-of-the-art language generation model with around 17 billion parameters. Beyond Turing-NLG, DeepSpeed also introduced ZeRO-2, a memory optimization that, as mentioned in Microsoft's research blog, enabled training models with almost 200 billion parameters.
Microsoft claims that DeepSpeed can train a trillion-parameter model using only 100 of NVIDIA's previous-generation V100 graphics cards. NVIDIA's current-generation A100 is claimed to be up to 20 times faster than the V100, and Microsoft states that a trillion-parameter model could be trained to good accuracy with 4,000 A100 graphics cards in no more than 100 days.
Such are the capabilities Microsoft claims, and every deep learning enthusiast and practitioner will be keen to see them proven in practice. After Turing-NLG, many more such ambitious innovations are expected.
Improvements from the Previous Version
ZeRO-Offload: ZeRO (Zero Redundancy Optimizer) Offload optimizes the memory consumption of models that are too large to fit in GPU memory, for example by moving optimizer state to CPU memory, thereby improving the scalability, usability, and adaptability of training.
3D Parallelism: according to Microsoft's research blog, this combines data, pipeline, and model parallelism to increase the hardware efficiency of training by distributing work across the training servers. Deep learning models require vast amounts of data, significant computing power, and a great deal of time to train, so distributing that work efficiently matters.
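To make the ZeRO-Offload idea concrete, here is a minimal sketch of a DeepSpeed configuration that enables ZeRO stage 2 with optimizer-state offload to CPU memory. The specific values (batch size, stage number) are illustrative assumptions, not figures from the article; the config keys follow DeepSpeed's documented JSON schema.

```python
import json

# Illustrative DeepSpeed config (assumed values, not from the article).
# "zero_optimization" enables ZeRO; "offload_optimizer" with device "cpu"
# is the ZeRO-Offload feature: optimizer state lives in CPU RAM instead
# of GPU memory, freeing GPU memory for larger models.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},  # mixed precision further reduces memory use
    "zero_optimization": {
        "stage": 2,  # partition optimizer states and gradients across GPUs
        "offload_optimizer": {
            "device": "cpu"  # ZeRO-Offload: keep optimizer state in CPU RAM
        },
    },
}

# The config is typically saved as JSON and passed to deepspeed.initialize()
# together with the model and training arguments.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

In a real training script this file would be supplied via the `--deepspeed_config` flag of the `deepspeed` launcher, which then applies the offloading transparently during training.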
DeepSpeed can help practitioners and researchers speed up their work and become more productive. If adopted industrially and used in real-world projects, DeepSpeed could be a significant boost both to startups and to tech behemoths like Microsoft itself.
As of now, DeepSpeed cannot be directly compared with OpenAI's GPT-3: GPT-3 is a language model offered through an API, while DeepSpeed is a deep learning optimization library.
Machine learning automation is seeing a steep rise and could become a huge industry in its own right; to quote Bill Gates, "A breakthrough in Machine Learning can be equivalent to 10 Microsofts."
DeepSpeed looks very promising, but its real test will come when AI professionals outside Microsoft start implementing it in their own ecosystems.
To read more about DeepSpeed, check out Microsoft's official announcement of DeepSpeed.
GitHub repo for DeepSpeed: https://github.com/microsoft/DeepSpeed