Apache Spark has emerged as an efficient big data platform over the years, and it has a loyal fan base of its own. It is often considered a rival of Hadoop, but that is not the case: the two platforms complement each other and can be used together or independently. Leaving that topic for another time, let’s focus on the big announcement of February 2017.
Intel’s BigDL for Apache Spark is here
For those of you who are not aware of what BigDL is, it is an open-source artificial intelligence project from Intel, built on distributed deep learning. It was first announced in November 2016 and is now open for contribution. Doug Fisher, Corp VP & GM of the Software and Services Group at Intel Corporation, unveiled BigDL, stating:
“BigDL is an open-source project, and we encourage all developers to connect with us on the BigDL GitHub, sample the code and contribute to the project.”
Details on the GitHub page and on how to contribute are covered at the end of this post.
BigDL’s Features
BigDL arrives at a time when the industry is awash with predictions about artificial intelligence making its mark in 2017. Since the end of 2016, almost every technology-related prediction has mentioned artificial intelligence, and BigDL works on one of its more advanced branches: deep learning.
Imagine the time it would take to process extremely large data sets. Plenty of software can already handle such tedious tasks quickly, and distributed deep learning now lets you work through even bigger data sets at a much faster rate. BigDL is said to be efficient and useful in cases such as:
• Analyzing data on storage systems such as Hive and HDFS.
• Enhancing your big data programs by making use of deep learning applications.
• Embedding deep learning applications in your existing platform.
• Gaining high performance through the use of the Intel Math Kernel Library (MKL).
• Analyzing data at a large scale.
• Implementing synchronous stochastic gradient descent (SGD).
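To give a feel for what "synchronous" SGD means in the last point, here is a minimal sketch in plain Python. It does not use BigDL or Spark at all; the function names (`gradient`, `sync_sgd`), the toy data, and the four simulated workers are all assumptions made up for illustration. The idea it shows is that every worker computes a gradient from the same copy of the parameters, the gradients are averaged, and only then is a single shared update applied.

```python
import random


def gradient(w, b, batch):
    """Mean-squared-error gradient for the model y = w*x + b on one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def sync_sgd(partitions, lr=0.05, steps=500, batch_size=8, seed=0):
    """Simulate synchronous SGD over several data partitions (workers)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each simulated worker samples a mini-batch from its own partition
        # and computes a gradient using the SAME parameters (w, b).
        grads = [gradient(w, b, rng.sample(part, batch_size))
                 for part in partitions]
        # Synchronization point: all gradients are collected and averaged.
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        # One shared update, seen by every worker in the next step.
        w -= lr * gw
        b -= lr * gb
    return w, b


# Toy, noise-free data for y = 3x + 1, split across four simulated workers.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-20, 20)]
partitions = [data[i::4] for i in range(4)]

w, b = sync_sgd(partitions)
print(w, b)  # should end up close to 3.0 and 1.0
```

In a real cluster the workers would run in parallel on separate machines and the averaging step would involve network communication; the loop above only mimics the order of operations.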
Limitations
Spark 2.0 currently does not work well with Java 7, so the recommendation is to use Java 8. Spark 2.0 also has trouble with Kryo, and the suggestion is to use the Java serializer instead. A few other known issues are listed on the official GitHub page mentioned below. Beyond these, you can always report a bug if you run into problems, either from the official page or through the user group, where you can also find more information. You can also subscribe to the mailing list to stay informed about the latest updates.
If you are a novice, make use of the comprehensive tutorials to polish your skills. For additional information on the BigDL project, visit the GitHub page, which has detailed information on installation, getting started, reporting bugs, and other support topics.
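As a rough illustration of the serializer suggestion above, the sketch below builds a `spark-submit` command line that explicitly selects Spark's Java serializer. The `spark.serializer` property and the `org.apache.spark.serializer.JavaSerializer` class are standard Spark names; the helper function, the jar name, and the main class are hypothetical placeholders, not anything taken from the BigDL project.

```python
def spark_submit_args(app_jar, main_class):
    """Build a spark-submit argument list that selects the Java serializer."""
    conf = {
        # Standard Spark property; the value is Spark's built-in Java
        # serializer, used here instead of Kryo.
        "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
    }
    args = ["spark-submit", "--class", main_class]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)
    return args


# Hypothetical jar and class names, for illustration only.
print(" ".join(spark_submit_args("my-bigdl-app.jar", "com.example.Train")))
```

The Java serializer is already Spark's default, so setting it explicitly mostly matters when Kryo has been enabled somewhere else, for example in a shared configuration file.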
All the content shared in this post belongs to the author, who writes for a big data analytics solutions company. If you wish to share your thoughts on BigDL, comment below.