QIZMT – An open source distributed computing platform

MySpace recently open sourced Qizmt, its internally developed distributed computing platform. Qizmt is based on the MapReduce framework and was developed by the data mining team at MySpace. The platform is meant for processing very large data sets, for example collaborative filtering over user logs that run to petabytes of data. It can also be used to crunch large volumes of data for recommendations and analytics. The basic intention behind the framework is to make MySpace's user recommendations smarter, faster and more reliable through quicker data processing. At present, it is used in the “People you may know” feature of MySpace. Later on, it could be extended to other user recommendation areas of the portal. It runs only on the Windows platform, as it was built on .NET.
Qizmt has been talked about a lot recently because of certain internal benchmarks that highlighted its performance. According to those tests, Qizmt is faster than other open source distributed platforms. MapReduce is a primary data processing environment at internet giants like Google, Amazon, etc. The MapReduce framework was originally developed by Google, and Qizmt was built taking it as the base. The framework can operate across a large cluster of computers, and it draws on the map and reduce operations of functional programming. Computations can run over data held either in a file system or in a large database.
With the advent of Business Intelligence platforms, will Qizmt be the undisputed choice? We have to wait and watch. When providers deal with continuously growing data, as a social networking site does, their analytic needs keep increasing. In such cases, Qizmt could become an integral part of the system, for both data mining and data processing needs.
The intention of MySpace is very clear. MySpace is basically a portal whose users constantly update their details and upload videos, pictures, etc. Qizmt was designed to process not only the active data, i.e., data from users, but also the passive data generated by the analytics system. Beyond processing, it also transforms that data into real-time recommendations. With the help of these recommendations, the user gets a different, real-time experience of the portal. In the words of Mike Jones, chief operating officer at MySpace, “This will support the discoverability of new entertainment experiences across music, videos, friends and more.”
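To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python of how a "people you may know" style count of mutual friends could be expressed as a map step and a reduce step. This is only an illustration under assumptions: it is not Qizmt's API (Qizmt jobs are written for .NET) and not MySpace's actual algorithm, and the names map_phase, shuffle, reduce_phase, suggest and the sample friend lists are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user, friends):
    """Map step: every pair of `user`'s friends shares `user` as a mutual friend."""
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce runtime does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: total number of mutual friends for one pair of users."""
    return key, sum(values)


def suggest(friend_lists):
    """Return {(a, b): mutual_friend_count} for pairs who are not yet friends."""
    emitted = []
    for user, friends in friend_lists.items():
        emitted.extend(map_phase(user, friends))
    counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
    return {
        (a, b): n
        for (a, b), n in counts.items()
        if b not in friend_lists.get(a, set())  # skip pairs already connected
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from MySpace.
    friend_lists = {
        "ann": {"bob", "cat"},
        "bob": {"ann", "cat", "dan"},
        "cat": {"ann", "bob", "dan"},
        "dan": {"bob", "cat"},
    }
    print(suggest(friend_lists))
```

On a real cluster the map and reduce steps would run in parallel on many machines, with the platform handling the grouping in between; here everything runs in one process just to show the shape of the computation.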
Some of MySpace Qizmt’s Features
- Built-in IDE/debugger for developing and debugging mapreducer jobs on a large cluster of servers.
- Execute commands from any machine in the cluster.
- Delta-only exchange option for MapReduce jobs.
- Configurable data-redundancy/machine level failover.
- Easily add machines to a cluster to increase processing power and capacity.
MySpace Qizmt currently supports Windows 2003 Server, Windows 2008 Server, Windows Vista and up. You can download it from here.
-Shiva, Student Intern
Update #1
Even Bing appears to leverage Hadoop.
Update #2