Jul 11, 2012

Upcoming open source technologies

9. Apache CouchDB:


‘An open source database that completely embraces the web’: that’s Apache CouchDB, or simply CouchDB. CouchDB is a NoSQL database that stores data as JSON documents. Created in 2005 by Damien Katz, a former IBM Lotus Notes developer, CouchDB was originally conceived as a storage system for a large-scale object database, and it became an Apache project in 2008. Today companies rely on it: the BBC uses CouchDB for its dynamic content platforms, and Credit Suisse uses it to store configuration details for its market data framework. CouchDB is available under the Apache License 2.0.
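“Embracing the web” refers to CouchDB’s HTTP interface: a document is a JSON object, and a `PUT` to `/{database}/{document id}` stores it. The minimal sketch below only builds such a request with Python’s standard library and does not send it. The server address (`localhost:5984`, CouchDB’s default port), the database name `articles` and the document fields are assumptions chosen for illustration.

```python
import json
import urllib.request

# Hypothetical server, database and document id, chosen for illustration only.
COUCHDB_URL = "http://localhost:5984"
DATABASE = "articles"
DOC_ID = "open-source-2012"

# A CouchDB document is just a JSON object; no schema has to be declared first.
document = {
    "title": "Upcoming open source technologies",
    "tags": ["nosql", "big data"],
}

def build_put_request(doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that would store `doc` under `doc_id`."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{COUCHDB_URL}/{DATABASE}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

request = build_put_request(DOC_ID, document)
print(request.get_method(), request.full_url)
# Actually sending it with urllib.request.urlopen(request) needs a running CouchDB server.
```

Because the interface is plain HTTP and JSON, any language or tool that can issue web requests can talk to the database without a special driver.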





8. MongoDB:


Another NoSQL database technology to make this list is MongoDB. Developed by the founders of DoubleClick, MongoDB makes integrating data in certain types of applications easier and faster, because it stores data as JSON-like documents with dynamic schemas. Its developer, 10gen, was founded in 2007 and offers commercial licenses for MongoDB. Today enterprises such as MTV Networks, Craigslist, Disney Interactive Media Group, The New York Times and Etsy use its technology. MongoDB is available under the GNU Affero General Public License, with its language drivers available under the Apache License.
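To illustrate what storing JSON-like documents with dynamic schemas means in practice, the sketch below models a collection as a plain Python list of dictionaries: documents in the same collection carry different fields, and a query matches on whichever fields are present. This is an in-memory illustration only, not the MongoDB driver API; the `find` helper and the sample documents are hypothetical.

```python
from typing import Any

# A "collection" of JSON-like documents. The documents do not share one fixed
# set of fields, which is what a dynamic schema permits.
collection: list[dict[str, Any]] = [
    {"_id": 1, "name": "CouchDB", "kind": "database"},
    {"_id": 2, "name": "Scribe", "kind": "log server", "origin": "Facebook"},
]

def find(docs: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the documents whose fields equal every key/value pair in `query`.

    Hypothetical helper that mimics a simple equality filter; a field that is
    missing from a document simply fails to match.
    """
    return [d for d in docs if all(k in d and d[k] == v for k, v in query.items())]

# A new document can carry an extra field without any schema migration.
collection.append({"_id": 3, "name": "MongoDB", "kind": "database", "tags": ["nosql"]})

print([d["name"] for d in find(collection, {"kind": "database"})])  # → ['CouchDB', 'MongoDB']
```

The design point is that the application, not the database, decides which fields a record has, so adding a field does not require altering a table definition first.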



7. Apache Cassandra:


An open source distributed database management system, Apache Cassandra is another NoSQL data store, originally developed by Facebook to power its Inbox Search feature. Facebook dropped Cassandra in 2010 in favor of HBase, but other companies, such as Netflix, still use it as a back-end database. Apache Cassandra is also available under the Apache License 2.0.






6. Apache HBase:


Another open source big data technology to make this list is Apache HBase, one of a handful of open source NoSQL data stores. Designed as a non-relational, column-oriented distributed database, HBase runs on top of the Hadoop Distributed File System (HDFS). In 2010 Facebook adopted HBase for its messaging platform, since it provides fault-tolerant storage and access to large quantities of sparse data. Apache HBase is available under the Apache License 2.0.
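HBase’s data model, which follows Google’s Bigtable design, is often described as a sparse, sorted map: each row key maps to cells addressed by `column family:qualifier`, and only cells that actually hold a value are stored. The Python sketch below imitates that idea in memory. The `info` column family, the row keys and the `put`/`get` helpers are hypothetical and are not the HBase client API.

```python
from collections import defaultdict
from typing import Optional

# row key -> {"family:qualifier" -> value}. Cells a row does not use are simply
# absent, so sparse rows take no space for their empty columns.
table: dict[str, dict[str, str]] = defaultdict(dict)

def put(row: str, column: str, value: str) -> None:
    """Store one cell (hypothetical helper; real HBase also keeps timestamped versions)."""
    table[row][column] = value

def get(row: str, column: str) -> Optional[str]:
    """Return a cell value, or None if that cell was never written."""
    return table.get(row, {}).get(column)

put("user#001", "info:name", "alice")
put("user#001", "info:city", "Auckland")
put("user#002", "info:name", "bob")  # no city for this row, so nothing is stored

# HBase keeps rows sorted by row key; this sketch sorts at scan time instead.
for row_key in sorted(table):
    print(row_key, table[row_key])
```

This is why sparse data suits HBase: a row with two populated columns out of thousands costs only those two cells.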





5. ElasticSearch:


ElasticSearch is a free, open source search server written in Java. It is a distributed, RESTful search server built on top of Apache Lucene. Created by Shay Banon, ElasticSearch grew out of Banon’s work on what was to be the third version of Compass, an open source Java search engine framework. Banon built ElasticSearch as his long-awaited answer to the need for a ‘scalable search solution’. ElasticSearch is available as open source under the Apache License 2.0. Used by firms such as StumbleUpon and Mozilla, ElasticSearch supports real-time search and multitenancy without special configuration.
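Lucene, the library ElasticSearch is built on, answers full-text queries from an inverted index that maps each term to the documents containing it. The toy Python sketch below shows only that core idea. The whitespace tokenizer and the AND-style `search` helper are simplifications assumed for illustration; real Lucene adds text analyzers, relevance scoring and on-disk index segments.

```python
from collections import defaultdict

documents = {
    1: "distributed RESTful search server",
    2: "open source Java search engine framework",
    3: "distributed log aggregation server",
}

# Build the inverted index: term -> set of ids of documents containing the term.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():  # naive whitespace tokenizer (assumption)
        index[term].add(doc_id)

def search(query: str) -> list[int]:
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

print(search("distributed server"))  # → [1, 3]
print(search("search"))              # → [1, 2]
```

Looking up a term is a single dictionary access instead of a scan over every document, which is what makes full-text search fast; ElasticSearch then spreads such indexes across the nodes of a cluster.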



4. Scribe:


Developed by Facebook and released as open source in 2008, Scribe is a server for aggregating log data streamed in real time from a large number of servers. Scribe was designed to handle Facebook’s own big data challenges with log data, and the company now says it handles tens of billions of messages a day. Scribe is available as open source under the Apache License 2.0.
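Roughly, Scribe’s model is that many servers send log messages, each tagged with a category, to an aggregating server that groups them by category and writes them out in batches. The Python sketch below imitates that flow in a single process. The `LogAggregator` class, its batch size and the category names are hypothetical and do not reflect Scribe’s actual Thrift interface or configuration.

```python
from collections import defaultdict

class LogAggregator:
    """Hypothetical in-process stand-in for a central log-aggregating server."""

    def __init__(self, batch_size: int = 3) -> None:
        self.batch_size = batch_size
        self.buffers: dict[str, list[str]] = defaultdict(list)
        self.stored: dict[str, list[list[str]]] = defaultdict(list)  # flushed batches per category

    def log(self, category: str, message: str) -> None:
        """Accept one message from any server and buffer it under its category."""
        buffer = self.buffers[category]
        buffer.append(message)
        if len(buffer) >= self.batch_size:
            self.flush(category)

    def flush(self, category: str) -> None:
        """Move a category's buffered messages into storage as one batch."""
        if self.buffers[category]:
            self.stored[category].append(self.buffers[category])
            self.buffers[category] = []

aggregator = LogAggregator(batch_size=2)
# Messages arriving from several simulated web servers.
for server in ("web1", "web2", "web3"):
    aggregator.log("access", f"{server}: GET /")
aggregator.log("errors", "web2: timeout")
for category in list(aggregator.buffers):
    aggregator.flush(category)

print(dict(aggregator.stored))
```

Batching per category means downstream storage sees a few large writes instead of one write per log line, which is the point of putting an aggregator between many servers and storage.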










3. Cascading:


Available under the GNU General Public License, Cascading is an open source software abstraction layer for Hadoop. It allows users to create and execute data processing workflows on Hadoop clusters using any JVM-based language. Designed by Chris Wensel as an alternative to the complexity of writing MapReduce jobs directly, Cascading is used for ad targeting, log file analysis, bioinformatics, machine learning, predictive analytics, Web content mining and ETL applications. Wensel also founded a company, Concurrent, to provide commercial support for Cascading.
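Cascading lets developers describe a workflow as a chain of pipes (functions, filters, groupings and aggregations) and plans the underlying MapReduce jobs for them. Cascading itself is a Java API; the Python sketch below only mimics the idea of composing named steps into one pipeline instead of hand-writing map and reduce phases. The `Pipeline` class and its `each`/`group_count` methods are hypothetical and only loosely echo Cascading’s Each, GroupBy and Every pipes.

```python
from collections import Counter
from typing import Callable, Iterable

class Pipeline:
    """Hypothetical chain of processing steps applied to a stream of records."""

    def __init__(self) -> None:
        self.steps: list[Callable[[Iterable], Iterable]] = []

    def each(self, fn: Callable) -> "Pipeline":
        """Apply fn to every record; fn returns a list of zero or more output records."""
        self.steps.append(lambda records: (out for r in records for out in fn(r)))
        return self

    def group_count(self) -> "Pipeline":
        """Group identical records and count them (like a GroupBy followed by a count)."""
        self.steps.append(lambda records: sorted(Counter(records).items()))
        return self

    def run(self, records: Iterable) -> list:
        for step in self.steps:
            records = step(records)
        return list(records)

# Log file analysis: count HTTP status codes, skipping malformed lines.
log_lines = ["GET / 200", "GET /a 404", "bad line", "POST /b 200"]
pipeline = (
    Pipeline()
    .each(lambda line: [line.split()] if len(line.split()) == 3 else [])  # parse and filter
    .each(lambda fields: [fields[2]])                                      # keep the status code
    .group_count()
)
print(pipeline.run(log_lines))  # → [('200', 2), ('404', 1)]
```

The developer states what happens to the records; deciding which steps become map tasks and which become reduce tasks is left to the framework, which is the abstraction Cascading provides over raw MapReduce.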




2. R:


R is an open source programming language and software environment for statistical computing and graphics. Ross Ihaka and Robert Gentleman created R at the University of Auckland, New Zealand, in 1993. Today R is considered one of the best tools for statistical analysis, and it has been popularized by Revolution Analytics, which offers commercial services and support for R following a model inspired by Red Hat’s support for Linux.







1. Apache Hadoop:


Created by Doug Cutting, Apache Hadoop is considered one of the best technologies for storing and processing structured, semi-structured and unstructured data. Built as an open source software framework for data-intensive distributed applications, Apache Hadoop stores data across a cluster of nodes. Cutting, who named the project after his son’s toy elephant, built a MapReduce facility and a distributed file system to meet the distributed processing needs of Nutch, his open source web search engine project. The technology, later known as Apache Hadoop, became so popular that it is now considered one of the best open source technologies.
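The MapReduce facility mentioned above splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that combines each group. The single-process Python sketch below walks through the classic word-count example. The function names are hypothetical; on a real Hadoop cluster the map and reduce tasks run in parallel on the nodes that hold the data.

```python
from collections import defaultdict
from typing import Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterator[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as happens between the map and reduce phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key; for word count, sum them."""
    return key, sum(values)

def word_count(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(word_count(["big data big clusters", "data nodes"]))
# → {'big': 2, 'clusters': 1, 'data': 2, 'nodes': 1}
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be spread across many machines, which is what lets Hadoop scale to very large data sets.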





