Archive for the ‘Cloud computing’ Category

BigData – A primer

August 29, 2012

“Big Data” is not new to any of us. We see it every day, every moment. We contribute to Big Data every minute and keep making it bigger.

Yes, even before you landed on this post, you very likely clicked a couple of links that recorded your IP address, geographic location, the website you were browsing and several other details on some server.

And your smartphone might have been conversing with its manufacturer about the current version of its OS, a recent crash report or updates for existing applications. Multiply that by hundreds of millions of users like you and me, and a vast amount of data is generated every hour.

A Boeing 737 generates 240 TB of data in a single cross-country flight; the velocity of data generation is very high.

So what happens to all the information generated at this rate? It can easily amount to a few hundred gigabytes, a few terabytes or even petabytes.

So, Big Data technologies are all about answering the following two questions:

  1. How and where do you capture all this data?
  2. How do you organize and make meaningful business decisions based on this data?

Capturing Big Data

The data fed by sources such as vast numbers of surveillance cameras, microphones, a wide variety of sensors, mobile phones, Internet click streams, tweets and Facebook messages is, for the most part, not structured. The velocity of this data demands very high write performance from the datastore, so much so that the ACID properties promised by an RDBMS could themselves become a performance bottleneck. Poor write performance means an inability to capture the data as it arrives.

Availability of the store at all times necessitates that the data be distributed on multiple servers, and that brings in the problems of replication and consistency.

Simple key-value stores that work without conforming to ACID principles have evolved in recent times. These stores accept an application-defined “key” and some “value” and persist the record according to a preset configuration. They offer a variety of SLAs for replication, consistency and speed of access. These are also called NoSQL databases.

Many NoSQL databases are, at heart, distributed hash tables that store “items” indexed by “keys”.
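To make the idea concrete, here is a minimal sketch of how a distributed hash table can decide which server owns a key. It uses consistent hashing on a ring, a technique described in Amazon's Dynamo paper; the server names, the MD5 hash and the 100 virtual nodes per server are illustrative assumptions, and each “server” is just an in-memory Python dict rather than a real host.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the hash ring (a 128-bit integer from MD5)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Assign keys to nodes with consistent hashing; virtual nodes smooth the spread."""

    def __init__(self, nodes, vnodes_per_node=100):
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes_per_node):
                self._ring.append((ring_position(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node on the ring."""
        idx = bisect.bisect_right(self._positions, ring_position(key))
        if idx == len(self._ring):  # past the highest position: wrap around
            idx = 0
        return self._ring[idx][1]


class DistributedKV:
    """Toy DHT: every node is a local dict, and put/get are routed by key."""

    def __init__(self, nodes):
        self.ring = HashRing(nodes)
        self.stores = {node: {} for node in nodes}

    def put(self, key, value):
        self.stores[self.ring.node_for(key)][key] = value

    def get(self, key):
        return self.stores[self.ring.node_for(key)].get(key)


kv = DistributedKV(["server-a", "server-b", "server-c"])  # hypothetical hosts
kv.put("user:42", {"name": "Alice", "city": "Hyderabad"})
print(kv.get("user:42"))  # {'name': 'Alice', 'city': 'Hyderabad'}
print(kv.ring.node_for("user:42"))  # whichever server owns this key
```

The reason for the ring rather than a plain `hash(key) % number_of_servers` is that adding a server only moves the keys the new server takes over; all other keys stay where they were.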

As per the CAP theorem, a distributed system cannot guarantee all three of Consistency, Availability and Partition tolerance at the same time; you need to sacrifice one of them. NoSQL databases are generally built to operate in distributed environments (although they can run on a single host). They are optimized for very high write performance by following BASE properties instead of the ACID properties that an RDBMS follows.

BASE: Basically Available, Soft state, Eventual consistency
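Here is a toy illustration of the BASE idea, written as a sketch rather than a description of how any particular database works: each replica accepts writes on its own (basically available), replicas may disagree for a while (soft state), and a background sync, often called anti-entropy, makes them converge (eventual consistency). Conflicting writes are resolved with a simple last-write-wins rule on a timestamp; real systems use various strategies, and the replica names and integer timestamps are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; each value is stored with the timestamp of its write."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Accept the write locally without coordinating with other replicas,
        # keeping it only if it is newer than what this replica already has.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Exchange state so every replica ends up with the newest version of each key."""
    for source in replicas:
        for key, (ts, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, ts)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.write("cart:7", ["book"], ts=1)
r2.write("cart:7", ["book", "pen"], ts=2)  # a newer write lands on another replica

print(r3.read("cart:7"))  # None: r3 has not heard about either write yet
anti_entropy([r1, r2, r3])
print([r.read("cart:7") for r in (r1, r2, r3)])  # every replica now returns ['book', 'pen']
```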

Data from the application need not be normalized into multiple tables (as with an RDBMS), so an object is written or read in one shot, into a single key-value table. All the data for an object resides in one place and is co-located on disk, so reading it is a sequential read, which means very high disk throughput.
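A small sketch of the difference, with in-memory Python structures standing in for tables and for the store: the “normalized” order has to be reassembled from two tables, while the key-value version keeps the whole object as one JSON value under one key. The key format `order:101` and the sample data are assumptions, and the sketch does not model the on-disk co-location itself.

```python
import json

# Normalized (RDBMS-style): one order is spread across two "tables".
orders = {101: {"customer": "Alice"}}
order_items = [
    {"order_id": 101, "sku": "book", "qty": 1},
    {"order_id": 101, "sku": "pen", "qty": 3},
]


def read_order_normalized(order_id):
    """Reassemble the order by looking in both tables (the equivalent of a join)."""
    order = dict(orders[order_id])
    order["items"] = [
        {"sku": row["sku"], "qty": row["qty"]}
        for row in order_items
        if row["order_id"] == order_id
    ]
    return order


# Key-value style: the whole order is a single value under a single key.
kv_store = {}
kv_store["order:101"] = json.dumps(read_order_normalized(101))  # one write


def read_order_kv(order_id):
    """Fetch and decode the entire object with one lookup."""
    return json.loads(kv_store[f"order:{order_id}"])


print(read_order_kv(101))
```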

Key-value stores are commonly classified into four types based on the type of value they store:

  1. Simple key-value stores (Amazon Dynamo)
  2. Graph databases (FlockDB)
  3. Column families (Cassandra)
  4. Document databases (MongoDB)

Making sense out of Big Data

MapReduce is a programming model for processing large data sets.

The idea is to send the code to where the data resides, because we are talking about large data sets, and moving them around could be expensive and time-consuming.


Map: In this step, the problem is divided into sub-problems that are assigned to worker nodes (typically where the data resides).


Reduce: The results are gathered from the worker nodes and merged to produce the final result.
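A single-process sketch of the classic word-count example makes the two steps concrete. Each inner list stands in for a block of data held by one worker node; the grouping step in the middle (“shuffle”) is what frameworks such as Hadoop do between Map and Reduce, and it is an addition to the two steps described above. Nothing here actually runs in parallel, and the sample text is made up.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each inner list stands for a block of data on one worker node.
partitions = [
    ["big data is big", "data moves slowly"],
    ["move the code to the data"],
]


def map_phase(lines):
    """Runs where the lines live: emit a (word, 1) pair for every word."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_outputs):
    """Group intermediate pairs by key so that each word reaches one reducer."""
    groups = defaultdict(list)
    for word, count in chain.from_iterable(mapped_outputs):
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Merge all partial results for one key into the final value."""
    return word, sum(counts)


mapped = [map_phase(p) for p in partitions]  # a real cluster runs these in parallel
grouped = shuffle(mapped)
result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result["data"], result["big"])  # 3 2
```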

Example: Apache Hadoop


Technology trends (in search volumes)

January 21, 2011

Trends in search patterns for technologies & businesses in the decade that ended in 2010

These graphs from Google Trends may not reflect the exact number of people interested in each of these technologies over time, but I believe they are true to some extent. Here are my personal opinions on how the trends in the industry have been and are going to be.


Figure: Google Trends search volume, Artificial Intelligence vs. Cloud Computing vs. Service Oriented Architecture vs. NoSQL vs. Virtualization

The picture clearly shows that A.I. ruled for about three years in a row, though the volume of search queries in this space has been slowly falling. We have seen many A.I. products come to light: chat bots, spam filters and fraud detection systems have all become popular over time by incorporating Natural Language Processing and machine learning. In spite of more than a decade of study and R&D, A.I. still has a great deal of change to bring to the industry. Knowledge representation and its application can get a lot better than they are today!

For A.I., it has already been a long journey compared to the other technologies: Google Trends data is available only from 2004, but A.I. as a field actually dates back to 1956 (Ref: Wikipedia). To my knowledge, Artificial Intelligence has been in university curricula since the late 90s, or perhaps even before that. Research & development has also been “on” ever since, and it appears that the search trend had its highest peak in early 2004.

When it comes to “Virtualization” and “Cloud computing”, there is a steep rise in the volume of search queries from mid-2004 and 2007 respectively. Cloud computing has revolutionized the market, so much so that Larry Ellison, CEO & founder of Oracle Corporation, who once said that Cloud computing is an insane concept (Ref: 1, 2, 3), is now providing the Oracle 11g database on Amazon Web Services, and his team is presenting on the cloud at various conferences.

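Cloud computing dramatically drives down costs and improves overall server utilization, because many customers’ workloads share one pool of physical servers instead of each customer keeping its own machines idle outside peak hours. This one factor helped the technology create shock waves in business computing, and that is why the spike is so steep for this technology.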

The NoSQL movement, MapReduce and the CAP theorem started slowly gaining popularity in mid-2008, as per the graph above. These technologies have had a great impact on the way businesses classify and handle the data they work with. They scale to vast sizes and offer high availability.

Now that we have observed the technology trends, let us look at the businesses and business areas that have had the edge over these years.

Figure: Google Trends search volume, Oracle vs. Microsoft vs. Google vs. Yahoo vs. Wikipedia

Yahoo, Google and Wikipedia are ahead of the software giants Oracle and Microsoft. It can be inferred that open-source and free-to-use services gained the edge.

It goes without saying that open-source & free-to-use websites are the way to go for startups, but it is worth noting that it is the technology & the scope of business you choose that make the difference! Know the trend before you hit the ground.




FAQs on the cloud

January 3, 2011

What are people thinking about cloud computing?


1. What is this thing called “Cloud computing”?

a. Cloud computing is a kind of generalized, automated & integrated virtualization.


2. Why is there a lot of hype all over?

a. There is a lot of hype because cloud computing is a revolution to come.


3. Is it something fictional? Is it really possible?

a. It is not fictional. If you believe you can create more than a few virtual machines on a single physical computer, Cloud computing is very much possible. In fact, it is proven and in use today; you cannot really question its practicality.


4. How can a Virtual Machine (VM) be scaled beyond the capacity of the underlying host?

a. Not one but a whole lot of physical servers serve as the cloud. Usually a master host controls the behavior of the rest of the hosts. It places VMs whose workload peaks and valleys are uncorrelated onto the same physical host, and it does this transparently to the user.
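To illustrate why uncorrelated peaks matter, here is a sketch of a first-fit placement heuristic. It assumes made-up load profiles (six time slots instead of 24 hours) and a host capacity of 100%; first-fit is just one simple heuristic chosen for illustration, not necessarily what any cloud's master host actually runs.

```python
# Hypothetical CPU demand of four VMs per time slot, as a percentage of one host.
vm_load = {
    "web-day": [70, 80, 60, 10, 5, 5],  # busy during the day
    "batch-night": [5, 5, 10, 70, 80, 75],  # busy at night
    "web-day-2": [60, 70, 65, 10, 10, 5],
    "report-night": [10, 5, 5, 60, 70, 80],
}
HOST_CAPACITY = 100


def fits(host_profile, vm_profile, capacity=HOST_CAPACITY):
    """A VM fits if the combined load stays within capacity in every time slot."""
    return all(h + v <= capacity for h, v in zip(host_profile, vm_profile))


def place(vms, capacity=HOST_CAPACITY):
    """First-fit: put each VM on the first host where it fits, else start a new host."""
    hosts = []  # each host: {"vms": [...], "profile": [...]}
    for name, profile in vms.items():
        for host in hosts:
            if fits(host["profile"], profile, capacity):
                host["vms"].append(name)
                host["profile"] = [h + v for h, v in zip(host["profile"], profile)]
                break
        else:
            hosts.append({"vms": [name], "profile": list(profile)})
    return hosts


for i, host in enumerate(place(vm_load)):
    print(f"host {i}: {host['vms']} peak={max(host['profile'])}%")
```

With these invented profiles every VM peaks at 70% to 80% of a host, so no two of them could share a host if we sized by peaks alone; pairing day-heavy and night-heavy VMs fits all four onto two hosts, each peaking at 85%.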


5. I don’t think a VM on the cloud is as efficient as a physical, dedicated server for my application. Shouldn’t I opt for a dedicated server?

a. Latencies may creep in, but there is always a trade-off. Evaluate the TCO (Total Cost of Ownership) for your business. Physical dedicated servers **may** do well, but if I were you, I would weigh maintenance costs like datacenter cooling, power, building lease and other miscellaneous maintenance, and also overall server utilization, against the efficiency of the application. I would not want my servers to be idle during non-peak hours: I pay the same amount out of pocket during non-peak hours as during peak hours. I want my business to be **cost-effective**.
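A back-of-the-envelope comparison in the spirit of this TCO argument: a dedicated server costs the same every month whether it is busy or idle, while pay-per-use billing charges only for the hours used. Every dollar figure below is hypothetical, the month is assumed to be 30 days, and the model ignores data transfer, storage and staffing.

```python
DAYS_PER_MONTH = 30  # simplifying assumption


def dedicated_monthly_cost(hardware, power_and_cooling, lease, maintenance):
    """A dedicated server costs the same whether it is busy or idle."""
    return hardware + power_and_cooling + lease + maintenance


def cloud_monthly_cost(hourly_rate, busy_hours_per_day, days=DAYS_PER_MONTH):
    """Pay-per-use: only the hours the instance actually runs are billed."""
    return hourly_rate * busy_hours_per_day * days


def break_even_hours_per_day(dedicated_cost, hourly_rate, days=DAYS_PER_MONTH):
    """Daily usage above which the dedicated server becomes the cheaper option."""
    return dedicated_cost / (hourly_rate * days)


# Hypothetical figures in dollars per month (hardware amortized over its lifetime).
dedicated = dedicated_monthly_cost(hardware=400, power_and_cooling=150, lease=100, maintenance=150)
cloud = cloud_monthly_cost(hourly_rate=1.50, busy_hours_per_day=10)

print(f"dedicated: ${dedicated:.2f}/month, cloud: ${cloud:.2f}/month")
print(f"break-even: {break_even_hours_per_day(dedicated, 1.50):.1f} busy hours per day")
```

With these numbers the cloud instance is pricier per hour (the dedicated server works out to about $1.11 an hour if it ran all 720 hours of the month), yet cheaper overall because the dedicated server would sit idle 14 hours a day; past roughly 18 busy hours a day the balance tips the other way.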



Test drive on the cloud using Amazon Web Services

September 1, 2009


I wish to write a couple of lines on working with the Amazon Elastic Compute Cloud (EC2) today.
Of course, visiting the AWS website would be enough to understand how things work there; this one is a kind of primer that outlines what’s in store!

Getting a server up and running is as simple as 1..2..3…

1. Register
2. Choose and submit your payment info.
3. Create an instance (choose your server configuration), and launch.

The server configuration may be chosen manually, or from a list of “Community instances” (pre-configured).

The server, with all the resources you’ve requested along with the OS and other application software such as Oracle Database or SQL Server, is delivered in a jiffy, as fast as you click the launch button!

Amazon Web Services provides you with a private-key file with which you can log in to the instance. All other tools, such as those for monitoring resource usage, are readily available to choose and use.
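For readers who prefer a script to the console, here is a sketch of step 3 using the boto3 SDK for Python (which postdates this post). `run_instances` with its `ImageId`, `InstanceType`, `KeyName`, `MinCount` and `MaxCount` parameters, and the `instance_running` waiter, belong to boto3's EC2 client; the helper names, the default region and the example values are assumptions. An actual launch needs boto3 installed and AWS credentials configured, and it incurs charges.

```python
def run_instances_args(image_id, instance_type, key_name):
    """Request parameters for launching exactly one instance."""
    return {
        "ImageId": image_id,  # the machine image, e.g. a pre-configured community image
        "InstanceType": instance_type,  # the server configuration
        "KeyName": key_name,  # key pair whose private-key file you log in with
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_instance(ec2_client, image_id, instance_type, key_name):
    """Launch one instance through an EC2 client and return its instance ID."""
    response = ec2_client.run_instances(**run_instances_args(image_id, instance_type, key_name))
    return response["Instances"][0]["InstanceId"]


def launch_with_boto3(image_id, instance_type, key_name, region="us-east-1"):
    """Real launch: requires boto3 and configured AWS credentials."""
    import boto3  # imported here so the helpers above work without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    instance_id = launch_instance(ec2, image_id, instance_type, key_name)
    # Block until EC2 reports the instance as running.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id
```

For example, `launch_with_boto3("ami-0123456789abcdef0", "t2.micro", "my-key-pair")` would launch one instance; the AMI ID and key-pair name are placeholders you would replace with your own. Passing the client into `launch_instance` keeps the request logic testable without touching AWS.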

Technical team that a cloud demands:

The cloud not only brings about a revolution across small and mid-sized businesses, but also creates a shock wave in the fields of infrastructure, networking and DBA manpower.

It happens this way…
Say a company, Xyz Pvt Ltd of India, is to host its online sales and eCommerce website, where the transaction volume is about one million per day. The company cannot afford to spend a million dollars on a server, and without such a server its database/app server cannot possibly handle about a million transactions per day. A Cloud enables the company to do this.
With the advent of Cloud computing, the company can now pay about 100 dollars per day and host its website on a sufficiently capable server.
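Using only the two figures in this example (a one-million-dollar server versus about 100 dollars a day in the cloud), the break-even arithmetic looks like this. It deliberately ignores power, cooling, staff and hardware refresh, which would only add to the cost of owning the server.

```python
server_price = 1_000_000  # dollars, paid up front (figure from the example)
cloud_per_day = 100  # dollars per day for a capable cloud server (figure from the example)

cloud_per_year = cloud_per_day * 365
break_even_days = server_price / cloud_per_day

print(cloud_per_year)  # 36500
print(break_even_days)  # 10000.0
print(round(break_even_days / 365, 1))  # 27.4
```

In other words, the company would have to run in the cloud for more than 27 years before it had spent what the server alone would cost.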

This is only a small illustration of how Cloud computing helps small and mid-sized companies. If there are tens of thousands of such companies in a city like Hyderabad [India], at least thousands of them would come up with their own websites for eCommerce, online sales or customer surveys, where there were only a few earlier.

Now let us change our perspective and look at it from the side of a company that offers this cloud, say Abc Pvt Ltd of India. Say about 50 to 100 companies are using the cloud offered by Abc. Abc has to make sure that its cloud is up and running all the time. If it offers about 100 high-end servers to its customers, imagine the workforce it would need to look after them.

Cloud computing: A revolution to come

August 19, 2009

(Originally posted on: Wed, Aug 19, 2009)

I would like to share my thoughts on Cloud computing. As an admirer of technology and sciences, and more as a professional in computer software engineering, I have been reading articles about cloud computing on the web.
The meaning, importance, power and almost every other thing that relates to Cloud computing can be understood from the very name. It is self-explanatory.
Here, the objects (software/hardware) are available in the form of a cloud, which any authorized user can access. It is resizable on the go: one can upsize or downsize a server as easily as posting a message to a friend on a social networking site. No wires, no cables, no power and obviously no hassles, no worries.

Today, there are many Independent Software Vendors (ISVs hereafter) who cannot afford full-sized web servers and database servers for the applications they develop, because the costs of procuring and maintaining them are far too high, really far beyond imagination. Another reason they do not procure their own servers is scalability.

Now that this part is taken care of by the cloud provider, the ISV pays only for the machine, and only for the time it uses it!
Now suppose that there are about 100 such ISVs, and that, with the advent of cloud computing, 50 of them will host their applications on their own servers.
Now the purpose or domain of these applications is what brings competition among these 50 ISVs.
But let me interrupt the narration about these 50 ISVs and ask you a simple question: do you think there are only 100 such ISVs in the real world? No. There are millions. So imagine the competition and the revolution that is to come.

A picture is worth 1000 words… A video is worth a thousand pictures…