What’s Wrong with Big Data?

We know that the whole world, led by the tech sector, is surfing the Big Data wave. Billions of bytes of data will have been collected, cleaned up, indexed and mined by the time I finish this sentence. According to Wikipedia, as of 2012, roughly 2.5 exabytes (2.5×10^18 bytes) of data were being collected every day. The promises of mining that data range from cloud-intelligence health care to driverless automobiles.

What could possibly be wrong with Big Data anyway?

There is nothing inherently wrong. But there are two problems I can point out.

  1. Big Data is too big. It is incomprehensible. Big Data remains infertile unless you can reduce it to Small Data, which can be summarized, visualized and put to use. Right there is the metaphorical big elephant in our Big Data living room. How do we handle that complexity? Google, Facebook and other tech giants are already scratching their corporate heads to unscramble this puzzle. It is doable, but expensive and often erratic.
  2. Big Data convinces us that we can all become mini-statisticians. “Show me the data” is the new mantra. We often mistake correlation for causation. With Big Data, this tendency can get even more accentuated.

There are plenty of avenues for making use of the “small data” available to us. One of them is our own personal data: what we eat, how much we exercise, what our biometric readings are, and so on. Data collection is going to be easier than ever with the emergence of wearable technology. If we could collect our own personal data and dive a little deeper, we could derive intelligence about ourselves, and it could have a positive impact on our lives. On a personal level, it is a worthier quest than crunching mega-billion abstract data points.

A big shout out to all those Life Logging ninjas! And to the good folks at Quantified Self.

A Request to Pocket App

Pocket is a gorgeous app, available on almost all mobile platforms and the web. It helps you save online reading for later offline reading. You can tag the articles and have a little organization around them. With simple and elegant controls, it provides clutter-free reading pleasure. It can also remember where you left off in individual articles. If you haven’t used it, please check it out. Trust me, you will love it.

Pocket Hits

Once every few days, Pocket automatically curates a set of articles that match my reading interests and sends them as an email newsletter. They call it “Pocket Hits”. For me, this newsletter content is almost always right on target. I love to read right out of my inbox. It’s as if the algorithm can read my mind. Under each article, there is a link to save it to your Pocket account (see picture). Usually the newsletter contains 10 article links, so you have to click 10 times to save them all. But it is worth the effort. Since the articles are saved for offline reading, it is perfect for subway reading, when you are completely disconnected from the web.

But …..

Wouldn’t it make sense to add an extra link like “Save all to Pocket”? What do you say, Pocket team?


Two Tools for Teaching Children to Code

Why should we teach our children to code?

  • Because it gives them the ability to see what’s going on under the hood.
  • Because it sharpens their logical reasoning.
  • Because it empowers them to create instead of consume.
  • Because they may just like it and someday they might want to make a living out of coding.
  • Because it makes them “cool” among their peers.
  • Because it helps them to work better with computer professionals even if they choose a different line of specialization.
  • Because it is pure fun. Coding gives them instant gratification. Working code can show them magic unfolding in front of their eyes.

I am sure I wasn’t persuasive enough but please give it a try. There are tons of resources out there online to help you choose the right toolkit. It will be a true gift to your children.

To begin with, here are a couple of links to check out. Scratch from MIT is really easy and fun for young kids aged 5-8. App Inventor (created by Google and later donated to MIT) lets you create real Android apps using some event-driven programming concepts. It is appropriate for ages 9-15. A child who can do Legos can do coding as well with either of these tools. Both are great tools for teaching.

Right to be forgotten

Our status updates and pictures live on servers. Where do they go to die?

Early this year, an EU court asked Google and other search engines to provide a way for people to request to be forgotten, and Google complied without dispute. That decision may conflict with our right to know, but I will save that topic for another day. Today I am talking about something related.

This year, two of my friends on Facebook passed away. Their Facebook profiles are still alive. I fear that someday a strange Facebook algorithm will pull out an old memory from their profiles and present it on my timeline. I am not sure if they had written a will on how they wanted to settle their belongings and real estate, but I am sure they didn’t have the means to prepare a cyber will.

This is something that Facebook, Google and other service providers, who offer cloud and social networking services in exchange for our privacy, could provide. As consumers who are willing to barter details of our private lives, we have a right to be forgotten after we are gone from this world. We should be able to decide how long a post, a picture or a profile should live.

What we blather often dissipates into thin air. Our books, letters and paper documents fade, get eaten by termites or get recycled. Status updates, on the other hand, are God-like: invincible, pervasive and deathless. They live limitless lives in server farms, get cloned for backups, mined for profits and surveillance, and sold for unknown ransoms without our knowledge. Your words and pictures are not yours anymore once they bid farewell to your keyboard.

Unfortunately, true privacy is not a viable option anymore. The Internet has changed the way we live our lives. Completely.

So, as customers, citizens, and most of all, as humans, who desire to live honorable and precious lives, we demand the right to be forgotten for the content we post on the web.

Our status updates, too, need a time and place to die.

What should Windows do

Well, the title of the post should have been “What should Microsoft do”, but right now I am not interested in talking about Microsoft Office or Xbox or Bing. For our discussion, I prefer to treat “Windows” as a stand-alone organization.

In all the places I have worked, I was required to use Windows desktops. At home, we own a couple of devices running different flavors of Windows. I am not planning to decommission Windows from my life anytime soon. As a longtime Windows customer (albeit a reluctant one at times), I am worried about its future.

The announcement of the new Windows (Windows 10) created a reasonable amount of social media buzz and somewhat lukewarm expectations for the (once) software giant. The Twitter crowd started joke-streaming the popular elementary school humor of “7 ate 9”.

The release is scheduled for late 2015, TBD. “To Be Decided” is the problem that Microsoft has lately. Looking around, we can see that the industry has shifted to rolling out major versions of software at least once every year. Customers have become more savvy about the nuances of the feature sets that their favorite OS vendors bring in. Millions of people now watch Apple, Google and Samsung launch events.

Traditionally, Windows adopted a refresh cycle of 2-3 years. It made sense in the past because Microsoft enjoyed a universal monopoly on PC operating systems, and the smallest units of personal computing available to customers were desktops and laptops. That beautiful candyland doesn’t exist anymore. Mobile has become ubiquitous. Apple and Google now hold the lion’s share of that market, while Microsoft has been marginalized to a bystander. Microsoft tried to change its strategy: it quickly dumped the well-designed XP/Windows 7 products and embraced the metro-style tablo-desktop-sundae OS called Windows 8. I can’t think of a product that was hated by more of its fanbase than Windows 8. And it took 3 years for Microsoft to release it.

With new leadership at its helm, Microsoft announced the next version of its OS last month. The new version of Windows, which is expected to clean up after Windows 8, will come out shortly (well, in 2015, TBD to be precise!).

What Windows should do is commit to shorter cycles of OS iterations that rejuvenate the market every once in a while. If a version fails to capture the imagination of its customer base, let it fail quickly and recover from that failure faster. Windows 8 came out in October 2012. It should not take 3 years for a software behemoth to recover from a failure as staggering as that of Windows 8. (Did you say Windows 8.1? That doesn’t count.)

The first and foremost action Windows should take is to shorten its refresh cycle. Now more than ever, we need Windows in the marketplace.

Universal Back Button for Fluid Navigation

I usually take a long train ride every day to reach work. Most days, I have to switch trains in between. Following the morning crowd, I get off the train, go down the stairs and go up an escalator to board another train. This is the most annoying part of my daily commute. Very rarely, though, the first train stops right at a platform adjacent to my next train. How I rejoice in those moments!

What if I could sit in the same car while it switched trains, without my getting off at all? That would be a dream commute, wouldn’t it? You could stay in the same car and get off at your final destination, without even realizing that you had travelled on multiple trains.

What is the single biggest differentiator in user experience between iOS and Android? I would argue that it is the universal back button. It stitches different applications together and lets users navigate back and forth across them. On the other hand, iOS makes you go down the stairs and then catch an escalator to board the next app. I am surprised that in all these years, they never implemented this super useful back button. Maybe they have a design deadlock that prevents them from implementing it.

While the back button is very handy for Android users, it still feels like switching platforms and taking another train. Wouldn’t it make sense to be able to navigate across apps without even realizing that we are operating on many of them? What if the view animations were designed so that app switches are transparent to users? What if Android could give us fluid navigation across apps? That is on my wishlist for Android.

I wonder if Lollipop is going to be sweet enough.

Next Delicacy on the Menu: Python Pickles

Python pickle could be a delicacy that vegans, vegetarians and meat lovers alike might relish. No, I am definitely not talking about serpentine meat in jars of brine stored for eternity. This pickle is from the Python programming language.


Python pickle is an aptly named module that helps us literally “pickle” objects and data from Python code and save them in flat files. They can be reloaded into their original form using the same module. In programming lingo, we call it serialization or marshaling of data, but in simple terms, this is a way of storing data in a format that enables us to resurrect the object if and when required.

Use cases of Pickling
Why do we need pickling in the first place? A few sample uses are listed below:

  • Sending data across a trusted network. You can serialize data into pickle format and send it through a trusted network, so that the recipient can reconstruct the same objects.
  • If a process takes a long time to do a task, and there are intermediate results available, you can pickle the results into a flat file. If the process goes down for some reason, you can bring it back up and have it reconstruct the object from the pickle file. The process can literally start from where it left off.
  • Suppose you have a cluster of processes that work in tandem. One process takes an input, produces an intermediate result and hands it off to another process, and so on. You could employ pickle to serialize the output objects to flat files, and the next process in line can pick them up and start processing as if the object had been handed to it directly.
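To make the second use case concrete, here is a minimal sketch of checkpoint-and-resume, assuming Python 3. The function name run_with_checkpoint, the layout of the state dictionary and the file name are all made up for illustration, and the "slow task" is just a running sum.

```python
import os
import pickle
import tempfile

def run_with_checkpoint(items, checkpoint_path):
    """Sum `items`, saving progress after each one so a restart can resume."""
    # Resume from an earlier checkpoint if one exists (hypothetical state layout).
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as fp:
            state = pickle.load(fp)
    else:
        state = {"next_index": 0, "total": 0}

    for i in range(state["next_index"], len(items)):
        state["total"] += items[i]      # stand-in for the slow work
        state["next_index"] = i + 1
        with open(checkpoint_path, "wb") as fp:
            pickle.dump(state, fp)      # intermediate result goes to the flat file
    return state["total"]

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
    print(run_with_checkpoint([1, 2, 3, 4], path))
```

If the process dies halfway through, calling the function again with the same path picks up at next_index instead of starting over. A real job would probably checkpoint less often than once per item.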

How do we use Pickle?

The pickle module has classes of its own (Pickler and Unpickler) and several helper functions, but essentially you need only two functions from it: dump() and load(). Using dump() you can serialize data to a file-like object (an open file, a socket's file wrapper, etc.), and load() reads the serialized data back and returns the object that was dumped before.
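The module also offers dumps() and loads(), which work on an in-memory byte string instead of a file. They are handy for a quick round trip to see what the pair does. A small sketch, assuming Python 3; the dictionary is just sample data.

```python
import pickle

record = {"name": "Python", "type": "Language", "version": "2.6"}

# dumps() returns the pickled bytes instead of writing them to a file
payload = pickle.dumps(record)

# loads() rebuilds an equal, but separate, object from those bytes
restored = pickle.loads(payload)

print(restored == record)   # True: same contents
print(restored is record)   # False: a new object was created
```

The same bytes are what dump() would have written to the file, so everything said about files here applies to the in-memory form as well.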

Sample Code: Dumping Data

# data_dump.py
# Dumps a dictionary to a flat file

import pickle

def main():
    my_data = {"name": "Python",
               "type": "Language", "version": "2.6"}

    # File opened in binary writable format (wb)
    fp = open("picklejar.dat", "wb")

    # Data is written to the file
    pickle.dump(my_data, fp)

    # The file is closed automatically when the program quits, but
    # to make the code cleaner, we close the file object explicitly.
    fp.close()

if __name__ == '__main__':
    main()

Let us deconstruct the program. The pickle module is going to do all the heavy lifting for us. We have a little dictionary object that we need to store. The first thing to do is to open a file, named as you wish. Remember to open the file in writable binary mode, because the dumped data is not guaranteed to be purely printable ASCII.

Save the code above to data_dump.py and run it.

python data_dump.py

If you check the file system, you should see a file containing the pickled dictionary.

ls -l picklejar.dat

Sample Code: Loading Data
Now we need to reload the data that was stored previously in picklejar.dat.

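# data_load.py
# Loads the dictionary back from the flat file

import pickle

def main():
    # Open the file in read-only binary format (rb), because
    # it was written in binary format.
    fp = open('picklejar.dat', 'rb')
    out = pickle.load(fp)
    fp.close()

    # Print the reconstructed dictionary
    print(out)

if __name__ == '__main__':
    main()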

If you execute this program, you will see the dictionary that we saved in the previous example, with the same contents. Cool, isn’t it?

If you have a skeptical mind (that is a good thing), you will be asking questions. It is wonderful to store a dictionary, but I have many data structures; how would I store them? Can we store native data types? What can we store and what can’t we?

One way to store a large number of objects is to create a simple wrapper object (a list, a dictionary, etc.) that holds all the objects you want to pickle, and then dump that wrapper object itself. You can store not only objects but also native values such as integers and strings. What pickle does not store is the code of your functions and classes: those are recorded by name only, so the program that loads the data must be able to import the same definitions. But if you think about it, methods are your logic, which comes to life as soon as you instantiate the object. The missing piece of the puzzle is the data itself, which pickle easily saves and restores.
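Here is a rough sketch of the wrapper idea, assuming Python 3. The dictionary keys and values are invented for illustration. I used a datetime.date as the "object" because its class lives in the standard library and is always importable at load time.

```python
import datetime
import os
import pickle
import tempfile

# One wrapper dictionary holds every value we want to keep.
# All names and values here are made up for illustration.
jar = {
    "count": 3,                          # a plain integer
    "label": "demo",                     # a string
    "scores": [1.5, 2.5],                # a list of floats
    "when": datetime.date(2014, 1, 1),   # an instance of a class
}

path = os.path.join(tempfile.mkdtemp(), "picklejar.dat")

with open(path, "wb") as fp:
    pickle.dump(jar, fp)

with open(path, "rb") as fp:
    restored = pickle.load(fp)

# The date comes back as a real date object, methods included,
# because the datetime module is importable when we load.
print(restored["when"].isoformat())
print(restored == jar)
```

One dump() call and one load() call cover the whole jar, so there is no need to track how many objects went into the file or in what order.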

Like everything else in life, this too comes with a caveat or two.

  • Pickle storage is not secure. Loading a pickle can run arbitrary code, so a tampered pickle can take over the program that reads it. If you are planning to send data between machines across a non-trusted network (say, the internet), never use plain pickle. There are packages such as Trusted Pickle, aka TPickle, to help us with this task. It uses public-key cryptography to sign the pickle data before sending it. I haven’t used it myself, but here is the link for those who would like to venture out. http://trustedpickle.sourceforge.net/
  • If you have a large number of objects, pickling can be very slow. One reason is that, in Python 2, this module is a “pure Python” module. But there is a C implementation of pickle called cPickle, which can be up to 1000 times faster than pickle itself. (In Python 3, the pickle module uses the C implementation automatically when it is available.) If performance is your concern, go for it!
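To give a feel for what "signing" a pickle means, here is a rough sketch using a shared secret and HMAC from the standard library, assuming Python 3. This is not how TPickle works (that package uses public keys); it is a simpler, swapped-in technique. The key and the helper names signed_dumps and signed_loads are hypothetical. It only detects tampering by someone who does not know the key; it does not hide the data.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-shared-secret"   # hypothetical key
DIGEST_SIZE = hashlib.sha256().digest_size           # 32 bytes

def signed_dumps(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag of the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def signed_loads(blob):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)

if __name__ == "__main__":
    blob = signed_dumps({"name": "Python"})
    print(signed_loads(blob))
```

The important design choice is the order: the tag is checked before pickle.loads() ever sees the bytes, so a modified payload is rejected without being unpickled.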

Next time you come across a scenario where you want to save and recover the “states” of objects, give it a shot. It is surely worth it.

Photo by Three Points Kitchen