Posts

Showing posts from June, 2016

KEEP CALM and JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write, and easy for software to parse and generate. Here's a JSON representation of a customer record. JSON has served well for interchange and integration; as long as it was used to exchange data between application layers, everything was good. The moment people started using JSON as the database storage format, all hell broke loose. When I first saw JSON as the data format in a database, I was surprised databases would pay the cost of keeping both key names and values for every document and processing them for every query. I've heard numerous questions in my talks and articles. Here are the common objections:

1. JSON is text. It's inefficient.
2. JSON has no enforceable structure. Data quality is gone.
3. Key-value pairs for every document (row)? You're nuts.

1. JSON is TEXT. It's inefficient. Online transactions provided by RDBMS w...
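The customer example itself is not preserved in this excerpt; a minimal stand-in (the field names are illustrative, not from the original post) shows why the storage-cost objection arises: every serialized document carries its own key names alongside the values, unlike a relational row where column names live once in the schema.

```python
import json

# A hypothetical customer document; field names are illustrative.
customer = {
    "id": "customer::1001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "orders": [
        {"order_id": "o-1", "total": 42.50},
        {"order_id": "o-2", "total": 17.00},
    ],
}

doc = json.dumps(customer)
# The serialized text stores the key names ("id", "name", "email", ...)
# in every single document, which is the per-document overhead the
# objections below are about.
print(doc)
```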

Is Oracle's Larry Ellison Wrong on Object Stores?

It's All About the DATA. Here's Larry Ellison's critique of Workday: "Workday does not use a database, they use an object store. So, they can't really do reporting. They use flash. So, they can't run on iPhones or iPads. Besides that, they're great!" Here's an overview of the Workday application architecture (from Workday). Workday uses an object model and stores the objects in a MySQL table as blobs instead of normalizing the data into multiple tables, tuples, and columns. An object store makes object-relational mapping and object disassembly/assembly unnecessary. But because you store opaque objects, you lose some benefits of an RDBMS: query processing (select, join, project), schemas, and an optimizer that rewrites complex queries to help them perform better. Complex query processing is essential for canned or ad hoc reporting. Each task will have to be written as a program that retrieves all the objects, explodes them into runtime objects, and then d...
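The trade-off described above can be sketched in a few lines. This is a hypothetical illustration (the schema, data, and report are invented, not Workday's actual design): when objects are stored as blobs, the database cannot filter or aggregate inside them, so even a trivial report must fetch every object and explode it in application code.

```python
import json
import sqlite3

# Objects stored as opaque blobs in a single table (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, body BLOB)")
workers = [
    {"name": "Ann", "dept": "Sales", "salary": 90000},
    {"name": "Bob", "dept": "Eng", "salary": 120000},
]
for i, w in enumerate(workers):
    con.execute("INSERT INTO objects VALUES (?, ?)",
                (i, json.dumps(w).encode()))

# An ad hoc report ("total salary in Eng") cannot be pushed into SQL,
# because the database only sees blobs. The application must retrieve
# all objects, deserialize each one, then do the select/project work:
total = sum(
    obj["salary"]
    for (body,) in con.execute("SELECT body FROM objects")
    if (obj := json.loads(body))["dept"] == "Eng"
)
print(total)
```

With normalized tables the same report is one `SELECT SUM(salary) ... WHERE dept = 'Eng'`, which the optimizer can use indexes for.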

SQL on Twitter: Analyzing Twitter Data Made Simple

SQL on Twitter. "If I had more time, I would have written a shorter letter" -- Blaise Pascal. There have been lengthy articles on analyzing twitter data by Cloudera here, here and here. More from Hortonworks here and here. This article is going to be short, thanks to features in Couchbase 4.5!

Step 1: Install Couchbase 4.5. Using the Couchbase console, create a bucket called twitter, and CREATE PRIMARY INDEX on twitter using the query workbench or any other tool.

Step 2: Request your twitter archive. Once you receive it, unzip it. (You can use larger twitter archives as well.) cd <to the unzipped location>/data/tweets

Step 3:
$ for i in `ls`; do grep -i ^Grailbird $i > $i.out ; ...
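The loop in Step 3 is truncated in this excerpt. For context: the archive's tweet files are JavaScript, not pure JSON; each monthly file starts with a `Grailbird.data...` assignment followed by a JSON array. A minimal sketch of one plausible preprocessing step (this uses `sed` to strip the assignment rather than the post's exact `grep`-based loop, whose body is not preserved here):

```shell
# Each monthly archive file looks like:
#   Grailbird.data.tweets_2016_06 = [ {...}, {...} ]
# Strip the leading JS assignment so the remainder is a valid JSON array.
mkdir -p data/js/tweets json
# Sample file standing in for a real archive month (illustrative only).
printf 'Grailbird.data.tweets_2016_06 = [ {"id_str": "1"} ]\n' \
    > data/js/tweets/2016_06.js

for f in data/js/tweets/*.js; do
  sed '1s/^[^=]*=[[:space:]]*//' "$f" > "json/$(basename "$f" .js).json"
done
```

The resulting `.json` files are plain JSON arrays that can then be loaded into the twitter bucket.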