[Reposting my article from https://dzone.com/articles/json-files-whats-in-a-new-york-name-unlocking-data]

Data.gov, started in 2009, has about 189,000 datasets. Data is published in XML, CSV, JSON, HTML, PDF, and other formats. Data.gov aims to improve public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government. Much of this data comes from the Socrata database, and Socrata also provides APIs to retrieve just the subset of the data that you need.

Data is valuable. Insights are more valuable. Instead of working with a trickle of data, let’s load all of the data and analyze it. We start this series with a dataset on a simple and seemingly inconsequential decision parents make: baby names. Of course, parents take this decision quite seriously. If New York is the melting pot, let’s see what baby names parents there choose. The techniques shown here will help you analyze not just this dataset,...
A blog about all things data and data processing issues and interests. SQL, NoSQL, flexible schema, scale-up, scale-out, transactions, high availability.