Sunday, September 7, 2014

Redirect

Hello All,

I would like to inform you that I am closing this blog. In its place, I have started another blog that will continue the topics started here. You can find the new blog at: http://thebigdatablog.weebly.com/. This page will now redirect you to the new blog.

Thank you for your continued support!

Sunday, July 20, 2014

Getting Started With Big Data

When I first started on my journey, I was overwhelmed by all of the terminology and concepts associated with Big Data. I knew that I wanted to start learning, but I had no idea where to begin. At the time, I could not find introductory resources to use as a starting point. In this space it is extremely important to grasp the fundamentals before proceeding further.

For your convenience, I have gathered a collection of useful, visual and easy-to-understand resources to help you get started. The resources below cover fundamental concepts, sandbox environments and tutorials. Together, they give individuals at any level of understanding a place to start their Big Data learning journey. Please feel free to add this list to your getting-started toolkit, because you will likely refer back to these resources from time to time.

My advice as you learn from the resources below is to pace yourself and take one step at a time. Make sure that you understand the concepts before proceeding to the sandboxes and tutorials. Please feel free to reach out with any questions that you may have, and I will do my best to provide an answer.

Good Luck!


Understanding Big Data Concepts:
What is NoSQL
Links:
- http://youtu.be/qUV2j3XBRHc
- http://youtu.be/pHAItWE7QMU
- http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQL-Whitepaper.pdf

What is Hadoop
Links:
- http://youtu.be/3Wmdy80QOvw
- http://youtu.be/xYnS9PQRXTg
- http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf

What is Big Data
Links:
- http://youtu.be/j-0cUmUyb-Y
- http://youtu.be/ahZGEusG13A
- http://www.slideshare.net/remyavivek/big-data-ppt-23276173


Big Data Sandboxes:
Hortonworks
Link:(http://hortonworks.com/products/hortonworks-sandbox/)

Cloudera
Link:(http://www.cloudera.com/content/support/en/downloads/quickstart_vms/cdh-5-0-x.html)


Tutorials:
Hortonworks Hadoop, Hive and Pig tutorials
Link:(http://hortonworks.com/tutorials/)

Cloudera Hadoop and MapReduce tutorials
Link:(http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial.html)

Spark tutorials
Link:(https://spark.apache.org/documentation.html)

Flume tutorials
Link:(http://www.openscg.com/2013/09/using-hadoop-to-flume-twitter-data/)

Hive tutorials
Link:(https://cwiki.apache.org/confluence/display/Hive/Tutorial)


Sample Data Resources:
Tableau Data Sets
Link:(http://www.tableausoftware.com/public/community/sample-data-sets)

InfoChimps Data Sets
Link:(http://www.infochimps.com/datasets)

World Health Organization Data Sets
Link:(http://www.who.int/research/en/)

Sports Data Sets
Link:(http://www.amstat.org/sections/sis/sports%20data%20resources/)

Miscellaneous Data Sets
Link:(http://mathforum.org/workshops/sum96/data.collections/datalibrary/data.set6.html)

Monday, July 14, 2014

Welcome!

Data is everywhere. Every day, we create 2.5 quintillion bytes of data. Almost every interaction we have with a piece of technology is recorded, stored, analyzed and used. Data has established itself as the most valuable currency because it can help us explain human behavior and define the world we live in. Within the last decade we have finally developed the tools to harvest, parse and analyze the massive amounts of data that are generated. The term Big Data encompasses the tools, technologies, people, processes and skills needed to deal with this data.

The world of Big Data can be a mysterious and confusing landscape. Much of the mystery comes from a lack of understanding of the tools, technologies, business applications and use cases. My goal is to provide a place that demystifies and explains the world of Big Data. This blog will serve as a useful guide through that world.

Things you can expect from this blog include:

- Explaining complex concepts
- Analyzing new tools and technologies
- Providing useful resources
- Bridging the gap between technologies
- Answering any and all open questions

Please follow this blog, and together we can learn about and navigate the world of Big Data, Analytics and NoSQL.

Thank you!