Written on December 15, 2016
Author: Lewis Gavin

Big Data and Data Science Recap 2016

As the year draws to a close, I’d like to take the opportunity to reflect on what I’ve learnt in 2016. Towards the start of this year, I started this blog as a tool to simply store information on topics I’d learnt so that I could revisit them when needed. I then had a revelation that this information would not only be useful to me, but to others in my field: so I began sharing it!

I integrated Google Analytics, and this has allowed me to track how the blog has grown from a couple of hundred sessions a month to over two thousand, a figure that is still slowly rising!

[Figure: blog growth in Google Analytics]

The spike is due to the increased volume of blog posts: around June/July I set myself the challenge of writing a post a week.

So what have I learnt and what could you learn from my posts? And what will I be doing with the blog next year?

End of Year Recap!

10 Steps to Big Data

Learn how to ingest data from your Data Warehouse into Hadoop with Apache Sqoop and Apache Flume!
Transform your data into Hive structures ready for analytics using Pentaho Data Integration for ETL
Find out what I learnt from the Strata and Hadoop World Conference in London
Understand real-time streaming concepts through Apache Kafka and learn how to scale Spark Streaming applications
Use Apache Kudu for RDBMS-like data storage alongside your Hadoop cluster
My most popular post of the year! Learn how to improve the performance of your Spark applications
Build a search engine with Elasticsearch
Begin performing analytics on your data in real time with Apache Spark
Follow the Big Data Journey to learn the whole Big Data pipeline from ingestion to analytics - Part 1, Part 2, Part 3 and Part 4
Understand the differences between RDBMS and NoSQL data stores and gain knowledge of how to use Apache HBase

The road to Data Science

Take the Data Science 101 class
See how we ran our first Data Science Hackathon at Capgemini
Step into Machine Learning with this simple intro on Naive Bayes
Step 2 of classification - Support Vector Machines
Finishing off classification with Decision Trees
Moving into Clustering for recommendations with K-Means clustering
[Sarcasm Detection using Machine Learning in Spark](https://www.lewisgavin.co.uk/Sarcasm-Detector/)
Better understand Textual data with Natural Language Processing

Everything In-Between

What I learnt from being part of an evolving agile team at a large company
I went to SAS Forum in the UK for their new product launch
Some Android programming to build a fitness app, including a neat algorithm to calculate how far you’ve run without using GPS

What’s Next

I will continue to issue my newsletter in the new year, so please sign up using the form at the top or bottom of this page.

I will also look to introduce new types of content to the blog. Don’t worry, the technical big data and data science posts will still exist. However, I will look to bring in posts on other topics, such as process and behaviour improvement, and the tools and techniques I use to improve how I live and work.

Thanks for a great year and I hope to see you all back here in 2017!