Big Data and Data Science Recap 2016
As the year draws to a close, I’d like to take the opportunity to reflect on what I’ve learnt in 2016. Towards the start of this year, I started this blog simply as a tool to store information on topics I’d learnt so that I could revisit them when needed. I then had a revelation that this information would be useful not only to me but also to others in my field, so I began sharing it!
I integrated Google Analytics, and this has allowed me to track how the blog has grown from a couple of hundred sessions a month to over two thousand, a figure that is still slowly rising!
The spike is due to the increased volume of blog posts after I set myself a challenge, around June/July, to write a post a week.
So what have I learnt and what could you learn from my posts? And what will I be doing with the blog next year?
End of Year Recap!
10 Steps to Big Data
- Learn how to ingest data from your Data Warehouse into Hadoop with Apache Sqoop and Apache Flume!
- Transform your data into Hive structures ready for analytics using Pentaho Data Integration for ETL
- Find out what I learnt from the Strata and Hadoop World Conference in London
- Understand Real Time Streaming concepts through Apache Kafka and how to scale Spark Streaming applications
- Use Apache Kudu for RDBMS-like data storage alongside your Hadoop cluster
- My most popular post of the year! Learn how to improve the performance of your Spark applications
- Build a search engine with Elasticsearch
- Begin performing analytics on your data in Real-Time with Apache Spark
- Follow the Big Data Journey to learn the whole Big Data pipeline from ingestion to analytics - Part 1, Part 2, Part 3 and Part 4
- Understand the differences between RDBMS and NoSQL data stores and gain knowledge of how to use Apache HBase
The road to Data Science
- Take the Data Science 101 class
- See how we ran our first Data Science Hackathon at Capgemini
- Step into Machine Learning with this simple intro on Naive Bayes
- Step 2 of classification - Support Vector Machines
- Finishing off classification with Decision Trees
- Moving into Clustering for recommendations with K-Means clustering
- [Sarcasm Detection using Machine Learning in Spark](https://www.lewisgavin.co.uk/Sarcasm-Detector/)
- Better understand Textual data with Natural Language Processing
Everything In-Between
- What I learnt from being part of an evolving agile team at a large company
- I went to SAS Forum in the UK for their new product launch
- Some Android programming to build a fitness app, featuring a neat algorithm to calculate how far you’ve run without using GPS
What’s Next
I will continue to issue my newsletter in the new year so please sign up using the form at the top or bottom of this page.
I will also look to introduce new types of content to the blog. Don’t worry, the technical big data and data science posts will still exist; however, I will look to bring in posts on other topics, such as process and behaviour improvement, and the tools and techniques I use to improve how I live and work.
Thanks for a great year and I hope to see you all back here in 2017!