NumPy and Numba are popular Python libraries for processing large quantities of data. This talk explains how NumPy and Numba work under the hood and how they use vectorisation to process large amounts of data extremely quickly. We use these tools to reduce the processing time of a large, real 600 GB dataset from one month to 40 minutes, even when the code is run on a single MacBook.
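A minimal sketch of the kind of vectorisation the abstract refers to: the same sum-of-squares computation written as a pure-Python loop and as a NumPy vectorised expression. The function names and the toy computation are illustrative, not taken from the talk; Numba would speed up the loop version further by JIT-compiling it with `@numba.njit`.

```python
import numpy as np

def sum_squares_loop(values):
    # Pure-Python loop: every iteration pays interpreter overhead.
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_squares_vectorised(values):
    # Vectorised: the whole computation runs in NumPy's compiled C loops,
    # typically orders of magnitude faster on large arrays.
    return float(np.dot(values, values))

data = np.arange(5, dtype=np.float64)  # [0, 1, 2, 3, 4]
print(sum_squares_loop(data))        # 30.0
print(sum_squares_vectorised(data))  # 30.0
```

On arrays with millions of elements, the vectorised version avoids per-element Python bytecode dispatch entirely, which is the core of how NumPy achieves its speed.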