Excerpt from Working with Big Data: Infrastructure, Algorithms, and Visualizations LiveLessons (6 hours of video instruction), which presents a high-level overview of big data and how to use key tools to solve your data challenges. This introduction to the three areas of big data includes:
• Infrastructure – how to store and process big data
• Algorithms – how to integrate algorithms into your big data stack and an introduction to classification
The goal was not to be exhaustive but rather to provide a higher-level view of how all the pieces of a big data architecture work together.
Paul Dix is the author of “Service Oriented Design with Ruby and Rails.” He is a frequent speaker at conferences and user groups including Web 2.0, RubyConf, RailsConf, The Gotham Ruby Conference, and Scotland on Rails. Paul is the founder and organizer of the NYC Machine Learning Meetup, which has over 2,900 members. In the past he has worked at startups and larger companies like Google, Microsoft, and McAfee. Currently, Paul is a co-founder at Errplane, a cloud-based service for monitoring and alerting on application performance and metrics. He lives in New York City.
In Unstructured Storage and Hadoop, you learn how to set up Hadoop, load files into the Hadoop Distributed File System (HDFS), and write your first MapReduce job.
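To give a feel for what a first MapReduce job does, here is a minimal in-memory sketch of the map, shuffle, and reduce phases applied to word counting. It is not Hadoop code and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return (key, sum(values))


def word_count(lines):
    """Run the three phases in a single process (illustration only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster the map and reduce functions run on many machines over files stored in HDFS; the sketch only shows the shape of the computation.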
In Structured Storage and Cassandra, you will set up Cassandra, learn how to model data in Cassandra’s column-oriented storage, use Cassandra from a Ruby library, and write data into Cassandra from a Hadoop MapReduce job.
Real Time Processing and Messaging covers real-time processing with messaging systems. Specifically, you will learn about Kafka, an open-source distributed messaging system. You’ll install Kafka, read data from and write data to the messaging server, write data into Hadoop, and learn how to implement highly available and scalable message consumers.
Working with Machine Learning Algorithms introduces you to machine learning and the k-nearest neighbors algorithm. In it you will implement k-nearest neighbors, prepare raw text for use with machine learning algorithms, and make predictions using k-nearest neighbors.
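As a rough illustration of the steps named above (preparing raw text, implementing k-nearest neighbors, and making predictions), the following is a minimal sketch. It assumes a bag-of-words representation and cosine distance, neither of which is confirmed as the approach used in the lessons; the training examples and labels are made up for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Prepare raw text: lowercase, split on whitespace, count the words."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty vector is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(training, text, k=3):
    """Label `text` by majority vote among its k nearest training examples."""
    query = vectorize(text)
    neighbors = sorted(
        training, key=lambda ex: cosine_distance(vectorize(ex[0]), query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples, invented for this sketch.
TRAINING = [
    ("hadoop stores files across a cluster", "infrastructure"),
    ("cassandra stores rows in column families", "infrastructure"),
    ("kafka moves messages between servers", "infrastructure"),
    ("nearest neighbors votes on a label", "algorithms"),
    ("classification predicts a label for text", "algorithms"),
    ("accuracy measures how often predictions match", "algorithms"),
]

if __name__ == "__main__":
    print(knn_predict(TRAINING, "predicts a label", k=3))
```

The choice of k and of the distance measure both affect the predictions, so they are worth treating as parameters to experiment with rather than fixed values.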
In Experimentation and Running Algorithms in Production, you learn how to test the accuracy of machine learning models and how to integrate them into a running big data architecture.
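One common way to test the accuracy of a model is to hold out part of the labeled data and compare predictions against the known labels. The sketch below assumes that approach; the helper names (`train_test_split`, `accuracy`) and the 25% hold-out fraction are illustrative choices, not taken from the lessons.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and split them into training and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_set):
    """Fraction of (features, label) pairs for which predict(features) == label."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)


if __name__ == "__main__":
    # A stand-in "model": any callable that maps features to a label will do.
    def parity(n):
        return "even" if n % 2 == 0 else "odd"

    labeled = [(2, "even"), (3, "odd"), (4, "odd"), (5, "odd")]
    print(accuracy(parity, labeled))
```

Because `accuracy` accepts any callable, the same function can score different models on the same held-out data.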
LiveLessons Video Training series publishes hundreds of hands-on, expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. This professional and personal technology video series features world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, IBM Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Programming, Web Development, Mobile Development, Home & Office Technologies, Business & Management, and more.