Hadoop Fundamentals I

  • Published on
    01-Nov-2014

DESCRIPTION

IBM Innovation Center DACH/Zurich, Romeo Kienzler

Transcript

  • 1. 2013 IBM Corporation. AVNET Hadoop Fundamentals I. Romeo Kienzler, IBM Innovation Center Zurich
  • 2. Agenda: 1) Welcome 2) What is big data? 3) Introduction to Hadoop 4) BigInsights 5) Hadoop architecture 6) Lab 1: Core Hadoop 7) MapReduce 8) Lab 2: MapReduce 9) Pig, Jaql, Hive, BigSQL, SystemT/AQL 10) Lab 3: Pig, Hive, and Jaql 11) Certification on BigDataUniversity
  • 3. What is BIG data?
  • 4. Traditional Business Intelligence / Data Warehousing: ...60 percent, were unsatisfied with their data warehousing system. http://www.information-management.com/issues/20010601/3494-1.html
  • 5. What is BIG data?
  • 6. What is BIG data?
  • 7. What is BIG data? Big Data, Hadoop
  • 8. What is BIG data? Business Intelligence, Data Warehouse
  • 9. Map-Reduce, Hadoop, BigInsights
  • 10. Why is Big Data important? Data AVAILABLE to an organization; data an organization can PROCESS; missed opportunity. Enterprises are more blind to new opportunities. Organizations are able to process less and less of the available data. 100 million tweets are posted every day, 35 hours of video are uploaded every minute, 6.1 x 10^12 text messages were sent in 2011, and 247 x 10^9 e-mails passed through the net, 80% of them spam and viruses. => Prefiltering is more and more important.
  • 11. Why is Big Data important?
  • 12. Why is Big Data important?
  • 13. Why is Big Data important?
  • 14. What is BIG data? Volume: terabytes, petabytes, even exabytes. Variety: all kinds of data, all kinds of analytics. Velocity / Agility: analyze data in hours instead of days, days instead of weeks; dynamically responsive; rapid data exploration. Traditional / non-traditional data sources. Store, Analyze, Explore. Volume * Variety * Velocity = Value
  • 15. BigData Analytics
  • 16. BigData Analytics: Predictive Analytics
  • 17. BigData Analytics: Predictive Analytics
  • 18. BigData Analytics: Correlation / Text / NLP
  • 19. BigData Analytics: Feature Extraction. "Feature extraction involves simplifying the amount of resources required to describe a large set of data accurately" (Wikipedia)
  • 20. BigData Analytics: Predictive Analytics. Storage / Data, CPUs / Algorithm, Business Value / Insight
  • 21. BigData Analytics: Predictive Analytics. "sometimes it's not who has the best algorithm that wins; it's who has the most data." (C) Google Inc., The Unreasonable Effectiveness of Data, http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf No sampling => work with full dataset => long-tail distributions
  • 22. Realtime / In-Memory Computing: InfoSphere Streams / Watson
  • 26. The Paris Hilton Problem. Watson Workshop: What is Watson?
  • 27. Introduction to Hadoop
  • 29. BigInsights
  • 31. BigInsights Demonstration
  • 32. Hadoop Architecture
  • 35. HDFS: Hadoop Distributed File System
  • 54. Lab 1: Hadoop Architecture. 1) Start from chapter 1.2. 2) Replace /home/biadmin with /home/biadminX, where X is your user ID. 3) In chapter 1.3, skip task 1.3.1._1 and go to http://10.199.20.51:8080 instead. 4) Skip 1.3.5. 5) In chapter 1.3.6._30, use any file you like on your desktop computer.
  • 55. Map-Reduce
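
The Map-Reduce section (slide 55 onward) names the programming model without showing it in text. As a rough illustration only, the sketch below runs a word count through the three conceptual steps (map, shuffle, reduce) in plain Python on one machine. It does not use Hadoop or any of its APIs, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example, not taken from the slides or the lab material.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data hadoop", "hadoop map reduce", "Big insights"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions would run in parallel on different nodes and the shuffle would move data across the network; here everything happens in memory, which is enough to show how the per-key grouping connects the two phases.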