Big data 101

  • Published on
    15-Jan-2015

DESCRIPTION

Introduction to Big Data concepts, a talk given at BugDay Bangkok 2013

Transcript

<ul><li> 1. Big Data 101 #BugDay2013 @somkiat</li></ul> <p>
2. Gartner Identifies Top Technologies, 2013: Big Data, Modern Information Infrastructure, Semantic Technology, The Logical Data Warehouse, NoSQL DBMS, In-Memory Computing, Information * http://www.gartner.com/newsroom/id/2359715
3. 1 ?
4. ? Social Media, Sensor - Location - Climate, Scientist, Mobile usage, Data, Social Media, Purchase Tx, Photo, VDO
5-9. http://whatsthebigdata.com/2013/02/04/the-big-data-explosion-infographic/
10. 2 Big Data !!
11. Big Data ?
12. ? Data = ? Information = ? Knowledge = ? Decision = ?
13. ? Data = Information = Knowledge = Decision =
14. Model http://www.infogineering.net/data-information-knowledge.htm
15. 3 Big Data
16. Big Data (Wikipedia): Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.
17. Big Data ( What ) ( How ) ( What with )
18. Big Data
19. Big Data to YOU UP
20. Big Data (3V): Volume, Variety, Velocity
21. Big Data Volume: GB -&gt; TB -&gt; PB -&gt; EB -&gt; ZB -&gt; YB. ZB = Zettabyte. 1 ZB = 1 trillion GB. 1 ZB = 1 billion TB.
22. Big Data Variety: Structured, Unstructured, Semistructured. VDO, Photo, Audio, Document, Text. Log, Monitoring. Stock record, Transaction. Need pre-processing and data cleaning.
23. Big Data Velocity: Batch, Near real time, Stream processing. Need real time: Online VDO, Location tracking, AR.
24. Real time processing model http://www-01.ibm.com/software/data/bigdata/
25. 3V
26. Big Data ( IBM ) 4V: Volume, Variety, Velocity, Veracity (Noise/Outlier)
27. Example of Veracity issue (Twitter): Message from spam bot? Message from human? Fake account?
28. 4V: Volume, Velocity, Value, Variety, Veracity
29. Volume, Variety
30. Visualization, Big Data application, Public data, New information, Data service
31. memory. Volume, Variety. Parallel, Clustering, Cloud.
32. MapReduce, Distributed File System, Object Storage, NoSQL
33. Distributed File System: file, shard, chunk, block. file server, remote server. concurrency, distribution, replication.
34. Distributed File System: Hadoop Distributed File System (HDFS), GlusterFS, MogileFS, Google File System (GFS), MooseFS
35. HDFS Architecture http://www.ibm.com/developerworks/library/wa-introhdfs/
36. NoSQL: Key-value (Memcached, Redis, Riak), Column (Cassandra, HBase), Graph (Neo4j, FlockDB), Document (MongoDB, CouchDB)
37. Testing http://searchengineland.com/why-big-testing-will-be-bigger-than-big-data-145452
38. Big Testing http://searchengineland.com/why-big-testing-will-be-bigger-than-big-data-145452
39. Big Testing, Big Mistake http://searchengineland.com/why-big-testing-will-be-bigger-than-big-data-145452
40. Big Testing, Big Data, Big Testing
41. Cycle of Big Data
42. software, Software, Service
43. Big Data. Big Data transform business, IT
44. Big Data: migrate Data Warehouse, Big Data 3V + Technology
45. !! Big data that is very small. Large datasets that aren't big. http://mike2.openmethodology.org/wiki/Big_Data_Definition
46. Nathan Marz http://www.slideshare.net/nathanmarz/the-secrets-of-building-realtime-big-data-systems
47. Raw Data: View 1, View 2, View 3
48. Twitter: URL, Tweet, Retweet, Trend, Topic
49. Server, Human error, ( Low latency ), ( Scalable ), Debug
50. Architecture: Batch Layer, Speed Layer
51. Batch Layer: ( High latency ), ( Horizontal ), Apache Hadoop MapReduce
52. Batch Layer: Master data set. view = function( Master data set )
53. Batch Layer: Batch process, View 1, View 2, View 3
54. Batch Layer: Debug
55. Speed Layer, Batch layer
56. Speed Layer: Riak, Cassandra, HBase !!
57. Batch Layer, Merge, Speed Layer
58. Batch layer, Speed layer, layer: "Eventual Accuracy"
59. Data model: record
60. Data model: 1 A, 1 B, 10 A
61. Data model: Master, History
62. Storm: Framework, Real time, Open source, Free http://storm-project.net/
63. Storm
64. This is Big Data. You'll never walk alone
65. Big Data is Art. Thank you </p>
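
The batch layer / speed layer split in slides 50 to 58 can be illustrated with a minimal Python sketch. It is not taken from the talk: the names `batch_view`, `SpeedLayer`, and `query`, the per-URL tweet-count use case, and the sample records are all hypothetical, and an in-memory `Counter` stands in for the systems the slides name (Hadoop MapReduce for the batch layer; Riak, Cassandra, or HBase for the speed layer).

```python
from collections import Counter


def batch_view(master_records):
    """Batch layer: view = function(master data set).

    Recomputes the whole view from the append-only master records,
    which are hypothetical (url, timestamp) pairs in this sketch.
    """
    return Counter(url for url, _timestamp in master_records)


class SpeedLayer:
    """Speed layer: incrementally covers events the batch view has not seen yet."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, url):
        # Low-latency incremental update for a newly arrived record.
        self.realtime_view[url] += 1

    def reset(self):
        # Once a batch run has absorbed the recent records, the
        # real-time view is discarded ("eventual accuracy").
        self.realtime_view.clear()


def query(url, batch, speed):
    """Merge step: answer a query from the batch view plus the real-time view."""
    return batch[url] + speed.realtime_view[url]


# Hypothetical master data set: (url, timestamp) records.
master = [("a.com", 1), ("b.com", 2), ("a.com", 3)]
batch = batch_view(master)

speed = SpeedLayer()
master.append(("a.com", 4))   # new record lands in the master data set ...
speed.on_event("a.com")       # ... and is also counted by the speed layer
print(query("a.com", batch, speed))

# Later the batch layer recomputes from scratch and the speed view is dropped.
batch = batch_view(master)
speed.reset()
print(query("a.com", batch, speed))
```

Both queries report the same count for `a.com`: before the batch run catches up, the speed layer supplies the missing record, and afterwards the recomputed batch view carries it alone. Because the batch view is a pure function of the master data set, a mistake in the speed layer is overwritten by the next batch run.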