Building Data-Centric Businesses

  • Published on
    10-Jan-2017

Transcript

<ul><li><p>Daniel Aragao &amp; Simon Hope</p></li><li><p>Daniel Aragao, Simon Hope</p><p>@dear_dr_dan @mapbutcher</p></li><li><p>REALESTATE.COM.AU</p><p>6B Market Cap</p><p>11M Australian Properties</p><p>55M Visits in September</p><p>4.7M App Downloads and counting</p><p>http://realestate.com.au</p></li><li><p>3,500 PEOPLE</p><p>13 COUNTRIES</p><p>34 OFFICES</p><p>TECHNOLOGY &amp; SOCIAL JUSTICE</p></li><li><p> In the beginning </p><p> Organising our Data </p><p> Implementation approaches </p><p> Hipster Batches </p><p> Reactify </p><p> Bring Your Own Data </p><p> Finding the Data </p><p> What we have learned so far</p><p>THIS IS WHAT THE STORY IS ABOUT</p></li><li><p>SORRY, IT'S OK TO LEAVE NOW</p><p> Nope, we didn't create a new Hadoop </p><p> No hardcore Data Science </p><p> There are some implementation details </p><p> REA embraced the Cloud. AWS everywhere </p><p> Under construction</p></li><li><p>IN THE BEGINNING</p></li><li><p>ORGANISING OUR DATA</p><p>Increasingly, content is being distributed through search and social platforms... 
There's less visiting of publishers as destinations.</p><p>Jeff Weiner, CEO, LinkedIn</p></li><li><p>Data sources</p><p>Data warehouse</p><p>PROBLEM</p></li><li><p>STRATEGY</p></li><li><p>Data Warehouse</p><p>Staging SSIS Dim Fact</p><p>PROBLEM</p></li><li><p>Data Warehouse</p><p>Staging SSIS Dim Fact</p><p>PROBLEM</p><p>Star schema leaky details</p></li><li><p>No Data Warehouse</p><p>Staging SSIS Dim Fact</p><p>STRATEGY</p></li><li><p>STRATEGY</p><p>Data Warehouse Facade</p><p>Staging SSIS Dim Fact</p></li><li><p>???</p><p>WHAT'S IN THE BOX?</p></li><li><p>Good things come in small packages services </p><p>THE HIPSTER BATCH</p><p>???</p><p>Hipster Batch</p></li><li><p>Hipster Batch</p><p>THE HIPSTER BATCH</p><p> Small and short-lived </p><p> Decoupled via flat files in S3 </p><p> Single purpose</p><p> Idempotent </p><p> Polyglot </p><p> Minimal runtime dependencies </p><p> Discoverable</p></li><li><p>SNS, SQS</p><p>Data </p><p>A TYPICAL IMPLEMENTATION</p><p>Hipster Batch</p></li><li><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>Data </p><p>A TYPICAL IMPLEMENTATION</p><p>Hipster Batch</p></li><li><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>KMS</p><p>Data </p><p>A TYPICAL IMPLEMENTATION</p><p>Hipster Batch</p></li><li><p>Logs</p><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>KMS</p><p>Data </p><p>A TYPICAL IMPLEMENTATION</p><p>Hipster Batch</p></li><li><p>Logs</p><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>KMS</p><p>CloudWatch</p><p>Data </p><p>A TYPICAL IMPLEMENTATION</p><p>Hipster Batch</p></li><li><p>Logs</p><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>KMS</p><p>CloudWatch</p><p>S3 buckets</p><p>Data </p><p>A TYPICAL IMPLEMENTATION</p><p>Hipster Batch</p></li><li><p>Hipster Batch</p><p>HIPSTER BATCH DOES SCIENCE</p><p> Behavioural models for targeted marketing </p><p> Recommendation engine </p><p> External channels</p></li><li><p>Hipster Batch</p><p>SCIENCE!</p></li><li><p>x 20</p><p>Hipster Batch</p><p>Stats models</p><p>SCIENCE!</p></li><li><p>x 20</p><p>API</p><p>Hipster 
Batch</p><p>Stats models</p><p>SCIENCE!</p></li><li><p>API</p><p>x 20</p><p>API</p><p>Hipster Batch</p><p>Stats models</p><p>SCIENCE!</p></li><li><p>API</p><p>x 20</p><p>API</p><p>Hipster Batch</p><p>Stats models</p><p>Google Now API</p><p>SCIENCE!</p></li><li><p>From legacy to reactive </p><p>REACTIFY</p><p>Reactify</p><p>???</p></li><li><p>Reactify</p><p>http://www.reactivemanifesto.org</p><p>REACTIFY</p><p> Manage Data flow with messages </p><p> Protect consumers and care about isolation </p><p> Resilience is important and Data replication is just fine </p><p> Demand is elastic, and your components should be too</p></li><li><p>Reactify</p><p>Listings</p><p>Data coupling</p><p>No resilience or elasticity</p><p>Coupling</p><p>PROBLEM</p></li><li><p>Reactify</p><p>Listings</p><p>SOLUTION</p></li><li><p>Reactify</p><p>Listings Reactify</p><p>SOLUTION</p></li><li><p>Reactify</p><p>Listings Reactify Hipster Batch</p><p>SOLUTION</p></li><li><p>Reactify</p><p>Listings Reactify Hipster Batch</p><p>Shielded consumers</p><p>Isolation</p><p>Decoupled</p><p>SOLUTION</p></li><li><p>Reactify</p><p>Listings</p><p>IMPLEMENTATION</p></li><li><p>Reactify</p><p>Listings REST API</p><p>IMPLEMENTATION</p></li><li><p>Reactify</p><p>Listings REST API Dynamo</p><p>Event Maker</p><p>Event Differ</p><p>IMPLEMENTATION</p></li><li><p>Reactify</p><p>Listings REST API Dynamo</p><p>Event Maker</p><p>Event Differ</p><p>Kinesis</p><p>2</p><p>IMPLEMENTATION</p><p>2</p></li><li><p> Exposes current state only </p><p> Stream of change notifications </p><p> Hypertext Application Language (HAL) </p><p> Clear entity types </p><p> Linking over embedding </p><p> Cacheable and discoverable</p><p>REST API</p><p>REACTIFY REST API</p></li><li><p>REST 
API</p><p>https://feeds.listings.realestate.com.au/combined-listings/120449689</p></li><li><p>REST API</p><p>Event Maker</p><p>https://feeds.listings.realestate.com.au/combined-listings/-/changes</p></li><li><p>Reactify</p><p>Event Differ</p></li><li><p>The octopus in the box </p><p> Did you use that data set? 
Errr... No, we have another one </p><p>BRING YOUR OWN DATA</p></li><li><p>BRING YOUR OWN DATA (BYOD)</p><p> Allow data to flow freely </p><p> Help the business to get what they need when they need it </p><p> Self-service</p></li><li><p>BYOD</p></li><li><p>BYOD</p><p>CSV</p></li><li><p>BYOD</p><p>CSV</p><p>x 5</p></li><li><p>BYOD</p><p>CSV</p><p>x 5</p><p>Smarts on datatypes</p></li><li><p>BYOD</p><p>CSV</p><p>x 5</p><p>Tableau Server</p><p>Smarts on datatypes</p></li><li><p>BYOD</p><p>CSV</p><p>x 5</p><p>Tableau Server</p><p>Audit, auth, share</p><p>Smarts on datatypes</p></li><li><p>These were the implementation approaches, now to </p><p>FIND THE DATA </p><p>Meaningful, automated, and easy-to-search metadata</p></li><li><p>WE TRIED</p></li><li><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>KMS</p><p>CloudWatch</p><p>Logs</p><p>MORE THAN DATA</p><p>Hipster Batch</p></li><li><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>KMS</p><p>CloudWatch</p><p>Logs</p><p>Dataz</p><p>Ancestry</p><p>MORE THAN DATA</p><p>Hipster Batch</p></li><li><p>SNS, SQS</p><p>ASG, ECS, Lambda</p><p>KMS</p><p>CloudWatch</p><p>Logs</p><p>Dataz</p><p>Ancestry</p><p>Metadata</p><p>MORE THAN DATA</p><p>Hipster Batch</p></li><li><p>Ancestry</p></li><li><p>REST API</p><p>METADATA PIPELINE</p><p>Producers</p></li><li><p>REST API</p><p>Ancestry</p><p>Ancestry</p><p>Ancestry</p><p>METADATA PIPELINE</p><p>Producers</p></li><li><p>REST API</p><p>Ancestry</p><p>Ancestry</p><p>Ancestry</p><p>METADATA 
PIPELINE</p><p>Producers</p><p>Scrapy</p></li><li><p>WHAT WE HAVE LEARNED SO FAR</p><p> Consumers create the last-mile data as needed </p><p> We must work with external, independent delivery channels </p><p> Push quality back to source/producer systems </p><p> Data belongs to the entire organisation, not to a single team</p></li><li><p>I'll give you my Data Warehouse when you can pry it from my cold dead hands.</p></li><li><p>THANK YOU </p><p>Daniel Aragao, Simon Hope</p><p>@dear_dr_dan @mapbutcher</p><p>REALESTATE.COM.AU</p></li></ul>
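
The Reactify slides name an "Event Differ" that sits between the stored listing state and the stream of change notifications, but the transcript does not show how it works. The sketch below is one plausible reading, not the presenters' implementation: it compares the previous and current snapshot of a listing and emits at most one change event. The function name `diff_listing`, the event shape (`listing_id`, `type`, `fields`) and the sample values are all hypothetical, and reading from Dynamo or publishing to Kinesis is deliberately left out.

```python
from typing import Any, Dict, List, Optional

# A snapshot is the full current state of one listing (hypothetical shape).
Snapshot = Dict[str, Any]


def diff_listing(
    listing_id: str,
    previous: Optional[Snapshot],
    current: Optional[Snapshot],
) -> List[Dict[str, Any]]:
    """Compare two snapshots of one listing and describe what changed.

    Returns an empty list when nothing changed, so replaying the same
    snapshot twice produces no duplicate events.
    """
    if previous is None and current is None:
        return []
    if previous is None:
        # First time this listing is seen: the whole snapshot is new.
        return [{"listing_id": listing_id, "type": "created", "fields": dict(current)}]
    if current is None:
        # The listing is gone from the source system.
        return [{"listing_id": listing_id, "type": "deleted", "fields": {}}]

    # Collect only the fields whose values differ. A field that was removed
    # is reported with the value None (an assumption of this sketch).
    changed = {
        key: current.get(key)
        for key in sorted(set(previous) | set(current))
        if previous.get(key) != current.get(key)
    }
    if not changed:
        return []
    return [{"listing_id": listing_id, "type": "updated", "fields": changed}]


if __name__ == "__main__":
    old = {"price": 500000, "status": "active"}
    new = {"price": 480000, "status": "active"}
    print(diff_listing("example-listing", old, new))
```

Emitting only the changed fields keeps each notification small, which fits the "linking over embedding" point on the REST API slide: a consumer that needs the full listing can follow the link back to its current state.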