Elastic @Deezer

  • Published on
    08-Apr-2017

Transcript

  • Elastic @Deezer

    Aurélien Saint Requier, Search Data Scientist

    ELASTIC @DEEZER

  • Table of contents

    /01 Where?

    /02 Elasticsearch architecture

    /03 Querying Elasticsearch

    /04 ELK stack for analysis

  • Where?

    /01

  • For search features

  • For chart and new release features

  • For recommendation features

  • Elasticsearch Architecture

    /02

  • Elasticsearch architecture: Our needs

    Search and recommend:

      3 million artists
      5 million albums
      50 million tracks
      2 million playlists

    Search and recommend content based on:

      metadata and other features
      tag description

    New releases should become available in less than 2 hours

    Queries have to respond in less than 100 ms

  • Elasticsearch architecture: Overview

  • Elasticsearch architecture: Data workflow

  • How do we deploy full indexes in production?

    1. Get JSON data from the Hadoop cluster (using WebHDFS)
    2. Index documents on mastersearch (using the ES bulk API)
    3. Package the new index:
       3.1. Compress the ES index directory
       3.2. Generate a deployment script
    4. Copy the package to the temporary node of each cluster (using assassin, a homemade rsync deploy script)
    5. Run the deployment script:
       5.1. Start a temporary ES instance and load the new index
       5.2. Set the required number of replicas
       5.3. Wait until the data is replicated, then shut down the temporary ES instance
       5.4. Warm the new index
       5.5. Switch the alias to the new index and close the old index
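    Steps 2 and 5.5 of the deployment procedure rely on two request formats: the newline-delimited body of the bulk API and the action list of the aliases API. The sketch below only builds those two payloads and sends nothing. The index, type, and alias names are hypothetical and do not come from the talk.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Serialize documents into the newline-delimited JSON used by the bulk API."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch where the next document goes.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        ))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def build_alias_switch(alias, old_index, new_index):
    """Build an _aliases body that moves the alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical documents and names, for illustration only.
docs = [{"id": 1, "title": "Black Pearl"}, {"id": 2, "title": "The Black Eyed Peas"}]
bulk_body = build_bulk_body("tracks_new", "track", docs)
alias_body = build_alias_switch("tracks", "tracks_old", "tracks_new")
```

    Because both alias actions travel in one request, searches that go through the alias never hit a moment where it points at no index.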

  • Querying Elasticsearch

    /03

  • How do we analyze musical data?

    Use custom analyzers

    Black Pearl (He's A Pirate) [feat. Sidney Housen] - EP

    The Black Eyed Peas

    Lowercase, asciifolding and char filters, music field synonyms:

    Edge_ngram tokenizer:
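    To make the effect of these analysis steps concrete, here is a rough pure-Python imitation of lowercasing, ASCII folding, and edge n-gram tokenization. It is an approximation for illustration, not the Elasticsearch implementation, and the gram sizes and the accented input are assumptions rather than values from the talk.

```python
import re
import unicodedata


def fold(text):
    """Lowercase and strip diacritics, roughly like lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(text, min_gram=1, max_gram=5):
    """Emit the leading prefixes of each word, roughly like an edge_ngram
    tokenizer whose token characters are letters and digits."""
    grams = []
    for word in re.findall(r"[a-z0-9]+", fold(text)):
        # Words shorter than min_gram produce no grams at all.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:size])
    return grams


folded = fold("Déjà Vu")                                   # illustrative input
grams = edge_ngrams("Black Eyed", min_gram=2, max_gram=4)
print(folded)  # deja vu
print(grams)   # ['bl', 'bla', 'blac', 'ey', 'eye', 'eyed']
```

    Indexing these prefixes means a partially typed query such as "bla" can be served by an exact term lookup.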

  • How do we search our data?

    Using an internal Java Elasticsearch plugin:

  • How do we search our data?

    Using the Multi Search API and Query DSL:
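    The Multi Search API takes several searches in one request, written as alternating header and body lines. A minimal sketch of building such a request follows. The index names, field names, and result size are assumptions for illustration, not the queries used at Deezer.

```python
import json


def match_query(field, text, size=5):
    """A small Query DSL body: full-text match on one field."""
    return {"query": {"match": {field: text}}, "size": size}


def build_msearch_body(searches):
    """Turn (index, body) pairs into the newline-delimited _msearch format."""
    lines = []
    for index, body in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(body))              # search body line
    return "\n".join(lines) + "\n"


# One user input searched across three hypothetical indexes in a single round trip.
user_input = "black ey"
msearch_body = build_msearch_body([
    ("artists", match_query("name", user_input)),
    ("albums", match_query("title", user_input)),
    ("tracks", match_query("title", user_input)),
])
```

    The responses come back in the same order as the searches, so each result block can be matched to its content type by position.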

  • How do we recommend our data?

    Using function score queries:
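    A function score query wraps an ordinary query and adjusts each hit's relevance score with one or more functions. The sketch below assumes a hypothetical `tags` field to match on and a hypothetical numeric `popularity` field to boost by. The scoring functions actually used in production are not given in the talk.

```python
def tag_recommendation_query(tags, popularity_field="popularity"):
    """Match documents sharing any of the given tags, then scale the score
    by a dampened popularity value."""
    return {
        "query": {
            "function_score": {
                "query": {"terms": {"tags": tags}},
                "functions": [
                    {
                        "field_value_factor": {
                            "field": popularity_field,
                            "modifier": "log1p",  # dampen very large values
                            "factor": 1.0,
                        }
                    }
                ],
                # Multiply the query score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


recommendation = tag_recommendation_query(["rock", "indie"])
```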

  • How do we explore our data?

    Using aggregations:
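    As one possible illustration, a terms aggregation counts documents per distinct value of a field. The sketch below builds such a request and reads the buckets out of a response. The aggregation name `top_values` is an arbitrary choice, and any field passed in is an assumption.

```python
def top_terms_request(field, size=10):
    """Request body for the most frequent values of a field, with no hits returned."""
    return {
        "size": 0,  # only the aggregation result is needed
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }


def read_buckets(response):
    """Extract (value, document count) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["top_values"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


request = top_terms_request("genre", size=3)
```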

  • Some feedback

    In numbers:

      More than 25 million queries a day, around 5000 queries / minute
      Around 95% of queries respond in less than 100 ms

    Lessons learned:

      Be careful with fielddata usage
      Big JVM ES instance = long GC time
      Avoid prefix queries: use an edge-ngram tokenizer and match queries*

    In the future:

      Use a dedicated client/data/master architecture
      Stop fuzzy queries (replaced by a "Did you mean" approach)*
      Migrate to Elasticsearch v2

    *https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast
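    The advice on prefix queries can be read as moving the prefix work from query time to index time. The sketch below contrasts the two query bodies. The sub-field `title.prefix` is hypothetical, and the second query assumes that field was indexed with an edge n-gram analyzer and is searched with an analyzer that does not itself produce n-grams.

```python
def prefix_query(field, text):
    """Query-time prefix matching: every indexed term starting with the text
    has to be found and expanded when the query runs."""
    return {"query": {"prefix": {field: text}}}


def match_on_ngram_field(field, text):
    """Plain match query: the prefixes were already stored as terms at index
    time, so the lookup is an exact term match."""
    return {"query": {"match": {field: text}}}


avoided = prefix_query("title", "blac")
preferred = match_on_ngram_field("title.prefix", "blac")
```

    The trade-off is a larger index in exchange for cheaper queries.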

  • ELK for analysis

    /04

  • Use of ELK

    Elasticsearch v1.7.5:

      cluster of 3 nodes
      indexes logs from Logstash and homemade scripts
      around 2 billion documents

    Logstash 1.5

    Kibana v4.1.1:

      26 dashboards / 189 visualisations

    Tools:

      curator for index retention
      elasticdump for saving Kibana settings
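    Index retention for time-based log indexes usually means deleting indexes older than a cutoff. The sketch below shows only that selection logic for Logstash's default daily naming (`logstash-YYYY.MM.DD`). It is not curator's code, and the 30-day retention period and index names are assumptions.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indexes whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a Logstash index, leave it alone
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


# Hypothetical index names, for illustration only.
names = ["logstash-2017.03.01", "logstash-2017.04.07", ".kibana", "logstash-misc"]
expired = indices_to_delete(names, today=date(2017, 4, 8), keep_days=30)
print(expired)  # ['logstash-2017.03.01']
```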

  • Use cases: Monitoring

  • Use cases: Analysis of what our users search

  • Thanks for your attention

    We are hiring!

    jobs.deezer.com

    Questions?

    https://www.deezer.com/company/jobs