Обработка данных в RTB: быстро, дешево и на 98% точно

  • Published on
    19-Jun-2015

  • View
    1.637

  • Download
    0

Embed Size (px)

DESCRIPTION

Highload++2014

Transcript

  • RTB: , 98%

  • @facultyofwonder

  • -

  • max, min, mean

  • ?

  • ?

  • reservoir sampling?

  • frugal
  • def frugal(stream):

    m = 0

    for val in stream:

    if val > m:

    m += 1

    elif val < m:

    m -= 1

    return m

  • - 1 int!

    def frugal(stream):

    m = 0

    for val in stream:

    if val > m:

    m += 1

    elif val < m:

    m -= 1

    return m

  • ?

  • : bit.ly/frugalsketch

    def frugal_1u(stream, m=0, q=0.5):

    for val in stream:

    r = np.random.random()

    if val > m and r > 1 - q:

    m += 1

    elif val < m and r > q:

    m -= 1

    return m

  • + =

  • ?

    aka

  • ?

    , ,

  • ?

  • :1010 ,

    109 int3240

  • : -

  • -:4

  • HyperLogLog:1.5, 2%

  • LogLog

  • , ,

  • 2 100

  • ?

  • ( 0 = )

  • , !*

    * -

  • :0xxxxxx - ~50%1xxxxxx - ~50%00xxxxx - ~25%

    ..

  • - 2R, R -

  • -

  • , - 2R, R -

  • M ,

    R

  • -

  • LogLog

  • LogLog

    SuperLogLog

  • LogLogSuperLogLog

    HyperLogLog ,

    +

  • -

  • LogLogSuperLogLogHyperLogLog

    HyperLogLog++Google, 2013

    32 -> 64bit +

    bit.ly/HLLGoogle

  • LogLogSuperLogLogHyperLogLog

    HyperLogLog++

    Discrete Max-CountFacebook, 2014

    bit.ly/DiscreteMaxCount

  • Large scale?

  • !

  • HLL-,

  • Voila!

  • : bit.ly/hyperloglog

  • bit.ly/SlidingHLL

  • -

  • ih1

    h2

    hk

    1 1 10 0 0 0 0 0 0 0 0 0 0 0 0

  • ?

  • - Count-Min bit.ly/CountMinSketch

  • wi

    +1

    +1

    +1

    h1h4

    hd

    d

    - d .

  • ,

    -

  • :

    ,,

    ( )

  • : stay tuned!

  • : Neustar Research:bit.ly/NRsketches :bit.ly/SketchesOverview :bit.ly/streaming-lectures

  • happy sketching!

    , ? pavel@rutarget.ru

  • :

    HyperLogLog SQL:bit.ly/HLLinSQL