Metrics: where and how

  • Published on
    25-Jan-2017


Metrics: where and how
graphite-oriented story

Vsevolod Polyakov
Platform Engineer at Grammarly

Graphite
  • All whisper-based systems

Default graphite architecture

what?

  • RRD-like (gram.ly/gfsx)
  • so.it.is.my.metric is stored as /so/it/is/my/metric.wsp
  • Fixed retention (by name/pattern)
  • Fixed size (actually no)
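The name-to-file mapping on this slide can be sketched in a few lines of Go. This is only an illustration: the `metricPath` helper and the `root` argument are hypothetical, and a real installation keeps its whisper files under a configured storage directory rather than `/`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// metricPath maps a dotted metric name to the whisper file that would hold it.
// root is a hypothetical storage directory, not something named in the slides.
func metricPath(root, metric string) string {
	parts := strings.Split(metric, ".")
	elems := append([]string{root}, parts...)
	return path.Join(elems...) + ".wsp"
}

func main() {
	fmt.Println(metricPath("/", "so.it.is.my.metric")) // /so/it/is/my/metric.wsp
}
```

Each dot in the name becomes a directory level, so one metric always corresponds to exactly one file.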

Retention and size

  • 1s:1d      1 036 828 bytes
  • 10s:10d    1 036 828 bytes
  • 1s:365d    378 432 028 bytes (1 TB ~ 3 000 metrics)
  • 10s:365d   37 843 228 bytes (1 TB ~ 30 000 metrics)
  • whisper calc
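The single-archive sizes above can be reproduced with a little arithmetic. The sketch below assumes the usual whisper on-disk layout (a 16-byte file header, 12 bytes of archive info per retention level, and 12 bytes per datapoint). Those constants are not stated in the slides, and the `archive` type and `whisperSize` function are illustrative names.

```go
package main

import "fmt"

const (
	metadataSize    = 16 // file header: aggregation, max retention, xFilesFactor, archive count
	archiveInfoSize = 12 // per archive: offset, seconds per point, number of points
	pointSize       = 12 // per datapoint: 4-byte timestamp + 8-byte value
)

// archive describes one retention level, e.g. 10s:10d is {10, 10 * 86400}.
type archive struct {
	secondsPerPoint int64
	retentionSecs   int64
}

// whisperSize returns the file size in bytes for the given retention levels.
func whisperSize(archives []archive) int64 {
	size := int64(metadataSize)
	for _, a := range archives {
		points := a.retentionSecs / a.secondsPerPoint
		size += archiveInfoSize + points*pointSize
	}
	return size
}

func main() {
	const day = 86400
	fmt.Println(whisperSize([]archive{{1, 1 * day}}))    // 1s:1d    -> 1036828
	fmt.Println(whisperSize([]archive{{10, 10 * day}}))  // 10s:10d  -> 1036828
	fmt.Println(whisperSize([]archive{{1, 365 * day}}))  // 1s:365d  -> 378432028
	fmt.Println(whisperSize([]archive{{10, 365 * day}})) // 10s:365d -> 37843228
}
```

The size depends only on the number of points, which is why 1s:1d and 10s:10d come out identical.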

Retention and size

  • 10s:30d,1m:120d,10m:365d   4 564 864 bytes
  • 240 864 metrics in 1 TB
  • Aggregation: average, sum, min, max, and last
  • Can be assigned per metric

How
  • terraform (https://www.terraform.io/)
  • docker (https://www.docker.com/)
  • ansible (https://www.ansible.com/)
  • rocker (https://github.com/grammarly/rocker)
  • rocker-compose (https://github.com/grammarly/rocker-compose)

Default graphite architecture

carbon-cache.py

  • single-core
  • many options in config file
  • default
  • link

carbon-cache.py architecture

Start load testing
  • m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB EBS gp2 disk)
  • retentions = 1s:1d
  • MAX_CACHE_SIZE, MAX_UPDATES_PER_SECOND, MAX_CREATES_PER_MINUTE = inf
  • defaults
  • almost 1.5h to reach the limit :(

carbon-cache.py cache size at 75k req/s

results
  • 75 000 req/s max
  • 60 000 req/s flagman (sustained) speed
  • I/O is the bottleneck :(

Try to tune!
  • WHISPER_SPARSE_CREATE = true (don't allocate space on creation): non-linear I/O load.
  • CACHE_WRITE_STRATEGY = sorted (default)

cache size, from 1k to 195k req/s

results
  • 120 000 req/s flagman speed
  • cache flush problem :(

Try to tune!
  • CACHE_WRITE_STRATEGY = max: gives a strong flush preference to frequently updated metrics and also reduces random file I/O.

from 1k to 150k

results
  • 90 000 req/s flagman speed
  • cache flush problem :(

Try to tune!
  • CACHE_WRITE_STRATEGY = naive: just flush. Better with random I/O.

from 45k to 135k

results
  • 120 000 req/s flagman speed
  • still CPU-bound

sorted vs. max vs. naive
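The difference between the three strategies comes down to which metric gets written next. The toy model below follows my reading of the carbon.conf comments and is not carbon's actual code: the cache is reduced to a map from metric name to queued datapoint count, and all type and function names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// cache maps a metric name to the number of datapoints queued for it.
type cache map[string]int

// sortedOrder models "sorted": snapshot the cache once and flush metrics
// in descending order of queued datapoints (ties broken by name).
func sortedOrder(c cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if c[names[i]] != c[names[j]] {
			return c[names[i]] > c[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

// nextMax models "max": before every write, pick whichever metric
// currently holds the most queued datapoints.
func nextMax(c cache) string {
	best, bestCount := "", -1
	for name, count := range c {
		if count > bestCount || (count == bestCount && name < best) {
			best, bestCount = name, count
		}
	}
	return best
}

// naiveOrder models "naive": flush in whatever order the cache yields,
// which for a Go map is unspecified.
func naiveOrder(c cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	return names
}

func main() {
	c := cache{"app.requests": 120, "app.errors": 3, "host.cpu": 45}
	fmt.Println(sortedOrder(c)) // [app.requests host.cpu app.errors]
	fmt.Println(nextMax(c))     // app.requests
	fmt.Println(len(naiveOrder(c)), "metrics flushed in arbitrary order")
}
```

In this model "sorted" pays for one sort per pass, "max" pays for a scan before every write, and "naive" does no bookkeeping at all. That is at least consistent with the CPU-bound results above, though the slides do not break the cost down.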

Maybe it's an EBS I/O limitation? Tried a 512 GB disk. No.

go-carbon

  • multi-core single daemon
  • written in golang
  • not many options to tune :(
  • link

Start load testing
  • m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB EBS gp2 disk)
  • retentions = 1s:1d
  • max-size = 0
  • max-updates-per-second = 0
  • almost 1h to reach the limit :(

from 1k to 130k req/s, ~3k/min

results
  • 120 000 req/s flagman speed
  • but that's without sparse file creation. Try to implement it.

try to tune!
    // Sparse create: seek to the last byte and write a single zero
    // instead of zero-filling the whole file.
    remaining := whisper.Size() - whisper.MetadataSize()
    whisper.file.Seek(int64(remaining-1), 0)
    whisper.file.Write([]byte{0})
    chunkSize := 16384
    zeros := make([]byte, chunkSize)
    for remaining > chunkSize {
        // The original zero-fill write is disabled:
        // if _, err = whisper.file.Write(zeros); err != nil {
        //     return nil, err
        // }
        remaining -= chunkSize
    }
    if _, err = whisper.file.Write(zeros[:remaining]); err != nil {
        return nil, err
    }

180 000 req/s!

try to tune!
  • max update operation = 1500

results
  • TL;DR: 210 000 - 240 000 req/s flagman speed
  • 31 000 000 cache size!

try to tune!
  • max update operation = 0
  • input-buffer = 400 000

results
  • 270 000 req/s flagman speed
  • 10-20 million req cache size!

try to tune!
  • vm.dirty_background_ratio=40
  • vm.dirty_ratio=60

300 000 req/s

results
  • 300 000 req/s flagman speed
  • 180k+ req/s without cache

Re:Lays

Default graphite architecture

arch forward

arch named/regexp
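Name- or regexp-based relaying boils down to "first matching pattern wins, otherwise use a default destination". A minimal Go sketch of that idea follows. The `rule` type, the patterns, and the destination names are all hypothetical; this is not carbon-relay's real rule engine.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule sends every metric whose name matches pattern to dest.
type rule struct {
	pattern *regexp.Regexp
	dest    string
}

// route returns the destination of the first matching rule,
// or fallback when nothing matches.
func route(rules []rule, fallback, metric string) string {
	for _, r := range rules {
		if r.pattern.MatchString(metric) {
			return r.dest
		}
	}
	return fallback
}

// exampleRules uses made-up patterns and cache names for illustration.
var exampleRules = []rule{
	{regexp.MustCompile(`^app\.`), "cache-a"},
	{regexp.MustCompile(`^host\.`), "cache-b"},
}

func main() {
	for _, m := range []string{"app.requests", "host.cpu", "so.it.is.my.metric"} {
		fmt.Println(m, "->", route(exampleRules, "cache-default", m))
	}
}
```

The appeal of this scheme is that an operator decides where each metric family lives. The cost is that the rule list has to be maintained by hand as metrics are added.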

arch hash

arch hash, replication factor: 2
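Hash-based relaying with a replication factor of 2 can be sketched as a consistent-hash ring on which each metric is written to the next two distinct nodes. The Go below is only an illustration of that idea: it uses FNV-1a and made-up virtual-node keys, so it will not place metrics on the same nodes as carbon's own ring.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// point is one virtual position of a node on the ring.
type point struct {
	hash uint32
	node string
}

type ring struct {
	points []point
	nodes  int
}

// hashKey uses FNV-1a as a stand-in hash function (an assumption).
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places vnodes virtual points per node on the ring.
func newRing(nodes []string, vnodes int) *ring {
	r := &ring{nodes: len(nodes)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			r.points = append(r.points, point{hashKey(fmt.Sprintf("%s-%d", n, i)), n})
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i].hash < r.points[j].hash })
	return r
}

// pick walks clockwise from the metric's hash and collects
// the first `replicas` distinct nodes.
func (r *ring) pick(metric string, replicas int) []string {
	if replicas > r.nodes {
		replicas = r.nodes
	}
	h := hashKey(metric)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i].hash >= h })
	seen := map[string]bool{}
	var out []string
	for i := 0; len(out) < replicas && i < len(r.points); i++ {
		p := r.points[(start+i)%len(r.points)]
		if !seen[p.node] {
			seen[p.node] = true
			out = append(out, p.node)
		}
	}
	return out
}

func main() {
	r := newRing([]string{"cache-a", "cache-b", "cache-c"}, 100)
	fmt.Println(r.pick("so.it.is.my.metric", 2))
}
```

Because placement depends only on the metric name and the node list, every relay that shares the same list sends a given metric to the same pair of caches without any coordination.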

carbon-relay.py
  • twisted-based
  • native

Start load testing
  • c4.xlarge instance (4 CPU, 7.5 GB RAM)
  • ~1 Gb LAN
  • default parameters
  • hashing
  • 10 connections

WTF!

carbon-relay-ng
  • golang-based
  • web panel
  • live updates
  • aggregators
  • spooling
  • link
