Разработка высокопроизводительных серверных приложений для Linux/Unix (Александр Крижановский)

  • Published on
    17-Dec-2014

  • View
    1.051

  • Download
    3

Embed Size (px)

DESCRIPTION

 

Transcript

  • 1. Linux/UNIX ak@natsys-lab.com10/21/12
  • 2. (front-end server)10/21/12
  • 3. 10/21/12
  • 4. Original: for (int i = 0; i < 10; ++i) foo(i); Thread 1: Thread 2: for (int i = 0; i < 5; ++i) for (int i = 5; i < 10; ++i) foo(i); foo(i); : , CPU.10/21/12
  • 5. Original: foo(); bar(); Thread 1: Thread 2: boo(); bar(); : , , , . .10/21/12
  • 6. vs task_struct, do_fork() , .PostgreSQL: , shared memory buffer pool, shared memory10/21/12
  • 7. ? N kernel-thread O(1) O(log(N)) Kernel-thread ~ process: ~6 => M CPU => cache hit ( ) , coroutines; events...10/21/12
  • 8. Hyper Threading cache hit rate => , 10/21/12
  • 13. pthread_spin_lock () static volatile int lock = 0; void lock() { while (!__sync_bool_compare_and_swap(&lock, 0, 1)); } void unlock() { lock = 0; }10/21/12
  • 14. pthread_spin_lock (scheduled) CPU0 CPU1 Thread0: lock() . . . Thread0: some work Thread1: try lock() loop . . . Thread2: try lock() loop // Thread0 preempted . . . (loop) . . . Thread1: try lock() loop Thread2: try lock() loop // 200% CPU usage . . . // Thread2 preempted . . . Thread0: unlock() preempt_disable()10/21/12
  • 15. pthread_rwlock pthread_mutex ( ~ 1.5 , ) lock contention , per-cpu10/21/12
  • 16. Lock contention , ( ) : (CPU, IO etc), RPS : , 10/21/12
  • 17. Lock upgrade pthread_rwlock_rdlock(&lock); if (v > shared) { pthread_rwlock_unlock(&lock); return; } pthread_rwlock_unlock(&lock); pthread_rwlock_wrlock(&lock); if (v > shared) { pthread_rwlock_unlock(&lock); return; } shared = v; pthread_rwlock_unlock(&lock);10/21/12
  • 18. 10/21/12
  • 19. Data cache (L1d, L2d) Instruction cache (L1i, L2i) TLB (L1, L2) L3 10/21/12
  • 20. getconf # getconf LEVEL1_DCACHE_SIZE 65536 # getconf LEVEL1_DCACHE_LINESIZE 64 # grep -c processor /proc/cpuinfo 210/21/12
  • 21. MESI (Modified, Exclusive, Shared, Invalid) RFO (Request For Ownership) := M I CPU1 X: X M CPU2 X: CPU1: X I CPU2: X M10/21/12
  • 22. : std::shared_ptr std::shared_ptr reference counter , 10/21/12
  • 23. False sharing MESI cacheline RFO cacheline, RFO10/21/12
  • 24. GCC , 10/21/12
  • 25. 1. ( ) : ( )2. 3. Flush- 10/21/12
  • 26. HTTP IM ID 10/21/12
  • 27. : , (L1d, L2d, L3d, TLB L1d/L2d etc) TLB (~1024 ) 4 => : 10/21/12
  • 28. Page Table10/21/12
  • 29. & page table Application Tree Page table a.1 a b a.0 c c.0 c.110/21/12
  • 30. Radix-tree : IPv4 2 - 4 (Linux VMM )10/21/12
  • 31. {B,T}-tree , RAM-only 10/21/12
  • 32. , 10/21/12
  • 33. , : 1 - ( ) 10/21/12
  • 34. : (1) , ( ) 10/21/12
  • 35. : 10/21/12
  • 36. LRU (Least Recently Used): (O(1)), fundamental locality Clok: ( concurrency) LRFU: LFU (Least Frequently Used), , c CAR (Clock with Adaptive Replacement)/CART (CAR with Temporal filtering): fundamental advanced locality, 10/21/12
  • 37. : 10/21/12
  • 38. 1. Flushing-thread: 2. : / soft-realtime 10/21/12
  • 39. Lock-free ( , ) - lock contention : , .10/21/12
  • 40. 1 , N push() pop() ( ) () : std::queue, mutex lock contention10/21/12
  • 41. (Read-Modify-Write) unsigned long a, b = 1; b = __sync_fetch_and_add(&a, 1); mov $0x1,%edx lock xadd %rdx,(%rax)10/21/12
  • 42. (Compare-And-Swap) unsigned long val = 5; __sync_val_compare_and_swap(&val, 5, 2); mov $0x5,%eax mov $0x2,%ecx lock cmpxchg %rcx,(%rdx)10/21/12
  • 43. () cache coherency (MESI) , .. RFO shared_ptr reference counting10/21/12
  • 44. Lock-free (): push()push (q: pointer to fifo, cl: pointer to cell): Loop: cl->next = q->tail if CAS (&q->tail, cl->next, cl): break10/21/12
  • 45. Lock-free (): pop()pop (q: pointer to fifo): Loop: cl = q->head next = cl->next if CAS (&q->head, cl, next): break return cl10/21/12
  • 46. ABA problem T1 A, T1 , T2, T2 A B A, T1 , , , 10/21/12
  • 47. Lock-free : ABApop (q: pointer to fifo): Loop: cl = q->head next = cl->next # cl = A, next = B ------->scheduled # pop(A), pop(B), push(A) if CAS (&q->head, cl, next): # cl->head => B ( C) break return cl10/21/12
  • 48. ABA pop() push() CAS2 (Double CAS) : CMPXCHG16B x86-6410/21/12
  • 49. Lock-free Ring-Buffer: (0)10/21/12
  • 50. Lock-free Ring-Buffer: (1) push(): while (tail_ + Q_SIZE < head_) sched_yield(); Thread 1 Thread 2 read tail_ read tail_ read head_ read head_ ------->scheduled push an element push an element10/21/12
  • 51. Lock-free Ring-Buffer: (2) ( )push(): unsigned long tmp_head = __sync_fetch_and_add(&head_, 1); while (tail_ + Q_SIZE < tmp_head) sched_yield(); ptr_array_[tmp_head & Q_MASK] = x;10/21/12
  • 52. Lock-free Ring-Buffer: (3) Per-thread current LH & LTpush(): volatile unsigned long last_head_; volatile unsigned long last_tail_; auto min = tail_; for (size_t i = 0; i < n_consumers_; ++i) { if (thr_p_[i].tail < min) min = thr_p_[i].tail; } last_tail_ = min;10/21/12
  • 53. Lock-free Ring-Buffer: (4) tail i- push(): auto min = tail_; for (size_t i = 0; i < n_consumers_; ++i) { auto tmp_t = thr_p_[i].tail; if (tmp_t < min) min = tmp_t; } last_tail_ = min;10/21/12
  • 54. Lock-free Ring-Buffer: (5) tail i- pop(): thr_pos().tail = __sync_fetch_and_add(&tail_, 1); // ......... T *ret = ptr_array_[thr_pos().tail & Q_MASK]; thr_pos().tail = ULONG_MAX; return ret;10/21/12
  • 55. Lock-free Ring-Buffer: (6) tail tail_pop(): thr_pos().tail = tail_; __sync_synchronize(); thr_pos().tail = __sync_fetch_and_add(&tail_, 1);10/21/12
  • ...

Recommended

View more >