Scalable load-balancing for large-scale big data applications (+Brazil, São Paulo, USP, IME)


A Ph.D. proposal and some information about Brazil, São Paulo, USP and IME. There are some images/info about São Paulo and Brazil at "random" locations to lower sleeping probability :)


Scalable load-balancing for large-scale big data applications + Brazil, São Paulo, USP, IME
Carlos Eduardo Moreira dos Santos
University of São Paulo
University of Tokyo, 2014-05-29

Brazil
  • 5th largest country (8,515,767 km²)
  • 27 states and over 5.5k cities
  • Capital: Brasília
  • Language: (Brazilian) Portuguese
  • 6th most populous (202,656,788 in 2014)
  • 8th largest economy (Gross Domestic Product)
  • Currency: Brazilian Real
  • Info relative to Japan
      • Size: 22.5 * Japan's
      • Population: 1.6 * Japan's
      • "Distance": 27-hour flight
      • Time: Japan's minus 12h

São Paulo
  • Largest Japanese community outside Japan (665k in 2010)
  • 7th largest metropolitan area (7,943.818 km²): 0.59 * Tokyo's
  • 8th most populous (19,956,590 in 2012): 0.54 * Tokyo's
  • "Financial capital of Brazil"
  • 10th largest Gross Domestic Product in the world
  • BOVESPA stock exchange
      • Largest in Latin America
      • Second in the world, in market value
  • Largest number of helicopters in the world

University of São Paulo
  • Latin America's largest University
  • >25% of Brazilian scientific production
  • QS World University Rankings
      • Improving rank position: 207th in 2009, 127th in 2013
      • Global top 50 in 7 of the 30 disciplines

University of São Paulo

                       USP (2010)    Tokyo Univ. (2013)
  Professors           5,732         2,604
  Undergrads           56,998        14,120
  Graduate students    25,591        13,878
  Main campus size     7.4 km²       1.6 km²

Institute of Mathematics and Statistics (IME): CS Department
  • 42 full-time professors (+ 4 active retired)
  • 250 undergrads
  • 223 graduate students (124 masters + 99 PhD)
  • Graduating per year: 40-50 Bachelors, 44 Masters, 10 PhDs

Institute of Mathematics and Statistics (IME): CS Department Research Areas
  • Computer Theory
  • Artificial Intelligence
  • Software Engineering
  • Parallel, Distributed, and Grid Computing
  • Continuous Optimization
  • Combinatorial Optimization
  • Databases
  • Software Systems
  • Bioinformatics

FLOSS Competence Center
  • Founded in January/2009 at our department: USP Free and Open Source Competence Centre
  • Funded by European Commission, Brazilian government, and USP
  • Goal: promote the use of FLOSS and work towards improving its quality
  • Teaching, Research, Consulting

2014 Brazilian Soccer Team

Parallel/Distributed Systems Group: Professors
  1. Alfredo Goldman
  2. Daniel Batista
  3. Fabio Kon
  4. Marco Aurélio Gerosa
  5. Marcos Dimas Gubitoso

Parallel/Distributed Systems Group: Close Collaborators
  1. João Eduardo Ferreira
  2. Marcelo Finger
  3. Siang W. Song
  4. Flavio S. C. Silva
  5. Kunio Okuda
  6. Routo Terada
  7. Kelly Braghetto
  8. Renata Wassermann

Parallel/Distributed Systems Group: Students
  • ~20 doctoral
  • ~30 masters
  • ~20 undergrads

Parallel/Distributed Systems Group: Research Areas
  • Software Engineering
  • Agile Software Development methodologies
  • OOP and Patterns
  • Parallel Computing / HPC
  • Distributed Systems / Middleware
  • Grid Computing / Cloud Computing
  • Big Data
  • Databases (distributed / mobile)
  • Object-Orientation in Software Architectures
  • Mobile Computing
  • Energy Efficiency

Parallel/Distributed Systems Group: Education
  • Undergraduate and graduate courses
      • Parallel, Distributed, and Cloud Computing
      • Advanced Object Oriented Software Development
      • eXtreme Programming Laboratory
      • Entrepreneurship in Software Startups
  • Continuing Education and Community courses
      • Grid/Cloud Computing
      • Web development with advanced OO tools
      • Design Patterns and Agile Software Development

Parallel/Distributed Systems Group: Education
  • Consulting work - OO software development
      • São Paulo Legislature (Assembleia Legislativa)
      • Ministry of Health
      • USP administration, CPqD, LARC, Scopus, ITM, etc.
  • Entrepreneurship (for startups)

Main Research Projects
  • HP Baile (Scalable, cloud-based systems)
  • CHOReOS (Web Service Choreographies)
  • InteGrade (Opportunistic Grid Computing)
  • Microsoft Borboleta (Telehealth with smartphones)
  • Agile Methods for Software Development
  • Qualipso (Quality in Open Source)
  • IBM Eclipse Innovation

CHOReOS
  • Scalable Web Service Choreographies for the Future Internet
  • 2010 - 2013
  • European Commission funding
  • 16 partners (education/industry) from Europe (France, Greece, Italy, Lithuania, Latvia and UK) and Brazil (IME - USP)

CHOReOS Enactment Engine
  • Input
      • Web Services (implementation and/or URL)
      • Metadata (dependency info, etc.)
  • Provision cloud resources
  • Deploy Web Services
  • Configure dependencies (by roles)
  • Technologies: Java, SOAP, REST, Chef

Embraer
  • World's 3rd largest aircraft manufacturer
  • 20k employees
  • Clients in 55 countries
      • Japan Airlines (E170)
  • Armed forces in 48 countries
  • 2013 net income: US$ 342 million

HP Baile
  • Development and use of WS choreographies in large-scale environments
  • 2010-2012
  • Funded by HP Brasil
  • Collaboration with HP Labs
  • Some outcomes
      • Rehearsal: WS choreographies with TDD
      • Scalability Explorer
      • Tech transfer on change impact analysis for workflow repository management

InteGrade
  • 2002 - 2011
  • Object-oriented grid middleware
  • Opportunistic
  • My final undergraduate project: Grid Computing Resource Management - Node Control Center, 2009
      • Limit CPU usage on the client side
      • Web interface (C++)

InteGrade Node Control Center
  • Multi-core CPU affinity

Brazil

Scalable load-balancing for large-scale big data applications: Motivations
  • Increasing supercomputer power
      • 2008 Blue Gene/P Intrepid system: 40,960 nodes, 163,840 processor cores
      • 2011/12 Blue Gene/Q Sequoia system: 98,304 nodes, 1.6 million processor cores
  • "Unlimited" resources in cloud computing
  • Big Data

Scalable load-balancing for large-scale big data applications: Questions
  • Can centralized systems handle the load?
  • What about decentralized systems?
  • Are distributed systems required?
Scalable load-balancing for large-scale big data applications
  • Many applications and solutions are available.
  • We will start with MapReduce.
  • Apache Hadoop implementation (2005)
      • Open Source (Apache top-level project)
      • Useful in a wide range of applications
      • Global community of users and contributors
      • Commercial support for companies
      • Sponsors: Yahoo!, Google, HP, IBM, Facebook, ...

São Paulo - Liberdade

MapReduce Paradigm
  • 2004 by Google

MapReduce Paradigm

    function map(String name, String document):
        // name: document name
        // document: document contents
        for each word w in document:
            emit (w, 1)

    function reduce(String word, Iterator partialCounts):
        // word: a word
        // partialCounts: a list of aggregated partial counts
        sum = 0
        for each pc in partialCounts:
            sum += ParseInt(pc)
        emit (word, sum)

Hadoop v1

Hadoop v2 (YARN)

Hadoop v1 vs v2
  • Pros
      • Job Tracker was split into Resource Manager (RM) and Application Master (AM)
      • As many AMs as jobs
      • 5k nodes in 2009 to 10k in 2012
      • Same price = 2x resources
  • RM and AM are still centralized components

Research
  • Study scalability in Hadoop v1, v2
  • Experiments
  • Understand scalability gains in YARN
  • Scalability limits
  • Model centralized components overhead
  • Predict scalability limits by simulation
  • Conceive and simulate an alternative solution

Related Works - MTC
  • Falkon (2007)
      • Fewer features
      • 487 vs 11 tasks/sec in Condor (2004)
  • MATRIX (2013)
      • Fully distributed
      • Work-stealing leads to better efficiency (18-82% vs 92-97%)
  • CloudKon (CCGrid 2014)
      • Based on cloud services (IaaS and SaaS)
      • The only one to support 256 VMs (up to 1024)
      • Blames too many open TCP connections

Related Works - MTC: Proposal
  • Fewer functionalities
  • Distributed selfish load balancing by Adolphs & Berenbrink
      • No global information
      • Fewer open connections
      • ε-approximate NE convergence in O(ln (m/n))
      • Mathematically guaranteed to be fast
      • Scalable
      • Can deal with different speeds, weights
  • Data-awareness

Progress
  • Measuring Hadoop v1 and v2 latency
      • HiBench suite
      • Nuvem USP Cloud: up to 64 VMs
  • Experiment automation with Python
      • VM Management
      • Hadoop Management
      • Monitoring
      • Log parsing
      • Graphs
  • Deadline: qualifying exam in August

The End
  • Thank you! Questions?
  • Links
  • Contact: cadu at
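
The word-count pseudocode on the MapReduce Paradigm slide can be sketched in plain Python as below. This is a single-process illustration only, not Hadoop code: the function names `map_words`, `reduce_counts`, and `run_word_count` are hypothetical, and the in-memory grouping step stands in for the shuffle that a real framework performs across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(name: str, document: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every whitespace-separated word."""
    # `name` (the document name) is unused here, as in the slide's pseudocode.
    for word in document.split():
        yield word, 1


def reduce_counts(word: str, partial_counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum all partial counts emitted for one word."""
    return word, sum(partial_counts)


def run_word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, group by key (a stand-in for the shuffle), then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_words(name, text):
            grouped[word].append(count)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"a.txt": "big data big cluster", "b.txt": "data cluster data"}
    print(run_word_count(docs))
```

The map and reduce functions hold no shared state, which is what lets a framework run many copies of each in parallel; only the grouping step needs coordination.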
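
The "no global information" idea on the Proposal slide can be illustrated with a small simulation. This is a simplified sketch, not the protocol from Adolphs & Berenbrink: it assumes identical tasks and identical resource speeds, and the migration probability `1 - load_j / load_i` is an assumption chosen for illustration. The names `selfish_round` and `simulate` are hypothetical.

```python
import random
from typing import List


def selfish_round(loads: List[int], rng: random.Random) -> List[int]:
    """One synchronous round: each task samples one random resource and
    decides whether to migrate using only the two loads it can see."""
    n = len(loads)
    new_loads = list(loads)
    for i in range(n):
        for _ in range(loads[i]):  # one decision per task on resource i
            j = rng.randrange(n)
            # Assumed rule: move only to a strictly less loaded resource,
            # with probability growing with the relative load gap.
            if loads[j] < loads[i] and rng.random() < 1.0 - loads[j] / loads[i]:
                new_loads[i] -= 1
                new_loads[j] += 1
    return new_loads


def simulate(m_tasks: int, n_resources: int, rounds: int, seed: int = 0) -> List[int]:
    """Start with every task on resource 0 and run a fixed number of rounds."""
    rng = random.Random(seed)
    loads = [m_tasks] + [0] * (n_resources - 1)
    for _ in range(rounds):
        loads = selfish_round(loads, rng)
    return loads


if __name__ == "__main__":
    final = simulate(m_tasks=1000, n_resources=10, rounds=50)
    print(final, "max - min =", max(final) - min(final))
```

Each task contacts a single randomly chosen resource per round, so no component ever needs a global view of the loads, in contrast with a centralized scheduler.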

