INSA

Platforms for massive data processing

Presentation

Program (detailed contents):

Physical infrastructure:

  • Introduction to cloud computing;
  • Concepts of network protocol optimization, particularly at the transport layer (TCP evolutions, DCCP, SCTP, …);
  • Network functions virtualization (NFV);
  • Advanced management of network infrastructure (SDN, Software-Defined Networking);
  • Concepts of distributed storage media (storage area networks, SANs) for the efficient storage of massive data;
  • Concepts of recent processor virtualization technologies (Intel VT-x, its dedicated instruction set extensions, and dedicated data structures such as the VMCS).

Organization and data management:

  • Databases (SQL and NoSQL);
  • Concepts of efficient distributed file systems (focus on the Hadoop Distributed File System, HDFS);
  • Data indexing techniques;
  • File formats.

Objectives

At the end of this module, the student will have understood, and will be able to explain, the concepts and techniques behind the main pillars that a big data platform provider must manage, namely:

  • physical infrastructure (network, storage, computing power);
  • organization and data management (storage allocation, ...);
  • computation services for such data (based on computation models such as MapReduce, etc.).

The student will be able to:
1) With regard to physical infrastructure:

  • design and deploy a network architecture suited to a big-data-oriented service, using advanced network technologies (network virtualization, protocol optimization, etc.);
  • size and deploy a physical storage infrastructure designed to hold massive amounts of data;
  • assess and deploy the computing power required to process massive data, based on the latest processor technologies, such as virtualization.

2) With regard to data organization and management:

  • design and implement tools to organize data within the physical infrastructure;
  • provide appropriate interfaces for accessing such data;
  • choose a data organization suited to the processing constraints (e.g., real time).

3) With regard to data processing services:

  • provide data processing facilities adapted to the processing constraints.

Form of assessment

Learning outcomes are assessed continuously throughout the semester. Depending on the teaching activity, the assessment may take different forms: a written exam, an oral exam, a record, a written report, peer review...