A secure Big Data processing model and framework across federated data centre clouds
Primary supervisor: Dr Wei Jie
Start dates: January, May and September of each academic year
Duration: This is a three-year position.
Research at the University of West London lives in an ecosystem of interdisciplinary research clusters. This PhD position is based in the School of Computing and Engineering.
Big Data refers to collections of data sets so large and complex that they are difficult to store and process using traditional data storage and processing applications. Big Data has become common in many science, engineering and commercial domains. Big Data applications now often need to process massive data sets that are naturally distributed across multiple data centre Clouds. This requirement motivates the idea of “federated data centre Clouds”. A number of issues need to be addressed in the implementation of federated data centre Clouds, including the following two key challenges:
- Programming models and software frameworks for the development, deployment and execution of large-scale distributed Big Data applications over federated data centre Clouds
In recent years, new Big Data processing paradigms and techniques have emerged, in particular the widely used MapReduce programming model and its open source Hadoop implementation for processing large-scale data sets on large clusters. However, MapReduce and Hadoop are generally limited to data and computing resources within a single cluster in one data centre Cloud. The implementation of federated data centre Clouds demands the development of a more advanced Big Data processing model and software framework which can work across multiple data centre Clouds.
- Security mechanism and framework for federated authentication and access control across multiple data centre Clouds
Each Cloud data centre has its own well-established security mechanisms in place, which have to be preserved. In federated data centre Clouds, it is essential that a user can be authenticated with all Cloud data centres without having to carry out additional authentication steps personally. To achieve this, security mechanisms and frameworks must be in place to perform automatic user authentication across federated data centre Clouds and enable the user to access distributed resources in a simple and transparent manner.
We aim to develop a secure Big Data processing model and framework based on the MapReduce paradigm to support the execution of data-intensive applications across federated data centre Clouds. More specifically, the project aims:
- To integrate data centres at participating institutions at the infrastructure level and establish federated data centre Clouds for Big Data application development, deployment and execution.
- To transform the MapReduce paradigm from single-data-centre mode to multi-data-centre mode, and to extend or modify the Hadoop implementation to enable inter-centre Big Data processing.
- To incorporate advanced security mechanisms that authenticate users (and administrators) with automatic single sign-on capability and give them access only to authorised resources across federated data centre Clouds.
- To implement the proposed Big Data processing framework and demonstrate it on real Big Data problems/applications. We will develop various real Big Data applications based on the proposed model and run them over federated data centre Clouds. The model will be validated and its performance will be evaluated.
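To illustrate the second aim above, the following is a minimal Python sketch of one possible way a MapReduce job might span several data centres: each centre runs the map step and a local reduce over its own data, and only the compact partial results are merged in a final global reduce. It is an illustration under stated assumptions, not the project's design and not Hadoop's API. The function names (`local_mapreduce`, `global_reduce`, `federated_word_count`) and the word-count task are hypothetical, and the data centres are simulated as in-memory lists of text records.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in a text record."""
    for word in record.lower().split():
        yield word, 1


def local_mapreduce(records: List[str]) -> Counter:
    """Run map and a local reduce inside one (simulated) data centre.

    Only the resulting partial counts would need to leave the centre,
    not the raw records.
    """
    counts: Counter = Counter()
    for record in records:
        for word, one in map_words(record):
            counts[word] += one
    return counts


def global_reduce(partials: Iterable[Counter]) -> Counter:
    """Merge the partial counts produced by each data centre."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def federated_word_count(centres: Dict[str, List[str]]) -> Counter:
    """Hypothetical driver: one local job per centre, then a global merge."""
    partials = [local_mapreduce(records) for records in centres.values()]
    return global_reduce(partials)
```

For example, calling `federated_word_count({"dc_a": ["big data big"], "dc_b": ["data cloud"]})` counts "big" twice, "data" twice and "cloud" once. In a real federation the per-centre jobs would run remotely and in parallel, and each call would be subject to the authentication and access control described above; none of that is modelled here.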
The ideal candidate should have an MSc or equivalent degree in computer science and combine a solid theoretical background with excellent software development skills. A strong commitment to reaching research excellence and achieving assigned objectives is required, as well as an ability to work in a collaborative and interdisciplinary environment. The PhD candidate is expected to carry out applied research that will start with the establishment of a theoretical framework, continue with the implementation of a software prototype and experimentation with real data, and conclude with the validation of the proposed solution through real applications and case studies.
Background knowledge and/or previous experience in the following areas and technologies will be considered very favourably:
- Big data storage and processing including MapReduce paradigm and Hadoop framework
- Cloud computing architecture, infrastructure, and solution design
- Security management for Clouds
- Storage, Data, and Analytics Clouds
- High Performance Cloud Computing
- Cloud applications and experiences
Applicants whose first language is not English must also demonstrate their English language proficiency, either through evidence of an overall IELTS score of 7 (with 6.5 in all four skills) or by providing access to MA/MSc chapters or published work.
For general enquiries about the application process, visit the Graduate School pages.
Questions regarding academic aspects of the project should be directed to Wei.Jie@uwl.ac.uk.