BIGFOOT - BigData Analytics of Digital Footprints

Project start: Sunday, 30 September, 2012
Project end: Friday, 30 September, 2016
Project Website:

BigFoot addresses some of the most vital issues related to the deployment of large-scale data analytic technologies and innovative analytics-as-a-service business models.

  • Data interaction is hard. Current approaches lack an integrated interface to inspect and query (processed) data. Moreover, little work has been done in the literature to optimize the efficiency, and not only the performance, of interactive queries that operate on batch-processed data.
  • Parallel algorithm design is hard. While the design of parallel algorithms is already a difficult topic per se, current systems make the implementation of even simple jobs a tedious and exhausting experience.
  • Deployment tools are poor. Management tools are still in their infancy and target solely "bare metal" clusters.
  • Impact of virtualization. The effects of compute and network virtualization on the performance of data-intensive services have been largely overlooked in the literature and in available solutions.
  • Lack of optimizations. Current systems entrust users with the task of optimizing their queries and algorithms. Moreover, dataflow and storage mechanisms are "data-processing oblivious", which leaves room for several optimizations that have not been addressed by current solutions.
How will your solution/service benefit the end-user? 

A platform-as-a-service solution for processing and interacting with large volumes of data. BigFoot builds upon and contributes to the Apache Hadoop ecosystem and the OpenStack project.

Key differentiating benefits provided by BigFoot include:


  • Self-tuned deployments in private (and public) clouds
  • Hardware and data consolidation through virtualization
  • Performance enhancements to mitigate bottlenecks
  • Multi-site add-ons for geo-replication

Resource allocation mechanisms:

  • New scheduling components to deal with heterogeneous workloads
  • New work-sharing optimizations for both batch and interactive engines

In-situ querying of RAW data:

  • Distributed query mechanism to operate on heterogeneous RAW data
  • On-the-fly indexing for modern storage devices

High-level languages:

  • Scalable Machine Learning library
  • Time Series Library

Target groups benefiting most from BigFoot solutions (see our website for details):

  • In a generic context - Academic Researchers, Engineers and Data Scientists, Big Data companies
  • In the cyber-security context - Security software companies, CERT teams, Security researchers
  • In the Smart Grid context - Residential electric consumers, Managers of utility companies, Energy data scientists, Smart city services operators

Open Source software available on

AppHub Directory