BIGFOOT - BigData Analytics of Digital Footprints
BigFoot addresses some of the most vital issues related to the deployment of large-scale data analytic technologies and innovative analytics-as-a-service business models:
- Data interaction is hard. Current approaches lack an integrated interface to inspect and query (processed) data. Moreover, little work in the literature has addressed the efficiency, and not only the performance, of interactive queries that operate on batch-processed data.
- Parallel algorithm design is hard. Designing parallel algorithms is already difficult in itself, and current systems make implementing even simple jobs a tedious and exhausting experience.
- Deployment tools are poor. Management tools are still in their infancy and target solely "bare metal" clusters.
- Impact of virtualization. The effects of compute and network virtualization on the performance of data-intensive services have been largely overlooked in the literature and in available solutions.
- Lack of optimizations. Current systems entrust users with the task of optimizing their queries and algorithms. Moreover, dataflow and storage mechanisms are "data-processing oblivious", which leaves room for several optimizations that current solutions have not addressed.
Watch the BigFoot presentation on big data for companies: Everybody wants to do big data analytics these days: storage is cheap and data is plentiful; best of all, software in the Hadoop ecosystem is free both as in speech and as in beer. If you are not Facebook or Amazon, however, you are not likely to put your precious data in the systems of cloud providers you may not trust; on the other hand, developing your own small or medium cluster can be prohibitive, since it requires a lot.
Open Source software available on
A platform-as-a-service solution for processing and interacting with large volumes of data. BigFoot builds upon and contributes to the Apache Hadoop ecosystem and the OpenStack project.
Key differentiating benefits provided by BigFoot include:
- Self-tuned deployments in private (and public) clouds
- Hardware and data consolidation through virtualization
- Performance enhancements to mitigate bottlenecks
- Multi-site add-ons for geo-replication
Resource allocation mechanisms:
- New scheduling components to deal with heterogeneous workloads
- New work-sharing optimizations for both batch and interactive engines
In-situ querying of RAW data:
- Distributed query mechanism to operate on heterogeneous RAW data
- On-the-fly indexing for modern storage devices
- Scalable Machine Learning library
- Time Series Library
Target groups benefiting most from BigFoot solutions (see our website for details):
- In a generic context - Academic Researchers, Engineers and Data Scientists, Big Data companies
- In the cyber-security context - Security software companies, CERT teams, Security researchers
- In the Smart Grid context - Residential electric consumers, Managers of utility companies, Energy data scientists, Smart city services operators