The “ExploFun: Exploring and summarizing large-scale functional data with statistical learning tool for prediction purposes” project develops methods for optimizing the management and storage of massive data sets, using data provided by the French electricity company EDF from its Linky smart meters. The final goal is for the methodologies developed to become an operational tool that can be shared with the scientific community, and to bring these methods as close as possible to industrial end users.

Improving data mining techniques

The aim of this project is to develop co-clustering methods (data mining techniques) to optimize the management of massive data sets. The challenge is to handle the storage and management of the data generated by Linky smart meters (data provided by EDF), which have been gradually installed throughout France and will reach 35 million by 2021. These new smart meters will allow electricity companies to record electricity consumption, with two main applications:

  • analyzing and predicting the electric power consumption of French households;
  • finding cause-and-effect relationships between recorded health monitoring data, metadata such as environmental data (temperature, humidity, etc.), and other data such as state changes (especially medical, environmental, and financial data).

Two kinds of data will be considered in this project. On the one hand, publicly available data will be used to test the methods developed. On the other hand, EDF will apply these methods to its own data to validate the use of co-clustering techniques, testing whether the developed algorithms can handle large operational data sets.

Many possible applications

Until now, customer data were recorded only every six months, whereas a smart meter can take a reading every second. In practice, EDF plans to access the data every half hour, which means 17,472 measurements per year for each of the 27 million customers! Hence the importance of carefully managing and storing this flood of data. The recommended approach is to construct “summaries” of these data, and one way to do so is to group the data.
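To make these orders of magnitude concrete, here is a minimal Python sketch. It is an illustration only, not the project's method: the function names, the toy curves, and the cluster labels are all hypothetical, and the grouping is assumed to be given rather than estimated. The figure of 17,472 matches 48 half-hourly readings a day over 52 full weeks (364 days); a 365-day year would give 17,520. The second function shows the idea of a "summary": once households and time slots have been grouped, each block of readings can be replaced by its mean.

```python
from collections import defaultdict

HALF_HOURS_PER_DAY = 48  # one reading every 30 minutes


def readings_per_year(days=364):
    """Half-hourly readings for one meter; 364 days = 52 full weeks."""
    return HALF_HOURS_PER_DAY * days


def block_means(curves, row_labels, col_labels):
    """Average the readings inside each (household group, time-slot group) block.

    The labels are assumed to be known already; finding them is the job of a
    co-clustering algorithm, which this sketch does not implement.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, curve in enumerate(curves):
        for j, value in enumerate(curve):
            key = (row_labels[i], col_labels[j])
            totals[key] += value
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy data (hypothetical): 4 households x 4 time slots
curves = [
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 3.0, 3.0],
    [5.0, 5.0, 7.0, 7.0],
    [5.0, 5.0, 7.0, 7.0],
]
row_labels = [0, 0, 1, 1]  # two household groups
col_labels = [0, 0, 1, 1]  # two time-slot groups

summary = block_means(curves, row_labels, col_labels)
total_readings = readings_per_year() * 27_000_000  # all customers, one year
```

In this toy case, 16 readings are summarized by 4 block means. At EDF's scale, the same arithmetic gives roughly 470 billion readings per year, which is why such summaries matter.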

The spectrum of operational applications of co-clustering methods at EDF is very broad. They span many areas of the firm, for example the design of new marketing services or offers, demand response programs, and new services such as outlier detection (notifying the customer about unusual consumption increases) or comparison to a social norm (comparison with similar households).

A consortium of experts

This project is part of a collaboration with EDF and involves a consortium of three academic teams (Université Lyon 2 and Université Paris 1 in addition to Université Côte d'Azur). Each one brings specific expertise to the project, which will allow it to go beyond the boundaries of conventional academic work. The ultimate goal is not only to produce methodologies that could become operational tools shared with the entire scientific community, but also to bring these methods closer to end users in industry.

The Complex Systems Academy of Excellence is supporting this innovative project, which benefits from a public/private partnership, by funding project-related operating and travel expenses.

Image caption: Functional means of the estimated blocks obtained by co-clustering with FunLBM.
