Using deep machine learning techniques, this project explores whether more accurate predictions of solar electricity generation would allow the amount of "spinning reserve" required to be reduced. This would reduce carbon emissions, reduce costs to end-users, and increase the amount of solar generation the grid can handle.
Benefits
Not required.
Learnings
Outcomes
Fully operational PV Nowcasting service running on two ML models:
PVNet for 0-6 hours
Blend of PVNet and National_xg for 6-8 hours, and National_xg from 8 hours onwards.
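The blending of the two models across horizons can be sketched as a horizon-dependent weighted average. The weighting scheme and function name below are illustrative assumptions, not the production implementation:

```python
import numpy as np

def blend_forecasts(pvnet, national_xg, horizons_h):
    """Blend two forecasts with a weight that ramps from PVNet to National_xg.

    pvnet, national_xg: arrays of forecast values (MW), one per horizon.
    horizons_h: forecast horizons in hours.
    Weights (illustrative): PVNet only up to 6 h, a linear ramp between
    6 and 8 h, and National_xg only beyond 8 h.
    """
    horizons_h = np.asarray(horizons_h, dtype=float)
    # Fraction of National_xg in the blend: 0 below 6 h, 1 above 8 h.
    w_xg = np.clip((horizons_h - 6.0) / 2.0, 0.0, 1.0)
    return (1.0 - w_xg) * np.asarray(pvnet) + w_xg * np.asarray(national_xg)

# At 7 hours the blend is an equal mix of the two models.
print(blend_forecasts([100.0], [200.0], [7.0]))  # [150.]
```

A smooth ramp of this kind avoids a visible step in the forecast at the point where one model hands over to the other.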
Accuracy improved over the previous OCF model by approximately 30% for the GSP and National forecasts (4-8 hours), resulting in forecasts approximately 40% more accurate than the BMRS model and over 40% more accurate than the PEF forecast (for 0-8 hours).
Probabilistic forecasts for all horizons
Backtest runs for DRS project
UI including a new Delta view, dashboard view and probabilistic display
UI speedup, with query times from 20s down to <1 second
The most significant of these is the achievement of the target set by NESO of a 20% reduction in MAE. This is an extremely large improvement in renewable forecasting and is the result of numerous machine learning improvements.
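As a reminder of the metric behind the target, MAE is the mean absolute difference between forecast and outturn. The numbers below are placeholders purely to illustrate what a 20% reduction means:

```python
import numpy as np

def mae(forecast, actual):
    """Mean absolute error between forecast and actual generation."""
    return float(np.mean(np.abs(np.asarray(forecast) - np.asarray(actual))))

# Illustrative numbers only: a baseline MAE of 100 MW reduced to 80 MW
# would meet the 20% reduction target.
baseline_mae, new_mae = 100.0, 80.0
reduction = (baseline_mae - new_mae) / baseline_mae
print(f"MAE reduction: {reduction:.0%}")  # MAE reduction: 20%
```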
NWP ensembles:
Literature review of NWP ensembles to identify common trends and finalise the methodology for creating solar PV forecasts using ensemble weather data.
Visualisations to highlight the general trend of the overall ensemble prediction and a probabilistic representation of the ensemble spread.
The most important outcome of the latest extension, now in development, is to forecast solar PV using ensemble weather data and evaluate its accuracy against existing solar PV forecasts.
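A common way to visualise ensemble spread is a fan chart of quantile bands around the ensemble median. The sketch below uses synthetic data and assumed quantile levels; it is a generic illustration, not the project's actual plotting code:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
hours = np.arange(48)
# Synthetic ensemble: 50 members scattered around a smooth curve.
members = np.sin(hours / 24 * 2 * np.pi) + rng.normal(0, 0.3, (50, 48))

# Quantile bands summarise the spread; the 10/50/90 levels are illustrative.
q10, q50, q90 = np.percentile(members, [10, 50, 90], axis=0)

fig, ax = plt.subplots()
ax.fill_between(hours, q10, q90, alpha=0.3, label="10-90% band")
ax.plot(hours, q50, label="ensemble median")
ax.set_xlabel("forecast horizon (hours)")
ax.set_ylabel("normalised PV generation")
ax.legend()
fig.savefig("ensemble_fan.png")
```

The shaded band makes the regions of high forecast uncertainty visible at a glance, which is the main point of the ensemble visualisations described above.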
Lastly, the forecast from Open Climate Fix is delivered fully open and documented. Resilience was significantly increased over the project duration, resulting in over 99.5% availability. This resilience is implementable by NESO, with all the infrastructure constructed in code to allow replicability.
Lessons Learnt
In the course of WP1 and WP2, the project identified the following lessons:
Never underestimate the importance of cleaning up and checking data in advance:
Several approaches to loading data were tried, from on-the-fly loading to pre-preparation, and automatic and visual tests of the data were instituted to ensure the project was always aligning the various data sources correctly.
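A simple alignment check of this kind compares timestamps across sources before training. The snippet below is an illustrative sketch with synthetic indices and an assumed 10% threshold, not the project's actual test suite:

```python
import pandas as pd

def check_alignment(sources):
    """Return the timestamps common to all sources, and flag big losses.

    sources: dict mapping source name -> pd.DatetimeIndex.
    Raises if any source would lose more than 10% of its timestamps when
    intersected with the others (the threshold is an illustrative choice).
    """
    common = None
    for index in sources.values():
        common = index if common is None else common.intersection(index)
    for name, index in sources.items():
        lost = 1 - len(common) / len(index)
        if lost > 0.10:
            raise ValueError(f"{name} loses {lost:.0%} of timestamps on alignment")
    return common

pv = pd.date_range("2021-01-01", periods=100, freq="30min")
nwp = pd.date_range("2021-01-01", periods=98, freq="30min")
print(len(check_alignment({"pv": pv, "nwp": nwp})))  # 98
```

Running a check like this at data-load time catches misaligned sources before they silently degrade a trained model.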
Having infrastructure as code allows the main production service to run uninterrupted:
Having code to easily instantiate infrastructure is very useful for the efficient management of environments, ensuring the project could bring the algorithm into productive use. The Terraform software tool was used, which makes spinning up (and down) environments very easy and repeatable. Being able to spin up new environments allowed the project to test new features in development environments while the main production service kept running uninterrupted.
Using Microservices to “start simple and iterate” accelerates development:
Using a microservice architecture allowed the project to upgrade individual components as benefit was seen in improving them, independently of other components' behaviour. This has been very useful when building out the prototype service, as it allowed the project team to start with a simple architecture - even a trivial forecast model - and iteratively improve the functionality of the components. For example, starting with a single PV data provider allowed the project to get a prototype working, and in WP3 an additional PV provider will be onboarded.
Data processing may take longer than expected:
While it was initially planned to extend our dataset back to 2016 for all data sources during WP2, it turned out that data processing takes much longer than expected. This does not have a direct impact on project deliverables but is something to consider in further ML research.
Data validation is important:
For both ML training and inference, using clear and simple data validation builds trust in the data. This helps build a reliable production system and keeps software bugs to a minimum.
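Clear and simple validation can be as little as a handful of explicit checks run before training and inference. The rules below are illustrative assumptions about what "valid" PV data means, not the project's actual checks:

```python
import pandas as pd

def validate_pv(df, capacity_mw):
    """Run basic sanity checks on a PV generation time series."""
    assert df.index.is_monotonic_increasing, "timestamps out of order"
    assert not df.index.duplicated().any(), "duplicate timestamps"
    assert not df["generation_mw"].isna().any(), "missing values"
    assert (df["generation_mw"] >= 0).all(), "negative generation"
    assert (df["generation_mw"] <= capacity_mw).all(), "generation above capacity"
    return df

idx = pd.date_range("2021-06-01", periods=4, freq="30min")
df = pd.DataFrame({"generation_mw": [0.0, 1.2, 2.5, 1.8]}, index=idx)
validate_pv(df, capacity_mw=3.0)  # passes silently
```

Failing loudly at the point of ingestion, rather than deep inside a training run, makes problems much cheaper to diagnose.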
Engaging specialist UX/UI skills is important:
By acknowledging that UX and UI design is a specialised area and incorporating those skills, a UI has been developed which will be easier to use and convey information effectively. This will be validated over WP3 through working with the end users.
Building our own hardware demonstrates value for money but may pose other challenges for a small team:
Two computers were built during the project with a total of six GPUs, and it is estimated that using on-premises hardware instead of the cloud for data- and GPU-heavy machine learning R&D can significantly reduce direct costs. However, the time required for a small team to put together all the components is significant (approximately 25 person-days in total). While the total costs would still be lower, appropriate resource planning should be considered when planning hardware upgrades in the future.
In the course of WP3 and WP1 (extension), the project identified the following lessons:
Merging the code right away when performing frontend testing is of utmost importance:
Delaying code merges until after frontend testing proved time-consuming; merging promptly should be factored in when planning tests.
Large Machine Learning models are harder to productionise:
Large machine learning models proved difficult to productionise, as the size of the model makes it difficult to deploy and use. Going forward, further investigation is needed into how to deploy large models.
Machine Learning training always takes longer than expected:
Even with a ready-made model, getting datapipes to work correctly takes time. It is important to allocate enough time when planning ML training activities.
Security and authentication are hard:
Ensuring robust authentication and security measures are in place is harder than envisaged. It may be easier to use existing packages or contract third-party providers to support the process.
Separate National forecast model: PVLive's estimate of National solar generation does not equal the sum of PVLive's GSP generation estimates. This motivates building a separate National forecast rather than summing our GSP forecasts.
Investment is needed to take open-source contributions to the next level:
Time and resources are needed to engage with open-source contributors and develop an active community. We may want to consider hiring an additional resource to support this activity.
Lessons from WPS (extension) were as follows:
Expensive cloud machine storage disks left idle:
We use some GCP/AWS machines for R&D, and we often pause a machine when it is not in immediate use, because some GPU machines can cost significant amounts per hour. It was discovered that costs still accrue for the disk (storage) of paused machines. There is no golden rule for balancing pausing machines (keeping the ability to start them up quickly) against starting a cloud machine from scratch each time, but it is a trade-off to be aware of.
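The effect can be illustrated with rough numbers; the prices below are assumed placeholders, not actual GCP/AWS rates:

```python
# Rough illustration of why paused machines still cost money.
# Prices are assumed placeholders, not real cloud rates.
gpu_hourly_usd = 3.00          # GPU machine while running
disk_gb = 1000                 # attached persistent disk
disk_usd_per_gb_month = 0.10   # disk is billed even while paused

hours_running = 40             # actual R&D use in a month
running_cost = gpu_hourly_usd * hours_running
idle_disk_cost = disk_gb * disk_usd_per_gb_month

print(f"compute: ${running_cost:.2f}, idle disk: ${idle_disk_cost:.2f}")
# With these numbers the idle disk is a large fraction of the monthly bill.
```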
Challenging to maintain active communication with NESO:
A particularly high turnover in the NESO forecasting team affected communication on the project. This evolved over the duration of the project, and more active and easier communication was observed in the latter phase.
Reproducibility on cloud vs local servers:
When results differ between the cloud and local services, it can be tricky to determine the cause. Verbose logging, saving intermediate data, and maintaining consistent package versions and setup on both machines helped. One particular bug involved results differing when multiple CPUs were used locally, but only one CPU was used in the cloud.
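A small amount of discipline around seeding and recording the environment goes a long way here. The sketch below shows generic measures (fixed seeds, logged versions and platform details); the function names are illustrative, not the project's actual tooling:

```python
import platform
import random

import numpy as np

def set_seeds(seed=42):
    """Fix the stdlib and NumPy RNGs so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)

def log_environment():
    """Record details that commonly explain cloud-vs-local differences."""
    return {
        "python": platform.python_version(),
        "numpy": np.__version__,
        # Platform and CPU/worker counts matter: the bug described above
        # only appeared when multiple CPUs were used locally.
        "platform": platform.platform(),
    }

set_seeds(42)
first = np.random.rand(3)
set_seeds(42)
second = np.random.rand(3)
assert (first == second).all()  # same seed, same numbers, on any machine
print(log_environment())
```

Logging this dictionary alongside saved intermediate data makes it much easier to pinpoint where cloud and local runs diverge.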
Protection of production data:
Two environments in the cloud, “development” and “production,” are maintained to protect the “production” data. This setup allows developers to access the “development” environment, where changes do not affect the live service. Although maintaining two environments increases costs, it is considered worthwhile.
Probabilistic forecasts:
Some unreleased open-source packages were used to implement one of the probabilistic forecasts. The advantage of using this code before its release is noted, but it also means more thorough checks for bugs are required, which can take more time.
Further lessons from the WPS (extension) NWP ensemble model development were as follows:
Clear project goals are important from the outset:
Clarifying project goals early, even in research projects with high uncertainty, prevents scope creep and wasted effort. This ensures everyone is aligned with what success looks like from the outset.
Schedule enough time for obtaining data:
Obtaining ECMWF EPS data proved challenging due to its substantial size (over 100GB per day) and the considerable time it takes to download, which can range from over 60 hours for a month's archive to as much as 120 hours depending on the archival system's workload. It is therefore important to schedule enough time to obtain data in the early stages of project planning.
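A back-of-the-envelope check of these figures, treating the 100 GB/day size as given and deriving the implied effective throughput:

```python
# Rough arithmetic for a month of ECMWF EPS data at ~100 GB/day.
gb_per_day = 100
days = 30
total_gb = gb_per_day * days            # ~3 TB per month

# If a month's archive takes 60 hours, the implied effective throughput is:
hours = 60
throughput_mb_s = total_gb * 1000 / (hours * 3600)
print(f"{total_gb} GB at ~{throughput_mb_s:.1f} MB/s")  # 3000 GB at ~13.9 MB/s
```

The bottleneck is the archival system's retrieval rate rather than network bandwidth, which is why the duration can double depending on its workload.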
Optimising data for efficiency:
Streamlining the number of variables used in the model can significantly cut down on data download times and storage requirements. A study on feature importance pinpointed 6 out of 12 variables that collectively hold most of the information PVNet needs for forecasting, enabling a halving of both download time and storage by providing regular deterministic data for the remaining variables during inference.
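The storage saving follows directly from keeping all ensemble members only for the informative variables. A toy calculation, with an assumed per-variable size, shows why the total roughly halves:

```python
# Toy sizing: keep all 50 ensemble members for the 6 informative
# variables, and a single deterministic field for the other 6.
# The per-variable size is an assumed placeholder.
members = 50
gb_per_variable_per_member = 0.25

all_ensemble = 12 * members * gb_per_variable_per_member
reduced = 6 * members * gb_per_variable_per_member + 6 * 1 * gb_per_variable_per_member
print(f"{all_ensemble:.0f} GB -> {reduced:.1f} GB")  # 150 GB -> 76.5 GB
```

The deterministic fields for the dropped variables add almost nothing, so the download and storage cost is dominated by the six variables kept in full ensemble form.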
The importance of visualisation for ensemble data:
The multifaceted nature of ensemble data makes it difficult to quickly grasp. Various visualisation techniques are essential for comprehending general trends, spread, and areas of uncertainty.
National ensemble-based forecasts are achievable within reasonable timeframes:
Running inference for 50 forecast versions for a single Grid Supply Point (GSP) takes approximately 5 seconds, following an initial setup period of under 6 minutes. This suggests that a national ensemble-based forecast could be produced in around 25 minutes.
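The 25-minute figure follows from scaling the per-GSP time to the whole country. The GSP count of 300 below is an assumed round number for Great Britain, used purely to reproduce the estimate:

```python
# Back-of-the-envelope check of the ~25 minute national figure.
seconds_per_gsp = 5   # 50 forecast versions for one GSP (stated above)
n_gsps = 300          # assumed approximate GSP count for Great Britain

total_minutes = n_gsps * seconds_per_gsp / 60  # excludes the <6 min setup
print(f"~{total_minutes:.0f} minutes")  # ~25 minutes
```

Since the per-GSP inferences are independent, this time could also be cut further by running them in parallel.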
Investing in foundational tools streamlines the entire research and development process:
Making sure enough time is allocated to develop the right tools, such as ocf-data-sampler or new training models, accelerates future development and improves overall efficiency.