Air Handling Unit Control

Application of the relearning framework to HVAC control

We demonstrate the relearning approach for continuous adaptation on a data-driven simulation model of a real HVAC system on our university campus. Because the real system is non-stationary, the data-driven simulator must itself be kept up to date, so we also implement a relearning schedule in which the simulator is retrained using either periodic or trigger-based updates. We use deep LSTM networks to model the heating energy, cooling energy, valve behavior, and other quantities of the simulated system; these models are trained on historical data from the real HVAC system.

To develop the controllers, we currently use the Proximal Policy Optimization (PPO) algorithm, which is widely regarded as a state-of-the-art method on benchmark RL tasks. PPO is an actor-critic policy gradient method: a deep policy network acts as the controller, and a deep value network estimates state values that are used to evaluate the controller's actions. Both networks are trained on batches of experience generated by running the policy network on the simulated non-stationary system, and the policy network is updated with the policy gradient objective at every training step. The reward signal determines the quality of the controller's actions in each batch. For the HVAC system, which the RL controller operates through supervisory actions, the reward incentivizes both energy savings and occupant comfort.
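The section does not specify what triggers a relearning update. The sketch below assumes a simple rule: retrain the simulator every `period` control steps (periodic update), or earlier if its root-mean-square error (RMSE) against recent real measurements exceeds a threshold (trigger-based update). The class name `RelearningScheduler`, the RMSE criterion, and the parameter values are hypothetical illustrations, not the exact mechanism used in this work.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RelearningScheduler:
    """Hypothetical scheduler deciding when to retrain the data-driven simulator.

    period: retrain unconditionally every `period` control steps (periodic update).
    rmse_threshold: retrain early if the simulator's RMSE on recent real
        measurements exceeds this value (trigger-based update; assumed criterion).
    """
    period: int
    rmse_threshold: float
    last_update_step: int = 0

    @staticmethod
    def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
        """Root-mean-square error between simulator output and real data."""
        if len(predicted) != len(measured) or len(predicted) == 0:
            raise ValueError("need equal-length, non-empty sequences")
        squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
        return math.sqrt(squared / len(predicted))

    def check(
        self, step: int, predicted: Sequence[float], measured: Sequence[float]
    ) -> Optional[str]:
        """Return 'periodic', 'trigger', or None; records the step of any update."""
        if step - self.last_update_step >= self.period:
            reason = "periodic"
        elif self.rmse(predicted, measured) > self.rmse_threshold:
            reason = "trigger"
        else:
            return None
        self.last_update_step = step
        return reason
```

For example, with five-minute control steps, `RelearningScheduler(period=2016, rmse_threshold=5.0)` would retrain at least once a week (7 × 24 × 12 = 2016 steps) and sooner whenever the simulator's predictions drift more than 5 units (e.g. kWh) from the measurements. The periodic check runs first so that a weekly refresh happens even when the error stays small.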
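For reference, the core of PPO's policy update is its clipped surrogate objective. The pure-Python sketch below computes the negative of that objective (a loss to minimize) over a batch of experiences, together with a mean-squared-error loss for the value network. In practice both would be computed on tensors in a deep learning framework so that gradients flow back into the policy and value networks; the function names and the default clip range of 0.2 are assumptions for illustration.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is computed from
    log-probabilities. Each sample contributes min(r * A, clip(r, 1-eps, 1+eps) * A),
    which removes the incentive to move the policy far from the one that
    collected the batch.
    """
    n = len(advantages)
    if n == 0 or not (len(new_log_probs) == len(old_log_probs) == n):
        raise ValueError("inputs must be equal-length and non-empty")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / n  # negate: maximizing the objective == minimizing the loss


def value_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between value-network estimates and observed returns."""
    if len(values) != len(returns) or len(values) == 0:
        raise ValueError("inputs must be equal-length and non-empty")
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

With a positive advantage, a ratio of 2 contributes only 1.2 (the clipped value), so a single update cannot push the controller's action probabilities arbitrarily far from those that generated the batch.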

Our approach can therefore be described as a "Data-Driven Deep Reinforcement Learning Approach": we use deep neural networks both to learn features and patterns of the HVAC system's behavior and to build controller networks that optimize energy use and comfort.
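The exact form of the reward is not given here. A minimal sketch, assuming the reward penalizes the total heating and cooling energy of a control step plus any zone-temperature excursion outside a comfort band, might look like the following. The comfort band, the weights, and the name `hvac_reward` are hypothetical choices for illustration.

```python
from typing import Tuple


def hvac_reward(
    heating_energy_kwh: float,
    cooling_energy_kwh: float,
    zone_temp_c: float,
    comfort_band_c: Tuple[float, float] = (21.0, 24.0),  # hypothetical comfort band
    energy_weight: float = 1.0,    # hypothetical weight on energy use
    comfort_weight: float = 10.0,  # hypothetical weight on discomfort
) -> float:
    """Hypothetical per-step reward: higher when energy use and discomfort are lower."""
    low, high = comfort_band_c
    if low > high:
        raise ValueError("comfort band lower bound exceeds upper bound")
    energy = heating_energy_kwh + cooling_energy_kwh
    # Degrees Celsius outside the band; zero when the zone is comfortable.
    violation = max(0.0, low - zone_temp_c) + max(0.0, zone_temp_c - high)
    return -(energy_weight * energy + comfort_weight * violation)
```

Raising `comfort_weight` relative to `energy_weight` shifts the learned policy toward comfort at the cost of energy savings, so these weights encode the trade-off the controller is asked to make.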