Multi-Objective Constrained Reinforcement Learning for Joint Routing–MAC–Duty Cycling in Low-Power Wireless Sensor Networks

Authors

  • Ghaida Muttashar Abdulsahib, College of Computer Engineering, University of Technology, Iraq (Author)
  • Mohammed Awad Mohammed Ataelfadiel, Applied College, King Faisal University, Saudi Arabia (Corresponding Author)

DOI:

https://doi.org/10.47654/v30y2026i2p197-229

Keywords:

WSNs, reinforcement learning, multi-objective optimization, constrained Markov decision processes, cross-layer optimization, energy efficiency

Abstract

Introduction: Wireless Sensor Networks (WSNs) face significant challenges in balancing energy efficiency, latency, and reliability while operating under severe resource constraints. Existing methods either optimize network layers in isolation or rely on static cross-layer coordination, and both adapt poorly when network conditions change.

Purpose: This study introduces a Constrained Multi-Objective Reinforcement Learning Model (CMORLM) for joint routing, Medium Access Control (MAC), and Duty Cycling Optimization (DCO) in low-power WSNs.

Methods: We formulate CMORLM as a constrained Markov Decision Process (MDP) with three competing objectives: minimizing Energy Consumption (EC), minimizing End-to-End Latency (EEL), and maximizing Packet Delivery Ratio (PDR), subject to hard constraints on residual energy, buffer occupancy, and Quality of Service (QoS) requirements. A primal-dual optimization method combines Lagrangian Constraint Handling (LCH) with multi-objective policy gradients. The policy network uses a shared encoder with factorized heads for routing, MAC, and DCO decisions, and Federated Gradient Aggregation (FGA) enables distributed learning across Sensor Nodes (SNs).
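For concreteness, the constrained MDP described above admits a standard weighted-scalarization Lagrangian; the weights w_i and thresholds d_k below are generic symbols, not values from the paper, and objectives that are minimized (EC, EEL) enter as negated returns:

    \max_{\pi}\ \sum_{i \in \{\mathrm{EC},\,\mathrm{EEL},\,\mathrm{PDR}\}} w_i\, J_i(\pi)
    \quad \text{s.t.} \quad C_k(\pi) \le d_k,\ \ k \in \{\text{energy},\,\text{buffer},\,\text{QoS}\},

    \mathcal{L}(\pi,\lambda) \;=\; \sum_i w_i\, J_i(\pi) \;-\; \sum_k \lambda_k \bigl(C_k(\pi) - d_k\bigr),
    \qquad \lambda_k \ge 0,

with a primal ascent step on \pi and a dual update \lambda_k \leftarrow \max\{0,\ \lambda_k + \eta\,(C_k(\pi) - d_k)\}.

A minimal PyTorch sketch of such a shared-encoder, factorized-head policy with a primal-dual update follows. All dimensions, learning rates, and the random stand-in rollout tensors are illustrative assumptions; the paper's actual architecture, NS-3 interface, and FGA step are not reproduced here.

    # Hedged sketch: shared encoder with factorized heads plus a Lagrangian
    # primal-dual policy-gradient update. Dimensions, rates, and the random
    # stand-in rollout data are illustrative assumptions, not the paper's.
    import torch
    import torch.nn as nn

    class FactorizedPolicy(nn.Module):
        """One encoder, three heads: next-hop routing, MAC mode, duty-cycle level."""
        def __init__(self, obs_dim=16, n_hops=8, n_mac=4, n_duty=5):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
            self.heads = nn.ModuleDict({"route": nn.Linear(64, n_hops),
                                        "mac": nn.Linear(64, n_mac),
                                        "duty": nn.Linear(64, n_duty)})

        def forward(self, obs):
            z = self.encoder(obs)
            return {name: torch.distributions.Categorical(logits=head(z))
                    for name, head in self.heads.items()}

    policy = FactorizedPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    w = torch.tensor([0.4, 0.3, 0.3])   # operator weights over EC, EEL, PDR
    lam = torch.zeros(3)                # dual variables: energy, buffer, QoS
    d = torch.tensor([1.0, 1.0, 1.0])   # constraint thresholds (placeholders)
    eta = 0.01                          # dual-ascent step size

    for step in range(200):
        obs = torch.randn(32, 16)       # stand-in per-node observations
        dists = policy(obs)
        acts = {k: dist.sample() for k, dist in dists.items()}
        logp = sum(dists[k].log_prob(acts[k]) for k in dists)
        obj = torch.randn(32, 3)        # stand-in objective returns (EC and
                                        # EEL would enter as negated costs)
        cost = torch.rand(32, 3)        # stand-in constraint costs
        # primal step: ascend the scalarized Lagrangian via REINFORCE
        scalar = (w * obj).sum(-1) - (lam * cost).sum(-1)
        loss = -(logp * scalar).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        # dual step: raise lambda_k where average cost exceeds its threshold
        lam = torch.clamp(lam + eta * (cost.mean(0) - d), min=0.0)

In a federated variant, each sensor node would compute such gradients locally and a sink or cluster head would aggregate them before the shared parameters are updated; that aggregation step is omitted here.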

Results: NS-3 evaluations show that, relative to Traditional Layered Protocols (TLP), CMORLM lowers EC by 34.2%, lowers EEL by 41.3%, and raises PDR by 16.5%, while Network Lifetime (NL) increases by 38.4%. The Constraint Violation Rate (CVR) remains below 1%, 23 times lower than the compared baselines. Ablation studies show that joint optimization improves energy efficiency by 44.7% over single-layer control.

Conclusion: The proposed CMORLM scales to networks of 50 to 200 nodes and remains robust to traffic variation, node failures, and mobile sinks. Pareto frontier analysis gives operators control over performance trade-offs through weight configuration.
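To illustrate the weight-sweep idea behind such a Pareto analysis, the sketch below samples objective-weight vectors, evaluates each resulting policy, and keeps only the non-dominated points. Here evaluate_policy() is a hypothetical stand-in for retraining and measuring a policy in NS-3; it is not part of the paper's code.

    # Hedged sketch of a Pareto frontier sweep over objective weights.
    # evaluate_policy() is a hypothetical placeholder for an NS-3 run that
    # returns (EC, EEL, PDR) for a policy trained with the given weights.
    import numpy as np

    rng = np.random.default_rng(0)

    def evaluate_policy(weights):
        return rng.random(3)  # placeholder metrics: (EC, EEL, PDR)

    def dominates(a, b):
        # a dominates b if it is no worse on every objective and strictly
        # better on at least one (EC and EEL minimized, PDR maximized).
        no_worse = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]
        strictly = a[0] < b[0] or a[1] < b[1] or a[2] > b[2]
        return no_worse and strictly

    candidates = [rng.dirichlet([1.0, 1.0, 1.0]) for _ in range(50)]
    results = [(w, evaluate_policy(w)) for w in candidates]
    front = [(w, p) for w, p in results
             if not any(dominates(q, p) for _, q in results)]
    print(f"{len(front)} non-dominated weight settings out of {len(results)}")

An operator would then pick a weight vector from the resulting frontier according to the deployment's priorities, e.g. favoring network lifetime over latency.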


Published

2026-04-05

How to Cite

Abdulsahib, G. M., & Ataelfadiel, M. A. M. (2026). Multi-Objective Constrained Reinforcement Learning for Joint Routing–MAC–Duty Cycling in Low-Power Wireless Sensor Networks. Advances in Decision Sciences, 30(2), 197–229. https://doi.org/10.47654/v30y2026i2p197-229