
Deep Q-network to produce polarization-independent perfect solar absorbers: a statistical report


A deep Q-network, a reinforcement learning algorithm, was used to design polarization-independent, perfect solar absorbers. The deep Q-network selected the geometrical properties and materials of a symmetric three-layer metamaterial made up of circular rods on top of two films. The combination of all the possible permutations gives around 500 billion possible designs. In around 30,000 steps, the deep Q-network produced 1250 structures with an integrated absorption higher than 90% in the visible region, with a maximum of 97.6%, and an integrated absorption lower than 10% in the 8–13 µm wavelength region, with a minimum of 1.37%. A statistical analysis of the distribution of materials and geometrical parameters that make up the solar absorbers is presented.

1 Introduction

In the pursuit of renewable and green energy sources, the sun provides an enormous amount of energy waiting to be harvested in a meaningful way. Perfect solar absorbers play an important role in solar energy harvesting by converting photons into thermal energy [1, 2]. Using perfect solar absorbers allows all of the absorbed energy to be used in the conversion process. An ideal solar energy absorber should have two main properties. First, it should absorb all wavelengths of electromagnetic radiation that reach the Earth, and second it should not radiate that absorbed energy away as heat. This process allows the absorbed solar energy to be completely converted to other forms of energy for practical everyday use. The proposed materials in this work can also be used as perfect absorbers in the visible region.

One way to produce such solar absorbers is through metamaterials. Since their introduction by Pendry et al. [3], metamaterials have been used for numerous applications, such as light absorbers [4,5,6,7,8,9,10,11,12,13,14,15], cloaking devices [16, 17], and nonlinear optics [18, 19]. By carefully designing the subwavelength geometrical properties of metamaterials, their optical properties can be manipulated for specific purposes. The design process usually relies on knowledge from previous research and the intuition of the researcher, which can be arduous for more complex designs. Recently, artificial intelligence (AI) has been used in the field of nanophotonics to help find solutions to complex problems and uncover underlying relationships between the design parameter space and the optical properties [20]. In optics research, neural networks have recently been used to design nanophotonic structures [21,22,23,24,25,26,27] and chiral metamaterials [28], to predict the optical properties of structures [29], and for signal processing [30].

The deep Q-network (DQN), a reinforcement learning algorithm, is a powerful tool that can be used to optimize solutions to a problem [29, 31,32,33,34,35] by acting as an intelligent search. Through exploration, the network takes actions and receives feedback, allowing it to learn about the parameter space and make intelligent choices. This method and its benchmarks have been explained in more detail in a number of articles [32, 33, 36, 37]. In contrast to other deep learning methods, reinforcement learning does not learn the hidden nonlinear relationships in a predetermined dataset; instead, it uses rewards and punishments to maximize a given reward. At the start there is no dataset, but through exploration and exploitation of the data space, an agent learns how to traverse the space and make good decisions to maximize its long-term reward.

2 Methods

2.1 Structure

A schematic of the initial structure is shown in Fig. 1a. It is composed of an array of nanocylinders on top of two film layers and a silver back reflector, all on a glass substrate. This structure was inspired by our previous experimental experience. The starting point can be chosen arbitrarily, but a well-educated guess can help to reach final results faster. The cylinders ensure that the final structure will be polarization independent, while the bottom layer is a 200 nm silver back reflector, as is common in the design of many perfect absorbers. Many studies have been published on this type of structure with various numbers of layers and shapes [6, 38, 39]. Here, the geometrical parameters and materials of the nanocylinders and layers are chosen by the DQN.

Fig. 1

a A schematic of the structure for the DDQN model to optimize and b the algorithm flowchart of the DDQN model

2.2 Deep Q-network (DQN)

The DQN was originally introduced as an AI agent that can play videogames at a level that can rival human players [33, 40]. The DQN has been able to complete different games with the same algorithm. In videogames, each new screen is a new state where the agent can take an action. Since there are so many possible states and actions, it is impossible to explore them all or to use conventional algorithms to solve the game. A DQN starts by exploring a game and gradually learning its mechanics; the more the agent plays, the more it learns and the higher the scores it is able to achieve. In this work, the DQN learns the connection between changes in the geometrical properties and their effect on the final results through full-wave FDTD simulations, and then uses that knowledge to design structures that produce the desired optical responses. First, the environment is set up, which includes the initial structure design and the simulation environment. Second, the actions that the agent can take to change the structure are decided. Finally, the reward system is defined. The DQN algorithm that connects all these parts together is shown in Fig. 1b.
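The balance between exploring new states and exploiting what has already been learned is often handled in DQN agents with an epsilon-greedy rule. The source does not state which rule was used in this work, so the following is only a minimal sketch under that assumption; the function name `epsilon_greedy` and the example Q-values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one.

    q_values: predicted Q-value for each available action (hypothetical input).
    epsilon:  exploration probability in [0, 1]; an assumed hyperparameter.
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the largest predicted Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])

# With epsilon = 0 the rule is purely greedy.
greedy_choice = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)
```

In practice epsilon is usually started high and reduced as training proceeds, so that the agent explores widely at first and relies more on its learned values later.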

The action to take in a given state is chosen by a neural network that is updated based on what it has learned. To improve the performance of the DQN, an auxiliary model is used alongside it. This auxiliary network is used to select the action for the agent to take, while the main DQN network is used to predict the Q-value of the state-action pair. This prevents the overestimation of Q-values that is a problem in a standard DQN. At each iteration, two models are trained, and the weights of the target model are obtained from a combination of the main model weights and the target model weights. This method reduced the overestimation caused by using just one model. The auxiliary network is updated periodically with the parameters of the DQN. Since there are two networks working together, this is known as a double deep Q-network (DDQN) [41]. The rule for how an action is chosen is called the policy, and the set of actions, states, and policy forms a Markov decision process (MDP). An MDP means that in a given state, the policy that is used to decide which action to take is based on the rewards gained from previous states and actions. The full details of this model and a pictorial comparison of the two Q-network models are given in our previous work [24, 42]. Each neural network has 3 hidden layers with 12 neurons.
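The separation of action selection from value prediction, and the blending of weights between the two models, can be sketched in a few lines. This is a minimal illustration of the general double Q-learning idea [41], not the authors' code: the function names, the discount factor `gamma`, and the mixing factor `tau` are assumptions, and their values are not given in the text.

```python
def argmax(values):
    """Index of the largest entry in a sequence of numbers."""
    return max(range(len(values)), key=lambda i: values[i])

def double_q_target(reward, q_select_next, q_eval_next, gamma=0.99, done=False):
    """Training target for Q(s, a) using two networks.

    q_select_next: next-state Q-values from the network that selects the action.
    q_eval_next:   next-state Q-values from the network that scores that action.
    gamma:         discount factor (assumed value).
    """
    if done:
        return reward
    best_action = argmax(q_select_next)          # one network chooses the action
    return reward + gamma * q_eval_next[best_action]  # the other evaluates it

def blend_weights(main_weights, target_weights, tau=0.1):
    """New target weights as a mix of main and target weights (tau is assumed)."""
    return [tau * m + (1.0 - tau) * t for m, t in zip(main_weights, target_weights)]

# The selecting network prefers action 1, so its score (2.0) is used,
# not the largest score (7.0) that a single network's max would pick.
target = double_q_target(1.0, [0.2, 0.9, 0.1], [5.0, 2.0, 7.0], gamma=0.5)
```

Using a single network for both steps would take the maximum of its own estimates, which tends to favor values that are too high; splitting the two roles is what limits that overestimation.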

2.3 State

The state is an array of the materials and geometrical properties of the structure. Its variables and their limits are defined as follows:

  • Cylinder material: 1 of 13 materials (Table 1).

  • Film #1 material: 1 of 13 materials (Table 1).

  • Film #2 material: 1 of 13 materials (Table 1).

  • Cylinder diameter: 0–200 nm, step size: 10 nm.

  • Cylinder thickness: 0–200 nm, step size: 10 nm.

  • Film #1 thickness: 0–2000 nm, step size: 10 nm.

  • Film #2 thickness: 0–2000 nm, step size: 10 nm.

  • Gap between cylinders: 50–200 nm, step size: 10 nm.

Table 1 The materials available to be used for the films and nanocylinders

The total number of possible states is therefore 13 × 13 × 13 × 20 × 20 × 200 × 200 × 15 = 527,280,000,000. Manually searching all of these states is impossible, but the DDQN can produce desirable results in a reasonable time. This will be discussed in more detail in the results section. It should be noted that the number of materials and geometrical properties can be chosen arbitrarily. Choosing a larger range of values could lead to better results but would take longer to train and converge. This is limited only by the available resources.
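The size of the design space can be checked directly. The short sketch below multiplies the number of options per parameter exactly as they appear in the product above; the dictionary keys are illustrative names, not identifiers from the original code. Each geometric count equals the range divided by the 10 nm step, so counting both endpoints of a range would give one more value per parameter.

```python
from math import prod

# Number of options per state variable, as used in the product in the text.
options_per_parameter = {
    "cylinder_material": 13,
    "film1_material": 13,
    "film2_material": 13,
    "cylinder_diameter": 20,    # 0-200 nm in 10 nm steps
    "cylinder_thickness": 20,   # 0-200 nm in 10 nm steps
    "film1_thickness": 200,     # 0-2000 nm in 10 nm steps
    "film2_thickness": 200,     # 0-2000 nm in 10 nm steps
    "gap": 15,                  # 50-200 nm in 10 nm steps
}

total_states = prod(options_per_parameter.values())
print(f"{total_states:,}")  # 527,280,000,000
```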

The initial state is defined as the central value of each parameter, i.e., cylinder, film #1, and film #2 materials: material 7 (GaAs); cylinder diameter: 100 nm; cylinder thickness: 100 nm; film #1 and film #2 thicknesses: 1000 nm; and spacing between cylinders: 100 nm.

2.4 Actions

The actions available to the agent to change the geometrical properties of the design at each step are shown in Table 2. The parameter space is continuous, so, as in all numerical methods, it is discretized into smaller steps, as defined in the “State” section. A step size of 10 nm for the cylinder diameter was chosen because testing showed it to give appropriate accuracy. At each update, the model learns from its previous states, actions, and rewards and decides the best action to take next.
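The exact action set is the one defined in Table 2. As a minimal sketch, assuming that a geometric action moves one parameter by a single 10 nm step and that the result is clipped to the limits listed in the “State” section, one update could look like the following. The function name `apply_step`, the dictionary keys, and the clipping behavior are hypothetical.

```python
STEP_NM = 10

# Limits from the "State" section, in nanometres.
LIMITS_NM = {
    "cylinder_diameter": (0, 200),
    "cylinder_thickness": (0, 200),
    "film1_thickness": (0, 2000),
    "film2_thickness": (0, 2000),
    "gap": (50, 200),
}

# Geometric part of the initial state given in the text.
INITIAL_STATE = {
    "cylinder_diameter": 100,
    "cylinder_thickness": 100,
    "film1_thickness": 1000,
    "film2_thickness": 1000,
    "gap": 100,
}

def apply_step(state, parameter, direction):
    """Return a new state with `parameter` moved one step up (+1) or down (-1).

    Clipping at the limits is an assumption; the source does not say how
    out-of-range moves are handled.
    """
    low, high = LIMITS_NM[parameter]
    new_state = dict(state)  # leave the input state unchanged
    new_state[parameter] = min(high, max(low, state[parameter] + direction * STEP_NM))
    return new_state

narrower_gap = apply_step(INITIAL_STATE, "gap", -1)
```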

Table 2 Definitions of the actions available to the agent

2.5 Reward system

The reward system gives feedback to the agent by giving information about how well it is learning. This is where the problem is set up to find a perfect solar absorber. A perfect solar absorber should have perfect absorption in the visible regime (350 to 800 nm) to absorb all the solar energy, while having minimum absorption in the mid-IR range of 8 µm to 13 µm to not radiate it back out as heat. An area under the curve (AUC) value for each absorption spectrum was calculated in each region and the reward function was designed as follows:

$$\mathrm{reward} = 200 + \mathrm{absorption\ AUC}\left( 350\text{--}800\ \mathrm{nm} \right)\% - \mathrm{absorption\ AUC}\left( 8\text{--}13\ \mathrm{\mu m} \right)\%$$

The constant 200 is added to the reward to make sure that it remains positive. An ideal structure will gain a reward of 300, while the worst structure gets a reward of 100 (since each AUC ranges from 0 to 100%). The absorption over each range of wavelengths was calculated with power monitors for reflection (R) and transmission (T), taking the absorption (A) to be A = 1 − R − T.
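The reward calculation can be sketched as follows. The reward formula and A = 1 − R − T come from the text; how the AUC is normalized to a percentage is not stated, so the trapezoidal average over the band used here is an assumption, and all function names are illustrative.

```python
def absorption(reflection, transmission):
    """Absorption from reflected and transmitted power fractions: A = 1 - R - T."""
    return 1.0 - reflection - transmission

def auc_percent(wavelengths, absorptions):
    """Band-averaged absorption in percent, via the trapezoidal rule.

    Dividing the area by the band width is an assumed normalization,
    chosen so that the result runs from 0 to 100%.
    """
    area = 0.0
    for i in range(1, len(wavelengths)):
        width = wavelengths[i] - wavelengths[i - 1]
        area += 0.5 * (absorptions[i] + absorptions[i - 1]) * width
    return 100.0 * area / (wavelengths[-1] - wavelengths[0])

def reward(visible_auc_percent, mid_ir_auc_percent):
    """Reward as defined in the text: 200 + visible AUC (%) - mid-IR AUC (%)."""
    return 200.0 + visible_auc_percent - mid_ir_auc_percent

best_possible = reward(100.0, 0.0)    # 300.0
worst_possible = reward(0.0, 100.0)   # 100.0
```

Because the two AUC terms enter with opposite signs, a structure is rewarded equally for gaining one percentage point of visible absorption or losing one percentage point of mid-IR absorption.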

3 Results and discussion

The simulations were performed using a computer with a 3.40 GHz 16-core processor, 64 GB of RAM, and an NVIDIA RTX 2070 GPU with 8 GB of GDDR6 memory. At each step, the DDQN code was run in Python for the AI calculations and connected to Lumerical, a commercial FDTD solver, to evaluate its predictions. All material data were taken from the built-in database. This method requires a PC with both a strong CPU for the FDTD simulations and a GPU for the neural network calculations. With this setup, the run took around 1 month, during which around 35,000 steps were taken. In the process of uncovering the best structures, the DDQN finds many similar solutions with small differences in their rewards. This means that many structures with acceptable results and different geometrical properties are discovered, allowing for a statistical analysis. A situation called backflipping occurred, in which the model becomes stuck in a single configuration. This issue was fixed by tuning the hyperparameters.

Figure 2 shows histograms of the distributions of different geometrical properties for film #1, film #2, the cylinder, and the lattice constant. These graphs are prepared for two different categories. The first shows the top 10% of structures, which have an AUC higher than 90% in the 350 to 800 nm wavelength region and lower than 10% in the 8 µm to 13 µm region. The DDQN uncovered 1250 structures with these properties. The second displays the top 5% of structures, which have an AUC higher than 95% in the 350 to 800 nm wavelength region and lower than 5% in the 8 µm to 13 µm region. The DDQN produced 119 structures with these properties.
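The two selection criteria amount to a simple filter on the AUC values, followed by a count of how often each parameter value occurs. The sketch below is illustrative only: the record layout, the function names, and the three example records are hypothetical placeholders, not data from this study.

```python
from collections import Counter

def meets_criterion(visible_auc, mid_ir_auc, visible_min, mid_ir_max):
    """True if the visible AUC exceeds visible_min and the mid-IR AUC is below mid_ir_max (in %)."""
    return visible_auc > visible_min and mid_ir_auc < mid_ir_max

def parameter_histogram(records, visible_min, mid_ir_max, key="film1_thickness_nm"):
    """Count how often each value of `key` appears among the qualifying records."""
    return Counter(
        r[key]
        for r in records
        if meets_criterion(r["visible_auc"], r["mid_ir_auc"], visible_min, mid_ir_max)
    )

# Placeholder records (made-up values for illustration, not results from the paper).
records = [
    {"visible_auc": 96.0, "mid_ir_auc": 4.0, "film1_thickness_nm": 300},
    {"visible_auc": 92.0, "mid_ir_auc": 8.0, "film1_thickness_nm": 300},
    {"visible_auc": 85.0, "mid_ir_auc": 3.0, "film1_thickness_nm": 500},
]

hist_90_10 = parameter_histogram(records, 90.0, 10.0)  # first category
hist_95_5 = parameter_histogram(records, 95.0, 5.0)    # second, stricter category
```

Any structure that passes the stricter 95%/5% filter also passes the 90%/10% filter, so the second category is a subset of the first.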

Fig. 2

a, c Histograms of film #1 and film #2 thickness distributions, and b, d cylinder height, diameter, and lattice constant distributions for two criteria. a, c show the distributions for the structures with an AUC higher than 90% from 350–800 nm and an AUC lower than 10% from 8–13 μm (1250 structures). b, d show the distributions for the structures with an AUC higher than 95% from 350–800 nm and an AUC lower than 5% from 8–13 μm (119 structures)

The distributions of the material choices for film #1, film #2, and the cylinder are displayed in the pie charts in Fig. 3 for the same categories as previously described. These plots indicate which materials should be chosen to obtain perfect solar absorbers. Table 3 shows the materials and geometric parameters of the top 8 performing structures discovered by the DDQN, with the absorption curves of the top 2 shown in Fig. 4. For comparison with human-designed structures, A. Al-Rjoub et al. [43] reported nearly identical theoretical and experimental absorptions of 95.2% and 9.8% in the first and second wavelength regions, respectively.

Fig. 3

Pie charts of the material distributions of film #1, film #2, and the cylinder (a) for the structures with an AUC higher than 90% from 350 to 800 nm and an AUC lower than 10% from 8 to 13 μm (1250 structures) and (b) for the structures with an AUC higher than 95% from 350 to 800 nm and an AUC lower than 5% from 8 to 13 μm (119 structures)

Table 3 Some of the highest-efficiency structures discovered by the DDQN

Fig. 4

Absorption curves of the top two performing structures from Table 3 for (a) 350–800 nm and (b) 8–13 µm

4 Conclusions

A DDQN was used to design perfect solar absorbers in a parameter space that allows for 527 billion possible designs. Using a variety of materials, it was able to produce around 1250 perfect solar absorbers in around 35,000 steps. Each structure has an AUC higher than 90% in the visible region and an AUC lower than 10% in the 8 µm to 13 µm region. A statistical analysis was produced to help readers choose suitable geometrical properties and materials, based on their fabrication limitations, to design a perfect solar absorber.

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available due to the funding agency’s regulations but are available from the corresponding author on reasonable request.


References

  1. Y. Li, D. Li, D. Zhou, C. Chi, S. Yang, B. Huang, Sol. RRL 2, 1800057 (2018)

  2. Z. Zhou, E. Sakr, Y. Sun, P. Bermel, Nanophotonics 5, 1 (2016)

  3. D.R. Smith, Science 305, 788 (2004)

  4. J. Cong, Z. Zhou, B. Yun, L. Lv, H. Yao, Y. Fu, N. Ren, Opt. Lett. 41, 1965 (2016)

  5. X. Tian, Z.-Y. Li, Photonics Res. 4, 146 (2016)

  6. P. Yu, L.V. Besteiro, Y. Huang, J. Wu, L. Fu, H.H. Tan, C. Jagadish, G.P. Wiederrecht, A.O. Govorov, Z. Wang, Adv. Opt. Mater. 7, 1800995 (2019)

  7. T. Badloe, I. Kim, J. Rho, Sci. Rep. 10, 4522 (2020)

  8. T. Badloe, J. Mun, J. Rho, J. Nanomater. 2017, 1 (2017)

  9. I. Kim, S. So, A.S. Rana, M.Q. Mehmood, J. Rho, Nanophotonics 7, 1827 (2018)

  10. D. Lee, S.Y. Han, Y. Jeong, D.M. Nguyen, G. Yoon, J. Mun, J. Chae, J.H. Lee, J.G. Ok, G.Y. Jung, H.J. Park, K. Kim, J. Rho, Sci. Rep. 8, 12393 (2018)

  11. D.M. Nguyen, D. Lee, J. Rho, Sci. Rep. 7, 2611 (2017)

  12. N. Raeis-Hosseini, J. Rho, Appl. Sci. 9, 564 (2019)

  13. A.S. Rana, M.Q. Mehmood, H. Jeong, I. Kim, J. Rho, Sci. Rep. 8, 2443 (2018)

  14. G. Yoon, S. So, M. Kim, J. Mun, R. Ma, J. Rho, Nano Converg. 4, 36 (2017)

  15. N. Mahmood, I. Kim, M.Q. Mehmood, H. Jeong, A. Akbar, D. Lee, M. Saleem, M. Zubair, M.S. Anwar, F.A. Tahir, J. Rho, Nanoscale 10, 18323 (2018)

  16. W. Cai, U.K. Chettiar, A.V. Kildishev, V.M. Shalaev, Nat. Photonics 1, 224 (2007)

  17. M. Manjappa, P. Pitchappa, N. Wang, C. Lee, R. Singh, Adv. Opt. Mater. 6, 1800141 (2018)

  18. I. Sajedian, A. Zakery, J. Rho, Opt. Commun. 397, 17 (2017)

  19. I. Sajedian, I. Kim, A. Zakery, J. Rho, Opt. Commun. 401, 66 (2017)

  20. S. So, T. Badloe, J. Noh, J. Bravo-Abad, J. Rho, Nanophotonics 9, 1041 (2020)

  21. D. Liu, Y. Tan, E. Khoram, Z. Yu, ACS Photonics 5, 1365 (2018)

  22. J. Peurifoy, Y. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B.G. DeLacy, J.D. Joannopoulos, M. Tegmark, M. Soljačić, Sci. Adv. 4, eaar4206 (2018)

  23. J. Peurifoy, Y. Shen, Y. Yang, L. Jing, F. Cano-Renteria, J. Joannopoulos, M. Tegmark, M. Soljačić, in Frontiers in Optics 2017 (OSA, Washington, D.C., 2017), p. FTh4A.4

  24. I. Sajedian, T. Badloe, J. Rho, Opt. Express 27, 5874 (2019)

  25. I. Malkiel, M. Mrejen, A. Nagler, U. Arieli, L. Wolf, H. Suchowski, Light Sci. Appl. 7, 60 (2018)

  26. S. So, J. Mun, J. Rho, ACS Appl. Mater. Interfaces 11, 24264 (2019)

  27. S. So, J. Rho, Nanophotonics 8, 1255 (2019)

  28. W. Ma, F. Cheng, Y. Liu, ACS Nano 12, 6326 (2018)

  29. I. Sajedian, J. Kim, J. Rho, Microsyst. Nanoeng. 5, 27 (2019)

  30. I. Sajedian, J. Rho, Nano Converg. 6, 27 (2019)

  31. T. Badloe, I. Kim, J. Rho, Phys. Chem. Chem. Phys. 22, 2337 (2020)

  32. M. Bukov, A.G.R. Day, D. Sels, P. Weinberg, A. Polkovnikov, P. Mehta, Phys. Rev. X 8, 031086 (2018)

  33. V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Nature 518, 529 (2015)

  34. I. Sajedian, H. Lee, J. Rho, Sci. Rep. 9, 10899 (2019)

  35. I. Sajedian, H. Lee, J. Rho, Sol. Energy 195, 670 (2020)

  36. T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, J.Z. Leibo, A. Gruslys, arXiv:1704.03732 (2017)

  37. Q. Zhang, M. Lin, L.T. Yang, Z. Chen, S.U. Khan, P. Li, IEEE Trans. Serv. Comput. 12, 739 (2019)

  38. A.K. Azad, W.J.M. Kort-Kamp, M. Sykora, N.R. Weisse-Bernstein, T.S. Luk, A.J. Taylor, D.A.R. Dalvit, H.-T. Chen, Sci. Rep. 6, 20347 (2016)

  39. H. Wang, L. Wang, Opt. Express 21, A1078 (2013)

  40. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, arXiv:1312.5602 (2013)

  41. H. van Hasselt, A. Guez, D. Silver, arXiv:1509.06461 (2015)

  42. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (The MIT Press, 1992)

  43. A. AL-Rjoub, L. Rebouta, P. Costa, N.P. Barradas, E. Alves, P.J. Ferreira, K. Abderrafi, A. Matilainen, K. Pischow, Sol. Energy 172, 177 (2018)




This work was financially supported by the National Research Foundation (NRF) grants (NRF-2019R1A2C3003129, CAMM-2019M3A6B3030637, NRF-2019R1A5A8080290, NRF-2018M3D1A1058997) funded by the Ministry of Science and ICT (MSIT), Republic of Korea.

Author information




JR and IS conceived the idea and initiated the project. IS performed the simulations and analyzed the data. IS and TB wrote the manuscript and created the figures. All authors participated in the discussion and approved the final manuscript. JR and HL guided the entire work. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Heon Lee or Junsuk Rho.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


Cite this article

Sajedian, I., Badloe, T., Lee, H. et al. Deep Q-network to produce polarization-independent perfect solar absorbers: a statistical report. Nano Convergence 7, 26 (2020).
