We start by defining the problem. We want to find the frequency *f* of a noisy sinusoidal wave

$$S(t) = A\sin (2\pi ft + \varphi ) + \varOmega (t),$$

(1)

where *A* is the amplitude, *t* is time, \(\varphi\) is the phase, and *Ω* is zero-mean Gaussian noise with variance \(\sigma^{2}\). The noisy wave *S(t)* is the input and the frequency *f* is the output. The signal-to-noise ratio (SNR), which measures the quality of the signal, is the ratio of the signal power \(P_{S}\) to the noise power \(P_{N}\) [9]:

$$SNR = \frac{{P_{S} }}{{P_{N} }} = \left( {\frac{A}{\sigma }} \right)^{2} ,$$

(2)

and can be expressed in decibels as

$$SNR_{dB} = 10\log_{10} \left( {SNR} \right) = 10\log_{10} \left[ {\left( {\frac{A}{\sigma }} \right)^{2} } \right],$$

(3)

which gives us the variance as

$$\sigma^{2} = \frac{{A^{2} }}{{10^{{\frac{{SNR_{dB} }}{10}}} }}.$$

(4)

Thus, given \(SNR_{dB}\) and *A*, we obtain the variance needed to generate the noise term *Ω*(*t*).
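As an illustration of Eqs. (1) and (4), the following minimal sketch computes the noise variance from a given SNR in decibels and generates one noisy wave. The function names, the sampling grid, and the example values (5 kHz, unit amplitude, 10 dB) are our own assumptions for illustration; they are not taken from the original code.

```python
import numpy as np


def noise_variance(amplitude, snr_db):
    """Eq. (4): sigma^2 = A^2 / 10^(SNR_dB / 10)."""
    return amplitude**2 / 10.0 ** (snr_db / 10.0)


def noisy_sine(t, freq, amplitude, phase, snr_db, rng):
    """Eq. (1): sinusoid plus zero-mean Gaussian noise with variance sigma^2."""
    sigma = np.sqrt(noise_variance(amplitude, snr_db))
    clean = amplitude * np.sin(2.0 * np.pi * freq * t + phase)
    return clean + rng.normal(0.0, sigma, size=t.shape)


# Illustrative values only (assumed, not from the source).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1e-3, 1000, endpoint=False)  # 1 ms of samples
wave = noisy_sine(t, freq=5e3, amplitude=1.0, phase=0.0, snr_db=10.0, rng=rng)
```

For example, with *A* = 1 and an SNR of 10 dB, Eq. (4) gives \(\sigma^{2} = 1/10 = 0.1\).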

### 2.1 Neural network

We now describe the neural network used to solve this problem, in two parts. First, we cover the data preparation and preprocessing that help the model train efficiently, together with the validation process that checks that the model works on unseen data. Then we describe the model design.

#### 2.1.1 Data preparation

Training a neural network requires three datasets to ensure that the model generalizes to unseen data: the training, validation, and test datasets. The training dataset is used to fit the model at each step. The validation dataset is the first unseen data; it is used to check the model at each step, specifically to tune the hyperparameters to obtain the lowest possible loss. Once the best model is found (the one with the lowest loss on the validation dataset), it is checked once more on the test dataset. This final step ensures that the model is not biased toward the validation dataset and therefore works on new unseen data [31]. We prepared 100,000 waves in total and used 72% as the training dataset, 18% as the validation dataset, and 10% as the test dataset.
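One way to realize the 72%/18%/10% split is sketched below. The random shuffling, the seed, and the function name `split_indices` are assumptions made for illustration; the text does not state how the waves were assigned to each set.

```python
import numpy as np


def split_indices(n_waves, train_frac=0.72, val_frac=0.18, seed=0):
    """Shuffle wave indices and cut them into train/validation/test parts.

    The test set receives whatever remains after the first two cuts.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_waves)
    n_train = round(train_frac * n_waves)
    n_val = round(val_frac * n_waves)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100_000)
```

With 100,000 waves this yields 72,000 training, 18,000 validation, and 10,000 test waves. These fractions are what one obtains by holding out 10% for testing and splitting the remaining 90% in an 80/20 ratio, although the text does not say whether the split was done that way.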

We considered the range 1 kHz ≤ *f* ≤ 10 kHz. Each generated wave was sampled at 2000 points in 1-μs time steps over the interval from 0 to 2000 μs, so the whole dataset was a 100,000 × 2000 array. The input layer of the network therefore has 2000 nodes. Since we seek the frequency of each wave, the output layer has a single node, which corresponds to that frequency.
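The dataset array described above could be assembled as in the following sketch. Several details are assumptions, since the text does not specify them: frequencies and phases are drawn uniformly at random, the amplitude and SNR are fixed at illustrative values, and sampling starts at *t* = 0. The helper name `make_dataset` is hypothetical.

```python
import numpy as np


def make_dataset(n_waves, n_samples=2000, dt=1e-6, f_min=1e3, f_max=1e4,
                 amplitude=1.0, snr_db=10.0, seed=0):
    """Return (X, freqs): X has shape (n_waves, n_samples), freqs is in Hz."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) * dt                      # 1-us time steps
    freqs = rng.uniform(f_min, f_max, size=n_waves)    # assumed uniform in f
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_waves)  # assumed random phase
    sigma = amplitude / 10.0 ** (snr_db / 20.0)        # square root of Eq. (4)
    # Broadcasting: each row is one wave, each column one time sample.
    clean = amplitude * np.sin(
        2.0 * np.pi * freqs[:, None] * t[None, :] + phases[:, None]
    )
    X = clean + rng.normal(0.0, sigma, size=clean.shape)
    return X, freqs


# A small batch for demonstration; the paper's dataset has 100,000 rows.
X, freqs = make_dataset(n_waves=100)
```

At full size, a 100,000 × 2000 array of 64-bit floats occupies 1.6 GB, so generating it in chunks or storing it as 32-bit floats may be preferable on a machine with limited memory.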

Neural networks generally train better when their targets are normalized to the range from 0 to 1, so we divided the target frequencies by the maximum *f* = 10 kHz before training. The targets then range from \(0.1\ \left( = \frac{1\ \text{kHz}}{10\ \text{kHz}} \right)\) to \(1\ \left( = \frac{10\ \text{kHz}}{10\ \text{kHz}} \right)\). After training, we multiplied all predictions by 10,000 to recover the frequencies in Hz.
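The scaling and its inverse amount to a single division and multiplication, as the short sketch below shows. The constant and function names are ours, chosen for illustration.

```python
import numpy as np

F_MAX_HZ = 10_000.0  # maximum frequency in the dataset (10 kHz)


def normalize(freq_hz):
    """Map frequencies in Hz to network targets in [0.1, 1]."""
    return np.asarray(freq_hz, dtype=float) / F_MAX_HZ


def denormalize(target):
    """Map network outputs back to frequencies in Hz."""
    return np.asarray(target, dtype=float) * F_MAX_HZ


targets = normalize([1_000.0, 5_000.0, 10_000.0])
recovered = denormalize(targets)
```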

#### 2.1.2 Network design

We used a network with three hidden layers of 2, 2, and 3 neurons, respectively (Fig. 1 right). This architecture was selected after trying many designs because it had the lowest loss on the validation dataset. We had to use a very small number of neurons to prevent the model from overfitting. Overfitting can be prevented in several ways, including dropout (or other types of regularization), reducing the complexity of the model, and increasing the amount of data [32]. We found that reducing the complexity of the model had the best effect and led to very good results. For the remaining hyperparameters, we used the Nesterov–Adam (Nadam) optimizer with a learning rate of 0.001, and the loss function was the mean squared error

$$MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {Y_{i} - P_{i} } \right)}^{2} ,$$

(5)

where *n* is the number of measurements, \(Y_{i}\) are the true values, and \(P_{i}\) are the predicted values. All code was written in Python using the TensorFlow and Keras packages. Calculations were performed on a computer with a 4-core 3.50-GHz processor, 32 GB of RAM, and an NVIDIA GTX 750 Ti GPU with 2 GB of GDDR5 memory. Preparing the data and training the final model took less than 2 h on this computer. The trained model predicts new results in less than a second.
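A network with the layer sizes, optimizer, learning rate, and loss described in this section could be set up in Keras roughly as follows. This is a sketch rather than the original code: the ReLU activations in the hidden layers, the linear output, and the function name `build_model` are assumptions, since the text does not state them.

```python
from tensorflow import keras


def build_model(n_inputs=2000):
    """2000 inputs -> hidden layers of 2, 2, 3 neurons -> 1 output."""
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        keras.layers.Dense(2, activation="relu"),  # activation is assumed
        keras.layers.Dense(2, activation="relu"),  # activation is assumed
        keras.layers.Dense(3, activation="relu"),  # activation is assumed
        keras.layers.Dense(1),                     # normalized frequency
    ])
    model.compile(
        optimizer=keras.optimizers.Nadam(learning_rate=0.001),
        loss="mean_squared_error",                 # Eq. (5)
    )
    return model


model = build_model()
n_params = model.count_params()
```

Such a model has only 4021 trainable parameters (4002 in the first hidden layer, then 6, 9, and 4), which reflects the deliberately low model complexity. Because the targets are divided by 10,000 before training, an MSE computed on the normalized outputs corresponds to an MSE in Hz² after multiplication by \(10^{8}\).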