Biostatistical and mathematical analysis of COVID-19

Background: COVID-19, an infectious disease caused by a novel coronavirus, has spread across the world since December 2019. The novel coronavirus first appeared in Wuhan, China, quickly spread across Asia, and now affects many countries around the world. The deaths of many patients, including medical staff, caused social panic and drew intense attention from the media, governments, and international organizations. Through the joint efforts of the government, doctors, and people from all walks of life, the epidemic in Hubei Province has now been brought under control and no longer disrupts people's daily lives. Because of its rapid spread and serious consequences, this sudden novel coronavirus pneumonia epidemic has become a major public event. Analyzing it can also help us better understand sudden infectious diseases in the future, so that we can take more effective response measures and establish truly predictive models that provide reliable and sufficient information for prevention and control.


Introduction
The coronavirus disease 2019 (COVID-19) was first reported in December 2019 in Wuhan, China. It quickly spread to other regions of the country and, a month later, to other countries, eventually affecting over 200 countries and territories [1]. On March 11, 2020, Tedros, the Director-General of the World Health Organization (WHO), announced that, based on its assessment, the WHO considered that COVID-19 could be characterized as a pandemic [2]. COVID-19 is a highly contagious respiratory infection caused by a coronavirus. It is transmitted primarily through respiratory droplets and close contact with an infected person's respiratory secretions, and may also be transmitted through objects contaminated by a patient's droplets (e.g., hands, clothing, food, water, or the environment). In most patients the incubation period is within 7 days. Common clinical findings in COVID-19 patients include fever, respiratory symptoms, fatigue, a normal or decreased peripheral blood lymphocyte count, and multiple bilateral patchy ground-glass opacities in both lungs on computed tomography (CT) [3]. Although the exact source of COVID-19 is still unknown, patients with COVID-19 are by far the most clearly established source of infection.
As of June 26, 2020, the cumulative number of confirmed COVID-19 cases worldwide had reached 9,690,148, with 488,971 cumulative deaths [4]. At that time, Hubei Province, China, had reported 68,135 cumulative confirmed cases and 4,512 cumulative deaths [5]. The COVID-19 outbreak has had a great impact on people's lives and on the development of the national economy.
Since March, the novel coronavirus pneumonia epidemic has been basically controlled in China, and people's daily lives and the economy affected by it are recovering. In many areas outside China, however, the epidemic is still very serious and the number of infected people remains high. We analyze Hubei Province, the initial epicenter of the outbreak, and, drawing on the actual situation there, use different models to offer the world useful experience and effective measures for fighting the epidemic.
Because the Chinese government adopted different policies at different times during the fight against the novel coronavirus, we use different models for different time periods, so that the models better follow the development of the epidemic and reflect the changes brought about by policy. We then use software that simulates the spread of the novel coronavirus to obtain results under theoretical conditions.

Data
The Hubei Province data in this paper are the authoritative figures published by the Hubei Provincial Health Planning Commission on its official platform from January 23, 2020 to May 16, 2020 [6]. They include cumulative diagnosed cases, cumulative deaths, cumulative cures, suspected cases, asymptomatic infections, and so on. Hubei Province's 2019 total population was obtained from official sources [7].
Because the volume of collected data is large, it must be processed before analysis. We first use Excel to filter the data by the categories we need and derive the required quantities through basic operations. We then load these data into MATLAB and obtain optimized parameter values with the fmincon function.
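The filtering step described above was done in Excel. A rough Python equivalent is sketched below, assuming the daily bulletins have already been exported to a list of records whose column names ("date", "confirmed", "deaths", "cured", "suspected") are hypothetical. It keeps the rows inside a phase window and turns a cumulative series into daily increments. The toy values are made up for illustration.

```python
from datetime import date


def filter_phase(records, start, end, fields):
    """Keep records dated within [start, end], restricted to `fields`.

    Each record is a dict with an ISO-format "date" string plus count
    columns; the column names are hypothetical, not official headers.
    """
    kept = []
    for row in records:
        day = date.fromisoformat(row["date"])
        if start <= day <= end:
            kept.append({"date": day, **{f: row[f] for f in fields}})
    kept.sort(key=lambda r: r["date"])  # bulletins may arrive out of order
    return kept


def daily_increments(rows, field):
    """Convert a cumulative series into day-over-day increments."""
    return [later[field] - earlier[field] for earlier, later in zip(rows, rows[1:])]


# Made-up toy records, not real Hubei data.
raw = [
    {"date": "2020-01-22", "confirmed": 100, "deaths": 2, "cured": 1, "suspected": 50},
    {"date": "2020-01-24", "confirmed": 170, "deaths": 5, "cured": 3, "suspected": 80},
    {"date": "2020-01-23", "confirmed": 130, "deaths": 3, "cured": 2, "suspected": 60},
]

# Phase 1 (January 23 to February 7) uses confirmed cases, deaths, and cures.
phase1 = filter_phase(raw, date(2020, 1, 23), date(2020, 2, 7),
                      ["confirmed", "deaths", "cured"])
print(daily_increments(phase1, "confirmed"))
```

Sorting after filtering means the increments are always computed in calendar order, even if the exported rows are not.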

The model
Based on the transmission characteristics of novel coronavirus pneumonia, we use differential equations to build dynamic infectious disease models and analyze the whole process in three time periods, chosen according to the stage of transmission and the studies on the epidemic that scientists published at different times.
We take January 23, 2020 to February 7, 2020 as the first phase and establish the SIR model [8,9]. This was the early phase of the outbreak: research in all areas was still insufficient, and it was not yet recognized that novel coronavirus pneumonia has an incubation period and asymptomatic infections. Therefore, the data selected are the daily number of confirmed diagnoses, the cumulative number of deaths, and the cumulative number of cures.
We take February 8, 2020 to March 30, 2020 as the second phase and establish the SEIQR model [10,11]. According to the data, suspected cases were released for the first time on February 8, and as state control measures took effect, most diagnosed patients could be effectively isolated and treated. Therefore, we take into account patients in the incubation period and quarantined patients; that is, we select the daily number of confirmed diagnoses, cumulative deaths, cumulative cures, people in centralized isolation, and the number of suspected cases.
We take March 31, 2020 to May 16, 2020 as the third phase and establish the SEIQLR model [12,13]. On March 31, the official data included asymptomatic infections for the first time. Therefore, we also consider asymptomatic infections; that is, we select the daily number of confirmed diagnoses, cumulative deaths, cumulative cures, people in centralized isolation, the number of suspected cases, and the daily number of asymptomatic infections.
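The three-phase design can be written down as a small lookup table. The Python sketch below encodes the date windows and the observed series listed for each phase; the series identifiers are hypothetical labels, not official column names.

```python
from datetime import date

# Phase windows as stated in the text (both ends inclusive).
PHASES = [
    (date(2020, 1, 23), date(2020, 2, 7), "SIR"),
    (date(2020, 2, 8), date(2020, 3, 30), "SEIQR"),
    (date(2020, 3, 31), date(2020, 5, 16), "SEIQLR"),
]

# Observed series fitted in each phase, following the lists in the text.
# The identifiers are hypothetical labels, not official column names.
_BASE = ["daily_confirmed", "cumulative_deaths", "cumulative_cured"]
SERIES = {
    "SIR": _BASE,
    "SEIQR": _BASE + ["centralized_isolation", "suspected"],
    "SEIQLR": _BASE + ["centralized_isolation", "suspected", "daily_asymptomatic"],
}


def model_for(day):
    """Return the name of the model used for a date in the study window."""
    for start, end, name in PHASES:
        if start <= day <= end:
            return name
    raise ValueError(f"{day} is outside the study window")


print(model_for(date(2020, 4, 1)), SERIES[model_for(date(2020, 4, 1))])
```

Each later phase only adds series to the previous one, mirroring how the models grow from SIR to SEIQR to SEIQLR.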

SEIQLR-based method for estimation
Based on the known data, we denote the 2019 population of Hubei Province by N and divide it into six categories. People who are not infected with the novel coronavirus are classified as S(t), the daily number of suspected cases as E(t), the number of currently diagnosed patients as I(t), those quarantined after diagnosis as Q(t), asymptomatic infected people as the latent L(t), and cumulatively cured and deceased patients as R(t).
Therefore, we make the following assumptions.
1. The population is evenly distributed.
2. Cured people are permanently immune to the virus and will not be re-infected.
3. The quarantined and the diagnosed have the same infectiousness.
4. The latent patients, the diagnosed, and the suspected have different infectiousness.
For the SEIQLR model, the specific definitions of the six categories are shown in Table 1.
However, not all data for the six categories are directly available, and some must be obtained by merging known data. Specifically, for the susceptible (S), we subtract all people infected with the virus from the total population N. For the infectious (I), we subtract the number of people quarantined (Q) and the number of people exposed to the virus (E) from the number of people diagnosed. For the removed (R), we add up the number of people cured and the number of people who died of COVID-19.
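The merging rules above can be sketched in Python as follows. One assumption is needed: "all people infected with the virus", which is subtracted from N to get S, is read here as everyone outside S, i.e. E + I + Q + L + R. The function and argument names are hypothetical, and the numbers are toy values.

```python
from dataclasses import dataclass


@dataclass
class SEIQLR:
    S: int
    E: int
    I: int
    Q: int
    L: int
    R: int

    def total(self):
        return self.S + self.E + self.I + self.Q + self.L + self.R


def derive_compartments(N, diagnosed, quarantined, suspected, asymptomatic, cured, dead):
    """Merge raw counts into the six SEIQLR categories.

    Assumption: the infected people subtracted from N to obtain S are
    taken to be everyone outside S, i.e. E + I + Q + L + R.
    """
    E = suspected
    Q = quarantined
    L = asymptomatic
    I = diagnosed - Q - E          # diagnosed minus quarantined minus exposed
    if I < 0:
        raise ValueError("quarantined + suspected exceeds diagnosed")
    R = cured + dead               # removals: cures plus deaths
    S = N - (E + I + Q + L + R)    # everyone else is still susceptible
    return SEIQLR(S, E, I, Q, L, R)


# Toy numbers for illustration only.
c = derive_compartments(N=1000, diagnosed=100, quarantined=60, suspected=20,
                        asymptomatic=5, cured=30, dead=4)
print(c)
```

Under this reading the six categories always add up to N, which gives a quick sanity check on the merged data.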
The mathematical symbols used above are explained in Table 2. Because different models are used for different time periods in the analysis of the epidemic in Hubei Province, Figures 1-3 show the schematics of the SIR, SEIQR, and SEIQLR models, respectively, and Figure 4 summarizes Figures 1-3.
For the SIR model, assume that the total population is N and that the proportions of healthy people, patients, and removed people at time t are S(t), I(t), and R(t), respectively.

Table 1: Categories of people in the SEIQLR model.
Category | Explanation
S(t) | People who may be infected by COVID-19
E(t) | People who have been exposed to COVID-19 but are not yet diagnosed
I(t) | People who are currently diagnosed but not quarantined
Q(t) | People who are diagnosed and quarantined
L(t) | People who are infected but have no symptoms
R(t) | People who have been cured and will not be re-infected by COVID-19, and people who died of COVID-19

https://doi.org/10.29328/journal.abb.1001016

It is assumed that the number of effective contacts per patient per day is β, called the contagion rate, and that a healthy person effectively contacted by a patient is immediately infected. Each patient therefore effectively contacts βS(t) healthy people per day; since there are NI(t) patients, βNS(t)I(t) healthy people are infected per day, i.e., a proportion βS(t)I(t) of the population, so S(t) decreases monotonically. Among patients, the daily rate of transfer to diagnosed cases is ν, where ν = 1. Diagnosed patients are transferred to inpatient care and removed at rate α, which combines the cure rate and the mortality rate, so the daily removals amount to a proportion ανI(t) of the population [13,14].

Balancing the inflow and outflow of each category over a short interval Δt (the micro-element method) gives the following equation set:

$$N[S(t+\Delta t)-S(t)] = -\beta N S(t) I(t)\,\Delta t$$
$$N[I(t+\Delta t)-I(t)] = \beta N S(t) I(t)\,\Delta t - \alpha\nu N I(t)\,\Delta t$$
$$N[R(t+\Delta t)-R(t)] = \alpha\nu N I(t)\,\Delta t$$

When Δt → 0, the model can be described by a set of ordinary differential equations (ODEs) [15], and the final SIR system is:

$$\frac{dS}{dt} = -\beta S(t) I(t)$$
$$\frac{dI}{dt} = \beta S(t) I(t) - \alpha\nu I(t)$$
$$\frac{dR}{dt} = \alpha\nu I(t)$$

Using the same approach, we obtain the differential equation set of the SEIQR model.

After establishing the equation sets, we need to estimate three parameters: the contagion rate β, the removal rate for the quarantined α, and the removal rate for the latent η. We compute them in MATLAB, using the built-in function fmincon to optimize their values. The remaining parameters are fixed in advance by parameter estimation.

SEIR-based simulation
Using software that simulates the SEIR model [16], we create a closed environment (Small World) to study the transmission of infectious diseases. The parameters we set include the total population and the initial number of diagnosed cases; the detailed settings are shown in Tables 3-5.
The model used here is the SEIHD model, which is equivalent to the SEIR model: (H) and (D) denote the numbers of people who have been cured and who have died of the disease, respectively, and their sum gives (R). To study the long-term effects of the disease on society, we set the simulation length to 180 days, about six months.

Table 3: Parameter settings for each population group in the SEIR simulation.
Population group | Setting
Total population | 8,000,000
Susceptible (S) | 7,999,970
Exposed cases (E) | 25
Diagnosed cases (I) | 5
Cured cases (H) | 0
Dead cases (D) | 0

The result of SIR-based method in Phase 1
In MATLAB, optimizing the parameters with the fmincon function [17] yields α = 0.08 and β = 0.5 for the first phase. As Figure 5 shows, the fitted curves agree closely with the observed values in the first phase, and the predicted values are also close to what actually happened. The model predicts that the epidemic would be largely contained by early April, and indeed it was: daily new confirmed cases fell to zero in many areas of Hubei Province, and the lockdown was lifted in early April.
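A minimal Python sketch of the Phase 1 fit is given below, under stated assumptions: the SIR system is written in proportions with ν = 1, as in the model section; only the I(t) series is fitted; and the "observations" are synthetic values generated from the Phase 1 estimates α = 0.08 and β = 0.5. The paper itself used MATLAB's fmincon. The bounded grid search here is a deliberately crude substitute, and the function names are hypothetical.

```python
def simulate_sir(beta, alpha, s0, i0, days, nu=1.0, steps_per_day=10):
    """Integrate the SIR system in proportions with classical RK4.

    dS/dt = -beta*S*I, dI/dt = beta*S*I - alpha*nu*I, dR/dt = alpha*nu*I.
    Returns one (S, I, R) tuple per day, including day 0.
    """
    def deriv(s, i):
        infection = beta * s * i
        removal = alpha * nu * i
        return (-infection, infection - removal, removal)

    h = 1.0 / steps_per_day
    s, i, r = s0, i0, 1.0 - s0 - i0
    out = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = deriv(s, i)
            k2 = deriv(s + h / 2 * k1[0], i + h / 2 * k1[1])
            k3 = deriv(s + h / 2 * k2[0], i + h / 2 * k2[1])
            k4 = deriv(s + h * k3[0], i + h * k3[1])
            s += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            i += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            r += h / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        out.append((s, i, r))
    return out


def sse(beta, alpha, observed_i, s0, i0):
    """Sum of squared errors between simulated and observed I(t)."""
    traj = simulate_sir(beta, alpha, s0, i0, len(observed_i) - 1)
    return sum((sim[1] - obs) ** 2 for sim, obs in zip(traj, observed_i))


def fit_grid(observed_i, s0, i0, beta_grid, alpha_grid):
    """Bounded grid search over (beta, alpha); a crude stand-in for fmincon."""
    candidates = [(b, a) for b in beta_grid for a in alpha_grid]
    return min(candidates, key=lambda p: sse(p[0], p[1], observed_i, s0, i0))


# Synthetic check: generate I(t) with the Phase 1 values, then refit.
truth = simulate_sir(beta=0.5, alpha=0.08, s0=0.9999, i0=0.0001, days=15)
observed = [point[1] for point in truth]
print(fit_grid(observed, 0.9999, 0.0001, [0.3, 0.4, 0.5, 0.6], [0.04, 0.06, 0.08, 0.10]))
```

Because the synthetic data come from a grid point, the search recovers (0.5, 0.08) exactly. With real, noisy bulletins one would instead use a proper constrained optimizer and fit several observed series jointly, as the paper does with fmincon.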

The result of SEIQR-based method in Phase 2
Optimizing the parameters with the fmincon function gives α = 0.025 and β = 0.1 for the second phase. Figures 6-8 compare the observed and predicted values.
The fitted values in Figures 6-8 show that the number of quarantined patients in Hubei Province peaked in mid-February, which corresponds to the opening of the Fangcang (makeshift shelter) hospitals in early February and the rapid construction of new isolation sites. These measures relieved the situation early in the outbreak, when insufficient medical resources meant that not all patients could be accommodated. After that, the number of suspected cases gradually decreased while the number of removals kept rising, showing that the situation in Hubei Province improved as the policies were implemented.
On the other hand, Figure 7 shows that the fitted curve deviates considerably because of the surge in the initial data. This also shows that some unexpected real-world events cannot be captured effectively by the mathematical model.

The result of SEIQLR-based method in Phase 3
Optimizing the parameters with the fmincon function gives α = 0.13, β = 0.21, and η = 0.04 for the third phase. Figures 9-11 compare the observed and predicted values.
Figures 9-11 show that the epidemic had been fully controlled by the end of May, which is consistent with the actual situation in Hubei Province. However, they also show that the curves do not fit well, because the observed values fluctuate considerably and pull the fitted curves away from them.
This result also shows that the economic and productivity sacrifices made by the Hubei government paid off: most areas of Hubei gradually resumed production in April. This is due to the government's timely establishment of makeshift hospitals and the public's active response to its call to stay at home and wear masks whenever going out.
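Figures 9-11 are judged by eye. One common way to put a number on the quality of a curve fit, offered here as a suggestion rather than the paper's procedure, is to report the root-mean-square error (RMSE) and the coefficient of determination R² for each fitted series. A minimal Python sketch:

```python
import math


def rmse(observed, fitted):
    """Root-mean-square error between two equal-length series."""
    if not observed or len(observed) != len(fitted):
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed))


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(fitted):
        raise ValueError("series must be non-empty and of equal length")
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed series is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Toy series: a perfect fit, and a flat fit equal to the observed mean.
print(rmse([1, 2, 3], [1, 2, 3]), r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```

An R² near 1 means the curve tracks the observations. A value near or below 0 means it does no better than a flat line at the mean, which would make the comparison of fit quality across Phases 1-3 explicit.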

The result of SEIR-based simulation
The result is shown in Figure 6, which contains five curves representing S, E, I, H, and D; the letters are defined in Table 3.
We obtain this result using the parameter values in Tables 3-5. It shows that COVID-19 has a significant impact on a society that has not implemented comprehensive and stringent measures: more than 80% of the population is infected with the novel coronavirus and, more importantly, there are hundreds of thousands of deaths. Although this is a virtual environment and real life contains many unknowns, such simulation can still guide practice. Therefore, to combat the novel coronavirus effectively, it is necessary to strengthen social control measures and medical interventions.
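The simulator's internal equations and the rate settings in Tables 4-5 are not reproduced here, so the following is only a minimal deterministic SEIHD sketch under assumed dynamics: frequency-dependent transmission S→E, progression E→I, and exits I→H and I→D. It starts from the Table 3 values, and every rate (beta, sigma, gamma, mu) is a hypothetical placeholder, not a value from the paper.

```python
def simulate_seihd(n, e0, i0, days=180, beta=0.6, sigma=0.2, gamma=0.1,
                   mu=0.002, steps_per_day=10):
    """Forward-Euler SEIHD model: S -> E -> I -> {H, D}.

    All rates are hypothetical placeholders (per day):
    beta transmission, sigma E->I progression,
    gamma I->H recovery, mu I->D death.
    Returns one (S, E, I, H, D) tuple per day, including day 0.
    """
    s, e, i, h, d = n - e0 - i0, e0, i0, 0.0, 0.0
    dt = 1.0 / steps_per_day
    history = [(s, e, i, h, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # All flows are computed from the state at the start of the step.
            infection = beta * s * i / n
            onset = sigma * e
            recovery = gamma * i
            death = mu * i
            s -= infection * dt
            e += (infection - onset) * dt
            i += (onset - recovery - death) * dt
            h += recovery * dt
            d += death * dt
        history.append((s, e, i, h, d))
    return history


# Initial values from Table 3: N = 8,000,000, E = 25, I = 5.
run = simulate_seihd(8_000_000, 25, 5)
s, e, i, h, d = run[-1]
print(f"day 180: ever infected {8_000_000 - s:,.0f}, cured {h:,.0f}, dead {d:,.0f}")
```

With these placeholder rates the basic reproduction number is beta / (gamma + mu) ≈ 5.9, so a large unmitigated epidemic is expected, qualitatively like the scenario described above; the actual percentages depend entirely on the assumed rates.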

Conclusion
As described above, we use two different tools to analyze the data: MATLAB and the SEIR simulator. MATLAB is more powerful and lets us modify the differential equations as needed, but setting parameters and drawing plots in it is relatively complicated. The SEIR simulator is more convenient, since only a few parameters need to be set to generate a plot, but it is limited when it comes to optimizing the equations. Combining the two allows us to obtain more accurate results.
In addition, we establish three different models for the different phases, namely the SIR, SEIQR, and SEIQLR models, refining them step by step to better fit the actual course of the epidemic.
The results show that the goodness of fit differs between models. Although we considered more factors in Phases 2 and 3, the curve fit is not ideal. This may be because some real-world events are accidental and cannot be explained by traditional mathematical models.
On the other hand, the factors taken into account do not accurately reflect reality. Overall, however, the three models we establish effectively capture the real trends.
Figure: Result of SEIR simulation.
In summary, traditional mathematical models can explain reality only to a certain extent, but this does not deny their value. Although the SEIQLR model we establish does not fit the curves well, it takes into account more factors than the SIR model, and reality involves even more influential factors.
Therefore, for an event shaped by so many factors, we should consider using an improved traditional model, such as the SEIQLR model, or more advanced methods, such as time series analysis and neural networks.

Discussion
Novel coronavirus pneumonia is influenced by many factors, so we use a time-phased approach and establish different models for different time periods. COVID-19 is an unprecedented and severe epidemic, and lack of experience in its early phase made sound judgments difficult.
Therefore, we first establish the SIR model based on officially published data and existing work on infectious disease models. Over time, latent patients with the novel coronavirus were also counted in the data, and the government intensified its control by vigorously isolating and treating patients, so we establish the SEIQR model.
As more experience accumulated, studies found that novel coronavirus pneumonia produces asymptomatic infections, so we establish the SEIQLR model. This phased approach simulates the actual situation more closely. Finally, using the SEIR simulator with suitable parameters in a closed environment, we obtain the course of transmission from the initial phase to 180 days afterwards, which also offers some reference value for combating novel coronavirus pneumonia [18].
Infectious disease models built on differential equations have broad applications. Beyond the prediction, prevention, and control of infectious diseases themselves (e.g., COVID-19 and SARS), many social behaviors and events in daily life follow rules similar to those of infectious disease spread. Such models are therefore widely used in social science research on the diffusion of innovations, the spread of online public opinion, the spread of financial risk, and other areas [19,20].
The diffusion of management accounting matters, shown in Table 6 and Figure 12, is analyzed with the SIR model [21].
Figure 12 shows that the transitions among neutrals, opponents, and supporters can also be described by an SIR model, but one that differs from the infectious disease SIR model. For instance, a neutral person (S) can become an opponent (R) directly, whereas in an infectious disease a susceptible person (S) must first become infected (I) before being removed (R).
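To make the structural difference in Figure 12 concrete, here is a minimal Python sketch of an SIR-type diffusion model with an extra direct S→R edge. The mass-action S→I term, the supporter-to-opponent rate gamma, and all numerical values are assumptions for illustration; the rates used in [21] are not given here.

```python
def diffusion_step(s, i, r, beta, delta, gamma, dt):
    """One forward-Euler step of the neutral/supporter/opponent model.

    beta:  neutral -> supporter through contact with supporters (assumed mass action)
    delta: neutral -> opponent directly (the extra S -> R edge)
    gamma: supporter -> opponent (assumed; set to 0 to disable)
    """
    to_supporter = beta * s * i * dt
    to_opponent = delta * s * dt
    supporter_lost = gamma * i * dt
    return (s - to_supporter - to_opponent,
            i + to_supporter - supporter_lost,
            r + to_opponent + supporter_lost)


def simulate(s0, i0, r0, beta, delta, gamma, days, dt=0.1):
    """Run the model for `days` days and return the final (S, I, R) shares."""
    state = (s0, i0, r0)
    for _ in range(round(days / dt)):
        state = diffusion_step(*state, beta, delta, gamma, dt)
    return state


# With no supporters at all, opponents still appear through the S -> R edge,
# which the infectious-disease SIR model does not allow.
s, i, r = simulate(1.0, 0.0, 0.0, beta=0.5, delta=0.05, gamma=0.0, days=10)
print(round(s, 4), i, round(r, 4))
```

Setting delta = 0 removes the direct edge and recovers an ordinary SIR structure, which is a convenient way to compare the two diagrams.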

Limitations
When establishing the models, we do not consider natural births and deaths. Because data on the mobile population and on infections within it are lacking, we also ignore the effect of population movement between provinces and districts on the epidemic in Hubei Province before the lockdown.
The models we establish cover only Hubei Province, although the epidemic is also worth studying at the national level, as is the spread of the novel coronavirus to rural and pastoral areas. Further open questions for modeling include how to group the total population and characterize random phenomena, and how to stratify population subgroups by epidemiological characteristics such as age, behavior, geographic distribution, and mobility, since these affect the prediction and control of infectious diseases. The models are also influenced by many factors, such as differences in patient infectiousness, individual susceptibility, morbidity between local districts, and the intensity of prevention and control in different regions, as well as errors in the statistical data [22].

Figure 12: Process of management accounting matters, with transitions among S (Neutral), I (Supporter), and R (Opponent).

Table 6: SIR categories for infectious disease and for the diffusion of management accounting matters.
Category | Infectious disease | Management accounting
Neutral (S) | People who may be infected by COVID-19 | The group of employees only slightly affected by the learning cost, information collection cost, business adjustment cost, income balance, and net income of the new management accounting practice
Supporter (I) | People who are currently infected by the virus | The group of employees whose tangible and intangible benefits increase
Opponent (R) | People who have been cured and will not be re-infected by COVID-19, and people who died of COVID-19 | The group of employees whose cognitive and information collection costs increase and whose benefits decrease, so that their overall net income is negative

The resulting plots also show that as model complexity increases, the fit does not improve correspondingly and can even be worse than that of the simple model. This is not only because of the discrepancy between reality and theory but, more importantly, because the factors included in the differential equations do not necessarily reflect reality effectively [23].
This also shows that theoretical mathematical models alone are not enough to reflect reality well, because reality contains many unknown factors that mathematical models cannot represent accurately.

Conflict of interest
We have no conflicts of interest to disclose, and the manuscript has been read and approved by all named authors.