Second-order conic programming for data envelopment analysis models

Data envelopment analysis (DEA) is a widely used benchmarking technique. Its strength stems from the fact that it can include several inputs and outputs of not necessarily the same type to evaluate efficiency scores. Indeed, the aforesaid method is based on mathematical optimization. This paper constructs a second-order conic optimization problem unifying several DEA models. Moreover, it presents an algorithm that solves the former problem, and provides a MATLAB function associated with it. As far as known, no MATLAB function solves DEA models. Among different types of DEA, this function can handle deterministic, Malmquist index, and stochastic models. In fact, DEA is involved in various practical applications, thus, this work will provide some possible future extensions, not only for MATLAB but also for any programming software in applications of decision science and efficiency analysis.


Introduction
Data envelopment analysis (DEA) is a nonparametric mathematical tool for productive efficiency evaluation. This method produces what's known as the efficiency frontier based on the given data related to inputs and outputs of the decision-making units (DMUs) under study. Efficient DMUs are lying on what is known as "bestpractice-frontier" [1], [2]. Its strength arises from not only being nonparametric with no predefined assumptions imposed on it but also because of the simplicity of the model. Even though it was initiated by maximizing the production efficiency ratio, it turns out to be a linear constrained optimization problem in absence of stochastic variables. This method is exponentially increasingly used in diversified fields [3], [4]. The main contribution of this method is in operations research and management science. Dutta et al. provided a literature review of DEA applications in supplier evaluation and selection [5]. Among the various application fields, DEA is extensively applied in various energy sectors like renewable energy and wastewater treatment plants e.g. [6]- [8], healthcare, where lately it was used for evaluating efficiencies of healthcare systems during COVID-19 pandemic e.g. [9]- [11], economic and Banking e.g. [12], [13], agriculture e.g. [14]- [16], education [17], [18], etc. Several software systems implement DEA models to collect information about relative productivity. For example, DEAP is an open-source program established by Tim Coelli. This program can calculate technical, cost efficiencies and Malmquist total factor productivity indices with the ability to handle different constraints on returns to scale in different orientations [19]. MDeap2, Open Source DEA, EMS, MaxDEA, GAMS and PIM-DEA are all available programs that calculate various DEA-related scores. MATLAB is a programming platform developed by MathWorks in 1994 [20]. It is widely used by scientists for data analysis, model creation and algorithm development. It comes out that DEA models are not implemented in this platform.
This paper provides an algorithm and an open-source MATLAB code, which can be transformed into any programming language, for implementing different DEA models and it is organized as follows: In Section 2, a unified DEA model is constructed. Then, in Section 3, the algorithm to solve this last model together with the discretization are detailed. After, an illustrative example is given in Section 4. Finally, Section 5 concludes the paper.

DEA Models
Data envelopment analysis is a widely used efficiency evaluation method. It calculates efficiency scores for decision-making units (DMUs). The idea arises from the fact that the efficiency score is the ratio of outputs to inputs. Indeed, as these later variables are mostly different in nature, a linear combination is used to unify the unit for these variables. That is, for DMUs to be evaluated based on inputs denoted by ( ) 1≤ ≤ ,1≤ ≤ and outputs denoted by ( ) 1≤ ≤ ,1≤ ≤ , the relative efficiency score for DMU is then The real numbers and are the weights associated with the inputs and outputs , respectively.
Given the abovementioned, Charnes et al. [1] introduced the constant return to scale model (CCR), by linearizing the above fractional programming, with two different orientations, as follows Input-oriented model: These optimization problems, whether input or output-oriented, are known by the multiplier form. In practice the dual of these models is solved. The dual forms are known by the envelopment forms [21] and are given as follows Input-oriented envelopment form model: Output-oriented envelopment form model: where and are the weight and the efficiency score for the -th DMU, respectively. A DMU with an efficiency score equal to 100% is relatively efficient, otherwise, it is inefficient. It is to be noted that the CRS input and output-oriented models produce the same efficiency score.
Following, in 1984, Banker Charnes and Cooper modified the CCR model and proposed what is known as the variable returns to scale model (VRS). It is worth mentioning that CCR and CRS refer to the same model which is the constant return to scale created by Charnes, Cooper, and Rhodes. This type of model reflects that the change of output(s) relative to input(s) is constant and hence the efficiency score is not substantially related to the DMU size. Whereas the BCC named after Banker Charnes and Cooper is also labeled by VRS. This last waive the mentioned condition and, therefore, any variation in either input(s) or output(s) does not necessarily produce a proportional change in the other [22], [23].
It is worth mentioning that the choice of the orientation depends on the ability in increasing any of the outputs or decreasing any of the inputs. The input-oriented model minimizes the input at the given output level, whereas the output-oriented one maximizes the output at the given input level. The VRS model differs from the CRS model only by adding the below extra constraints on the weights associated with the DMUs Due to the embedding of this extra constraint, it can be concluded that the VRS efficiency scores are larger than those obtained from the CRS model.
After that, the stochastic data envelopment analysis model was initiated. It allows any variable, whether input or output, to be stochastic, i.e., it can be a random variable that follows a probability distribution [24]. The required constraint is preserved by enforcing the probability value of this constraint to be almost one. This is the so-called chance-constrained programming (CCP), more details can be found in [24], [25].

Malmquist DEA
The be the vectors containing the values of inputs and outputs, respectively. Indeed, as elaborated by Lin & Fei [26]and Cooper et al. [22], the MI-DEA calculates the total factor productivity change over time (tfpch), and can be computed by where ( , ) represents the efficiency score of the -th DMU observed in period measured by frontier technology [27], [28]. It is to be noted that, the values obtained from the MI-DEA model show the variation in productivity among two terms of production: the "Catch-up" and the "Frontier-shift". The former identifies the effect of growth in a DMU and is calculated by 2 ( 2 , 2 )/ 1 ( 1 , 1 ), while the last verifies the change in the efficient frontiers and is equal to the square root of ( 1 ( 1 , 1 ) 1 ( 2 , 2 ))/( 2 ( 1 , 1 ) 2 ( 2 , 2 )).
Accordingly, the output and input-oriented radial MI-DEA models are Input-oriented MI-DEA model: Output-oriented MI-DEA model: If the efficiency score ( , ) = 1 ( , ) + δ 0I / ( , ) is identically one, then the DMU is said to be efficient in period measured by frontier technology , otherwise, it is inefficient. It can be observed that, for = , the ( , ) is the (CCR) relative efficiency score for the -th DMU in the period .
Following the above methodology, a unified MI-DEA model is It is also to be mentioned that the MI-DEA model is defined in the literature based on the CRS model only [28].

Discretization and algorithm
Multiple software systems implement the DEA model like DEAP, MDeap 2, Open Source DEA, EMS, and GAMS. None of the afore mentioned can handle all the above models at the same time. For example, DEAP is an open-source program that runs DEA and MI-DEA input/output-oriented and VRS/CRS models, with no prior installation, however, the stochastic DEA is not included [19]. In this section, an algorithm together with an open-source code [29], which solves any of the former DEA models, with no restrictions on the number of DMUs, are provided for MATLAB users. It is to be mentioned that the discretization and the obtained finitedimensional optimization problem can be adapted to any programming language.
The only existing equality constrained can be written using matrix multiplications as follows While the inequality constraints are updated for each DMU . Let where [ ] : is the -th column in a matrix , and ].
Then the inequality constrained is given by Indeed, the nonlinear constraints are a consequence of the chance constraints and are evaluated following the stochastic distribution variable (input or output). For ∈ ℐ , ∈ and ∈ {1, … , }, let ~( , ) (resp. ~(̃,̃)) for all ∈ {1, … , }. Then, the chance-constrained for normally distributed random variables and Φ ( ) = ϵ, Φ is the cumulative density function for standard normal distribution.
While the chance-constrained of the output variable where is the Dirac delta function [30] commonly known by → 0 ( − 0 ) = ∞ and ( ) = 0 for ≠ 0, and loosely thought of to be After solving the optimization problem (M-3) the relative efficiency score for the -th DMU is calculated by Moreover, the MI-DEA optimization problem (M-2) is equivalent to the following linear optimization problem The matrices ( , ) and are defined in the same manner as in (4) and (5)  ].

Algorithm
The algorithm for solving various types of DEA is aligned in this section.
1. Initialize the parameters : number of DMUs, : number of inputs, : number of outputs. 2. Determine the stochastic variables if any (i.e., determine ℐ , ℐ , , ). 3. Choose the scale variation ( = 1 for VRS and = 0 for CRS), and the orientation ( = 1 for inputoriented and = 0 for output-oriented). 4. Load the data for the deterministic input(s) and output(s) of each DMU, and the parameters of the stochastic variables if exist. 5. Construct the matrices ℳ ℐ , ℳ using (2). 6. If ℐ ∪ ≠ ∅, set the matrices and ̃ with the vectors and ̃, then define the nonlinear function ( , ) as in (6). 7. Construct the matrices and using (3). 8. Determine the upper and the lower bounds to be imposed on , as described above 0 ≤ ≤ ( 1 ) + 1, and define the objective function ( , ) = . 9. For = 1, … , , update the matrices and according to Equations (4) and (5) then solve the problem (M-3) to obtain the relative efficiency score for the -th DMU as defined in (9).
In case the Malmquist Index DEA is to be used, then two periods should be defined and accordingly two sets of inputs and outputs for the two periods are loaded. Thus, the steps followed are as follows 1. Initialize the parameters : number of DMUs, : number of inputs, : number of outputs.
2. Choose the orientation (I=1 for input-oriented and I=0 for output-oriented).
8. For = 1, … , , update the matrices ( , ) and defined in (7) and (8), then solve the problem (M-4) to obtain the relative efficiency score for the -th DMU in period measured by frontier technology . 9. Go to step 6 and initialize ( , ) with a different value. 10. Calculate the productivity change over time for all the DMUs using equation (1).

Illustrative numerical example
This section provides a hypothetical example with three cases to show the performance of the given algorithm. The data are generated randomly and shown in Table 1. Case 2. In this case, a stochastic input is added to the efficiency evaluation procedure. The mean of that input for each DMU is given in Table 1, the covariance matrix is randomly generated, and the efficiency scores are in Table 2. The performances of DMUs 1, 2, 3, 4, 6,7,8,10,11,12,14,15,18,19, and 20 are not affected by the addition of this variable. While DMUs 9, 13, 16, and 17 performances are extremely affected by this variable, they become efficient in all cases. DMU 5 has a special case where it's inefficient in all cases except for the stochastic VRS input-oriented model.  It is worth mentioning that the MATLAB function used obtained the same results as the well-known program DEAP [19] with the absence of a stochastic variable, as this last is not implemented in DEAP.

Conclusion and future work
This paper provides a constructive algorithm to solve various types of DEA models. A unified second-order conic optimization problem is obtained representing all the former models. In addition, an associated MATLAB code is given for further usage and implementation. It is to be noted that this code can be easily transformed into different programming languages. Among the DEA models, input/output-oriented, variable/constant return to scale, and normal/Malmquist models are considered together with the fact that stochastic variables (inputs or outputs) can be integrated.
For future work, it will be interesting to allow the stochastic variables under consideration to follow different probability distributions and not be restricted to the normal one [31]. Furthermore, reflecting fuzzy DEA models [32] is an enhancement of the provided algorithm. Moreover, including a function that systematically runs regression analysis of the obtained efficiency scores perhaps is an added value to DEA.

Declaration of competing interest
The authors declare that they have no known financial or non-financial competing interests in any material discussed in this paper.

Funding information
No funding was received from any financial organization to conduct this research.