### 1. Introduction

CFBSS handles both large and small sample sizes, which makes it useful for individual investors. Because it was developed on the basis of SS theory, it inherits the ability of SS to deal with small sample sizes.

CFBSS adapts well to the business environment. It combines qualitative and quantitative analysis for BFP, and it has many advantages over statistical approaches, fuzzy set theory, and similar methods in dealing with qualitative, quantitative, and uncertain information.

CFBSS does not suffer from the usual trade-off in which over-fitting occurs and generalization power degrades as the number of components increases, because SS is a parameterized family of subsets of the set *U*.

For CFBSS, the weight coefficient is determined by a novel weighting scheme, the ROCB method, which is based on the receiver operating characteristic (ROC) curve [29]. It adaptively puts more weight on the component that forecasts more accurately.

CFBSS is easy to implement.

### 2. Literature Review

### 2.1 Brief Review of the Soft Set Theory

A pair (*F*, *E*) is called a soft set over *U* if and only if *F* is a mapping of *E* into the set of all subsets of the set *U*. In other words, the soft set is a parameterized family of subsets of the set *U*. Every set *F*(ɛ) (ɛ ∈ *E*) from this family may be considered as the set of ɛ-elements of the soft set (*F*, *E*), or as the set of ɛ-approximate elements of the soft set.
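The definition can be made concrete with a small Python sketch. The firms `h1`…`h4`, the parameters `e1`…`e3`, and the helper names `is_soft_set` and `tabular` are hypothetical and chosen only for illustration; the sketch stores *F* as a dictionary from each parameter to a subset of *U*.

```python
from typing import Dict, FrozenSet, List

# Hypothetical example data: a universe U of four firms and a mapping F
# from three parameters (the set E) to subsets of U.
U: FrozenSet[str] = frozenset({"h1", "h2", "h3", "h4"})
F: Dict[str, FrozenSet[str]] = {
    "e1": frozenset({"h1", "h3"}),
    "e2": frozenset({"h1", "h2", "h3"}),
    "e3": frozenset({"h4"}),
}


def is_soft_set(mapping: Dict[str, FrozenSet[str]], universe: FrozenSet[str]) -> bool:
    """Return True if every F(e) is a subset of the universe U."""
    return all(subset <= universe for subset in mapping.values())


def tabular(mapping: Dict[str, FrozenSet[str]], universe: FrozenSet[str]) -> Dict[str, List[int]]:
    """Build a 0/1 table: one row per element of U, one column per parameter."""
    params = sorted(mapping)
    return {
        h: [1 if h in mapping[e] else 0 for e in params]
        for h in sorted(universe)
    }
```

Calling `tabular(F, U)` gives one row per firm, with a 1 wherever the firm belongs to *F*(ɛ) for the corresponding parameter.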

### 2.2 Brief Review of Combined Forecasting Models for BFP

### 3. The SS-Based Combined Forecasting Method for BFP

### 3.1 Combined Forecasting Model

Suppose there are *N* (*n* = 1, 2, …, *N*) sample firms and *I* (*i* = 1, 2, …, *I*) forecasting methods, where $Y_{N}$ is the vector that represents the state of each sample, $Y_{NI}$ is the matrix that represents the forecasting results of the *I* methods on the *N* samples, and $Y_{NS}$ is the vector that is the combined result of $Y_{NI}$ based on SS. $Y_{N}$, $Y_{NI}$, and $Y_{NS}$ are shown in Formula (1):

$\lambda_{i}$ is the weight coefficient of the *i*th individual forecasting method, and:

$y_{n} = 0$ means that the *n*th sample firm is in a state of failure in practice, and $y_{n} = 1$ means that the *n*th sample firm is in a normal state in practice.

$y_{ni} = 0$ means that the *i*th individual forecasting method believes that the *n*th sample firm will be in a state of failure, and $y_{ni} = 1$ means that the *i*th individual forecasting method believes that the *n*th sample firm will be in a normal state.

$y_{ns} = 0$ means that the CFBSS result shows that the *n*th sample firm will be in a state of failure, and $y_{ns} = 1$ means that the CFBSS result shows that the *n*th sample firm will be in a normal state.

As mentioned above, the combination method is based on the soft set because of its advantages. Thus, we focus on the other two key points: the selection of the individual forecasting methods and the determination of the weight coefficient $\lambda_{i}$.

### 3.2 Individual Forecasting Method Selection

SS is a parameterized family of subsets of the set *U*. Increasing the number of elements in *E* (here, *E* is the set of individual forecasting methods) makes SS perform better [27]. To limit the complexity of CFBSS, however, we use three components, which seems to strike a good balance between forecasting performance and model complexity.

### 3.3 Weight Determination Based on the ROC Curve

We use *ACC* to measure the forecasting accuracy of a forecasting method, as shown in Formula (4):

*W* is the matrix whose entry $w_{ij}$ (*i* = 1, 2, …, *I*; *j* = 1, 2, …, *J* ≤ *N*) represents the *ACC* of the *i*th forecasting method on the *j*th group of the testing sample, where *J* is the number of groups of training samples.

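From *W*, we obtain the vector $W' = [w_{1}, w_{2}, \ldots, w_{I}]$, in which $w_{i}$ (*i* = 1, 2, …, *I*) is the mean of the entries $w_{ij}$ of the *i*th forecasting method over the *J* groups. It is a more accurate measure of the forecasting ability of each individual forecasting method. That is:

$$w_{i} = \frac{1}{J}\sum_{j=1}^{J} w_{ij}, \quad i = 1, 2, \ldots, I$$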

The weight coefficient $\lambda_{i}$ of each individual forecasting method is then determined as follows:

### 3.4 Algorithm

*Step 1.* **Preprocess financial data**

Here, $x_{nm}$ is the original value of the *m*th variable for the *n*th firm, and $\max_{m}$ and $\min_{m}$ are the maximal and minimal values of the *m*th variable over all firms, respectively.
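The symbols above (an original value together with the per-variable maximum and minimum) are those of a min-max scaling. The following Python sketch assumes that this is the intended preprocessing; the function name `min_max_normalize` and the handling of constant variables (mapped to 0.0) are illustrative choices, not taken from the paper.

```python
from typing import List


def min_max_normalize(X: List[List[float]]) -> List[List[float]]:
    """Scale each variable (column) of X to [0, 1].

    X[n][m] is the original value of the m-th variable for the n-th firm.
    Assumption: a variable with max == min is mapped to 0.0 for every firm.
    """
    if not X:
        return []
    columns = list(zip(*X))              # one tuple per variable
    mins = [min(col) for col in columns]  # min_m over all firms
    maxs = [max(col) for col in columns]  # max_m over all firms
    scaled = []
    for row in X:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return scaled
```

Scaling every variable to the same range keeps financial ratios with large magnitudes from dominating the LR and SVM components.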

*Step 2.* **Obtain the forecasting results of ES, LR, and SVM**

The forecasting results of ES, LR, and SVM form $Y_{NI}$, as shown in Formula (1).

$x_{ES}$ represents the 'advice of the financial analysis organization' for ES forecasting in this paper.

Suppose there are *T* (*t* = 1, 2, …, *T*) experts, where *T* is an odd number. For the samples, we obtain the matrix $Y_{NT}$, which reflects the forecasting results of the *T* ESs, as shown in Formula (9):

We then obtain the forecasting result of the *n*th firm from ES, as shown in Formula (10):

*T*.

*y* = 1 was greater than 0.5.
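Requiring an odd number of experts suggests that the expert opinions are aggregated by a simple majority vote, since an odd *T* rules out ties. The Python sketch below assumes this reading; the function names `majority_vote` and `expert_system_forecast` are hypothetical, and each expert opinion is coded as 0 (failure) or 1 (normal).

```python
from typing import List


def majority_vote(votes: List[int]) -> int:
    """Return 1 if more than half of the 0/1 votes are 1, otherwise 0.

    Assumption: the number of experts T is odd, so ties cannot occur.
    """
    T = len(votes)
    if T % 2 == 0:
        raise ValueError("the number of experts T must be odd")
    return 1 if sum(votes) > T / 2 else 0


def expert_system_forecast(Y_NT: List[List[int]]) -> List[int]:
    """Aggregate an N x T matrix of expert opinions into one ES forecast per firm."""
    return [majority_vote(row) for row in Y_NT]
```

With *T* = 5, a firm is forecast as normal when at least three of the five experts forecast a normal state.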

*Step 3.* **Compute the *ACC* of each individual forecasting method**

With Formula (4), we can compute the *ACC* of each individual forecasting method. With Formulas (5) and (6), the mean *ACC* of each individual forecasting method can be computed.
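This step can be sketched in Python as follows. The sketch assumes that *ACC* is the share of correctly classified firms, and it uses hypothetical function names; `W[i][j]` holds the *ACC* of the *i*th method on the *j*th group, and the mean is taken over the groups for each method.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of firms whose forecast state equals the actual state (assumed ACC)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for actual, forecast in zip(y_true, y_pred) if actual == forecast)
    return correct / len(y_true)


def accuracy_matrix(groups: List[List[int]], predictions: List[List[List[int]]]) -> List[List[float]]:
    """Build W, where W[i][j] is the ACC of method i on group j.

    groups[j] holds the actual states of group j;
    predictions[i][j] holds the forecasts of method i for group j.
    """
    return [
        [accuracy(groups[j], method_preds[j]) for j in range(len(groups))]
        for method_preds in predictions
    ]


def mean_accuracy(W: List[List[float]]) -> List[float]:
    """Mean ACC of each method over all groups."""
    return [sum(row) / len(row) for row in W]
```

The resulting vector of mean accuracies has one entry per forecasting method and is the input to the weight computation in Step 4.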

*Step 4.* **Compute the weight coefficient $\lambda_{i}$**

*Step 5.* **Combine forecasting results with the soft set theory**

Let *U* be the set of firms of interest, $U = \{h_{1}, h_{2}, \ldots, h_{n}\}$, and let *E* be the set of forecasting methods, $E = \{e_{1}, e_{2}, \ldots, e_{i}\}$. Let *F* be a mapping of *E* into the set of all subsets of the set *U*. According to the definition of the soft set [27], an SS (*F*, *E*) can be constructed. We can then obtain the tabular representation of (*F*, *E*), as shown in Table 1.

With the weight coefficient $\lambda_{i}$, we can get the weighted tabular representation of (*F*, *E*), as shown in Table 2.

The mapping *F* of the soft set (*F*, *E*) can be presented as Formula (11).

Here, $y_{sn}$ is the combined forecasting value of the *n*th firm. If $y_{sn} \geq 0.5$, the *n*th firm is forecast to be a success. If $y_{sn} < 0.5$, the *n*th firm is forecast to fail in the future.
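The weighted table and the 0.5 threshold can be illustrated with a short Python sketch. Two assumptions are made here and are not taken from the paper: the weights are obtained by normalizing the mean *ACC* values so that they sum to one, and the combined value of a firm is the weighted sum of its 0/1 row in the tabular representation. The function names are hypothetical.

```python
from typing import List, Tuple


def normalize_weights(mean_acc: List[float]) -> List[float]:
    """Assumed weighting: each method's mean ACC divided by the sum of all mean ACCs."""
    total = sum(mean_acc)
    if total <= 0:
        raise ValueError("mean accuracies must sum to a positive value")
    return [w / total for w in mean_acc]


def combine(Y_NI: List[List[int]], weights: List[float],
            threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Weighted combination of individual 0/1 forecasts.

    Y_NI[n][i] is the forecast of method i for firm n (0 = failure, 1 = normal).
    Returns the combined value of each firm and the thresholded 0/1 forecast.
    """
    scores = [sum(lam * y for lam, y in zip(weights, row)) for row in Y_NI]
    labels = [1 if score >= threshold else 0 for score in scores]
    return scores, labels


# Hypothetical example: three methods (ES, LR, SVM) and three firms.
example_weights = normalize_weights([0.82, 0.75, 0.78])
example_scores, example_labels = combine(
    [[1, 1, 0], [0, 0, 1], [1, 0, 0]], example_weights
)
```

Because the weights sum to one, the combined value lies in [0, 1], so it can be compared directly with the 0.5 threshold.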

### 4. Empirical Research

### 4.1 Samples and Data

We selected five (*T* = 5) financial analysis organizations as our Expert System. The five companies were randomly selected from organizations with similar backgrounds and forecasting abilities: CITIC Securities, SHENYIN & WANGUO Securities, HAITONG Securities, CMS, and INDUSTRIAL Securities. Their research reports can be downloaded either from their websites or from the website of Invest Today.

### 4.2 Features Selection

$x_{ES}$ is the only feature for qualitative analysis. Thus, in the following, we focus on feature selection for quantitative analysis.

### 4.3 Experiment Design

Forecasting business failure in year *t* using the data from the year (*t* – 2) or (*t* – 3) is more difficult than using the data from the year (*t* – 1). In this paper, we tackled this challenge.

*Step 1.* Collect data and classify NST and ST firms. We randomly split the data into three groups for this experiment: a holdout data set, which is discarded, a training data set, and a testing data set.

*Step 2.* With the training data set and the selected features, we separately obtained forecasting results using ES, LR, and SVM. The forecasting results were then combined using CFBEWs, CFBNNs, CFBRSDS, and CFBSS.

*Step 3.* The testing data set was used to evaluate each forecasting model, and we obtained the ACC.

*Step 4.* We then compared the prediction performance.

### 5. Results and Discussion

### 5.1 Results

The forecasting results for the years (*t* – 2) and (*t* – 3) for the percentages (20%, 80%), (50%, 50%), and (80%, 20%) are shown in Tables 6–11. The accuracy of each forecasting method for BFP was computed with Formula (4).

### 5.2 Discussion

#### 5.2.1 Forecasting accuracy discussion

##### Analysis of forecasting accuracy for the year (*t* – 2)

For the year (*t* – 2), we can easily see that CFBSS has a higher mean and median forecasting accuracy than the other forecasting methods. As the percentages change from (20%, 80%) to (80%, 20%), the accuracy does not change much and stays around 85%, because SS performs well for BFP with different sample sizes. Moreover, ES outperforms the rest of the methods, with an accuracy of around 82%. This is understandable, since ESs are professional practitioners and may have access to more information.

For the year (*t* – 2), the sample size has a great effect on the forecasting accuracy of each method for BFP. However, CFBSS remains an effective tool for BFP with different sample sizes. Regardless of the sample size, combined forecasting methods have higher forecasting accuracy than individual forecasting methods. This is because they use more information for BFP than individual forecasting methods and have been developed on the basis of individual forecasting methods.

##### Analysis of forecasting accuracy for the year (*t* – 3)

For the year (*t* – 3), we can easily see that the conclusion is similar to the result for the year (*t* – 2). Thus, we focus on analyzing the difference between them.

Forecasting with data sets for the year (*t* – 2) has higher accuracy than forecasting with data sets for the year (*t* – 3). This is the same result that was pointed out in [39]. The reason is that the more time passes between the data and the forecast, the more unpredicted incidents may happen, and these incidents may affect firms' operations. Thus, current information is not very effective for BFP far into the future.

#### 5.2.2 Forecasting stability discussion

##### Analysis of forecasting stability for the year (*t* – 2)

For the year (*t* – 2), we can easily see that ES has the smallest forecasting variance and coefficient of variation among all of the forecasting methods employed above. This means that ES has better forecasting stability than the other forecasting methods. This is because ESs are professional practitioners who pay more attention to forecasting stability.
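The stability comparison rests on the variance and the coefficient of variation of each method's accuracy across repeated runs. A minimal Python sketch of these statistics follows; it assumes the population variance and a hypothetical function name `stability_summary`, since the exact estimator is not stated here.

```python
import statistics
from typing import Dict, List


def stability_summary(accuracies: List[float]) -> Dict[str, float]:
    """Summarize the accuracies of one forecasting method over repeated runs.

    Assumption: population variance is used; the coefficient of variation
    is the standard deviation divided by the mean.
    """
    mean = statistics.mean(accuracies)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return {
        "mean": mean,
        "median": statistics.median(accuracies),
        "variance": statistics.pvariance(accuracies),
        "cv": statistics.pstdev(accuracies) / mean,
    }
```

A smaller variance and a smaller coefficient of variation both indicate that a method's accuracy fluctuates less from run to run.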

##### Analysis of forecasting stability for the year (*t* – 3)

For the year (*t* – 3), we can easily see that the conclusion is similar to the conclusion for the year (*t* – 2). Thus, we focus on analyzing their difference.

Forecasting with data sets for the year (*t* – 2) has better forecasting stability than forecasting with data sets for the year (*t* – 3). This is in keeping with the result that was pointed out in [25]. The reason is that the more time passes between the data and the forecast, the more unpredicted incidents may occur. These unpredicted incidents affect the operation of a firm and can be regarded as noise. Thus, current information is not very effective for BFP far into the future.

#### 5.2.3 Summary

### 6. Conclusion

The mapping *F* that is employed in SS needs to be further developed, because such mapping functions will provide deeper insights into the forecasting performance of BFP. In our current work, we used ES, LR, and SVM as the components and set the number of ESs to 5. Doing so seems to be effective, but it is based on heuristics. Systematic and theoretical developments on the selection of models are continuations of this work. Furthermore, CFBSS obtained good forecasting performance for the BFP of Chinese listed firms, but we do not know its forecasting performance on financial data sets from other countries. This deserves further exploration.