Cluster analysis: Difference between revisions

From CEOpedia | Management online
Networks of economic entities, such as industrial enterprises and common production cycles, and organizations that provide support services (banks, consulting and marketing firms, research and educational institutions, insurance companies) form clusters, a '''complex economic system'''.
'''[[Cluster]] analysis''' is an exploratory procedure that divides a data set into groups according to the similarity of its members. Various criteria and characteristics can be used to determine how similar the individual data points are. Cluster analysis is based on the calculation of a similarity measure and belongs to the unsupervised machine learning methods<ref>Everitt, Landau, Leese, Stahl, 2011, pp. 2-8.</ref>.
Over the past decade, cluster policy has become one of the most important focal points of national policy in developed and developing countries to enhance national and regional competitiveness. This idea is spreading in the form of clearly defined policies and other policy initiatives such as regional strategies and activities supporting local production systems.


Most of today's industrialized economies need the institutional support of firms to become more competitive. Afanasiev, Korchagina and Myasnikova (2006) argue that enterprise consolidation and clustering are currently among the most effective means of increasing production efficiency.
==Prerequisites of the cluster analysis==
The global economy has been impacted by trends in the expansion of the role of the cluster approach. Innovative approaches to the creation of integrated management forms are required for the modern expansion of economic space across the globe, taking into account factors such as:
A cluster should be maximally homogeneous within itself and clearly distinguishable from other clusters. A clear demarcation must be ensured. Therefore, the following conditions should be met<ref>Aggarwal, Reddy, 2014, pp. 577-583.</ref><ref>Aggarwal, Reddy, 2014, p. 124.</ref>:
* factors both internal and external to regionalization, including enhancement of regional and national competitiveness
* '''Size of the data set:''' Under certain circumstances, a meaningful result can only be achieved with a sufficiently large data set. Depending on the task, it is therefore necessary to weigh up whether the amount of data is sufficient.  
* an increase in regional investments and innovation.
* '''Normalization of the data:''' if there are large differences in the value range of the data, the data should be normalized beforehand.
* development of long-term forms of economic and territorial integration.
* '''Elimination of outliers:''' Outliers can strongly distort the results. The data should therefore first be checked for extreme values, and any outliers should then be removed.
* enhancement of regional and national competitiveness.  
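* '''Bias:''' If the variables are strongly correlated with one another, the results can be heavily biased, because the correlated characteristics are effectively counted more than once. Strong correlations should therefore be identified and dealt with beforehand.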
* a rise in globalization processes.


Therefore, the cluster principle, which is based on the proactive promotion of propellant industries, becomes more relevant in terms of creating clusters of "growth poles" in the regional economy and increasing the effectiveness of the public policy.
==Procedure of a cluster analysis==
The first step is to determine the characteristics, and the corresponding similarity measure, by which the data will be compared. Next, an [[algorithm]] is selected to analyze the data and thus lay the foundation for forming the clusters. In the third step, the number of clusters is determined and the clusters themselves are formed, with each data point assigned on the basis of the segmentation criteria. When evaluating the grouping, not only the number of groups should be considered but also whether the identified clusters are of similar size<ref>Tian, Xu, 2015, p. 166.</ref>.


==Analysis of Dutch Flower Clusters using Porter's Five Forces Model==
==Cluster analysis methods==
Porter’s five forces model is used to evaluate the competitiveness and strength of businesses. In this paper, the model is used to determine the sector's attractiveness and to understand the competitive position of the Dutch Flower Cluster in the market:
There are numerous algorithms for dividing data into clusters. Which [[method]] is most suitable generally depends on the question being asked. Often, the results of different methods are compared at the end to determine the most suitable one. The best-known methods are<ref>Aggarwal, Reddy, 2014, pp. 89-105.</ref>:
* the '''Threat of New Entrants''' reflects how new market players pose a threat to existing ones. For the Dutch Flower Cluster, several factors reduce this threat: the initial investment and capital needed to start a business, the presence of economies of scale, the strict regulatory framework and restricted distribution channels.
* '''K-Means:''' The k-means method is an iterative algorithm. In each iteration, the cluster centers are recomputed, and the similarity of a data point to a cluster center is measured by the Euclidean distance. A data point is assigned to the cluster whose center is closest to it. The algorithm is quite simple, but the number of clusters must be specified in advance. Another major drawback is that it is very sensitive to outliers.
* the '''Threat of Substitute Products''': the availability of substitute products makes the competitive environment challenging for the Dutch Flower Cluster, since customers can turn to alternative products to satisfy their needs.
* '''Hierarchical Cluster Analysis:''' This machine learning method is based on distance measures. A distinction is made between divisive and agglomerative methods. Divisive methods work top-down: initially all objects of the data set belong to a single cluster, which is then split step by step into more and more clusters. Agglomerative methods follow the opposite, bottom-up approach: each object first forms its own cluster, and clusters are merged step by step until all objects belong to one cluster. Once a merge or split has been made, it can no longer be changed, and the point at which the hierarchy is cut into a final partition is left to the user. Together with the high computational cost, this is the biggest disadvantage of these methods. On the other hand, the number of clusters does not need to be known beforehand.
* the '''Rivalry Among Existing Firms''' reflects the number of competitors present in the Dutch Flower Cluster. Firms in the cluster can face strong pressure from rivals, which can limit each other’s growth.
* the '''Bargaining Power of Buyers''' indicates the pressure that customers put on businesses in order to get high-quality products at affordable prices. Strong bargaining power lowers profitability and makes the industry more competitive for The Dutch Flower Cluster.


Applying Porter's Five Forces Model helps companies make sound strategic decisions. The model can serve as a starting point for analysing the Dutch Flower Cluster's potential and attractiveness, and it can be combined with other frameworks, such as PESTEL and Value Chain analysis, for a better understanding of the external environment.
==Applications of the cluster analysis==
Cluster analysis has become a common means of grouping data in a wide variety of fields<ref>Everitt, Landau, Leese, Stahl, 2011, pp. 9-13.</ref>:
* '''[[Marketing]]:''' Analyzing customers and sorting them into the right target groups can be an enormous [[competitive advantage]] [[in marketing]]. Cluster analyses are used here to identify similar customers from the entire [[customer]] base and to develop individual advertising strategies for these customers.
* '''Medicine and psychology:''' Behavioral patterns or disease patterns can also be grouped into clusters. Suitable therapies can then be developed on this basis.  


==Covid-19 consequences and the internationalisation of the Dutch Flower Cluster==
==Footnotes==
The Covid-19 pandemic struck the Dutch flower cluster at the worst possible time, with some producers losing up to 85% of their 2020 turnover. The Netherlands is responsible for 44% of the world trade in floriculture products and 77% of the flower bulbs sold worldwide. The virus hit the tulip market in the middle of the main sales season, which runs from March to May and includes occasions such as Women’s Day, Mother’s Day and Easter; this period generates 7 billion euros, with an average of $30 million in flowers sold daily. At the beginning of the pandemic, in March 2020, tulip stems were offered at Aalsmeer, the largest flower market, at a price of 0 euros. Growers had to destroy hundreds of millions of tulips and other blossoms, and losses ranged from 10% to 85% of turnover depending on the producer. Moreover, flowers are usually transported on passenger planes, so the early lockdown measures of 2020, which cancelled all international flights, halted the primary means of transportation for the horticulture and floriculture industries. Demand for space on cargo planes rose, and freight prices went from $1.85 per kg to $4 per kg. This discouraged farmers from exporting, and those who continued faced losses because they could not pass the higher costs on to consumers, who were unwilling to pay the higher prices.
<references />


The Dutch tulip cluster has historically been a partner of the Kenyan flower cluster. In 1980 the Kenyan government, the Dutch Ministry of Development Aid and a group of Dutch growers funded a study on Kenya’s potential for a flower cluster, as the country could supply the Dutch market with high-quality flowers all year round. The Netherlands continued to invest in Kenya through flows of FDI; in 2008, 70% of Kenyan cut flower production was owned by Dutch growers. Kenyan exports are mainly destined for the Dutch market, and in 2009, during the economic crisis, FloraHolland helped producers in Kenya to develop alternative markets in Russia and Japan. Because of this strong link between the Dutch and Kenyan flower clusters, the African country was weakened as soon as Covid-19 hit the Netherlands. As the Dutch auction was operating at lower capacity and demand sank, Kenyan growers were left in a difficult position.
{{infobox5|list1={{i5link|a=[[Descriptive statistics]]}} &mdash; {{i5link|a=[[Mann-Whitney U test]]}} &mdash; {{i5link|a=[[Control limits]]}} &mdash; {{i5link|a=[[Systematic sampling techniques]]}} &mdash; {{i5link|a=[[Parametric analysis]]}} &mdash; {{i5link|a=[[Two-way ANOVA]]}} &mdash; {{i5link|a=[[Decision tree]]}} &mdash; {{i5link|a=[[CUSUM chart]]}} &mdash; {{i5link|a=[[Multiple regression analysis]]}} }}
The leadership of the Dutch cluster was also established through the connections it made with foreign clusters (in Colombia, Ecuador, Kenya, China, Israel, Ethiopia, Japan, Brazil, and Canada): the Dutch provided them with supplies, services and knowledge to help them advance and expand. The strategic direction of the Dutch cluster consisted of moving parts of production and its development to growing foreign clusters, integrating these “satellite clusters” with the Dutch one within the framework of the Royal FloraHolland auction system. The factor that most helped the internationalisation of flower production was transportation: the ability to ship flowers by aeroplane allowed production to spread to many locations. The Dutch cluster connects flower clusters around the world, either through its aid or through its auction system. Yet even though all the foreign clusters are connected to the Dutch one, they still compete with one another, each with its own advantages. Colombia, for example, has a climate suited to year-round production without the need to pay for gas to heat greenhouses, but it also has higher transportation costs, whereas the Netherlands is well connected to the places where most flowers are bought, such as Europe and the US.


==Conclusion==
Cluster development is increasingly shaping the economic geography of today's cities. It aims to improve the urban spatial structure and contribute to economic growth. As shown in this paper, one of the most effective mechanisms for improving regional competitiveness today is bundling socio-economic spatial domains.
In recent years, mainly because of the Covid-19 pandemic, the Dutch flower cluster has faced a large number of challenges, both internal and external. Despite this, the cluster has managed to remain the main player in its industry. Its firms are already working to overcome these problems through new policies and approaches: the introduction of new products and services to meet new demand, the improvement of logistical skills, cooperation with international partners and the development of sub-clusters in places where production costs are lower.
==References==
* Ahmed, J. U., Linda, I. J., & Majid, M. A. Royal FloraHolland: Strategic Supply Chain of Cut Flowers Business.
* Aggarwal, C. C., Reddy, C. K. (2014). [https://people.cs.vt.edu/~reddy/papers/DCBOOK.pdf ''Data Clustering. Algorithms and Applications''], "Chapman & Hall".
* Brinegar, A., Peña, J. (2013, 02 01). Academia.edu. Retrieved 2022, from Academia.edu: https://www.academia.edu/37072594/Topic_One_Case_Study_The_Dutch_Flower_Cluster
* Everitt, B. S., Landau, S., Leese, M., Stahl, D. (2011). [https://epdf.tips/cluster-analysis-fifth-edition-wiley-series-in-probability-and-statistics.html ''Cluster Analysis, 5th Edition''], "Wiley Series in Probability and Statistics".
* Frankowska, M. (2013). THE CONCEPT OF CLUSTER SUPPLY CHAINS AS THE DIRECTION FOR THE DEVELOPMENT OF EUROPEAN COOPERATIVE NETWORKS.
* Tian, Y., Xu, D. (2015). [https://link.springer.com/content/pdf/10.1007/s40745-015-0040-1.pdf ''A Comprehensive Survey of Clustering Algorithms''], "Annals of Data Science", 2(2), pp. 165-193.
* Kopijn, L. (2022). Made for travellers. Retrieved from Why is the Netherlands Famous for Tulips? https://madefortravellers.com/netherlands-famous-for-tulips/
[[Category: Methods and techniques]]
* Lawaspect. (2020). Lawaspect.com. Retrieved from https://lawaspect.com/dutch-flower-cluster-summary/
{{a|Max Bachmann}}
* Michael E. Porter, J. R.-V. (n.d.). Emba Pro. Retrieved from EMBA Pro Porter Five Forces Solution for The Dutch Flower Cluster case study: https://embapro.com/frontpage/porter5forcesanalysis/17646-flower-cluster
* Porter, M. E., Ramirez-Vallejo, J., & Van Eenennaam, F. R. E. D. (2011). The Dutch flower clusters. Harvard Business School Strategy Unit Case, (711-507).
* Tavoletti, E., & te Velde, R. (2008). Cutting Porter’s last diamond: Competitive and comparative (dis) advantages in the Dutch flower cluster. Transition Studies Review, 15(2), 303-319.
* Vertakova, Y., & Risin, I. (2015). Clustering of socio-economic space: theoretical approaches and Russian experience. Procedia Economics and Finance, 27, 538-547.
* Yang, Z., Hao, P., & Cai, J. (2015). Economic clusters: A bridge between economic and spatial policies in the case of Beijing. Cities, 42, 171-185.
* Zander, H. (2018, 08 22). case48.com. Retrieved from The Dutch Flower Cluster Porter Five Forces Analysis: https://www.case48.com/porter-case/17646-The-Dutch-Flower-Cluster
 
{{a|Francesca Scattolin}}
[[Category:Economics]]

Latest revision as of 18:23, 17 November 2023

Cluster analysis is an exploratory procedure that divides a data set into groups according to the similarity of its members. Various criteria and characteristics can be used to determine how similar the individual data points are. Cluster analysis is based on the calculation of a similarity measure and belongs to the unsupervised machine learning methods[1].
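
As an illustration of what "calculating a similarity measure" can look like in practice, the following minimal Python sketch builds a table of pairwise Euclidean distances, where a smaller distance means greater similarity. The customer records (age, yearly spending) and the helper name distance_matrix are hypothetical and chosen only for this example; Euclidean distance is one common choice of measure, not the only one.

```python
import math


def distance_matrix(rows):
    """Return the pairwise Euclidean distances between all rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical records: (age in years, yearly spending).
customers = [(25, 1200.0), (27, 1100.0), (61, 300.0)]
matrix = distance_matrix(customers)

# The record most similar to the first one is the one at the smallest distance.
nearest_to_first = min(range(1, len(customers)), key=lambda j: matrix[0][j])
print(nearest_to_first)
```

In this toy data the spending column has a much larger value range than the age column and therefore dominates the distance.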

Prerequisites of the cluster analysis

A cluster should be maximally homogeneous within itself and clearly distinguishable from other clusters. A clear demarcation must be ensured. Therefore, the following conditions should be met[2][3]:

  • Size of the data set: Under certain circumstances, a meaningful result can only be achieved with a sufficiently large data set. Depending on the task, it is therefore necessary to weigh up whether the amount of data is sufficient.
  • Normalization of the data: If there are large differences in the value range of the data, the data should be normalized beforehand.
  • Elimination of outliers: Outliers can strongly distort the results. The data should therefore first be checked for extreme values, and any outliers should then be removed.
  • Bias: If the variables are strongly correlated with one another, the results can be heavily biased, because the correlated characteristics are effectively counted more than once. Strong correlations should therefore be identified and dealt with beforehand.
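
Two of the conditions above, outlier elimination and normalization, can be sketched in a few lines of Python. This is only one possible approach: the interquartile-range rule with a factor of 1.5 and the z-score rescaling are assumptions made for the example and are not prescribed by the sources cited here, and the income figures and function names are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def drop_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def z_score(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical, nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical feature with one extreme value.
incomes = [30, 32, 35, 31, 33, 34, 500]
cleaned = drop_outliers_iqr(incomes)  # removes the extreme value first
scaled = z_score(cleaned)             # then brings the feature to a common scale
print(cleaned)
```

Removing outliers before rescaling keeps a single extreme value from inflating the standard deviation used for the normalization.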

Procedure of a cluster analysis

The first step is to determine the characteristics, and the corresponding similarity measure, by which the data will be compared. Next, an algorithm is selected to analyze the data and thus lay the foundation for forming the clusters. In the third step, the number of clusters is determined and the clusters themselves are formed, with each data point assigned on the basis of the segmentation criteria. When evaluating the grouping, not only the number of groups should be considered but also whether the identified clusters are of similar size[4].
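
The three steps can be sketched as follows, assuming Euclidean distance as the similarity measure and k-means as the selected algorithm. The data points, the deterministic choice of the first k points as starting centers, and the function names kmeans and wcss are illustrative assumptions; the within-cluster sum of squares is used here as one possible criterion for comparing different numbers of clusters.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means with a deterministic start (first k points as centers)."""
    centers = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


def wcss(points, labels, centers):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical two-dimensional data with two visible groups.
customers = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.9, 8.1)]
scores = {}
for k in (1, 2, 3):
    labels, centers = kmeans(customers, k)
    scores[k] = wcss(customers, labels, centers)
    sizes = [labels.count(c) for c in range(k)]
    print(k, round(scores[k], 3), sizes)
```

A sharp drop in the score from one value of k to the next, followed by only small improvements, suggests that the earlier jump already captured the main group structure; the printed cluster sizes help to check whether the groups are of similar size.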

Cluster analysis methods

There are numerous algorithms for dividing data into clusters. Which method is most suitable generally depends on the question being asked. Often, the results of different methods are compared at the end to determine the most suitable one. The best-known methods are[5]:

  • K-Means: The k-means method is an iterative algorithm. In each iteration, the cluster centers are recomputed, and the similarity of a data point to a cluster center is measured by the Euclidean distance. A data point is assigned to the cluster whose center is closest to it. The algorithm is quite simple, but the number of clusters must be specified in advance. Another major drawback is that it is very sensitive to outliers.
  • Hierarchical Cluster Analysis: This machine learning method is based on distance measures. A distinction is made between divisive and agglomerative methods. Divisive methods work top-down: initially all objects of the data set belong to a single cluster, which is then split step by step into more and more clusters. Agglomerative methods follow the opposite, bottom-up approach: each object first forms its own cluster, and clusters are merged step by step until all objects belong to one cluster. Once a merge or split has been made, it can no longer be changed, and the point at which the hierarchy is cut into a final partition is left to the user. Together with the high computational cost, this is the biggest disadvantage of these methods. On the other hand, the number of clusters does not need to be known beforehand.
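
The agglomerative (bottom-up) procedure described above can be sketched in Python as follows. The example assumes single linkage, meaning that the distance between two clusters is the smallest distance between any pair of their members; this is only one of several possible linkage rules. The data points and the function name single_linkage are hypothetical.

```python
import math


def single_linkage(points, target_clusters):
    """Agglomerative clustering: start with singletons, merge the closest pair."""
    clusters = [[i] for i in range(len(points))]  # each object is its own cluster
    merges = []  # record of (cluster_a, cluster_b, distance) for every merge
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance of the two closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # a merge is never undone later
        del clusters[b]
    return clusters, merges


# Hypothetical data: two tight pairs and one isolated point.
data = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three_groups, _ = single_linkage(data, 3)
_, full_history = single_linkage(data, 1)
print(three_groups)
print(len(full_history))
```

Running the procedure down to a single cluster yields the complete merge history; choosing where to stop, here via target_clusters, is the user's decision on how to cut the hierarchy. The nested search over all cluster pairs also illustrates why the computation becomes expensive for large data sets.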

Applications of the cluster analysis

Cluster analysis has become a common means of grouping data in a wide variety of fields[6]:

  • Marketing: Analyzing customers and sorting them into the right target groups can be an enormous competitive advantage in marketing. Cluster analyses are used here to identify similar customers from the entire customer base and to develop individual advertising strategies for these customers.
  • Medicine and psychology: Behavioral patterns or disease patterns can also be grouped into clusters. Suitable therapies can then be developed on this basis.

Footnotes

  1. Everitt, Landau, Leese, Stahl, 2011, pp. 2-8.
  2. Aggarwal, Reddy, 2014, pp. 577-583.
  3. Aggarwal, Reddy, 2014, p. 124.
  4. Tian, Xu, 2015, p. 166.
  5. Aggarwal, Reddy, 2014, pp. 89-105.
  6. Everitt, Landau, Leese, Stahl, 2011, pp. 9-13.


Cluster analysis: recommended articles
Descriptive statistics, Mann-Whitney U test, Control limits, Systematic sampling techniques, Parametric analysis, Two-way ANOVA, Decision tree, CUSUM chart, Multiple regression analysis

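References

  • Aggarwal, C. C., Reddy, C. K. (2014). Data Clustering. Algorithms and Applications, Chapman & Hall. https://people.cs.vt.edu/~reddy/papers/DCBOOK.pdf
  • Everitt, B. S., Landau, S., Leese, M., Stahl, D. (2011). Cluster Analysis, 5th Edition, Wiley Series in Probability and Statistics. https://epdf.tips/cluster-analysis-fifth-edition-wiley-series-in-probability-and-statistics.html
  • Tian, Y., Xu, D. (2015). A Comprehensive Survey of Clustering Algorithms, Annals of Data Science, 2(2), pp. 165-193. https://link.springer.com/content/pdf/10.1007/s40745-015-0040-1.pdf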

Author: Max Bachmann