I am currently doing research on the software supply chain. Take TensorFlow as a specific example: its supply chain has multiple layers. The core of the supply chain is TensorFlow itself, while the other layers consist of the projects that directly or indirectly import TensorFlow.
Indirect import means that if project 'A' imports TensorFlow and 'A' publishes a package 'PA', and project 'B' imports that package, then the relationship between TensorFlow and 'B' is called an indirect import, as shown in the figure.
We wrote a script and analyzed all the projects on GitHub to build this supply chain. We found that the number of layers in the supply chain is no more than five, for TensorFlow, PyTorch, and Keras alike. The following figure shows the supply chain of PyTorch.
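To make the layering concrete, here is a minimal sketch of how the layers can be computed once the import relationships are known. The dependency data below is entirely hypothetical (the real data would come from scanning GitHub projects); the idea is a breadth-first search outward from the core package, where layer k contains the projects whose shortest import path back to the core has length k.

```python
from collections import deque

# Hypothetical data: package -> projects that import it (direct dependents).
# 'B' imports a package published by 'A', so 'B' indirectly imports tensorflow.
dependents = {
    "tensorflow": ["A", "C"],
    "A": ["B"],
    "C": [],
    "B": [],
}

def supply_chain_layers(core, dependents):
    """BFS from the core; each project's layer is the length of its
    shortest import path back to the core (core itself is layer 0)."""
    layer = {core: 0}
    queue = deque([core])
    while queue:
        pkg = queue.popleft()
        for dep in dependents.get(pkg, []):
            if dep not in layer:  # first visit = shortest path = its layer
                layer[dep] = layer[pkg] + 1
                queue.append(dep)
    return layer

layers = supply_chain_layers("tensorflow", dependents)
# -> {'tensorflow': 0, 'A': 1, 'C': 1, 'B': 2}
```

Under this definition, the observation that the chains have at most five layers says that no project on GitHub is more than five import hops away from the core framework.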
Unfortunately, the above is only a superficial, structural view of the supply chain. I am more curious about its structure from the perspective of functionality: is there a difference in the application areas of projects at different layers? For instance, perhaps most projects in layer 1 are about image classification and text classification, whereas most projects in layer 2 are tools (I admit this example may be inaccurate, but it reflects what I want to know). Eventually, I want to build a landscape of the machine learning supply chain, like the following Hadoop example.
Do you have a general idea of what kinds of projects sit on each layer?