*This is a guest post written by Denis Rothman, writer of one of the very first word2matrix embedding solutions and the author of **Artificial Intelligence By Example**.*

We are 7.5 billion people breathing air on this planet. In 2050, there will be about 2.5 billion more. All of these people need to wear clothes. Just this activity involves classifying data into subsets for industrial purposes.

**Grouping** is a core concept for any field of production. Production requires grouping to optimize production costs. Imagine not grouping and delivering one t-shirt at a time from one continent to another instead of **grouping** t-shirts in a container and grouping many containers (not just two on a ship).

A brand of stores needs to replenish the stock of clothing in each store as the customers purchase their products. In this case, the corporation has 10,000 stores. The brand produces jeans, for example. Their average product is a faded jean. This product sells a slow 50 units a month per store. That adds up to **10,000 stores x 50 units = 500,000 units**, or **stock keeping units** (**SKUs**), per month. These units are sold in all sizes grouped into average, small, and large. The sizes sold per month are random.

The main factory for this product has about 2,500 employees producing those jeans at an output of about 25,000 jeans per day. The employees work in the following primary fields: cutting, assembling, washing, laser, packaging, and warehouse.
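
As a quick sanity check on these figures, the short sketch below recomputes the monthly volume and the number of production days it represents. It assumes, for illustration only, that the main factory alone covers the whole monthly demand; the variable names are not taken from the original program.

```python
# Figures quoted in the text
stores = 10_000
units_per_store_per_month = 50
daily_output = 25_000  # jeans per day at the main factory

monthly_units = stores * units_per_store_per_month

# Assumption: the main factory alone produces the whole monthly demand
production_days = monthly_units / daily_output

print(monthly_units)    # 500000
print(production_days)  # 20.0
```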

The first difficulty arises with the purchase and use of fabric. The fabric for this brand is not cheap. Large amounts are necessary. Each pattern (the form of pieces of the pants to be assembled) needs to be cut by wasting as little fabric as possible.

Imagine you have an empty box that you want to fill as completely as possible. If you only put soccer balls in it, a lot of empty space will remain. If you slip tennis balls into the gaps, the empty space will decrease. If, on top of that, you fill the remaining gaps with ping pong balls, you will have optimized the box.

In the apparel business, if 1% to 10% of fabric is wasted while manufacturing jeans, the company will survive the competition. At over 10%, there is a real problem to solve. Losing 20% on all the fabric consumed to manufacture jeans can bring the company down and force it into bankruptcy.

The same idea, optimizing space by combining larger and smaller objects, can be applied to cutting the patterns of the jeans. Once the pieces are cut, they are assembled at the sewing stations.
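
The box analogy can be made concrete with a deliberately simplified sketch. It treats the fabric as a one-dimensional strip and places pieces greedily, largest first; real cut planning arranges two-dimensional patterns, so this is only an illustration. The function name `fill`, the strip length, and the piece lengths are hypothetical values chosen for the example.

```python
def fill(length, piece_lengths):
    """Greedily place pieces along a strip, largest first.

    piece_lengths must contain positive numbers only.
    Returns (placed pieces, leftover length).
    """
    placed = []
    remaining = length
    for piece in sorted(piece_lengths, reverse=True):
        # Keep placing this piece size while it still fits
        while piece <= remaining:
            placed.append(piece)
            remaining -= piece
    return placed, remaining

strip = 100  # hypothetical strip length, arbitrary units

large_only = fill(strip, [45])    # only large pieces
mixed = fill(strip, [45, 10])     # large pieces, then small ones in the gap

print(large_only)  # ([45, 45], 10)
print(mixed)       # ([45, 45, 10], 0)
```

With large pieces only, 10 units of the strip are lost; adding a smaller piece uses the leftover length, which is the effect the text describes for mixing smaller and larger sizes.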

The problem can be summed up as:

- Creating subsets of the 500,000 SKUs to optimize the cutting process for the month to come in a given factory
- Making sure that each subset contains smaller sizes and larger sizes to minimize loss of fabric by choosing six sizes per day to build 25,000 unit subsets per day
- Generating cut plans of an average of three to six sizes per subset per day for a production of 25,000 units per day

In mathematical terms, this means trying to find subsets of sizes among 500,000 units for a given day.

The task is to find six well-matched sizes among 500,000 units. At this point, most people abandon the idea and just find some easy way out of this even if it means wasting fabric.

That is not the right way to look at it at all. The right way is to look exactly in the opposite direction. The key to this problem is to observe the particle at a microscopic level, at the **bits of information** level. This is a fundamental concept of machine learning and deep learning. Translated into our field, it means that to process an image, ML and DL process pixels. So, even if the pictures to analyze represent large quantities, it will come down to small units of information to analyze.

You do not need to analyze the individual position of each data point in a dataset; you can work with the probability distribution instead.

Today, Google, Facebook, Amazon, and others have yottabytes of data to classify and make sense of. Using the word **big** data does not mean much. It is just a lot of data, and so what?

To understand that, imagine going to a store to buy some jeans for a family. One of the parents wants a pair of jeans, and so does a teenager in that family. They both go and try to find their size in the pair of jeans they want. The parent finds 10 pairs of jeans in size **x**. All of the jeans are part of the production plan. The parent picks one at **random**, and the teenager does the same. Then they pay for them and take them home.

Some systems work fine with random choices: the random transportation (taking jeans from the store to home) of particles (jeans, other product units, pixels, or whatever is to be processed) that make up a fluid (a dataset).

Translated into the factory, this means that a stochastic (random) process can be introduced to solve the problem.

All that is required is that small and large sizes be picked at random among the 500,000 units to produce. If six sizes from 1 to 6 were to be picked per day, the sizes could be classified into two subsets as follows:

**Smaller sizes = S = {1,2,3}**

**Larger sizes = L = {4,5,6}**

Converting this into numerical subset names, **S=1** and **L=6**. By selecting large and small sizes to produce at the same time, the fabric will be optimized:

| Size of choice 1 | Size of choice 2 | Output |
|---|---|---|
| 6 | 6 | 0 |
| 1 | 1 | 0 |
| 1 | 6 | 1 |
| 6 | 1 | 1 |

Doesn’t this sound familiar? It looks exactly like the XOR truth table solved by the vintage feedforward neural network (FNN), with 1 instead of 0 and 6 instead of 1. All that has to be done is to stipulate that **subset S=value 0** and **subset L=value 1**, and the previous code can be generalized.
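
The mapping from sizes to subset values can be sketched in a few lines. This is a minimal illustration, not the author's program: the helper names `subset_bit` and `good_pair` are hypothetical, and Python's built-in XOR operator stands in for the FNN here.

```python
SMALL = {1, 2, 3}   # subset S
LARGE = {4, 5, 6}   # subset L

def subset_bit(size):
    """Map a size (1 to 6) to 0 for subset S or 1 for subset L."""
    if size in SMALL:
        return 0
    if size in LARGE:
        return 1
    raise ValueError(f"unknown size: {size}")

def good_pair(size_a, size_b):
    """Return 1 when one size is small and the other is large, else 0 (XOR)."""
    return subset_bit(size_a) ^ subset_bit(size_b)

# Reproduce the four rows of the table above
for a, b in [(6, 6), (1, 1), (1, 6), (6, 1)]:
    print(a, b, good_pair(a, b))
```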

If this works, then smaller and larger sizes will be chosen to send to the cut planning department, and the fabric will be optimized. Applying the randomness concept of Bellman’s equation, a stochastic process is applied, choosing customer unit orders at random (each order is one size and a unit quantity of 1):

    import random

    w1=0.5;w2=1;b1=1
    w3=w2;w4=w1;b2=b1
    s1=random.randint(1,500000)#choice in one set s1
    s2=random.randint(1,500000)#choice in one set s2

The weights and biases are now constants obtained from training the XOR FNN. The training is done; the FNN is simply used to provide results. Bear in mind that the word **learning** in machine learning and deep learning does not mean you have to train systems forever.

In stable environments, training is run only when the datasets change. At one point in a project, you are hopefully using deep **trained** systems and are not just stuck in the deep **learning** process. The goal is not to spend all corporate resources on learning but on using trained models.

For this prototype validation, the size of a given order is random. 0 means the order fits in the S subset; 1 means the order fits in the L subset. The data generation function reflects the random nature of consumer behavior in the following six-size jean consumption model.

    x1=random.randint(0, 1)#property of choice:size smaller=0
    x2=random.randint(0, 1)#property of choice :size bigger=1
    hidden_layer_y(x1,x2,w1,w2,w3,w4,b1,b2,result)

Once two customer orders have been chosen at random in the right size category, the FNN is activated and runs like the previous example. Only the result array has been changed because no training is required. Only a yes (1) or no (0) is expected, as shown in the following code:

    #II hidden layer 1 and its output
    def hidden_layer_y(x1,x2,w1,w2,w3,w4,b1,b2,result):
        h1=(x1*w1)+(x2*w4) #II.A.weight of hidden neuron h1
        h2=(x2*w3)+(x1*w2) #II.B.weight of hidden neuron h2
        #III.threshold I,a hidden layer 2 with bias
        if(h1>=1):h1=1
        if(h1<1):h1=0
        if(h2>=1):h2=1
        if(h2<1):h2=0
        h1= h1 * -b1
        h2= h2 * b2
        #IV. threshold II and OUTPUT y
        y=h1+h2
        if(y<1):
            result[0]=0
        if(y>=1):
            result[0]=1
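
To check that these constants behave like XOR, the function can be run over the four possible inputs. The block below repeats the function in condensed form so that it runs on its own; the verification loop and the `outputs` dictionary are additions for this check, not part of the original program.

```python
def hidden_layer_y(x1, x2, w1, w2, w3, w4, b1, b2, result):
    # Weighted sums of the two hidden neurons
    h1 = (x1 * w1) + (x2 * w4)
    h2 = (x2 * w3) + (x1 * w2)
    # First threshold
    h1 = 1 if h1 >= 1 else 0
    h2 = 1 if h2 >= 1 else 0
    # Bias, then second threshold on the output
    y = (h1 * -b1) + (h2 * b2)
    result[0] = 1 if y >= 1 else 0

# Constants given in the text
w1 = 0.5; w2 = 1; b1 = 1
w3 = w2; w4 = w1; b2 = b1

result = [0]
outputs = {}
for x1 in (0, 1):
    for x2 in (0, 1):
        hidden_layer_y(x1, x2, w1, w2, w3, w4, b1, b2, result)
        outputs[(x1, x2)] = result[0]
        print(x1, x2, result[0])
```

Only the mixed pairs, (0, 1) and (1, 0), produce 1, which is the small-large combination the cut plan needs.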

The number of subsets to produce needs to be calculated to determine the volume of positive results required.

Six sizes are chosen among 500,000 units. However, the request is to produce a daily production plan for the factory. The daily production target is 25,000. Also, each subset can be used about 20 times, because on average the same size of a given jean is ordered about 20 times.

Each subset result contains two orders, hence two units:

**R = 2 x 20 = 40**

Each result produced by the system represents a quantity of 40 for 2 sizes.

Six sizes are required to obtain good fabric optimization. This means that after three choices, the result represents one subset of potential optimized choices:

**R = 40 x 3 subsets of 2 sizes = 120**

The magic number has been found. For every 3 choices, the goal of producing 6 sizes multiplied by a repetition of 20 will be reached.

The production per day request is 25,000:

**The number of subsets requested = 25000/3 = 8333.333**

The system can run as long as necessary to produce the volume of subsets requested, here 8333. In this case, the range is set to 1000000 iterations because only the positive results are accepted. The system filters the correct subsets through the following loop:

    for element in range(1000000):
        # s1, s2, x1, x2 are drawn and hidden_layer_y is called at this point, as shown earlier
        if(result[0]>0):
            subsets+=1
            print("Subset:",subsets,"size subset #",x1," and ","size subset #",x2," result:",result[0],"order #"," and ",s1,"order #",s2)
        if(subsets>=8333):
            break

When the 8333 subsets have been found respecting the smaller-larger size distribution, the system stops:

    Subset: 8330 size subset # 1 and size subset # 0 result: 1 order # and 53154 order # 14310
    Subset: 8331 size subset # 1 and size subset # 0 result: 1 order # and 473411 order # 196256
    Subset: 8332 size subset # 1 and size subset # 0 result: 1 order # and 133112 order # 34827
    Subset: 8333 size subset # 0 and size subset # 1 result: 1 order # and 470291 order # 327392
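
Put together, the steps described so far might look like the following self-contained sketch. It is an assumption-laden reconstruction rather than the author's exact program: the names `xor_fnn` and `find_subsets`, the `seed` parameter, and the returned tuple layout are hypothetical, and the size category of each order is drawn at random as in the prototype.

```python
import random

def xor_fnn(x1, x2, w1=0.5, w2=1, b1=1):
    """Trained XOR network with the constants from the text; returns 1 or 0."""
    w3, w4, b2 = w2, w1, b1
    h1 = 1 if (x1 * w1) + (x2 * w4) >= 1 else 0
    h2 = 1 if (x2 * w3) + (x1 * w2) >= 1 else 0
    return 1 if (h1 * -b1) + (h2 * b2) >= 1 else 0

def find_subsets(target=8333, max_tries=1_000_000, n_orders=500_000, seed=None):
    """Draw random order pairs and keep those pairing a small and a large size."""
    rng = random.Random(seed)
    found = []
    for _ in range(max_tries):
        s1 = rng.randint(1, n_orders)   # first order number
        s2 = rng.randint(1, n_orders)   # second order number
        x1 = rng.randint(0, 1)          # size category of order 1 (0=S, 1=L)
        x2 = rng.randint(0, 1)          # size category of order 2 (0=S, 1=L)
        if xor_fnn(x1, x2) > 0:         # only positive results are accepted
            found.append((s1, x1, s2, x2))
        if len(found) >= target:
            break
    return found

pairs = find_subsets(seed=0)
print(len(pairs))
print(pairs[-1])
```

About half of the random draws are accepted, so the 8333 subsets are reached long before the limit of 1000000 iterations.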

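This prototype proves the point: random choices filtered by a trained FNN are enough to build subsets that pair smaller and larger sizes.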

Two main functions, among some minor ones, must be added:

- After each choice, the orders chosen must be removed from the 500,000-order dataset. This will preclude choosing the same order twice and reduce the number of choices to be made.
- An optimization function must be added to regroup the results by trained, optimized patterns through an automated planning program for production purposes.
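
The first of these two functions, removing chosen orders from the dataset, could be sketched as follows. This is one possible approach under stated assumptions, not the book's implementation: `draw_pairs` and `order_sizes` are hypothetical names, the size category of each order is assumed to be known in advance, and a direct comparison of the two categories replaces the FNN call for brevity.

```python
import random

def draw_pairs(order_sizes, target, seed=None, max_tries=1_000_000):
    """Pick small/large pairs, removing each accepted order from the pool.

    order_sizes maps an order id to 0 (subset S) or 1 (subset L).
    Returns (accepted pairs, remaining order ids).
    """
    rng = random.Random(seed)
    pool = list(order_sizes)  # order ids still available
    pairs = []
    for _ in range(max_tries):
        if len(pairs) >= target or len(pool) < 2:
            break
        i, j = rng.sample(range(len(pool)), 2)  # two distinct positions
        a, b = pool[i], pool[j]
        if order_sizes[a] != order_sizes[b]:    # one small, one large
            pairs.append((a, b))
            # Remove the higher position first so the lower one stays valid;
            # overwriting with the last element avoids shifting the list
            for k in sorted((i, j), reverse=True):
                pool[k] = pool[-1]
                pool.pop()
    return pairs, pool

# Hypothetical dataset: 1,000 orders with a random size category each
data_rng = random.Random(1)
order_sizes = {order_id: data_rng.randint(0, 1) for order_id in range(1, 1001)}

pairs, remaining = draw_pairs(order_sizes, target=100, seed=2)
used = [order for pair in pairs for order in pair]
print(len(pairs), len(remaining), len(used) == len(set(used)))
```

Because accepted orders leave the pool, no order can be chosen twice, and each later draw is made from a smaller set.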

Application information:

- The core calculation part of the application is fewer than 50 lines long
- When a few control functions and dataset tensors are added, the program might reach 200 lines at most
- This keeps maintenance easy for a team

*If you find this article interesting, check out Denis Rothman’s **Artificial Intelligence By Example**. This book serves as a starting point for you to understand how AI is built, with the help of intriguing examples and case studies.*