Research diaries #17: About sparseness of data and training neural nets
Typically a neural net is trained on large amounts of data. It will after training hopefully have good performance on this training set. Specifically, the tools for training a neural net are made in such a way that it tries to improve its performance on the training data that it is fed. If the performance on the training data is good then we would also like it to see how well it performs on data outside the training set. This is a so-called test set. It ensures that it actually generalizes to stuff that is not inside the training set.
My cat scoring high performance on being cute
If the neural net is trained very long on the training set it actually just learns all the cases in the training set. In essence it is memorizing all the elements in the set. If this happens it cannot do anything new. More practically, it will fail to perform well on the test set. So there is some sweet-spot where you let it learn the training set long enough that it performs well on both sets but not too long so that it only performs well on the training set.
This way of training neural nets crucially depends on the data. The data has to have a clear pattern for the neural net to be able to identify it. For example, if we want the neural net to classify handwritten numbers the data set is less than 100.000 images of single numbers with pixel size 28x28 MNIST. 100.000 images might seem like a lot but it is only a small number with respect to all possible pictures. Assuming the pixels can be black or white 2(28*28)=2784 which many times larger than the image set as 2^17 is already bigger than 100.000.
You can also view language like that. A sentence of words is a very specific collection of letters. If we would consider all combinations of letters of sentence length we would already get a number which is greater than all the particles in the (observable) universe ~10^80.
Returning to neural nets, during the training process it is trying to find a pattern. Given a fixed neural net architecture all the possible neural nets are much larger than the actual data set. From that perspective the optimized neural net is sparse in the collection of all neural nets. The more philosophical point here is that the creation of patterns is generically not random as the time required to generate a pattern using randomness or even finding a pattern with randomness is practically infinite with respect to the size of the universe.
Leave Research diaries #17: About sparseness of data and training neural nets to:
Read more #stem posts
Best Posts From mathowl
We have not curated any of mathowl's posts yet. But you can encourage our curation team to review posts by visiting them regularly and by referring other readers. Because we give priority to frequently read content.
More Posts From mathowl
- Research diaries #21: Open-source generative AI for videos with Comfy
- Espresso diaries #2: bug fixing the ROK
- Espresso diaries #1: Manual shots
- Research diaries #20: Builing another PC
- Research diaries #19: Unequal but not that unequal
- Research diaries #18: About inflation returning to normal levels
- Research diaries #17: About sparseness of data and training neural nets
- Research diaries #16: Reservoir computing, an AI which does not live inside a computer
- Research diaries #15: Going down rabbit holes
- Research diaries #14: An optimization puzzle
- Research diaries #13: statistical DAGs
- Research diaries #12: Berkin's paradox and why the worst restaurants are in the most touristy spots
- Research diaries #11: rearranging sequences, weirdness at infinity
- Research diaries #10: The real line, gaps in the rational numbers and the axiom of completeness
- Research diaries #9: types of convergence
- Research diaries #8: Weakening differentiability
- Research diaries #7: Regular versus chaotic motion
- Research diaries #6: Parallel in python
- Inventory management made fun :D
- Brofi is alive again ^^