Machine learning, artificial intelligence (AI), and other advanced technical concepts are not new to Praecipio Consulting's engineers. In their spare time, they like to experiment, solve problems, and test ideas in a variety of areas. The way they see it, they either succeed or they learn. Praecipio Labs, formalized in 2017, has really been around since the beginning. Whether a problem needed solving or it was just innate curiosity, Praecipio Labs was there to dig in and find a solution - or just have some fun! Much of the team's activity covers topics that may not be beneficial today but are interesting nonetheless, like AI, improving advanced systems configurations, and much more.
So, who's the fearless leader?
Christopher Pepe, the Dragon of the West, oversees Praecipio Consulting's more technical endeavors. Having studied neural networks in college for robotic control systems, he has recently revisited the topic to enjoy some of the advancements that have been made. With artificial intelligence held up as the greatest thing the universe has ever known, it seemed like the right time to jump back in. Together with a ragtag team of interested engineers, Christopher is leading the Praecipio Consulting machine learning think tank to see if they can converge on a future that is better than a bag-o-if statements.
Some of Pepe and team's early projects included the Jira Toaster and Beer Me Jira. That was just the beginning. Today, Praecipio Labs is beginning to experiment with applications for machine learning.
Pepe's recent experiment in generating text with neural networks is one of many such learning opportunities, and he walks through it below.
Using neural networks to generate text is certainly not novel, but it is a fun exercise. It is also a fairly simple problem, since there isn't much preprocessing needed to create training data. (In most data science and AI exercises one spends the majority of the time formatting and processing data.) The idea here is simple: we want to train a neural network on a given body of text (a corpus) so that it can generate similar text. In this way one can generate text in the voice of the author.
I took on this experiment to build an AI that could speak in my voice. As with any worthwhile endeavor, I learned more than I accomplished.
People approach this problem with either character-based or word-based inputs. Character-based means that if your input is "Hi there Bob" then the network is fed "H", "i", " ", "t", "h", "e", "r", "e" and so on. With word-based inputs the network is fed "Hi", "there", "Bob". Character-based approaches allow the network to do cool things like create new words. The word-based approach is easier and, in our case, was the more successful choice with less training. Our approach used a wide-ish, shallow network instead of a deep network. That means our model memorized the corpus rather than learning the meaning of it.
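To make the difference concrete, here is a minimal sketch of both ways of slicing the same input. The function names are made up for illustration, and the word splitter is a plain whitespace split, which is an assumption; the post does not say how the real corpus was tokenized.

```python
def char_tokens(text: str) -> list[str]:
    """Character-based: every character, spaces included, is one token."""
    return list(text)


def word_tokens(text: str) -> list[str]:
    """Word-based: split on whitespace (punctuation stays attached to its word)."""
    return text.split()


sample = "Hi there Bob"
chars = char_tokens(sample)
words = word_tokens(sample)

print(chars[:8])  # the first eight character tokens
print(words)      # the three word tokens
```

The character version produces four times as many tokens for this input, which hints at why it tends to need more training before it produces anything readable.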
In all training problems, you need a large set of training data that has inputs and corresponding outputs. For instance, to build a tweet sentiment model one needs 1+ million tweets, each with an associated sentiment label (0=mad, 1=annoyed, 2=flat, 3=happy, 4=overjoyed), and the quality of that training data determines how good your model is. That's a big task to take on for a novel data set.
On the other hand, training a text generator is a simple process. The input is some number of words and the output is the next word in the corpus. Using this paragraph as an example, one input might be "On the other hand training a text generator is a" and the associated output value would be "simple." The next input would be "the other hand training a text generator is a simple" and the output would be "process."
After stepping over the corpus in this way a number of times, the network eventually converges to an acceptable point.
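The sliding window described above can be sketched in a few lines. This is an illustration, not the code we trained with: the helper name is hypothetical, and the window of 10 words is simply what the example sentence implies, not necessarily the size used in the experiment.

```python
def make_training_pairs(words: list[str], window: int) -> list[tuple[list[str], str]]:
    """Slide a fixed-size window over the corpus one word at a time.

    Each pair is (the `window` words of context, the word that follows them).
    """
    pairs = []
    for start in range(len(words) - window):
        context = words[start:start + window]
        target = words[start + window]
        pairs.append((context, target))
    return pairs


text = "On the other hand training a text generator is a simple process"
pairs = make_training_pairs(text.split(), window=10)

for context, target in pairs:
    print(" ".join(context), "->", target)
```

Every word in the corpus past the first window becomes a label for free, which is why no hand-labeled data set is needed.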
To get anywhere with deep learning one really needs to train on GPUs. There are some nuances to using GPUs on AWS, and to writing code that will take advantage of multiple GPUs. This project allowed us to figure out a successful approach to using AWS for training our models.
Intro to Recurrent Neural Networks
My intro to neural networks was in college, using plain old fully connected, feed-forward networks for non-linear control systems (it was a bit more state of the art back then but would still be a fun project). Life and career kept me focused elsewhere and I've only recently jumped back in. There is a dizzying array of new architectures and approaches to interesting problems. I have long been interested in time series problems and have been focused on recurrent neural networks. This was a simple but non-trivial challenge to write from scratch.
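For readers new to the idea, here is a rough sketch of what makes a network recurrent: the hidden state from the previous step is fed back in alongside the current input. It shows the simplest (Elman-style) cell in plain Python, forward pass only. It is not the network from this experiment; the sizes, weight ranges, and function names are all assumptions for illustration, and a real implementation would use a numerical library and add training.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h)."""
    from_input = matvec(W_xh, x)
    from_state = matvec(W_hh, h_prev)
    return [math.tanh(a + b + c) for a, b, c in zip(from_input, from_state, b_h)]


def run_sequence(inputs, W_xh, W_hh, b_h):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = [0.0] * len(b_h)
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
    return h


rng = random.Random(0)  # fixed seed so runs are repeatable
input_size, hidden_size = 3, 4  # toy sizes, chosen arbitrarily
W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b_h = [0.0] * hidden_size

# Three one-hot vectors standing in for a three-token sequence.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
final_state = run_sequence(sequence, W_xh, W_hh, b_h)
print(final_state)
```

The final state depends on every input that came before it, which is what makes this family of networks a natural fit for time series and text.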
After 84 epochs the loss had been minimized to 0.1195. Given a random sample of text from the corpus as a seed, the network produced this output:
to merge completed stories and bugfixes as quickly as possible into develop so that integration and quality testing can begin sooner. merging a feature branch into the develop branch should start a build and potentially deployment to an integration testing environment. deployments can be manually triggered, but the if automatic then one is always testing the tip of the develop branch. integration and qa testing should always be occurring on the develop branch. feature branches are where developers write their code. this keeps their work isolated from other developers until it is stable. stories are sized so that a feature branch lives for 1 to 3 days. many feature branches make up a deliverable epic. features are integrated into develop by way of pull requests. pull requests are gated merges. before the feature branch can be merged into develop the code must go through a review. this can be forced with permissions or by a convention that the team uses. the merged pull request also provides a single commit to use for cherry picking features. cherry picking process again note that this process circumvents best practices
The source corpus content is listed here. You can see that the network simply memorized and regurgitated the source.
The Develop branch is the shared branch that developers use for integration testing and to accumulate features from the product backlog. The aim is to merge completed stories and bugfixes as quickly as possible into develop so that integration and quality testing can begin sooner. Merging a feature branch into the develop branch should start a build and potentially deployment to an Integration Testing environment. Deployments can be manually triggered, but the if automatic then one is always testing the tip of the develop branch. Integration and QA testing should always be occurring on the develop branch. Feature branches are where developers write their code. This keeps their work isolated from other developers until it is stable. Stories are sized so that a feature branch lives for 1 to 3 days. Many feature branches make up a Deliverable/Epic. Features are integrated into develop by way of Pull Requests.
Some generated samples varied or had incomplete sentences, but overall this model did an excellent job of recreating the source document.
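To show why memorization leads to regurgitation, here is a small sketch in which a plain lookup table stands in for the trained network. That substitution is an assumption made for illustration: a table is the extreme case of a model that has memorized its corpus perfectly. The helper names and the three-word window are also hypothetical.

```python
def build_lookup(words, window):
    """Map every `window`-word context in the corpus to the word that follows it."""
    table = {}
    for start in range(len(words) - window):
        context = tuple(words[start:start + window])
        table[context] = words[start + window]
    return table


def generate(seed, table, max_words):
    """Extend the seed one word at a time, always using the last len(seed) words."""
    window = len(seed)
    output = list(seed)
    for _ in range(max_words):
        next_word = table.get(tuple(output[-window:]))
        if next_word is None:  # context never seen in the corpus, so stop
            break
        output.append(next_word)
    return output


corpus = ("feature branches are where developers write their code. "
          "this keeps their work isolated from other developers until it is stable.")
words = corpus.split()

table = build_lookup(words, window=3)
generated = generate(words[:3], table, max_words=100)
print(" ".join(generated))
```

Seeded with the first three words, the table walks straight back through the text it was built from. A network that has only memorized its corpus behaves much the same way, which matches what we saw above.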
The team and I look forward to sharing more experiments and tests like this one soon.