Part 1 of 3: Input
Let’s take a little of the mystery out of AI. I promised early on to keep it high level. This won’t be any different, except it will get a little more technical. Don’t worry, I won’t dive deeply into the algorithms involved, so a little mystery will remain.
Basic Technical Flow
- Input
- Process
- Output
Modified Technical Flow for AI with LLM
- Data Acquisition and Ingestion
- Data Preparation
- Processing (Neural Net with LLM component)
- Generated Response
- Output (channel)
What’s New?
Right from the start, we can see the flow has grown from three steps to five. As you’ll read below, however, it still fits nicely with the basic flow of input/process/output. So, what are the new steps? Data acquisition and ingestion, data preparation, and the generated response. Let’s break that down.
Input – Updated for AI using Large Language Models
While data acquisition, ingestion and preparation are all new steps in our flow, they really are all part of the input.
Data acquisition is collecting the data. Sounds simple enough. This can mean pulling in legacy data or capturing data at the source (think of collecting temperature data). This data is then ingested, which is the process of getting the acquired data into a database or data store for future processing. While this is all part of input, in the technical world of AI it has been split out due to the importance of the data. An AI is only as smart as the data it was trained on. However, these steps only acquire the data and store it, or ingest it. There is another very important step, data preparation, that we need to look at.
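For readers who like to see things in code, here is a minimal Python sketch of acquisition followed by ingestion. It is only an illustration under my own assumptions: the function names `acquire_readings` and `ingest`, the `readings` table and the sample temperature values are all made up, and an in-memory SQLite database stands in for whatever data store a real system would use.

```python
import sqlite3


def acquire_readings():
    """Acquisition: collect data at the source (hard-coded sample values here)."""
    return [
        ("2024-01-01T00:00", 21.5),
        ("2024-01-01T01:00", 21.1),
        ("2024-01-01T02:00", None),  # a missing reading, stored as-is at this stage
    ]


def ingest(readings, conn):
    """Ingestion: store the acquired data, untouched, for future processing."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (taken_at TEXT, celsius REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    conn.commit()


conn = sqlite3.connect(":memory:")  # hypothetical stand-in for a real data store
ingest(acquire_readings(), conn)
stored = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(stored)
```

Notice that nothing is cleaned or fixed here. The missing reading goes into the store exactly as it was collected, which is why preparation is a separate step.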
Data Preparation
Now that we’ve acquired our data and stored it, or ingested it, we need to turn it into something the AI can use. To begin that process, we need to understand where the data came from. It can come from one of two sources.
- Raw Data – This can either be structured or unstructured.
- User Input – This can be the input you type or speak into your AI.
Why it’s important to know what type of data it is
We need to distinguish between raw and user data, because that will determine how this data is prepared.
Raw Data:
Raw data is fed into something called machine learning. This is an entire field of its own, and I’ll explain it in a later article. For now, it’s enough to know that machine learning is used to clean the data up, transform it and refine it so the AI will understand what it is.
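To give a feel for what “clean up, transform and refine” can mean, here is a small Python sketch. To be clear, it uses simple hand-written rules rather than machine learning, and everything in it is an assumption made for illustration: the `prepare` function, the idea that the raw readings arrive in Fahrenheit, and the sample values.

```python
def prepare(raw_values):
    """Clean, transform and refine raw temperature readings."""
    # Clean: drop missing readings.
    cleaned = [value for value in raw_values if value is not None]
    if not cleaned:
        return []
    # Transform: convert Fahrenheit to Celsius (assumed unit of the raw data).
    celsius = [(value - 32) * 5 / 9 for value in cleaned]
    # Refine: rescale to the 0..1 range so every value is on the same footing.
    low, high = min(celsius), max(celsius)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    return [(c - low) / span for c in celsius]


prepared = prepare([68.0, None, 86.0, 77.0])
print(prepared)
```

The missing reading is dropped, and the three remaining values end up on a common 0 to 1 scale, which is the kind of tidy input the later processing step can work with.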
User Input:
The second type of data is user input. Unlike raw data, this uses something called NLP, or natural language processing. Like machine learning, this is an entire field of its own. The purpose here is to analyze the syntax and semantics of the input and prepare the data for our AI accordingly.
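As a rough illustration of preparing user input, the Python sketch below does only the very first thing an NLP step typically does: it splits a sentence into tokens and turns them into numbers. Real natural language processing goes much further into syntax and semantics. The `tokenize` and `to_ids` functions and the sample sentence are hypothetical names and data I made up for this example.

```python
import re


def tokenize(text):
    """Split user input into lowercase word tokens and punctuation marks."""
    return re.findall(r"[a-z0-9']+|[.,!?]", text.lower())


def to_ids(tokens, vocabulary):
    """Map each token to an integer id, adding unseen tokens to the vocabulary."""
    return [vocabulary.setdefault(token, len(vocabulary)) for token in tokens]


vocabulary = {}
tokens = tokenize("What's the weather today?")
ids = to_ids(tokens, vocabulary)
print(tokens)
print(ids)
```

The point is simply that typed or spoken words have to become something numeric before they can serve as input to the AI.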
Quick Summary of the Input Step
As I’ve shown above, what was originally called the input step of our basic flow is still there! For a high-level understanding, that input step remains. It’s now further broken down into data acquisition and ingestion, followed by determining the data type, raw or user input. Some may be thinking, wait a minute. This isn’t just input. You wrote about machine learning and natural language processing. Processing is right there in the name! How can this still be input?! I would respond by saying, good question. You’re paying attention. Because this is a high-level view of the flow, the ‘result’ of machine learning and natural language processing is still considered the input into the neural network, or the brain of the AI. So for those who dig deep into the words, you’re right, but think of the result. The result is still going to be the input the AI requires.
Summary of Part One – Input
To keep the article short, we’ll stop here. What I hope you take away from this is an understanding that our traditional technical flow of input/process/output still remains. AI does complicate things, but the new steps still fit nicely into this basic flow. You can still view everything above as the input step of the flow. However, we’ve seen that input gets a little more complicated by splitting it up into acquisition and ingestion, followed by preparation. We’ve also learned that preparation depends on the data type, raw or user input. In the next article we’ll talk about processing the data.
