A brief history of Artificial Intelligence

Where are we now and how did we get here?

The quest to create a machine that can think like us, people, instead of following a predefined set of instructions has taken a long time: more than 70 years.

The history is both simple and complicated. Simple, because we can trace the roots of today's AI all the way back to Alan Turing's Turing test in 1950. Complicated, because many researchers built technologies over time that together brought us to Large Language Models, the cutting edge of Artificial Intelligence now in 2026. The path was not a straight highway; it was a messy web of competing ideas, sudden dead ends and unexpected breakthroughs.

Simple View (The Vision): 1950: Turing asks "Can machines think?" -> 2026: LLMs mimic human thought.

Complicated View (The Reality): Symbolic AI -> AI Winters -> Neural Networks -> Big Data -> GPUs -> Transformers -> Large Language Models

The AI Begins

It all started with the desire to make a machine that can think. The Turing test asks a simple question: can you tell whether the answers are coming from a machine or a human? If the answers come from a machine and you are convinced they come from a human, the machine is said to have achieved thinking. Hard to agree with, but that was the idea.

But how do we make machines think? The starting point was symbolic AI and expert systems. People wrote software programs containing a huge number of if-else conditions that captured the knowledge needed for a field. For example, they captured knowledge (words, statements) from an expert therapist and coded it as many if-else conditions, so when you asked a question, the program used pattern matching to find the closest conditional clause and returned its response. But there was no real thinking. Naturally, this did not scale, because of the nuances of language and of how knowledge is represented in language.
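To make the if-else idea concrete, here is a minimal sketch of a rule-based responder in Python. The rules, the response templates and the respond function are all made up for illustration; they are not taken from any real expert system. Real systems of that era had far more rules, but the principle was the same: match a pattern, return a canned reply.

```python
import re

# Hypothetical hand-written rules: (pattern to look for, reply template).
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(text: str) -> str:
    """Return the reply of the first rule whose pattern matches the input."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the captured words in the reply, minus trailing punctuation.
            return template.format(match.group(1).rstrip(".!?"))
    # Nothing matched: the program has no knowledge to fall back on.
    return FALLBACK


if __name__ == "__main__":
    print(respond("I feel tired."))
    print(respond("What is the weather like?"))
```

Any sentence that the rule author did not anticipate falls through to the fallback reply, which is exactly why this approach could not keep up with real language.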

Birth of Neural Networks

In 1958, a researcher named Frank Rosenblatt built the "perceptron," the simplest possible neural network. It took some inputs (numbers), multiplied each one by a weight, added them up, and produced an output. The weights were adjusted as the perceptron saw more examples. But in 1969 fellow researchers Marvin Minsky and Seymour Papert showed that a single neuron is not enough to perform even a simple operation like XOR. An algorithm to make multiple layers of neurons learn together was not yet known.
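The weighted sum and the weight adjustment described above can be sketched in a few lines of Python. This is an illustrative toy, not Rosenblatt's original implementation: the function names, the learning rate of 1 and the 20 training passes are my own assumptions. It learns AND, which a single straight line can separate, and it cannot fully learn XOR, which no single line can separate.

```python
def predict(weights, bias, x):
    """Multiply each input by its weight, add them up, output 1 or 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, lr=1):
    """Perceptron learning rule: nudge weights whenever a prediction is wrong."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def accuracy(samples, weights, bias):
    """Fraction of samples the trained perceptron classifies correctly."""
    correct = sum(predict(weights, bias, x) == t for x, t in samples)
    return correct / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    print("AND accuracy:", accuracy(AND, *train(AND)))
    print("XOR accuracy:", accuracy(XOR, *train(XOR)))
```

However long you train it, the XOR accuracy never reaches 100%, because one neuron can only draw one straight dividing line.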

Then backpropagation came to the rescue, popularized in 1986. Look at the gap between the 1960s and the 1980s. The slow stretch in between is commonly referred to as an AI winter. Neural networks saw few breakthroughs for roughly two decades, yet people did not give up.

The backpropagation algorithm made it possible to stack neurons in multiple layers and train them by passing errors backward through those layers, adjusting each connection weight a little each time. It is still the core training algorithm today.
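Here is a rough sketch of that idea in plain Python, under my own assumptions: a tiny network with 2 inputs, 2 hidden neurons and 1 output, sigmoid activations, a squared-error loss, and hand-picked starting weights (INITIAL). None of these choices come from a specific paper or library. The backward function is where the error is passed back from the output to the hidden layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def backward(params, x, target):
    """Gradients of 0.5 * (output - target)**2 for one example, via the chain rule."""
    _, _, w_out, _ = params
    hidden, output = forward(params, x)
    # Error at the output unit; the sigmoid derivative is s * (1 - s).
    delta_out = (output - target) * output * (1.0 - output)
    # Pass that error backward through the output weights to each hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    grad_w_out = [delta_out * h for h in hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out


def total_loss(params, samples):
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in samples)


def train(params, samples, steps=200, lr=0.5):
    """Full-batch gradient descent: sum gradients over all samples, then nudge weights."""
    for _ in range(steps):
        w_hidden, b_hidden, w_out, b_out = params
        grads = [backward(params, x, t) for x, t in samples]
        g_wh = [[sum(g[0][j][i] for g in grads) for i in range(2)] for j in range(2)]
        g_bh = [sum(g[1][j] for g in grads) for j in range(2)]
        g_wo = [sum(g[2][j] for g in grads) for j in range(2)]
        g_bo = sum(g[3] for g in grads)
        params = (
            [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(w_hidden, g_wh)],
            [b - lr * g for b, g in zip(b_hidden, g_bh)],
            [w - lr * g for w, g in zip(w_out, g_wo)],
            b_out - lr * g_bo,
        )
    return params


XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
# Hand-picked starting weights (an assumption; real training starts from random values).
INITIAL = ([[0.5, -0.4], [-0.3, 0.6]], [0.1, -0.1], [0.7, -0.5], 0.05)

if __name__ == "__main__":
    trained = train(INITIAL, XOR)
    print("loss before:", total_loss(INITIAL, XOR))
    print("loss after: ", total_loss(trained, XOR))
```

The sketch only shows that the error goes down as the weights are nudged. Getting a network this small to fully learn XOR usually takes many more steps and a bit of luck with the starting weights.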

But computers were still too slow, so neural networks could not solve real problems. In the 2000s, other methods known as statistical machine learning, such as decision trees, support vector machines, random forests and Bayesian networks, were solving real problems like spam filtering and search ranking, but they all had limitations.

2012 seems to be the year when things turned around for neural networks: GPUs were becoming fast enough, and researchers found that with more compute and bigger datasets, neural networks became far more useful.

Major advances followed: Word2Vec in 2013 (Google), wider use of Recurrent Neural Networks, and attention mechanisms from 2014 onward.

In 2002, I was in college and studied Neural Networks as a subject for 6 months. I don't remember what I learned, or maybe I did not learn anything worth remembering.

Attention is all you need

The major breakthrough came in 2017 with the now famous research paper "Attention Is All You Need", written by a research team at Google, which introduced the Transformer architecture. The Transformer architecture let the pre-training process use the parallel processing capabilities of GPUs efficiently, and made it possible to train a model on a very large dataset (a large part of the public internet, for example).
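The core operation in that paper is scaled dot-product attention: every token scores itself against every other token, turns the scores into weights that sum to 1, and takes a weighted average of the values. Below is a minimal plain-Python sketch of that formula. The function names and the tiny example vectors are my own, and a real Transformer adds learned projections, multiple heads and much more.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    # Each query is handled independently of the others, which is why
    # a GPU can process all positions of a sequence at the same time.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    # Made-up 2-dimensional vectors for a 2-token sequence.
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    print(attention(q, k, v))
```

Unlike a recurrent network, nothing here has to wait for the previous token to finish, and that independence is what makes training on huge datasets practical.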

Ironically, Google was not the one to turn the Transformer architecture into a breakout consumer product. OpenAI, a small San Francisco research lab that started as a non-profit, took the Transformer architecture, trained a large language model with it on Nvidia GPUs, created a user-friendly way to interact with the model, and named it ChatGPT.

Where are we now

ChatGPT grew to millions of active users within weeks, something many did not see coming, and it changed the course of life for nearly everyone on the planet.

Nvidia became the most valuable company in the world by market cap, and people began turning to ChatGPT instead of Google to get their questions answered.

Today the field is evolving faster than fashion: new tools are introduced every day, trillions of dollars are being invested in data centers to run models, and countries are competing with each other to build more powerful models.

The history of AI is amazing to look back on. It makes me realize that however impossible a task seems, people with grit can make it possible in time.