Tokens in C++

I want all of you to take a moment to think back to your school days—specifically, your 8th-grade biology class. Do you remember studying about cells? Your teacher likely said something like, "Cells are the smallest unit of life, and every living organism is made up of these tiny building blocks." Without cells, life wouldn’t exist, right?

Well, programming is kind of similar! In C++, tokens are like the "cells" of a program. They’re the smallest building blocks of your code. Just like cells come together to form tissues, organs, and eventually a living organism, tokens work together to build a functional program. Cool, isn’t it?

If you're still not getting it, let me give you another example.

Imagine you're writing a sentence in a language like English or Hindi. Every meaningful word, punctuation mark, or symbol in that sentence can be compared to a token in C++. Let's see an example:

"You are learning C++ from LearnYard!"

In this sentence, the tokens would be:

{You, are, learning, C++, from, LearnYard, !}

Each word, symbol, or punctuation mark plays a role in conveying the meaning of the sentence. Similarly, in C++, tokens are the smallest elements that carry meaning in a program. They are the fundamental building blocks that help the computer understand your instructions.

Now, let’s look at a C++ example. Suppose you write this line of code:

cout << "Hello World" << endl;


Here, the tokens would be:

{cout, <<, "Hello World", <<, endl, ;}

Each of these tokens has a specific purpose which we will get to know as we move further.

If you're unsure how sentences are divided into tokens or how they are categorized, don't worry—we’ll cover that in detail as you continue reading. For now, think of tokens as the smallest pieces of your code, separated by spaces, symbols, or special characters. Each piece that carries meaning, such as a keyword, name, or operator, is a token. Keep reading to learn more about keywords, operators, identifiers, and special characters.


Before diving into the different types of tokens in C++, let’s talk about something simple yet important in programming: character sets.

Character Set in C++

Think of a character set as a collection of letters, digits, symbols, and special characters that the language understands and uses to create programs. Just like in English, where we have letters, numbers, and punctuation marks to form words and sentences, C++ has its own set of characters to create programs.

We are going to divide the character set into different categories:

Letters

C++ recognizes all the uppercase English letters (A-Z) and lowercase English letters (a-z). These are used to name variables, functions, and other identifiers in your program.

Digits

The digits 0 to 9 are part of the character set and are used to work with numbers.

Special Characters

These are the symbols used for various operations and punctuation in your code. They are further divided into categories:

    • Mathematical Operators: Symbols like +, -, *, and / are used for calculations.
    • Punctuation: Symbols like ',', ';', and ':' help structure your program by separating or ending parts of the code.
    • Other Special Characters: Symbols like #, &, and | serve specific purposes, such as starting preprocessor directives, taking the address of a variable, and combining conditions.

Escape Sequences

Now, these are really interesting. Escape sequences are combinations of characters that start with a backslash (\). They tell the program to perform specific actions. For example:

    • \n creates a new line.
    • \t adds a tab space.
    • \\ prints a backslash.

NOTE: An important thing to note here is that escape sequences like \n, \t, and \\ are treated as one single character by the program, even though they appear to be made up of two characters. This is important because it helps the program interpret these escape sequences in a special way, rather than just as normal text.

Open an editor such as VS Code or Atom, or use our online editor at playground.learnyard.com, and type in this code:

#include <iostream>
using namespace std;

int main() {
    cout << "Welcome to\tLearnYard\t";
    cout << "for learning C++\n";
    cout << "I hope you are enjoying your learning experience till now";
    return 0;
}

Now that the structure of the code is clear, let's look at the output:

Welcome to    LearnYard    for learning C++
I hope you are enjoying your learning experience till now

Without knowing the character set, you might struggle to communicate your instructions to the computer correctly.

So, think of the character set as your toolbox—knowing what tools (characters) are in there and how to use them is the first step to mastering programming!

If you're interested in learning about other escape sequences, feel free to explore this section. Otherwise, you can skip ahead to the next section by scrolling past it.

Other Important Escape Sequences in C++

\b: Backspace Escape sequence

This escape sequence represents a backspace. It moves the cursor one position backward, so the next character printed overwrites the previous one, if the system supports it. Consider the following example:

#include <iostream>
using namespace std;

int main() {
    cout << "Hello\b World!";
    return 0;
}

Output:

Hell World!

Here the \b moves the cursor back onto the o in "Hello", and the space that follows overwrites it, so the terminal shows Hell World!.

\': Single quote character Escape sequence

This sequence represents a single quote character (') inside a character or string literal. This matters because a single quote also delimits character literals, so \' ensures the compiler treats it as part of the literal rather than as a delimiter. Consider the following example:

#include <iostream>
using namespace std;

int main() {
    char singleQuote = '\'';
    cout << "The character is: " << singleQuote << endl;
    cout << "She said, \'Hello!\' to everyone." << endl;
    return 0;
}

The output here would be:

The character is: '
She said, 'Hello!' to everyone.

The escape is required inside a character literal such as '\''. Inside a string literal a plain ' works too, so \' is optional there, but using it is harmless.

\": Double quotation mark Escape sequence

This is used to represent a double quote (") inside a string literal. Since double quotes define the boundaries of a string, using \" ensures the compiler treats the double quote as part of the string rather than as its end. Consider the following example:

#include <iostream>
using namespace std;

int main() {
    cout << "You are studying \"C++\" from \"LearnYard\"." << endl;
    return 0;
}

Output:

You are studying "C++" from "LearnYard".

This helps avoid syntax errors. Without it, the compiler would think the string ends at the first unescaped double quote.

\v: Vertical tab Escape sequence

This escape sequence represents a vertical tab. It moves the output to the next vertical tab stop, which typically appears as vertical spacing in the console or output device. Check out this coding example:

#include <iostream>
using namespace std;

int main() {
    cout << "Line 1\vLine 2\vLine 3" << endl;
    return 0;
}

Depending on the environment, it might display something like:

Line 1
       Line 2
              Line 3

\?: Question mark Escape sequence

This escape sequence, \?, represents the question mark character (?) inside a string or character literal. It exists for historical reasons: older versions of C++ supported trigraphs, three-character sequences starting with ?? that the compiler replaced with another character, and \? kept a question mark from being read as part of one. Trigraphs were removed in C++17, so today a plain ? inside a literal works just as well. Consider the following example:

#include <iostream>
using namespace std;

int main() {
    cout << "Is this a question\?" << endl;
    return 0;
}

Output:

Is this a question?

Writing \? or a plain ? gives the same output here; the escaped form only matters in older code where trigraphs are still processed.


Let's now dive into the types of tokens, each of which serves a specific purpose in the syntax of C++.

Types of tokens in C++

The main types of tokens in C++ are:

  • Identifiers
  • Keywords
  • Constants
  • Special Symbols
  • Strings
  • Operators

We will discuss each category of tokens one by one.

Identifiers

Imagine you’re a librarian organizing books in a library. Each book needs to be labeled with a unique tag, such as "Book1", "Book2", or even "SciFi_001", to make it easier to find later. When a student borrows a book, you provide them with the book's label along with details like the student's name, the book's category, and the date it was issued.

In programming, these labels or tags we assign to variables, classes, and functions are known as identifiers.

Identifiers in programming work just like the labels in a library: they help you easily find and refer to specific parts of the code. By giving meaningful names to variables, functions, classes and other program elements, we can better organize and manage our programs, making them more readable and easier to maintain.

Let's understand the use of identifiers and how they work with an example:

Imagine you're sitting at the librarian’s desk, and a student comes to borrow a book. You enter the student's details—such as their name, age, and phone number—into the system. In this case, first_name, last_name, age, and phone_number are identifiers used to store and reference the student's data. Once this format is set up, if the student’s phone number changes, you don’t have to update it manually in every record. Instead, you only need to update the value of the phone_number identifier, and the change will automatically reflect wherever it's used in the system. This is how identifiers help manage and organize data efficiently in programming.

Now, here's what the code for it would look like in C++:

#include <iostream>
#include <string>
using namespace std;

int main(){
    // Defining the identifiers
    string first_name = "William";
    string last_name = "Jane";
    int age = 25;
    string phone_number = "123-456-7890";
    
    // Displaying the student's details:
    cout<<"Student's Name: "<<first_name<<" "<<last_name<<endl;
    cout<<"Phone Number: "<<phone_number<<endl;
    
    // Updating the phone number
    phone_number = "987-345-2345";
    
    cout<<"Updated Phone Number: "<<phone_number<<endl;
    return 0;
}

The output of the code above would be:

Student's Name: William Jane
Phone Number: 123-456-7890
Updated Phone Number: 987-345-2345

NOTE: If you're not quite clear on why we're defining specific data types when creating an identifier, or if you're unsure about strings at this point, don't worry! You'll gain a deeper understanding of these concepts as we progress further in the course.

Identifiers are like tags or labels. They allow you to:

  • Improve code readability: When you assign meaningful names to identifiers, your code becomes easier to understand. For example, instead of naming a variable x, naming it first_name to store the first name of a person says more about its purpose and the kind of data it holds.
  • Reuse the reference: Once you define an identifier, you can use it throughout your program to work with the associated value or function (which we will discuss in more detail in upcoming articles). For example:
string first_name = "William"; // 'first_name' is an identifier storing a specific name
cout << "Hello, " << first_name << "! Welcome to the program." << endl;
cout << "We hope you enjoy your time here, " << first_name << "." << endl;

Here the value stored in first_name is reused in both sentences when the program runs.

There are specific rules you need to follow when naming identifiers, and we’ll dive into these rules later in the upcoming article when we discuss variables.

Here are some examples of valid and invalid identifiers:

  • VALID IDENTIFIERS: _name, myAge, userName, refresh_token
  • INVALID IDENTIFIERS: first name (contains a space), #name (contains a special character), 2num (starts with a digit), class (a reserved keyword).

NOTE: C++ is a case-sensitive language. This means that first_name, first_Name, and First_Name are treated as three different identifiers.

Now, let's discuss Keywords.

Keywords

I’m sure many of you have played a video game at some point in your life. In video games, certain words like "Play" and "Pause" hold specific meanings—they are reserved actions with predefined functions. In games like Fortnite (Epic Games), for example, you cannot name yourself "Play" or "Pause" because these are reserved terms. Naming yourself with these reserved words could interfere with the game's mechanics or functions.

However, not all games have such restrictions. For instance, games like Minecraft and Roblox are more customizable, allowing players more freedom to choose their character names without interference. This depends on the specific game’s policies.

Nevertheless, we can agree that there are terms in programming that are reserved and cannot be used as identifiers—these are called keywords.

Keywords in C++ are special words that the language has reserved for specific purposes. These words have predefined meanings, and the compiler recognizes them as commands or instructions. Since they already serve fixed roles in the language, you cannot use them as names for variables, functions, or any other identifiers in your program.
For example, int declares a whole number, and return sends a value back from a function.

int return = 5;  // ❌ Error: 'return' is a keyword and cannot be used as a variable name

Think of keywords as magic words in a secret programming language. They have fixed roles and can’t be changed or repurposed.

In C++, keywords can be classified into various categories based on their purpose. Two of these categories are:

Data Keywords

  • These keywords are used to define the type of data a variable or object can hold.
    For example: void, char, int, bool, double.
    We will discuss each of them in great detail in the upcoming article, so if you don't understand them now, don't worry.

Control Flow Keywords

  • These keywords are used to control the flow of execution in a program, such as loops and conditional statements.
    For example: if, else, switch, case, default, for, while, break, continue.
    We will discuss all of them in great detail when we cover control statements. For now, just remember that they are keywords.

There are other categories for the classification of keywords, which we will cover as we progress further.

C++ has around 95 reserved keywords; the exact number depends on the version of the standard. We’ll explore them in detail later.

NOTE: Keywords in C++ are case-sensitive and always written in lowercase. For example, int is a valid keyword, but Int or INT is not recognized as a keyword because C++ treats them differently due to case sensitivity.

Let's move to Constants now.

Constants

Imagine you're managing a library, and you want to keep track of important details, like the maximum number of books a shelf can hold. These values are fixed and should never change during the program. In programming, constants are like that—they represent values that can't be altered once they are set.

Now, since you are managing a library, suppose you have a rule that the maximum number of books that can be stored on a shelf is 5000. This is a fixed rule, and nothing about it should change. If someone accidentally changes this number to 1000 or 10,000, it could cause problems, right? Maybe it would affect how much space the library has or how the system behaves.

So, how do we prevent this from happening in our program?

This is where constants come into play. When you use a constant in your code, you are locking in a value that cannot be changed by accident during the program’s execution. It’s like putting that maximum number of books (5000) in a vault. Once it’s inside the vault, nobody can mess with it, no matter what happens in the rest of the program.

How to define constants in C++?

In C++, you can define constants in two main ways:

  1. Using the const keyword

The const keyword is the most common way to define constants. It allows you to create constants of any data type—whether it’s an integer, float, boolean, or string. Once a value is assigned to a const variable, it cannot be changed.

Syntax:

  const data_type constant_name = value;

Example:

  const float PI = 3.14159;   // The mathematical constant Pi
  const string GREETING = "Hello, World!";  // A fixed greeting message

If terms like float and string, or the meaning of these statements, aren't fully clear just yet, don't worry! We'll explain all of that in the upcoming article.

NOTE: A constant must be given its value at the point where it is declared. For instance, this line does not compile:

const int PI;

When you leave a const variable uninitialized, the compiler reports an error. A const variable is meant to hold a fixed value that cannot be changed after it is set, so if no value is given at the start, there is no way to give it one later.

If terms like initialization, definition, or declaration seem unclear right now, don’t worry! We’ll break them down and explain them thoroughly in the next article.

  2. Using the #define Preprocessor Directive

Before discussing the #define directive, let's first understand what a preprocessor is.

What is a preprocessor?

In simple terms, the preprocessor is like a helper that gets your code ready before the compiler takes over and converts it into a working program. Think of it as setting the stage before a play begins—it ensures that all the props and actors (in this case, code dependencies) are in place and ready to go.

One of the most common tasks performed by the preprocessor is including header files. For example, in the previous article we included:

  #include <iostream>

This line tells the preprocessor to include the iostream library in your program. The iostream library provides essential tools like cin for taking input from the user and cout for displaying output. Without including this library, the compiler wouldn't understand what cin and cout mean, leading to errors.

So, when the preprocessor encounters #include <iostream>, it essentially copies all the necessary code from the iostream library and "pastes" it into your program before the compiler begins its work. This ensures the compiler has everything it needs to process input/output operations.

What is the #define directive?

The #define directive allows you to define symbolic constants or macros (which we will discuss later). The preprocessor replaces each such name with its defined value before the code is compiled. This means that instead of repeating a literal value throughout your code, you can use a symbolic name (like MAX_BOOKS) to represent it, and the preprocessor will substitute the value during the preprocessing stage.

Syntax for #define:

  #define CONSTANT_NAME value

Let's consider an example.

Before preprocessing:

#include <iostream>
#define MAX_BOOKS 5000

int main() {
    std::cout << "Value of Maximum number of Books to be kept into shelves: " << MAX_BOOKS << std::endl;
    return 0;
}

During preprocessing, the preprocessor sees the #define directive and replaces every occurrence of MAX_BOOKS in the code with 5000. This substitution happens before the code is passed to the compiler for the actual compilation.

After Preprocessing:

#include <iostream>

int main() {
    std::cout << "Value of Maximum number of Books to be kept into shelves: " << 5000 << std::endl;
    return 0;
}

NOTE: The #define directive doesn’t create a variable or store a value like a normal constant. It just does a simple text replacement before compilation starts.

Now that we are clear on what constants are in C++ and how to define them, let's move on to special symbols.

Special Symbols

In C++, special symbols are characters that have a specific role in how the program works. They're used to perform certain tasks like ending statements, organizing code, or connecting different parts of the program. Let's consider certain symbols which are important.

Semicolon(;) : The semicolon is like a period in a sentence. It tells the computer, "This is the end of this statement." Without it, the program won’t know where one statement ends and the next one begins.

cout<<"Hello World"; // the semicolon marks the end of the statement

Square Brackets([]): Square brackets are used when you are working with arrays, which we will see later.

data_type array_name[size];

Curly Braces({}): Curly braces are used to group multiple lines of code together. This is important when you have a block of code that should run together, like inside loops or functions.

int main(){
  cout<<"Hello World";
  return 0;
}

Double-Quote("): They are used to enclose strings within the code. When you want to store or print text, you use double quotes.

std::cout<<"LearnYard"; // "LearnYard" is a string enclosed within double quotes.

Single-Quote('): They are used for character literals(single character). A character is different from string because it holds just one character.

char gender='F'; // Here 'F' is the character literal enclosed in single-quotes.

There are other special characters as well, but we will meet them as the course goes on.

Now, let's move to Strings.

Strings

Imagine you have a bunch of letters or characters that you want to put together, like the word "Hello". A string is simply a way to store those letters in C++. So, a string is just a collection of characters that forms a word or sentence.

For example, if you want to store the word "Hello", you can use a string in C++ like this:

string my_string = "Hello";

This is saying: "I’m creating a string called my_string, and it will hold the word Hello."

Now, here is something that might confuse you a little: in C++, a string is not a basic built-in type like the number type int or the character type char.

Strings are actually part of the C++ Standard Library, which many people loosely call the Standard Template Library (STL). This library is a collection of ready-made tools that C++ gives you to make your life easier when programming. Think of it as a box of useful tools.

Here are some more examples of defining a string:

string user_name = "william_jane_1405"; // the string enclosed in double quotes 
string full_name = "William Jane";

We will dive deeper into strings and their functionality within the Standard Template Library (STL) in the upcoming articles.

The last category of tokens is operators.

Operators

In C++, operators are special symbols used to perform operations on variables, constants, or expressions (referred to as operands). They can be thought of as tools that allow you to manipulate values and make decisions in your code. Operators are broadly divided into three categories based on the number of operands they work with:

  1. Unary Operators (1 operand)
  2. Binary Operators (2 operands)
  3. Ternary Operators (3 operands)

Unary Operators

These operators work with only one operand. They are typically used to modify or operate on a single variable. Let's consider some examples:

  • Increment Operator: Increases the value of the operand by 1.
int a = 5;
a++; // a now becomes 6
  • Decrement Operator: Decreases the value of the operand by 1.
int b = 10;
b--; // b now becomes 9

Binary Operators

These operators work with two operands. They perform various operations on two values. Commonly used binary operators include + (addition), - (subtraction), == (equal to), and many more.

Look at this example:

int a = 5;
int b = 10;
int result = a+b; // the result would be 15

Binary operators are further divided into categories such as arithmetic operators, logical operators, and comparison operators, which we will study in detail in upcoming articles.

Ternary Operators

The ternary operator is the only operator in C++ that works with three operands. It’s also called the conditional operator, and it’s often used as a shorthand for simple if-else statements. We will discuss it in detail once we have covered control statements.

Look at this example:

#include <iostream>
using namespace std;

int main() {
    int a = 10, b = 20;

    // Using ternary operator to find the maximum
    int max = (a > b) ? a : b;

    cout << "The maximum of " << a << " and " << b << " is: " << max << endl;

    return 0;
}

If you don’t understand the code right now, don’t worry! We’ll explain ternary operators in detail when we talk about control flow in C++.

NOTE: Comments are used to explain the code and are ignored by the compiler. They are not tokens: before the code is split into tokens, each comment is replaced by a single space.

NOTE: In C++, tokens are separated by whitespace (spaces, tabs, and newlines). Operators and punctuation marks such as << and ; also show where a neighbouring token ends, but they are tokens themselves.

Here is an exercise for you: count how many tokens there are in this C++ code, and sort them into their respective categories.

#include <iostream>
#include <string>
using namespace std;

int main(){
    // Declaration of all the user details
    string user_name = "josh_spector_345";
    int score = 452;
    
    // Display the user_name
    cout<<"User Name of the individual is: "<<user_name<<endl;
    
    // Display the score
    cout<<"Score of this user name is:"<<score<<endl;
    
    // Multiplying the score by 2
    score = score*2;
    
    return 0;
    
    
}

Hint: Start by dividing each line of the code above into tokens. For example, the line int score = 452; splits into:

{int, score, =, 452, ;}

Similarly, the line score = score*2; splits into:

{score, =, score, *, 2, ;}

Just like these two cases, you can divide every line of the code into tokens.


Conclusion

To sum up, in this article, we’ve learned about tokens in C++, which are the basic building blocks of a program. We covered different types of tokens like identifiers, keywords, and constants, and how each of them helps your program work properly. Understanding these basic parts is key to writing clean and effective code.

In the next article, we’ll talk about variables. A variable is a named piece of storage whose value can change while the program runs, and the name you give it is an identifier. So, stay with us as we move on to the next important topic in C++ programming!