Saturday, September 25, 2010

Writing A simple Verilog Module

   In this article I will explain how to write a simple Verilog module with the help of an example. Any Verilog design block can be seen from outside as a black box with a set of inputs and outputs. The designer decides the inputs and outputs of this black box, and also how the outputs are related to the inputs (the functionality of the design). To describe this black box in Verilog we use the keyword "module". Consider the block diagram below: a black box with inputs a and b and output c.
Now we form the following truth table for output c in terms of inputs a and b:

A   B   C
0   0   1
0   1   0
1   0   0
1   1   0
The truth table can be written as an equation: C = not (A or B) = ~(A | B).
This is a NOR gate, and this is the functionality to be implemented in the black box shown above.

/*
File Name : norgate.v
Design Name : NorGate
Engineer : Vipin
Function : Implements a NOR gate.
Date : 24-09-10
*/

//Declaration of the module "NORGATE".
module NORGATE
        ( a,   //First input to NOR
          b,   //Second input to NOR
          c    //Output of NOR
        );  //End of port list

//Input and output declarations:
input a;
input b;
output c;

//NOR gate - "gate1" is the gate name and first signal name is output the rest being inputs.
nor gate1(c,a,b);

//End of the module, specified by "endmodule".
endmodule

Now we will analyse this code part by part.
In Verilog, comments can be added in two ways:
1) Single line comments, which start with "//".
2) Multi-line comments, which start with "/*" and end with "*/".
The name of the black box is given after the keyword "module". In our case it is "NORGATE".
Next, the input and output names are written inside the brackets ( ). Normally we write all the input names first and then the output names, but note that this is not a rule: output names can be mixed with input names. For readability, though, we generally follow the conventional order.
Once the bracket is closed, we declare each signal in the port list as either an input or an output.
Now we can define the functionality of the black box. Here we use the "nor" gate primitive available in Verilog to implement the truth table given above.
Once the functionality is coded, we have to end the black box with the "endmodule" keyword.
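
To check the module in simulation, a small testbench can be written. The sketch below is only illustrative; the file name, instance names and delays are my own choices and are not part of the original design.

/*
File Name : norgate_tb.v (illustrative)
Function : A minimal testbench sketch for the NORGATE module above.
*/
module NORGATE_TB;

reg a, b;    //The testbench drives the inputs, so they are declared as reg.
wire c;      //The output of the design under test is observed on a wire.

//Instantiate the design under test.
NORGATE dut (a, b, c);

initial
begin
    //Apply all four input combinations, 10 time units apart.
    a = 0; b = 0;
    #10 a = 0; b = 1;
    #10 a = 1; b = 0;
    #10 a = 1; b = 1;
    #10 $finish;
end

//Print the signal values whenever any of them changes.
initial
    $monitor("time=%0t a=%b b=%b c=%b", $time, a, b, c);

endmodule

Running this in a Verilog simulator should reproduce the truth table given earlier.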

This is about as simple as a Verilog module can get. In my future articles I will give more examples explaining the other features and keywords available in Verilog.

Introduction to Verilog - a Hardware description language

   Digital systems are very complex, and as we go down in abstraction level, from the algorithmic or block level to the register or gate level, the complexity increases even more. The design of such systems is always a challenge to the design engineer. To aid the design process of such systems, hardware description languages (HDLs) were developed.

   Some of the popular HDLs are VHDL, Verilog, etc. In this blog I have created a separate section for Verilog which will discuss its various features and how-tos. In another blog I have shared many tips and tricks related to VHDL programming.

   Verilog describes a digital system as a group of modules. Each module has an interface to other modules and contains a description of its contents. A module represents a logical unit that can be described either by specifying its internal logical structure or by describing its behavior in a program-like manner. These modules are then interconnected with wires, allowing them to communicate with each other. Verilog is easy to learn compared with other HDLs like VHDL, especially if you already know a programming language like C.

Abstraction Levels:
   Verilog allows you to describe a digital design at various abstraction levels. Suppose you have an algorithm to be implemented in the form of a digital circuit; you can use Verilog constructs to express it directly, without worrying about the underlying circuit. Lower abstraction levels are also supported by Verilog.

A few important abstraction levels are given below, with some explanation; a short example contrasting two of them follows the list:
1) Behavioral model: This is the highest level of abstraction provided by Verilog and is very similar to a high level programming language like C. The code does not contain any circuit elements at this level; we are only concerned with the algorithmic flow.
2) Register-transfer level (RTL): At this level we describe the circuit operation as the transfer of data between registers. Generally we can say that any code which is synthesizable is at the RTL level of abstraction.
3) Gate level: At this level we use the fundamental digital gates and their derived versions to implement the design, using only primitives like AND, OR and NOT. This level is rarely used directly, since the design becomes very complicated and time consuming compared to RTL and behavioral models.
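
As an illustration of the difference between these levels, below is the same 2-to-1 multiplexer written once in a behavioral/RTL style and once at gate level. The multiplexer and the module names are only an example of mine, chosen for this sketch.

//The same 2-to-1 multiplexer described at two abstraction levels (sketch).

//Behavioral/RTL style: describe what the circuit does.
module mux2_rtl (input a, input b, input sel, output reg y);
    always @(*)
        if (sel)
            y = b;
        else
            y = a;
endmodule

//Gate level style: build the same function from gate primitives.
module mux2_gate (input a, input b, input sel, output y);
    wire sel_n, t1, t2;
    not g1 (sel_n, sel);
    and g2 (t1, a, sel_n);
    and g3 (t2, b, sel);
    or  g4 (y, t1, t2);
endmodule

Both descriptions synthesize to the same logic; the behavioral one is simply easier to write and read.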

Tuesday, September 7, 2010

What is a Content-addressable memory?

   Content-addressable memory (CAM) is a special type of computer memory used in certain very high speed searching applications. It is also known as associative memory. One of its applications is in caches, for faster tag searching.

   Normally a memory is accessed using an address, which is used to retrieve the data (the output you get) stored at that location. In a CAM, the user instead supplies a data word and the CAM searches its entire memory to see whether that data word is stored anywhere in it. If the data word is found, the CAM returns the storage address (or a list of addresses) where the word was found.
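
To make the idea concrete, here is a rough behavioral Verilog sketch of a tiny binary CAM with only four 8-bit entries. The sizes, port names and the single-match behaviour are my own assumptions; real CAMs use dedicated match-line circuitry rather than a loop like this.

//A tiny 4-entry binary CAM, behavioral sketch only.
//The search word is compared with every stored word; match goes high
//and match_addr gives the first matching location. Write logic is omitted.
module cam4 (
    input  [7:0]     search_word,
    output reg       match,
    output reg [1:0] match_addr
);

reg [7:0] mem [0:3];   //The four stored data words.
integer i;

always @(*)
begin
    match      = 1'b0;
    match_addr = 2'd0;
    for (i = 0; i < 4; i = i + 1)
        if (!match && mem[i] == search_word)
        begin
            match      = 1'b1;
            match_addr = i;   //Truncated to 2 bits.
        end
end

endmodule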

   Because a CAM searches its entire memory in a single clock cycle, it is much faster than RAM in search applications. But this also makes a CAM more complicated: the extra hardware required for the parallel search makes it more expensive, and because the comparators are active all the time, the power dissipation is high. This may be a problem in mobile devices.

   There are two types of CAM: binary and ternary. A binary CAM is the simplest type and can only search for words consisting of ones and zeros. A ternary CAM (TCAM) allows a third matching state of "X" or "don't care" for one or more bits in the stored data word. For example, searching for "1X1" in a ternary CAM will match both "101" and "111" in the memory. This requires more hardware and makes the TCAM more expensive.
  CAM type memories are used in caches, routers, data compression hardware, artificial neural networks, etc.

More about cache (cont from last post).

    In a previous article I wrote about basic cache principles, average access time, etc. In this article I will give some more details on how a cache works.

   There are two types of cache writing: write-back (also called copy-back) and write-through.
When the processor updates the data at a memory location that is held in the cache, the cached copy is updated. If the data is updated only in the cache, the policy is called write-back. If the update happens in both the cache and main memory, it is called write-through. Write-through keeps the cache and memory synchronized. With write-back, since the cached data is no longer the same as the main memory data, the line is marked as "dirty". The dirty data is written back into main memory only when that particular line is evicted from the cache. If a miss happens in a write-back cache, it may sometimes require two memory accesses to service: one to write the dirty line back to memory and another to read the new location from memory.
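
The sketch below shows, in a very simplified way, how the two policies differ for a single cache line on a write hit. The module, parameter and signal names are my own, and the eviction and allocation logic is left out entirely.

//Write-hit handling for one cache line, illustrating the two write policies.
//WRITE_BACK = 1 selects write-back; WRITE_BACK = 0 selects write-through.
module write_policy #(parameter WRITE_BACK = 1) (
    input             clk,
    input             write_en,   //Processor writes to this line.
    input      [31:0] write_data,
    output reg [31:0] line_data,  //The cached copy of the data.
    output reg        dirty,      //Set when the cache copy differs from main memory.
    output reg        mem_write   //Request to update main memory now.
);

always @(posedge clk)
begin
    mem_write <= 1'b0;
    if (write_en)
    begin
        line_data <= write_data;   //The cache copy is always updated.
        if (WRITE_BACK)
            dirty <= 1'b1;         //Main memory is updated later, when the line is evicted.
        else
            mem_write <= 1'b1;     //Write-through: update main memory immediately.
    end
end

endmodule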

   Main memory locations may also be altered, without the cache being updated, by peripherals using DMA or by another core in a multi-core processor. This results in out-of-date data in the cache, which is called "stale" data. To solve this problem, cache coherence protocols are used between the cache controllers to keep the data consistent.

The tag store of a cache works like a CAM (content-addressable memory): for efficiency we have to compare against all the stored tags in one cycle, which requires parallel hardware. Also, the larger the memory, the longer the access time. Read more about CAM type memories here.

   Let us now see how a cache is made. Say we have a main memory with 32-bit addresses in our system and the cache size is 4 KB. Also say each line in the cache stores 32 bytes, so there are 128 lines in total. Each cache line has two fields: an address field and a data field. The 32-bit (4-byte) address is further divided into two parts: a tag (27 bits) and an offset (5 bits, used to index a particular byte among the 32 data bytes). Note that the tag contains the most significant 27 bits of the address here. This kind of cache is called a fully associative cache. Since the tag is 27 bits (relatively long), comparing the tags takes more time when reading from a fully associative cache, and more hardware is required to read all the tags in parallel from the CAM type tag store. So they are expensive but more efficient.
Fully associative cache
   Another type of cache architecture is known as the direct mapped cache. Here the address is divided into three fields: a tag (20 bits), an index (7 bits, used to index the 128 lines in the cache) and an offset (5 bits). The problem with this type of cache is that it is less efficient, because a block of main memory cannot be copied to just any line in the cache as it can in a fully associative cache: all addresses with the same index map to the same cache line. In certain situations you may therefore get a cache miss for almost every access. On the other hand, the cache access time is lower. So direct mapped caches are cheap but less efficient.
Direct mapped Cache
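For the numbers used in this example (32-bit addresses, a 4 KB cache with 32-byte lines and 128 lines), the split of the address into tag, index and offset for the direct mapped case can be written directly in Verilog. This is only a sketch of the field extraction, with names of my own choosing, not a full cache.

//Splitting a 32-bit address for the direct mapped cache described above
//(4 KB cache, 32-byte lines, 128 lines).
module dm_addr_split (
    input  [31:0] addr,
    output [19:0] tag,     //32 - 7 - 5 = 20 bits.
    output [6:0]  index,   //2^7 = 128 lines.
    output [4:0]  offset   //2^5 = 32 bytes per line.
);

assign offset = addr[4:0];
assign index  = addr[11:5];
assign tag    = addr[31:12];

endmodule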
     Another type of cache is called the set associative cache, which combines the advantages of both the direct mapped and the fully associative cache. These are further subdivided based on the number of ways (and hence the number of bits in the index field):
1) 2-way set associative cache: here we have two groups of lines, each containing 64 lines. The cache has the same fields as a direct mapped cache, but the tag has 21 bits and the index has 6 bits.
2) 4-way set associative cache: here we have 4 groups, each containing 32 lines. The index has 5 bits and the tag has 22 bits.
2-way and 4-way set associative caches

What is cache memory?

   A cache is a memory that improves the performance of the processor by transparently storing data so that future requests for that data can be served faster. The data stored in a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere.

 


   Access to the cache results in one of two outcomes: a cache hit or a cache miss. A cache hit means that the requested data is present in the cache; a cache miss means that it is not. On a cache hit the processor takes the data from the cache itself for processing. On a cache miss the data is fetched from its original memory location. Cache memories are volatile and small in storage size. Since the storage size is small, the address decoding takes less time, and hence caches are faster than the normal physical memory (RAM) in a computer.

  As I said, the data is stored transparently in the cache. This means that whoever requests the data need not know whether it is served from the cache or from system memory; this is handled by the processor. The word cache comes from the French word for "to hide" or "conceal".

A simple cache line contains three fields:
1) An index, which is local to the cache.
2) A tag, which is the index with reference to main memory. This lets the processor know the location in main memory where the original copy of the data is stored.
3) The data, which is the actual data needed by the processor.

   When the processor needs some data from memory, it first checks the cache. It compares against all the tag fields in the cache to see whether the required data is available there. If a matching tag is found, the corresponding data is taken. Otherwise a cache miss occurs and main memory is accessed. The cache is then updated with this most recent memory access; this is called a cache update on a cache miss.

   During a cache update, if the cache is full, a line has to be evicted. Which line to evict is decided by a cache replacement algorithm. Some algorithms are:
1) LRU - the least recently used data is replaced.
2) MRU - the most recently used data is replaced.
3) Random replacement - simple to implement, used in some ARM processors (a sketch is given after this list).
4) Belady's algorithm - discards the data that will not be needed for the longest time in the future. It is not implementable in practice, since it requires knowledge of future accesses, but it serves as a theoretical optimum.
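
Random replacement is popular in hardware because the "random" choice can come from something as cheap as a free-running LFSR. The sketch below assumes a 4-way set associative cache and picks one of the four ways to evict; the module name and the LFSR polynomial are my own choices.

//Pseudo-random victim selection for a 4-way set associative cache (sketch).
//A free-running 8-bit LFSR; its two low bits pick the way to evict.
module random_victim (
    input        clk,
    input        rst,
    output [1:0] victim_way
);

reg [7:0] lfsr;

always @(posedge clk)
    if (rst)
        lfsr <= 8'h01;   //Any non-zero seed works.
    else
        lfsr <= {lfsr[6:0], lfsr[7] ^ lfsr[5] ^ lfsr[4] ^ lfsr[3]};

assign victim_way = lfsr[1:0];

endmodule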

The average memory access time of a system with a cache can be calculated from the hit and miss ratios of the cache:
Average memory access time = (Time_cache * Hit_ratio) + ((Time_cache + Time_mm) * Miss_ratio)
where,
Time_cache and Time_mm are the times needed to access a location in the cache and in main memory respectively, and Hit_ratio and Miss_ratio are the hit and miss probabilities. For example, if Time_cache = 10 ns, Time_mm = 100 ns and the hit ratio is 0.9, the average access time is 10*0.9 + (10+100)*0.1 = 20 ns.
For an example on this, see the 7th problem on this post.

Note: In the next post I will give more details on caches. Subscribe to the blog feeds for regular updates.

Monday, September 6, 2010

Difference between Harvard and Von Neumann computer architectures

There are two basic types of digital computer architecture. The first one is called the Von Neumann architecture; the Harvard architecture was adopted later for designing digital computers.

Von Neumann Architecture:
  • It is named after the mathematician and early computer scientist John von Neumann.
  • The computer has a single storage system (memory) for storing the data as well as the program to be executed.
  • The processor effectively needs two clock cycles to complete an instruction, because the instruction fetch and the data access have to share the single memory and therefore cannot be overlapped.
  • In the first clock cycle the processor fetches the instruction from memory and decodes it. In the next clock cycle the required data is taken from memory. This cycle repeats for each instruction, hence the two cycles per instruction.
  • This is the older of the two architectures; later designs increasingly moved towards the Harvard approach.
Harvard Architecture:

  • The name originates from the "Harvard Mark I", an early relay-based computer.
  • The computer has two separate memories, one for the program and one for the data.
  • The processor can complete an instruction in effectively one cycle if appropriate pipelining strategies are implemented.
  • In the first stage of the pipeline the instruction to be executed is fetched from the program memory. In the second stage the data is accessed from the data memory using the decoded instruction or address.
  • Most modern computing architectures are based on a (modified) Harvard architecture, usually with separate instruction and data caches, although the number of pipeline stages varies from system to system.
 These are the basic differences between the two architectures. A more comprehensive list, with respect to the ARM class of processors, can be found here.