# Analysis of Mesh Topology of NoC for Blocking and Non-blocking Techniques Ashish Valuskar Research Scholar ECE Department MANIT Bhopal M.P, India Madhu Shandilya Professor ECE Department MANIT, Bhopal M.P, India Arvind Rajawat Professor ECE Department MANIT, Bhopal M.P, India ## **ABSTRACT** Network on Chip is efficient on-chip communication architecture for system on chip architectures. It enables the integration of a large number of computational and storage blocks on a single chip. The router is the basic element of NoC. The router architecture can be used for building a NoC with standard topology with low latency and high speed. In this paper, we implement and analyze a 3x3 mesh network configuration with routers which can support simultaneous routing requests, with blocking and non blocking inputs. **Keywords:** Network on Chip (NoC), Flit, Finite State Machine (FSM), Router Architecture, packet. #### 1. Introduction Network on Chip (NoC) is a new paradigm for integrating a large number of IPs cores for implementing a SoC [1-2]. In NoC paradigm a router-based network is used for packet switched communication among on chip cores. Since the communication infrastructure as well as the cores from one design can be easily reused for a new product, NoC provides high possibility for reusability. Networks are composed of routers, links between routers, and Network Interfaces (NI), which implement the interface to the IP modules Network on chip [3] is described in terms of the interconnection network topology, switching mechanism, routing, flow control, queuing (buffering) and scheduling. Network topology refers to the arrangement and type of interconnection of the nodes. Various network topologies include mesh, torus, hypercube and fat-tree. Switching refers to the mechanism of moving data from a source to a destination node. The two extremes are circuit switching and packet switching. In NoCs, the packet transfer happens between the cooperating routers, with no channel reservation along the path and independent routing decisions are made along the entire path. Flow control deals with the allocation of channel and buffers to a message as it travels from source to destination. The popular flow control mechanisms include store and forward, wormhole and virtual channel. Store and forward is the simplest routing mechanism and its latency is proportional to packet size. Although there are certain buffer requirements depending on specific applications, store and forward does not reserve channels, and hence it does not lead to idle physical channels [4]. The rest of the paper is organized as follows: Section 2 describes the network topology. Section 3 describes the routing algorithm. In section 4 we present the overview of router and its architecture. In section 5 we present simulation results. Section 6 concludes the paper. ## 2. Topology Network topology refers to the shape of the on-chip network. The network topology influences the router design in a NoC [4]. The way of different nodes in a network are connected and communicate to each other are determined by the network topology. The example topologies for NoC are mesh, 2D-torus etc. This architecture is based on an m x n mesh network where every router, except those at the edges, is connected to four neighboring routers and one computation resource (IP) through communication channel [9]. This topology allows integration of large number of IP cores in a regular shape structure. A channel consists of two unidirectional links between two routers or between a router and a resource. Fig. 1 shows a 3 x 3 mesh NoC with nine functional IP blocks. Fig 1:- A 3x3 mesh topology network [6] # 3. Utilized Routing Algorithm The XY routing algorithm is one kind of deterministic routing algorithm. For a 2-Dimension NoC mesh topology, each router can be identified by its coordinate (X, Y) (fig.1). We use XY routing due to its simplicity and low area overhead. Queuing refers to the strategy of storing the packets received in case of output contention. Different queuing (buffering) strategies include input buffering, output buffering and both input output buffering. We use both input and output buffering to decrease congestion. In XY routing, In the Input Channel, once the FIFO is filled, the X-coordinate of the destination router (say Hx) is compared with the locally stored X coordinate of the Router first to decide on the horizontal displacement. If Hx > X then the packet is forwarded to the East port of the Router, and if Hx < X then the packet goes out through the West port of the router. If Hx is equal to X then the Y-coordinate of the destination router (Hy) is compared with local Y coordinate of the Router to decide on the vertical displacement. If Hy > Y then the packet is forwarded to the North port and if Hy < Y the packet is forwarded to the South port. When Hy equals Y it indicates that the packet is at the destination router and so the packet is forwarded to the local port. ## 4. Router Architecture A router has a set of ports, namely, Local (L), North (N), East (E), South(S) and West (W), to communicate with the local logic element and the neighboring routers. It receives the incoming packets and forwards them to the appropriate port. Buffers are present at various ports to store the packets temporarily. A control logic will be present to take routing decisions and arbitration decisions. In this work, we design a 2D mesh noc router. The motivation is to reduce the area which also reduces power consumed. We choose one of the popular methods of buffering called store and forward as it has the simplest decoding logic. The router switches with a set of inter communicating ports; define the physical layer of the NoC system. There are two types of ports to establish communication, namely the input and output port. Here, two handshake signals (Req /Ack) is used to communicate between the input and output ports and forms data link layer. ## 5. Packet Description In our router implementation, we have used FIFO of size 16 available in Xilinx [5]. The fig.2 shows the structure of the packet format. Switching strategies define the granularity of data transfer and the applied switching technique. PEs (nodes) generates messages that are partitioned into possibly several data packets. A packet is further divided into multiple flits (flow control unit). A flit is an elementary packet on which link flow control operations are performed, and is essentially a synchronization unit between routers. Each flit is made up of one or more phits (physical units). A phit is a unit of data that is transferred on a link in a single cycle. The size of a phit is typically the width, in bits, of the communication link here The flit size is 8 bits. The first flit is always the header having the coordinates of the destination router. In this work, we fixed the number of X and Y bits at 2 each. Hence it can support a maximum of 4x4 NoC system. There is no trailer flit and hence the maximum data size is 120 bits per packet (for the FIFO of depth 16) which is sufficient for communication between the cores. Fig 2: - Structure of Packet Format [6] # **6. Router Implementation** The router has three main blocks - input channel, output channel and crossbar switch. (Fig 3) **6.1. Input Channel** -There is one input channel at each port, each running its control logic. Each input channel has a FIFO of depth 16 and data width of 8 bits and control logic which has been implemented as a FSM. The input channel accepts request from other neighboring router. On receiving the request, if it is free, it will acknowledge the request. The first flit is the header and following flits constitute the data. It will accept data as long as the request signal is held high. **6.2.** Crossbar Switch- The Crossbar Switch is a set of multiplexers and demultiplexers having an interconnection allowing all possible connection between the 5 input and 5 output channels. The output channel while granting the request of an input channel configures the multiplexers and demultiplexers of the cooperating input and output channels thereby establishing the connection between them for the transfer of the packet. **6.3. Output channel** - There is one output channel at each port which has an 8-bit FIFO of depth 16 and a control logic (FSM) making arbitration decisions. The output channel gets requests from the different input channels and grants one using Round Robin Arbiter (RRA) and sets the control bit lines of crossbar switch. It accepts the packet into its FIFO as long as the sending input FIFO is not empty thereby providing a simple decoding logic. **6.4. Round-Robin Arbiter (RRA)** - Round Robin Arbiter is implemented as FSM at each output channel. RRA arbitrates and decides which input channel is to be given access to that output channel when many channels are requesting the same output. Generally, the output channel must follow priority based arbitration. If a fixed priority scheme is followed, the same input channel may get access repeatedly. Hence in round robin arbiter, the priorities of the input ports are changed dynamically taking the last input port serviced into account. The round robin arbiter priority implemented clockwise i.e., if the last input port serviced was East, then the next priorities will be South, West, Local and North. Fig 3:- Router Architecture [11] ## 7. Simulation and Results The NoC router is implemented in VHDL and simulated in Xilinx ISE 13.2. The data width and the FIFO depth are parameterizable. In this work, the data width is fixed at 8 bits (flit size). The coordinates of the router are designed to be fed from primary I/O. Hence, it is necessary to initialize the routers with their coordinate values at the start of simulation. We use the Synchronous FIFO (ver. 8.1) from Xilinx Logic CORE (ver. 13.2). The parameters of the FIFO are customizable and can be appropriately set to meet the system requirements. The flow control mechanism is handshake. Both the input and output channels are buffered, so as to minimize the blockages in a store-and-forward buffering scheme. The arbitration scheme is dynamic, as there is round-robin arbiter implemented with a dynamic priority scheme. The **fig.4** shows the simulation result of 3x3 mesh network. **7.1 Router without blocking mechanism** - In this case five simultaneous requests are accepted by router. As the inputs coming to the five input channels of the router, request five different outputs thus the router is able to service at the same time and send them out to the output channels at the same time. **7.2 Router with blocking mechanism** – In this case all the packets arriving at different input channels simultaneously request the same output channel. This leads to blocking as one output channel cannot service all of them at the same time. Here the packets coming in via the north, east, south and west channels request the same local output channel of the router. So the Round Robin Arbiter at the local output port decides the order in which the input requests will be serviced. In this case for the local output port the order of service is north, east, south followed by west according to the priorities given in the arbiter. ## 7.3 3x3 mesh Network without blocking:- In this, the routing of two packets of data, one going from the local port of $0^{th}$ router (0,0) to the local port of $8^{th}$ router (2,2) and the other going from the local port of $8^{th}$ router (2,2) to the local port of $0^{th}$ router (0,0). These are the two of the four longest paths possible in the $3\times3$ matrix of routers. ## 7.4 3x3 mesh network with blocking:- In this case, the routing of two packets in 3x3 mesh network with blocking inputs, as two inputs from router (0, 0) and from router (0, 2) serviced simultaneously to the output port of local port of router coordinates (2, 1) and here the packets coming from router (0, 0) is serviced first according to round robin arbiter at the local port and then of router (2, 1). As it can't be possible to use the same route for both blocking and non blocking, because blocking happens when the packets arrive at a router at the same time, for this case to occur we need the 00 to 21 route. Fig 4:- 3x3 mesh network The Timing report for non blocking inputs of single router and blocking and non blocking inputs of 3x3 mesh network is given below in Table 1. After experimentation for different packet size it is observed that the non blocking 3x3 mesh consumes more cycles as compared to 3x3 mesh having blocking inputs. Because for blocking 3x3 mesh, the packets are routed from router (0,0) to router (2,1) i.e. there are two intermediate routers in between i.e. router (1,0) and router (2,0). In case of non blocking 3x3 mesh the packets are routed from router (0,0) to router (2,2), so in this packet has to travel from 3 intermediate routers i.e. router (1,0) router (2,0) and router (2,1), that is why it consumed more cycles to store the packet. | Packet<br>(in bit) | Single<br>Router<br>non<br>blockin<br>g (in<br>cycles) | 3x3 mesh<br>non<br>blocking<br>(in<br>cycles)[As<br>packet<br>travel via<br>three<br>intermedia<br>te router] | 3x3 mesh<br>blocking<br>(in<br>cycles)[As<br>packet<br>travel via<br>two<br>intermedi<br>ate<br>router] | |--------------------|--------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------| | 16 | 10 | 54 | 44 | | 32 | 16 | 72 | 58 | | 64 | 24 | 114 | 78 | Table 1:- Timing report of single router and 3x3 mesh of non blocking inputs To calculate the power taken by the router design, we used the Xpower Analyzer tool [5]. The power consumed by the single router is 0.081 watt. The table 2 given below shows the device utilization summary. | Logic<br>Utilization | Used | Available | Utilization | |------------------------------|------|-----------|-------------| | Number of<br>Slices | 595 | 4656 | 12% | | Number of<br>Slice flip flop | 663 | 9312 | 7% | | Number of 4<br>Input LUTs | 971 | 9312 | 10% | | Number of bonded IOBs | 111 | 232 | 47% | | Number of<br>BRAMs | 10 | 20 | 50% | **Table 2:- Device Utilization Summary** #### 8. Conclusion In Implementation of NoC architecture in FPGAs, the area occupied by the network should be kept to a minimum. This ensures that the maximum area can be utilized by the logic while maintaining the performance of the router network. Reducing area also reduces the power consumption. In this paper, implementation of both input output buffering is done which can support five simultaneous routing requests at the same time. The header overhead is 8 bits per packet and the packet size can vary between 16 and 128bits. It also characterizes the implementation of a $3\times3$ mesh network with blocking and non blocking mechanism. # 9. REFERENCES - [1] L. Benini and G. De Micheli, *Network on Chips: A New SoC Paradigm*, IEEE Computer, Jan.2002, Pages: 70-78. - [2] S. Kumar, A Network on Chip Architecture and Design Methodology, Proc. Of IEEE Annual Symposium on VLSI, 2002, Pittsburgh, USA, Pages: 117-124. - [3] J. Duato and L. Ni S. Yalamanchili. Interconnect Networks: An Engineering Approach. In IEEE CS Press, 1998. - [4] Nikolay Kavaldjiev and Gerard J.M. Smit. A survey of efficient on-chip communications for SoC. In PROGRESS 2003 Embedded Systems Symposium, October 2003. - [5] Xilinx Inc. www.xilinx.com - [6] Sudeep Pasiricha, Nikil Dutt, "On chip Communication Architectures, System on chip Interconnect", Morgan Kauffmann, 2008. - [7] Y.Salah, M.Atri and R.Tourki "Design of a 2D Mesh-Torus Router for Network on Chip", IEEE International Symposium on Signal Processing and Information Technology", 2007, Pages 626-631. - [8] S.Amir Asghari, H.Pedram and M.Khademi "A flexible design of Network on Chip router based on Handshaking Communication Mechanism", Proceedings of the 14<sup>th</sup> International CSI Computer Conference (CSICC'09), 2009, pp.225-230, 2009. - [9] Attia,B;Chouchene,W;Zitouni,A;Abid,N;Tourki,R; "A Modular Router Architecture Design For Network on Chip", 8<sup>th</sup> International Multi Conference on Systems, Signals & Devices,2011.pp.1-6, 2011. - [10] Khalid Latif, Tiberiu Seceleanu, Hannu Tenhunen, "Power and Area Efficient Design of Network-on-Chip Router through Utilization of Idle Buffers", 17th IEEE International Conference and Workshops on Engineering of Computer-Based Systems, pp.131-138, 2010. - [11]B.Sethuraman, "Novel Methodologies for performance and power efficient Reconfigurable Network on Chip", IEEE International conference on Field Programmable Logic and Applications, 2006, pp.1-2, 2006.