DATE 2001 Abstracts
Plenary -- Keynote Session
Moderator: A. Jerraya, TIMA, Grenoble, F
-
The Semiconductor Dynamic in the Information Age -- Driving New
Technologies, Trends and Markets
-
U. Schumacher, CEO, Infineon, Munich, D
Moderators: T. Kropf, Robert Bosch GmbH, D; H. Eveking, TU Darmstadt, D
-
Abstraction of Word-Level Linear Arithmetic Functions from Bit-Level Component
Descriptions [p. 4]
-
P. Dasgupta, P. Chakrabarti, A. Nandi, S. Krishna, and A. Chakrabarti
RTL descriptions for word-level arithmetic components
typically specify the architecture at the bit-level of the registers.
The problem studied in this paper is to abstract
the word-level functionality of a component from its bit-level
specification. This is particularly useful in simulation
since word-level descriptions can be simulated much faster
than bit-level descriptions. Word-level abstractions are also
useful for reducing the complexity of component matching,
since the number of words is significantly smaller than the
number of bits. This paper presents an algorithm for abstraction
of word-level linear functions from bit-level component
descriptions. We also present complexity results for
component matching that justify the advantage of performing
abstraction prior to component matching.
-
Biasing Symbolic Search by Means of Dynamic Activity Profiles [p. 9]
-
G. Cabodi, P. Camurati, and S. Quer
We address BDD based reachability analysis, which is the core technique
of symbolic sequential verification and Model Checking.
Within this framework, non purely breadth-first and guided traversals
have shown their value to improve efficiency by reducing memory
consumption for BDD representation.
We propose a guided search strategy exploiting performance statistics.
These activity figures are gathered through a continuous and dynamic
learning process on a variable-by-variable basis. This technique is
completely integrated with the reachability analysis routine, as it is fully
compatible with dynamic reordering and allows multiple partial traversal
phases. We thus move away from the static and manual schemes, which
are one of the main limitations of previous approaches.
Experiments are given to demonstrate the efficiency and robustness
of the approach.
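For orientation, the fixed-point computation that underlies reachability analysis can be sketched with explicit Python sets standing in for BDDs. This is a plain breadth-first baseline, not the guided strategy the abstract proposes; the function name `reachable_states` and the toy counter are assumptions made purely for illustration.

```python
def reachable_states(initial, successors):
    """Breadth-first fixed point; explicit sets stand in for BDDs."""
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        # Image computation: all successors of the current frontier.
        image = set()
        for state in frontier:
            image.update(successors(state))
        # Keep only the states that were not seen before.
        frontier = image - reached
        reached |= frontier
    return reached


# Hypothetical 3-bit counter that wraps from 5 back to 0,
# so the encodings 6 and 7 are never reached.
def counter_step(state):
    return {0 if state == 5 else state + 1}


states = reachable_states({0}, counter_step)
```

A guided traversal would change which part of the frontier is expanded in each step, while the termination condition (an empty frontier) stays the same.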
Moderators: W. Rosenstiel, FZI/Tuebingen U, D; E. Villar, Cantabria U, ES
-
A Methodology for Interfacing Open Source SystemC with a Third Party Software
[p. 16]
-
L. Charest, M. Reid, E. Aboulhamid, and G. Bois
SystemC is a new open source library in C++ for
developing cycle-accurate or more abstract models of
software algorithms, hardware architecture and system-level
designs. SystemC is meant to be an interoperable
modeling platform allowing seamless tool integration.
Our objective is to evaluate the feasibility of linking
third-party software to SystemC without modifying the
SystemC source. We chose the development of a GUI as
such an application. This application illustrates a set of
applications following the observer pattern defined
recently in software engineering. This class of
applications can be loosely coupled to a platform
designed following specific rules of software reuse.
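The observer pattern mentioned above can be illustrated with a minimal sketch. It is written in Python rather than C++ and does not use any SystemC API; the class names `Signal` and `TraceView` are hypothetical stand-ins for a simulated signal and a GUI view that is attached without changing the signal's code.

```python
class Signal:
    """Subject: holds a value and notifies registered observers on change."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def write(self, value):
        if value != self.value:
            self.value = value
            for observer in self._observers:
                observer(self.name, value)


class TraceView:
    """Observer: stands in for a GUI widget by recording updates."""

    def __init__(self):
        self.log = []

    def __call__(self, name, value):
        self.log.append((name, value))


clk = Signal("clk")
view = TraceView()
clk.attach(view)
clk.write(1)
clk.write(1)  # no change, so no notification
clk.write(0)
```

The subject only knows that observers are callable, which is what keeps the coupling between the platform and the attached tool loose.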
-
Behavioral Synthesis with SystemC [p. 21]
-
G. Economakos, P. Oikonomakos, I. Panagopoulos, I. Poulakis, and G.
Papakonstantinou
Having to cope with the continuously increasing complexity
of modern digital systems, hardware designers
are considering more and more seriously language based
methodologies for parts of their designs. Last year, the introduction
of a new language for hardware descriptions, the
SystemC C++ class library, initiated a closer relationship
between software and hardware descriptions and development
tools. This paper presents a synthesis environment and
the corresponding synthesis methodology, based on traditional
compiler generation techniques, which incorporate
SystemC, VHDL and Verilog to transform existing algorithmic
software models into hardware system implementations.
Following this approach, reusability of software
components is introduced in the hardware world and time-to-market
is decreased, as shown by experimental results.
-
SystemCSV -- An Extension of SystemC for Mixed Multi-Level
Communication Modeling and Interface-Based System Design [p. 26]
-
R. Siegmund and D. Müller
An extension of SystemC for mixed multi-level communication
modeling and interface-based system design is proposed
in this paper. SystemC SV provides a new design unit,
the interface, which enables specification, design and verification
of system communication separately from system
functionality, thus introducing a new quality of system design
into SystemC. The concepts and computational model
of SystemC SV interfaces are presented together with a design
example, the digital part of a wireless SmartCard
transponder-reader/writer system.
Organizer: Y. Zorian, LogicVision, USA
Moderator: P. Prinetto, Politecnico di Torino, IT
Speakers: J. Teixeira, IST/INESC, PT; I. Teixeira, IST/INESC, PT; C. Pereira,
UFRGS, BR; O. Dias, IST/INESC, PT; J. Semiao, IST/INESC, PT; P. Muhmenthaler,
Infineon, D; Y. Zorian, LogicVision, USA; W. Radermacher, Agilent, USA
-
Test Resource Partitioning: A Design and Test Issue [p. 34]
Product development economics and specs drive
the need for on chip embedded test functionality.
However, optimal partitioning of test functionality
between a tester and a SOC is a non-trivial task,
which must be solved during the system analysis
phase. Hence, at system level, a trade-off analysis
must be performed, in order to evaluate the costs and
benefits of different partitioning schemes. The purpose
of this contribution is to present a methodology
and tools, using the Object Oriented (OO) Paradigm
and UML, and a set of architectural Quality Metrics
(QMs), to analyze the impact of different TRP
schemes on the system's architecture. A 4-core SOC case
study is presented to guide the discussion.
Organizer and Moderator: P. van Staa, Robert Bosch GmbH, D
Speaker: T. Beck, ETAS GmbH, D
-
Current Trends in the Design of Automotive Electronic Systems [p. 38]
Future developments in the automotive industry will
be governed by a variety of different requirements. Our
vision of a modern vehicle includes comprehensive
safety, a high degree of comfort, low energy
consumption, and minimal pollutant emission. These
demands can only be accomplished by employing
interconnected intelligent electronic devices, capable of
processing and sharing information about the car, the
driver, the environment, and other sources of data. The
implementation of such features will be critical for the
manufacturer's success and puts a high pressure on the
development process itself and the hardware- and
software-tools used for every step in this process.
Moderators: G. Martin, Cadence, USA; R. Seepold, FZI, D
-
Component Selection and Matching for IP-Based Design [p. 40]
-
T. Zhang, L. Benini, and G. De Micheli
Intellectual Property (IP) reuse is one of the most promising
techniques addressing the design complexity problem. IP reuse assumes
that pre-designed components can be integrated into the design
under development, thereby reducing design complexity and
time. On the other hand, as the number of IP providers increases,
the selection of the best IP block for a given design becomes more
challenging and time-consuming. In this paper, we present an
IP component matching system targeting automatic component
searching and matching across the Internet. The system is based
on Extensible Markup Language (XML) specification both for IP
libraries (a repository of pre-designed IP components indexed by
their corresponding specifications) and IP user queries (specifications
with incomplete/uncertain attributes). An IP query is
parsed into a document object model (DOM) and the DOM is
transformed to an internal tree-structured model. Fuzzy logic
scoring and aggregation algorithms are applied to the internal
tree structure to provide a set of candidate approximate matches
ranked by proximity between the query and IP specification.
-
A Universal Communication Model for an Automotive System Integration Platform [p. 47]
-
T. Demmeler and P. Giusto
In this paper, we present a virtual integration platform
based design methodology for distributed automotive systems.
The platform, built within the 'Virtual Component
Co-Design' tool (VCC), provides the ability to distribute
a given system functionality over an architecture so as
to validate different solutions in terms of cost, safety requirements,
and real-time constraints. The virtual platform
constitutes the foundation for design decisions early
in the development phase, therefore enabling decisive and
competitive advantages in the development process. This
paper focuses on one of the key enablers of the methodology,
the Universal Communication Model (UCM). The
UCM is defined at a level of abstraction that allows accurate
estimates of the performance including the latencies
over the bus network, and good simulation performance.
In addition, due to the high level of reusability and parameterization
of its components, it can be used as a
framework for modeling the different communication protocols
common in the automotive domain.
-
An Efficient Architecture Model for Systematic Design of Application-Specific
Multiprocessor SoC [p. 55]
-
A. Baghdadi, D. Lyonnard, N. Zergainoh, and A. Jerraya
In this paper, we present a novel approach for the
design of application specific multiprocessor systems-on-chip.
Our approach is based on a generic architecture
model which is used as a template throughout the design
process. The key characteristics of this model are its great
modularity, flexibility and scalability which make it
reusable for a large class of applications. In addition, it
accelerates the design cycle. This paper focuses on
the definition of the architecture model and the systematic
design flow that can be automated. The feasibility and
effectiveness of this approach are illustrated by two
significant demonstration examples.
Moderators: N. Fristacky, Slovak TU, SLK; F. Rammig, C-LAB/Paderborn U, D
-
The Simulation Semantics of SystemC [p. 64]
-
J. Ruf, D. Hoffmann, J. Gerlach, T. Kropf, W. Rosenstiel, and W. Mueller
We present a rigorous but transparent semantics definition
of SystemC that covers method, thread, and clocked
thread behavior as well as their interaction with the simulation
kernel process. The semantics includes watching
statements, signal assignment, and wait statements as they
are introduced in SystemC V1.0. We present our definition
in the form of distributed Abstract State Machine (ASM) rules
reflecting the view given in the SystemC User's Manual and
the reference implementation. We mainly see our formal
semantics as a concise, unambiguous, high-level specification
for SystemC-based implementations and for standardization.
Additionally, it can be used as a sound basis to investigate
SystemC interoperability with Verilog and VHDL.
-
MetaRTL: Raising the Abstraction Level of RTL Design [p. 71]
-
J. Zhu
The register transfer level (RTL) abstraction has been established
as the industrial standard for ASIC design, soft IP exchange
and the backend interface for chip design at higher
levels. Unfortunately, the "synthesizable" VHDL/Verilog incarnation
of the RTL abstraction has problems which prevent
it from more productive use: the confusion
that results from using simulation semantics for synthesis
purposes, the lack of facilities for component reuse at the
"protocol" level, and the lack of memory abstraction. After
a detailed discussion of these problems, this paper proposes
a new RTL abstraction, called MetaRTL, which can
be implemented by a modest extension to the traditional imperative
programming languages. The productivity gain is
further demonstrated by the description of a synthesis tool,
called MetaSyn, which provides the "added-value". Experiments
on the benchmark set show that MetaRTL is far more
concise than the "synthesizable" HDL specification, and incurs
no overhead in the synthesis results.
-
A Model for Describing Communication between Aggregate Objects in the
Specification and Design of Embedded Systems [p. 77]
-
K. Svarstad, G. Nicolescu, and A. Jerraya
The elevation of design description abstractions is a well
accepted technique for handling the complexity and shortening
the design time of modern embedded systems. It is
shown that abstractions for communication are as important
as for behaviour for specification and system level abstractions,
and an extension on a novel higher level communication
mechanism which has features for supporting
the description of complex aggregate associations between
objects in specifications such as UML is investigated. The
communication primitives have been implemented as extensions
to SystemC, and a comprehensive example from a UML
specification through functional specification down to an
executable SystemC description is included.
Moderators: P. Harrod, ARM, UK; B. Becker, Freiburg U, D
-
Circuit Partitioning for Efficient Logic BIST Synthesis [p. 86]
-
A. Irion, G. Kiefer, H. Vranken, and H. Wunderlich
A divide-and-conquer approach using circuit
partitioning is presented, which can be used to
accelerate logic BIST synthesis procedures. Many
BIST synthesis algorithms contain steps with a time
complexity which increases more than linearly with the
circuit size. By extracting sub-circuits which are
almost constant in size, BIST synthesis for very large
designs may be possible within linear time. The
partitioning approach does not require any physical
modifications of the circuit under test. Experiments
show that significant performance improvements can
be obtained at the cost of a longer test application time
or a slight increase in silicon area for the BIST
hardware.
Keywords: circuit partitioning, deterministic BIST,
divide-and-conquer
-
Deterministic Software-Based Self-Testing of Embedded Processor Cores [p. 92]
-
A. Paschalis, D. Gizopoulos, N. Kranitis, M. Psarakis, and Y. Zorian
A deterministic software-based self-testing methodology
for processor cores is introduced that efficiently tests the
processor datapath modules without any modification of
the processor structure. It provides a guaranteed high
fault coverage without the repetitive fault simulation
experiments that are necessary in pseudorandom
software-based processor self-testing approaches. Test
generation and output analysis are performed by utilizing
the processor functional modules like accumulators
(arithmetic part of ALU) and shifters (if they exist)
through processor instructions. No extra hardware is
required and there is no performance degradation.
-
Memory Fault Diagnosis by Syndrome Compression [p. 97]
-
J. Li and C. Wu
In this paper we present a data compression technique
that can be used to speed up the transmission of diagnosis
data from the embedded RAM with built-in self-diagnosis
(BISD) support. The proposed approach compresses the
faulty-cell address and March syndrome to about 28% of
the original size under the March-17N diagnostic test algorithm.
The key component of the compressor is a novel
syndrome-accumulation circuit, which can be realized by
a content-addressable memory. Experimental results show
that the area overhead is about 0.9% for a 1Mb SRAM with
164 faults. The proposed compression technique reduces
the time for diagnostic test, as well as the tester storage capacity
requirement.
-
Diagnosis for Scan-Based BIST: Reaching Deep into the Signatures [p. 102]
-
I. Bayraktaroglu and A. Orailoglu
For partitioning-based diagnosis in a scan-based BIST
environment, an exact analysis scheme, capable of identifying
all scan cells that receive incorrect data, is proposed.
In contrast to previously suggested approaches, the scheme
we propose identifies all failing scan cells with no ambiguity
whatsoever. Not only do we resolve failing scan cells
unambiguously, but we do so at the earliest possible instance
through reexamination of already computed signatures.
Intensive utilization of this highly precise diagnostic
state information leads to prognostic information regarding
the usefulness of running upcoming tests which in turn leads
to reductions in diagnosis time in excess of 30% compared
to previous approaches.
Organizer: P. van Staa, Robert Bosch GmbH, D
Moderator: S. Reiniger, DaimlerChrysler, D
-
Vehicle Electric/Electronic Architecture -- One of the Most Important
Challenges for OEM's [p. 112]
-
G. Hettich and T. Thurner
One of the most important challenges for a vehicle
manufacturer is the management of the increasing number
of networked E/E-Systems and their complex functional
dependencies.
To master this challenge, sophisticated E/E-architecture
approaches will be presented which cover
both the vertical functional orientation and the
horizontal integration aspects of a vehicle manufacturer.
Therefore we will present architectures and methods to
support the development of future
E/E-Systems, whereby the typical requirements of a
vehicle system integrator will be considered, such as
composability, hardware and software independence,
network-wide distribution of software components, and
the ability for separation between indication, operation
and behavior.
The paper describes the motivation, the system
integration requirements, actual existing solutions, future
technical challenges, and some detailed architecture
approaches themselves. Furthermore, the impacts of the
architecture on the development process and the OEM-supplier
relationship will be highlighted.
-
AIL: description of a global electronic architecture at the vehicle scale
-
Arjun Panday, Damien Couderc, Simon Marichalar
This paper introduces the Architecture
Implementation Language; a description language
that allows for an internal representation of the
architecture and acts as a connection with tools to
simplify the construction, planning, verification,
capitalisation, and documentation of an architecture.
The objective of AIL is to describe a vehicle
architecture from the level of the desired services
down to the level of physical implementation,
rendered concrete in one or more resulting
operational architectures. The proposed methodology
introduces the concepts of high level component
based architectures to the highly constrained
automotive world.
-
Methods and Tools for Systems Engineering of Automotive Electronic
Architectures
-
Jakob Axelsson
The latest generations of road vehicles have seen a tremendous
development in on-board electronic systems, which
control increasingly large parts of the functionality. In this
paper, we discuss how the vehicle manufacturers need to
adjust their methods and tools to handle the increasing
complexity. The key issue is the system integration aspect,
which calls for increasing systems engineering capabilities.
Moderators: W. Damm, Oldenburg U/OFFIS, D; C. Delgado Kloos, U Carlos III de
Madrid, ES
-
Using SAT for Combinational Equivalence Checking [p. 114]
-
E. Goldberg, M. Prasad, and R. Brayton
This paper addresses the problem of combinational
equivalence checking (CEC) which forms one of the key
components of the current verification methodology for digital
systems. A number of recently proposed BDD based
approaches have met with considerable success in this area.
However, the growing gap between the capability of current
solvers and the complexity of verification instances necessitates
the exploration of alternative, better solutions. This
paper revisits the application of Satisfiability (SAT) algorithms
to the combinational equivalence checking (CEC)
problem. We argue that SAT is a more robust and flexible
engine of Boolean reasoning for the CEC application
than BDDs, which have traditionally been the method of
choice. Preliminary results on a simple framework for SAT
based CEC show a speedup of up to two orders of magnitude
compared to state-of-the-art SAT based methods for
CEC and also demonstrate that even with this simple algorithm
and untuned prototype implementation it is only moderately
slower and sometimes faster than a state-of-the-art
BDD based mixed engine commercial CEC tool. While SAT
based CEC methods need further research and tuning before
they can surpass almost a decade of research in BDD
based CEC, the recent progress is very promising and merits
continued research.
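The formulation behind combinational equivalence checking can be sketched as a miter: two circuits are equivalent exactly when the XOR of their outputs is unsatisfiable. The sketch below uses exhaustive enumeration in place of a SAT or BDD engine, so it only scales to toy examples; the function names and the multiplexer circuits are assumptions for illustration, not the framework described in the abstract.

```python
from itertools import product


def find_counterexample(circuit_a, circuit_b, num_inputs):
    """Return an input on which the two circuits differ, or None if equivalent.

    The miter is the XOR of the two outputs. Exhaustive enumeration
    stands in for the SAT or BDD engine here.
    """
    for inputs in product([False, True], repeat=num_inputs):
        miter = circuit_a(*inputs) != circuit_b(*inputs)
        if miter:
            return inputs
    return None


# Two hypothetical implementations of a 2-to-1 multiplexer.
def mux_spec(sel, a, b):
    return a if sel else b


def mux_gates(sel, a, b):
    return (sel and a) or ((not sel) and b)


# A deliberately wrong variant with the select polarity swapped.
def mux_buggy(sel, a, b):
    return (sel and b) or ((not sel) and a)


equivalent = find_counterexample(mux_spec, mux_gates, 3) is None
witness = find_counterexample(mux_spec, mux_buggy, 3)
```

A SAT-based checker would encode the miter as a CNF formula and ask a solver for a satisfying assignment, which plays the role of `witness` here.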
-
Combinational Equivalence Checking Using Boolean Satisfiability and Binary
Decision Diagrams [p. 122]
-
S. Reda and A. Salem
Most recent combinational equivalence checking
techniques are based on exploiting circuit similarity. In
this paper, we focus on circuits with no internal
equivalent nodes or after internal equivalent nodes have
been identified and merged. We present a new technique
integrating Boolean Satisfiability and Binary Decision
Diagrams. The proposed approach is capable of solving
verification instances that neither technique alone was
able to solve. The efficiency of the proposed approach
is shown through its application to hard-to-prove
industrial circuits and the ISCAS'85 benchmark circuits.
-
An Efficient Learning Procedure for Multiple Implication Checks [p. 127]
-
Y. Novikov and E. Goldberg
In the paper, we consider the problem of checking
whether cubes from a set S are implicants of a DNF
formula D, at the same time minimizing the overall time
taken by the checks. An obvious but inefficient way of
solving the problem is to perform all the checks
independently. In the paper, we consider a different
approach. The key idea is that when checking whether a
cube C from S is an implicant of D we can deduce (learn)
implicants of D that are not implicants of C. These cubes
can be used in the following checks for search pruning.
Experiments on random DNF formulas, DIMACS
benchmarks and DNF formulas describing circuits show
that the proposed learning procedure reduces the overall
time taken by checks by up to two orders of magnitude.
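The "obvious but inefficient" baseline, in which every cube is checked independently, can be sketched as follows. A cube is an implicant of D when every completion of the cube satisfies D. The representation (cubes as dictionaries from variable name to value) and the function names are assumptions for illustration, and no learning between checks is performed.

```python
from itertools import product


def satisfies(assignment, cube):
    """True if the full assignment agrees with every literal of the cube."""
    return all(assignment[var] == val for var, val in cube.items())


def is_implicant(cube, dnf, variables):
    """Check cube -> dnf by enumerating all completions of the cube."""
    free = [v for v in variables if v not in cube]
    for values in product([False, True], repeat=len(free)):
        assignment = dict(cube)
        assignment.update(zip(free, values))
        if not any(satisfies(assignment, term) for term in dnf):
            return False  # a completion of the cube falsifies the formula
    return True


# D = a*b + a*b' + a'*c, which is equivalent to a + c.
variables = ["a", "b", "c"]
dnf = [{"a": True, "b": True}, {"a": True, "b": False}, {"a": False, "c": True}]
cubes = ({"a": True}, {"c": True}, {"b": True})
results = [is_implicant(cube, dnf, variables) for cube in cubes]
```

Each call repeats the full enumeration from scratch, which is the redundancy a learning procedure would aim to avoid.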
Organizers: D. Gajski, UC Irvine, USA; E. Villar, Cantabria U, ES
Moderator: E. Villar, Cantabria U, ES
Panellists: W. Rosenstiel, FZI/Tuebingen U, D; V. Gerousis, Infineon, D; D. Barton, Averstar, USA; J. Plantin, Ericsson, SE; P. Cavalloro, Italtel, IT; D.
Gajski, UC Irvine, USA; G. de Jong, Telelogic, B
-
C/C++: Progress or Deadlock in System-Level Specification [p. 136]
The lack of a general methodology and notation has
been identified as one of the main obstacles bedeviling
system-on-chip designers. Nevertheless, there is a lot of
confusion about what SLD (System Level Design) means
and which SLDL (System Level Design Language) is the
most appropriate.
With SOC demands there has recently been high
interest in system level design, particularly HW/SW co-design.
In order to accommodate SW, the system
companies as well as EDA vendors would like to use C
as the language for system level design. Many people
are experimenting with subsets of C and others with C++ by
introducing classes that correspond to HW
(VHDL/Verilog) concepts. C/C++ syntax has become the
most popular for defining new C/C++ language
extensions for system-level specification and design. A
wide community of system designers and EDA suppliers
believe that C/C++ is the most appropriate vehicle to use
as a next-generation language. However, there are many
challenges and open problems.
Moderators: P. Muhmenthaler, Infineon Technologies, D; E.J. Marinissen, Philips Research, NL
-
An Integrated System-On-Chip Test Framework [p. 138]
-
E. Larsson and Z. Peng
In this paper we propose a framework for the testing of
system-on-chip (SOC), which includes a set of design
algorithms to deal with test scheduling, test access
mechanism design, test sets selection, test parallelization,
and test resource placement. The approach minimizes the
test application time and the cost of the test access
mechanism while considering constraints on tests, power
consumption and test resources. The main feature of our
approach is that it provides an integrated design
environment to treat several different tasks at the same time,
which were traditionally dealt with as separate problems.
Experimental results show the efficiency and the usefulness
of the proposed technique.
-
Efficient Test Data Compression and Decompression for System-on-a-Chip Using
Internal Scan Chains and Golomb Coding [p. 145]
-
A. Chandra and K. Chakrabarty
We present a data compression method and decompression
architecture for testing embedded cores in a system-on-a-chip
(SOC). The proposed approach makes effective use
of Golomb coding and the internal scan chains of the core
under test, and provides significantly better results than a
recent compression method that uses Golomb coding and a
separate cyclical scan register (CSR). The use of the internal
scan chain for decompression obviates the need for a
CSR. In addition, the novel interleaving decompression architecture
allows multiple cores in an SOC to be tested concurrently
using a single ATE I/O channel. We demonstrate
the effectiveness of the proposed approach by applying it to
the ISCAS 89 benchmark circuits.
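As background, Golomb coding of run lengths can be sketched in a few lines. The sketch assumes the test data is viewed as runs of 0s, each terminated by a 1, and that the group size `m` is a power of two. The prefix and tail convention used here is one common choice and not necessarily the exact code used in the paper; the function names are hypothetical.

```python
def golomb_encode(bits, m):
    """Encode a 0/1 string as Golomb codewords for its runs of 0s.

    A run of l zeros terminated by a 1 is coded as l // m ones,
    a 0 separator, then l % m written in log2(m) bits.
    Assumes m is a power of two and the string ends with a 1.
    """
    if m < 1 or m & (m - 1):
        raise ValueError("m must be a power of two")
    if not bits.endswith("1"):
        raise ValueError("this sketch expects the stream to end with a 1")
    tail_bits = m.bit_length() - 1
    out = []
    for run in bits[:-1].split("1"):
        q, r = divmod(len(run), m)
        tail = format(r, "b").zfill(tail_bits) if tail_bits else ""
        out.append("1" * q + "0" + tail)
    return "".join(out)


def golomb_decode(code, m):
    """Inverse of golomb_encode for the same group size m."""
    tail_bits = m.bit_length() - 1
    out, i = [], 0
    while i < len(code):
        q = 0
        while code[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the 0 separator
        r = int(code[i:i + tail_bits], 2) if tail_bits else 0
        i += tail_bits
        out.append("0" * (q * m + r) + "1")
    return "".join(out)


data = "000000010011"  # runs of 7, 2 and 0 zeros
encoded = golomb_encode(data, 4)
```

Compression only pays off when long runs of 0s dominate; short runs cost more bits than they save, as the last run in `data` shows.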
-
Testing TAPed Cores and Wrapped Cores with the Same Test Access Mechanism
[p. 150]
-
M. Benabdenbi, W. Maroufi, and M. Marzouki
This paper describes a way of testing both wrapped cores
and TAPed cores within a System On a Chip (SoC) with the
same Test Access Mechanism (TAM). The TAM's architecture,
which is dynamically reconfigurable, scalable and flexible,
is named CAS-BUS and has a central controller. All
the cores can be tested this way in the same session through
a modified Boundary Scan Test Access Port.
-
On Applying the Set Covering Model to Reseeding [p. 156]
-
S. Chiusano, S. Di Carlo, P. Prinetto, and H. Wunderlich
The Functional BIST approach is a rather new BIST
technique based on exploiting embedded system
functionality to generate deterministic test patterns during
BIST. The approach takes advantage of two well-known
testing techniques, the arithmetic BIST approach and the
reseeding method.
The main contribution of the present paper consists in
formulating the problem of an optimal reseeding
computation as an instance of the set covering problem.
The proposed approach guarantees high flexibility, is
applicable to different functional modules, and, in general,
provides a more efficient test set encoding than previous
techniques. In addition, the approach shortens the
computation time, allows better exploitation of the trade-off
between area overhead and global test length, and
makes it possible to deal with larger circuits.
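The set covering formulation can be illustrated with a small sketch: each candidate seed covers a subset of the items that must be covered (for instance, required test patterns), and a small collection of seeds covering everything is sought. The greedy heuristic below is a swapped-in approximation, not the optimal computation the abstract refers to; the data and names are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Pick candidate names until every element of the universe is covered.

    Greedy heuristic: repeatedly take the candidate that covers the most
    still-uncovered elements. The result is a valid cover, but it is not
    guaranteed to be a minimum one.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("remaining elements cannot be covered")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical data: the patterns each seed's sequence would produce.
patterns = {"p1", "p2", "p3", "p4", "p5"}
seeds = {
    "s1": {"p1", "p2", "p3"},
    "s2": {"p2", "p4"},
    "s3": {"p3", "p4", "p5"},
    "s4": {"p5"},
}
selected = greedy_set_cover(patterns, seeds)
```

An exact solution would instead use an integer programming or branch-and-bound solver over the same covering matrix.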
Organizer: P. van Staa, Robert Bosch GmbH, D
Moderator: H. Heidbrink, Descon GmbH, D
Panellists: B. Potock, Mentor Graphics Corp, USA; J. Mueller, Rosemann
& Lauridsen GmbH, D; U. Ahle, Siemens Business Services, D; C. Basille,
Aerospatiale Matra Missiles, F; W. Kisselmann, Infineon Technologies, D;
W. Herden, Robert Bosch GmbH, D
-
Data Management -- Limiter or Accelerator for Electronic Design Creativity
[p. 162]
Data Management is the key to introducing concurrent
engineering, configuration management and work in
progress control throughout the entire design process.
This was recognized by MCAD and ERP/MRP
Software vendors years ago. Product Data Management
(PDM) solutions are used and accepted for mechanical
designs but not in electronic design departments.
The EDA industry has not been focusing on strategies
to fill the gap between business processes and design
activities. Therefore today proprietary processes on a
directory file level mostly manage variant handling and
configuration management. Standard database
management solutions or Product Data Management
applications could not reach major market shares up to
now.
Moderators: H. Gräb, TU Munich, D; J. Eckmüller, Infineon
Technologies, D
-
Efficient Bit-Error-Rate Estimation of Multicarrier Transceivers [p. 164]
-
G. Vandersteen, P. Wambacq, Y. Rolain, J. Schoukens, S. Donnay,
M. Engels, I. Bolsens
Multicarrier modulation schemes are widely used in several
digital telecommunication systems, such as Asymmetric
Digital Subscriber Lines (ADSL) and Wireless Local Area
Network (WLAN) based on Orthogonal Frequency Division
Multiplexing (OFDM). An estimate of the Bit-Error-Rate
(BER) degradation due to non-idealities in the transceiver
(e.g. nonlinear distortions in the analog front-ends, digital
clipping, ...) is much more complicated in a multicarrier
system than in a single-carrier system due to the large
number of carriers and the huge number of possible
transmitted symbols. This paper proposes a method for
estimating the BER of such OFDM modulation schemes in
a CPU time that is two orders of magnitude smaller than a
Monte-Carlo method, as confirmed by simulations on a 5
GHz IEEE 802.11 WLAN receiver front-end.
-
Efficient Time-Domain Simulation of Telecom Frontends Using a Complex Damped
Exponential Signal Model [p. 169]
-
P. Vanassche, G. Gielen, and W. Sansen
This paper presents an efficient time-domain simulation
approach for telecommunication frontends at architectural
level. It is based upon the use of complex damped exponential
modeling functions. These allow the construction of accurate
signal models for digitally modulated telecom signals, requiring
only a few modeling functions. Since these models
are valid over a long range of time, they allow for a large
timestep, which greatly speeds up time-domain simulation
of the telecom frontends. Details of a simulation approach
based upon this signal model are discussed. The approach
is verified by experimental results.
-
Simulation Method to Extract Characteristics for Digital Wireless Communication Systems [p. 176]
-
L. Nguyen and V. Janicot
In all wireless standards involving digital
modulation, new fundamental characteristics have to be
extracted for quantifying the linearity/distortion in RF
designs. This paper describes a simulation technique,
Modulated Steady State, and its use to extract these
specifications. An example of its application to a typical
RF transmitter with a π/4-DQPSK modulator is
presented.
Moderators: G. Stamoulis, Intel, USA; K. Roy, Purdue U, USA
-
Microprocessor Power Analysis by Labeled Simulation [p. 182]
-
C. Hsieh, L. Chen, and M. Pedram
In many applications, it is important to know how power is
consumed while software is being executed on the target
processor. Instruction-level power microanalysis, which is
a cycle-accurate simulation technique based on instruction
label generation and propagation, is aimed at answering
this question for a superscalar and pipelined processor.
This technique requires the micro-architectural details of
the CPU and provides the power consumption of every
module (or gate) for each active instruction in each cycle.
To validate this approach, a Zilog digital signal processor
core was designed by using a 0.25 µm TSMC cell library, and
the power consumption per instruction was collected using
a Verilog simulator specially written for the DSP core.
-
Power Aware Microarchitecture Resource Scaling [p. 190]
-
A. Iyer and D. Marculescu
In this paper we present a strategy for run-time profiling to optimize
the configuration of a microprocessor dynamically so as to
save power with minimum performance penalty. The configuration
of the processor changes according to the parallelism in the running
program. Experiments on some benchmark programs show
good savings in total energy consumption; we have observed a decrease
of up to 23% in energy/cycle and up to 8% in energy per
instruction. Our proposed approach can be used for energy-aware
computing in either portable applications or in desktop environments
where power density is becoming a concern. This approach
can also be incorporated in larger power management strategies
like ACPI.
-
Extending Lifetime of Portable Systems by Battery Scheduling [p. 197]
-
L. Benini, G. Castelli, A. Macii, E. Macii, M. Poncino, and R. Scarsi
Multi-battery power supplies are becoming popular in electronic
appliances of the latest generations, due to economical and
manufacturing constraints. Unfortunately, a partitioned battery
subsystem is not able to deliver the same amount of charge as
a monolithic battery with the same total capacity. In this
paper, we define the concept of battery scheduling, we investigate
policies for solving the problem
of optimal charge delivery, and we study the relationship
of such policies with different configurations of the battery
subsystem. Results, obtained for different workloads,
demonstrate that the choice of the proper scheduling can
bring, in the best case, system lifetime to within 1% of
that guaranteed by a monolithic battery of equal capacity.
Moderators: R. Galivanche, Intel, USA; B. Straube, FhG IIS/EAS Dresden, D
-
Efficient Spectral Techniques for Sequential ATPG [p. 204]
-
A. Giani, S. Sheng, M. Hsiao, and V. Agrawal
We present a new test generation procedure for sequential
circuits using spectral techniques. Iterative processes of
filtering via compaction and spectral analysis of
the filtered test set are performed for each primary input,
extracting inherent spectral information embedded within
the test sequence. This information, when viewed in the
frequency domain, reveals the characteristics of the input
spectrum. The filtered and analyzed set of vectors is then
used to predict and generate future vectors. We also developed
a fault-dropping technique to speed up the process.
We show that very high fault coverages and small vector
sets are consistently obtained in short execution times for
sequential benchmark circuits.
-
On the Test of Microprocessor IP Cores [p. 209]
-
F. Corno, M. Sonza Reorda, S. Squillero, and M. Violante
Testing is a crucial issue in the SOC development and
production process. A popular solution for SOCs that
include microprocessor cores is based on making them
execute a test program, thus implementing a very
attractive BIST solution. This paper describes a method
for the generation of effective programs for the self-test of
a processor. The method can be partially automated, and
combines ideas from traditional functional approaches
and from the ATPG field. We assess the feasibility and
effectiveness of the method by applying it to an 8051 core.
-
Sequence Reordering to Improve the Levels of Compaction Achievable by Static
Compaction Procedures [p. 214]
-
I. Pomeranz and S. Reddy
We describe a reordering procedure that changes the order of test
vectors in a test sequence for a synchronous sequential circuit
without reducing the fault coverage. We use this procedure to
investigate the effects of reordering on the ability to compact the
test sequence. Reordering is shown to have two effects on compaction.
(1) The reordering process itself allows us to reduce the
test sequence length. (2) Reordering can improve the effectiveness
of an existing static compaction procedure. Reordering also
provides an insight into the detection by test generation procedures
of faults that are detected by relatively long subsequences.
-
SEU Effect Analysis in an Open-Source Router via a Distributed Fault Injection
Environment [p. 219]
-
A. Benso, S. Di Carlo, G. Di Natale, and P. Prinetto
The paper presents a detailed error analysis and
classification of the behavior of an open-source router,
when affected by Single Event Upsets (SEUs). The
experimental results have been gathered on a real
communication network, resorting to an ad-hoc Fault
Injection system. The injector has been designed to
corrupt the router during its normal service and to analyze
the SEU injection effects on the overall distributed system.
The performed experiments allowed the authors to
identify the most critical memory regions and to cluster the
router variables according to their impact on system
dependability.
Organizer: A. Lock, Synopsys, USA
Moderator: R. Camposano, Synopsys, USA
Panellists: R. Camposano, Synopsys, USA; A. Cuomo, STMicroelectronics, IT;
R. Subramanian, MorphICs, USA; H. Meyr, TU Aachen, D
-
The Programmable Platform: Does One Size Fit All? [p. 226]
This special panel session brings together several
leading technologists representing organisations within
the telecom and system-on-chip design communities.
The panel will discuss the trend in platform-based
design, where new products are increasingly based on
re-programmability or re-configuration of more
general-purpose devices. Particular emphasis will be
placed on the need to meet the requirements of the
Telecom market, where flexibility is a key concern, but
with the shift towards third-generation wireless
systems, so too is performance.
Moderators: F. Johannes, TU Munich, D; R. Otten, TU Delft, NL
-
Slicing Tree is a Complete Floorplan Representation [p. 228]
-
M. Lai and D. Wong
Slicing tree has been an effective tool for VLSI floorplan design.
Floorplanners using slicing tree representation take
full advantage of shape and orientation flexibility of circuit
modules to find highly compact slicing floorplans. However,
slicing floorplans are commonly believed to suffer from poor
utilization of space when all modules are hard. For this reason,
a large body of literature has recently been devoted to
various new representations of non-slicing floorplans to improve
space utilization. In this paper, we prove that by using
slicing tree representation and compaction, all maximally
compact placements of modules can be generated. In conclusion,
slicing tree is a complete floorplan representation
for all non-slicing floorplans as well.
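As a minimal illustration of the slicing tree representation discussed in this abstract (not the authors' proof or compaction procedure), the following Python sketch evaluates the bounding box of a slicing floorplan written in postfix form. The function name `slicing_bbox`, the operator symbols `H` and `V`, and the module sizes are assumptions chosen for this example; modules are treated as hard and are not rotated.

```python
def slicing_bbox(postfix, sizes):
    """Return (width, height) of a slicing floorplan.

    postfix: list of module names and cut operators in postfix order.
             'V' places two sub-floorplans side by side (vertical cut),
             'H' stacks them on top of each other (horizontal cut).
    sizes:   dict mapping module name -> (width, height).
    All names here are hypothetical, chosen for illustration only.
    """
    stack = []
    for tok in postfix:
        if tok in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "V":
                # Side by side: widths add, height is the larger one.
                stack.append((w1 + w2, max(h1, h2)))
            else:
                # Stacked: heights add, width is the larger one.
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[tok])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# Three hard modules: a and b side by side, c stacked with that pair.
sizes = {"a": (2, 3), "b": (1, 3), "c": (3, 1)}
bbox = slicing_bbox(["a", "b", "V", "c", "H"], sizes)
```

In this small example the bounding box area equals the total module area, so the slicing floorplan has no dead space; in general, hard modules leave gaps, which is the space-utilization concern the abstract addresses.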
-
Further Improve Circuit Partitioning Using GBAW Logic Perturbation Techniques
[p. 233]
-
C. Cheung, Y. Wu, and D. Cheng
Efficient circuit partitioning is gaining more importance
with the increasing size of modern circuits. Conventionally,
circuit partitioning is solved by modeling a circuit as a hypergraph
for the ease of applying graph algorithms. However,
there is room for further improvement on even optimum
hypergraph partitioning results, if logic information
can be applied for perturbation. In this paper, we present a
multi-way partitioning framework which can couple any excellent
hypergraph partitioner with a novel logic perturbation
based (GBAW) technique for further improvement over
already excellent partitioning results. Our approach can integrate
with any graph partitioner. We performed experiments
on 2-, 3-, 4-, and 5-way partitionings for various circuits of
different sizes from MCNC benchmarks. We have chosen
the state-of-the-art hMetis-Kway to obtain high quality initial
solutions for the experiments. Our experiments showed
that this partitioning approach can achieve a further 15%
reduction in cut size for 2-way partitioning with an area
penalty of only 0.33%. The good results demonstrated the
effectiveness of this new partitioning technique.
-
Clustering Based Fast Clock Scheduling for Light Clock-Tree [p. 240]
-
M. Saitoh, M. Azuma, and A. Takahashi
We introduce a clock schedule algorithm to obtain a
clock schedule that achieves a shorter clock period and that
can be realized by a light clock tree. A shorter clock period
can be achieved by controlling the clock input timing
of each register, but the required wire length and power
consumption of a clock tree tend to be large if clock input
timings are determined without considering the locations
of registers. To overcome the drawback, our algorithm
constructs a cluster that consists of registers with the same
clock input timing located in a close area. In our algorithm,
first registers are partitioned into clusters by their
locations, and clusters are modified to improve the clock
period while keeping the radius of each cluster small.
In our experiments on industrial data with 888 registers,
the clock period achieved is 27% shorter than that achieved
by a zero-skew clock tree, and 1% longer than the theoretical
minimum. The computational time is about 24.9 seconds
and the wire length and power consumption of the clock tree
are comparable to those of a zero-skew tree.
Moderators: N. Wehn, Kaiserslautern U, D; M. Bolle, Systemonic, D
-
Power-Efficient Layered Turbo Decoder Processor [p. 246]
-
J. Dielissen, J. van Meerbergen, M. Bekooij, F. Harmsze,
S. Sawitzki, J. Huisken, and A. van der Werf
Turbo decoding offers outstanding error correcting
capabilities that will be used in wireless applications
like the Universal Mobile Telecom Standard [4]
(UMTS). However, the algorithm is very computationally
intensive, and therefore an implementation on
a general purpose programmable DSP results in a
power consumption which reduces the applicability
of turbo decoding in hand-held applications. In
this paper we present a solution based on a layered
processing architecture. This architecture includes
an application specific Very Long Instruction Word
(VLIW) processor, a data flow processor, and hardwired
execution units in a hierarchical way. The
power consumption of this solution is an order of
magnitude better than the implementation on a current
state of the art, power efficient general purpose
DSP.
-
Exploiting Data Forwarding to Reduce the Power Budget of VLIW Embedded Processors [p. 252]
-
M. Sami, D. Sciuto, C. Silvano, V. Zaccaria, and R. Zafalon
In this paper, a low-power approach to the design
of embedded VLIW processor architectures is proposed.
To resolve most data hazards in the pipeline,
processors use forwarding (or bypassing) hardware to
provide the required operands from the inter-stage pipeline
registers directly to the inputs of the function units. The
operands are then stored in the Register File during the
write-back pipeline stage. In this paper, we propose a power
optimization technique based on the exploitation of the
forwarding paths in the processor to avoid the power cost
of writing/reading short-lived variables to/from the Register
File. In application-specific embedded systems, experimental
evidence has shown that a significant number of variables
are short-lived, that is, their liveness (from first definition to
last use) spans only a few instructions. Values of short-lived
variables can be accessed directly through the forwarding
registers, avoiding write-back. An application example of
our solution to a VLIW embedded core, when accessing the
Register File, has shown a power saving up to 35% with
respect to the unoptimized approach on the given set of
target benchmarks. The performance overhead is a
one-gate delay added to the processor critical path.
Keywords: Low-Power, Pipeline Processors, VLIW
Embedded Architectures, Forwarding.
-
Design of Low-Power High-Speed Maximum a Priori Decoder Architectures [p. 258]
-
A. Worm, H. Lamm, and N. Wehn
Future applications demand high-speed maximum a posteriori
(MAP) decoders. In this paper, we present an in-depth
study of design alternatives for high-speed MAP architectures
with special emphasis on low power consumption.
We exploit the inherent parallelism of the MAP algorithm
to reduce power consumption on various abstraction
levels. A fully parameterizable architecture is introduced,
which allows the architecture to be optimally adapted to the application
requirements and the throughput. Intensive design
space exploration has been carried out on a state-of-the-art
0.2 um technology, including efficient parallelism
techniques, a data flow transformation for reduced power
consumption, and an optimized FIFO implementation.
Moderators: E. Macii, Politecnico di Torino, IT; D. Marculescu, Carnegie
Mellon U, USA
-
Low Complexity FIR Filters Using Factorization of Perturbed Coefficients
[p. 268]
-
C. Neau, K. Muhammad, and K. Roy
This paper presents a factorization based technique to
reduce the computational complexity of implementing
Finite Impulse Response (FIR) digital filters. It is possible
to design FIR filters in which all of the filter coefficients
are products of the first seven prime numbers. For such
filters, factorization of the filter coefficients allows the
reuse of intermediate results among computations
involving common factors. Since the coefficients are
products of only small prime numbers, it is also possible to
generate each of the partial products with a single shift
and add operation. Compared to a traditional
implementation, this results in a 35-50% reduction in
computational complexity, which is shown to translate into
lower power consumption.
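The reuse of intermediate results among coefficients with common factors can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the coefficients 15 and 21, and the function names `times3` and `shared_products`, are assumptions picked so that both products share the factor 3.

```python
def times3(x):
    """Multiply by 3 with one shift and one add: 3x = 2x + x."""
    return (x << 1) + x


def shared_products(x):
    """Compute 15*x and 21*x while sharing the common factor 3.

    The coefficient values are hypothetical; they only illustrate how a
    shared factor lets one intermediate result serve several products.
    """
    t3 = times3(x)          # shared intermediate result 3x
    p15 = (t3 << 2) + t3    # 15x = 5 * (3x) = 4*(3x) + 3x
    p21 = (t3 << 3) - t3    # 21x = 7 * (3x) = 8*(3x) - 3x (shift and subtract)
    return p15, p21


sample = shared_products(4)
```

Each product is obtained from the shared value `t3` with a single shift plus one add or subtract, instead of a separate full multiplication per coefficient.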
-
An Adaptive Algorithm for Low-Power Streaming Multimedia Processing [p. 273]
-
A. Acquaviva, L. Benini, and B. Riccó
This paper addresses the problem of power consumption
in multimedia system architectures and presents an algorithmic
optimization technique to achieve the goal of power
reduction in the context of real time processing. The technique
is based on a mixed speed-setting and shutdown policy.
We address the problem from both a theoretical and
practical point of view, by presenting a power efficient implementation
of an MPEG-layer3 real-time decoder algorithm
designed for wearable devices as a case study. The
target system is Hewlett-Packard's SmartBadgeIII prototype
wearable system based on the StrongARM1100
processor. Theoretical analysis as well as quantitative results
of power measurements are provided to show the effectiveness
of this technique. The experimental set-up is also
described.
-
A Static Power Estimation Methodology for IP-Based Design [p. 280]
-
X. Liu and C. Papaefthymiou
This paper proposes a novel system-level power estimation
methodology for electronic designs consisting of intellectual
property (IP) components. Our methodology relies
on analytical output and power macromodels of the
IP blocks to estimate system dissipation without performing
any simulation. We derive upper bounds on the estimation
error of our methodology and demonstrate the relation
of this error to the sensitivities of the macromodeling
functions. For circuits without feedback, we give a sufficient
condition for the worst-case power estimation error
to increase only linearly with the length of the IP cascades.
We also give a tighter sufficient condition that ensures error
boundedness in IP systems of any topology. Experiments
with signal processing and data encryption systems validate
the accuracy and efficiency of our approach. For designs of
up to 576 IP blocks, power estimates are obtained within
0.2 seconds. In comparison with switch-level simulation results,
the average error of our power estimates is 7.3%.
Moderators: C. Metra, DEIS-Bologna U, IT; R. Leveugle, TIMA, Grenoble, F
-
Optimization of Error Detecting Codes for the Detection of Crosstalk Originated Errors [p. 290]
-
M. Favalli and C. Metra
This work applies weight based codes [1] to the detection
of crosstalk originated errors. Faults of this kind, whose
importance grows with device scaling, may originate errors
that are undetectable by the most commonly used error detecting
codes in VLSI ICs. Conversely, such errors can be easily
detected by weight based codes that, however, have smaller
encoding capabilities. In order to reduce the cost of these
codes, a graph theoretic optimization is used. Moreover, new
applications of these codes are explored regarding the synthesis
of self-checking FSMs, and the detection of errors related
to the clock distribution network.
-
System Safety through Automatic High-Level Code Transformations: An Experimental Evaluation [p. 297]
-
P. Cheynet, B. Nicolescu, R. Velazco, M. Rebaudengo, M. Sonza Reorda,
and M. Violante
This paper deals with a software modification strategy
allowing the on-line detection of transient errors. Being
based on a set of rules for introducing redundancy in the
high-level code, the method can be completely automated,
and is particularly suited for low-cost safety-critical
microprocessor-based applications. Experimental
results from software and hardware fault injection campaigns
are presented and discussed, demonstrating the
effectiveness of the approach in terms of fault detection
capabilities.
-
From DFT to Systems Test -- A Model Based Cost Optimization Tool [p. 302]
-
M. Wahl, T. Ambler, C. Maaß and M. Rahman
Long lasting systems like airplanes have a cost structure
where the maintenance costs are larger than the
purchasing costs. Testing is required both for preventive
maintenance and for repair, and is a major source of
cost. Previously we have analysed test and Design for
Testability for digital systems, covering ASICs, boards
and systems. Besides, the continuous development of
technology requires cost models that can grow dynamically
and, because we will never have all information,
can work with incomplete data sets. In this paper we
present a tool that is well suited for a wide range of
applications. Previously developed cost models can be
incorporated and new elements can be added to the
model as needed. Due to the generic approach the tool
allows modelling general systems. It is not bound to the
digital domain, although it has a strong background
there.
-
Efficient On-Line Testing Method for a Floating-Point Adder [p. 307]
-
A. Drozd and M. Lobachev
In this paper we present a residue method for on-line
testing of the floating-point adder. This circuit contains
an arithmetic shifter which executes an abridged operation.
The method solves the problem of checking the abridged
operation with a reduced amount of hardware.
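The general idea of residue checking can be sketched on plain integer addition. This is only a simplified illustration under stated assumptions: it does not model the floating-point adder or the abridged shifter operation treated in the paper, and the modulus 3, the function names, and the `inject_error` parameter are hypothetical choices for this example.

```python
MODULUS = 3  # assumed check modulus, chosen for illustration


def residue(x, m=MODULUS):
    """Residue of x modulo m."""
    return x % m


def checked_add(a, b, inject_error=0):
    """Add two integers and check the result with a residue code.

    inject_error is added to the sum to simulate a faulty adder.
    Returns (sum, ok), where ok is False when the check detects an error.
    """
    s = a + b + inject_error
    # Independent prediction of the result residue from the operand residues.
    expected = (residue(a) + residue(b)) % MODULUS
    ok = residue(s) == expected
    return s, ok


good = checked_add(10, 7)
bad = checked_add(10, 7, inject_error=1)
```

An error whose magnitude is a multiple of the modulus leaves the residue unchanged and therefore escapes this simple check.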
Organizer: J. Rabaey, UC Berkeley, USA
Moderator: M. Engels, IMEC, B
-
Design Methodology for PicoRadio Networks [p. 314]
-
J. da Silva Jr., J. Shamberger, M. Ammer, C. Guo, S. Li,
R. Shah, T. Tuan, M. Sheets, J. Rabaey,
B. Nikolic, A. Sangiovanni-Vincentelli, and P. Wright
One of the most compelling challenges of the next decade is
the "last-meter" problem, extending the expanding data network
into end-user data-collection and monitoring devices. PicoRadio
supports the assembly of an ad hoc wireless network of self-contained
mesoscale, low-cost, low-energy sensor and monitor
nodes. While technology advances have made it conceivable to
deploy wireless networks of heterogeneous nodes, the design of a
low-power, low-cost, adaptive node in a reduced time to market is
still a challenge. We present a design methodology for PicoRadio
Networks, from system conception and optimization to silicon
platform implementation. For each phase of the design, we
demonstrate the applicability of our methodology through
promising experimental results.
Moderators: W. John, Fraunhofer Institute Berlin/Paderborn, D;
F. Sabath, Armed Forces Institute for Protection Technologies, USA
-
High-Level Simulation of Substrate Noise Generation from
Large Digital Circuits with Multiple Supplies [p. 326]
-
M. Badaroglu, M. van Heijningen, V. Gravot, S. Donnay, H. De Man, G. Gielen,
M. Engels, and I. Bolsens
Substrate noise generated by large digital circuits degrades
the performance of analog circuits sharing the
same substrate. Existing approaches usually extract the
model of the substrate from the layout information and
then simulate the extracted transistor-level netlist with this
substrate model using a transistor-level simulator. For
large digital circuits, the substrate simulation is however
not feasible with a transistor-level simulator. In our previous
work, it has been demonstrated that efficient and accurate
simulation of substrate noise generation at gate-level
is feasible. In this paper several important extensions
to our previous work are introduced: modeling of IO cells,
modeling of input transition time and load dependency
and the extraction methodology of an equivalent substrate
model within multiple supply domains. Experimental results
show an improved accuracy (6.3% error on RMS
substrate voltage with respect to a full SPICE level simulation)
with these extensions, while maintaining a large
speedup with respect to SPICE simulations.
-
Crosstalk Noise in Future Digital CMOS Circuits [p. 331]
-
C. Werner, R. Göttsche, A. Wörner, and U. Ramacher
This paper presents simulation results for crosstalk noise
in future CMOS generations down to 35 nm features. The
noise voltage is calculated from circuit simulations with
lumped RLC networks and static CMOS cells. A static
noise margin is derived from inverter characteristics of
NAND and NOR gates and a critical wire length is
calculated from considering statistical variations in the
chip manufacturing process. The model agrees well with
measurements on a quarter micron testchip and predicts a
drastic drop of critical wirelengths to 50-60 um after the
100 nm technology generation.
-
Modeling Electromagnetic Emission of Integrated Circuits for System Analysis [p. 336]
-
P. Kralicek, W. John, and H. Garbe
In this contribution a new methodology for modeling electromagnetic
emission of integrated circuits in system analysis
is shown. By using a physical model based on a multipole
expansion, the emitted fields can be well approximated
in the space outside a component. This allows a convenient
representation with a low number of model parameters
which can be determined by measurement or simulation.
To show the applicability, the developed models are
used in a system level printed circuit board simulator. The
results are compared with reference calculations.
-
Analysis of EME Produced by a Microcontroller Operation [p. 341]
-
F. Fiori and F. Musolino
This paper deals with the characterization of integrated circuit
electromagnetic emissions. The TEM cell method is employed in order
to identify primary emission sources of complex digital devices.
An 8-bit microcontroller, realized by a 0.8 um HCMOS process, is
considered. It is composed of several building blocks like the
central processing unit, the analog to digital converter and the
EPROM memory. Emission measurements are performed by operating a
specific program code stored in the microcontroller memory and
emissions due to each building block are identified.
Moderators: A. Kaiser, IEMN-ISEN, F; P. Wambacq, IMEC, B
-
Top-Down Design of a xDSL 14-bit 4MS/s Sigma-Delta Modulator in Digital CMOS Technology [p. 348]
-
R. del Río, J. de la Rosa, F. Medeiro, B. Pérez-Verdú, and A.
Rodríguez-Vázquez
This paper describes the design of a Sigma-Delta modulator
aimed at A/D conversion in xDSL applications, featuring
14-bit@4Msample/s in a 0.35 um mainstream digital
CMOS technology. Architecture selection, modulator sizing
and cell sizing tasks were supported by a CAD methodology,
thus allowing us to obtain a power efficient implementation
in a short design cycle.
-
Analog Design for Reuse -- Case Study: Very Low-Voltage Sigma-Delta Modulator [p. 353]
-
M. Dessouky, A. Kaiser, M. Louërat, and A. Greiner
This paper presents the complete design methodology
of a very low-voltage Sigma-Delta third-order modulator from high-level
specifications down to layout. Behavioral models taking
into account cell non-idealities are developed and used
to map performance specifications to lower levels. Emphasis
has been placed on eventual design reuse through design
plans and layout templates in a layout-oriented circuit design
approach. The modulator has been designed for two
different technologies demonstrating the suitability of the
methodology for very high performance mixed-signal circuits.
Moreover, the same design knowledge has been successfully
reused in another fourth-order modulator.
-
A Design Strategy for Low-Voltage Low-Power Continuous-Time
Sigma-Delta A/D Converters [p. 361]
-
F. Gerfers and Y. Manoli
This paper presents a design strategy for low-voltage
low-power Sigma-Delta analog-to-digital (A/D) converters using a
continuous-time (CT) lowpass loopfilter. An improved
method is used to find the optimal Sigma-Delta modulator implementation
with respect to a minimal power consumption on
the one hand and to fulfill a rapid prototyping approach on
the other hand. The influence of the low supply voltage
as well as circuit nonidealities on the overall Sigma-Delta modulator
is determined and verified by behavioral simulations.
Transistor-level simulation results of a 1.5 V CT Sigma-Delta A/D
converter show a 75 dB dynamic range in a bandwidth of
25 kHz.
Moderators: R. Murgai, Fujitsu Labs of America, USA; S. Minato, NTT, JP
-
Minimizing Stand-By Leakage Power in Static CMOS Circuits [p. 370]
-
S. Naidu and E. Jacobs
In this paper we concern ourselves with the problem of
minimizing leakage power in CMOS circuits consisting of
AOI (and-or-invert) gates as they operate in stand-by mode
or an idle mode waiting for other circuits to complete their
operation. It is known that leakage power due to subthreshold
leakage current in transistors in the OFF state is
dependent on the input vector applied. Therefore, we try to
compute an input vector that can be applied to the circuit in
stand-by mode so that the power loss due to sub-threshold
leakage current is the minimum possible. We employ an
integer linear programming (ILP) approach to solve the
problem of minimizing leakage by first obtaining a good
lower bound (estimate) on the minimum leakage power and
then rounding the solution to actually obtain an input
vector that causes low leakage. The chief advantage of this
technique as opposed to others in the literature is that it
invariably provides us with a good idea about the quality of
the input vector found.
-
In-Place Delay Constrained Power Optimization Using Functional Symmetries
[p. 377]
-
C. Chang, B. Hu, and M. Marek-Sadowska
In-Place Optimization (IPO) has become the backend
methodology of choice to resolve the gap between logic
synthesis and physical design as the optimization can be
guided by accurate physical information. To perform optimization
without perturbing the placed netlist too much,
only buffer insertion and gate sizing are commonly used in
current design tools. In this paper, we address the problem
of delay-constrained power optimization by introducing
another degree of freedom: functional symmetry based
rewiring. Theoretical results on the effect of using functional
symmetry on transition density for power estimation
are also derived. Experimental results show that, under the
same delay constraint, our technique achieves much better
power reduction as compared to the discrete gate sizing
only technique.
-
High-Quality Sub-Function Construction in Functional Decomposition Based on
Information Relationship Measures [p. 383]
-
L. Józwiak and A. Chojnacki
Functional decomposition seems to be the most effective
circuit synthesis approach for look-up table (LUT)
FPGAs, (C)PLDs and complex gates. In the functional
decomposition that targets LUT FPGAs, the circuit is
constructed by recursively decomposing a given function
and its sub-functions until each of the resulting sub-functions
can be directly implemented with a LUT. The
choice of sub-functions constructed in this process
decides the quality of the resulting multi-level circuit
expressed in terms of the logic block count and speed. In
this paper, we propose a new effective and efficient
method for the sub-function construction, and we consider
its application in our circuit synthesis tool that targets
LUT-based FPGAs. The method is based on the
information relationship measures. The experimental
results demonstrate that the proposed approach leads to
extremely fast and very small circuits.
-
Generalized Reasoning Scheme for Redundancy Addition and Removal Logic Optimization [p. 391]
-
J. Espejo, L. Entrena, E. San Millán, and E. Olías
In this work a generalization of the structural
Redundancy Addition and Removal (RAR) logic
optimization method is presented. New concepts based on
the functional description of the nodes in the network are
introduced to support this generalization. Necessary and
sufficient conditions to identify all the possible structural
expansions are given for the general case of multiple
variable expansion. Basic nodes are no longer restricted
to simple gates and can be any function of any size. With
this generalization, an incremental mechanism to perform
structural transformations involving any number of
variables can be applied in a very efficient manner.
Experimental results are presented that illustrate the
efficiency of our scheme.
Moderators: J. Teixeira, IST/INESC, PT; M. Sonza Reorda, Politecnico di Torino, IT
-
LPSAT: A Unified Approach to RTL Satisfiability [p. 398]
-
Z. Zeng, P. Kalla, and M. Ciesielski
LPSAT is an LP-based comprehensive infrastructure designed
to solve the satisfiability (SAT) problem for complex RTL
designs containing both word-level arithmetic operators and
bit-level Boolean logic. The presented technique uses a mixed
integer linear program to model the constraints corresponding
to both domains of the design. Our technique renders the
constraint propagation between the two domains implicit to
the MILP solver, thus enhancing the overall efficiency of the
SAT framework. The experimental results are quite promising
when compared with generic CNF-based and BDD-based SAT
algorithms.
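How Boolean logic can be expressed as linear constraints over 0/1 variables can be sketched with the standard linearization of AND and OR gates. Whether LPSAT uses exactly these inequalities is not stated in the abstract, so this is an assumption made for illustration; the function names are hypothetical, and the check below enumerates all 0/1 points rather than calling an MILP solver.

```python
from itertools import product


def and_constraints(x, y, z):
    """Linear constraints encoding z = x AND y over 0/1 variables."""
    return z <= x and z <= y and z >= x + y - 1


def or_constraints(x, y, z):
    """Linear constraints encoding z = x OR y over 0/1 variables."""
    return z >= x and z >= y and z <= x + y


def feasible_points(constraints):
    """Enumerate all 0/1 assignments (x, y, z) satisfying the constraints."""
    return {
        (x, y, z)
        for x, y, z in product((0, 1), repeat=3)
        if constraints(x, y, z)
    }


and_points = feasible_points(and_constraints)
or_points = feasible_points(or_constraints)
```

The feasible integer points coincide with the truth tables of the gates, so the same linear program can carry both these bit-level constraints and word-level arithmetic constraints.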
-
Functional Test Generation for Behaviorally Sequential Models [p. 403]
-
F. Ferrandi, G. Ferrara, D. Sciuto, A. Fin, and F. Fummi
Functional testing of HDL specifications is one of the
most promising approaches for the verification of the functionalities
of a design before synthesis. The contribution of
this work is the development of a test generation algorithm
targeting a new coverage metric (called bit-coverage) that
provides full statement coverage, branch coverage, condition
coverage and partial path coverage for behaviorally
sequential models.
The behavioral test sequences can also be the only way
to evaluate testability of a VHDL model for which a gate-level
representation is not available (e.g., third-party cores), since
the behavioral error model is characterized also by a high
correlation with the RT and gate-level stuck-at fault model.
Moreover, the preciseness of the proposed coverage metric
makes the identified test sequences more effective in identifying
design errors, than other test patterns developed by
following standard coverage metrics.
-
High Quality Behavioral Verification Using Statistical Stopping Criteria [p. 411]
-
A. Hajjar, T. Chen, I. Munn, A. Andrews, and M. Bjorkman
In order to improve the efficiency of behavioral model
verification, it is important to determine the points of diminishing
return for a given verification strategy. This paper
compares the existing stopping rules and presents a new
stopping rule based on a static Bayesian technique. The new
stopping rule was applied to verifying 14 complex VHDL
models. We used the figure of merit to compare the efficiency
of the stopping rules. The results in terms of coverage and
verification time were shown to consistently outperform existing
stopping rules.
Keywords: Behavioral Model Verification, VHDL, Statistical
Stopping Rules.
Organizers: P. Bromley, F. Karim, and P. Paulin, STMicroelectronics, F
Moderator: P. Paulin, STMicroelectronics, F
-
Network Processors: A Perspective on Market Requirements,
Processor Architectures and Embedded S/W Tools [p. 420]
-
P. Paulin, F. Karim, and P. Bromley
With the projected explosion of low-cost bandwidth
availability, the intensive processing tasks and service
hosting will move close to consumers on the "intelligent
edge" of the network, where a significant portion of the
future storage, processing and network management will
take place. We address the rationale for this change, the
characteristics of the network processor architecture
required to address it, and the software development tools
needed in order to improve time-to-market without
sacrificing embedded software performance.
Moderators: L. Silveira, IST/INESC, PT; H. Grabinski, Hannover U, D
-
Efficient Inductance Extraction via Windowing [p. 430]
-
M. Beattie and L. Pileggi
We propose a new, efficient and accurate localized inductance modeling
technique via windowing in a manner that is analogous to localized
capacitance extraction. The stability and accuracy of this process
are made possible by twice inverting the localized inductance models,
and in the process exploiting properties of the magnetostatic interactions
as modeled via the susceptance (inverse inductance). Application of these
localized double-inverse inductance models to actual IC bus examples
demonstrates the significant improvement in simulation efficiency and
overall accuracy as compared to alternative methods of approximation
and simplification.
-
Efficient and Passive Modeling of Transmission Lines by Using Differential Quadrature Method [p. 437]
-
Q. Xu and P. Mazumder
This paper introduces a new transmission line modeling
approach that employs an efficient numerical approximation
technique called the Differential Quadrature Method
(DQM). The transmission line is discretized and
the approximation framework is constructed by using the
5th order differential quadrature method; consequently, an
improved discrete equivalent-circuit model is developed
in the paper. The DQM-based modeling requires far fewer
intervening grid points for building an accurate discrete
model of the transmission line than numerical methods
like FD require. It introduces far fewer state variables than
FD-based models; therefore, it has higher efficiency. The
DQM technique can be integrated in a circuit simulator
since it preserves the passivity.
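As a rough illustration of the numerical idea, the sketch below computes differential quadrature weights for the first derivative on an arbitrary set of distinct grid points, using the standard Lagrange-interpolation form of the weights. The function name `dq_first_derivative_weights` and the grid in the usage note are assumptions made for this example; the discretization actually used in the paper may differ.

```python
def dq_first_derivative_weights(x):
    """Differential quadrature weights a[i][j] such that
    f'(x[i]) ~= sum_j a[i][j] * f(x[j]) for distinct grid points x."""
    n = len(x)
    # M[i] = product over k != i of (x[i] - x[k])
    M = []
    for i in range(n):
        p = 1.0
        for k in range(n):
            if k != i:
                p *= x[i] - x[k]
        M.append(p)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                a[i][j] = M[i] / ((x[i] - x[j]) * M[j])
        # Each row sums to zero because the derivative of a constant vanishes.
        a[i][i] = -sum(a[i][j] for j in range(n) if j != i)
    return a
```

With five grid points, for example `x = [0.0, 0.25, 0.5, 0.75, 1.0]`, the weights differentiate polynomials up to degree four exactly (up to rounding), which is why few grid points can suffice for smooth solutions.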
-
Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC
Trees with Lumped and Distributed Elements [p. 445]
-
Q. Yu and E. Kuh
In today's deep submicron technology, the coupling capacitances
among individual on-chip RC trees have an essential
effect on the signal delay and crosstalk, and the interconnects
should be modeled as coupled RC trees. We provide
simple explicit formulas for the Elmore delay and higher
order voltage moments, and a linear order recursive algorithm
for the voltage moment computation for lumped and
distributed coupled RC trees. By using the formulas and
algorithms, the moment matching method can be efficiently
implemented to deal with delay and crosstalk estimation,
model order reduction and optimal design of interconnects.
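For orientation, the sketch below computes the classical Elmore delay (the first voltage moment) at every node of an uncoupled, lumped RC tree in two linear passes. It does not implement the coupled-tree formulas of the paper; the tree encoding (`parent`, `res`, `cap`) and the function name `elmore_delays` are assumptions made for this example.

```python
def elmore_delays(parent, res, cap):
    """Classical Elmore delay at every node of an uncoupled lumped RC tree.

    parent[i] is the parent of node i (None for the root, i.e. the driver),
    res[i] is the resistance of the edge parent[i] -> i (0 for the root),
    cap[i] is the grounded capacitance at node i.
    Nodes must be numbered so that parent[i] < i.
    """
    n = len(parent)
    # Downstream capacitance: accumulate from the leaves toward the root.
    down = list(cap)
    for i in range(n - 1, 0, -1):
        down[parent[i]] += down[i]
    # Delay: accumulate R * C_downstream from the root toward the leaves.
    delay = [0.0] * n
    for i in range(1, n):
        delay[i] = delay[parent[i]] + res[i] * down[i]
    return delay
```

For a two-segment chain with unit resistances and unit node capacitances, `elmore_delays([None, 0, 1], [0, 1, 1], [0, 1, 1])` gives `[0.0, 2.0, 3.0]`.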
-
On the Impact of On-Chip Inductance on Signal Nets under the Influence of Power Grid Noise [p. 451]
-
T. Chen
It has been well recognized that the impact of on-chip inductance
on some critical nets, such as clock nets, is significant and cannot be
ignored in delay modeling for these nets. However, the impact of
on-chip inductance on signal nets in general is still not well understood.
We present results of analyzing inductive effects on signal nets for
ultra-deep submicron technologies. The analysis is based on an Al-based
0.18 um CMOS process and a Cu-based 0.13 um CMOS process. The impact
of on-chip inductance is shown to be insignificant if we assume a perfect
power supply network around the interconnect routes. Otherwise, the
impact of on-chip inductance can be significant. Furthermore, the
results presented in this paper illustrate the impact of on-chip
inductance one would expect from transitioning from an Al-based
interconnect technology to a Cu-based interconnect technology.
Moderators: S. Yoo, TIMA, Grenoble, F; F. Wagner, UFRGS, BRZ
-
Timing Simulation of Digital Circuits with Binary Decision Diagrams [p. 460]
-
R. Ubar, A. Jutman, and Z. Peng
Meeting timing requirements is an important
constraint imposed on highly integrated circuits, and the
verification of timing of a circuit before manufacturing is
one of the critical tasks to be solved by CAD tools. In this
paper, a new approach and the implementation of several
algorithms to speed up gate-level timing simulation are
proposed where, instead of gate delays, path delays for
tree-like subcircuits (macros) are used. Therefore timing
waveforms are calculated not for all internal nodes of the
gate-level circuit but only for outputs of macros. The
macros are represented by structurally synthesized binary
decision diagrams (SSBDD) which enable a fast
computation of delays for macros. The new approach to
speed up the timing simulation is supported by
encouraging experimental results.
-
HALOTIS: High Accuracy LOgic TIming Simulator with Inertial and Degradation
Delay Model [p. 467]
-
P. Vazquez, J. Juan-Chico, M. Bellido, A. Acosta, and M. Valencia
This communication presents HALOTIS, a novel high
accuracy logic timing simulation tool, that incorporates a
new simulation algorithm based on different concepts for
transitions and events. This new simulation algorithm is
intended for including the inertial and degradation delay
models. Simulation results are very similar to those
obtained by electrical simulators, and show a higher
accuracy compared to conventional delay models
implemented in current logic simulators.
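To make the notion of inertial delay concrete, the sketch below models a single buffer with a pure inertial delay: an output change is cancelled if the input changes again before the change matures. This is the conventional textbook model only; it does not include the degradation delay model and is not the HALOTIS algorithm. The function name `inertial_delay` and the event encoding are assumptions made for this example.

```python
def inertial_delay(events, delay, initial=0):
    """Output transitions of a buffer with a pure inertial delay.

    events: time-ordered (time, value) input transitions.
    An output change scheduled at t + delay is cancelled if the input
    changes again before it matures, so pulses shorter than `delay`
    never reach the output.
    """
    out = []
    value = initial      # current output value
    pending = None       # (time, value) of the scheduled output change
    for t, v in events:
        if pending is not None:
            if pending[0] <= t:
                out.append(pending)   # the scheduled change matured
                value = pending[1]
            pending = None            # otherwise it is cancelled
        if v != value:
            pending = (t + delay, v)
    if pending is not None:
        out.append(pending)
    return out
```

With a delay of 2, the input `[(0, 1), (1, 0), (10, 1)]` yields `[(12, 1)]`: the one-unit pulse at time 0 is filtered out.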
-
dlbSIM -- A Parallel Functional Logic Simulator Allowing Dynamic Load Balancing [p. 472]
-
K. Hering, J. Löser, and J. Markwardt
To meet the demanding time-to-market requirements in
VLSI/ULSI design, the acceleration of verification processes
is inevitable. The parallelization of cycle-based simulation
at register-transfer- and gate level is one facet in
a series of efforts targeted at this objective. We introduce
dlbSIM, a parallel compiled code functional logic simulator
that has been developed to run on loosely-coupled systems.
It has the ability to balance the application-specific load of
cooperating simulator instances depending on the overall
load situation on the involved processor nodes. Thereby,
the load of a simulator instance is expressed in terms of
a set of circuit model parts which are to be simulated by
the corresponding instance. The centralized load management
runs simultaneously with a parallel simulation. Both
processes interact after a controllable number of simulated
clock-cycles to transmit load information and realize load
modifications. dlbSIM is successfully used to simulate IBM
S/390 processor models.
-
Architecture Driven Partitioning [p. 479]
-
J. Küter and E. Barke
In this paper, we present a new algorithm to partition
netlists for logic emulation under consideration of the
targeted emulator architecture. The proposed algorithm
can be used flexibly for a wide variety of applications
because the description of the architecture is part of the
input data. It combines a new approach of finding and
improving an initial solution with existing algorithms to
cluster the netlist and optimize the number of cut nets
between blocks. As a result, the algorithm ensures that the
cut nets between the created blocks can be connected
within the emulation system, even without a full interconnect
structure. Experiments on a number of designs and
architectures demonstrate that the algorithm is competitive
for architectures with full interconnect and that it is
unique for architectures with limited interconnect resources.
Moderator: C. Piguet, CSEM, Neuchatel, CH
-
Low-Power Systems on Chips (SOCs) [p. 488]
-
C. Piguet, M. Renaudin, and T. Omnès
For innovative portable products, Systems on Chips (SoCs)
containing several processors, memories and specialised
modules are obviously required. Performance but also
low power are the main issues in the design of such SoCs.
Are these low-power SoCs only constructed with low-power
processors, memories and logic blocks? Although the latter are unavoidable,
many other issues are quite important for low-power
SoCs, such as the way to synchronise the communications
between processors as well as test procedures, on-line
testing, software design and development tools. This
paper is a general framework for the design of low-power
SoCs, starting from the system level to the architecture level,
assuming that the SoC is mainly based on the re-use of low-power
processors, memories and logic peripherals.
Moderators: H. Kerkhoff, Twente U, NL; J. Pineda de Gyvez, Philips Research, NL
-
Static and Dynamic Behavior of Memory Cell Array Opens and Shorts in Embedded
DRAMs [p. 496]
-
Z. Al-Ars and A. van de Goor
Fault analysis of memory devices using defect
injection and simulation is becoming increasingly important
as the complexity of memory faulty behavior increases.
In this paper, this approach is used to study the effects
of opens and shorts on the faulty behavior of embedded
DRAM (eDRAM) devices produced by Infineon Technologies.
The analysis shows the existence of previously defined
memory fault models, and establishes new ones. The
paper also investigates the concept of dynamic faulty behavior
and establishes its importance for memory devices.
Conditions to test the newly established fault models are
also given.
Key words: Embedded DRAM, functional fault models,
fault primitives, defect simulation, opens, shorts.
-
Definitions of the Numbers of Detections of Target Faults and their
Effectiveness in Guiding Test Generation for High Defect Coverage [p. 504]
-
I. Pomeranz and S. Reddy
The number of times a fault f in a combinational circuit is
detected by a given test set T was shown earlier to affect the
defect coverage of the test set. The earlier definition counted
each test in T, that detects f, as a distinct detection of f. This
definition counts two tests as distinct detections even if they
differ only in the values of inputs that do not affect the activation
or propagation of the fault. In this work, we introduce a stricter
definition that requires that two counted tests would be different
in the way they activate and/or propagate the fault. We describe
procedures for constructing test sets based on the stricter
definition, and compare them to test sets for the earlier, less strict
definition. The results show a simple criterion to decide when it
may be necessary to combine the two definitions in order to
obtain a high quality test set.
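The difference between the two ways of counting can be illustrated with a deliberately simplified sketch. Here the inputs that matter for activating or propagating a fault are assumed to be known in advance and passed in as `relevant_inputs`; tests that agree on all of them are counted as a single detection under the stricter count. This simplification, and the function name `count_detections`, are assumptions made for this example and are not the definition given in the paper.

```python
def count_detections(tests, detects, relevant_inputs):
    """Return (loose, strict) detection counts for one fault.

    tests: list of input vectors (tuples of 0/1).
    detects: detects[i] is True if tests[i] detects the fault.
    relevant_inputs: indices of inputs assumed to matter for activating or
    propagating the fault (a simplification made for this sketch).
    """
    detecting = [t for t, d in zip(tests, detects) if d]
    loose = len(detecting)  # every detecting test counts once
    # Tests that agree on all relevant inputs are counted as one detection.
    strict = len({tuple(t[i] for i in relevant_inputs) for t in detecting})
    return loose, strict
```

For the tests `(0,1,0)`, `(0,1,1)` and `(1,1,0)`, all detecting the fault, with inputs 0 and 1 relevant, the loose count is 3 and the strict count is 2, because the first two tests differ only in an irrelevant input.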
-
CMOS Open Defect Detection by Supply Current Test [p. 509]
-
M. Hashizume, M. Ichimiya, H. Yotsuyanagi, and T. Tamesada
In this paper, a new test method is proposed for
detecting open defects in CMOS ICs. The method is based
on the supply current of ICs generated by applying a time-variable
electric field from outside the ICs. The
feasibility of the test is examined by some experiments.
The empirical results indicate that, by using the
method, open defects in CMOS ICs can be detected by
measuring the supply current which flows when a time-variable
electric field is applied.
-
Full Chip False Timing Path Identification: Applications to the
PowerPC(TM) Microprocessors [p. 514]
-
J. Zeng, M. Abadir, J. Bhadra, and J. Abraham
Static timing analysis sets the industry standard in the
design methodology of high speed/performance microprocessors
to determine whether timing requirements have
been met. Unfortunately, not all the paths identified using
such analysis can be sensitized. This leads to a pessimistic
estimation of the processor speed. Also, no amount of engineering
effort spent on optimizing such paths can improve
the timing performance of the chip. In the past, we demonstrated
initial results of how ATPG techniques can be used
to identify false paths efficiently[1]. Due to the gap between
the physical design on which the static timing analysis of
the chip is based and the test view on which the ATPG techniques
are applied to identify false paths, in many cases
only sections of some of the paths in the full-chip were analyzed
in our initial results. In this paper, we will fully analyze
all the timing paths using the ATPG techniques, thus
overcoming the gap between the testing and timing analysis
techniques. This enables us to do false path identification
at the full-chip level of the circuit. Results of applying our
technique to the second generation G4 PowerPC(TM)
will be presented.
Moderator: P. Wambacq, IMEC, B
-
CAD for RF Circuits [p. 520]
-
P. Wambacq, G. Vandersteen, J. Phillips, J. Roychowdhury, W. Eberle, B. Yang,
D. Long, and A. Demir
Wireless transceivers for digital telecommunications are
heterogeneous systems that combine digital hardware,
software and analog circuitry. The pressure toward miniaturization
and lower power consumption for these transceivers
imposes tight specifications on their analog RF parts.
Many aspects of RF circuits cannot be simulated accurately
and efficiently with a classical circuit-level SPICE
approach. In this paper three important simulation problems
for RF circuits are addressed:
1. high-level simulation of analog and RF blocks for the
determination of the specifications of the circuits,
2. accurate circuit-level simulation of nonlinear circuits
with time constants that differ widely,
3. efficient and accurate computation of phase noise in RF
oscillators.
For each of these problems, solutions are proposed. These
solutions illustrate that accurate and efficient simulations
of RF communication circuits need a heterogeneous variety
of advanced algorithms.
Moderators: J. Lienig, Robert Bosch GmbH, D; A. Takahashi, Tokyo IT, JP
-
Modeling Crosstalk Noise for Deep Submicron Verification Tools [p. 530]
-
P. Bazargan-Sabet and F. Ilponse
In deep submicron technologies, the verification task has
to cover some new issues to certify the correctness of a
design. The noise produced by crosstalk couplings is one
of these emerging problems. In this paper, we propose a
model to evaluate the peak value of the noise injected on
a signal when its neighboring signals make their
transitions. This model has been used in a prototype
verification tool and has shown a satisfying performance-accuracy
ratio.
-
A Graph Based Algorithm for Optimal Buffer Insertion under Accurate Delay Models [p. 535]
-
Y. Gao and D. Wong
Buffer insertion is an efficient technique in interconnect optimization.
This paper presents a graph based algorithm for
optimal buffer insertion under accurate delay models. In our
algorithm, a signal is accurately represented by a finite ramp
which is characterized by two parameters, shift time and transition
time. Any accurate delay model, such as delay models based on
the transmission line model and SPICE simulations, can be incorporated
into our algorithm. The algorithm
determines the optimal number of buffers and their locations
on a wire such that some optimization objective is satisfied.
Two typical examples of such optimization objectives are minimizing
the 50% threshold delay and minimizing the transition time. Both
can be easily determined in our algorithm.
We show that the buffer insertion problem can be reduced to
a shortest path problem. The algorithm can be easily extended
for simultaneous buffer insertion and wire-sizing, and complexity
is still polynomial. The algorithm can also be extended to
deal with problems such as buffer insertion subject to transition
time constraints at any position along the wire.
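The reduction to a shortest path problem can be sketched on a toy instance. In the sketch below, candidate buffer sites along the wire are the graph nodes, and the edge from site i to site j costs the delay of one buffered stage spanning that stretch of wire. For simplicity the stage delay is the Elmore delay rather than the finite-ramp model of the paper, and the function name `insert_buffers` and all parameter names are assumptions made for this example.

```python
def insert_buffers(positions, r, c, Rb, Cb, Tb):
    """Minimum Elmore delay along a wire by a shortest path over candidate sites.

    positions: increasing coordinates; positions[0] is the driver and
    positions[-1] the sink. r, c: wire resistance / capacitance per unit length.
    Rb, Cb, Tb: output resistance, input capacitance and intrinsic delay of a
    buffer (the driver and sink are assumed to look like a buffer, too).
    Returns (delay, list of chosen buffer coordinates).
    """
    def stage(length):
        # Elmore delay of one stage: buffer + distributed RC wire + load Cb.
        return Tb + Rb * (c * length + Cb) + r * length * (c * length / 2 + Cb)

    n = len(positions)
    best = [float("inf")] * n
    prev = [None] * n
    best[0] = 0.0
    # Nodes are already in topological order, so one forward sweep suffices.
    for j in range(1, n):
        for i in range(j):
            cost = best[i] + stage(positions[j] - positions[i])
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back from the sink to recover the interior nodes (the buffers).
    path, k = [], prev[n - 1]
    while k is not None and k != 0:
        path.append(positions[k])
        k = prev[k]
    return best[n - 1], path[::-1]
```

With ideal, zero-delay buffers, `insert_buffers([0, 1, 2], 1, 1, 0, 0, 0)` returns `(1.0, [1])`: splitting the wire halves the quadratic wire delay. Raising the intrinsic buffer delay to 10 makes the unbuffered route cheaper, and the result becomes `(12.0, [])`.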
-
Repeater Block Planning under Simultaneous Delay and Transition Time Constraints [p. 540]
-
P. Sarkar and C. Koh
We present a solution to the problem of repeater block planning
under both delay and signal transition time constraints for a given
floorplan. Previous approaches have considered only meeting the
target delay of a net. However, it has been observed that the repeater
planning for meeting the delay target can cause signals on
long interconnects to have very slow transition rates. Experimental
results show that our new approach satisfies both timing constraints
for an average of 79% of all global nets for six MCNC benchmark
floorplans studied (at 1GHz frequency), compared with an average
of 22% for the repeater block planner in [11].
Moderators: V. Meyer zu Bexten, Atmel Germany GmbH, D; E. Barke, Hannover U, D
-
On-The-Fly Layout Generation for PTL Macrocells [p. 546]
-
L. Macchiarulo, L. Benini, and E. Macii
Pass transistor logic (PTL) has been recently proposed as
an alternative to standard MOS for aggressive circuit design.
Even though PTL has been successful in a few handcrafted designs,
its acceptance into mainstream digital design critically depends
on the availability of tools for logic
and physical synthesis and optimization. The automatic
synthesis of pass transistor circuits starting from BDDs has
been intensively studied in the past with promising results,
but back-end tools for PTL cell generation are still missing.
We describe an automatic layout generator that has
been designed for seamless integration in a library-free PTL
design flow. The generator exploits the distinctive characteristics
of pass transistor networks produced by synthesis
to achieve quality of results comparable with state-of-the-art
commercial cell generation tools in a fraction of the
execution time.
-
Automatic Datapath Tile Placement and Routing [p. 552]
-
T. Serdar and C. Sechen
We report the very first fully automatic datapath tile
layout flow. We subdivided the placement process into
two steps: a global placement step using simulated annealing,
and a new detailed placement step based on extensive
modifications we made to the O-tree algorithm.
The modifications have enabled the extended O-tree algorithm
to handle the rectilinearly shaped transistor
chains and gates common in datapath tile layout. We
show that datapath tiles can be placed and routed automatically
at the transistor level or at the mixed transistor/
gate level, achieving results for the very first time that
are competitive with those obtained manually by a skilled
designer.
-
A Boolean Satisfiability-Based Incremental Rerouting Approach with Application
to FPGAs [p. 560]
-
G. Nam, K. Sakallah, and R. Rutenbar
Incremental redesign is an increasingly essential step in
any complex design. Late changes or corrections in
functional specifications (so-called "engineering change
orders" or ECOs) force us to search for a minimal
perturbation that achieves the desired repair. In
reconfigurable design scenarios, these incremental
repairs may be in response to physical faults: the goal is
to "design around" the fault. For FPGAs, incremental
rerouting is an essential component of this repair
problem. We develop a new incremental rerouting
algorithm for FPGAs using techniques from Boolean
Satisfiability (SAT). In this application, these techniques
have the twin virtues that they (1) represent all possible
routing (and rerouting) constraints simultaneously and
exactly, and (2) search for rerouting solutions by
perturbing all nets concurrently. Preliminary results are
promising. For several FPGA benchmarks, we were able
to reroute fault reconfigurations that perturb up to 5.74%
of all nets for a small number of fault sets (one to four
faults) with only 1.55 track overhead per channel on
average, with CPU time 0.76 to 4.91 seconds/fault.
Moderators: J. Plantin, Ericsson Radio Systems, SE; L. Lavagno, Udine U, IT
-
Dual Transitions Petri Net Based Modelling Technique for Embedded Systems
Specification [p. 566]
-
M. Varea and B. Al-Hashimi
This paper presents a new modelling technique capable of modelling
both control and data information using a single unified
approach. This is achieved by modifying the classical
Petri Net structure, allowing it to have two types of transitions
and arcs. As a consequence, loops and conditional operations
within complex specifications are easily identified. The system
dynamic behaviour is modelled using a new marking scheme
of the net consisting of a new element called value for data
representation in addition to classical tokens used for control
purpose. Structural definitions, behavioural rules and graphical
representation of the new modelling technique are given.
One potential application of the proposed modelling technique
is the internal representation of embedded systems specification.
Two examples are included illustrating the applicability
and efficiency of the proposed modelling technique.
-
Probabilistic Application Modeling for System-Level Performance Analysis
[p. 572]
-
R. Marculescu and A. Nandi
The objective of this paper is to introduce the Stochastic
Automata Networks (SANs) as an effective formalism
for application modeling in system-level analysis. More precisely,
we present a methodology for application modeling for
system-level power/performance analysis that can help the
designer to select the right platform and implement a set of
target multimedia applications. We also show that, under various
input traces, the steady-state behavior of the application
itself is characterized by very different 'clusterings' of the
probability distributions. Having this information available,
not only helps to avoid lengthy profiling simulations for predicting
power and performance figures, but also enables efficient
mappings of the applications onto a chosen platform.
We illustrate the benefits of our methodology using the
MPEG-2 video decoder as the driver application.
Keywords: system-level design, performance analysis, application
modeling, stochastic automata networks, embedded
multimedia systems.
-
Reliable Estimation of Execution Time of Embedded Software [p. 580]
-
P. Giusto, G. Martin, and E. Harcourt
Estimates of execution time of embedded software play
an important role in function-architecture co-design. This
paper describes a technique based upon a statistical approach
that improves existing estimation techniques. Our
approach provides a degree of reliability in the error of the
estimated execution time. We illustrate the technique using
both control-oriented and computational-dominated benchmark
programs.
Moderators: M. Renovell, LIRMM, F; B. Kruseman, Philips Research, NL
-
Implementation of a Linear Histogram BIST for ADCs [p. 590]
-
F. Azaïs, S. Bernard, Y. Bertrand, and M. Renovell
This paper validates a linear histogram BIST scheme for
ADC testing. This scheme uses a time decomposition
technique in order to minimize the required hardware
circuitry. A practical implementation is described and
the structure together with the operating mode of the
different modules are detailed. Through this practical
implementation, the performances and limitations of the
proposed scheme are evaluated both in terms of
additional circuitry and test time.
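The quantities a linear histogram test extracts can be illustrated with the textbook software computation below: with a linear ramp at the input, an ideal converter hits every code equally often, so deviations in the code counts give the differential and integral nonlinearity. The sketch does not model the time decomposition or the on-chip hardware of the proposed BIST; the function name `histogram_dnl_inl` is an assumption made for this example.

```python
def histogram_dnl_inl(hist):
    """DNL and INL (in LSB) from a code histogram taken with a linear ramp.

    hist[k] is the number of samples that produced output code k. The first
    and last codes are dropped because they also collect out-of-range samples.
    """
    inner = hist[1:-1]
    ideal = sum(inner) / len(inner)   # expected hits per code for an ideal ADC
    dnl = [h / ideal - 1.0 for h in inner]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d                      # INL is the running sum of DNL
        inl.append(acc)
    return dnl, inl
```

For example, the histogram `[0, 12, 8, 10, 10, 0]` has an ideal count of 10 per inner code, so the first inner code is about 0.2 LSB too wide and the second about 0.2 LSB too narrow.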
-
Test Generation Based Diagnosis of Device Parameters for Analog Circuits [p. 596]
-
S. Cherubal and A. Chatterjee
With the increasing complexity of manufacturing processes and
the shrinking of device geometries, the performance metrics of
integrated circuits (ICs) are becoming increasingly sensitive to
random fluctuations in the manufacturing process. We propose a
diagnosis methodology that can be used to infer the cause(s) of
variations in performance of analog ICs. The methodology consists
of (a) a device parameter computation technique which is
used to compute the device parameters of an IC from measurements
made on it and (b) a cause-effect analysis module that is
used to compute the cause of the variation in performance metrics
of a given set of ICs. Simulation results to demonstrate the effectiveness
of the technique are presented.
-
Generation of Optimum Test Stimuli for Nonlinear Analog Circuits Using
Nonlinear Programming and Time-Domain Sensitivities [p. 603]
-
B. Burdiek
In this paper a novel approach for the generation of an
optimum transient test stimulus for general analog circuits
is proposed. The test stimulus is optimal with respect to the
detection of a given fault set by means of a predefined fault
detection criterion. The problem of finding an optimum test
stimulus detecting all faults from the fault set is formulated
as a nonlinear programming problem. A functional
describing the differences between the good and all faulty
test responses of the circuit serves as a merit functional for
the programming problem. A parameter vector completely
describing the test stimulus is used as the optimization
vector. The gradient of the merit functional required for the
optimization is computed using time-domain sensitivities.
Since in this approach the evaluation of the fault detection
criterion represented by the merit functional flows directly
into the computation of the test stimulus, optimal test stimuli
for hard to detect faults can be generated. If more than one
input terminal is used for testing, several test stimuli can be
generated simultaneously.
Organizer: D. Davis, Actel, USA
Moderator: R. Wilson, EETimes, USA
Panellists: T. Kambe, Sharp, JP; B. Gupta, STMicroelectronics, USA;
C. Balough, Triscend, USA; Y. Tanurhan, Actel, USA
-
Managing the SoC Design Challenge with "Soft" Hardware [p. 610]
-
R. Wilson
Panel members will discuss, from their individual
perspectives, why embedded reconfigurability has
become critical to the future success of systems-on-a-chip
and how they are attempting to implement solutions.
The Opportunity: Implementing reconfigurable logic
within SoCs will also help to expand and differentiate
members of product families as well as extend product
lifecycles and reduce design and test cycles, thus
shortening product time to market. Having
reconfigurability in system-on-a-chip silicon will increase
design flexibility by allowing re-use of design elements to
create differentiated products. Changing or revising logic
elements on the fly via reconfigurability to meet changes
in standards or features or to fix design errors will help
avoid increasingly expensive NRE re-spins.
Moderators: J. Henkel, NEC, USA; R. Leupers, Dortmund U, D
-
Integrated Hardware-Software Co-Synthesis and High-Level Synthesis for Design
of Embedded Systems under Power and Latency Constraints [p. 612]
-
A. Doboli
This paper presents an integrated approach to hardware-software
co-synthesis and HLS for design of low-power embedded
systems. The main motivation for this work is that
fine trade-offs between latency and power can be explored at
the system level only with a detailed knowledge of used hardware
resources. The integrated method was realized as a simulated
annealing based solution-space exploration. Exploration
is guided by Performance Models that exactly capture
the relationship between performances (i.e. power consumption
and latency) and design decisions (i.e. binding and
scheduling). The proposed approach permits not only a more
accurate latency and power estimation but also the exposure
of RTL-level design decisions at the system level. As a result,
more effective power-latency trade-offs are possible during
co-synthesis as compared to traditional task-level methods.
-
Allocation and Scheduling of Conditional Task Graph in Hardware/Software
Co-Synthesis [p. 620]
-
Y. Xie and W. Wolf
This paper introduces an allocation and scheduling algorithm
that efficiently handles conditional execution in
multi-rate embedded systems. Control dependencies are introduced
into the task graph model. We propose a mutual
exclusion detection algorithm that helps the scheduling
algorithm to exploit the resource sharing. Allocation
and scheduling are performed simultaneously to take advantage
of the resource sharing among those mutually exclusive
tasks. The algorithm is fast and efficient, and so is suitable
to be used in the inner loop of our hardware/software
co-synthesis framework which must call the scheduling routine
many times.
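A minimal sketch of what mutual exclusion means in a conditional task graph is given below. Each task is assumed to carry an explicit guard, that is, the branch values it needs in order to run; two tasks are mutually exclusive when their guards demand opposite values of some condition. The guard encoding and the function name `mutually_exclusive` are assumptions made for this example, and the sketch is not the detection algorithm of the paper.

```python
def mutually_exclusive(guard_a, guard_b):
    """True if two tasks can never execute in the same iteration.

    Each guard maps a condition name to the branch value (True/False) the
    task requires; a task runs only when all of its conditions hold.
    """
    return any(c in guard_b and guard_b[c] != v for c, v in guard_a.items())
```

For instance, a task guarded by `{"c": True}` and one guarded by `{"c": False}` lie on opposite branches, so a scheduler may let them occupy the same resource in the same time slot; an unguarded task is never exclusive with anything.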
-
Code Placement in Hardware Software Co-Synthesis to Improve Performance and
Reduce Cost [p. 626]
-
S. Parameswaran
This paper introduces an algorithm for code placement in
cache, and maps it to memory using a second algorithm. The
target architecture is a multiprocessor system with 1st level
cache and a common main memory. These algorithms
guarantee that as many instruction codewords as possible of
the high priority tasks remain in cache all of the time so that
other tasks do not overwrite them. This method improves the
overall performance, and might result in cheaper systems if
more powerful processors are not needed. The memory
increase necessary to facilitate this scheme is on the order of
13%. The average percentage of highest priority tasks always
in memory can vary from 3% to 100% depending upon how
many tasks (and their sizes) are allocated to each processor.
-
System-On-A-Chip Processor Synchronization Support in Hardware [p. 633]
-
B. Saglam and V. Mooney III
For scalable shared-memory multiprocessor System-on-a-Chip
implementations, synchronization overhead
may cause catastrophic stalls in the system. Efficient
improvements in the synchronization overhead in terms of
latency, memory bandwidth, delay and scalability of the
system involve a solution in hardware rather than in
software. This paper presents a novel, efficient, small and
very simple hardware unit that brings significant
improvements in all of the above criteria: in an example,
we reduce the time spent on lock latency by a factor of 4.8 and
the worst-case execution of lock delay in a database
application by a factor of more than 450. Furthermore,
we developed a software architecture together with RTOS
support to leverage our hardware mechanism. The worst-case
simulation results of a client-server example on a
four-processor system showed that our mechanism
achieved an overall speedup of 27%.
Moderators: K. Buchenrieder, Infineon Technologies, D; H. Grünbaecher,
Carinthia Tech. Inst., Villach, A
-
A Decade of Reconfigurable Computing: A Visionary Retrospective [p. 642]
-
R. Hartenstein
The paper surveys a decade of R&D on coarse
grain reconfigurable hardware and related CAD, points out
why this emerging discipline is heading toward a dichotomy
of computing science, and advocates the introduction of a
new soft machine paradigm to replace CAD by compilation.
-
Hierarchical Memory Mapping during Synthesis in FPGA-Based Reconfigurable
Computers [p. 650]
-
I. Ouaiss and R. Vemuri
One step in the synthesis for FPGA-based Reconfigurable
Computers (RCs) involves mapping the design data
structures onto the physical memory banks available in the
hardware. The advent of Xilinx Virtex-style FPGAs and of
hierarchical memory schemes on reconfigurable boards introduced
an added complexity to this mapping. The new
RC boards offer a wealth of memory banks, many of them
on-chip (such as the BlockRAMs available in the Virtex architecture)
and many of them offering a variable number of
ports and several depth/width configurations. Along with
the external RAMs, a hierarchy of memories with varying
access performance is available in a reconfigurable computer.
It becomes critical to perform a good mapping to
achieve optimal design performance. This paper presents
an automatic memory mapping methodology which takes
into account: the number of words and word size of design
data segments and physical memory banks, number of
ports on the banks, access latency of the banks, proximity of
the banks to the processing unit, life cycle analysis of data
segments, and it also incorporates configuration selection
from the multiple configurations available in BlockRAMs of
Virtex series FPGAs. In the case of multiple processing elements
on board, the paper also provides a framework in
which the task of memory mapping interacts with spatial
partitioning to provide the best implementation.
-
Optimal FPGA Module Placement with Temporal Precedence Constraints [p. 658]
-
S. Fekete, E. Köhler, and J. Teich
We consider the optimal placement of hardware modules
in space and time for FPGA architectures with reconfiguration
capabilities, where modules are modeled as
three-dimensional boxes in space and time. Using a graph-theoretic
characterization of feasible packings, we are able
to solve the following problems:
(a) Find the minimal execution time of the given problem
on an FPGA of fixed size,
(b) Find the FPGA of minimal size to accomplish the tasks
within a fixed time limit.
Furthermore, our approach is perfectly suited for the treatment
of precedence constraints for the sequence of tasks,
which are present in virtually all practical instances. Additional
mathematical structures are developed that lead to a
powerful framework for computing optimal solutions. The
usefulness is illustrated by computational results.
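The space-time box model can be made concrete with a small feasibility checker. The sketch below only verifies a given placement: every module must fit on the chip and within the time limit, no two modules may overlap in x, y and time simultaneously, and precedence pairs must be respected. It does not search for optimal placements and does not use the graph-theoretic characterization of the paper; the function name `is_feasible` and the data layout are assumptions made for this example.

```python
def is_feasible(boxes, W, H, T, precedence=()):
    """Check a space-time placement of modules on a W x H FPGA within time T.

    boxes: dict name -> (x, y, t, w, h, d): position, start time, and the
    width, height and duration of the module.
    precedence: pairs (a, b) meaning a must finish before b starts.
    """
    def overlap(lo1, len1, lo2, len2):
        return lo1 < lo2 + len2 and lo2 < lo1 + len1

    for x, y, t, w, h, d in boxes.values():
        if x < 0 or y < 0 or t < 0 or x + w > W or y + h > H or t + d > T:
            return False
    names = list(boxes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            xa, ya, ta, wa, ha, da = boxes[a]
            xb, yb, tb, wb, hb, db = boxes[b]
            # Two boxes collide only if they overlap in x, y and time at once.
            if (overlap(xa, wa, xb, wb) and overlap(ya, ha, yb, hb)
                    and overlap(ta, da, tb, db)):
                return False
    return all(boxes[a][2] + boxes[a][5] <= boxes[b][2] for a, b in precedence)
```

Two modules that need the whole of a 2 x 2 chip can still be placed if they are scheduled one after the other, which is exactly the reuse that reconfiguration provides.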
Moderators: P. Marwedel, Dortmund U, D; Z. Peng, Linkoping U, SE
-
Generation of Minimal Size Code for Schedule Graphs [p. 668]
-
C. Passerone, Y. Watanabe, and L. Lavagno
This paper proposes a procedure for minimizing the code
size of sequential programs for reactive systems. It identifies
repeated code segments (a generalization of basic blocks to directed
rooted trees) and finds a minimal covering of the input
control flow graphs with code segments. The segments are disjunct,
i.e. no two segments have the same code in common.
The program is minimal in the sense that the number of code
segments is minimum under the property of disjunction for the
given control flow specification.
The procedure makes no assumption on the target processor
architecture, and is meant to be used between task synthesis
algorithms from a concurrent specification and a standard
compiler for the target architecture. It is aimed at optimizing
the size of very large, automatically generated flat code,
and dramatically extends the scope of classical common sub-expression
identification techniques.
The potential effectiveness of the proposed approach is
demonstrated through preliminary experiments.
-
Generating Production Quality Software Development Tools Using a Machine
Description Language [p. 674]
-
A. Hoffmann, A. Nohl, S. Pees, G. Braun, and H. Meyr
This paper presents a methodology to automatically generate
production quality software development tools for
programmable architectures using the machine description
language LISA. Various architectures presenting diverse
architectural originalities will be presented and the feasibility
of automatically generating simulator, assembler, linker
and graphical debugger frontend will be discussed. The
presented approach is not limited to a fixed abstraction level
-- case studies of the Texas Instruments C62x and C54x, the
Analog Devices ADSP2101 as well as the ARM7 will show
the applicability of the methodology from cycle/phase to instruction
accurate models.
-
Automatic Generation and Targeting of Application Specific Operating Systems
and Embedded Systems Software [p. 679]
-
L. Gauthier, S. Yoo, and A. Jerraya
We propose a method of automatic generation of application
specific operating systems (OS's) and automatic targeting
of application software. OS generation starts from a
very small yet flexible OS kernel. OS services, which are
specific to the application and deduced from dependencies
between services, are added to the kernel to construct the
whole OS. Communication and synchronization functions
in the application code are adapted to the generated OS. As
a preliminary experiment, we applied the proposed method
to an example system, a token ring system.
-
Cache Conscious Data Layout Organization for Embedded Multimedia Applications
[p. 686]
-
C. Kulkarni, C. Ghez, M. Miranda, F. Catthoor, and H. De Man
Cache misses form a major bottleneck for real-time multimedia applications
due to the off-chip accesses to the main memory. This results in both a
major access bandwidth overhead (and related power consumption) as well
as performance penalties. In this paper, we propose a new technique for
organizing data in the main memory for data dominated multimedia applications
so as to reduce the majority of the conflict cache misses. The focus of this
paper is on the formal and heuristic algorithms we use to steer the data
layout decisions and the experimental results obtained using a prototype
tool. Experiments on real-life demonstrators illustrate that we are
able to reduce up to 82% of the conflict misses for applications that
are already aggressively transformed at the source-level. At the same
time, we also reduce the off-chip data accesses by up to 78% and, combined
with address optimizations, we are able to reduce the execution time. Thus
our approach is complementary to the more conventional way of reducing misses
by reorganizing the execution order.
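
To make the notion of a conflict miss concrete, here is a minimal sketch, assuming a direct-mapped cache with 64 lines of 16 bytes and two 1 KB arrays accessed in an interleaved loop. It uses simple padding between the arrays as the layout change. This is a generic illustration rather than the authors' algorithm, and the names `count_misses` and `interleaved_trace` are hypothetical.

```python
def count_misses(addresses, num_lines=64, line_size=16):
    """Count misses in a direct-mapped cache (assumed geometry)."""
    tags = [None] * num_lines          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        line = addr // line_size       # memory line number
        index = line % num_lines       # cache set this line maps to
        tag = line // num_lines        # identifies which memory line is cached
        if tags[index] != tag:
            misses += 1
            tags[index] = tag          # replace whatever was there
    return misses


def interleaved_trace(base_a, base_b, n=256, elem_size=4):
    """Addresses touched by: for i in range(n): use a[i]; use b[i]."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_size)
        trace.append(base_b + i * elem_size)
    return trace


# Layout 1: b starts exactly one cache size after a, so a[i] and b[i]
# always map to the same cache line and keep evicting each other.
naive = count_misses(interleaved_trace(base_a=0, base_b=1024))

# Layout 2: one cache line of padding shifts b to the neighbouring set.
padded = count_misses(interleaved_trace(base_a=0, base_b=1024 + 16))

print(naive, padded)
```

In this toy setting, the misses that remain in the padded layout are the unavoidable first-touch misses, one per memory line; the extra misses in the first layout are conflict misses caused purely by where the data sits in main memory.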
Organizer and Moderator: G. Gielen, KU Leuven, B
Panellists: B. Sorensen, Atrium Design Solutions; H. Casier, Alcatel Microelectronics, B;
P. Magarshack, STMicroelectronics, F; J. Rodriguez, Anacad; J. Pollet, Dolphin, F
-
Design Challenges and Emerging EDA Solutions in Mixed-Signal IC Design [p. 694]
With increasing integration levels, more and more ICs
and systems-on-chip turn into mixed-signal designs.
Typical examples are telecom (Bluetooth, WLAN,
xDSL...) and multimedia (digital video, MP3 audio...)
systems. This hot topic session will explore the
challenges that designers face with these mixed-signal
designs, covering both technical and methodological
challenges as well as engineering resource and skill
shortage problems. On the technical side, basic challenges
are in incorporating analog design in a digital-oriented
system design flow, signal integrity problems (supply and
substrate noise, crosstalk...), trailing analog design
productivity and test. In addition, the session will discuss
the emerging progress in the methodology and EDA field,
ranging from new software startups to analog and mixed-signal
IP providers.
The session will start with a brief tutorial overview
about the problems and emerging solutions in the mixed-signal
domain, for the audience to get an update of the
current state of the art in mixed-signal. This will be
followed by a panel discussion, where the goal for the
audience is to really explore where the unaddressed
problems are in mixed-signal design and which problems
are today close to being solved commercially in this
dynamically moving market. Issues addressed by the
panel members include the integration of analog and
mixed-signal IP, the emergence of mixed-signal CAD
tools including behavioral modeling and simulation as
well as analog synthesis, the challenge of rapid
technology changes and analog design retargeting, the
mixed-signal signal integrity nightmare, the rise of
specialized mixed-signal design companies, single-chip
versus single-package integration, the trimming of analog
courses in many recently restructured EE curricula and
the shortage of analog designers.
Organizers/Moderators: W. Rosenstiel, FZI/Tübingen U, D; Y. Nakamura,
Kyoto U, JP
Speakers: H. Tago, System LSI R&D Center, Toshiba Semiconductor Company;
A. Mandapati, ATI Research Inc (Subsidiary of Nintendo in the US);
S. Narita, Advanced Microcomputer Business Operation, System LSI Business Division, Hitachi Ltd.
-
CPU for PlayStation®2 [p. 696]
-
H. Tago, K. Hashimoto, N. Ikumi, M. Nagamatsu, M. Suzuoki, and Y. Yamamoto
Processors designed for computer entertainment must
perform 3D graphics calculations, especially geometry
and perspective transformations. In the PlayStation®2, we
introduced the new idea of synthesizing emotion called
Emotion Synthesis and devised a new processor
architecture to support its graphics demands. The
architecture is embodied in the PlayStation®2's "Emotion
Engine" CPU, which uses vector units (VUs) as the key
units for floating-point calculations. Emotion synthesis
means the real-time synthesis of a computer graphics
animation scene that projects a great deal of atmosphere.
For example, when a female character walks into a video
game scene, her motion must be determined by solving
physical equations in response to interactive events
instead of replaying prerecorded data. Moreover,
differential equations with a large number of variables
must be used to describe, for example,
the waving motions of her hair in a breeze. For
authenticity in emotion synthesis, the CPU must execute
these calculations in real time. "Emotion Engine" ("EE")
is a system LSI including a 300MHz 128-bit 2-way
superscalar RISC core, two Vector Units ("VU"s), an Image
Processing Unit ("IPU") for MPEG-2 stream decode, a
10-channel direct memory access (DMA) controller, a
two-channel Rambus® memory controller (RAC) and other
peripheral modules. 13.5M transistors are integrated on
a 15.02mm x 15.04mm die in 0.25µm device technology
with 0.18µm gate length. Design strategy and LSI design
methodologies and CAD for "Emotion Engine" LSI are
presented with emphasis on practical aspects of
verification and timing closure. A combination of
simulation, emulation and formal verification ensured the
functional first silicon for system evaluation. In order to
control wire delay in the early design stage, floor-plan based
synthesis and wire load estimation are adopted for quick
timing closure.
-
Implementation of the ATI Flipper Chip [p. 697]
-
A. Mandapati
The Nintendo GameCube(tm) video game console
system is designed to outpace all other such systems
when released. Formerly known by the codename
Dolphin, this system includes an IBM PowerPC(tm)
processor and specialized hardware from ATI. This
specialized hardware is embodied in ATI's Flipper chip,
the centerpiece in the Dolphin design. Flipper functions
as the graphics processor, audio processor, host
controller, memory controller, and I/O processor of the
Dolphin system. Such a complex chip requires a very
robust design flow to get to functioning silicon in as little
time as possible. Here we will describe that design flow,
developed by ATI engineers to implement the Flipper
design. The goal was to develop a flow to implement the
best gaming hardware on a chip that needed to be as cost-effective
as possible. There were many challenges the
design offered, requiring optimal use of a small design
team with a minimal budget to achieve aggressive
schedules. The biggest challenge the team was presented with
was that of area. With high volumes, chips for consumer
devices can benefit greatly from smaller die sizes, due in
part to higher yields and also in part to lower power and
cheaper packages. Another daunting challenge the design
offered was that of the use of embedded DRAM. The
Dolphin architecture called for the use of an embedded
frame buffer and texture memory buffer for fast access.
-
SH-4 RISC Microprocessor for Multimedia, Game Machine [p. 699]
-
S. Narita
The SH-4 is a 2-issue superscalar 32-bit RISC
microprocessor for SEGA's game machine, Dreamcast.
In order to extend the floating-point performance, a
graphic FPU and graphic-oriented instructions are
provided. The performance is 360 VAX MIPS, 6.0M
Polygons/sec, and 1.4G FLOPS (peak with the new
instructions) at 200MHz.
Moderators: A. Oliveira, IST/INESC, PT; E. Macii, Politecnico di Torino, IT
-
Streaming BDD Manipulation for Large-Scale Combinatorial Problems [p. 702]
-
S. Minato and S. Ishihara
We propose a new BDD manipulation method that never
ca |