EEE4120F - High Performance Embedded Systems

Hall of Fame

This page presents some of the best YODA projects from previous years, shown here with permission from their developers. Since 2016 the projects have been given awards in three categories: Best Prototype (the prototype the judges agreed was the most impressive and functional), Best Concept (the most innovative or creative concept), and Best Paper (the paper/report of the highest quality, which explained the project and experiments well). Note that these awards do not necessarily go to the projects with the highest marks. The Best Paper award may coincide with the highest mark, but a Best Prototype or Best Concept winner, while it would have received a high overall mark, would not necessarily have received the highest mark.

2020

Revenge of the Synth
Polyphonic Audio Synthesis using a Field-Programmable Gate Array

BEST PROTOTYPE
Nicolas Reid, Callum Tilbury and Justin Wylie

This paper details the implementation of a classic direct digital synthesis (DDS) algorithm on a Xilinx Artix-7 field-programmable gate array (FPGA), specifically looking at its performance in synthesizing polyphonic audio. After a brief history of these topics, a more rigorous description of the relevant signal theory is given. The implementation details using Verilog, a hardware description language, are then unpacked in depth, leading to a simple prototype of a digital synthesizer that uses only the FPGA board. A so-called ‘golden measure’ is also developed, which runs on a conventional microprocessor, the Teensy 3.6. By contrasting these two approaches, the advantages and drawbacks of hardware acceleration in this context are analysed across a range of factors, including the polyphonic voice-count capability, the achievable output frequencies, power consumption, efficiency and cost. These analyses lead to the conclusion that the FPGA offers an attractive, perhaps commercially untapped, solution to modern audio synthesis, with the benefits of scalability, reconfigurability and high parallelization.
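
To make the DDS idea concrete, here is a minimal Python model of a polyphonic DDS voice bank; it is a sketch, not the authors' Verilog design. The 48 kHz sample rate, 32-bit phase accumulator, 256-entry sine table and averaging mixer are all assumptions chosen for illustration. Each voice adds a frequency-dependent tuning word to its accumulator every sample, and the top bits of the accumulator index the sine table.

```python
import math

SAMPLE_RATE = 48_000   # assumed output sample rate (Hz)
ACC_BITS = 32          # assumed phase-accumulator width
TABLE_BITS = 8         # assumed 256-entry sine lookup table
SINE_TABLE = [math.sin(2 * math.pi * i / (1 << TABLE_BITS))
              for i in range(1 << TABLE_BITS)]


def tuning_word(freq_hz: float) -> int:
    """Phase increment that makes the accumulator wrap freq_hz times per second."""
    return round(freq_hz * (1 << ACC_BITS) / SAMPLE_RATE)


def synthesize(freqs: list[float], n_samples: int) -> list[float]:
    """Mix one DDS voice per frequency; returns samples in [-1, 1]."""
    if not freqs:
        raise ValueError("need at least one voice")
    mask = (1 << ACC_BITS) - 1
    phases = [0] * len(freqs)
    steps = [tuning_word(f) for f in freqs]
    out = []
    for _ in range(n_samples):
        total = 0.0
        for v in range(len(freqs)):
            # The top TABLE_BITS of the accumulator index the sine table.
            total += SINE_TABLE[phases[v] >> (ACC_BITS - TABLE_BITS)]
            # Wrap modulo 2**ACC_BITS, like a fixed-width hardware register.
            phases[v] = (phases[v] + steps[v]) & mask
        out.append(total / len(freqs))  # average so the mix cannot clip
    return out
```

For example, `synthesize([261.63, 329.63, 392.0], 480)` renders 10 ms of a C major triad. In hardware each voice would have its own accumulator register and adder, so all voices advance in the same clock cycle, whereas this loop visits them one after another.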

Downloads: Paper (PDF)  YouTube Demonstration

PRNG
Parallel Random Number Generator

BEST REPORT
Noel J Loxton, Keenan Robinson and Mauro G Borrageiro

The purpose of this paper is to investigate a pseudo-random number generation technique known as the Linear Feedback Shift Register (LFSR). The generator is first implemented as a standard serial program in C++ and then as a hardware-accelerated version of the algorithm that runs on a Field Programmable Gate Array (FPGA), namely the Nexys A7-100T (Artix-7) from Digilent®. The latter is implemented using a form of parallel programming in Verilog code and simulation. The two implementations are tested and compared to evaluate the processing speed-up. Further details are given of the statistical tests performed to verify the quality of the generated numbers. The hardware-accelerated parallel implementation of the LFSR generator produced random datasets with large speed-ups compared to the golden standard. The datasets were acceptably random, but their entropy needs to be improved to widen the applications of this parallel random number generator.
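
The core of an LFSR fits in a few lines. The Python sketch below uses a 16-bit Fibonacci register with taps 16, 14, 13 and 11 (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1), a standard maximal-length example. The paper's register width, taps and parallel arrangement are not given in the abstract, so these choices, and the helper names, are assumptions; `parallel_lfsrs` models parallel lanes simply as independent registers with different seeds.

```python
def lfsr16_step(state: int) -> int:
    """One shift of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11."""
    # XOR the tapped bits (bit 0 is the output end of the register).
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_stream(seed: int, count: int) -> list[int]:
    """Return `count` successive states starting from `seed`.

    The all-zero seed is rejected: it maps to itself, locking the register.
    """
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a non-zero 16-bit value")
    out, state = [], seed
    for _ in range(count):
        state = lfsr16_step(state)
        out.append(state)
    return out


def parallel_lfsrs(seeds: list[int], count: int) -> list[list[int]]:
    """One independent register per seed, as parallel hardware lanes would run."""
    return [lfsr16_stream(s, count) for s in seeds]
```

Because a maximal-length register visits all 65,535 non-zero states on a single cycle, lanes that share one polynomial and differ only in seed produce shifted copies of the same sequence, which is one plausible reason parallel LFSR output can fall short on entropy.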

Downloads: Paper (PDF)

LEIA
Lane Exit Identification and Alert

BEST CONCEPT
Samantha Ball and Jason Pilbrough

This paper aims to develop an embedded vision prototype that detects road lanes and markings in real time. The computational complexity of lane detection is a significant barrier to real-time detection on traditional CPUs. An FPGA-based architecture is presented to address this bottleneck by accelerating the image processing pipeline. Such a system has applications in autonomous vehicles and could form part of a Lane Keep Assist System (LKAS) or a larger Advanced Driver Assistance System (ADAS). The proposed design used the Hough Transform for straight-line detection and was successfully implemented on the FPGA. A working hardware prototype was developed that achieved a processing time of less than 23.1 ms per frame, sufficient for real-time operation at 43.3 fps. This is a speedup of 10.1 times over an equivalent software implementation. The prototype also correctly predicted lane departures by analysing the slope of the detected lane markings in the image to identify when the vehicle was about to drift into a neighbouring lane. The robustness of the hardware design was demonstrated by testing the prototype at night in difficult conditions.
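
The Hough Transform at the heart of this design can be sketched in software as a voting procedure. In this minimal Python version (an illustration, not the authors' pipeline) each edge pixel votes for every line through it, parameterised as rho = x cos(theta) + y sin(theta), and the most-voted (rho, theta) cell is taken as the detected line. The 1-degree angle resolution, integer rho bins and function name are assumptions; edge detection and the lane-departure test are omitted.

```python
import math
from collections import Counter


def hough_peak(points, n_theta: int = 180):
    """Return (rho, theta_degrees) of the line with the most votes, or None.

    Each edge pixel (x, y) votes once per angle for the line
    rho = x*cos(theta) + y*sin(theta), with theta in [0, 180) degrees.
    """
    # Precomputed trigonometry; an FPGA version could read these from a ROM.
    trig = [(math.cos(math.pi * t / n_theta), math.sin(math.pi * t / n_theta))
            for t in range(n_theta)]
    votes = Counter()
    for x, y in points:
        for t, (c, s) in enumerate(trig):
            votes[(round(x * c + y * s), t)] += 1
    if not votes:
        return None
    (rho, t), _ = votes.most_common(1)[0]
    return rho, 180 * t / n_theta
```

The slope of a detected lane marking follows directly from theta, and a departure check could compare it against thresholds that the abstract does not give. As a consistency check on the reported figures, 23.1 ms per frame corresponds to 1000 / 23.1 ≈ 43.3 frames per second.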

Downloads: Paper (PDF)


2017

FPGA Accelerated Neural Network:
The FANN(tom) Menace (P20)

BEST PROTOTYPE
Kiuran Naidoo, Othniel Konan and Luke Schwartzkopff

The FANNtom Menace project is an attempt to create an FPGA-accelerated neural network. The aim is not to train neural networks on the FPGA, but rather to use traditional methods to train a model that can be verified and run on a PC. Once trained to an acceptable level of accuracy, this model should then be used to configure the FPGA through appropriate synthesisable code. Due to the highly parallel nature of a neural network, a significant speedup over traditional computational hardware is expected. Not only do we expect the FPGA to improve latency, because the computations within each layer of the network can be performed in parallel, but we also expect a large increase in throughput, because each layer of the network acts as a pipeline stage on the FPGA. The example used to test our implementation will be the classification of an image as either a cat or a dog (a binary classifier). The speedup of this solution will be compared to a golden measure implemented on a PC, specifically our C++ implementation set up and compiled for an ordinary x86 CPU. Latency and throughput will both be considered, as well as resource consumption on the FPGA (e.g. the amount of BRAM required).
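
The per-layer parallelism the abstract relies on is visible in a plain forward pass. The Python sketch below assumes fully connected layers with ReLU activations and a single output neuron whose sign decides the class; the layer structure, activation function and label mapping are assumptions, not details taken from the project.

```python
def dense(inputs: list[float], weights: list[list[float]],
          biases: list[float]) -> list[float]:
    """Fully connected layer with ReLU. Each output neuron depends only on
    `inputs`, so all neurons in a layer can be evaluated simultaneously."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def classify(pixels: list[float], hidden_layers, out_weights: list[float],
             out_bias: float) -> str:
    """Forward pass: hidden layers in sequence, then one output neuron."""
    act = pixels
    for weights, biases in hidden_layers:  # each layer ~ one pipeline stage
        act = dense(act, weights, biases)
    z = sum(w * a for w, a in zip(out_weights, act)) + out_bias
    # sigmoid(z) >= 0.5 exactly when z >= 0, so no exponential is needed.
    return "dog" if z >= 0 else "cat"  # label mapping is an assumption
```

If each layer is a pipeline stage taking T clock cycles, an idealised L-stage pipeline returns its first result after L × T cycles and then completes a new image every T cycles, which is the throughput gain the abstract anticipates.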

Downloads: Paper (PDF)  Description  Project (unavailable)

PADAWAN:
Parallel Accelerator for Digitising Audio with Attenuation of Noise (P18)

BEST REPORT
James Cushway, George De Kock and Johan Jansen Van Vuuren

This paper details the design and methodology for building a hardware accelerator for parallel digital audio processing. The Nexys 4 DDR evaluation board with a Xilinx Artix-7 FPGA is used to implement the accelerator. The hardware accelerator digitises an analog audio signal using a 12-bit ADC and performs digital filtering on it to create numerous audio effects. A PWM output, obtained by combining the results of the parallel filtering operations, is sent to a mono audio output port. Effects include echo, chorus, reverberation and distortion (specifically overdrive). The intensity of the effects is designed to be dynamically adjustable using switches on the Nexys 4 DDR board. The filters designed for the board are also implemented in Octave as a golden measure and tested on various audio signals to quantify performance, against which the FPGA performance is compared and evaluated. The paper also details the effects of bit truncation on audio, and the use of dithering and noise shaping to mitigate those effects.
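
Two of the ideas in this abstract, a delay-based echo and re-quantization with dither, can be sketched in a few lines of Python. The gain, delay and TPDF (triangular) dither used here are assumptions for illustration; the project's actual filter coefficients and noise-shaping scheme are not described in the abstract, and noise shaping is not shown.

```python
import random
from typing import Optional


def echo(x: list[float], delay: int, gain: float) -> list[float]:
    """Feed-forward echo: y[n] = x[n] + gain * x[n - delay]."""
    return [s + (gain * x[n - delay] if n >= delay else 0.0)
            for n, s in enumerate(x)]


def quantize(x: list[float], bits: int, dither: bool = False,
             rng: Optional[random.Random] = None) -> list[int]:
    """Round samples in [-1, 1] to signed `bits`-wide integer codes.

    With dither=True, triangular (TPDF) noise spanning +/-1 LSB is added
    before rounding, so the rounding error behaves like signal-independent
    noise instead of distortion correlated with the signal.
    """
    if rng is None:
        rng = random.Random(0)              # fixed seed for reproducibility
    scale = 1 << (bits - 1)
    lo, hi = -scale, scale - 1
    out = []
    for s in x:
        v = s * scale
        if dither:
            v += rng.random() - rng.random()   # difference of uniforms: TPDF
        out.append(min(hi, max(lo, round(v))))  # clip to the code range
    return out
```

With `bits=12`, matching the ADC width in the abstract, `quantize` maps full-scale input onto codes -2048 to 2047.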

Downloads: Paper (PDF)  Description  Project (unavailable)

Direct Digital Synthesis (DDS) (P19)
BEST CONCEPT
Warren Fletcher, Kyle Harrison, Sean Le Roux and Roan Song

A brief overview of Direct Digital Synthesis (DDS) is given, along with background on some existing DDS systems. A modified version of the traditional DDS system is presented for use within the audio frequency range. Every note of the Western 12-tone scale is designed and implemented on the Nexys4 DDR FPGA board. Multiple waveforms (sinusoidal, square, sawtooth and triangle) are created, each stored as 1024 samples at 9-bit resolution. The resulting waveforms are presented to the listener via PWM coupled with a low-pass filter. The output was compared against simulated results and the implementation was found to be successful.
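
The waveform tables and note frequencies described above can be generated as follows. This Python sketch follows the abstract's 1024 samples per waveform and 9-bit resolution, but the unsigned code range (0 to 511), the A4 = 440 Hz reference and the function names are assumptions.

```python
import math

TABLE_SIZE = 1024          # samples per waveform, as stated in the abstract
MAX_CODE = (1 << 9) - 1    # 9-bit unsigned codes: 0..511 (assumed encoding)


def wavetable(shape: str) -> list[int]:
    """One period of the waveform as 9-bit unsigned codes."""
    table = []
    for i in range(TABLE_SIZE):
        p = i / TABLE_SIZE                      # phase in [0, 1)
        if shape == "sine":
            v = 0.5 + 0.5 * math.sin(2 * math.pi * p)
        elif shape == "square":
            v = 1.0 if p < 0.5 else 0.0
        elif shape == "sawtooth":
            v = p
        elif shape == "triangle":
            v = 2 * p if p < 0.5 else 2 * (1 - p)
        else:
            raise ValueError(f"unknown shape: {shape}")
        table.append(round(v * MAX_CODE))
    return table


def note_frequencies(a4_hz: float = 440.0) -> list[float]:
    """The 12 equal-tempered notes from A4 upward: f = a4 * 2**(k/12)."""
    return [a4_hz * 2 ** (k / 12) for k in range(12)]
```

Each 9-bit code can then set a PWM duty cycle of code / 511, and the low-pass filter smooths the resulting pulse train into the analog waveform.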

Downloads: Paper (PDF)  Description  Project (unavailable)


2016

Motion Estimated Frame Interpolation (G3)
BEST PROTOTYPE
Shaylin Chetty, Ross Macarthur and Michael Wood

This paper presents a concept design for an implementation of motion-estimated frame interpolation on high-performance computing hardware. The concept design is intended to be embedded in a capable display and to receive input from a low-frame-rate HDMI video stream. A prototype is developed that demonstrates a simplified proof of concept, using frame averaging as the method of interpolation. The prototype is implemented on an FPGA with a VGA output.
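
Frame averaging, the simplified interpolation used in the prototype, amounts to a per-pixel mean of neighbouring frames. The Python sketch below assumes 8-bit greyscale frames stored as nested lists; the HDMI input and VGA output of the real design are not modelled, and the function names are illustrative.

```python
Frame = list[list[int]]  # rows of 8-bit greyscale pixels (assumed format)


def interpolate_frame(prev: Frame, nxt: Frame) -> Frame:
    """Per-pixel mean of two frames; (a + b) >> 1 needs only an adder and
    a shift in hardware, no divider."""
    return [[(a + b) >> 1 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev, nxt)]


def double_frame_rate(frames: list[Frame]) -> list[Frame]:
    """Insert one averaged frame between every pair of input frames."""
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        out += [a, interpolate_frame(a, b)]
    out.append(frames[-1])
    return out
```

Averaging does not track motion, so moving objects appear as semi-transparent double images; this is the limitation that motion-estimated interpolation is designed to overcome.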

Downloads: Paper  Slides  Project (ISE)

Message Digest 5 (MD5) Hash Reversal System (G2)
BEST CONCEPT
Gareth Callanan, Matthew Smith and Jean Swart

This paper describes the design and testing of a device created to reverse the effects of the Message-Digest 5 (MD5) hash function in a massively parallel manner. The device will be used to find the original data used to generate a 128-bit MD5 hash. The system is initially tested on a Nexys 4 FPGA platform, with the understanding that the project can later be implemented on an ASIC platform for massively improved performance.
It was found that the prototype FPGA system achieved significantly faster hash reversal than the golden measure, a parallel system running on a CPU. This result was obtained using only one solving module, and the design can be expanded in parallel.
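
Hash reversal of this kind is an exhaustive search: hash each candidate message and compare it with the target digest. The Python sketch below uses the standard `hashlib.md5`; the lowercase alphabet, the length limit and the round-robin split of candidates across solving modules are assumptions for illustration, not the project's actual partitioning.

```python
import hashlib
import itertools
import string
from typing import Optional


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def reverse_md5(target_hex: str, max_len: int,
                alphabet: str = string.ascii_lowercase,
                module: int = 0, n_modules: int = 1) -> Optional[bytes]:
    """Exhaustively search messages up to max_len characters for one whose
    MD5 digest equals target_hex.

    Candidates are numbered in search order, and module m tests only those
    with index % n_modules == m, so several solving modules can cover
    disjoint parts of the search space at the same time.
    """
    index = 0
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            if index % n_modules == module:
                candidate = "".join(chars).encode()
                if md5_hex(candidate) == target_hex:
                    return candidate
            index += 1
    return None
```

Because the modules share no state, adding modules divides the search time roughly by their number, which is what allows the single-module result to be expanded in parallel.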

Downloads: Paper  Slides  Project (Python + ISE)

Arithmetic Sequence Generator (G8)
BEST REPORT
Asif Parker and Philippus Scholtz

In this project, an accelerator is built that generates an arithmetic sequence of the form a(n) = a1 + (n - 1)d. The parameters a1, d and n are specified as inputs. The device writes the first n entries of the sequence into memory, after which it asserts a 'Done' output.
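
A behavioural model of such an accelerator is short. The Python sketch below generates each term by adding d to the previous one instead of evaluating (n - 1)d directly, so the datapath needs only an adder and a register; the 32-bit wrap-around width and the function name are assumptions.

```python
def arithmetic_sequence(a1: int, d: int, n: int, width: int = 32) -> list[int]:
    """Write the first n terms of a(k) = a1 + (k - 1)d into a memory array.

    Terms wrap to `width` bits (two's-complement style), as a fixed-width
    register would; the hardware would assert 'Done' once the loop ends.
    """
    mask = (1 << width) - 1
    memory = [0] * n
    term = a1 & mask
    for addr in range(n):
        memory[addr] = term
        term = (term + d) & mask   # one addition per term, no multiplier
    return memory
```

For instance, `arithmetic_sequence(3, 4, 5)` returns `[3, 7, 11, 15, 19]`.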

Downloads: Paper  Slides  Project (C, ISE)


2014

PADAWAN - Parallel Accelerator for Digitising Audio with Attenuation of Noise P18
Hankyu Kim, Sarah Newnham and Lloyd Kammies

VADER - Versatile Accelerated Digital Encryption Recovery P17, 2014
Anurag Arnab and Michael Seymour

BSS - Bit Sequence Sniffer P09, 2014
Anant Dole and Nicholas Hoernle

PSA - Pattern Seek Accelerator P15, 2014
Adam Todes, Devin Norman and Glenn Madzonga

2013

IMA - Image Masking Accelerator P13, 2013
Justin Coetser and Daniela Massiceti

DE - Data Encryption Accelerator P15, 2013
Matthew Cawood, Kholofelo Moyaba and Olof Tingstam Peterson

BCDC - Binary Coded Decimal Converter P09, 2013
Nyiko Ndhambi and Penlope Yaguma

SPA - Shortest Path Accelerator P19, 2013
Justin Coetser and Daniela Massiceti

2012

DEATHSTAR - Data Encryption Accelerator That Handles Security Tasks Anytime Readily P15, 2012
Daniel Galasko, Lynn Assimwe and Victor Mushabe

The objective of the Data Encryption Accelerator That Handles Security Tasks Anytime Readily (DEATHSTAR) is to accelerate the process of encrypting a data stream. First, the rst (reset) control line is clocked, which causes the DEA (data encryption accelerator) to reset any internal buffers and storage. Next, an encryption key is set by writing the key (a single-byte value) to the 8-bit din (data input) bus and then sending a positive edge to the kset (key set) control line, while the dclk (data clock) line is kept low. After this, the actual encryption starts: the data is processed one byte at a time by writing a byte to the din lines and then clocking the dclk line. The encrypted byte is available almost immediately on the DEA's 8-bit dout output lines. This process is repeated for each item of data. To start with, a simple XOR encryption is used as the encryption method. If time permits, this is taken further using a repeating pattern based on the key input.
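
The interface protocol above can be modelled in software to clarify the order of operations. In this Python sketch each control-line edge is a method call; the class and method names are hypothetical, and only the initial single-byte XOR scheme is modelled, not the repeating-pattern extension.

```python
class XorDEA:
    """Behavioural model of the byte-serial encryption interface.

    Method names mirror the control lines (rst, kset, dclk); calling a
    method stands for a rising edge on that line, with `din` as the value
    on the 8-bit input bus.
    """

    def __init__(self) -> None:
        self.rst()

    def rst(self) -> None:
        """Reset: clear the stored key and the output register."""
        self.key = 0
        self.dout = 0

    def kset(self, din: int) -> None:
        """Latch the single-byte key currently on din."""
        self.key = din & 0xFF

    def dclk(self, din: int) -> int:
        """Encrypt the byte on din; the result appears on dout."""
        self.dout = (din ^ self.key) & 0xFF
        return self.dout


def encrypt(data: bytes, key: int) -> bytes:
    """Drive the model in the documented order: rst, kset, then dclk per byte."""
    dea = XorDEA()
    dea.rst()
    dea.kset(key)
    return bytes(dea.dclk(b) for b in data)
```

Applying the same key twice restores the plaintext, since (b XOR k) XOR k = b. A single-byte XOR key is trivially breakable, which is why it serves only as a starting point.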

PADAWAN - Parallel Accelerator for Digitising Audio With Attenuation of Noise P17, 2012
Lloyd Hughes, Stephen Jermy and Ross Engers

The problem to be solved is performing filtering and processing on audio in near real time using an FPGA. Real-time filtering can be achieved in software; however, when many filters are cascaded one after the other, there is a delay between audio input and output. For many applications this is an undesirable side effect that stops the system from being real time. Filters that can be implemented include low-pass, high-pass, band-pass and band-stop. Effects that can be implemented include echo, flange, chorus, reverb, vibrato, phaser, delay and distortion. (Why the acronym, you might ask? It is fairly obvious from this reference.)
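
A cascade of filters can be sketched as a chain of per-sample stages. The Python sketch below assumes simple first-order IIR low-pass and high-pass stages; chaining a low-pass and a high-pass gives a crude band-pass. The coefficient values and function names are illustrative assumptions, not the project's design.

```python
from typing import Callable, Iterable

Stage = Callable[[float], float]


def one_pole_lowpass(alpha: float) -> Stage:
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y = 0.0

    def step(x: float) -> float:
        nonlocal y
        y += alpha * (x - y)
        return y

    return step


def one_pole_highpass(alpha: float) -> Stage:
    """High-pass formed as the input minus its low-passed version."""
    lp = one_pole_lowpass(alpha)
    return lambda x: x - lp(x)


def run_cascade(stages: list[Stage], samples: Iterable[float]) -> list[float]:
    """Pass every sample through each stage in order."""
    out = []
    for x in samples:
        for stage in stages:
            x = stage(x)
        out.append(x)
    return out
```

In software, the work per sample grows with the number of stages in the chain; on an FPGA each stage can be its own block of logic, so adding a stage need not slow the others down.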

VADER - Versatile Accelerated Digital Encryption Recovery P16, 2012
Daniel Donaldson and Gregory Ireland

The VADER system is a digitally accelerated add-on hardware device designed to recover passwords from an acquired password hash and the hashing function that produced it. Such a device could speed up the computationally expensive recovery of passwords for forensic purposes, for instance where a victim's or suspect's password-protected information could assist an investigation. Before the system can begin, the specific hashing function used to create the password hashes must be acquired; it is assumed that this is available, and many commonly used hashing functions are indeed widely available. The hashed password itself must also be acquired. The system will run parallelized functions on an FPGA to accelerate the recovery of the password. It will first attempt a dictionary-style attack and, if that is unsuccessful, apply a brute-force algorithm.
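
The two-phase strategy (dictionary first, then brute force) looks like this in Python. The hash function is passed in as a parameter, reflecting the assumption that it has been acquired; SHA-256 from `hashlib` is used only as an example, and the character set, length limit and function names are assumptions.

```python
import hashlib
import itertools
import string
from typing import Callable, Iterable, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def recover_password(target: str, hash_fn: Callable[[bytes], str],
                     wordlist: Iterable[str], max_len: int,
                     alphabet: str = string.ascii_lowercase + string.digits
                     ) -> Optional[str]:
    """Dictionary attack first, then brute force over `alphabet` up to max_len."""
    for word in wordlist:                        # phase 1: dictionary
        if hash_fn(word.encode()) == target:
            return word
    for length in range(1, max_len + 1):         # phase 2: brute force
        for chars in itertools.product(alphabet, repeat=length):
            guess = "".join(chars)
            if hash_fn(guess.encode()) == target:
                return guess
    return None
```

A parallel implementation would split the dictionary and the brute-force keyspace across hardware units, each running the same comparison loop on its own share.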