Saturday, July 13, 2013

Evaluating new tools/systems for project development.

Tag: SVN vs Perforce

I have to choose between SVN, Git and Perforce when creating a new project in Assembla. I know SVN well from previous work. I have been using Perforce extensively at work for the past 9 months. I have never typed a single Git command.

The final decision is Perforce. I am a bit concerned about the network requirement of Perforce due to its centralised workflow. I know how to make local changes without a network connection and reconcile the workspace later when the network is available. Still, this is not the normal way of working in the Perforce world.

Git is the first one out. The documentation and terminology just do not fit my brain. And I don't need all these distributed features. I foresee only a handful of developers collaborating on the project. I don't think this Git thing is going to work for my future projects either. Just my personal opinion.

SVN is good enough. I don't think there is anything missing in SVN, as a version control backend, for me. The issue is on the front end side. I am now so used to the pretty UI in Perforce for tracking changes. I perform all the actual version control work on the command line. But when it comes to 'Who did what, where and when?' and 'Where did this code originate and how did it get here?', the Perforce UI is a clear win.

There are front ends for SVN providing similar features, but the good ones are all paid software. Then why should I not use Perforce as a single, well integrated package?

During the decision making process, I came across somebody's suggestions, which should also be valid for most tool/system selection situations. I will keep these in mind in the future.


  1. performance on typical usage, especially for remote sites
  2. resource requirements
  3. importance of the changes needed in the working habit
  4. support availability and cost

Friday, July 12, 2013

A serial bus for core configuration

In much of my previous work using an FPGA as an accelerator, I came across this situation.


  1. I need to configure the cores (usually parallel cores with the same or similar architecture) by writing configuration/parameter registers in the cores.
  2. The same type of registers is used to sample the results or status of the cores.
  3. The master controlling the cores (by reading/writing the registers) is usually a soft CPU or an interface to the host PC.


Given the fact that all cores are in the same FPGA and we have a balanced clock tree within the FPGA, there is no need for de-skew and resynchronisation between the master and the cores. This makes things much easier. NOT!

In FPGA design, these registers are usually implemented in distributed memory (i.e. DFF primitives). This is sometimes necessary since the contents of these registers are required simultaneously (e.g. the parameters of a FIR filter). Then it is straightforward to allow them to be written simultaneously.

In an example design, we have 10 cores. Each core has 10 configuration/parameter/result registers. All registers are 32-bit. This is a design with 32*10*10=3200 signals (ignoring the read/write controls) just for the purpose of infrequent data communication.

Here comes the problem: the connections between the master and the cores make it difficult to meet the timing constraints, or sometimes impossible to pass P&R. We simply spend too many routing resources on something not related to design performance.

Now you are thinking of using a bus system to connect these registers to the master, with an address decoder in each core. This will relieve the routing channel congestion since the number of signals now depends on the number of cores but not on the number of registers.

But the timing issue is still not solved. Putting a global decoder in the master and connecting all registers to a single parallel bus will not work. First, some FPGAs have no tristate buffers for internal connections, so you end up using more signals (for sending the read data back to the master). Also, the fan-out of the master output will be too high since it is going to drive all the cores. More importantly, we usually fill up the FPGA with as many parallel cores as possible, so these cores are spread all over the chip in different locations. Thus the connection to the furthest core dominates the critical path.
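The signal counts discussed above can be estimated with a few lines of Python. This is only a rough sketch: the bus layout assumed here (a shared write-data bus, a shared address wide enough to select any register, one write and one read strobe, and a separate read-data return per core because there are no internal tristates) is an illustrative assumption, and the function names are made up for this example.

```python
def direct_wiring_signals(cores, regs_per_core, width):
    """Every register bit is wired individually to the master."""
    return cores * regs_per_core * width


def bus_signals(cores, regs_per_core, width):
    """Assumed bus layout (hypothetical, for illustration only):
    shared write data + shared address + 2 strobes, plus one
    read-data return path per core (no internal tristate buffers)."""
    total_regs = cores * regs_per_core
    addr_bits = (total_regs - 1).bit_length()  # bits needed to address all registers
    shared = width + addr_bits + 2             # wdata + address + write/read strobes
    readback = cores * width                   # per-core read data back to the master
    return shared + readback


print(direct_wiring_signals(10, 10, 32))  # 3200, as in the example design
print(bus_signals(10, 10, 32))            # 361 under the assumptions above
```

Under these assumptions the register count per core only affects the address width, while each additional core adds a full read-data path. The count goes down a lot, but the broadcast signals still have to reach every core, which is where the fan-out and critical-path problem comes from.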

You are upset by the fact that you have a highly optimised design for computation which is slowed down by a configuration bus. There are two options from here: pipeline the bus between the master and the cores, or use a slower clock for the configuration part.

The second approach will solve the issue once and for all. But clocking resources in an FPGA are limited and more valuable than DFFs and LUTs. It is also not easy to implement the synchroniser in an FPGA. Using an async BlockRAM for a handful of registers is another big waste. In the end, you also need to set special constraints for the STA tool to get timing closure right. If you are willing to go through all this trouble, why not just declare the bus a multi-cycle path in the timing constraints? Either way, you cannot avoid asynchronous design in the FPGA if you follow this path.

The first approach is actually easier, with the help of automatic retiming from the EDA tools. All it costs is some DFFs, which should not be an issue in modern FPGAs. The issue is the latency, which affects the bus protocol, since there is a delay between the write signal and the actual update of the registers in the cores. For example, the master must wait until the parameters are actually updated before sending the 'go' signal (usually a single wire) to the cores. It gets more complicated if any kind of acknowledgement is required for the write operation (e.g. the full signal in a FIFO interface, etc.).
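The latency effect can be illustrated with a small cycle-level model in Python. It is a behavioural sketch, not RTL: the class name, the `stages` parameter and the dictionary used as the register file are all assumptions made for this example.

```python
from collections import deque


class PipelinedWriteBus:
    """Write bus with `stages` pipeline registers between master and core."""

    def __init__(self, stages):
        self.pipe = deque([None] * stages)  # writes still in flight
        self.regs = {}                      # register file inside the core

    def tick(self, write=None):
        """Advance one clock; `write` is an (addr, data) tuple or None."""
        self.pipe.append(write)
        arrived = self.pipe.popleft()
        if arrived is not None:
            addr, data = arrived
            self.regs[addr] = data


def cycles_until_visible(stages):
    """Clock cycles from issuing a write until the core register holds it."""
    bus = PipelinedWriteBus(stages)
    bus.tick(write=(0, 0xABCD))
    cycles = 1
    while 0 not in bus.regs:
        bus.tick()
        cycles += 1
    return cycles


print(cycles_until_visible(0))  # 1: unpipelined, updated on the write cycle
print(cycles_until_visible(3))  # 4: three extra cycles of latency
```

In this model the 'go' signal must not reach the core earlier than `stages` cycles after the last parameter write. One way to guarantee that is to send 'go' through the same number of pipeline stages; another is to have the master count the cycles.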

What I am proposing here is a serial bus which has the following advantages:


  1. Running under the same clock shared by the master and the cores.
  2. Minimum number of signals (only two) between master and each core.
  3. Minimum control logic required for implementation.

It is going to have disadvantages similar to those of the first approach above. But it can still save a lot of routing resources. It is also suitable for ASIC design, where flip-flops are more valuable than in an FPGA and the cost of retiming every bus is too high.
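To get a feeling for the trade-off, here is a generic bit-serial sketch in Python. It is not the protocol of this serial bus: the MSB-first bit order and the function names are assumptions made purely for illustration. Only the figure of two signals per core comes from the list above.

```python
def serial_signal_count(cores, signals_per_core=2):
    """Total master-to-core signals with two signals per core."""
    return cores * signals_per_core


def serialise(word, width=32):
    """Split a register value into single bits, MSB first (assumed order)."""
    return [(word >> i) & 1 for i in reversed(range(width))]


def deserialise(bits):
    """Shift the bits back into one value, as a shift register would."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


bits = serialise(0xDEADBEEF)
print(serial_signal_count(10))              # 20 signals for the 10-core example
print(len(bits))                            # 32 clock cycles just for the data bits
print(deserialise(bits) == 0xDEADBEEF)      # True
```

The routing saving is large (20 signals instead of 3200 in the example design), and the price is that every 32-bit register access takes at least 32 clock cycles for the data alone. That is acceptable for infrequent configuration traffic.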

The details of this serial bus will be presented in the next post.