Light_WS2812 library V2.0 – Part II: The Code

After investigating the timing of the WS2812 protocol in the previous part, the question is now how to use this knowledge for an optimized software implementation of a controller. An obvious approach would be an inner loop with a switch statement that branches into separate functions to emit either a “0” symbol or a “1” symbol. But, as is often the case, there is another solution that is both simpler and more elegant.


The image above shows the timing of both the “0” and the “1” code. The cycle starts at t0, the rising edge, for both symbols. The output has to be set high regardless of the symbol. At t1, the output has to be set low for a “0” and can remain unchanged for a “1”. At t2 the output goes low for the “1”. Since it is already low for a “0”, we can set the output low regardless of the symbol. Finally, at t3 the complete symbol has been sent and the output can be left unchanged.

So, in the end there is only one point in time where the output is influenced by the symbol type: t1. Everything else remains unchanged. This means that special case handling can be limited to a very small part of the code.
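The idea can be tried out on a PC before touching any assembler. The following C sketch is only a model and not the library code: the function name `ws_model_byte` and the 'H'/'L' trace format are made up for illustration. It writes one character per time slot (t0..t1, t1..t2, t2..t3) and looks at the data bit only at t1.

```c
#include <stdint.h>

/* Hypothetical host-side model, not part of the library.
 * Writes three characters per bit into trace: the output level during
 * t0..t1, t1..t2 and t2..t3. trace must hold 8*3+1 = 25 chars. */
void ws_model_byte(uint8_t data, char *trace)
{
    for (int bit = 0; bit < 8; bit++) {
        char level = 'H';            /* t0: always set the output high     */
        *trace++ = level;

        if (!(data & 0x80))          /* t1: the only symbol-dependent step */
            level = 'L';             /*     a "0" goes low here            */
        *trace++ = level;

        level = 'L';                 /* t2: always set the output low      */
        *trace++ = level;

        data <<= 1;                  /* MSB first: move next bit to bit 7  */
    }
    *trace = '\0';
}
```

For the byte 0xA0 (binary 10100000) the trace reads HHL HLL HHL HLL, followed by five more HLL groups for the remaining zero bits.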

This is what I ended up with in AVR assembler code:

       ldi   %0,8      //                 Loop 8 times for one byte
loop:
       out   %2,%3     // [01]    - t0    Set output Hi
                       //                 (wait1: optional nops)
       sbrs  %1,7      // [02/03] -       Skip t1 if bit 7 is set
       out   %2,%4     // [03]    - t1    Set output Low
       lsl   %1        // [04]    -       Shift out next bit
                       //                 (wait2: optional nops)
       out   %2,%4     // [05]    - t2    Set output Low
                       //                 (wait3: optional nops)
       dec   %0        // [06]
       brne  loop      // [08]    - t3    Loop

This code outputs one byte of data, which has to be loaded into %1 (the C compiler takes care of this). Since the protocol sends data MSB first, bit 7 is tested. If it is “1”, the out instruction at t1 is skipped. The lsl then moves the next bit into bit 7 for the following iteration. That’s it: only 7 instructions are needed in the inner loop.

What is left now is to correct the timing. To do that, nops have to be inserted at positions wait1..wait3. As shown in the previous part, the most critical timing is that of the “0”, where the delay between t0 and t1 must not exceed 500 ns. The minimum achievable delay, when no nops are inserted at wait1, is two cycles. This equals 500 ns at 4 MHz and less at higher clock speeds. All other timings may exceed the minimum timing required by the data sheet.

This means that even this simple loop is able to control WS2812 LEDs at only 4 MHz! This is quite an achievement, since it was previously considered to be difficult to control WS2812 LEDs even at 8 MHz. Note that the 500 ns is safe on the WS2812B, but may be critical on the WS2812(S). It worked with my devices, though.
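The arithmetic behind the 4 MHz figure is easy to check. The small helper below is only an illustration (the name `cycles_to_ns` is made up): it converts a number of CPU cycles into nanoseconds for a given clock frequency.

```c
#include <stdint.h>

/* Duration of `cycles` CPU cycles in nanoseconds, rounded down.
 * f_cpu_hz is the clock frequency in Hz (the value F_CPU usually holds). */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    /* 64-bit intermediate, since cycles * 1e9 does not fit into 32 bits */
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / f_cpu_hz);
}
```

With two cycles between t0 and t1 this gives 500 ns at 4 MHz, 250 ns at 8 MHz and 125 ns at 16 MHz. The complete 8-cycle loop without any nops takes 2000 ns per bit at 4 MHz.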

To make the final implementation as flexible as possible, I opted to calculate the exact number of nops to insert at compile time from the F_CPU define, which is usually set to the CPU clock speed in the AVR-GCC toolchain. You can find the implementation here. The C code tries to adjust the timing according to the following rules, which leave a margin of at least 150 ns for both the WS2812 and the WS2812B timing:

 350 ns <  t1-t0 <= 500 ns
 900 ns <= t2-t0
1250 ns <= t3-t0
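One possible way to turn these rules into nop counts is sketched below. This is not the library source: the names `ws_nops` and `ws_calc_nops` are hypothetical, and the rounding strategy is an assumption. The fixed cycle counts are read off the assembler listing (2 cycles from t0 to t1, 2 from t1 to t2, 4 from t2 to the next t0). It is written as an ordinary function so it can be tried on a PC; the same integer arithmetic could be done in preprocessor macros on F_CPU.

```c
/* Hypothetical sketch, not the library source. */
typedef struct {
    unsigned wait1, wait2, wait3;  /* nops to insert at each position  */
    int ok;                        /* 0 if t1-t0 would exceed 500 ns   */
} ws_nops;

/* Cycles needed to cover ns nanoseconds, rounded up. */
static unsigned long cycles_ceil(unsigned long f_khz, unsigned long ns)
{
    return (f_khz * ns + 999999UL) / 1000000UL;
}

ws_nops ws_calc_nops(unsigned long f_cpu_hz)
{
    unsigned long f_khz = f_cpu_hz / 1000UL;   /* kHz keeps products in 32 bits */
    ws_nops r;

    /* t1-t0 must be strictly longer than 350 ns: round down, add one. */
    unsigned long c1 = f_khz * 350UL / 1000000UL + 1;
    if (c1 < 2) c1 = 2;                        /* loop minimum t0 -> t1 */

    unsigned long c2 = cycles_ceil(f_khz, 900UL);
    if (c2 < c1 + 2) c2 = c1 + 2;              /* loop minimum t1 -> t2 */

    unsigned long c3 = cycles_ceil(f_khz, 1250UL);
    if (c3 < c2 + 4) c3 = c2 + 4;              /* loop minimum t2 -> t3 */

    r.wait1 = (unsigned)(c1 - 2);
    r.wait2 = (unsigned)(c2 - c1 - 2);
    r.wait3 = (unsigned)(c3 - c2 - 4);
    r.ok = (c1 * 1000000UL <= 500UL * f_khz);  /* t1-t0 <= 500 ns?      */
    return r;
}
```

Under these assumptions, 4 MHz needs no nops at all, 8 MHz gets 1, 3 and 0 nops, and 16 MHz gets 4, 7 and 1. At 1 MHz the two-cycle minimum already takes 2 µs, so `ok` comes back as 0.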

The outer loop is implemented in pure C, since it can be safely assumed not to take more than 5 µs. This way maximum flexibility is retained.
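A minimal sketch of such an outer loop could look like the following. The names `ws_send_array` and `ws_send_byte` are hypothetical, and `ws_send_byte` is only a stand-in that logs bytes so the example runs on a PC. On an AVR its place would be taken by the timed assembler loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the inline-assembler bit loop so the sketch runs on a PC.
 * It only logs bytes; on an AVR this would be the timed loop instead. */
static uint8_t ws_log[64];
static size_t  ws_log_len;

static void ws_send_byte(uint8_t b)
{
    if (ws_log_len < sizeof ws_log)
        ws_log[ws_log_len++] = b;
}

/* Hypothetical outer loop: hand each byte of the buffer to the bit loop. */
void ws_send_array(const uint8_t *data, size_t len)
{
    while (len--)
        ws_send_byte(*data++);
}
```

Fetching the next byte and decrementing a counter takes only a handful of instructions, which is why the gap between two bytes can be expected to stay far below 5 µs.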

Link to GitHub repository.


7 thoughts on “Light_WS2812 library V2.0 – Part II: The Code”

  1. Looks like using a microcontroller with SPI and a DMA engine should make it trivial to implement this — just encode 3 bits for every input bit and DMA the whole mess out the SPI port at a fixed clock rate.

  2. I am glad I stumbled on this today. I have been toying with these LED strips recently, making LED scrolling sign programs and such. My frustration with the Adafruit (and other) implementations is the memory usage and how that limits the number of LEDs you can control. Since 24-bit color is a bit overkill for many uses, I propose using a palette and then compressing the data (one byte per pixel instead of three, for 256 colors). Then, I’d pull each byte and look up the 3-byte value to send out.

    While I programmed 6809 assembly, I have yet to touch AVR. Finding articles and projects like yours is a good starting point for me. Hopefully I can figure out enough to do what I am wanting.

    Thanks for sharing!

    • Thanks!

      Sure, you could use a look-up table. The existing code could easily be altered to use one. However, a 256-entry LUT alone would already use 768 bytes of memory; with that much memory you could drive 256 LEDs directly instead.

      • Very good point – though one could go with a 16 or 30 color table to save space (for an LED sign, maybe 256 colors would be overkill anyway), and I was thinking of having the palette table compiled into EEPROM.
