Sector codes on Zipline

by Peter Harrington

Posted on Nov. 6, 2017, 4:08 p.m.

When Quantopian pulled the plug on the free lunch that was hosted trading, some of us turned to Zipline. As I discussed in previous posts, I had already been using Zipline to do things I couldn't do on Quantopian's platform. One thing that Quantopian did not open source was their fundamentals code. I spent some time writing what I believe is the greatest implementation of fundamental data from sparse data. You have to write less code than you would on Quantopian, and I believe it is much faster, but since their code is not open source I cannot measure the speed difference.

I will get into that in a subsequent post, but as an intro I would like to show code for getting Pipeline data for a single value per asset. This is used to get sector codes for US equities. There is no time component. If you wanted to get sector codes on Quantopian you would need the following code:

from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import morningstar

# while setting up your Pipeline: 
grouping = Grouping()
pipe.add(grouping, "grouping")

# finally a class to tie those two together
class Grouping(CustomFactor):
    sectors_in = morningstar.asset_classification.morningstar_sector_code.latest
    sectors_in.window_safe = True
    inputs = [sectors_in]
    window_length = 1

    def compute(self, today, assets, out, sectors):
        out[:] = sectors[-1]

Now let me show you how it is done with the version I have written:

from import NASDAQSectorCodes

# while setting up your Pipeline: 
grouping = NASDAQSectorCodes()
pipe.add(grouping, "grouping")

That's it. You can argue that I moved the class to another file, and that is fair, but take a look at that class:

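import numpy as np
from zipline.pipeline import CustomFactor


class NASDAQSectorCodes(CustomFactor):
    """Returns a value for an SID stored in memory."""
    inputs = []
    window_length = 1

    def __init__(self, *args, **kwargs):
        # Load the SID-indexed array of sector codes once, up front.
        self.data = np.load("/path/to/the/file")

    def compute(self, today, assets, out):
        # assets holds SIDs, so indexing the array returns their codes.
        out[:] = self.data[assets]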

All it does is output the data. The reason is that the data is stored in an array that is organized the same way that Zipline feeds the assets to the algorithm. No HashMap lookups, no reading from files at compute time, no funny business. What goes on behind the scenes is that when the data file is built we have some knowledge of all the assets that will be used by Zipline, and those assets have SIDs. So we can store each asset's sector code in an array at the index given by its SID. That makes outputting extra values quick and easy. This idea is further extended with a second dimension of sparse dates to get fundamental data. I will show that in a later post. These ideas could be used for any data, not just fundamentals and sector codes.
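To make the build step concrete, here is a minimal sketch of how such a file could be packed and then read back. It is not the actual build script: the `sid_to_sector` mapping, the `MISSING` sentinel, the `build_sector_array` helper, and the sector code values are all made up for illustration, and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np

# Hypothetical SID -> sector code mapping. Real values would come from
# the asset database and a sector listing; these are made up.
sid_to_sector = {0: 311, 1: 102, 3: 206, 4: 311}

MISSING = -1  # assumed sentinel for SIDs with no known sector


def build_sector_array(mapping, n_sids):
    """Pack sector codes into an array whose index is the SID."""
    data = np.full(n_sids, MISSING, dtype=np.int64)
    for sid, code in mapping.items():
        data[sid] = code
    return data


# Build step: run once, when the data file is created.
path = os.path.join(tempfile.mkdtemp(), "sectors.npy")
np.save(path, build_sector_array(sid_to_sector, n_sids=5))

# Lookup step: the same indexing that compute() performs.
data = np.load(path)
assets = np.array([4, 0, 2])  # SIDs in the order they are passed in
out = data[assets]            # one vectorized lookup, no per-asset loop
print(out)
```

SID 2 has no entry in the mapping, so it comes back as the sentinel. Whatever value you pick for missing data, it should be one that cannot collide with a real sector code.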

I will probably abstract this further and make a generic Factor for any "aligned" data. What I'm calling aligned is data that has the same order in the array as the assets fed from Zipline.
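As a rough idea of what that abstraction might look like, here is a sketch that does not depend on Zipline. `AlignedLookup` is a hypothetical stand-in, not a Zipline class and not the code from this post; it only mirrors the `compute(today, assets, out)` signature so the same lookup can serve any SID-indexed array. The sector and exchange values are invented.

```python
import numpy as np


class AlignedLookup:
    """Stand-in for a generic 'aligned' factor (not a Zipline class)."""

    def __init__(self, data):
        # data[sid] holds the value for the asset with that SID.
        self.data = np.asarray(data)

    def compute(self, today, assets, out):
        # Same argument order as CustomFactor.compute with no inputs.
        out[:] = self.data[assets]


# Two different aligned datasets served by the same class.
sector_lookup = AlignedLookup([311, 102, -1, 206, 311])
exchange_lookup = AlignedLookup([1, 1, 2, 2, 1])

assets = np.array([3, 1])
sectors = np.empty(len(assets))
exchanges = np.empty(len(assets))
sector_lookup.compute(None, assets, sectors)
exchange_lookup.compute(None, assets, exchanges)
print(sectors, exchanges)
```

The only thing that changes between datasets is the array handed to the constructor, which is why a single generic Factor seems feasible.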

You can find the code here.

Hi, this is Peter Harrington's spot for discussing all things related to quantitative finance, mostly focusing on how to build your own system and strategy. I focus on Long/Short equity and futures, but am open to learning about other assets and strategies.