My Ph.D. was in City and Regional Planning from UC Berkeley. Last year, I finished a postdoc in the UC Berkeley Urban Analytics Lab. Most of that work was about integrated transportation and land use modeling. This work was in the classic MPO (metropolitan planning organization) style of figuring out where people live, where firms locate, what decisions people make for travel patterns, times, and modes.
My primary research rests on two pillars. First, I look at transportation network modeling, and in particular building graph-based models of circulation systems with OpenStreetMap data. For that, I developed a software toolkit called OSMnx, using NetworkX, a network analysis package developed by Los Alamos National Lab. Basically, it pulls raw OpenStreetMap data from the API and turns it into a NetworkX graph model. That lets us query OpenStreetMap and construct a theoretically sound model automatically, without having to do it all ad hoc.
The empirical part of that pillar is that the tool opens up scalable street network analysis that we couldn’t do before. What OSMnx lets you do is create a script to look at 30,000 study sites, even 100,000 study sites, let that run for a couple days, and you have all your graphs there. You can run the same algorithms on all of them, and you can quickly look cross-sectionally at different resilience indicators, indicators of where the grid exists or not, to look at urban planning histories of different kinds of places. So, where traditionally we would look at sample sizes of 10 to maybe 50 street networks and cross-sectional studies and run a regression model, there’s a scalability component here. We can get better statistical significance, but we can also look at populations rather than samples. We don’t have to take a sample of 50 American cities and make a claim about American cities; we can look at every incorporated city and town, according to the US Census, in less time than having to produce your own script.
The second pillar of my work is looking at how new forms of housing data and housing technology platforms can tell us different kinds of stories about housing affordability. Basically, I'm asking what sources like Craigslist or Zillow tell us about the housing market and the current state of affordability that is different from what we can get from the most recent decennial Census, American Community Survey (ACS), or American Housing Survey (AHS) data. In what ways do they tell different stories, what sampling biases exist, which populations are being underrepresented in something like Craigslist data, and which populations are being sampled very well so that you can make a lot of claims about them? According to the most recent AHS, sites like Craigslist were the most common way people found their current housing unit. So, at a minimum it’s the primary mode of information exchange in these markets. It at least tells us a story about everyone who uses that mode of apartment seeking.
City Street Network Orientation is a fairly simple information visualization–there’s no theory under this, there’s nothing we’re going to do differently in city planning because of it. It is the first piece of a larger project focused on urban planning histories of different places. In particular, what I’m doing now is looking at two sets of study sites. The first is every city in the US, and the second is every census tract in the US, pulling the street network for each of them. First, I look at this measure of orientation entropy. So, we look at all the compass bearings of every street inside of the study site, and we just see what direction or directions the street is oriented. We treat every one as a bidirectional street, even if it’s not.
For example, if the street runs north and south, then its compass bearings are 0° and 180°. We put them together into a vector of data, and we bin it. Instead of doing a regular histogram, we use a polar histogram, so it’s circular rather than linear, and it just shows us what direction the streets are pointing. The magnitude of each bar is how many streets fall into that bin of compass bearings. You can quickly see that if a street network is gridlike, you’ll typically have four bins that contain most of the streets. In a place like Boston, where there is much greater entropy, there’s much more uncertainty. You don’t have a one in four chance of guessing a street’s direction as you might in a place like New York, which is much more gridlike. What this does is give us an indicator of how ordered the street network is: does the entire study site follow a single set of two directionalities, north-south and east-west? We can match that up with a couple of other indicators: the proportion of four-way intersections, which is another indicator of griddedness, and the ratio of the actual street distance between two points to the “as the crow flies” distance between them, which is an indicator of circuity.
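As a rough illustration of the orientation entropy idea described above, here is a minimal Python sketch using only the standard library. The function names, the choice of 36 bins, and the half-bin shift that centers a bin on north are assumptions made for this example; this is not OSMnx's own implementation.

```python
import math


def bidirectional_bearings(bearings_deg):
    """Treat every street as two-way: keep each bearing and add its reverse."""
    out = []
    for b in bearings_deg:
        b = b % 360.0
        out.append(b)
        out.append((b + 180.0) % 360.0)
    return out


def bin_bearings(bearings_deg, num_bins=36):
    """Count bearings in equal-width compass bins (the bars of a polar histogram).

    Bins are shifted by half a bin width so that bin 0 is centered on north;
    bearings such as 359 and 1 degrees then land in the same bin.
    """
    width = 360.0 / num_bins
    counts = [0] * num_bins
    for b in bearings_deg:
        idx = int(((b + width / 2.0) % 360.0) // width)
        counts[idx] += 1
    return counts


def orientation_entropy(counts):
    """Shannon entropy (natural log) of the share of streets in each bin."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


# Hypothetical input: a perfect grid with half its streets running
# north-south and half east-west.
grid_counts = bin_bearings(bidirectional_bearings([0, 90] * 10))
print(orientation_entropy(grid_counts))  # ln(4), about 1.386
```

With four equally filled bins the entropy is ln 4, which corresponds to the "one in four chance" of guessing a street's direction. A network spread evenly over all 36 bins would reach the maximum, ln 36.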
If we take these three indicators, orientation entropy, circuity, and four-way intersections, and mash them together into an equally weighted index, we can start to create an index of griddedness. We can then compare different places in a clear and theoretically sound way. We can see places like the Midwest and Great Plains ranked very high on the grid index because of their history of settlement during the Homestead Acts era of quickly planning out land according to an orthogonal grid. You see places like the Piedmont region in the upper South that are far less gridlike, both for historical reasons and especially for topographical reasons. I’m now looking at this longitudinally, by the primary era in which each city was developed. We have statistically significant trend lines for pretty much every indicator showing that even when we control for topography, there are still statistically significant effects of design in different eras, favoring grids in some and cul-de-sac loops and lollipops in the 1990s or 2000s suburbs.
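To make the idea of an equally weighted index concrete, here is a minimal Python sketch. The interview does not spell out how the three indicators are normalized or combined, so everything here is an assumption for illustration: entropy is rescaled between a perfect four-direction grid (ln 4) and a uniform spread over 36 bins (ln 36), circuity is inverted into a straightness score, and the three components are averaged arithmetically. The function names are hypothetical.

```python
import math


def circuity(network_distance, straight_line_distance):
    """Ratio of distance along the streets to the 'as the crow flies' distance."""
    return network_distance / straight_line_distance


def grid_index(entropy, avg_circuity, four_way_share, num_bins=36):
    """Equally weighted index in [0, 1]; higher means more gridlike.

    Assumed normalization (not taken from the source):
      - order: 1 at the entropy of a perfect grid, 0 at maximum entropy
      - straightness: 1 / circuity, so perfectly straight streets score 1
      - four_way_share: proportion of intersections that are four-way
    """
    h_max = math.log(num_bins)   # bearings spread evenly over all bins
    h_grid = math.log(4)         # bearings concentrated in four bins
    order = 1.0 - (entropy - h_grid) / (h_max - h_grid)
    order = min(1.0, max(0.0, order))
    straightness = min(1.0, 1.0 / avg_circuity)
    return (order + straightness + four_way_share) / 3.0


# Hypothetical values for a fairly gridded place.
print(grid_index(entropy=1.6, avg_circuity=1.05, four_way_share=0.7))
```

An arithmetic mean lets a high score on one component offset a low score on another; a geometric mean would penalize a place that scores poorly on any single component. Which is preferable depends on how the index is meant to be read.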
I’m not an expert on the planning histories of Boston or Charlotte, but there are a lot of different reasons cities can have higher or lower entropy in their street networks. Some cities’ original settlement patterns did not necessarily follow an orthogonal grid, whereas many Spanish-settled cities, which followed the “Laws of the Indies,” are somewhat more gridded. Terrain can also break it up, with hills and bodies of water. We also see differences due to the auto-oriented settlement patterns of the second half of the 20th century. In the full study of orientation entropy I look at 100 world cities. Typically, US cities have much lower orientation entropy. Charlotte, however, has the most disordered street network of any city in the study, more so than Rome, Sao Paulo, or Venice.
Grids also do not necessarily tend to persist over long periods of time; cities like Rome, for example, had more of a grid 2,000 years ago, and it’s mostly gone now. No one ever just cleared the city and rebuilt it, but slowly over time, people chipped away at the corner of an intersection, or punched through a block diagonally. The city reorients itself around different foci of interest as the centuries go by. The grid isn’t necessarily permanent, even if some of those patterns are still there.
It depends. It can, but usually if you control for other things, it wouldn’t be as important. For example, a place with high entropy could be Charlotte, or it could be Venice. They provide very different levels of accessibility to their residents. If we just look at entropy, we’re not controlling for the grain of the network–how long the blocks are, how wide the streets are, land use. So it kind of obfuscates a lot of other things. But if we start controlling for them, we can get a better sense of what kind of accessibility patterns it creates. And if we add in other things like a grid index, we can at least make claims about the style of street network patterns when they were laid out, especially when controlling for size.
The current project I’m preparing for publication is primarily about sampling biases in these platforms. I’m looking at where there are more rental listings on Craigslist than we would expect to see if listings were apportioned to each tract according to how many vacant units it has. So, are there places where there are more listings per vacant unit than other places? There are, and so I’m interested in the different traits of those places. We can’t really say it’s a causal model, but we can ask what is associated with greater Craigslist representation in the model. We can see that whiter, wealthier, better educated, higher income communities tend to be more represented, whereas black and Hispanic communities in particular tend to be underrepresented on Craigslist. There are different ways we can theorize this: there could be language barriers, cultural barriers, or differences in access to technology. Historically, real estate agents would engage in steering different racial and ethnic groups toward different neighborhoods, providing different amounts of information to white versus black housing seekers. The concern that this work is raising is that the internet promises some sort of democratization of information, but if the internet has a sort of information channel segregation built in for social, structural reasons, it can be a modern self-organized form of steering that perpetuates residential sorting patterns.
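The comparison described above, actual listings versus what we would expect if listings were apportioned by vacant units, can be sketched in a few lines of Python. The function name and the tract figures are hypothetical and purely illustrative; this is not the model used in the study.

```python
def representation_ratios(listings, vacant_units):
    """Ratio of actual to expected listings per tract.

    Expected listings assume the total number of listings is apportioned to
    tracts in proportion to each tract's share of all vacant units.
    A ratio above 1 means the tract is overrepresented; below 1, underrepresented.
    """
    total_listings = sum(listings.values())
    total_vacant = sum(vacant_units.values())
    ratios = {}
    for tract, vacant in vacant_units.items():
        expected = total_listings * vacant / total_vacant
        actual = listings.get(tract, 0)
        # A tract with no vacant units has no expected listings to compare against.
        ratios[tract] = actual / expected if expected > 0 else float("nan")
    return ratios


# Made-up example: two tracts with the same number of vacant units
# but different numbers of online listings.
listings = {"tract_A": 60, "tract_B": 40}
vacant_units = {"tract_A": 50, "tract_B": 50}
print(representation_ratios(listings, vacant_units))
```

In this made-up case each tract would be expected to have 50 listings, so tract_A is overrepresented and tract_B underrepresented. Ratios like these could then be related to tract demographics, though as noted above that would describe associations rather than causes.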
In another publication I just sent out for review, I ask: if I were an urban planner four years ago, and I wanted to know what was happening particularly in the Bay Area, which was undergoing very rapid housing price changes, what would I have known if I were looking at ACS data, or AHS data, or Census data? What stories would those have told me year over year, or survey release after survey release? Versus, what would I have known if I were looking on these technology platforms, scraping listings, looking at median rents in different communities? Our argument there is that Craigslist can be a biased but leading indicator for housing price change. If ACS data is lagged by a couple of years (and even then, at the tract level it’s a five-year rolling estimate), that weakens it. With listings, you can see what’s happening, at least in the online portion of the market, which is now the primary mode of information exchange, more so than word of mouth, paper signs, etc. It tells us something at least. And we need to caveat those biases, and be cautious about having policy responses just to a market segment that is clearly biased toward whiter, wealthier communities.
For this paper at least, one of my main goals is just to raise questions. I want to raise questions in two areas. First, are there information inequalities online that create different search costs for different communities? So, are we reproducing traditional housing search inequalities for the poor online now? And related to that, a second hypothesis coming out of this is, what if underrepresentation on Craigslist is actually really good for these communities? What if it shields them from gentrification and displacement by wealthier people, who use Craigslist more and so can’t do neighborhood discovery and colonization there?
The second point I’m raising is that planners are increasingly turning to Craigslist. And while Craigslist is super useful for that, we’re not looking at a representative sample. We’re not looking at all market segments, and any policy response based on what we’re seeing on Craigslist will be biased toward whiter, wealthier, better educated communities. So we need to be cautious.
A good example of this is a book Maria Krysan and Kyle Crowder published last year, in which they suggested that broadening people’s information supplies could be one of the most important strategies for helping to end residential sorting and residential segregation–if we can start showing people units in other kinds of sociodemographic communities that might meet a lot of their criteria and interests, that sort of neighborhood discovery could be a good thing in that we start integrating communities better. However, this research suggests that it might not be possible to follow that policy prescription if landlords aren’t even listing units in those kinds of communities online. At least in the direction of more privileged classes filtering into less privileged neighborhoods, there isn’t really a good information supply pipeline. We could go the other way, showing people in these kinds of neighborhoods listings from very different communities, assuming there is some affordable tier, so that they can filter upwards. But, based on the sampling biases, it looks like it’s a one-directional thing.