From Zero to App Store in 90 Days

I hadn’t planned on learning mobile development during quarantine, but Paul Hudson was kind enough to hold a “Stay at Home” sale for his Swift courses. (Swift is the de facto language for iOS development.) The prices were low enough to convince me to take a stab at a personal goal of mine: building a mobile app using a native framework. While I’m aware of cross-platform alternatives using languages I’m already familiar with, I wanted to learn something new and not rely on my existing web development skills as a crutch. As a self-taught programmer, I often suffer from imposter syndrome, and this was an opportunity to prove my competence.

I purchased Hacking With iOS and started my journey on April 3rd. The first four weeks were spent working through the book (one to three hours each day). By the time I reached Chapter 20 (around week three), I became more discriminating about what I needed to read and work on to accomplish what I wanted to do. I think every chapter is invaluable in its own time, but in my experience, reading technical guides from cover to cover is rarely the most efficient approach.

Once I felt comfortable with the basics of UIKit, I came up with an app that would not demand much more than the skills I had already acquired. In hindsight, the end result, Interview Notes, provided functionality that parallels the CRUD apps (create, read, update, delete) web developers are often tasked with. That experience gave me some useful context when things got difficult, because I knew how much I had struggled when I started web development.

However, the struggles experienced with app development are quite different from those experienced with web development. Web applications often require a good amount of dependency and environment management in addition to understanding front- and back-end development. For example, a framework like Django might require a Python library that relies on a package that was removed from the latest Linux distribution. Figuring out how all the pieces work together takes a lot of time and energy for beginning developers.

Developing for iOS is different: all of the functionality is there, but the documentation and community are much weaker. It reminds me of the (false) claim that humans only use 10% of their brains. What good are all of Apple’s libraries if I don’t know how to use them, or whether they even exist? As for the known unknowns, I could often find discussion threads for issues I was experiencing, but answers were either missing or outdated. Apple’s own documentation sometimes links to missing pages. I get the sense that rapid advances in the iOS ecosystem have led to breaking changes and paradigm shifts. That churn is what keeps people using their phones, but it’s also an unpleasant reality for native developers.

One specific example is Core Data. Apple’s documentation, when it exists, is gibberish to beginners. After a week of stalled development, I purchased Core Data by Tutorials. Thankfully my use case was simple, and I was back on track after two chapters and another week of effort. Another example was Nib files (which are actually .xib files); I still don’t really understand how they’re loaded and managed at runtime. Lastly, code completion stopped working within my project at one point. I tried every bit of advice I could find online, but I eventually had to start a new project and copy my code over. The real bummer was re-implementing my IBOutlets, IBActions, and Core Data classes. As someone whose editor of choice is Vim, I don’t understand why I’m forced to use the UI to implement some features. It’s extra painful being forced to redo that work because Xcode didn’t behave as it should.

The last few weeks of development were mostly spent streamlining the user experience. I was scared of submitting my app to the App Store due to Apple’s reputation for being strict, but I thought the process was relatively simple and fair (provide a support channel, a privacy policy, screenshots, etc.). It also makes sense that this experience would be somewhat pleasant, since Apple requires that hopeful developers pay about $100 annually for the privilege of submitting apps for review. Mine was accepted on the first try, which was pretty exciting for me and a major confidence boost. It appeared in the App Store on July 3rd, exactly three months after I started learning Swift.

Where to from here? Well, it’s worth saying, I get the appeal of using React Native. I almost switched midway through, but my dedication to my personal challenge just barely won out over my frustration with Apple’s developer experience. (Frustration that I’m also sure would be reduced if I had a mentor or worked on a team of mobile developers.) At the same time, I think Apple’s GameKit framework is actually a pretty well-polished framework for game development, and that’s something I do want to try. So I’ll keep at it for now, and hopefully I’ll have something else to share for my one-year anniversary with iOS development.

Rules of Thumb for Estimating Timelines

Estimating project timelines is a skill everyone should practice. Part science, part art, part domain expertise, producing accurate timelines is essential for determining the opportunity costs associated with any endeavor. From a career perspective, this can be the difference between generating profitable and unprofitable work. As Patrick McKenzie writes in his frequently shared article “Don’t Call Yourself A Programmer, And Other Career Advice”: “You really want to be attached to Profit Centers because it will bring you higher wages, more respect, and greater opportunities for everything of value to you.” If you accept Mr. McKenzie’s premise, letting your project go into the red due to bad time estimates would be counterproductive to your career aspirations.

Timeline estimation is a common task that is ripe for comedic interpretation, as seen in this panel from Scott Adams’ Dilbert.

Given the importance of accurate time estimates for managers and managed alike, I’ve written down a few of the lessons I keep in mind when performing an estimate for my own projects. This won’t be comprehensive, or informed by anything besides personal experience and general reading. Your mileage may vary.

  1. Every six-hour task takes eight hours.
    If the task is self-contained and takes less than a day, round the estimate up to a full day. First, most workers aren’t going to be productive for eight hours straight. Second, small tasks are psychologically different from multi-day tasks. If the worker knows they can finish the task by the end of the day, they are more likely to prioritize other activities that delay completion: addressing outstanding issues, taking a breather, reconnecting with coworkers, etc.
  2. Expect that you forgot something.
    No one has perfect foresight. Someone is going to forget about, or otherwise not anticipate, a project dependency somewhere in the planning phase. Mitigation depends on bringing in multiple people/teams, identifying the core requirements, and planning backwards from the final deliverable to the kick-off.
  3. Plan for curve balls.
    Some unexpected event is going to happen at some point. Data will be lost, an employee will quit, the client will disappear for weeks. The longer the project, the more curve balls will be thrown. Budget for mistakes or setbacks. Not doing so is the same as saying that everything will go 100% according to plan.
  4. Know the personalities of the people providing input.
    One of my former managers said that he doubled the estimates he was given and then added an order of magnitude. In his experience managing overly confident twenty-somethings, a “one week” job often took two months to fully complete. Nevertheless, his rule of thumb was circumstantial. Estimates from Type-A twenty-somethings should be weighted differently than estimates from fifty-something career veterans.
  5. Calibrate your estimates.
    Perform an estimate for work that has been done by yourself or your organization, and then compare that estimate to the actual time spent. If that option isn’t available, ask for additional estimates from coworkers that you trust (ideally more than you trust yourself). In either case, explain the delta and learn from it.
  6. Start with what you know.
    Some tasks are going to be more familiar than others and will therefore be easier to estimate. Tackle those first, and use those estimates to anchor subsequent estimates. If you have experience coding a website, but no experience promoting a website, you might assume the former takes twice as much time as the latter and derive a ballpark estimate from that assumption.
  7. Review the data.
    Once you have the data, go back and review your estimates. Learn from what you did right and what you did wrong. Edge cases to check for: Were your right estimates lucky guesses? Were your wrong estimates reasonable? Learning the wrong lessons from the results is detrimental. Learning the right lessons makes you even more successful at what you do.
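Rules 5 and 7 together amount to a feedback loop that is easy to automate. A minimal sketch (the task names and numbers below are illustrative, not real project data):

```python
# Compare past estimates to actuals and derive a personal correction
# factor to apply to future estimates. Numbers are illustrative.
history = [  # (task, estimated_days, actual_days)
    ("login page", 2, 3),
    ("report export", 5, 9),
    ("API integration", 3, 4),
]
ratios = [actual / estimate for _, estimate, actual in history]
correction = sum(ratios) / len(ratios)  # mean overrun factor, ~1.5 here
```

Reviewing the per-task ratios (rule 7) matters as much as the aggregate: a single lucky guess can hide systematic overruns elsewhere.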


How many times do you forget a stranger?

Reddit user katyvs1 created an AskReddit thread with the following question: What’s a random statistic about yourself you’d love to know, but never will? I was excited to see that many of the responses were bona fide Fermi questions, and one response in particular caught my imagination:

How many times have I walked past someone that I’ve walked past before without realising? — u/colecr

While we can’t give that user a relevant answer without knowing more about their life, it seems to me that we can come up with an estimate for the average American city dweller (from here on out referred to as Mike). The first step is to break the problem down into simpler questions.

  1. How many people does Mike walk past every day?
  2. How many people does Mike pass on more than one occasion?
  3. How many people can Mike remember in total?

How many people does Mike walk past every day?

Let’s set a fairly realistic scene. Mike lives and works within a bustling neighborhood situated in a large city. We’ll call this neighborhood Fermitown. It’s roughly one square mile and holds about 20,000 residents.¹ Rather conveniently for this analysis, Fermitown is perfectly square, spanning twelve blocks in each dimension.² In total, Fermitown has 144 blocks and 576 sidewalks.

That’s more than enough sidewalk for Mike, who only walks about two miles a day on average.³ We’ll ignore street crossings and say that Mike walks 24 blocks each day, or 4.2% of the sidewalks. Like most city dwellers, Mike walks a relatively fast five feet per second. That means he spends 2,112 seconds walking, or 2.4% of the day. Let’s keep it simple and say that for any given second of the day, he has a 0.1% chance of being on a specific sidewalk.

Remember that Mike is the everyman of his neighborhood. If we ignore the likelihood of popular paths or commute times, we can use his statistic to estimate twenty residents on any sidewalk at any given time. From personal experience, that seems like a reasonable guess.⁴ So if we assume that half of those pedestrians are going in the opposite direction from Mike, we can say that he passes ten residents per block.

Answer: Mike walks past 240 residents each day.
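The chain of assumptions above is short enough to check in a few lines of code (every figure below is one of the article’s assumptions, not a measurement):

```python
residents = 20_000
sidewalks = 576
blocks_walked = 24
sidewalk_share = blocks_walked / sidewalks          # ~4.2% of sidewalks
seconds_walking = 2 * 5280 / 5                      # two miles at 5 ft/s = 2,112 s
p_specific_sidewalk = 0.001                         # rounded per-second chance from the text
on_each_sidewalk = residents * p_specific_sidewalk  # ~20 pedestrians per sidewalk
passed_per_block = on_each_sidewalk / 2             # half are headed the other way
passed_per_day = passed_per_block * blocks_walked   # 240 residents per day
```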

How many people does Mike pass more than once?

According to this model, Mike walks past 240 residents, or 1.2% of Fermitown’s population, every day. We can use that to approximate a 1.2% chance of seeing a specific resident on the sidewalk on any given day. After one year, Mike will have seen about 93% of the population more than once.⁵ After two years, assuming residents move every nine years, Mike will have seen about 21,900 faces at least twice, and a quarter of those faces at least ten times.

Answer: After two years Mike passes around 21,900 residents at least twice.
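Treating each day’s encounter with a given resident as an independent coin flip, the one-year figure can also be checked analytically: “more than once” means subtracting both the zero-encounter and the exactly-one-encounter terms of a binomial distribution. My own check, using the article’s 1.2%-per-day assumption:

```python
def p_at_least_twice(p_daily, days):
    """P(2 or more encounters) when encounters are binomial(days, p_daily)."""
    p0 = (1 - p_daily) ** days                         # never passed
    p1 = days * p_daily * (1 - p_daily) ** (days - 1)  # passed exactly once
    return 1 - p0 - p1

one_year = p_at_least_twice(0.012, 365)  # ≈ 0.93, matching the text
```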

How many people can Mike remember?

This is the hardest part of the question. The best statistic I can find is a recent study that pegs average recall at about 1,000 faces and average recognition at about 5,000 faces (although at least one participant could recognize over 10,000 faces). The problem is that these numbers don’t tell us Mike’s actual facial recognition capacity. We can, however, assume that his capacity is partly used up by friends, family, significant figures, and celebrities. We also have to remember that we’re concerned with Mike’s ability to recognize faces he passes on the street; in reality, I suspect multiple significant encounters are needed before a face is committed to memory. Since we’re most likely overestimating the likelihood of Mike committing a face to memory and underestimating his ability to recognize faces in general, I’ll leave the estimate at 5,000.

Answer: 5,000 faces.

Putting it all together

After two years of living in Fermitown, Mike has unknowingly passed by at least 16,900 people on more than one occasion.⁶ Assuming facial recognition capacity is limited to 5,000 faces, he would continue passing the rest of his neighborhood’s citizens multiple times without being any the wiser. In reality, I think Mike might perform much better than we’ve estimated. But even if he recognized 50% of the people he passed, that would still leave thousands unrecognized.

Let’s continue this example by moving Mike to the suburbs. Population density is much lower, so we’ll say he lives in a town of 50,000 people and only walks past 15 strangers each day. Assuming he lives there for nine years, he’ll have seen nearly 10,500 residents at least twice. It took a little longer for multiple encounters to build up, but they eventually grew to twice our limit of 5,000 faces.
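The multi-year figures come from a simulation (linked below); a minimal reconstruction of that kind of Monte Carlo might look like the sketch here. Note that this version ignores resident turnover, so it lands somewhat under the 21,900 quoted for Fermitown:

```python
import random
from collections import Counter

def repeat_passes(population, passes_per_day, days, seed=1):
    """Count residents passed at least twice, assuming each day's passes
    are a uniform random sample of the population (no turnover)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(days):
        counts.update(rng.sample(range(population), passes_per_day))
    return sum(1 for c in counts.values() if c >= 2)

# two years in Fermitown: 20,000 residents, 240 passes a day
fermitown = repeat_passes(20_000, 240, 2 * 365)
```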

Final estimate: The average American will have unknowingly passed by thousands of strangers more than once. You can see my simulation’s code on GitHub. Let me know what I did wrong in the comments. (:


  1. Using San Francisco as a guide, the average population per square mile is a little over 18,000. However, most neighborhoods have higher population densities.
  2. I’m assuming a block length of 350 feet and street widths of 60 feet, using San Francisco as a guide once again.
  3. Americans don’t seem to walk much at all.
  4. It also implies that over half of the neighborhood’s residents are walking at any given time, which sounds less reasonable.
  5. I originally tried the naive equation 1-.988³⁶⁵, but it didn’t match my (most likely more accurate) simulation. In hindsight, that formula gives the chance of at least one encounter; “more than once” also requires subtracting the probability of exactly one encounter.
  6. Remember this number only includes residents.

Using Neural Networks to Identify Hate Speech

Last year, researchers from Cornell published a paper describing their work classifying tweets as hateful, offensive, or neither. They experimented with a handful of models and parameters before settling on logistic regression. In addition to the effort spent building and evaluating models, they also extracted a handful of features for their model beyond the text itself, including the readability, sentiment, and metadata of each tweet. This was a lot of work, but the results were promising.

I wanted a similar classifier for my own purposes, but I didn’t have the patience necessary to extract the same data. Instead, I decided to leverage a technique that would require no pre-processing on my end: a convolutional neural network. Using this method, words are represented by vectors (thanks to pre-trained word embeddings), and features are selected by the neural network itself.

I implemented the CNN architecture suggested by Yoon Kim, deviating only in the word embeddings, the number of filters generated at the convolutional layer, and the error metric. The resulting model performed as well as, if not better than, the model used by the Cornell researchers when comparing overall F1 scores on the same data (0.91). This is gratifying, because I didn’t optimize my model’s parameters, let alone extract any features.
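Stripped to its essentials, Kim’s architecture slides convolution filters of several widths over the sentence’s word-vector matrix and max-pools each filter into a single feature; those pooled features then feed a softmax classifier. Here is a toy NumPy sketch of the feature-extraction step (illustrative dimensions, not the hyperparameters I actually used):

```python
import numpy as np

def conv_max_pool(embeddings, filters):
    """Apply each filter across the word-vector matrix (ReLU activation),
    then keep only the maximum activation per filter."""
    n_words, _ = embeddings.shape
    feats = []
    for w in filters:  # w has shape (width, emb_dim)
        k = len(w)
        acts = [max(0.0, float(np.sum(embeddings[i:i + k] * w)))
                for i in range(n_words - k + 1)]
        feats.append(max(acts))
    return np.array(feats)  # one feature per filter

rng = np.random.default_rng(0)
sentence = rng.normal(size=(10, 8))  # 10 words, toy 8-dim embeddings
filters = [rng.normal(size=(k, 8)) for k in (3, 4, 5)]
features = conv_max_pool(sentence, filters)
```

Varying the filter widths lets the network pick up phrases of different lengths without any hand-built features.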

However, at the classification level, my model didn’t do as good a job at differentiating between hate speech and offensive speech. This is an important distinction to make, as it may be the difference between what is legal and illegal in some localities. Contradictions within the source data may be responsible for this confusion. “Tired of hoes man” is labeled as offensive, while “some lying ass hoe lol” is labeled as hate speech. Although both are offensive, I can’t say that one is more hateful than the other. This problem arises for a multitude of racial and gender-based slurs.

And although the previous examples show that annotators were not working with clearly defined categories, problems extend beyond that. Even a tweet stating that the “Lakers are trash right now” was labeled as hate speech. If I don’t know what the annotators were thinking, I can’t expect my model to.

So what about future work? Right now, I have a reasonably accurate (90%) classifier of offensive tweets that I can run against individual accounts or topics. Alternatively, I could try to create a better dataset. This seems hard, because people have different opinions about what hate speech actually is, but I do think that a good first step in that direction would be building a classifier that identifies genocidal or violent messages. This is both a clearer definition of hate and a more immediate concern in an era of mass shootings and populist leanings.

My code is available here.

An Introduction to the Storefront Index

Joe Cortright and Dillon Mahmoudi introduced the ‘Storefront Index’ in a 2016 report available here. The metric is a simple one, counting the total number of businesses within a city that meet several conditions (publicly accessible, densely located, and close to the city center). Despite its simplicity, this metric identifies the vibrancy of major metropolitan centers and indirectly measures other features such as walkability, safety, and economic health — all of which contribute to the quality of life of local citizens.

Although Cortright and Mahmoudi only calculated scores for the nation’s fifty largest metropolitan areas, they provided enough detail for replicating their method for smaller towns and cities. To demonstrate the utility of this metric, I will apply it to Oxnard, CA: a coastal city of over 200,000 people (and this author’s hometown).

Methodology

Cortright and Mahmoudi acquired their data through a third party, but Oxnard’s data portal was all I needed, although some additional preprocessing steps were required. I had to convert the Business Data dataset’s 11,000 business licenses into a list of storefronts. Doing this required de-duplication and manual curation of some categories, but the end result was a list of Oxnard’s 1,200 storefronts.

The preprocessed data was loaded into QGIS as a delimited text layer. The latitude and longitude columns were assigned their respective point coordinates — in this case, WGS 84 (GPS). These coordinates were then re-projected into a local coordinate system (EPSG:26745) for more accurate distance measurements. (Note that points can be plotted against XYZ tiles as a sanity check.)

Once the points were mapped, QGIS’ distance matrix tool was used to calculate the distances between each location’s nearest neighbor, so that storefronts more than 100 meters away from their nearest neighbor could be removed. Next, the distance to nearest hub tool was used to filter out locations more than three miles away from Oxnard’s city hall (manually added as a separate layer).
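For readers who would rather script the two filters than run them in QGIS, the same steps can be sketched with NumPy (my own reconstruction; it assumes coordinates in a projected, meter-based system):

```python
import numpy as np

def storefront_index(points, hub, neighbor_max=100.0, hub_max=4828.0):
    """Reproduce the two QGIS filters: drop storefronts whose nearest
    neighbor is more than 100 m away, then drop any more than roughly
    three miles (~4,828 m) from the hub point."""
    pts = np.asarray(points, dtype=float)
    # pairwise distance matrix (fine for ~1,200 points)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    keep = d.min(axis=1) <= neighbor_max
    keep &= np.linalg.norm(pts - np.asarray(hub, dtype=float), axis=1) <= hub_max
    return pts[keep]

# toy check: three clustered points near the hub, two isolated ones
hub = [0.0, 0.0]
pts = [[10, 10], [50, 20], [80, 60], [9000, 0], [200, 5000]]
qualifying = storefront_index(pts, hub)
```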

Results

After performing the steps described in the methodology section, the original count of 1,200 storefronts was reduced to 963. This final number is Oxnard’s storefront index score. Surprisingly, a score of 963 is competitive with the quantities attributed to larger cities (circa 2014) in Cortright and Mahmoudi’s research, including Austin and Pittsburgh. This number may be deceiving, however, due to the use of different datasets.

Figure 1: filtering out Oxnard’s neighborless and distant storefronts

Plotting the qualifying storefronts (i.e. those that are densely located and close to city hall) provides greater insight into city-wide trends, and certainly more insight than would be visible by simply plotting all storefronts (see Figure 1 for a before-and-after comparison). The resulting visual reveals a predominantly north-south, linear orientation. Significantly, only two regions show clustering, the larger of the two anchored by city hall (the yellow dot) and the lesser distributed around the Ventura Freeway (Highway 101). These results will make sense to locals, as the first cluster represents Oxnard’s historic downtown and business improvement district, and the second mainly represents a new outdoor shopping center known as The Collection.

Neighborhoods with high walkability and convenience scores will most likely be adjacent to or interlaced within these clusters. Getting to these areas, however, is another question. The city hall cluster sits on the road connecting Highway 101 and the Pacific Coast Highway, the arterial roadways of the Central Coast, and The Collection is a freeway exit off the 101 itself. The rest of the city’s storefronts are essentially strip malls, none of which are enough to meet the variety of entertainment and consumption needs of nearby households (see Figure 2). This indicates critical gaps in Oxnard’s neighborhood development, fueled in part by the historical presence of major highways.

Figure 2: Strip malls influence the index.

Discussion

While the Storefront Index is supposed to measure economic strength, it is also supposed to shed light on vibrant communities. In this sense, fast food restaurants and big box retailers don’t contribute as much to a neighborhood’s character as do independent bookstores, pinball arcades, and bars. The Storefront Index would need to acknowledge this difference to better reflect neighborhood quality.

I also propose an additional step that removes the likelihood of sparse or linearly oriented storefronts increasing a city’s score. Starting from the hub point (in Oxnard’s case, City Hall), only a single chain of storefronts linked by distances of less than 100 meters should be included. This would exclude clusters of stores that are marooned on the boundaries and strengthen the index’s relationship to walkability scores.
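The proposed chain rule is just a breadth-first search over the “within 100 meters” graph, seeded at the storefront nearest the hub. A sketch:

```python
from collections import deque
import numpy as np

def chained_from_hub(points, hub, link=100.0):
    """Keep only storefronts reachable from the hub's nearest storefront
    through hops of at most `link` meters (breadth-first search)."""
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    start = int(np.argmin(np.linalg.norm(pts - np.asarray(hub, dtype=float), axis=1)))
    seen, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(dists[i] <= link):
            if int(j) not in seen:
                seen.add(int(j))
                queue.append(int(j))
    return sorted(seen)

# a three-store chain near the hub plus one marooned storefront
chain = chained_from_hub([[0, 0], [50, 0], [120, 0], [1000, 1000]], hub=[0, 0])
```

Note that the third store (120 m from the hub’s storefront) still qualifies because it is within 100 m of the second, which is the point of chaining rather than filtering by raw distance.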

The (Unchanging) Statistics of Deadly Quarrels

An illustration of Richardson’s vision of human computers performing calculations within a forecast factory. NOAA / L. Bengtsson.

Statistics of Deadly Quarrels was written by Lewis Fry Richardson and published in 1960. The book is notable for both its findings and for being one of the first examples of quantitative methods being applied to the realm of international relations. Richardson, a meteorologist by trade, turned his revolutionary and now widely used weather forecasting methods toward the outbreak of interstate conflict, hoping to find predictive variables by analyzing the years between 1809 and 1950. Although Richardson failed in this regard, he made a relatively shocking discovery: that outbreaks of war mirror the occurrence rates of rare events like meteor strikes and earthquakes, or the category of events known as “acts of God”.

The occurrences of these events and others, such as genetic mutations and customer arrivals, can be statistically modeled with Poisson distributions. The basic requirements of a Poisson distribution are that events occur independently of each other and that the rate of occurrence is fixed over the period of time being studied. That the outbreak of war would follow a distribution meeting these assumptions raises interesting mathematical and philosophical questions that have yet to be resolved, simultaneously asserting and rejecting the value of forecasting attempts within this realm.

I first learned of Richardson’s work while reading the June edition of Harper’s Magazine on an airplane. The article’s author, Gary Greenberg, went on to describe Richardson as a visionary who imagined large rooms filled with “computers” (in this case people) who would perform calculations on incoming data in real-time. In a fitting testament to Richardson’s foresight, I just so happened to be en route to D.C., where I’d be spending my summer interning as a quantitative geopolitical analyst. The era of big data had arrived, and the $200 billion industry now reflects the popularization of the belief that any question can be answered with enough observations and computational power.

Out of curiosity, I decided to pick up where Richardson left off and conduct the same analysis on interstate conflicts through the present day. Specifically, I wanted to compare the observed frequency of n outbreaks per year against the frequency expected under a Poisson distribution. Thankfully, the task is much easier today than it would have been 50 years ago. There would be no monotonous paging through encyclopedias or lengthy calculations by hand for me. After a relatively simple Google search, I was able to get the data I needed from the UCDP/PRIO Armed Conflict dataset, which provided me with well-coded observations from 1946 through 2009. (In order to avoid overlap and any resulting bias, I only looked at the years from 1952 onward.) And, 60 lines of code later, here are the results:

wars started in a given year | count (observed) | count (expected) | proportion (observed) | proportion (expected)

To summarize the table, there were 30 years in which no new conflicts started, 21 years in which one conflict started, five years in which two conflicts started, and two years in which three conflicts started. As can be seen by comparing the expected and observed columns, the distribution of actual conflict outbreaks mirrors a Poisson distribution. This was verified at the 95% confidence level using a Yates-corrected chi-square goodness-of-fit test. From the results, it appears that Richardson’s finding remains relevant as we enter the new millennium.
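As a sanity check on the method (my own sketch, not the original 60 lines), the expected column can be recomputed from the observed counts alone; the Poisson rate parameter is simply the average number of outbreaks per year:

```python
import math

# Observed outbreak counts summarized above: 30 years with zero new
# conflicts, 21 with one, 5 with two, 2 with three (58 years total).
observed = {0: 30, 1: 21, 2: 5, 3: 2}
years = sum(observed.values())                         # 58
lam = sum(k * n for k, n in observed.items()) / years  # 37/58 ≈ 0.64

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

expected = {k: years * poisson_pmf(k, lam) for k in observed}
# expected counts come out near 31, 20, 6, and 1 -- close to the observed
```

A chi-square goodness-of-fit test can then be run on the observed and expected columns.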

View my code on GitHub.