July 29, 2011
On Wednesday night I taught a workshop at CUNY Graduate School of Journalism called Intro to Data Journalism with Python. In this class, I tried to teach enough programming to analyze a university registrar's website and find the most popular time slots for classes. The course outline is here on GitHub.
I think the class went pretty well, though some people looked bored at the beginning and some left before the end. I think one issue was that people come into a one-time workshop with widely different levels of prior experience. I haven't done much teaching before, so I learned some things about how I could improve it for next time.
- Better communicate the expected knowledge level coming in. The class description should have more clearly stressed that the class was for complete beginners and described what the starting material would be in more depth.
- Having an assistant (or a smaller class) would have helped to get people set up and using their computers. At the beginning of the class people needed to open up the command line and navigate to the proper directory. Having another instructor would have made this quicker. And there would be someone to float around the room during instruction to help anyone who got lost get back on track.
- In the class, I jumped into the coding part too quickly. In retrospect, people would've learned more and been more engaged if I had gone over more examples of why what I was about to teach is useful. Having specific use cases in mind would have made it easier to understand the coding part.
- I should have been more insistent on getting feedback from people about the pace of the class to make sure that people weren't falling behind.
- Finally, I should have made people type more and listen less. If I had broken up my talking with some simple exercises or incomplete programs and asked people to finish them, I think people would've been more engaged and learned more.
Anyone else have other tips for teaching programming?
July 14, 2011
Social media has quickly become a major source of traffic for news sites. (See the Pew Research study Navigating News Online, published May 2011.) People spend a lot of time on social sites and find a lot of relevant news through them. It seems imperative for news websites to "go where the readers are" and engage with them through social media. All the major social networking players even have special media relations teams to help news brands use their networks to the fullest.
Initial steps into social media usually seem to succeed in a big way, with increased site traffic. And newsrooms have been faulted before for failing to innovate enough and embrace the web. So it seems they should jump into Twitter and Facebook with both feet and join the modern web before it's too late.
Not such a good idea?
By investing effort into Facebook and Twitter, news sites give the social networks more mainstream legitimacy and consequently more new users. And by making news easily available on social networks, they lock users further into those platforms. By all means reporters and editors should be using social networks to find sources and do the business of reporting and spreading a story. But letting any specific social networking site become a major part of a news website's strategy gives up too much control of the reader relationship and could be a dangerous mistake.
If users always interact with a news site through Facebook or Twitter, then that news site is at the mercy of the platform and a small algorithm tweak could easily send all that traffic to a competitor.
The interests of profit-seeking tech companies are at best orthogonal to those of any media company. Depending on their platforms to engage with readers would turn a news organization into a sharecropper, putting in journalistic effort but letting others reap the majority of the rewards in exchange for a pittance of pageviews.
To thrive, news sites need to own their reader relationships with social networking sites playing a secondary role. The user experience should be such that they are not substantively harmed if the social networks were to disappear (or change the rules) the next day.
The key concept is lock-in. Is the news site building user engagement in a way that increases a user's lock-in to the news site more than their lock-in to Facebook and Twitter? If not, then it's probably a mistake.
For an elections news app, it may be smart to use Facebook to provide recommendations to a user based on their friends. But the app's core manner of engaging with the user should be something independent, like the ability to pick candidates or races of interest to follow.
Print publications have long known the value of a loyal, locked-in audience of subscribers. A successful online strategy will be one that focuses on user engagement and making the news site irreplaceable for users. Social media is then just another customer acquisition channel to bring new readers in.
This is a topic I've spent a lot of time thinking about and talking with people about (including one long discussion on a rainy hike in the south Jersey Pine Barrens) but this is the first time I've tried to set the ideas down in a fixed form. I did write about social engagement for news sites from a paid content perspective a year and a half ago.
July 11, 2011
So I'm teaching a class on "data journalism" and Python at CUNY Journalism School. This will be the third time I've done this particular session: first at the CMA College Media Convention in March and then at BCNI Philly in April. The session at CMA went pretty well, but it was much better at BCNI because all the attendees had computers and could follow along, so that's the way I'll be doing it this time.
The code I taught with before is here, if you're curious.
June 15, 2011
One widely adopted kernel of wisdom about news online has become that the vast majority of traffic to a news site is made up of "casual" visitors or "fly-bys" that visit just once or twice a month. I think measurement error might be driving this statistic far higher than reality. I'm reading Matthew Hindman's report for the FCC on local news consumption (summarized and linked to from here) and it again repeats this observation.
My roommate has a habit of clearing his browser's cookies and all private data every time he closes it. Yet he basically visits the same set of news sites every single day. If these sites are using cookies to track his visits, as is the standard way, they are overcounting him by a factor of 30 in their unique visitor numbers. Let's do some rough math to see how much impact this could have on the results of a study of that data.
Let's assume we have a site that has measured 130 unique visitors at an average of 10 pageviews per visitor for the month. In total they've got 1,300 pageviews. If 1% of their visitors browsed like my roommate does, each of them showing up as 30 separate visitors, the site would actually have only about 100 unique visitors, and each person would average 13 pageviews for the month. What if 2% of people did it? Then the actual audience shrinks to about 82 people and the average pageviews per person climbs to roughly 16.
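Here is a rough sketch of that arithmetic in Python. It assumes that every cookie-clearing reader visits daily and so is counted as 30 "unique" visitors a month, and that the percentage is a share of the actual audience. The function name and parameters are made up for illustration.

```python
def actual_audience(measured_uniques, pageviews_per_measured,
                    clearing_share, visits_per_month=30):
    """Back out the real audience when some readers are counted once per visit.

    clearing_share is the fraction of *actual* readers who clear cookies
    after every visit (an assumption of this sketch).
    """
    total_pageviews = measured_uniques * pageviews_per_measured
    # Each cookie-clearing reader shows up as `visits_per_month` visitors;
    # everyone else shows up once.
    inflation = (1 - clearing_share) + clearing_share * visits_per_month
    actual_uniques = measured_uniques / inflation
    return actual_uniques, total_pageviews / actual_uniques


for share in (0.0, 0.01, 0.02):
    uniques, per_person = actual_audience(130, 10, share)
    print(f"{share:.0%} clearing cookies: "
          f"{uniques:.0f} actual visitors, {per_person:.1f} pageviews each")
```

Changing `visits_per_month` shows how sensitive the result is to how often those readers really come back.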
Maybe news visitors aren't so disengaged after all.
February 21, 2011
Beyond the story lines about Murdoch and the Bancroft family, and Marcus Brauchli and Robert Thomson, Sarah Ellison's "War at the Wall Street Journal" has an interesting story line about what had made the Journal unique before the takeover and about a newspaper trying to adapt to the Internet.
About being a "second read" paper
The notion that the Journal could be a second read, famously espoused by the legendary midcentury Journal editor Barney Kilgore, was no more. No one had time to read two publications. And anyway, Murdoch didn't want to be second at anything. As smaller papers around the country faltered, Murdoch wanted to pick off their readers.
-- War at the Wall Street Journal, by Sarah Ellison. page 199
About "Journal 3.0"
[Publisher Gordon] Crovitz decided he would call the new iteration of the newspaper "Journal 3.0." He arrived at the name -- never popular in the Journal's newsroom or executive floor -- by taking particular note of the Journal's lead front-page story the day after Japan attacked Pearl Harbor: "War with Japan Means Industrial Revolution in the United States" read the headline. The story outlined the implications of the attack on the country's economy, industry, and financial markets. For Crovitz, it also marked the end of the first phase of the Journal -- "Journal 1.0," the time between the paper's founding in 1889 and December 5, 1941. During that period, the Journal reported the news like any other outlet. After that headline and under Bernard Kilgore, who became the paper's managing editor the year of the Pearl Harbor attack, the Journal started adding more analysis to its stories and expanded its coverage beyond business and finance. Crovitz defined "Journal 2.0" as starting on December 8, 1941. He planned for it to end on December 31, 2006, when he would usher in the paper's third phase. To compete against the immediacy of the Web, Crovitz wanted the paper, instead of running stories that rehashed what people had learned the day before on their BlackBerrys, to become more analytical. Journal reporters would break news on the Web site and then examine it in the next day's paper.
-- War at the Wall Street Journal, by Sarah Ellison. page 51
About the morning news meeting
Following the Journal's tradition, the editors wouldn't talk about the biggest news of the day. Unlike every other newspaper in every jurisdiction of every country in the world, the Wall Street Journal didn't put news on its front page. The paper relegated the biggest news stories to the inside of the paper, on page A3. Epic features and investigations for Page One were mapped out weeks if not months in advance. Because of this Journal peculiarity, the morning news meeting was not a frenetic debate about the most disastrous or dramatic news events, but rather a mannered recitation of the day's "sked" of stories. In a business of attention-grabbing headlines and color photos, the paper treated its front page like a quiet haven for reflective storytelling. Breaking news was important, and the paper did plenty of it, but the craft of feature writing was the center of the paper's identity.
-- War at the Wall Street Journal, by Sarah Ellison. page 48
About "the pack"
[Murdoch] wanted the Journal to lead the media pack. It was antithetical to the Journal ethos. "Even if you're leading the pack, you're still part of the pack," Peter Kann, the Journal's former CEO, liked to say. "If there's something everyone is talking about, that should be on the front page of the Wall Street Journal," Murdoch told his aides.
-- War at the Wall Street Journal, by Sarah Ellison. page 170
May 20, 2010
Some classmates of mine at Penn recently finished a class on Pricing Strategies in the Marketing Department taught by Professor Z. John Zhang who studies such things and they've written a paper named "From Print to Portal: Pricing Strategies in the Online News Realm."
They've kindly given me permission to post it online and share it so go ahead and check it out here. (PDF Link) They give a history of the topic and discuss what many companies are doing now. In the conclusion they suggest that news sites should adopt hybrid subscription models.
The paper is a good qualitative treatment of the subject and a fresh take from some people not personally invested in the subject. This was a final paper for the class, and from what I know, none of the five team members have ties to or have worked in the industry.
May 19, 2010
I am officially a graduate of the University of Pennsylvania.
This infographic I made for the DP does a fair job of summing it up.
May 4, 2010
A Mixed Bundling Pricing Model for News Websites
Abstract: This paper outlines a method for finding revenue maximizing mixed bundling prices for news websites. This can help better understand paid content strategies for online news content. Drawing on work in the field of bundling information goods, I apply a two-parameter model of consumer preferences to web site traffic data and a roughly estimated willingness-to-pay curve. We can then calculate revenues for different price points and find the optimal one for any given site. This method is applied to a sample of ten sites. At revenue maximizing prices, the majority of paid revenue for these sites comes from the sale of individual articles, rather than subscriptions. Site traffic showing highly loyal consumers is found to correlate with higher subscription prices. This model suggests that while it is possible for overall revenue to be higher with a paid content plan, total traffic will certainly fall.
It can be found online here in PDF form.
I'm mostly happy with the way it turned out, though there were a lot of compromises and broad assumptions needed to bring it to a finished product. There's so much interesting material in this field, I wish I could spend a few more years studying it. I guess that's what graduate school would be, if I ever decide to attend.
Special thanks go out to Aleks Jakulin for supporting and encouraging me in this work.
March 31, 2010
I don't post here nearly as much as I should because I've set a precedent of long posts that take a lot of effort and I don't want to muddy up the stream with little stuff. I know I've also promised posts that I haven't delivered on. They're coming (I hope).
But meanwhile, I will blog in short-form, and a little more personally, at http://albertsun.posterous.com/ to keep things flowing.
January 26, 2010
After the New York Times announced its metered paywall last week there has been a lot of empty blather. Standing out from all the noise are two very good analyses. The first was by Felix Salmon for Reuters, analyzing a consumer's decision of whether or not to pay. The second one was by Jonathan Stray on Nieman Lab, showing the effect of several different variables on revenue.
This stuff is right up my alley, and I'm currently working on a senior thesis in the field and so I'll try to extend Salmon's analysis a little bit. Later on, I'll take on Stray's model as well.
Let's say a reader in a given period reads $N$ articles from the New York Times. Then suppose the New York Times sets the paywall after a consumer has read $n$ articles. In order to read the $(n+1)$th article, the reader must pay a fee of $F$. If $v$ is the value the reader gets from each article, then he will only pay the fee if $(N - n)v > F$. This is a good, simple synopsis of the model.
Article Values are Different
Let $N$ (the number of articles read in the period), $n$ (the number of free articles) and $F$ (the fee) be as before. The first issue that jumps out is that the value of any given article is not constant. The value of articles over a period varies, so let's arrange them in order of value from highest to lowest.
Let $v_1, v_2, \dots, v_N$ be a monotonically decreasing sequence of article values for our reader, with $v_1 \geq v_2 \geq \dots \geq v_N$. Then the reader gets total value $\sum_{i=1}^{N} v_i$ from reading all $N$ articles.
The reader would clearly choose to read the articles he values most first, and after that only pay the subscription if the rest of the articles he has yet to read are still valuable enough. Only if $\sum_{i=n+1}^{N} v_i > F$ will the reader pay the fee.
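As a minimal sketch of this ordered-value rule: the reader spends the free allowance on the articles he values most and pays only if the rest are worth more than the fee. The function name, parameter names, and dollar values below are invented for illustration.

```python
def pays_fee(article_values, free_articles, fee):
    """Return True if the articles left after the free allowance beat the fee."""
    # The reader uses the free allowance on the articles he values most.
    ranked = sorted(article_values, reverse=True)
    remaining_value = sum(ranked[free_articles:])
    return remaining_value > fee


# Made-up per-article values, in dollars, for one reader in one period.
values = [5.0, 0.5, 3.0, 1.0, 2.0, 0.25]

print(pays_fee(values, free_articles=2, fee=3.0))
print(pays_fee(values, free_articles=4, fee=3.0))
```

Raising the free allowance strips the most valuable articles out of what is being sold, so the same fee becomes harder to justify.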
But this is not quite right either. There's no way for a reader to know ahead of time which articles are most valuable to him.
Predicting future value
Now, instead of ordering the values of articles from highest to lowest, let's say that the values of the articles our reader reads are drawn independently from a probability distribution. Let the value of an article be a random variable $V$ with a normal distribution, $V \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ is the average value of an article. $v_1, v_2, \dots$ are the values of the first article read, second article read, etc.
Let the period of time for which the reader pays be represented as $T$, and the moment when the reader has read $n$ free articles and must choose whether or not to pay the fee $F$ be at time $t$. Assume the reader reads articles at some constant rate $r$ throughout the entire period. Then $t = n / r$.
Now the reader must predict what the value of the articles he will read will be to determine whether or not he should pay the fee. Up to point $t$, he has gotten value $\sum_{i=1}^{n} v_i$ and average value per article of $\bar{v} = \frac{1}{n} \sum_{i=1}^{n} v_i$. $\bar{v}$ is also the sample mean of the distribution.
Our reader will choose to pay the fee if $\bar{v} \, r (T - t) > F$. As $r$ goes up, so does $r(T - t)$, and as $n$ goes up, $r(T - t)$ goes down.
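A minimal sketch of this prediction step, under one reading of the rule above: the reader multiplies the average value of the free articles read so far by the number of articles he expects to read in the rest of the period, and pays if that beats the fee. All names and numbers are illustrative assumptions, not values from Salmon's or Stray's analyses.

```python
import random
from statistics import mean


def will_pay(values_so_far, rate, period, fee):
    """Decide at the paywall using the sample mean of articles read so far.

    values_so_far: realized values of the free articles already read
    rate: articles read per unit of time (assumed constant)
    period: length of the paid period, in the same time units
    fee: price of access for the period
    """
    free_articles = len(values_so_far)
    sample_mean = mean(values_so_far)
    decision_time = free_articles / rate          # when the wall is hit
    remaining_articles = rate * (period - decision_time)
    return sample_mean * remaining_articles > fee


# Draw the values of ten free articles from a normal distribution
# (mean 0.25, standard deviation 0.10; made-up numbers).
random.seed(0)
free_reads = [random.gauss(0.25, 0.10) for _ in range(10)]
print(will_pay(free_reads, rate=1.0, period=30, fee=4.0))
```

Because the decision rests on a sample of only a few articles, a lucky or unlucky run of free reads can flip it, which is the variance the next paragraph talks about exploiting.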
There are some interesting suggestions from this. When the New York Times imposes the paywall, it should carefully monitor the rate at which people read its articles. Those who read at a low rate would be ideally suited for targeted discounts. Also, since readers make their predictions based on past articles they've read, the ideal time to convert non-paying readers is right after a reader reads a series of good articles. If the Times can be subtle about dialing $n$, the number of free articles, up and down, then it can exploit variance in article value to increase sales.
This analysis is of course still incomplete. Here are the problems I still see with it.
- Knowing that you'll only get a limited number of articles for free will change a reader's behavior. If they're still uncertain about whether or not paying the fee will be worth it, they will more carefully pick which articles they read before time $t$. This will bias the sample mean $\bar{v}$ upwards, but push the reading rate $r$ downwards. At time $t$, there will also be a backlog of articles that would have been read but weren't, which will influence the decision of whether to pay or not.
- How will the reader decide whether or not to read an article before time $t$? He'll have to depend on the headline and a summary, if available, to make a prediction. Before actually reading the article, the reader will predict some value $\hat{v}_i$ and after reading the article realize some value $v_i$. The average spread between the two will likely affect predictions of future value.
- As is, the model says decreasing $n$ and increasing $F$ to match leaves the reader's decision of whether to buy unchanged. But as $n \to 0$ this becomes a strict paywall, which the gut says people would be less willing to pay for. Another factor in the reader's decision of whether or not to pay is their confidence about their decision. The larger $n$ is, the more confident they will be about their value prediction, since the sample mean's standard deviation will fall as $\sigma / \sqrt{n}$.
- Paywalls, as described by the New York Times and as currently implemented by the Financial Times and WSJ, are easily bypassed. This can be done either by spoofing the referrer header, or by clearing cookies. This avoidance could also be modeled in some way.
- Letting people in for free if they come via social media or links from other sites screws everything up. I think this may turn out to be such a huge gaping hole in the paywall that they severely restrict it, but if they don't there are several ways it can be modeled.
You could divide articles between different distributions of those that are primarily found through social media and those that aren't. The reader would choose whether or not to pay based on the value of those that aren't. Alternately, an article's ability to be found through social media could just affect its value $v_i$.
- Print subscribers get free access as well. In Salmon's post he looks at $F_{\text{print}} - F$, the difference between the print subscriber's fee and the online subscriber's. If this is less than the value of getting the print paper then the reader will choose the print subscription.
- What if users can choose between a short period, and a longer period with a discount? What does the renewal decision look like?
There are undoubtedly more things that can be done with this model. One of the most obvious is to try and figure out what $F$ and $n$ should be set to.
Finding good values for F and n
Since the Times's readers will not all have the same distribution of article values, it would be theoretically ideal to pick values for $F$ and $n$ individually for every reader. Realistically, the New York Times probably shouldn't be that opaque about its pricing, as it would cause confusion and a negative reaction among readers.
If forced to pick a single price, it would be necessary to find the average value of articles for all readers. That's what Stray did with his paywall simulation. However, part of the reason that simulation has such wild swings in revenue from relatively small changes is that many of the variables are dependent on each other. For example, the percentage of people who pay for a subscription does not stay constant when $F$ or $n$ changes.
I'll tackle this issue more in my next post.
Special Bonus! A pricing algorithm for the FT
This part might still be a bit half baked, but working backwards from the consumer's decision, it seems possible to figure out a demand curve for each individual piece of content if enough data is available. Since the Financial Times already has a metered subscription plan, if they've been good about collecting user data they should have what's necessary to do this. Here's an outline of the method.
It requires some change of notation from the above.
Let $a$ be an article, and $u$ be a reader. We will now represent the value of an article to a reader as a mapping $v$, with $v(a, u)$ representing the value of article $a$ to reader $u$. The functions $F(u)$ and $r(u)$ replace $F$ and $r$ as the fee and rate for reader $u$. $T$, the length of the paid period, is as before.
Define the set $A_u$ such that $a \in A_u$ iff $u$ reads $a$ before deciding whether or not to buy. The size of this set, $|A_u|$, takes the place of $n$.
So our former equation becomes $\left( \frac{1}{|A_u|} \sum_{a \in A_u} v(a, u) \right) \left( r(u)\,T - |A_u| \right) > F(u)$.
Rearranging, we get $\frac{1}{|A_u|} \sum_{a \in A_u} v(a, u) > \frac{F(u)}{r(u)\,T - |A_u|}$.
The left side of the above equation is the average value of an article that a reader reads before making the buying decision. So if $u$ does buy a subscription, we then know that the average value was at least the right side.
Now that we have an estimate of a given reader's average value for content, we want to estimate that value across all readers. For any given piece of content, some fixed $a$, to determine its value we sum the average value for content of all readers who read $a$ before purchasing, and then divide by the total number of readers (who aren't already subscribers) who've read $a$.
Write $\hat{v}(u) = \frac{F(u)}{r(u)\,T - |A_u|}$ for the right side above. This function is an estimator of the average value $u$ has for an article.
Now define the set $U_a$ such that $u \in U_a$ iff $u$ reads $a$ before deciding whether or not to buy a subscription. This set is all non-subscribing readers that read article $a$ in the current period, whether or not they've ultimately paid for a subscription by the end of the period.
If we take $\hat{v}(u)$ for each $u$ in the set $U_a$, we have a distribution of estimated values for article $a$. That might look something like this.
Finally, to come up with a set value for a specific piece of content, we sum over the entire set $U_a$ and divide by the number of readers: $\hat{v}(a) = \frac{1}{|U_a|} \sum_{u \in U_a} \hat{v}(u)$.
With this value, you can now derive a demand curve for the entire site. Or you can dynamically set prices based on what articles a reader has viewed before hitting the paywall.
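The outline above could be sketched in Python roughly as follows. This is one possible reading, not a tested pricing method: it assumes each reader's average article value is estimated as the fee divided by the number of articles the reader would still read after hitting the wall, and that an article's value is the mean of those estimates over its pre-paywall readers. The data layout, field names, and numbers are all hypothetical.

```python
from collections import defaultdict


def reader_value_estimate(fee, rate, period, articles_read):
    """Per-article value at which paying the fee just breaks even."""
    remaining = rate * period - articles_read
    if remaining <= 0:
        raise ValueError("reader would read nothing after the paywall")
    return fee / remaining


def article_value_estimates(readers, period):
    """Average the reader-level estimates over everyone who read each article.

    readers maps a reader id to a dict with the hypothetical keys
    'fee', 'rate', and 'articles' (ids read before the buy decision).
    """
    per_article = defaultdict(list)
    for info in readers.values():
        estimate = reader_value_estimate(
            info["fee"], info["rate"], period, len(info["articles"]))
        for article in info["articles"]:
            per_article[article].append(estimate)
    return {a: sum(ests) / len(ests) for a, ests in per_article.items()}


# Made-up log of two non-subscribers over a 30-day period.
readers = {
    "u1": {"fee": 10.0, "rate": 2.0, "articles": ["a", "b"]},
    "u2": {"fee": 10.0, "rate": 1.0, "articles": ["a"]},
}
for article, value in sorted(article_value_estimates(readers, 30).items()):
    print(article, round(value, 3))
```

Note that the reader-level figure is really a bound (a floor for readers who bought, a ceiling for those who didn't), so a fuller version would treat the two groups differently.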
Exciting stuff, if actually implemented.
If you think I've screwed up the math in some way, or if anything isn't clear, please please let me know. The thoughts in this post are still very much a work in progress.