A list of the yearly updates I’ve sent out to close friends and mentors:
This is a yearly update I send out to close friends and mentors. This installment is from March 2016.
This year, I’ve refined my academic and career focus and gotten very interested in using data science to tackle problems in biology. I’ll share stories below from my summer at The New York Times, from my travel to Japan and elsewhere, and from my side project at school that is now being used by half the campus. Finally, I’d like to share links to some of the photos and music from 2015.
I’ve had another great year at Princeton. As a junior majoring in computer science, I’ve nearly completed my degree requirements. Now I have the chance to refocus my remaining class slots.
For the last five years, the theme behind my studies and work has been to apply computer science to other fields. I’ve narrowed my computer science interest to data science in particular. The popular definition of “data scientist” is someone who is better at statistics than any software engineer and better at software engineering than any statistician… so my software engineering experience has been quite helpful in this pursuit! And I’ve moved between several application fields – from neuroscience, to economics, to politics, and now to tackling problems in biology and healthcare using data science methods.
A combination of conversations, classes, and readings sparked my recent interest in biology and healthcare. I took a wonderful seminar taught by Professor Shirley Tilghman, a renowned biologist and the previous president of Princeton, that focused on genetics and public policy. We discussed fascinating topics ranging from eugenics, to the role of genetics in the criminal justice system, to direct-to-consumer testing (e.g. the 23andMe–FDA saga), to gene therapy and genetic editing. At the same time, I was having conversations with computational biologists on campus about their work, and reading books like The Emperor of All Maladies, which presents the history of cancer, cancer research, and cancer treatment in such an interesting way; I highly recommend it.
The current semester is a crash course, accompanied by many conversations from which I am trying to ascertain a zeitgeist of biotech:
(Other classes I took over the last year focused on machine learning and statistical theory, artificial intelligence, compilers, microeconomic theory, and linguistics.)
So far, this interest is still quite broad. But as I continue along this dive into biology and healthcare, I’m tracing the connections to technology, and specifically looking for the problems I can solve with a data science approach. This school year, I’ve been working on a research project with my good friend Andrew and with Professor Barbara Engelhardt, whose work lies in the intersection of computer science and biology. Specifically, we are focusing on improving experimental design for CRISPR experiments.
CRISPR is a new, revolutionary technique that makes gene editing straightforward. To edit DNA with CRISPR, all you have to do is design a guide sequence – made of RNA – that will guide CRISPR to the location in the genome that you want to edit. Basically, the guide RNA will bind with the piece of DNA that you want to edit, and then the system’s Cas9 protein will cut the existing genome at that location. If you also put some new DNA nearby, built-in DNA repair mechanisms will incorporate it to fill the gap. Biology researchers are jumping all over this simple and general gene editing technique. Democratizing gene editing will transform the next 50 years the same way democratizing technology and the Internet have transformed the past few decades. Of course, CRISPR is now the subject of a huge patent war between Stanford, Berkeley, MIT, Harvard, and the Broad…
The challenging part in using CRISPR is designing a guide sequence to match the DNA portion you would like to target but not make changes elsewhere in the genome (where similar sequences might appear, perhaps). Researchers have to meticulously tune their RNA sequence construction until the CRISPR system is able to bind precisely to a piece of the gene to be edited. This problem is very similar to tuning parameter settings in many other experimental fields of science (or, for that matter, in machine learning).
Today, biology labs try to do a “grid search” through the parameter space, meaning they slowly try all possible guide sequences (parameter settings) until they find one that works well. A more intelligent technique, named Bayesian optimization, has been developed for the analogous parameter tuning problem in machine learning. Instead of naively trying all parameters, this technique decides which next experiment to run – which parameter settings to try next – by estimating the expected improvement of a new set of parameters or by trying to gain as much information about the problem as possible through the next experiment. While Bayesian optimization works quite well when you have a single-digit number of parameters, it does not scale. CRISPR guide sequences are 20 bases long, and physical experiments often call for even more parameters. Thus, our goal is to scale this intelligent experimental design algorithm to work for higher-dimensional parameter spaces. Moreover, we’d like to produce something that researchers could use interactively in their lab to accelerate science.
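To make the "expected improvement" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from our research project: it assumes some surrogate model (for example a Gaussian process) has already produced a predicted mean and standard deviation for each untested setting, and the function names, candidate names, and numbers are all hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected amount by which a candidate beats the best result so far.

    Assumes we are maximizing, and that the surrogate model's prediction
    for the candidate is normally distributed with the given mean and std.
    """
    if std <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


def pick_next_experiment(predictions, best_so_far):
    """Return the candidate name with the highest expected improvement.

    predictions maps a candidate name to a (mean, std) pair.
    """
    return max(
        predictions,
        key=lambda name: expected_improvement(*predictions[name], best_so_far),
    )


# Made-up predictions for two hypothetical guide sequences.
predictions = {
    "guide_A": (0.60, 0.01),  # predicted to match the current best, little uncertainty
    "guide_B": (0.50, 0.50),  # predicted slightly worse, but very uncertain
}
print(pick_next_experiment(predictions, best_so_far=0.60))
```

In this toy example the more uncertain candidate wins, because its upside outweighs its lower predicted mean. That trade-off between exploiting known-good settings and exploring uncertain ones is what separates this approach from a grid search.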
I spent summer 2015 working on the data science team at The New York Times, run by Chris Wiggins, a professor at Columbia and a long-term mentor of mine. The team embeds in different business and newsroom divisions to improve the ways in which content reaches readers and to keep The Times in business. My understanding of our job was to be evangelists of data-informed decision making – an interesting and creative task at a 160-year-old company like The Times – and to introduce some more skepticism into the product design process, i.e. to design and run experiments to evaluate hunches and inform business decisions.
I was offered several projects when I joined, but I forged my own path after chatting with many newsroom and business people from across the company. I focused my summer on a particular customer retention challenge as well as on a long-term strategy experiment.
Working with Chris Wiggins and his team was a lot of fun, and being on the data team at NYT was truly a great learning environment, in many ways. I practiced how to reframe business problems as machine learning challenges and how to make the output of statistical analysis interpretable to business leaders – a task that involves careful evangelism of data science in an organization still adapting to the digital world. Chris Wiggins, whose research now focuses on biology, helped me understand how to do this reframing in the biology context as well, which I think will be helpful with my new interest. On a meta level, working cross-functionally / across the organization at NYT and chatting a lot with Chris about his philosophy in team structure and orientation gave me good mental models for how to organize data science teams and taught me about what powers team dynamics.
Besides all that, it was a thrill to spend 10 weeks at The Times. My hope was to immerse myself in the culture and understand the ethos of the people. It’s hard to convey the subtleties I noticed, but there were a few particularly memorable moments:
A colleague and I started a weekly group meeting to individually work through tutorials on technologies we wanted to learn, ranging from Git (which I had used for ages but never sat down and learned fully/correctly) to survival modeling. I’m now starting a similar “doing group” at Princeton to get my hands dirty and learn more about the following:
The highlight of the year was a trip to Japan in August-September. My girlfriend Shannon, who is in the same year at Princeton and studies environmental science and environment studies (a major she designed herself), speaks Japanese, so we were able to visit some fascinating places. After Tokyo and Kyoto, we traveled to some beautiful rural mountain villages like Takayama. I recently took a class on 20th century Japanese history, so it was particularly interesting to see Japanese lifestyles first-hand. We even witnessed an anti-rearmament protest – another cultural subtlety I would have missed without a translator like Shannon!
I also visited friends in Boston and Toronto over school breaks. Most recently, Shannon and I went on a beautiful road trip from the Bay Area to Ashland and Portland, Oregon over Intersession (aka ski week in late January, which exists because Princeton is still in the stone age and has finals after the winter holidays). I’m including pictures at the end!
With a couple of friends, I launched a webapp at Princeton this year that took off. It’s called ReCal, and it helps students pick their courses and design their schedule in a sleek and intuitive way. It so happened that the university decided to pay a vendor an ungodly amount of money to make its own version, called TigerHub, that, frankly, is incredibly confusing and annoying to use. In fact, leading with “Frustrated with TigerHub?” has gotten us to over 3,000 users making schedules in ReCal (and has made some administrators bitter!).
ReCal used to be a class project for Brian Kernighan’s COS 333. Then two of my friends from our class team isolated the class selection component and made that the new ReCal. I rejoined them to finish the project and especially to coordinate the launch. I had wanted more operational experience, so it was fun to handle PR and marketing and focus less on the technical aspects.
Now I’m broadening my involvement, for several reasons: I don’t want to paint student creations in a bad light for the administration, my friends are graduating soon, and I’d like to make ReCal less of a hack and more of a killer app. Some institution needs to be involved for ReCal not to suffer the fate of nearly every other Princeton app and fade after I graduate. As a new side project, I’m playing developer advocate by designing a sustainable, long-term hosting and maintenance solution and brokering it between the undergraduate student government and the computer science department. This will keep ReCal alive and will make it far easier for students to launch apps on campus. Since I’ve been playing with Docker a lot recently, I’m building the system on top of Docker containers. A Docker container is like a virtual machine but with far less overhead, because containers share the host’s kernel – you can run many Docker containers on the same machine and they will share resources nicely. That means we can containerize all student apps and standardize how we monitor, back up, and run maintenance on them all. Long story short, it’s a fun project for me to train my dev chops a bit further, and it will help keep my side project alive.
Finally, earlier in the year I worked on a side project with Professor Sam Wang, who does autism research by day and political forecasting by night. I found his work when I noticed a Twitter war between him and Nate Silver of FiveThirtyEight. We decided to look for ways to detect and quantify gerrymandering that are so simple a judge/the legal system could understand them. Though we did not publish our data analysis work, Prof. Wang has separately published an interesting law journal paper with statistical metrics of gerrymandering that is summarized in NYT pieces (1) and (2). Worth a read!
The excitement of being at Princeton has started to wear off, unfortunately. I’m focusing again on my routines to build a stable lifestyle and continue following my excitement. My Mastermind group with friends Andrew and Fiz is continuing – we meet weekly to help each other think through long-term goals and build habits, and especially to continue living deliberately despite the treadmill that is the Princeton experience. I’m currently doubling down on my fitness, piano, and reading habits.
I made good progress on my goals from 2014. First, after a maximum-entropy search process, I’ve found a new specialty: biology. If that ceases to interest me, I’ll continue following the path of things that seem most exciting, and may take a gap year (since I have two extra years working for me) to read broadly and travel. Second, I’ve continued to go deeper into machine learning and data science through classes, my summer work, and getting my hands dirty on my own. Finally, I’ve had more opportunities to hone my operational skills: it was nice to live on the business side at NYT and to handle product management and PR for ReCal.
I’m setting some new goals for 2016:
I’m so happy to be where I am now, and I’m very excited to see what this next year holds in store. I am very grateful to all my close friends and mentors for their kind support and advice. If you have any feedback, I would appreciate it if you could please send it my way!
Thank you for being a part of this chapter of my life, and all the best.
P.S. I always include some multimedia at the end. Here are some pictures from 2015 (click the info button to see descriptions on individual photos). And here is a curated Spotify playlist with some of the music I’ve been listening to.
Finally, here’s a selection of the books I read over the past year:
Having forgotten the root passwords to several of my Ubuntu virtual machines, I searched for ways to crack or reset the passwords. Here’s my solution.
First, you want to get access to a root shell so you can change passwords and other settings. The easiest way to do this is to boot into Ubuntu recovery mode from the GRUB bootloader screen. Select “root” from the options menu that appears. This drops you into a root shell!
Ubuntu now mounts the filesystem as read-only by default, so execute
mount -o rw,remount / to remount it with read-write permissions.
If you don’t have a recovery mode option in your bootloader menu, boot up from a Linux live CD. Open a terminal window, then execute the following:
    sudo su # Authenticate as root within your live-CD environment
    fdisk -l # List the hard drives available on your system, and identify which one holds your Linux setup. In this example, /dev/sda1 is my Linux partition.
    mkdir /mnt/internalhdd # Create a mounting point
    mount /dev/sda1 /mnt/internalhdd # Mount your Linux partition
    chroot /mnt/internalhdd # Change root into your Linux partition. Now you have root access!
Now that you have a root shell, run
passwd root and
passwd [your-username-here] to reset passwords for your accounts.
If you used the Linux live CD method, here are the commands you should execute to safely unmount your Linux partition:
    exit # Exit to one level up, i.e. from /mnt/internalhdd root to root on your Linux live CD
    umount /mnt/internalhdd # Unmount your Linux partition
    mount # List mounted devices. Confirm that /dev/sda1 is not mounted anywhere else.
    exit # Exit to one level up, i.e. from root on your Linux live CD to the default user account on your Linux live CD
    exit # Exit to one level up, i.e. close the terminal window.
I was accidentally featured on the local nightly news on 05/31/13. A journalist called me up that morning to confirm the story, then arranged a 15-minute interview for the evening.
Here is the article the news bureau released the next morning: http://www.10news.com/news/teen-prodigy-headed-to-princeton-university-unable-to-walk-at-high-school-commencement06012013
Originally published in the January 2013 issue of The Tower, the official newspaper of The Bishop’s School.
A lifetime of creativity, innovation, and passion began with Aaron Swartz’s birth in November 1986. But this was abruptly cut short 26 years later, when US attorneys’ hounding of a bright young man for what was essentially trespassing pushed him to suicide.
The epitome of the 21st century prodigy, Swartz came up at the age of 13 with the idea that Jimmy Wales later developed into Wikipedia. The next year, he coauthored the RSS standard, used today in RSS feeds, real-time documents which present blog content in a standard format for applications like Google Reader to access.
Then Swartz cofounded Reddit, a massive online news community. As an advocate for civil liberties and openness, he started DemandProgress, a group instrumental to the fight against the SOPA and PIPA online censorship bills. And he confronted PACER, the documentation system the United States legal system uses to make court proceedings available online for a fee.
Aiming to make these public domain documents available to all at zero cost, Swartz created a browser extension named RECAP that allowed PACER users to upload accessed documents to the Internet Archive Web site. His creativity in challenging restrictions on public information earned him many powerful enemies among the defenders of the status quo. But most daring of all was his final stunt — downloading 4 million JSTOR documents from an unlocked closet at MIT — and the enemies that action yielded.
To understand how JSTOR works, imagine you are a scientific researcher. You apply for and receive grants from government agencies including the National Science Foundation and the National Institutes of Health, both funded by taxpayer money. Your university collects around half of your grant money as overhead, which is then used to fund the administration, maintain buildings and other infrastructure — and also to purchase subscriptions to electronic publications.
One such digital source of science articles is JSTOR, which makes many publications available online and charges universities around $50,000 a year for access. You receive the remaining funds, do research, and then present the results as a research paper. In order to publish, though, you often have to pay the publisher a fee of several thousand dollars, again coming from that grant money.
Next, the magazine collects this money and asks fellow researchers to review your submission pro bono. If all goes well, you get published, but your university is then told that to see your work, they must purchase a subscription to the magazine in which you just paid to be published.
Such is the predicament of the academic, who is a creator earning little financial benefit. Meanwhile, taxpayers pay for science thrice: first they fund the research, then they pay publishing fees, and finally they buy back the final product from publishers in the form of restricted online subscriptions. But sharing information openly is the foundation of science and of innovation. That value stems from our culture’s early days, when Jefferson claimed, “He who receives an idea from me receives [it] without lessening [me], as he who lights his [candle] at mine receives light without darkening me.” Lighting a second candle means the amount of light has doubled; when research papers, data, and ideas are available to more people, innovating is easier.
Swartz simply wanted to allow open access to these papers, which were written by public employees with public money, so that wide groups of researchers could benefit and the pace of innovation could quicken. He wanted to bring the candle holders together and spread knowledge by doubling the light.
Ideally, our legal system should enable innovation, not prevent the sharing of ideas. JSTOR accepted this reasoning: the company peacefully asked Swartz to return the hard drives containing the downloaded data and modified their system to prevent future bulk downloading. But MIT called in federal investigators and wouldn’t drop the case, even after JSTOR reached its resolution with Swartz. Soon, a grand jury indicted Swartz on 13 felony counts. At the time of his death, he faced 35 years in prison and over $1 million in fines, all for downloading articles. Compare that to armed bank robbery, a graver crime which carries a maximum term of 25 years. Attempting to spread knowledge should not carry a longer prison sentence than armed robbery, nor should it be considered a crime at all.
Were any bankers hounded with threats of long prison sentences to the brink of suicide during the financial crisis? Arguably, their crimes affected far more people than did Swartz’s trespassing in an unlocked closet at MIT. But U.S. attorneys chose not to pursue them with the law; only Swartz faced a de facto witch trial.
The difference is that bankers have good lawyers, and it’s easy to defeat defenseless prodigies. Susie Bright, who has served as expert witness for the defense in several obscenity trials, explains in her blog post entitled “I Have Something to Say about Aaron Swartz’s Suicide and the Special Way the US Justice Dept Hounds People to Death”:
[When] the Defense team showed me all their files… I dropped their papers to the floor halfway through my review: “What are we talking about here? This defendant is developmentally disabled…” The Justice Department was bagging obscenity law trophies by going after the poor, the suicidal, the insane, the cognitively impaired— because that’s the way they rack up numbers and status. That’s the way they fuel their careers at the Justice Department— not by taking on constitutional issues, or injustice, or fat cats who believe they’re above the law.
Here, Swartz was the low-hanging fruit. He had published multiple blog posts about suffering from depression. When the US attorneys hit him with enough felony counts for doing something that shouldn’t be a crime, something snapped. This genius did not just commit suicide; he took his own life after facing unwarranted, excessive legal threats for doing what he believed to be the right thing to do.
That day, we lost a man devoted to his ideals of making information free and open for the world. And once again, our legal system showed how effective it is in heartlessly protecting the status quo against easy targets, killing innovation and progress. Be it in patent law or overzealous pursuits of petty criminals, legal power is ruled by money and fear, not by intellect or morality. Aaron Swartz, among others, had the potential to change this world in meaningful ways – and did, even in a short life, through Wikipedia, RSS, Reddit, DemandProgress, and RECAP – but we took his possibilities, his freedom, and his life away.
Until we reform our justice system to support fairness and openness over brutality and greed, our society will continue picking on those who try to fix it.