My dad flies his own airplane, a Cessna 182 Skylane in which I have occasionally traveled with him. I've always been impressed by how safe a pilot he is -- he plans conscientiously, he doesn't cut corners, and he doesn't get in over his head. Each time he takes off, he performs a full preflight checklist and if there's anything that doesn't check out, you will not go flying with him today.
I was thinking about my dad's checklist recently in a different context. You see, one of my roles around the Weiss lab for the last year or so has been to help "expedite" manuscripts. Ron is an excellent science communicator, but his time is limited and once he becomes involved in preparing a manuscript for submission, he rapidly becomes the rate-limiting step. The more I can help my colleagues get things in order before he gets involved, the faster we can go from draft to submission.
I've been involved in this role for three manuscripts thus far, and it has been amazing to me how careful, otherwise detail-oriented scientists who have been reading papers for many years miss some of the most basic things about science communication. Not subtleties about narrative structure or debatable questions about the work's larger import -- I'm talking about "did you label this axis" kinds of missing pieces.
So, in the spirit of my dad's checklist, I present a preflight checklist for manuscripts. Are your seatbelts buckled?
Wow. SEED came and went in the blink of an eye -- what a whirlwind! The following are some only lightly organized reflections on what went well, what didn't, and what might change next year.
For me, the core of synthetic biology is the ability to solve real-world problems by creating new organisms. So that's what I wanted the students to do:
- Choose a real-world problem they were interested in that could be addressed with a new organism
- Describe that organism's new behavior and how it would address the problem
- Plan a plasmid or two that implements the new behavior
- Build those plasmids out of reusable genetic parts (BioBricks)
- Test their plasmids
- Communicate their results
...over the course of 8 Saturdays together. The difficulty, as I outlined in the previous post, is that there's an enormous amount of domain knowledge and operational skill involved in this. Did it work? Read on!
I left the choice of an interesting problem up to the students. They each came up with a problem, then wrote it on the whiteboards around the classroom so they could think about their classmates' proposed problems. Then, they self-assorted into five groups around five different problems.
What worked: The students were all really stoked to be working on projects of their own choosing. This was reflected again and again in the feedback we got from the students.
What didn't work: While I had asked the students to develop their own ideas based on a list of parts from the Parts Registry that could be useful in solving them, I'm not sure they understood quite what I was asking for. Thus, we ended up with 5 cool problems to work on -- only one of which had the pieces we'd need available to us. Due to an incomplete understanding of the relevant biology, we also ended up with several projects that were biologically impossible.
What could change: I am still thinking about how to balance the authenticity and engagement that you get from allowing the students to choose their own project, with the predictability and improved success rate you get from constraining their choices. I think one way to go about it would be to choose a project area for them to work on, one that had good parts support and a number of "obvious" good projects, then allow the students some choice within the project space.
I did some reading and came up with a set of related questions that the students could feasibly build plasmids to answer. But how to give them experience actually doing the plasmid design? I ended up hacking together some "datasheets" for the parts and backbones that would actually be useful, then putting them in folders for each of the projects. This gave the students a chance to practice thinking about what each part did, how it related to the other parts, and how they would be assembled into transcriptional units.
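To give a flavor of what the datasheet exercise was getting at, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Part` class, the `check_unit` helper, and the part names are hypothetical and are not Registry identifiers or any tool we used in class. It only encodes the usual promoter, RBS, coding sequence, terminator layout of a bacterial transcriptional unit.

```python
from dataclasses import dataclass

# Assumed ordering of part types in a simple bacterial transcriptional unit.
UNIT_ORDER = ("promoter", "rbs", "cds", "terminator")


@dataclass(frozen=True)
class Part:
    name: str  # illustrative label only, NOT a real Registry ID
    kind: str  # expected to be one of UNIT_ORDER
    note: str  # one-line "datasheet" summary


def check_unit(parts):
    """Return a list of problems with a proposed transcriptional unit."""
    problems = []
    kinds = [p.kind for p in parts]
    # Every part type should appear at least once.
    for kind in UNIT_ORDER:
        if kind not in kinds:
            problems.append(f"missing {kind}")
    # The part types that are present should appear in the canonical order.
    present = [k for k in kinds if k in UNIT_ORDER]
    if present != sorted(present, key=UNIT_ORDER.index):
        problems.append("parts are out of order")
    return problems


# Hypothetical design, made up for this example.
design = [
    Part("example_promoter", "promoter", "always on"),
    Part("example_rbs", "rbs", "medium strength"),
    Part("example_reporter", "cds", "fluorescent reporter"),
    Part("example_terminator", "terminator", "stops transcription"),
]

print(check_unit(design))                   # []
print(check_unit(design[:1] + design[2:]))  # ['missing rbs']
```

A check like this obviously doesn't capture the biology (whether the promoter responds to the right signal, for instance), but it mirrors the questions the students were asked to answer from the datasheets: what is each part, and where does it go?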
What worked: There was a lot of engagement with this activity -- by far the best non-lab activity we did. I think we lost some of the students, but there were also a lot of them who really got it by the end, which was really rewarding.
What didn't work: This was also a significant amount of work on my end -- there was a lot of content to create for what was essentially a one-off lesson.
What could change: I think that constraining the projects (as above) could ameliorate this substantially. You could give each group the same set of parts and have them come up with different designs.
We completed a single BioBricks build cycle:
- Digest the parts
- Ligate the digested parts
- Transform the ligations
All this was pretty straightforward on their end.
What worked: They loved (loved!) the lab work. The more pipetting, the better -- miniprep day was the best, followed by the day they did transformations.
What didn't work: On my end, it required preparing a pretty massive number of parts from the Registry distribution plates. I also designed and synthesized a number of parts (usually promoters for which repressors were available in the Registry but for which there were no promoters on the plates) -- with all of the attendant issues for a cloning cycle. In the end, each group tried to build 2 or 3 plasmids -- but I think each group only managed to get one to go together.
Finally, there was at least one group that was quite large, and I was really sad to read in the feedback that at least one student felt like they were stuck watching everyone else pipette but didn't have a chance to themselves. No bueno.
What could change: More constrained projects mean fewer parts to synthesize (or none at all??), higher success rates, and the possibility for the instructors to build things behind-the-scenes. Also, I'd love to think about ways to spend more time in the lab. We were about 50-50 this year, and even though I think the classroom stuff was important, it was clearly less engaging. (One alternative: make the classroom stuff more engaging?)
Experimental Design and Characterization
I asked the students to design their own experiments to test the plasmids. This worked out okay -- but again, we ran into the trouble of not enough biology background. We also ran into issues with plasmids they had designed but hadn't managed to build (and that I couldn't get to go together in the intervening week.) At the end of the day, I pretty much had to propose experiments that were related to the problems they wanted to solve.
The actual experiments were pretty straightforward -- most were growth-rate measurements under different conditions. (For example: some of the students wanted to work on lead bioremediation for drinking water. They had a plasmid that constitutively expressed a lead-binding protein. The experiment: do the engineered E. coli grow in media containing heavy metals better than an un-engineered strain?)
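For the curious, a growth-rate comparison like this usually boils down to fitting a line to the log of the optical density during exponential growth. The sketch below is one way to do that with nothing but the standard library. The function names are my own, and the numbers are synthetic curves generated from assumed rates, not the students' measurements.

```python
import math


def growth_rate(times_h, ods):
    """Least-squares slope of ln(OD) against time, in 1/hour."""
    logs = [math.log(od) for od in ods]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx


def doubling_time_h(rate):
    """Doubling time in hours for an exponential growth rate in 1/hour."""
    return math.log(2) / rate


# SYNTHETIC data: perfect exponential curves built from assumed rates,
# standing in for OD600 readings taken every half hour.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
engineered = [0.05 * math.exp(0.9 * t) for t in times]
control = [0.05 * math.exp(0.4 * t) for t in times]

for label, ods in (("engineered", engineered), ("control", control)):
    rate = growth_rate(times, ods)
    print(f"{label}: rate {rate:.2f}/h, doubling time {doubling_time_h(rate):.2f} h")
```

The fit only means something while the culture is actually growing exponentially, so with real data you would first trim off the lag and stationary phases.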
What worked: For the most part, the students had experiments that could directly test the functionality of at least one of the plasmids they had built. And connecting that experiment back to their system and their problem was good practice.
What didn't work: Sometimes that connection wasn't super-solid. A lot of the disconnect came from not having managed to build all the plasmids that they proposed to make. Changes to improve the success rate would make this more straightforward.
What could change: One of the hardest connections to make in the plasmid design phase is with the experiments you want to do to test them. I'd like to think of some way to de-couple these a little bit, which could make both the design and the subsequent experiments more straightforward.
The last day, the students are expected to give presentations, posters or demonstrations to their parents. This year I asked each group to give a 10-minute talk. I demonstrated what I wanted using an example project we had been discussing all semester. We spent most of the last session together (before the wrap-up session) doing talk planning, then actually working on the slides.
What worked: They all gave talks. They all spoke pretty fluently about the projects they were working on and why they were interesting and what their approach was. The best talks took a deep dive into the human practice implications of their project and did a solid job with data interpretation.
What didn't work: We spent a long time talking about science communication, and even so I thought the structure of the talks was pretty universally shaky.
What could change: I'd love an opportunity for them to communicate more about their science -- learning happens via practice and feedback, and I'd love them to have more opportunities for both. The question is ... when?
There are clearly things that could change next year. I think it's also really clear that the students got a lot out of SEED this last semester. There were at least two students who told me they wanted to continue to study bioengineering in college, which I count as a success. (-: And maybe some of them will get involved in iGEM, which seems like an obvious extension of this....
Python tools for quantitative, reproducible flow cytometry analysis
Welcome to a different style of flow cytometry analysis. For a quick demo, check out an example IPython notebook.
What's wrong with other packages?
Packages such as FACSDiva and FlowJo are focused primarily on identifying and counting subpopulations of cells in a multi-channel flow cytometry experiment. While this is important for many different applications, it reflects flow cytometry's origins in separating mixtures of cells based on differential staining of their cell surface markers.
Cytometers can also be used to measure internal cell state, frequently as reported by fluorescent proteins such as GFP. In this context, they function in a manner similar to a high-powered plate-reader: instead of reporting the sum fluorescence of a population of cells, the cytometer shows you the distribution of the cells' fluorescence. Thinking in terms of distributions, and how those distributions change as you vary an experimental variable, is something existing packages don't handle gracefully.
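A toy illustration of why the distribution matters, using made-up numbers rather than real cytometry data: the two populations below would look identical to an instrument that only reports a summed or averaged signal, yet one is uniform and the other is a half-and-half mix of dim and bright cells. The `summarize` helper is hypothetical and is not part of CytoFlow.

```python
import statistics

# SYNTHETIC per-cell fluorescence values (arbitrary units), for illustration only.
uniform_population = [100.0] * 10
mixed_population = [10.0] * 5 + [190.0] * 5


def summarize(cells):
    """Bulk-style summary (mean) next to a measure of spread (population stdev)."""
    return {
        "mean": statistics.mean(cells),
        "stdev": statistics.pstdev(cells),
    }


print(summarize(uniform_population))  # {'mean': 100.0, 'stdev': 0.0}
print(summarize(mixed_population))    # {'mean': 100.0, 'stdev': 90.0}
```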
What's different about CytoFlow?
A few things.
An emphasis on metadata. CytoFlow assumes that you are measuring fluorescence on several samples that were treated differently: for example, they were collected at different times or treated with varying levels of inducers. You specify the conditions for each sample up front, then use those conditions to facet the analysis.
Cytometry analysis conceptualized as a workflow. Raw cytometry data is usually not terribly useful: you may gate out cellular debris and aggregates (using FSC and SSC channels), then compensate for channel bleed-through, and finally select only transfected cells before actually looking at the parameters you're interested in experimentally. CytoFlow implements a workflow paradigm, where operations are applied sequentially; a workflow can be saved and re-used, or shared with your coworkers.
Easy to use. Sane defaults; good documentation; focused on doing one thing and doing it well.
Good visualization. I don't know about you, but I'm getting really tired of FACSDiva plots.
Versatile. Built on Python, with a well-defined library of operations and visualizations that are well separated from the user interface. Need an analysis that CytoFlow doesn't have? Export your workflow to an IPython notebook and use any Python module you want to complete your analysis. Data is stored in a pandas.DataFrame, which is rapidly becoming the standard for Python data management (and will make R users feel right at home.)
Extensible. Adding a new analysis module is simple; the interface to implement is only four functions.
Statistically sound. Ready access to useful data-driven tools for analysis, such as fitting 2-dimensional Gaussians for automated gating and mixture modeling.
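To make the metadata and workflow ideas above a bit more concrete, here is a small sketch in plain Python. To be clear, this is not CytoFlow's API: the function names, the `dox` condition, and the event values are all made up for illustration, and a simple background subtraction stands in for real compensation. It only shows the shape of the idea, with operations applied in sequence and the result faceted by an experimental condition.

```python
from collections import defaultdict
from statistics import median

# SYNTHETIC events: each one carries its sample's condition (metadata)
# alongside the measured channels.
events = [
    {"dox": 0.0, "FSC": 500, "GFP": 20},
    {"dox": 0.0, "FSC": 30, "GFP": 5},    # debris: low forward scatter
    {"dox": 0.0, "FSC": 520, "GFP": 30},
    {"dox": 1.0, "FSC": 510, "GFP": 200},
    {"dox": 1.0, "FSC": 480, "GFP": 240},
    {"dox": 1.0, "FSC": 25, "GFP": 8},    # debris
]


def gate_debris(evts, min_fsc=100):
    """Keep only events whose forward scatter is at least min_fsc."""
    return [e for e in evts if e["FSC"] >= min_fsc]


def subtract_background(evts, background=10):
    """Subtract a constant background from the GFP channel."""
    return [{**e, "GFP": e["GFP"] - background} for e in evts]


def run_workflow(evts, steps):
    """Apply each operation in order, feeding its output to the next."""
    for step in steps:
        evts = step(evts)
    return evts


def facet_median(evts, condition, channel):
    """Median of one channel, computed separately per condition level."""
    groups = defaultdict(list)
    for e in evts:
        groups[e[condition]].append(e[channel])
    return {level: median(values) for level, values in groups.items()}


workflow = [gate_debris, subtract_background]
result = facet_median(run_workflow(events, workflow), "dox", "GFP")
print(result)  # {0.0: 15.0, 1.0: 210.0}
```

Because the workflow is just an ordered list of operations, the same list can be re-run on a new set of samples or handed to a coworker, which is the point of treating the analysis as a workflow in the first place.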
Sound like your kind of thing? Join us.
I deeply appreciate good design in data visualization, and this jumped out of my news queue today.
I'm not going to comment on the content, except to say that for the most part I align myself with "AAAS scientists" -- no surprise, right? But imagine, for a moment, this data presented as a bar graph: "public" in red and "science" in blue. Doesn't this do a much better job conveying both "magnitude" and "difference"?
All living organisms face the same problem: their DNA is much longer than their cells. If you took the DNA from a single human cell and stretched it all out end-to-end, it would be about 2 meters long (roughly a meter per copy of the genome, and most of your cells carry two copies)! Not only do the cells have to fit all that DNA in there, they have to be able to access it - to transcribe it, to copy it, etc.
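The length estimate is simple arithmetic, sketched below. The constants are approximate, assumed round figures: about 0.34 nm of length per base pair of B-form DNA, and a bit over 3 billion base pairs per copy of the human genome. One copy comes to about a meter, and a typical (diploid) cell carries two copies.

```python
RISE_PER_BP_NM = 0.34      # approximate length per base pair of B-form DNA
HAPLOID_GENOME_BP = 3.1e9  # approximate size of one copy of the human genome


def stretched_length_m(base_pairs):
    """End-to-end length in meters of a DNA molecule with this many base pairs."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


one_copy = stretched_length_m(HAPLOID_GENOME_BP)
two_copies = stretched_length_m(2 * HAPLOID_GENOME_BP)
print(f"one genome copy: about {one_copy:.1f} m")
print(f"diploid cell:    about {two_copies:.1f} m")
```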
Prokaryotes and eukaryotes solve these problems in different ways (as you might expect: remember, one of the ways prokaryotes and eukaryotes are different is that prokaryotic cells don't have a nucleus.) Prokaryotes solve the problem by supercoiling their DNA: imagine taking a piece of rope, pinning down one end and then twisting the other. Eventually the rope starts wrapping around itself; and as you continue to add twists, the wrapping gets tighter and the end-to-end length gets shorter. Prokaryotes have a set of enzymes that supercoil DNA to pack it tightly, and another set that selectively uncoils it when it needs to be accessed or copied. Many of these proteins are present only in prokaryotes and not eukaryotes, which makes them a good target for antibiotics.
Eukaryotes solve the problem differently, wrapping their DNA around octameric protein cores made of histones into a 10 nm-wide fiber that, close up, looks like "beads on a string."
These chromatin fibers are further squeezed together into higher-order structures, the sum of which is called chromatin: the gooey mass of DNA and proteins that together hold each cell's genetic information intact. Far from being random, these higher-order structures form something akin to a fractal globule, a self-organizing structure that achieves tight packing without becoming knotted. Oh, and it's quite visually striking too:
Two things to note. First, the fact that the DNA reproducibly self-organizes at this level explains the phenomenon of DNA transregulatory elements, where a spot on the genome regulates gene expression at loci many millions of bases away: just because they're distant in linear "genome" space, doesn't mean that they're far away in actual space.
Second, genome architecture provides another layer of regulation for gene control. Some parts of the DNA hairball are open, accessible for transcription (these genes are "on"), and some parts of the DNA hairball are closed, compacted, inaccessible (these genes are "off"). What I find particularly wacky, and what got me thinking about this in the first place, is that these structural changes seem directly related to cell type. That is, the DNA in a skin cell and a liver cell may have exactly the same sequence, the same genetic "program", but because the DNA is arranged differently, different parts of the program are "running."
And yes, this means that if I could take a skin cell and change the parts of the DNA that are on and off, I might be able to make it into a liver cell, or a brain cell, or a heart cell. This is one of the hottest areas of regenerative medicine research right now. Soon, if you get hepatitis and need a new liver, you won't have to wait for someone to die and take theirs -- you'll donate some skin cells (or some fat cells) and three months later you'll have a new liver (well, some liver-like tissue) waiting for you in a jar.
This is also (one of) the reason(s) why biomedical science didn't end when the human genome was sequenced. (Not that it's finished, even a decade after it was declared finished.) Not only do we still not know what all that DNA does; there are several layers of regulation that determine whether a piece of genome is active or not, and sorting out all those relationships will provide graduate projects for a long time yet.
I'm attending the last day of the Keystone Symposium on Precision Genome Engineering and Synthetic Biology. The afternoons are free, and the skiing is kind of weak, so when I need a break from TALENs and Cas9 (so much Cas9), I'm learning Python.
What's particularly interesting is the community that's trying to position Python as the next big thing in scientific computing; the successor to R, MATLAB, Mathematica, etc. I used to think of Python as a "programming language" like C or Java or Perl, where you wrote a program to do what you want, then ran it on your data. (And there are plenty of resources to support using it that way; PyDev comes to mind.) I knew from my first brush with it 15 years ago (!!) that it had a REPL interface: you can bring up a Python "command line" and type expressions in, and the interpreter will evaluate them for you and give you the answer. I didn't really think much of it; I figured it was useful for noodling around, learning the language, debugging, etc.
Boy was I wrong.
IPython is a Python shell with proper support for interactive computing, like R or MATLAB. It extends "traditional" Python with support for parallel and distributed computing, tight integration with several visualization toolkits, and a browser-based notebook that lets you record your data analysis workflow along with the results, and then share the whole thing trivially with coworkers and collaborators. It makes literate programming absolutely effortless.
The other part of the equation is a set of libraries for data handling and analysis. SciPy and SAGE are two "meta" libraries, bundling together a lot of mature software for importing, manipulating and analyzing data; building and running models; doing computational experiments, etc. I was particularly happy to discover pandas, a library for handling structured data similar to data frames in R. The toolkit isn't quite as developed as R or MATLAB, but it's growing as companies embrace the open source ethos of using Python tools for their own work, improving those tools and then contributing their improvements back to the community. The adoption seems to be particularly strong in the academic community; it even saw a spot on Nature.com recently.
Which brings me to reproducible research. Philip Bourne is one of my science idols; he was the founding editor-in-chief of PLoS Computational Biology and the originator of the "Ten Simple Rules" series (if you are a researcher in any field and you haven't browsed these, you should!). He has long been an advocate of reproducible research, but especially in computer science and computational biology it can be difficult to document exactly the steps you took to generate your data or do your analysis. The last time I heard him speak on the subject, he was advocating standard directory layouts to organize data and using GNU Make to automate the running of tools, programs and scripts. Clunky and time-consuming to say the least.
An IPython notebook completely obviates that. It lets you record exactly what you did (the Python code) along with the rationale (in beautiful rich-text) and the output, all stored in one place. It makes publishing your work so that others can reproduce it trivial, but the importance goes way beyond that. I've learned the hard way that keeping a good notebook isn't for some speculative person who picks up my work when I'm gone, it's for me-in-six-months. Keeping track of where I've been mentally, and what I've tried that didn't work (or occasionally did), is astoundingly important ... and anything that can make that easier is something that I'll adopt enthusiastically.
So, now I'm a Python enthusiast. Not looking forward to scaling the learning curve, but the underlying language makes a lot more sense to me than, say, R (which I've been using for a decade and still don't feel particularly comfortable in). If only I could get easy integration between IPython and my Drupal-based online notebook...
Postscript - I know that Mathematica has had a notebook interface for decades. IPython's strikes me as more flexible, better looking, based on open standards, and you can get it without paying a zillion dollars. (-:
In response to Maria's latest post:
In their study "Effects of High Dementor Density on Health Outcomes, Including Soul Loss, in Graduate Students", Sundaram et al. propose the intriguing hypothesis that dementor colonization may be responsible for the apathy and despair commonly associated with graduate studies. They measure both dementor-related environmental factors and health outcomes among a population of public health graduate students; observing a strong correlation between the two, the authors conclude that evidence exists for a causal relationship.
Sundaram et al. have identified a timely, important problem that inexplicably has not been addressed by other researchers in the field. This reviewer laments his own shortsightedness in this regard; I read the books, what, ten years ago? Despite a limited sample size, questionable ethical standards and shoddy statistical analyses, the study's results are highly suggestive and deserve further investigation. It is unfortunate that the authors stopped short of an interventional study, given that cleaning the fucking microwave takes like five minutes, I mean really. I also would have liked to see some consideration given to other possible causes for student soullessness, including professors that ask for five data slides for their talk and then don't use any of them; coworkers that use the last of the molecular weight standard and then don't order any more; and mice that escape their cages, then get killed in mousetraps because the animal facility has a rodent problem.
Recommendation: accept with revisions.
My first post. More coming soon...