Fourth in an occasional series on iGEM at MIT.

Earlier, I wrote a post ostensibly about choosing an iGEM project.  When I went back to look at it, though, it was all full of flowery pedagogy and very little about the brass-tacks practices and issues.

I'm happy to report, then, that the overall structure for the IAP experience I outlined in that article has worked well for the last three years.  Recall that we start meeting as a team during MIT's IAP (January term) -- we meet for two hours an evening, from the Thursday of the first week through the Tuesday of the last week.  So, if we don't meet on MLK Day, there are 13 IAP meetings.

Recall also that the point is not only to help the team choose a project, but also to help them learn while they're doing so -- about iGEM, about synthetic biology, and about research. My strategy has been to explore these topics through the lens of past iGEM projects, both at MIT and elsewhere.  Around the Weiss lab, we generally break a synbio project (in general) and a synthetic gene network (in particular) into three phases -- sensing, processing and actuation. That is, the cells sense either something about the world around them or their internal state; they combine those signals to make some sort of decision or representation; and then they do something as a result of that computation.

To help the team explore these different aspects, I and the other team mentors (students and non-students) meet before IAP to choose some exemplar projects from previous years.  For each of the three topics, we choose three teams whose project embodied that topic done well. We also strive to be diverse -- for example, the sensing projects are generally sensors of some intracellular molecule, some biological extracellular molecule, and something physical (like light.)

We break the iGEM team into three sub-groups, and each sub-group is responsible for presenting one team's work.  The first day, the team mentors work with the students to understand the project and make a (brief! uncomplicated!) presentation about it. The second day, each sub-group presents. This has two goals: first, it helps the entire team learn about all of the projects.  Second, and perhaps equally importantly, these presentations help students get comfortable talking about other people's science.

We also give the students a (brief!) template slide deck to help them think about the project -- help them ask the right questions.  There are four slides:

  1. What was the team's goal?  What problem were they trying to solve?
  2. What was the team's approach?  How did they go about solving their problem?
  3. What were the team's results? Did it work?
  4. What transferable ideas are there? What could you maybe use in your own project?

This also helps students begin to see that there is a traditional structure for talking coherently about science.

We also include a fourth topic for the groups to study: human practices. Here, diversity is particularly important -- teams do all sorts of cool things in outreach, yes, but also in ethics, in IP, in business development, and in other areas.  It's really important to help new iGEMers appreciate the breadth of possibilities so they can choose a human practices aspect about which they are excited.

If exploring other teams' projects takes 8 days, that leaves us 5 days to work on choosing the team's project for this year.  As we go, we remind the students that they should be thinking about problems they might want to work on in the context of a synbio solution, even if they don't know how they might want to approach it.  Then, when we get to day 9, we start with an open whiteboard, and ask people to propose problems they might want to work on.  People can propose multiple problems, or people can decline to pitch one.  After soliciting ideas, each person chooses one problem. A problem can (and frequently does) have multiple people working on it.

Then, we go several rounds of pitching the problems, getting feedback and questions, and refining. Here's where we start thinking concretely about what kind of approach we might take. In general, the approach is written out as what a cell senses, what the cell does in response, and how this solves the problem we set out to address.

This is also an excellent time to think (with the team!) about what makes a good iGEM project.  Note that this is usually concordant with what makes for a good research project, but not always.  In my personal view, a good iGEM project is:

  • Impactful -- i.e., it solves an important problem.
  • Exciting -- i.e., a synbio approach has the potential to be a much better solution than "traditional" approaches.
  • Feasible -- i.e., something our lab has the expertise and resources to support.  In general, this means mammalian synbio, but not always.  (Yeast might also be okay.)
  • Supportive of strong modeling and/or computation.
  • Supportive of strong integrated human practices.

As the project ideas are refined, we insist that the people pitching them get increasingly explicit about how their proposal fits these criteria, and increasingly concrete about the approach they want to take. Then, at the end of IAP, we vote for the first time.

A quick aside about voting -- I have found that formal voting processes work better than informal ones.  That is, we don't vote by holding up hands in our conference room.  Instead, especially because there are (usually) a large number of possibilities relative to the number of voters, in recent years I have run a Borda count using the Opavote website.  (It's free for small polls.) Instead of producing a single "winner", it produces a ranked list, and at this point in the process we generally narrow ourselves down to three top candidates and ask the team to split into subgroups to further refine their ideas.
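For the curious, a Borda count is easy to sketch in a few lines of Python.  (Opavote does the real tallying for us; the ballots and project names below are made up for illustration.)

```python
from collections import defaultdict

def borda_ranking(ballots):
    """Rank candidates by Borda score: with n candidates, a ballot's
    first choice earns n-1 points, the second n-2, and so on."""
    candidates = {c for ballot in ballots for c in ballot}
    n = len(candidates)
    scores = defaultdict(int)
    for ballot in ballots:
        for rank, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - rank
    # Highest total first -- a ranked list, not just a single winner
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ballots: each voter ranks all the candidate projects
ballots = [
    ["biosensor", "therapy", "biofuel"],
    ["therapy", "biosensor", "biofuel"],
    ["biosensor", "biofuel", "therapy"],
]
print(borda_ranking(ballots))  # -> ['biosensor', 'therapy', 'biofuel']
```

Because every position on every ballot contributes points, Borda tends to surface broadly acceptable options rather than polarizing favorites -- which is exactly what you want when narrowing to a short list.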

(Sometimes, everyone wants to work on one project, at which point that is the project we go with.)

At this point the semester begins.  We generally tell the team that we expect them to spend 4 or 5 hours each week on iGEM -- 90 minutes or two hours on a weekly meeting, and 2-3 hours working independently.  Each meeting is just the "update/pitch" part, and we encourage the subgroups not only to work together offline but to consult with the mentors as well.  At this point, the "what problem are we working on and why is it interesting?" part of the conversation is generally pretty solid.  What we continue to work on is the approach -- making it more concrete and more detailed, so we can evaluate feasibility.  I generally give the students a deadline of the end of February, which means they've got 4 or 5 more weeks.  Finally, we revisit the discussion of the things that make for a good iGEM project, then we vote again, again using Opavote, but this time using an Instant Runoff rule.  The team members still rank the projects, but at the end a single choice is made.
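The instant-runoff rule is just as easy to sketch.  (Again, Opavote handles the real election; the ballots here are hypothetical, and this toy version assumes every ballot ranks every candidate.)  Unlike the Borda count, it eliminates weak candidates round by round until a single choice has a majority:

```python
from collections import Counter

def instant_runoff(ballots):
    """Instant runoff: repeatedly drop the candidate with the fewest
    first-choice votes until one candidate holds a majority.
    (Ties for elimination are broken arbitrarily in this sketch.)"""
    candidates = {c for ballot in ballots for c in ballot}
    while True:
        # Each ballot counts for its highest-ranked surviving candidate
        tallies = Counter(
            next(c for c in ballot if c in candidates) for ballot in ballots
        )
        top, votes = tallies.most_common(1)[0]
        if votes * 2 > len(ballots):
            return top
        candidates.discard(min(tallies, key=tallies.get))

# Hypothetical ballots from five voters over three candidate projects
ballots = [
    ["biofuel", "therapy", "biosensor"],
    ["biofuel", "biosensor", "therapy"],
    ["therapy", "biosensor", "biofuel"],
    ["biosensor", "therapy", "biofuel"],
    ["biosensor", "therapy", "biofuel"],
]
print(instant_runoff(ballots))  # -> biosensor
```

Note the difference from the Borda example: "therapy" is eliminated in the first round, and its supporter's second choice decides the election.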

One final note. Students can get really invested in their chosen project, which is a good thing -- we want the students working on things that they are interested in, even passionate about. The flip side is, if that project isn't the one that's chosen, those students can be quite disappointed. I always try to address this head-on -- by this point in the semester, I usually think all three projects are really cool. I like to let that enthusiasm show. And I always promise the team that, if they want to come back after iGEM as a "traditional" UROP, we'll work on their project together. (So far, nobody has taken me up on it.  Some day, maybe?)

Next up -- refining the project.

My dad flies his own airplane, a Cessna 182 Skylane in which I have occasionally traveled with him.  I've always been impressed by how safe a pilot he is -- he plans conscientiously, he doesn't cut corners, and he doesn't get in over his head.  Each time he takes off, he performs a full preflight checklist and if there's anything that doesn't check out, you will not go flying with him today.

I was thinking about my dad's checklist recently in a different context.  You see, one of my roles around the Weiss lab for the last year or so has been to help "expedite"  manuscripts.  Ron is an excellent science communicator, but his time is limited and once he becomes involved in preparing a manuscript for submission, he rapidly becomes the rate-limiting step.  The more I can help my colleagues get things in order before he gets involved, the faster we can go from draft to submission.

I've been involved in this role for three manuscripts thus far, and it has been amazing to me how often careful, otherwise detail-oriented scientists who have been reading papers for many years miss some of the most basic things about science communication.  Not subtleties about narrative structure or debatable questions about the work's larger import -- I'm talking about "did you label this axis" kinds of missing pieces.

So, in the spirit of my dad's checklist, I present a preflight checklist for manuscripts.  Are your seatbelts buckled?


Wow.  SEED came and went in the blink of an eye -- what a whirlwind!  The following are some only lightly organized reflections on what went well, what didn't, and what might change next year.


For me, the core of synthetic biology is the ability to solve real-world problems by creating new organisms.  So that's what I wanted the students to do:

  • Choose a real-world problem they were interested in that could be addressed with a new organism
  • Describe that organism's new behavior and how it would address the problem
  • Plan a plasmid or two that implements the new behavior
  • Build those plasmids out of reusable genetic parts (BioBricks)
  • Test their plasmids
  • Communicate their results

...over the course of 8 Saturdays together.  The difficulty, as I outlined in the previous post, is that there's an enormous amount of domain knowledge and operational skill involved in this.  Did it work? Read on!

Project Selection

I left the choice of an interesting problem up to the students.  They each came up with a problem, then wrote it on the whiteboards around the classroom so they could consider their classmates' proposals.  Then they self-sorted into five groups around five different problems.

What worked: The students were all really stoked to be working on projects of their own choosing. This was reflected again and again in the feedback we got from the students.

What didn't work: While I had asked the students to develop their own ideas based on a list of parts from the Parts Registry that could be useful in solving them, I'm not sure they understood quite what I was asking for.  Thus, we ended up with 5 cool problems to work on -- only one of which had the pieces we'd need available to us.  Due to an incomplete understanding of the relevant biology, we also ended up with several projects that were biologically impossible.

What could change: I am still thinking about how to balance the authenticity and engagement that you get from allowing the students to choose their own project, with the predictability and improved success rate you get from constraining their choices. I think one way to go about it would be to choose a project area for them to work on, one that had good parts support and a number of "obvious" good projects, then allow the students some choice within the project space.

Circuit Design

I did some reading and came up with a set of related questions that the students could feasibly build plasmids to answer.  But how to give them experience actually doing the plasmid design? I ended up hacking together some "datasheets" for the parts and backbones that would actually be useful, then putting them in folders for each of the projects.  This gave the students a chance to practice thinking about what each part did, how it related to the other parts, and how they would be assembled into transcriptional units.

What worked: There was a lot of engagement with this activity -- by far the best non-lab activity we did.  I think we lost some of the students, but a lot of them really got it by the end, which was really rewarding.

What didn't work: This was also a significant amount of work on my end -- there was a lot of content to create for what was essentially a one-off lesson.

What could change: I think that constraining the projects (as above) could ameliorate this substantially.  You could give each group the same set of parts and have them come up with different designs.

Plasmid Construction

We completed a single BioBricks build cycle:

  • Digest the parts
  • Ligate the digested parts
  • Transform the ligations
  • Miniprep

All this was pretty straightforward on their end.

What worked: They loved (loved!) the lab work.  The more pipetting, the better -- miniprep day was the best, followed by the day they did transformations.

What didn't work: On my end, it required preparing a pretty massive number of parts from the Registry distribution plates.  I also designed and synthesized a number of parts (usually promoters for which repressors were available in the Registry but for which there were no promoters on the plates) -- with all of the attendant issues for a cloning cycle.  In the end, each group tried to build 2 or 3 plasmids -- but I think each group only managed to get one to go together.

Finally, there was at least one group that was quite large, and I was really sad to read in the feedback that at least one student felt like they were stuck watching everyone else pipette but never got a chance to pipette themselves.  No bueno.

What could change: More constrained projects mean fewer parts to synthesize (or none at all??), higher success rates, and the possibility for the instructors to build things behind-the-scenes.  Also, I'd love to think about ways to spend more time in the lab.  We were about 50-50 this year, and even though I think the classroom stuff was important, it was clearly less engaging. (One alternative: make the classroom stuff more engaging?)

Experimental Design and Characterization

I asked the students to design their own experiments to test the plasmids.  This worked out okay -- but again, we ran into the trouble of not enough biology background.  We also ran into issues with plasmids they had designed but hadn't managed to build (and that I couldn't get to go together in the intervening week.)  At the end of the day, I pretty much had to propose experiments that were related to the problems they wanted to solve.

The actual experiments were pretty straightforward -- most were growth-rate measurements under different conditions.  (For example: some of the students wanted to work on lead bioremediation for drinking water.  They had a plasmid that constitutively expressed a lead-binding protein.  The experiment: do the engineered E. coli grow in media containing heavy metals better than an un-engineered strain?)
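The analysis behind that kind of experiment is worth a sketch.  Assuming hypothetical hourly OD600 readings (the numbers below are invented), the exponential growth rate is just the slope of ln(OD) versus time during log phase:

```python
import math

def growth_rate(times_h, ods):
    """Least-squares slope of ln(OD) vs. time -- the exponential
    growth rate (per hour) during log phase."""
    logs = [math.log(od) for od in ods]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    den = sum((t - t_mean) ** 2 for t in times_h)
    return num / den

# Hypothetical OD600 readings, hourly, while both strains are in log phase
times = [0, 1, 2, 3, 4]
wild_type  = [0.05, 0.10, 0.20, 0.40, 0.80]   # doubles every hour
engineered = [0.05, 0.08, 0.13, 0.21, 0.34]   # slower under lead stress
print(growth_rate(times, wild_type))    # ~0.693/h, i.e., ln(2)
print(growth_rate(times, engineered))   # smaller: slower growth
```

Comparing the two slopes (rather than eyeballing final ODs) is the clean way to answer "does the engineered strain grow better?"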

What worked: For the most part, the students had experiments that could directly test the functionality of at least one of the plasmids they had built. And connecting that experiment back to their system and their problem was good practice.

What didn't work: Sometimes that connection wasn't super-solid. A lot of the disconnect came from not having managed to build all the plasmids that they proposed to make. Changes to improve the success rate would make this more straightforward.

What could change: One of the hardest connections to make in the plasmid design phase is with the experiments you want to do to test them. I'd like to think of some way to de-couple these a little bit, which could make both the design and the subsequent experiments more straightforward.

Science Communication

On the last day, the students are expected to give presentations, posters, or demonstrations to their parents. This year I asked each group to give a 10-minute talk. I demonstrated what I wanted using an example project we had been discussing all semester. We spent most of the last session together (before the wrap-up session) doing talk planning, then actually working on the slides.

What worked: They all gave talks. They all spoke pretty fluently about the projects they were working on and why they were interesting and what their approach was. The best talks took a deep dive into the human practice implications of their project and did a solid job with data interpretation.

What didn't work: We spent a long time talking about science communication, and even so I thought the structure of the talks was pretty universally shaky.

What could change: I'd love an opportunity for them to communicate more about their science -- learning happens via practice and feedback, and I'd love them to have more opportunities for both.  The question is ... when?


There are clearly things that could change next year. I think it's also really clear that the students got a lot out of SEED this last semester.  There were at least two students who told me they wanted to continue to study bioengineering in college, which I count as a success.  (-:  And maybe some of them will get involved in iGEM, which seems like an obvious extension of this....

Python tools for quantitative, reproducible flow cytometry analysis

Welcome to a different style of flow cytometry analysis. For a quick demo, check out an example IPython notebook.

What's wrong with other packages?

Packages such as FACSDiva and FlowJo focus primarily on identifying and counting subpopulations of cells in a multi-channel flow cytometry experiment. While this is important for many different applications, it reflects flow cytometry's origins in separating mixtures of cells based on differential staining of their cell surface markers.

Cytometers can also be used to measure internal cell state, frequently as reported by fluorescent proteins such as GFP. In this context, they function in a manner similar to a high-powered plate-reader: instead of reporting the sum fluorescence of a population of cells, the cytometer shows you the distribution of the cells' fluorescence. Thinking in terms of distributions, and how those distributions change as you vary an experimental variable, is something existing packages don't handle gracefully.

What's different about CytoFlow?

A few things.

An emphasis on metadata. CytoFlow assumes that you are measuring fluorescence on several samples that were treated differently: perhaps they were collected at different times, or treated with varying levels of an inducer. You specify the conditions for each sample up front, then use those conditions to facet the analysis.

Cytometry analysis conceptualized as a workflow. Raw cytometry data is usually not terribly useful: you may gate out cellular debris and aggregates (using FSC and SSC channels), then compensate for channel bleed-through, and finally select only transfected cells before actually looking at the parameters you're interested in experimentally. CytoFlow implements a workflow paradigm, where operations are applied sequentially; a workflow can be saved and re-used, or shared with your coworkers.
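The workflow idea is easy to illustrate.  This is not CytoFlow's actual API -- just a minimal sketch in plain Python with made-up channel values, where each operation maps a list of events to a filtered list and a workflow is simply the composition of operations:

```python
def threshold_gate(channel, low):
    """Return an operation that keeps only events above `low`
    in the given channel."""
    def op(events):
        return [e for e in events if e[channel] > low]
    return op

def run_workflow(events, operations):
    """Apply each operation in sequence -- the workflow paradigm."""
    for op in operations:
        events = op(events)
    return events

# Hypothetical events: two channels plus an experimental condition
events = [
    {"FSC": 800, "GFP": 5,    "dox_uM": 0.0},
    {"FSC": 120, "GFP": 900,  "dox_uM": 1.0},   # debris: low FSC
    {"FSC": 750, "GFP": 1200, "dox_uM": 1.0},
    {"FSC": 900, "GFP": 15,   "dox_uM": 0.0},
]

workflow = [
    threshold_gate("FSC", 200),   # gate out debris
    threshold_gate("GFP", 100),   # keep transfected cells only
]
kept = run_workflow(events, workflow)
print(len(kept))  # -> 1 event survives both gates
```

Because the workflow is just data (a list of operations), saving, re-running, or sharing it amounts to serializing that list -- which is the point of the paradigm.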

Easy to use. Sane defaults; good documentation; focused on doing one thing and doing it well.

Good visualization. I don't know about you, but I'm getting really tired of FACSDiva plots.

Versatile. Built on Python, with a well-defined library of operations and visualizations that are well separated from the user interface. Need an analysis that CytoFlow doesn't have? Export your workflow to an IPython notebook and use any Python module you want to complete your analysis. Data is stored in a pandas.DataFrame, which is rapidly becoming the standard for Python data management (and will make R users feel right at home.)

Extensible. Adding a new analysis module is simple; the interface to implement is only four functions.

Statistically sound. Ready access to useful data-driven tools for analysis, such as fitting 2-dimensional Gaussians for automated gating and mixture modeling.
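To make the automated-gating idea concrete, here's a minimal sketch (not CytoFlow's implementation, and with invented FSC/SSC values): fit a single 2-D Gaussian to a population, then gate on Mahalanobis distance from its mean.

```python
import math

def fit_gaussian_2d(points):
    """Maximum-likelihood mean and covariance of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)

def in_gate(point, mean, cov, n_sigma=2.0):
    """True if the point lies within n_sigma Mahalanobis units of
    the fitted mean (inverting the 2x2 covariance by hand)."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy * sxy
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2) <= n_sigma

# Hypothetical FSC/SSC values for a tight cell population
cells = [(100, 200), (110, 200), (90, 200), (100, 210), (100, 190)]
mean, cov = fit_gaussian_2d(cells)
print(in_gate((101, 201), mean, cov))   # near the population: True
print(in_gate((300, 50), mean, cov))    # far outside: False
```

A mixture model does the same thing with several Gaussians at once, assigning each event to the component that best explains it -- the data-driven alternative to drawing gates by hand.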

Sound like your kind of thing?  Join us.


I deeply appreciate good design in data visualization, and this jumped out of my news queue today.

Conflicting views: Public versus scientists

I'm not going to comment on the content, except to say that for the most part I align myself with "AAAS scientists" -- no surprise, right?  But imagine, for a moment, this data presented as a bar graph: "public" in red and "science" in blue.  Doesn't this do a much better job conveying both "magnitude" and "difference"?

All living organisms face the same problem: their DNA is much longer than their cells.  If you took the DNA from a single human cell and stretched it all out end-to-end, it would be about 1 meter long!  Not only do the cells have to fit all that DNA in there, they have to be able to access it - to transcribe it, to copy it, etc.

Prokaryotes and eukaryotes solve these problems in different ways (as you might expect: remember, one of the ways prokaryotes and eukaryotes are different is that prokaryotic cells don't have a nucleus.)  Prokaryotes solve the problem by supercoiling their DNA: imagine taking a piece of rope, pinning down one end and then twisting the other.  Eventually the rope starts wrapping around itself; and as you continue to add twists, the wrapping gets tighter and the end-to-end length gets shorter.  Prokaryotes have a set of enzymes that supercoil DNA to pack it tightly, and another set that selectively uncoils it when it needs to be accessed or copied.  Many of these proteins are present only in prokaryotes and not eukaryotes, which makes them a good target for antibiotics.

Eukaryotes solve the problem differently, wrapping their DNA around octameric protein cores made of histones to form a 10 nm-wide fiber that, close up, looks like "beads on a string."

DNA beads on a string.  Image: Figure 31-19, Biochemistry, 6th ed, Stryer

These chromatin fibers are further squeezed together into higher-order structures, the sum of which is called chromatin: the gooey mass of DNA and proteins that together hold each cell's genetic information intact.  Far from being random, these higher-order structures form something akin to a fractal globule, a self-organizing structure that achieves tight packing without becoming knotted.  Oh, and it's quite visually striking too:

Fractal globule genome.
Fractal globule genome. Ashok Cutkosky, Najeeb Tarazi, Erez Lieberman-Aiden, via BioTechniques


Two things to note.  First, the fact that the DNA reproducibly self-organizes at this level explains the phenomenon of long-range regulatory elements, where a spot on the genome regulates gene expression at loci many millions of bases away: just because two sites are distant in linear "genome" space doesn't mean that they're far apart in actual space.

Second, genome architecture provides another layer of regulation for gene control.  Some parts of the DNA hairball are open, accessible for transcription (these genes are "on"), and some parts of the DNA hairball are closed, compacted, inaccessible (these genes are "off").  What I find particularly wacky, and what got me thinking about this in the first place, is that these structural changes seem directly related to cell type.  That is, the DNA in a skin cell and a liver cell may have exactly the same sequence, the same genetic "program", but because the DNA is arranged differently different parts of the program are "running."

And yes, this means that if I could take a skin cell and change the parts of the DNA that are on and off, I might be able to make it into a liver cell, or a brain cell, or a heart cell.  This is one of the hottest areas of regenerative medicine research right now.  Soon, if you get hepatitis and need a new liver, you won't have to wait for someone to die and take theirs -- you'll donate some skin cells (or some fat cells) and three months later you'll have a new liver (well, some liver-like tissue) waiting for you in a jar.

This is also (one of) the reason(s) why biomedical science didn't end when the human genome was sequenced.  (Not that it's finished, even a decade after it was declared finished.)  Not only do we still not know what all that DNA does; there are several layers of regulation that determine whether a piece of genome is active or not, and sorting out all those relationships will provide graduate projects for a long time yet.


I'm attending the last day of the Keystone Symposium on Precision Genome Engineering and Synthetic Biology.  The afternoons are free, and the skiing is kind of weak, so when I need a break from TALENs and Cas9 (so much Cas9), I'm learning Python.

What's particularly interesting is the community that's trying to position Python as the next big thing in scientific computing; the successor to R, MATLAB, Mathematica, etc.  I used to think of Python as a "programming language" like C or Java or Perl, where you wrote a program to do what you want, then ran it on your data.  (And there are plenty of resources to support using it that way; PyDev comes to mind.)  I knew from my first brush with it 15 years ago (!!) that it had a REPL interface: you can bring up a Python "command line" and type expressions in, and the interpreter will evaluate them for you and give you the answer.  I didn't really think much of it; I figured it was useful for noodling around, learning the language, debugging, etc.

Boy was I wrong.

IPython is a Python shell with proper support for interactive computing, like R or MATLAB.  It extends "traditional" Python with support for parallel and distributed computing, tight integration with several visualization toolkits, and a browser-based notebook that lets you record your data analysis workflow along with the results, and then share the whole thing trivially with coworkers and collaborators.  It makes literate programming absolutely effortless.

(I should note that IPython isn't the only player in this space; Spyder and Enthought Canopy are two of the other efforts to make Python well-suited for interactive scientific programming.)

The other part of the equation is a set of libraries for data handling and analysis.  SciPy and SAGE are two "meta" libraries, bundling together a lot of mature software for importing, manipulating and analyzing data; building and running models; doing computational experiments, etc.  I was particularly happy to discover pandas, a library for handling structured data similar to data frames in R.  The toolkit isn't quite as developed as R or MATLAB, but it's growing as companies embrace the open source ethos of using Python tools for their own work, improving those tools and then contributing their improvements back to the community.  The adoption seems to be particularly strong in the academic community; it even saw a spot on Nature.com recently.
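A taste of why pandas makes R users feel at home: a DataFrame behaves much like an R data.frame, with groupby playing the role of aggregate().  (The strains and readings below are invented for the example.)

```python
import pandas as pd

# Tidy data: one row per observation, with conditions as columns
df = pd.DataFrame({
    "strain": ["wt", "wt", "mutant", "mutant"],
    "time_h": [0, 4, 0, 4],
    "od600":  [0.05, 0.60, 0.05, 0.25],
})

# Mean final OD per strain -- split-apply-combine in one line
final_od = df[df.time_h == 4].groupby("strain")["od600"].mean()
print(final_od)
```

The same faceting-by-condition idiom is what CytoFlow leans on under the hood, since its data lives in a pandas.DataFrame.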

Which brings me to reproducible research.  Philip Bourne is one of my science idols; he was the founding editor-in-chief of PLoS Computational Biology and the originator of the "Ten Simple Rules" series (if you are a researcher in any field and you haven't browsed these, you should!).  He has long been an advocate of reproducible research, but especially in computer science and computational biology it can be difficult to document exactly the steps you took to generate your data or do your analysis.  The last time I heard him speak on the subject, he was advocating standard directory layouts to organize data and using GNU Make to automate the running of tools, programs and scripts.  Clunky and time-consuming to say the least.

An IPython notebook completely obviates that.  It lets you record exactly what you did (the Python code) along with the rationale (in beautiful rich-text) and the output, all stored in one place.  It makes publishing your work so that others can reproduce it trivial, but the importance goes way beyond that.  I've learned the hard way that keeping a good notebook isn't for some speculative person who picks up my work when I'm gone, it's for me-in-six-months.  Keeping track of where I've been mentally, and what I've tried that didn't work (or occasionally did), is astoundingly important ... and anything that can make that easier is something that I'll adopt enthusiastically.

So, now I'm a Python enthusiast.  Not looking forward to scaling the learning curve, but the underlying language makes a lot more sense to me than, say, R (which I've been using for a decade and still don't feel particularly comfortable in).  If only I could get easy integration between IPython and my Drupal-based online notebook...

Postscript - I know that Mathematica has had a notebook interface for decades.  IPython's strikes me as more flexible, better looking, based on open standards, and you can get it without paying a zillion dollars.  (-:

In response to Maria's latest post:

In their study "Effects of High Dementor Density on Health Outcomes, Including Soul Loss, in Graduate Students", Sundaram et al. propose the intriguing hypothesis that dementor colonization may be responsible for the apathy and despair commonly associated with graduate studies. They measure both dementor-related environmental factors and health outcomes among a population of public health graduate students; observing a strong correlation between the two, the authors conclude that evidence exists for a causal relationship.

Sundaram et al. have identified a timely, important problem that inexplicably has not been addressed by other researchers in the field. This reviewer laments his own shortsightedness in this regard; I read the books, what, ten years ago? Despite a limited sample size, questionable ethical standards and shoddy statistical analyses, the study's results are highly suggestive and deserve further investigation. It is unfortunate that the authors stopped short of an interventional study, given that cleaning the fucking microwave takes like five minutes, I mean really. I also would have liked to see some consideration given to other possible causes for student soullessness, including professors that ask for five data slides for their talk and then don't use any of them; coworkers that use the last of the molecular weight standard and then don't order any more; and mice that escape their cages, then get killed in mousetraps because the animal facility has a rodent problem.

Recommendation: accept with revisions.