End of Year Report: GSoC 2010

This was our 4th year participating in the Google Summer of Code program and we reached some new milestones. We mentored 10 excellent students (more than any prior year) with a 100% success rate. We integrated and released more code from this summer’s harvest than in prior years. And, most importantly, we’ve continued to expand our development community, including many of this year’s students who are enthusiastic to continue working with us and also a new organization, Reactome, joining the open source efforts of GenMAPP, Cytoscape, PathVisio and WikiPathways. Our projects this year covered a broad range of topics:

  • Alternative Splicing Analysis Plugin for Cytoscape
  • CyAnnotator and CyAnimator Plugins
  • User Interface Development in PathVisio
  • Tools for Exploring Pathway Relations in WikiPathways
  • Expression Data Reader plugin for Cytoscape
  • Improving Cytoscape’s Labels Experience
  • KEGG Global Map Browser
  • Semantic Network Summary for Cytoscape
  • Reactome-WikiPathways Converter
  • Edge-Weighted Layout for Cytoweb

As part of the open source experience, we invite our GSoC students to our annual Cytoscape Retreat. This is a great way to engage students in both our development and user communities. One student pointed out a truism that is rediscovered from time to time in our digital age: “face-to-face meetings turn out to be very efficient.” Here are some other gems of reflection and advice from our students this year:

The most rewarding part was when I was told that I should merge my changes back from my branch into the trunk :)

It has been the chance to meet and interact with wonderful people from various parts of the world, be it virtual or physical. I had a chance to physically meet another graduate student from my university and a professor from USA due to GSoC.

They opened up my perspective about a lot of things — how the industry looks like, where people with similar skill domain as me put themselves in the society, how important the projects I am involved in are, and other subjects unimaginable if I were to not join GSoC.

Got a taste of open-source development which is just amazing and I would like to keep attached with this project even after this GSOC ends.

This program is a great initiative, I loved the amount of exposure the participating students get and it definitely is one of the most exciting summers someone can ever get.

The most rewarding part is to be able to go to the cytoscape retreat. It is absolutely helpful to the project, and helpful to get to know the mentors and others.

Be the best user of the software. If you are the best user, you write and participate [in] the software project spontaneously.

At the beginning of the summer, I really had my doubts on whether or not I had gotten in too far over my head. So I very much enjoy being able to look back at what I was able to accomplish and realize that I was able to supersede my original expectations for myself.

Don’t be afraid to ask questions. Your mentors are an amazing source of information, and they are really interested in helping you in any way possible.

Be cool.

And check out our students’ blogs:

Update on Thesis Project… Seven Years Later

I’m pleased to report that the aims of my 1998-2003 Rockefeller thesis are finally complete! With the solution of the human BK channel C-terminal domain structure, we finally have an accurate, high-resolution model for the activation apparatus, upon which we can map decades of functional data and assess mechanistic models.  Remarkably, the structure shares much in common with the homology model we constructed over 7 years ago based on scant sequence identity, structures from bacterial homologs and supporting electrophysiological evidence. The structure confirms the identity, location and critical features of two RCK domains within the C-terminal sequence, as well as the relative location of the dominant Ca2+ bowl.

I had the unique honor of being invited back to Rod MacKinnon’s lab, seven years later, to contribute to the interpretation and description of this work. It was an incredible week of science being back at Rockefeller, in Rod’s lab, and poring over sequence, structure and function data. The bulk of the credit for this latest piece of the puzzle, of course, goes to the postdocs and grad students who continued the work after me. But it completes a dream of mine to finally see The Structure.

Better later than never!

Software Carpentry

For over a decade, Greg Wilson has been teaching scientists how to use computers effectively through Software Carpentry. Yes, even scientists struggle with computer proficiency! Greg is dedicating the next year to a major upgrade of the materials and online presentation. Check out the plans. Spread the word and get involved!

The Archaeology of Innovation

In yet another mind-blowing seminar at the Long Now Foundation, Professor Sander E. van der Leeuw condensed millions of years of biological evolution, the archaeology of tools, cities and empires, and revolutions of innovation into a cogent socioecological presentation. He began by reviewing the research on short-term working memory (STWM) and encephalization as metrics for quantifying human evolution and human development. For example, an adult human can keep approximately 7 concepts in short-term memory, STWM = 7 (+/- 2), while chimpanzees demonstrate an STWM of 3, which is the same as a human infant. By analyzing the complexity of tools made by human ancestors, van der Leeuw presented a timeline of STWM increasing in step with mastery over 2D and 3D concepts (e.g., blade lines and spear heads) and eventually composition and staged manufacturing, i.e., the 4th dimension of time. All of this was to make the point that we arrived at our current biological evolutionary state around 10,000 years ago and haven’t changed much since. This marks the beginning of an era of innovation cycles that were not possible before such evolution and that are perhaps inescapable without further evolution.

The innovation cycles can be understood in the context of villages forming cities, and clusters of cities forming empires. Now imagine the abstraction of energy (e.g., food, water, infrastructure) going into these increasingly complex organizations while innovation is coming out. Cities require energy and in turn support innovation; innovation allows growth, which increases energy demand. If cities fail to innovate (due to depleted resources or mounting infrastructure problems), then they collapse. Yet innovation leads to a cascade of new challenges that must be paced in order to manage the risk spectrum. It’s basically one great big Ponzi scheme, where we are wholly dependent on innovation. There is a perceptual cycle at work here as well: a cycle between us adapting to nature and adapting nature to us.

The major innovations thus far have been our mastery over spatial and temporal dimensions (tools, writing, agriculture, etc) and our mastery of energy (Industrial Revolution). We are now in the beginning of an Information Revolution, which must provide new solutions to the problems generated by past innovations. Can we effectively increase our STWM by reducing the dimensions of problem space using computation? Can we understand the phenomena of innovation itself to attempt sustainability? Can we distribute information control to the individual, allowing population densities to spread and define new city-concepts that are more robust? How will we manage the coming innovations in nanotech and biotech as a society? What new challenges will these innovations create?

Writing a Book Collaboratively (and Quick)

Put 8 people in a room for 48 hours and produce an 89-page (18,000 word) document! It’s called a Book Sprint. It’s rapid, collaborative authoring. I participated in a Sprint last week to crank out a guide for Google Summer of Code mentors. The endeavor was organized by GSoC admin, LH, and facilitated by Adam Hyde of FLOSS Manuals. Overall, the experience was surprisingly enjoyable and effective.

We did a bit of prep work via email before meeting. This included brainstorming on chapter titles and a rough index. The main hurdle here was to identify a clear focus and target audience to make sure we were setting a tangible goal. Then on Day-1, we introduced ourselves (other than by a few emails, most of us didn’t even know each other), got a brief overview of the process, and got to work. We probably spent another hour and a half on the index, leaving it intentionally flexible. And then we dove into writing parts of chapters that we felt inspired to write. It was a very loose, flowing process; you were encouraged to follow your muse. Adam had us using a customized wiki to manage the collaborative editing of chapters. Pro Tip: they’ll be releasing a new and improved interface called Booki in the next couple of months. It was critical that we did not assign chapters to individuals and work independently. The momentum and synergy that build from rapid cycles of writing-proofing-editing distinguish the Book Sprint from more traditional collaborative authoring practices. By the end of the first day, we had collectively written 12,000 words and estimated that we were 60% done.

On Day-2 we prioritized the remaining work, e.g., there were a few chapters that were still blank. By lunchtime we were ready to print out the first draft. We broke into groups of 2 or 3 and proofread sections that we had up to that point contributed to the least. This was an opportunity to tighten up the prose and address the overall flow by dovetailing chapters and considering large-scale rearrangements. It was also an opportunity to appreciate the writing and editorial skills of my colleagues. A priori, I would not have guessed that proficiency in writing open source code correlates with writing skills in general, but our GSoC Mentoring team makes a compelling case. After we ran out of red ink, we regrouped to discuss our sections and the major changes we were considering. A few hours later and we were nearly done. A last-minute dissection of a chapter. A couple passes over the entire document. Some final formatting. Fin.

We had printed and bound copies in our hands the next day.  Very satisfying. I’d recommend the process to any FLOSS project that finds itself in need of documentation (user manuals, developer guides, overviews of project context/field/scope, etc), but in short supply of time and resources. Sound familiar?

Humanity of the Scientist

As an undergraduate student, I overloaded every semester (including summers) to indulge my burgeoning interests. It was all quite fulfilling, and I didn’t feel rushed, but there was a noticeable dearth of relevant humanities offerings. There are always “science for non-majors” courses, as there should be, but I’ve never heard of a “humanities for non-majors” or, better yet, “humanities for scientists”. Putting the history and philosophy of science into its full and rich context as one of humanity’s major driving forces would serve students well. It would not only enrich the content they’d be absorbing in science courses, but, more importantly, it would apply to their lives beyond the university walls and beyond their career pursuits. I’m going to start a list of topics that would constitute the outline for a “humanities for scientists” course. Who knows? Maybe I’ll even teach it someday! The list begins with a recent read that inspired this whole idea…

“On the Nature of Things” by Lucretius

The poet and philosopher Titus Lucretius Carus wrote “On the Nature of Things” sometime around 60 B.C.E., reintroducing the tenets of Greek Epicureanism (dating back to 300 B.C.E.) to his Roman audience. This work aims to explain through reasoning and observation the nature of everything. While it is easy to find flaws in a physical description of matter, life and the universe dating this far back, what is remarkable is how much is right, or at least on the right track. And you can’t help but be impressed by his rigorous thinking, bold anti-dogmatism, and beautiful expression of the freedom and enlightenment that derive from a reasoned worldview. Here is a sampling of the topics he covers:

  • the atomic nature of matter and the resulting properties of compounds
  • self-assembly and the physical laws of nature
  • astronomy and life on other planets
  • conception, death and decomposition
  • the irrationality of religion, gods and superstition
  • heredity, evolution and speciation
  • the senses and perception
  • psychology and behavior
  • sleep and dreams

And he does all this in Latin hexameter verse. That’s right, it’s also a beautifully crafted epic poem. Where are the poets today who have a fraction of Lucretius’ scientific understanding? Where are the scientists today who can educate and persuade the masses through artful communication?

Almost by instinct now, with mind alert, I range those pathless groves where no one ever has gone before me and I come to fountains completely undefiled. I drink their waters; delight myself by gathering new flowers, fashioning out of them a kind of garland no muse before this time has ever given to crown a human being. I teach great things. I try to loose men’s spirits from the ties, tight-knotted, which religion binds around them.  - Lucretius

Path to Atomism

Tracing the path of ideas leading to Lucretius’ description of the nature of matter in terms of indivisible atoms, we travel back over 4 centuries to Parmenides (520-450 BCE), a Greek philosopher who described all things as being singularly composed of a fiery aether. Matter could not be created or destroyed and there was no such thing as movement as that would require a void to move into, and void represents nothing and therefore does not exist. This form of reasoning follows from the work of Pythagoras (and the school of Pythagoreans with which most of his credit should be shared), which conceived of numerical or unit-based formulations to describe the nature of things, often (and perhaps proudly) in direct opposition to experience and common sense. Next in line was Empedocles (490-430 BCE) who allowed for 4 basic elements to compose the universe: earth, water, fire and air. He even inferred recipes for various substances, e.g., bone was composed of 2 parts earth, 2 parts water and 4 parts fire. Hinting at forces like gravity, van der Waals, and electromagnetism, Empedocles perceived attractive and repulsive forces between the 4 elements, which he referred to as Love and Strife. Anaxagoras (500-428 BCE) upped the ante, suggesting that every substance has an elemental form and is composed of some small fraction of every type of element. Thus, bone would be primarily composed of bone elements in addition to elements of tree, water, blood, gold, etc. This was his way of addressing how things could be made of other things through cycles of birth, decay, and death without having to instantiate creation. Then came Democritus (460-370 BCE) who settled on an atomic model that supported the unifying, shared and diverse properties of matter. This model survived for 2000 years with little alteration. Despite the fact that Plato and Aristotle didn’t much care for it, it was passed on to Epicurus (341-270 BCE) and reinvigorated by Lucretius (99-55 BCE).
Chemistry textbooks in the late 17th century could be found referencing the shape and surface properties of indivisible atoms that gave rise to the properties of the composed substance (e.g., acidic tasting substances have sharp, unevenly shaped atoms). While these whimsically fabricated details do not hold up today, with a little imagination you can readily map these ancient descriptions of atoms to modern understanding.

Evolution of Evolutionary Thought

Anaximander (610-546 BCE) is the most ancient of the Greeks to have attempted a natural, materialistic explanation of life. He intuited that life started in the seas and that a series of environmental/climate circumstances led to its migration to land and the appearance of man. Details notwithstanding (e.g., he imagined early humans developing in the mouths of fish for protection from the primeval world), his ideas are in stark contrast to the invocation of myth and deities, and hint at a theory of speciation. Empedocles (490-430 BCE) got straight to the heart of the matter, explaining that the diversity of traits among living creatures was due to the fact that all manner of variations are tried and the strange deformed ones simply don’t survive, leaving only the well-suited creatures to propagate. Though, again, it’s his details that stray far from the mark, literally (e.g., stray arms and eyes composing fantastical creatures). Aristotle (384-322 BCE) summarized Empedocles’ theory: “most of the parts of animals came to be by chance, having been randomly thrown together in the melee of the battle between love and strife. And when those parts were useful, the creatures lucky enough to have them survived, being organized spontaneously in a fitting way, whereas those that grew otherwise perished and continued to perish.” Darwin, in turn, referenced this passage saying: “you see here the principle of natural selection shadowed forth.” Funny thing is, Aristotle was restating Empedocles’ ideas in order to refute them. Aristotle was in favor of purpose over random chance and mechanism, as was Plato before him. What he lacked in terms of theoretical biology insight, he made up for in his practical application. Aristotle’s first love was biology. Likely influenced by his father who was a medical doctor, nearly 1/5th of Aristotle’s extant writings describe the physiology and behavior of ~540 species.
Some descriptions from his observations and dissections remained relevant and unchanged through the Renaissance and even into the time of Darwin!

Formal Logic and Computation

Logic is the science of reasoning and proof, a systematic inquiry into the principles of deduction, and thus fundamental to all subsequent reasonable deductions. Aristotle (384-322 BCE) was the first to formalize logic with his 109 syllogisms, which made use of variables to represent concepts, such as in the example: all ‘A’ are ‘B’ and all ‘B’ are ‘C’, therefore all ‘A’ are ‘C’. This was well before variables were being used to represent numbers in mathematics! Aristotle thus established the first system of logic, defining the scope and rules governing logical statements. This system was considered complete and essentially closed well into the 18th century. Parallel work on logic was contributed by the Stoics and later by Galen (129-199 CE), but being difficult to merge with Aristotle’s system, it was largely ignored until the topic was intellectually revived. Indeed, it took 2,000 years before substantial challenges and additions were made to formal logic by the likes of Gottfried Leibniz, Augustus De Morgan, George Boole, Georg Cantor and Bertrand Russell, laying the modern foundation of mathematics and computation. It is also difficult to think about modern object-oriented programming and ontology design without revisiting the ancient debate between modeling the world as derivatives of Forms (Plato, b.428 BCE) or as objects with properties (Aristotle, b.384 BCE).
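
For readers who think in code, here is one loose, modern way to picture that example syllogism. This is only a sketch, not Aristotle’s own formalism: it assumes each concept can be treated as a finite set, so that “all A are B” becomes “A is a subset of B”. The helper name `all_are` and the example members are hypothetical, chosen purely for illustration.

```python
def all_are(xs, ys):
    """Return True when every member of xs is also a member of ys ("all xs are ys")."""
    return set(xs) <= set(ys)


# Hypothetical example categories (illustration only).
philosophers = {"Plato", "Aristotle"}
humans = philosophers | {"Imhotep"}
mortals = humans | {"Bucephalus"}

premise_1 = all_are(philosophers, humans)    # all A are B
premise_2 = all_are(humans, mortals)         # all B are C
conclusion = all_are(philosophers, mortals)  # therefore all A are C

print(premise_1, premise_2, conclusion)
```

Whenever both premises hold, the conclusion holds too, because the subset relation is transitive. The variables stand for whole concepts rather than numbers, which is the point made above.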

You Call That Ancient?

Upon whose shoulders did the ancient Greeks stand? The Egyptians’, of course. Though neglected in most histories of philosophy, one cannot help but wonder what contributions were made by ancient Egyptians to the philosophy of science. If they had art, religion, economics, architecture, and agriculture, then surely they had science. Indeed, you may well consider Imhotep (2650-2600 BCE) to be the “Grandfather of Medicine” as his works were studied by Hippocrates. He is even indirectly mentioned in the Hippocratic Oath, being associated with (if not identical to) the Greek god of medicine, Asclepius. Imhotep was also an architect, astronomer, poet and philosopher. He was apparently an Epicurean 2,300 years before the philosophy officially existed! He promoted contentment and cheerfulness, and may have given original voice to the saying “eat, drink and be merry for tomorrow we die.”

Historians are also often remiss in examining the philosophies of ancient Asian cultures with the strict exception of religious ideas. Hinduism, for example, is the earliest of the major religions, documented in a collection of texts, the most ancient being the Vedas (2000-1200 BCE). By ~650 BCE, however, a materialistic doctrine called Lokayata was coming into bloom; its adherents were the Carvaka. They dismissed the notion of afterlife as ridiculous and relied on their senses for knowledge about the world around them. The Carvaka even attempted the same reductionist description of matter declared by Empedocles (everything being composed of earth, water, fire and air)… only they did it ~200 years earlier!

There are doubtless many other examples of parallel and preceding scientific thoughts and ideas from less documented, less preserved, and less examined cultures. Suffice it to say, the philosophy of science has many roots reaching deep and wide into human history.

Humanity of the Scientist

It is from the crucible of philosophy, flamed and annealed over millennia, that the modern scientist is formed. In fact, Science was still called Natural Philosophy in the 17th century when the modern foundations were being laid. This foundation, comprised of mathematics, experiment and systematic observation, continues to serve as a substantial base, but is itself built upon layers and layers of earlier foundations. Picture a cross-section of a London street through asphalt, cobblestone, brick and Roman quarry stone. Though perhaps discontinuous with our own, this earlier work is extremely relevant. The insights and ideas of ancient “seekers of truth” form a network of pillars and steps that not only raises the level of our modern foundation (i.e., “on the shoulders of Giants”), but also grounds it in our timeless humanity by revealing our most intimate motivations and epiphanies as apparently universal conceits of  human nature. One cannot help but be amazed by just how familiar the debates, doubts and dogmas of the ancients seem today. Being able to tap into these thoughts, crystallized in such distinct settings over human history, is a valuable exercise in perspective and it’s inspirational.

Bibliography

  1. Lucretius. On the Nature of Things. Audio Connoisseur, 2007. Narrated by Charlton Griffin
  2. Jennifer Michael Hecht. Doubt: A History. Harper One, 2004. ISBN: 978-0060097950
  3. Anthony Gottlieb. Dream of Reason. WW Norton & Co, 2002. ISBN: 978-0393323658
  4. Doxiadis A and Papadimitriou CH. Logicomix. Bloomsbury, 2009. ISBN: 978-1-59691-452-0
  5. Molefi Kete Asante. The Egyptian Philosophers: Ancient African Voices from Imhotep to Akhenaten. African American Images, 2000. ISBN: 978-0913543665

End of Year Report: GSoC 2009

This was GenMAPP’s third consecutive year participating in GSoC and it was by far the most productive. As an organization we are continuing to learn how to be better mentors and the students continue to step up.

Our students this summer worked across three different projects: Cytoscape, GenMAPP-CS and WikiPathways, each working at the interface of biology and computer science. It takes a special type of student to succeed in this field and we found 9 of them this summer! Here is a brief description of the range of topics we covered:

  • Processing-based graphics renderer
  • Data mining interfaces
  • Animation tool using frame interpolation
  • Pathway model merging and visualization
  • Identifier mapping solutions
  • GPU utilization for graph layout
  • Phylogenetic tree visualization
  • Ontology-based categorization of pathways
  • Pathway exchange formats

In addition to the great code we produced, the GSoC experience is about building open source communities. We have had the great fortune of retaining most of our GSoC students from previous years, many of whom come back as mentors! And this year I have no doubt that we’ve significantly added to our ranks. But don’t take my word for it!

“Overall, this is the most productive summer I ever had. It increased my confidence as a developer and as a person that I can actually pull off a project like this and interact with awesome people like you. I also become a part of a growing community and hope to help it grow further.”

“There aren’t many opportunities for computational biology enthusiasts to make a difference in the field while still in school. GSOC at GenMAPP/Cytoscape was one such gem of an opportunity that illustrated exactly why writing software for biological research is benefited by a background in biology. The experience I have gained here is definitely irreplaceable. The fundamentals of open-source programming and the rhythm associated with regular coding and problem solving helped nourish an intellectual side of me that I will not forget in a hurry. This is definitely not the last time you will see me.”

“This was the first time I took part into either GSoC or an Open Source project, and it was also one of the most exciting things I’ve ever done! This project gave me the chance to spend ~3 months learning lot of new things, having fun, doing something for the community and even getting pay for that! That’s an amazing combination! …I will remain as an active participant of the Cytoscape community beyond the official end of GSoC 2009!”

“It was definitely a very good learning experience for me. …But I have some unfinished works… So I’ll continue this project after GSoC, and be help to drive Cytoscape 3 development.”

“It was the first time that I worked with an open source community and it was really a great experience. I am very thankful to Google and GenMAPP for providing me this great opportunity. I would like to thank [my mentors] for their excellent support during the summer. Looking forward to working with you again.”

“It was a GREAT experience to work with GenMAPP during GSoC 2009!!! The administrators and mentors were very helpful during bounding and coding periods. Whenever I had problems they were always responsive and offered me help in time. Under their guidance, I’ve improved my programming and communication skill, and learned how to work within a group. I would like to express my gratitude to all my mentors in GenMAPP. If I can participate in GSoC next summer, I would like to work with GenMAPP again:)”

Google Summer of Code 2009

We made it into Google’s Summer of Code program again this year. This will be our 3rd year participating as the GenMAPP organization, serving as an umbrella org for Cytoscape and WikiPathways projects. Over the past 2 summers, we mentored a total of 12 students in open source code development projects, keeping those wild bioinformaticians off the streets!  The applications will start pouring in next week. We are eager to get a few more hands writing code… paid for by Google.

Here’s a tagged list of the 150 mentor organizations participating this year. And here is the GenMAPP project page.

The Daemon-Haunted World

Daniel Suarez led a discussion about the non-fiction topics in his recent fiction work entitled Daemon. Yet another terrific event hosted by The Long Now Foundation down at their Fort Mason headquarters. In this cyber-thriller, Suarez explores a new kind of “perfect crime” where the perpetrator acts through daemons, or software bots, that are activated by real world events parsed via online news reports, the first trigger being the news of the perpetrator’s own death. The bots go on to wreak all kinds of technically intricate and interesting, yet deadly damage. The identity of the perp is found out early on, but he is already dead. There is no “one” to stop and the daemons are loose on the net, managing their own survival. Daemon intertwines network architecture, MMORPGs, sonic weaponry, wireless hacking, ID theft, and globalized and networked services within the plot of a game developer-gone-mad [need new term analogous to "going postal" here...] in a way that is realistic enough to make both an engaging action story (movie options have been sold) and cautionary tale for our networked world today.

You may recognize the title of this post as a play on the Sagan title, The Demon-Haunted World: Science as a Candle in the Dark. Instead of the “demons” of pseudo-science, blind faith and ignorance, here we are talking about the literal daemons that expose critical vulnerabilities in our networks and thus in our everyday financial, institutional, and personal lives. So, what is the “candle” that will lead us out of this darkness? We discussed the development of information systems modeled after natural systems and thus more dynamic, robust, etc. Picture modules of code recombining like fragments of chromosomes during mating and then undergoing selection by the parameters of the given environment. This, by the way, will be the topic of the sequel which Suarez plans to have out in the Fall ’09. We also discussed building a strong, gated “darknet” for vital institutions that is distinct from the current net which can stay open and continue to serve as a crucible for innovation. Obviously, no one in the room had “the” answer, but as the author noted, at least we were having the discussion…

P.S. This very blog post is being propagated across the internet by daemons responding to trackback calls…. mwahaha.

Changing the Game

John Wilbanks

SEED magazine featured John Wilbanks as a “Game Changer” in its Revolutionary Minds series for his work with Science Commons. I met with John earlier this year to get advice on how to license the content at WikiPathways.  His perspective completely transformed how I viewed the topic and directly led to our current Creative Commons license and terms of use language. Be sure to keep an eye on Science Commons. They will certainly be taking a lead on navigating and defining the publication and sharing of science online. At least I hope they will be…