Tracing Matryoshka References for smarter collaboration

Serena Marletta

Have you ever felt like Indiana Jones in the Raiders of the Lost… Reference? We’ll see together how to put an end to the intricate issue of tracing references, and what measures can be taken to work better and save ourselves the headache.

Memories of childhood

One of the sweetest memories of my childhood is linked to my great-grandmother’s tiramisù. Unfortunately, she never wrote down her original recipe but passed it on through hands-on cooking sessions. For me, my grandmother (her daughter) was the keeper of “The Recipe” and, judging by her size, the most reliable one too.

Mum and I were very determined to reproduce the original taste, but, despite faithfully following grandma’s instructions, our attempts were no match for that masterpiece.

Every time I met grandma, I asked her to clarify one of the many details I thought were responsible for my culinary failures: “Should the coffee be sweetened or not? How much mascarpone goes into the cream? Which brand of ladyfingers should I buy?”

She was sure the coffee should not be sweetened but, with regard to the mascarpone affair, she told me to ask Aunt Maria, who wasn’t sure and sent me to Uncle Giorgio for confirmation. And so I was quickly sucked into a complicated chain of word of mouth and eventually gave up.

Matryoshka References

I’m 34 years old now and most of the recipes I read are the “Materials and Methods” sections of scientific papers. Every now and then, while reading them, I feel the same sense of helplessness I experienced when I could not make a tiramisù as good as my great-grandmother’s. It happens when I run into what I call Matryoshka references: a reference citing another reference that refers to other references, and so on and so forth, until it becomes almost impossible to recover the procedures or the materials used in those studies.

As a protein enthusiast, I have often needed to adopt purification methods already described in the literature. After several days of extensive reading, I always felt relieved when I found a paper already heading down the route I was going to take. But I have sadly found, more than once, that going back to the original conditions in which a study was conducted is hard (if not impossible).

I stumbled into a string of nested papers inextricably entangled by the clue-sentence “as described in [43]”. And my reaction was always the same: “Where?”. More than once, matryoshka references forced me to start a paper hunt, most of the time without a happy ending, especially if the article I was chasing dated back to the pre-Open Access era. The point is that this vicious circle of citations to some extent hampers one of the fundamental aspects of the scientific method: the reproducibility of the data.

Our solution – transparency and standards

At DOULIX, we try to mitigate the matryoshka references phenomenon by leaving a unique, permanent and clear trail for those who, scientifically speaking, come after us.

To reach this goal, we act on two fronts: (i) the generation of a FAIR-compliant (Findable, Accessible, Interoperable, Reusable) database of DNA parts and (ii) the development of publicly available Standard Operating Procedures.

Each DNA part you find on DOULIX is identified by a unique persistent identifier, called Part ID, making it findable and accessible to everyone over time. So, let’s say you have been developing a new shuttle vector you want to share with the synbio community and you are writing up your results for publication. Well, you may be glad to know that by designing your plasmid with DOULIX, you can use the Part ID automatically assigned to your DNA part to easily cite it in your Materials and Methods. Furthermore, with DOULIX advanced sharing options you can share your DNA parts with your teammates or make them publicly available for everybody to enjoy.

Moreover, within the framework of the MIAMI Project, funded by the EU Horizon 2020 research and innovation programme, we are in charge of drafting organism-specific standard operating procedures (aka SOPs) that can be shared among operators, so that collected data can be exchanged and compared among different laboratories using different types of equipment at different times. If you’re curious about what a standardised SOP may look like, have a look here.

Once overlooked, protocol standardisation and data recording are now hot topics in synthetic biology. There is a common feeling that how we generate and collect data is just as important as the results we publish. By adopting standardised procedures and reference materials we can promote data exchange and boost reproducibility to get a tiramisù everybody can taste.