Data analysis is not as hard as it looks
Bastienne Wentzel


Omics data sets are an increasingly important part of the research of wet-lab cell biologists and immunologists. But these researchers usually lack the bioinformatics or coding skills needed to analyse such data sets. This spring, for the first time, PhD students in the ICI program had the opportunity to participate in the online training "Omic data analysis and visualisation using R", which focuses precisely on these skills.

The course is developed for wet-lab scientists with no prior experience in coding. It is part of a wider bioinformatics training programme, developed and delivered by bioinformatics specialist John J. Cole, manager of the Glasgow Bioinformatic Core at the University of Glasgow. He explains: "Anyone can do omics experiments nowadays. You just buy a kit and hire a company to produce the data for you. But the analysis of these data can take months. Labs can't afford to pay anyone to do that. Also, the wet-lab scientist knows best what to look for in their data. But without the skills to do the data analysis themselves, these labs get stuck."
The training focuses on R, a programming language and software environment for statistical computing and graphics that is used for tasks such as omics data analysis. The course starts with an introduction to R. Using a provided data set, the students learn step by step to write a script that opens and analyses the data. The next step is to produce various kinds of analyses and plots, such as a principal component analysis (PCA) and an MA plot. In the end, the students have their own scripts and the skills to analyse any data set.

No fear
The aim of the course is first of all to take away the fear of data and programming, explains Cole. "We offer a structured program where we don't just throw bits at people but really explain how the different analyses and plots work."
This obviously worked for ICI principal investigator Sander van Kasteren, who made the course available to ICI researchers: "I have no talent for programming but this training was exactly what I hoped it would be. The basics are not as scary as I thought. I can now judge the quality of analyses in my own work and others' and avoid pitfalls." One of the pitfalls is to keep looking until you find something, says Van Kasteren. "John explained that your data might not offer what you want. It is important to stick with the research question and analysis that you identified as the best up front and then stop."
Eva George Matlalcuatzi was one of the ICI PhD students who participated. She says: "I wanted to do more in-depth statistical analysis of my proteomics data using more common software like R. Also it will help me to increase my capabilities on coding. I liked that it considered everything from simple coding to organize data until large and more complicated data management and different plots."
Anyone can sign up for this training, which gets very high ratings from its participants worldwide, more than 1,200 so far. Van Kasteren hopes that, in addition to PhD students, more principal investigators will invest the time in this training. "The invested time was definitely not wasted. I use the knowledge on datasets for my own papers and when I'm reviewing others. And John is a great teacher who knew exactly what we did not know."