Week 1 (26th May, 2014 - 30th May, 2014)
As this was the first week, my priority was to complete the employment process and fill out the paperwork. Once that was done, I started to learn about the project. Because the project had been ongoing research during the school year, I had to learn what had already been achieved. I went through the reports and the posters that other students had made to present their work.
Even though I have studied both Computer Science and Biology, I had never worked on a project that applies the two together. The subject was therefore fairly new to me, and the visualization software and analysis methods used in this project were unfamiliar.
Week 2 (2nd June, 2014 - 6th June, 2014)
This week, I became familiar with the subject, the campus and the people. Having grasped the basics of the project, I could get my hands on it, and I tried to run some code that had already been written. I learned the tools, including VMD, the software used for structural visualization. I also helped with the structural analysis by writing programs. The first program produced files, each containing a model of the protein whose atoms were ordered according to a given reference file, which made it easier to compare the structures of the models.
The other program I wrote changed the atom names in the files. Two software packages were used for structural analysis in this project, Amber and Rosetta, and they have different naming conventions. My code took two input files: the file to change and a reference file. To convert an Amber-compatible file into a Rosetta-compatible one, that file and a Rosetta file were used as input, and the program produced a file that could be run in Rosetta, and vice versa.
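As an illustration only, the reordering step could look roughly like the Python sketch below. It is not the original program. It assumes the models are stored as PDB-style text with fixed columns, that an atom is identified by its residue number and atom name, and that both files contain the same atoms. The function names `atom_key` and `reorder_atoms` are made up for this example.

```python
def atom_key(line):
    """Identify an atom by (residue number, atom name) using PDB fixed columns."""
    return (int(line[22:26]), line[12:16].strip())


def is_atom(line):
    """True for coordinate records; other PDB records are ignored here."""
    return line.startswith(("ATOM", "HETATM"))


def reorder_atoms(model_lines, reference_lines):
    """Return the model's atom lines in the order used by the reference file.

    Assumption: every atom in the reference also appears in the model.
    Serial numbers are left untouched in this sketch.
    """
    by_key = {atom_key(line): line for line in model_lines if is_atom(line)}
    ordered = []
    for ref in reference_lines:
        if not is_atom(ref):
            continue
        key = atom_key(ref)
        if key not in by_key:
            raise KeyError(f"atom {key} is missing from the model")
        ordered.append(by_key[key])
    return ordered
```

Once two models follow the same atom order, their coordinates can be compared line by line without matching atoms again.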
You can find the code from these two programs here.
Week 3 (9th June, 2014 - 13th June, 2014)
I debugged the code from the second week and ran it on some files so that most of them would be ready to run in Rosetta and Amber at any time. In preparation for creating a Voronoi graph that would help visualize the relations between different structures of Met-enkephalin models, I was assigned to read three papers on stochastic roadmap simulation, transition networks in macromolecules and transition networks in proteins. I was also assigned to run some tests on the voro++ library and pele.
The weekend was relaxing, with a trip to Washington, DC, where I visited the attractions and met my brother.
Week 4 (16th June, 2014 - 20th June, 2014)
I kept working on pele and voro++ this week. I tried to install both on Windows, but at first I could not get the libraries installed on my computer. So I tried to install Linux on my computer, which also failed; I later figured out that this was because of a faulty USB drive. I bought a new USB drive and then installed both pele and voro++. I started to test some example programs and learned exactly how they worked.
This week was a bit frustrating because so much time was spent on installation. I was upset that each Python package took several minutes to install on Windows, whereas on Linux all the packages could be installed with just a few commands.
Week 5 (23rd June, 2014 - 27th June, 2014)
After successfully running some test programs in pele, I tried to run pele on our data. In pele, the energies and coordinates are saved in a database. It calculates the minima and transition states, and produces a graph connecting the energy minima. However, the minimum energies for our data had already been calculated with a different formula. Therefore, when pele was run, the minima it calculated were quite different from our data, and we decided not to use the pele graph.
Week 6 (30th June, 2014 - 4th July, 2014)
With the help of a graduate student, I learnt how to use the voro++ library. We ran voro++ on 6000 models obtained from a minimization method with 2,000,000 steps. In the resulting Voronoi graph, each Voronoi cell represented one structure of a met-enk model, with the PCA results as its coordinates. Each Voronoi cell had topologically neighboring cells, and each pair of neighbors was divided by an edge. voro++ offers many choices of information to output about each Voronoi cell. Among these, we were interested in the neighbors and the number of faces of each cell. With this information, we could find the shortest path or random paths between any two points of interest, or a transition matrix containing the probability of transition between every pair of structures.
Among the possible directions, we chose to find a path in which each point moves to its neighbor at minimum distance. The chosen neighbor would in turn connect to its own neighbor at minimum distance. The path ended when the chosen neighbor was the previous point. The distance between neighbors was calculated as the Euclidean distance. The path search was run 6000 times, starting at different points.
Another implementation found paths in which each point chose a random neighbor. The path ended when all 6000 points had been visited. The number of times each point was visited in each path and the path length were recorded. This program was also run 6000 times with different starting points.
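The nearest-neighbor rule described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the project. It assumes the PCA coordinates are held in a dictionary `coords` and the Voronoi neighbor lists in a dictionary `neighbors`; both names, and the function name, are hypothetical. The loop is capped at the number of points as a safeguard against ties in distance, which the original description does not mention.

```python
import math


def nearest_neighbor_path(start, coords, neighbors):
    """Follow the closest Voronoi neighbor until the walk steps back.

    coords: point id -> coordinate tuple (assumed layout)
    neighbors: point id -> list of neighboring point ids (assumed layout)
    """
    path = [start]
    previous, current = None, start
    # Cap on the number of steps: a safeguard added for this sketch only.
    for _ in range(len(coords)):
        nearest = min(
            neighbors[current],
            key=lambda j: math.dist(coords[current], coords[j]),
        )
        if nearest == previous:
            break  # the closest neighbor is the point we just came from
        path.append(nearest)
        previous, current = current, nearest
    return path
```

Because the previous point is always among the current point's neighbors, the step length can never grow along such a path, so the walk tends to settle quickly into a pair of points that are each other's closest neighbor.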
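A random path of this kind can be sketched in Python as below. It is only an illustration under assumptions: the neighbor lists are taken to be a dictionary `neighbors` that has an entry for every point, the graph is taken to be connected (otherwise the walk would never visit every point), and the function name is made up for this example.

```python
import random
from collections import Counter


def random_cover_walk(start, neighbors, rng=None):
    """Walk to a uniformly random neighbor until every point has been visited.

    Returns the visit count of each point and the total path length.
    neighbors: point id -> list of neighboring point ids (assumed layout)
    """
    rng = rng or random.Random()
    visits = Counter({start: 1})
    current = start
    length = 1
    while len(visits) < len(neighbors):
        current = rng.choice(neighbors[current])
        visits[current] += 1
        length += 1
    return visits, length
```

Repeating the call once per starting point gives one visit-count table and one path length per run, which are the quantities recorded above.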
I spent another relaxing weekend with a trip to Boston.
Week 7 (7th July, 2014 - 11th July, 2014)
The 6000 runs for finding nearest-neighbor paths had finished. However, the paths turned out to be very short: most included fewer than 10 points. We decided to focus more on random paths.
Finding a random path took several minutes, as the program tried to visit all 6000 points. Meanwhile, as the next approach, we decided to use the Metropolis Monte Carlo algorithm to select points. Instead of being selected purely at random, points would be selected according to a probability based on the Boltzmann distribution.
Week 8 (14th July, 2014 - 18th July, 2014)
The random path program had to be run 6000 times, and each path took a while to finish. I was concerned about whether it could finish before the end of my internship. I talked to my mentor, and she created an account for me on one of the school's servers. I ran multiple instances of the program on different machines. At the same time, I started to implement the Monte Carlo approach.
In the Monte Carlo approach, there is a starting point, and a neighboring model is chosen at random. For each model, we had the difference between its energy and the lowest energy found during the minimization. The probability was calculated as the exponential of the energy difference between the neighboring point and the current point, divided by a constant. This constant, currently assumed to be 0.596, was roughly calculated from the temperature and has the same unit as the energy. Next, a random number between 0 and 1 was drawn and compared to the probability. If the random number was less than or equal to the probability, the current point moved to that neighbor, and the process was repeated. If the random number was larger, another neighbor had to be chosen. The path ended when it had tried all neighbors 10 times and failed to move.
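The acceptance rule can be sketched in Python as follows. This is a hedged illustration, not the program from the internship. It assumes the standard Metropolis form, in which the acceptance probability is min(1, exp(-dE / kT)) with dE the neighbor's energy minus the current energy; the text above does not state the sign, so that part is an assumption. "Tried all neighbors 10 times" is interpreted here as 10 times the number of neighbors in failed random attempts, and `max_steps` is a cap added only to keep the sketch from running indefinitely. All names are hypothetical.

```python
import math
import random

KT = 0.596  # constant quoted in the text, in the same unit as the energies


def metropolis_path(start, energy, neighbors, kt=KT, rounds=10,
                    max_steps=10_000, rng=None):
    """Build a path by proposing random neighbors and accepting Metropolis-style.

    energy: point id -> energy relative to the lowest minimum (assumed layout)
    neighbors: point id -> list of neighboring point ids (assumed layout)
    """
    rng = rng or random.Random()
    path = [start]
    current = start
    while len(path) < max_steps:
        # Interpretation: allow `rounds` attempts per neighbor before giving up.
        attempts = rounds * len(neighbors[current])
        for _ in range(attempts):
            candidate = rng.choice(neighbors[current])
            delta = energy[candidate] - energy[current]
            # Downhill moves are always accepted; uphill moves sometimes.
            probability = 1.0 if delta <= 0 else math.exp(-delta / kt)
            if rng.random() <= probability:
                current = candidate
                path.append(current)
                break
        else:
            break  # every attempt at this point failed, so the path ends
    return path
```

With this rule a path mostly moves downhill in energy, and the larger the energy gap to a neighbor relative to 0.596, the less likely an uphill move to it becomes.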
Week 9 (21st July, 2014 - 25th July, 2014)
With the help of my mentors, the random path program finished its 6000 runs, and the Metropolis Monte Carlo runs were completed as well. To analyze the paths, the number of times each point was visited, the path lengths, and the information about the Voronoi cells were graphed and analyzed. There was not much relationship among the data, except for one strong correlation: the more neighbors a Voronoi cell had, the more often it was selected.
We also learnt that the previous Voronoi graph had been made with rectangular boundary conditions, in which the boundary Voronoi cells were stretched out to the container walls. Therefore, a C++ program was written to construct another Voronoi graph with irregular boundary conditions, in which the container walls were not considered and the walls of the boundary cells were not stretched out.
Week 10 (28th July, 2014 - 1st August, 2014)
After building the Voronoi graph with irregular boundary conditions, the same programs were run on the resulting graph, and the same analysis was done. The results were compared with the previous ones, and most turned out to be quite similar. As the data did not show any significant results, it was decided to run the programs more times on other met-enk models later.