After a year of working at SARA (http://www.sara.nl) on the GRID, it was time to learn everything about the GRID and its middleware. Over the last few months I was also introduced to Cloud Computing, the second subject of this GridKa Summer School. It would be interesting to find out how both subjects "work together".
Registration was between 12:00 and 14:00, so I decided to make the trip from Gouda to Karlsruhe by car. Leaving home at 06:45, I picked up a SARA colleague at Utrecht train station around 07:30, and with almost no delays we arrived around 12:45 at the hotel. A bus arrived at the hotel around 13:30 and took a large group of GridKa attendees (including me) and the keynote speaker to the KIT Campus Süd. We arrived just in time for registration, registered, and at 14:00 the introduction kicked off. The GridKa Summer School 2011 had started.
It is always a good idea to start with definitions before you start learning about the subjects. Jeff introduced GRIDs, supercomputing and Clouds using an analogy new to me: making coffee with coffee machines. Although the analogy worked fine and made everybody in the audience laugh, a real problem became clear: there is no single definition of Cloud. Using a quote from I. Foster, Jeff made clear that whatever the final definition of Cloud will be, the GRID has a connection to it: "Grid needs Cloud to prosper; Cloud needs Grid to scale". The lecture also mentioned and showed a picture of the SARA Cloud. http://gridka-school.scc.kit.edu/2011/downloads/Grids_and_Clouds_050911_jeff_templon-sanitized.pdf
The current "Cloud Hype" is powered by virtualization, Owen explains the concept. The history of virtualization goes back to the early days of modern computing and is "nothing new". Techniques like "chroot", "zones" and "jails" where always available (and used). The support of X86 for hardware virtualization, made is cheaply available, hence the current rise of virtualization. Owen gives an overview of the different type of virtualization and shows the pro's and cons. And then there is data and it preservation. Again a new analogy for me is introduced: Preserving Pizza. http://gridka-school.scc.kit.edu/2011/downloads/Virtualisation_050911_Owen_Synge.pdf
Paul explains how storage in the GRID works and where it is needed. The functionality of a Storage Element (SE) in the GRID was explained. Data storage tailored for the GRID (it supports VOs, for example) is provided by dCache. It is especially suited for streaming data, as used by CERN and the Tier 1 storage sites. Besides disks it can use SSDs when files are needed fast and often, or tape for long-term storage; dCache takes care of the staging.
After these lectures the first day finished with the social event at KIT Campus Nord. Besides Flammkuchen, pretzels, beer and a band, there was the possibility to see the server rooms of KIT where the clusters and Tier 1 storage are located. It's always nice to see someone else's server room and recognize hardware or discover new/other solutions. The trip back to the hotel was by tram, quite a trip, but we could admire Karlsruhe by night so it was worth it.
After a fine and plentiful breakfast the GridKa bus arrived at 08:30 and took us to the KIT Campus Süd, where the lectures started.
In this non-technical lecture Christoph describes the structures of the European organizations involved in the GRID, such as EGEE, EGI and EMI. At the European level there was already a budget in 1984, with the main focus on e-infrastructures (the research network GEANT), which was and still is fundamental for all other e-science projects. The current state of the European Middleware Initiative is made clear: the EMI software release consolidates other once separate projects such as the Globus Initiative Europe. There was no mention of UMD (Unified Middleware Distribution) of EGI; in reaction to my question about that, Christoph explained that UMD is just EMI with the approval of EGI. http://gridka-school.scc.kit.edu/2011/downloads/EU_Projects_060911_Christoph_Witzig.pdf
Unlike the previous lectures, this one was given by a vendor. Just by watching the presentation it is impossible to judge whether the claims made are vendor-speak or true facts; hands-on experience is needed. The one thing I found interesting: hardware parity on read. The DDN storage does error correction on read because of the real probability of a false read from SATA disks, which are known for their unreliability. http://gridka-school.scc.kit.edu/2011/downloads/Storage_Architectures_060911_Toine_Beckers.pdf
How to get user support and how this is arranged within EGI is explained by Sabine. The real users are in Virtual Research Communities (VRC), an extra layer on top of the Virtual Organization (VO) layer, which has members in the User Community Board; that board talks to EGI.eu, which in turn communicates with the NGIs. I think organizing this kind of support this way will not lead to advantages, but time will tell. Finally the EGI support ticket system is shown. An NGI can build its own, using the xGUS template -> http://xgus.ggus.eu http://gridka-school.scc.kit.edu/2011/downloads/Grid_User_Support_061111_Sabine_Reisser-v2.pdf
Nabil has a dream: one interface for all kinds of computation. To accomplish this, bridges have to be made between Grid, Cloud and Volunteer Computing. Volunteer computing, also called public computing, has some challenges: it is anonymous (which brings extra security risks), volatile and heterogeneous. Cloud has even more challenges, especially because there is no definition of, or common agreement on, what cloud computing is. What we do know, according to Nabil, is that it is a combination of pre-existing technologies. XtremWeb-CH is a working bridge to volunteer computing (ARC volunteer), and by making cloud part of volunteer computing no special cloud bridge is needed. Technologies to get jobs directly to the cloud will become available.
For lunch the attendees of GridKa School were issued coupons for the KIT Mensa. It turned out that we could have the most luxurious of the six possible lunch offers. The choices were plentiful and I had a great lunch while talking to fellow GridKa attendees.
After an introduction to cloud computing and the basics of OpenNebula, a well-prepared workshop was given by Tobias and Victor. All attendees of this workshop had access to the workshop environment and everybody was happily playing with it. I was pleased to learn we were using OpenNebula 3 (beta), which is the version currently being implemented at SARA. The real hands-on work helped me understand the workings and philosophy of OpenNebula. As a bonus I could (on request) "play" with Sunstone, the web front-end to OpenNebula 3. Using POV-Ray and two virtual machines, the workshop showed how to create a small movie, rendered distributed across the two virtual machines. It was very nice to see the end result displayed on my laptop. http://gridka-school.scc.kit.edu/2011/downloads/OpenNebula_Cloud_Computing_Tutorial_20110906_Tobias_Kurze.pdf
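To give an idea of the kind of thing we did, here is a rough sketch of starting a VM from the command line; the template attributes, image ID and network ID are illustrative placeholders, not the workshop's actual values, and the exact syntax may differ between OpenNebula releases.

    # Minimal OpenNebula VM template (illustrative values)
    cat > vm.tmpl <<'EOF'
    NAME   = povray-worker
    CPU    = 1
    MEMORY = 512
    DISK   = [ IMAGE_ID   = 0 ]
    NIC    = [ NETWORK_ID = 0 ]
    EOF

    onevm create vm.tmpl   # submit the VM definition to OpenNebula
    onevm list             # watch the VM go from PENDING to RUNNING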
The evening lecture was almost immediately after the workshop and there was little time to eat. Fortunately there were drinks (sparkling wine included) and pretzels (with butter included) arranged by the GridKa School organizers. I noticed the place where drinks and food were served was slowly filling up with a lot of people, more than normal. Was the next lecture more special than the others up till now? It turned out it was.
In a packed lecture room (people were sitting on the stairs), Michael started off by telling us that this was the first time (worldwide) a lecture would be given about the new C++0x standard. Michael was not only very excited about the new standard but also had extremely deep knowledge of both C++ and hardware. And though I understood only part of what he was telling (I'm not a programmer), his enthusiasm and extensive knowledge made the lecture a pleasure to watch. No new standard without a history and a committee to get there, and Michael explained it all. According to Michael, the name C had more to do with the name of one of the writers of a C (and B) predecessor, CPL: Christopher Strachey (C as in Christopher). Using just a few example statements of the new standard, a subject like programming multithreaded applications was addressed in an extensive manner.
After this the GridKa bus brought us back to our hotels, and together with my SARA colleague I drank some Paulaner Weißbier to wash away some of the overwhelming impressions of the day. The next morning it was "business as usual": breakfast and a bus ride to the third day of GridKa Summer School.
After explaining what he thinks the cloud is, Ulrich hits a fundamental point among resource-provider challenges: how can (GRID) users benefit from cloud computing? Virtual Organizations (VOs) know the user; they, for example, invented pilot jobs, a way around some GRID limitations. A lot of GRID services can run on virtual machines. After testing these virtualized services, a 20% I/O penalty was found. Another issue is scalability when each virtual machine is assigned to a single core: adding services then means adding hardware. Is this cloud? Cloud is not virtualization, Ulrich states. http://gridka-school.scc.kit.edu/2011/downloads/CloudComputing_070911_Ulrich_Schwickerath.pdf
The scale of data analysis has grown enormously in recent years. Creating GRID jobs, running them and processing their output has become more and more complex. As a software developer, Sergio describes a solution based on a Python API to the ARC middleware. With his argument that the current Bash/Perl solutions are too complex and unstructured Sergio has a point, but not with his claim that learning Python is easier than using the current solutions. And in my opinion, the Python program using the API Sergio presents is in fact a WMS. http://gridka-school.scc.kit.edu/2011/downloads/ARC-for-developers_071111_Sergio_Maffioletti-v2.pdf
Having been a GRID administrator for about a year now, I was pleased to learn that SARA was one of the maintainers of the YAIM configurations for the gLite CREAM CE. During the workshop, which involved building a CREAM CE and a Worker Node "from scratch", I was able to ask (and get answered) questions I had had for some time. The GridKa IT environment provided each workshop attendee with two SL5 virtual machines to "play" with. The clear instructions converted the machines into a gLite CREAM CE and a Worker Node. The EMI gLite middleware included the latest version of Torque PBS, new to me, using Munge for host authentication. When that was configured, directly submitted jobs (qsub) were running. The real challenge came when we tried to submit jobs using voms-proxy-init and glite-ce-job-submit: it didn't work and there was no time left to investigate and fix this. http://www.pd.infn.it/~bertocco/GridKASchool2011/
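For the record, a rough sketch of the two submission paths we tried; the VO name, CE host name and queue name are placeholders, not the actual workshop values.

    # Direct submission to the Torque/PBS batch system on the CE (this worked):
    echo "/bin/hostname" | qsub

    # GRID-level submission via the CREAM CE (the path that failed for us):
    voms-proxy-init --voms <vo-name>     # create a VOMS proxy certificate

    cat > hello.jdl <<'EOF'
    [
      Executable    = "/bin/hostname";
      StdOutput     = "out.txt";
      StdError      = "err.txt";
      OutputSandbox = { "out.txt", "err.txt" };
    ]
    EOF

    glite-ce-job-submit -a -r <ce-host>:8443/cream-pbs-<queue> hello.jdl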
The GridKa bus did its job perfectly and around 19:00 my SARA colleague and I went downtown Karlsruhe to eat Schnitzel („Fliegender Holländer“) with a Weißbier at the restaurant and Kneipe Marktlücke. It was a nice walk back to the hotel, but because it was still early, I put on my running gear and ran 6.12 kilometers through Karlsruhe next to a big road with sufficient lighting: http://connect.garmin.com/activity/112655342 . The next morning it was "business as usual": breakfast and a bus ride to the fourth day of GridKa Summer School.
In his evening lecture on Tuesday, Michael only covered part of what he wanted to tell. At the request of the GridKa attendees to hear more, this lecture was added as an extra. The focus was on concurrency, not easy to get right when every layer of the system (compiler, cache, memory) tries to reorder things. It is even more difficult for the C++0x standard to be portable when different processor architectures have different memory models. To me the subject again was difficult to comprehend, but when Leslie Lamport was mentioned in relation to sequential consistency of memory and programming, a lot of the talk became clear to me; that was something I had read/learned about not so long ago.
Bernhard kicked off with some information about the DGSI project. This project tries to create interoperation between multiple technologies, based on software created by Platform Computing. After this rather vendor-specific (but standards-based) talk, Thijs addressed something far more interesting: the need for (more) standards. The message: if technology thrills you and you like working in groups, join a group and create cool standards & technology. The difficult part of the message is of course the "like working in groups". http://gridka-school.scc.kit.edu/2011/downloads/DGSI_DCI_Federation_Protocol_GridKa_v2.pdf
This workshop on dCache (a storage solution for really big setups) was given by the developers themselves, each of whom lectured about a different component of dCache. The practical part started with an SL6 virtual machine prepared by the developers and downloaded to my laptop a few days before the start of the GridKa School. The excellent instructions allowed me to build a fully working dCache installation in that virtual machine on my laptop. One of the protocols dCache supports is (p)NFS (v4 and v3); when I mounted the NFS share, the "df" command showed a whopping 10PB, a lot more than the 64GB disk that is in my laptop. It's just a (tunable) setting in dCache, a developer told me. During the installation and configuration some (to me) cryptic error messages were displayed when I made mistakes. The nice thing about being in the same room as the developers while those messages scrolled by was that I could immediately show them why these messages are cryptic to a novice dCache user.
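As an illustration, this is roughly what such an NFS mount looks like; the dCache host name and mount point are placeholders, and older NFS clients may need a slightly different version option.

    # Mount the dCache NFS 4.1 (pNFS) door; <dcache-host> is a placeholder
    mkdir -p /mnt/dcache
    mount -t nfs -o vers=4.1 <dcache-host>:/ /mnt/dcache
    # On older clients: mount -t nfs4 -o minorversion=1 <dcache-host>:/ /mnt/dcache

    # df reports the size dCache is configured to announce, not the physical disk size
    df -h /mnt/dcache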
The GridKa Summer School evening dinner was in the hotel where I was staying, so the GridKa bus to the hotel had more passengers than usual. The dinner buffet was delicious and plentiful, and for dessert there was scoop-it-yourself ice cream (among other choices). Mmmmmm. Seated at a big table, I talked to various other GridKa School attendees while drinking German wine. The next morning after breakfast my colleague and I checked out of the hotel and drove to GridKa School by car. I wanted to be home around 18:00 to participate in a running event that passes my house three times, so we skipped the last lecture and the afternoon security workshop.
Aren't confidentiality, integrity, availability, authenticity and transparency what scientific users want when using GRID and Cloud, asks Andres. But can the cloud provide that? Not according to the Amazon license one accepts when using Amazon: the license states "the client has full responsibility". To take on that responsibility IBM, for example, has virtual machine images available.
A very interesting lecture. Günter describes the background of the European LHC GRID (and thus of all of the European GRID). His slides show how the experiments generate huge amounts of data, need huge amounts of compute for simulations, and how the comparison of experiment and simulation data is the real challenge. This explained a lot to me about why the GRID is the way it is. And did it work as expected? "What we thought was easy, was easy; what we thought was difficult, was difficult", Günter states. With a lot of graphs to prove it, the GRID turns out to be a success and it is working. As a SARA employee I was scanning the graphs and tables presented for mentions of SARA (SARA-MATRIX). I was very pleased to see SARA-MATRIX show up as very reliable (it looked like the most reliable) in the success rate for ATLAS jobs (slide 27). http://gridka-school.scc.kit.edu/2011/downloads/LHC_Computing_Grid_today_20110909_Guenter_Quast.pdf
After a cup of coffee during the coffee break, we thanked Dr. Christopher Jung for a fantastic week and jumped in the car to drive home. With almost no traffic jams on the way, I arrived home in time to participate in the running event. Looking back, this was a very interesting event: I learned a lot about technologies that were new to me, but I also learned that I already knew a lot just from working with the GRID and getting plenty of support from my SARA colleagues.
Please note any opinions in this report are my personal opinions and not those of any other person or company.
Alain van Hoof, September 2011