You wouldn't believe it, but I'm actually done with my thesis. It's all signed and submitted, and it should be only a matter of weeks until I can call myself an MSc (either that or its Austrian equivalent, "Diplom-Ingenieur").
As you might or might not have guessed, I used my thesis to work on worthwhile Drupal stuff, namely Transformations. Now that the written part is also finished up, I'll return to Drupal hacking for two months again before leaving for my new job. Transformations will still see significant improvements (more on that at a later time), and will be further supported by my company, Pro.Karriere - we just hired a new short-term student to revamp the UI while I'll be removing complexity from the API itself.
If you're into Drupal import/export, you might want to take a short look at it. Here are the links to the thesis, the summarizing poster, and further resources (print version of the thesis, LaTeX sources). It's licensed under the Creative Commons Attribution-No Derivative Works 3.0 Austria License.
So that's for the "prototype" part. Now, let's make Transformations a competitive framework and push it from "prototype" to a more "product" state. Plus of course all the other interesting tasks that are on my plate for the upcoming two months. Welcome back, life! Glad to have you back, Drupal community! Bite my shiny metal ass, TU Wien! (And yes, studying there has been a great time.)
As of half a day ago, Drupal's accepted student projects for the Google Summer of Code 2009 have been announced (and of course, also for the other participating organisations). We selected 18 wicked students with intriguing project proposals, and had a hard time sorting out which ones to take because the number of promising proposals was pretty high this time for Drupal. We still should be able to provide motivated students with mentoring even if they didn't make it to the final list, assuming Google's prize money was not the primary motivation to apply for SoC.
In addition to the official Drupal list, I also noticed a Drupal-related project hosted by Creative Commons. Also, Drupal SoC alumnus Allister Beharry (who worked on DAST in 2007) is now showing up with a project for upstream PHP. Other favorites of mine include the entirety of KDE and X.org projects, as well as my buddy klausi getting accepted for a Drupal/Rules project.
The coolest thing for me personally, though, is the selection of not one but a thrilling number of two awesome projects related to Version Control API: Marco Antonio Villegas Vega (a.k.a. marvil07) will work on internal API improvements and other stuff like SSH key authentication, while Daniel Hackney (a.k.a. chrono325) takes care of better VCS support and Rules integration. Both want to help with the transition of Drupal core to a distributed version control system such as Git, and will be mentored by Tony Narlock (skiquel), Sam Boyer (sdboyer) and of course me. With a little luck, we should be able to avoid the epic failure of last year's project and push Version Control API to the next level.
And of course, pushing it onto drupal.org is still on the plate and will be the first thing for me to work on after I finally submit my thesis in mid May.
Apart from Version Control API, I've also been working on another piece of code, and after a long time of fuzzing around with architecture and internals it's now slowly getting to a state where one can get excited about it. I didn't make it in time to show it off at DrupalCon DC, but now the module can be found on drupal.org: Say hello to Transformations, your favorite new uncut diamond for performing data transformations in Drupal (including, but not limited to import/export).
Experience has shown that starting with the underlying concepts always triggers the question "So what does it actually do?", so let's start with the use cases that led to the module's inception:
Now, I hear someone arguing like, "Scope creep! When you pack all that functionality into a single module, it will get bloated, hard to use and unmaintainable." And to a certain extent, I agree with that. As far as I can see, a major reason that Import/Export API failed is that it tried to establish a single common data format that described all possible information, and moreover, every data backend (Drupal nodes, CSV, XML) was required to cope with all that information. In other words, the architecture of Import/Export API might have required too much of a monolithic approach, which led to the maintainability problems that were at least part of the reason why the project stalled.
Does that mean that generic import/export systems are unfeasible? I don't believe so, and I hope Transformations' architecture takes enough precautions to avoid the fate of Import/Export API.
The key idea to Transformations is that we'll never be able to capture all possible data formats and structures in a single module. What we can do, though, is to decompose that data into bite-size pieces, and define operations that can process those. (For example, operations transforming a Unix timestamp into an ISO date string or into a PHP DateTime object are pretty straightforward.) Building on that, we can assemble the pieces into larger pieces of data, and we can also define operations on those. (Extract the first three dates from a list of dates? Sure, easy enough.) In essence, importing and exporting data is nothing more than decomposing data, processing it according to a given set of rules, and reassembling it in a different form.
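To make the "bite-size pieces" idea a bit more tangible, here's a minimal sketch in Python. It is not Transformations code (the module lives in Drupal/PHP), and the function names `timestamp_to_iso`, `map_items` and `first_n` are made up for illustration. It only shows the two examples from above: turning a Unix timestamp into an ISO date string, and extracting the first three dates from a list of dates.

```python
from datetime import datetime, timezone


def timestamp_to_iso(timestamp):
    """Transform a Unix timestamp (in seconds) into an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def map_items(operation, items):
    """Apply an operation on single pieces to every piece of a larger list."""
    return [operation(item) for item in items]


def first_n(items, n=3):
    """Extract the first n items from a list."""
    return list(items[:n])


# Hypothetical sample data: four timestamps, one day apart.
timestamps = [0, 86400, 172800, 259200]

dates = map_items(timestamp_to_iso, timestamps)  # small pieces, processed one by one
first_three = first_n(dates)                     # an operation on the assembled list
```

Each function knows nothing about the others, which is the whole point: small operations on small pieces, plus a few operations on the assembled result.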
The key idea to Transformations is that we don't want to know all those rules upfront, because there are just too many ways in which data can be decomposed, processed and reassembled. So instead of trying to cover all use cases, let's provide users with the necessary tools to define their transformations by themselves. Let's not define a single common data format that all data needs to conform to - all we need is a set of data formats, plus the knowledge which operations work on which data, plus a way to wire them up. If you think that wiring up operations sounds like Yahoo! Pipes, you're on the right track - only that Transformations can actually deal with structure information (schemas), and that my current user interface is way more crude than Yahoo's nice JavaScript wizardry.
The key idea to Transformations is that if we want a generic import/export module, it's important not to provide a solution but to provide a framework. Transformations is a framework for creating data transformation pipelines. ETL for Drupal, you might say - a braindead attempt to eliminate a whole class of import/export modules, and a potential basis for a Yahoo! Pipes clone on Drupal. Transformations provides the means for developers and users to build their own stuff.
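For the curious, the wiring idea can be sketched roughly like this. Again, this is an illustrative Python toy under my own assumptions, not the actual Transformations API: the `Operation` class, the `build_pipeline` helper and the type labels ("csv_text", "rows", "records") are all hypothetical. Each operation declares which kind of data it consumes and produces, and the pipeline refuses to wire up operations that don't fit together.

```python
import csv
import io


class Operation:
    """A named processing step that declares what it consumes and produces."""

    def __init__(self, name, input_type, output_type, func):
        self.name = name
        self.input_type = input_type
        self.output_type = output_type
        self.func = func


def build_pipeline(operations):
    """Check that adjacent operations fit together, then return a runner."""
    for upstream, downstream in zip(operations, operations[1:]):
        if upstream.output_type != downstream.input_type:
            raise TypeError(
                f"{upstream.name} produces {upstream.output_type}, "
                f"but {downstream.name} expects {downstream.input_type}"
            )

    def run(data):
        # Feed the output of each operation into the next one.
        for operation in operations:
            data = operation.func(data)
        return data

    return run


def parse_csv(text):
    """Decompose CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


def rows_to_records(rows):
    """Reassemble rows into dictionaries keyed by the header row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


csv_to_rows = Operation("parse_csv", "csv_text", "rows", parse_csv)
rows_to_dicts = Operation("rows_to_records", "rows", "records", rows_to_records)

import_csv = build_pipeline([csv_to_rows, rows_to_dicts])
records = import_csv("title,author\nHello world,alice\n")
```

Swapping the two operations makes `build_pipeline` raise a `TypeError` before any data is touched, which is a (very) poor man's version of knowing which operations work on which data.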
Oh, and Transformations is incomplete, unpolished and (beware!) object-oriented. Until now, I worked to get the concepts right. Next step: building on the foundations, and extending it to enable more concrete use cases being built upon it. For now, it can be used for CSV import/export in a way that is more complicated than Node Import will ever be. But the flexibility of the framework also promises possibilities that go way beyond nodes and two-dimensional tables.
Still reading? Still interested? Then go and check it out.
[1] "We" as in "my company, Pro.Karriere", or more specifically, my boss and visionary entrepreneur, Klaus.
Since the Summer of Code of Version Control API's inception ended, there had always been one piece of cvs.module functionality that hadn't yet been ported to the new VCS-independent infrastructure. Incidentally, this was also one of the most visible and important features on drupal.org: packaging releases from CVS tags and branches by means of a simple two-step form.
Well, not anymore! Yesterday, I released Version Control API 6.x-1.0-rc1, which sports a number of important improvements under the hood, and versioncontrol_project 6.x-1.0-rc1, which makes use of these new features to provide release node integration and a release tarball packaging script, like the one found on drupal.org. The main difference is that the code is not CVS specific, and even though only the CVS backend supports the required functionality at the moment there is nothing holding up an implementation for SVN or the newly-revamped Git backend.
The latter has pretty much single-handedly been ported by new contributor CorniI (a.k.a. Cornelius R.) to Drupal 6 and to API changes in Version Control API for 6.x-1.x. He's not yet done making sure that everything works as intended, but most importantly the Git backend runs again and can be played with. Good times coming for DVCS support!
So with Version Control API & friends being "feature-complete", the next step is to push it onto drupal.org and replace cvs.module. For that to happen, the primary task is to complete the migration scripts that pull data from cvs.module to Version Control API. If you want to help out with that, you can get in contact with me, dww and hunmonk - unless new cvs.module functionality is added in the meantime, the data migration is the last missing piece for getting deployed on drupal.org.
In general, the roadmap for Version Control API looks like this:
Hope to see you contributing soon!
Drupal related stuff, short and to the point:
Version Control API is back! There's no stable 5.x-2.0 release yet, and the Drupal 6 version will only be tackled after that happens (i.e. next year), but we're alive and kicking again. And yes, I'm still determined to get it to replace cvs.module on drupal.org. Even if it might take another year until all the odds and ends come together.
Anyways, now is the time for potential contributors to jump in and create recipes on how to set up version control integrated sites, to port the Git and Mercurial backends, create a new one for Bazaar, to make the repository viewer work on previous revisions or to improve the CVS backend so that it's browsable like the SVN backend's development version. If you're ambitious, you might also try porting drupal.org's release scripts to the Version Control API so that automated releases can be made with Subversion and other version control systems.
Whatever the outcome, the era of cumbersome reinventions is over for the time being. Time to get started on features (and the D6 port, soonish) - let's push cvs.module into irrelevance and take version control integration with Drupal to the next level!
If you're passionate about version control systems, here's where you can make a difference!
For a moment, I grew tired of saving the world with complex modules like Version Control API. (Which, btw, after a long time, is slowly nearing the completion of the 5.x-2.x branch. But more on that when it's done.) I also can't seem to stick to my "stealth mode" policy of avoiding the Drupal community for even a short time, so I'm like, well... whatever!
Of course, you're asking yourself how I shall ever become a Drupal rockstar if Version Control API 5.x-2.x is not yet done and my super secret diploma thesis project (yeah, that secret!) is still in the beginnings. And of course, the answer to that legitimate question is to pull an Eaton. Like, putting together a few trivial lines of code to produce a module that everyone wants to use, or stuff. So that's what I did! Ok ok, I'm getting to the point already...
"Do you love RSS and comment threads? Do you hate checking a page dozens of times just to see whether new comments have appeared there, or whether there are comments at all? Have you ever envied those Wordpress users with their blogs showing the current number of comments as a dynamically-generated image? You haven't? Dude, you don't know what you're missing out on."
Anyways, there's a nice blog using a (still) nice theme, so on a quick look everything would indicate that the blog is running on Drupal. It's not, however, because no Drupal site has those "Comments: xyz" images appended to their feeds.
Let's change that. For the sake of our feed-gorging users, every Drupal blog needs this functionality. At least that's what I thought, so I proudly introduce a module bringing no innovation at all - a functionality-wise clone of original Wordpress goodness, but for Drupal! Say hello to Comment Count Image, your favorite new syndication-related module. And here's the screenshot, because you asked for it. (You did, didn't you? ...right!)
So Google Chrome is out. Features and developer implications aside, it's another major step for Google in order to push the operating system and its desktop applications into irrelevance, and replace them by web applications. Because the web is where Google has its business. As a nice side effect, we get an open source browser, a fancy new JavaScript engine and a push for wider usage of web standards.
Now I'm a member of that hardliner faction that emphasizes the "Free" aspect of Free Software (or Open Source, whatever), because it empowers users to choose which tools they can use to operate on any given set of data - as long as that data is available at all and follows open standards. I'm delighted that, nowadays, I can run my system on an open kernel with open drivers, get 3D acceleration from an open X Window System, and have it all fall into place with the wonderful KDE 4.1 desktop (shameless plug). It's all software that I can trust, because the Open Source development model guarantees that the code won't be stripped of crucial features or spiced up with indiscreet phone-home functionality and advertisements. I know that I'll be able to swap applications while still keeping all the important data, and I know that if something goes wrong, everything will still be alright in the end.
Open source web frameworks like Drupal do the same thing for web site creators: they provide a base that you can trust to go in the right direction, because that's the nature of genuine Open Source and in everyone's own interest, too. However, they do not provide the same level of trust to end users: those only get HTML/JavaScript output without being able to hack the application and control the data that they put on the web page. Users can only delete data if the corresponding permissions are set, they can neither control nor modify the information that is logged about them, and they can only migrate to another system if the web site provides explicit export functionality or a suitable API.
With a tad of worry, I watch the trend of people giving away lots and lots of personal data to the web, in exchange for comfort or reliability. Mails keep being stored by GMail or other mail providers with fancy web interfaces; pulling them down onto one's own system with POP3 is a dying practice. Life is being captured in Blogger, Facebook and Twitter. If I want to browse through my friends' photo albums, I need to register on StudiVZ (German Facebook rip-off) because that's where they store them. Those services are provided by people who I do not trust to do the right thing, because even if the web sites run Free Software, the way it works does not guarantee that my data is safe and my interests are being followed - it might just not match the business model of the web site providers.
If you think that sounds like a lot of paranoia, you're probably right. Still, the point that I'd like to make is that we had all of this before: the user depending on proprietary software that controls what happens to the data, and thus creating vendor lock-in - which is a network effect, and causes more people to use the same software. As the desktop is slowly being freed from lock-in, the exact same thing is now being shifted onto the net. Instead of having to trust Microsoft for their office data, people now have to trust Facebook for their social online life. The only difference is that MS Office costs lots of money while Facebook is free (as in beer), because of their business model.
As of today, the web is not open. The GPL is the new BSD, and the web is the new freeware (not to be confused with Free Software). In order to let users keep the freedom that is now available (and usable) on the desktop, open source web software must work on decentralizing the web. Users should be able to keep their own web presence like they keep their desktop system: personal, trusted and only passing data around when that is desired. It shouldn't be necessary to have a single huge web site where the data of all different users comes together; instead, users would have their own data store that, for example, sends out twitter updates to the data stores of all the intended receivers. Instead of a central site that's in charge of everything, lots of small sites would communicate with each other, and the user would be in control of the data.
If the web replaces the desktop, it should be judged by the same criteria, and that goes not only for bling and usability but also for openness. Personally, I think that centralized, data-centric web applications are the biggest threat for openness and self-determined choice of client software since MS Office came around, and Google is doing well covering that issue by supporting Open Source where it doesn't hurt their main strategy. But at least they're being honest about it and try to do it nicely: I'm still a big fan of the Summer of Code and GHOP programs :P
In other news, both my Japan/India trip and Drupalcon were a blast, and I'm finally going into stealth mode now. See you later!
Update: Most of the stuff below does not apply anymore, as dopry has decided to throw away most of that code because he found it to be too experimental. Whether that is a good thing or not is up to the reader - personally I'm a bit disappointed about that move, but not enough to fork FileField. End of update, original posting follows.
Hi Planet! Without a doubt, you were most probably wondering what I was up to after my Summer of Code student seemingly disappeared without further notice (and is still missing in action). Well, your desire for information shall be appeased. If you inferred something from the title, you were right:
FileField for Drupal 6 is coming!
And coming strong. In fact, I think it's a lot nicer than any FileField version for Drupal 5. That wasn't to be expected a month ago when there was no D6 port at all, but fortunately our lovely Larry Garfield came along and ported the changes from the initial but yet unfinished ImageField 6.x to FileField.
That got me hooked. File uploads were basically working, but everything else was still just as badly broken. It seems to me that FileField was affected by every single large change that Drupal 6 and its companions - CCK 6.x-2.x, Views 2 - brought along. Uploads, AHAH, CCK widget interactions, Views integration, and a lot more: all of those essentially had to be rewritten nearly from scratch. Ok, Views integration was a no-brainer thanks to Views 2, and is infinitely more capable than the previous one.
All in all, little remains from the original FileField code, and in addition to the plain porting effort I also worked on making the widget and formatter more extensible. And of course, there can only be a single consequence:
Image support for FileField is coming too!
In less than 200 lines, that is. The majority of those being registration hooks, theme wrapper functions and comments. But then, two images say more than 351 words:
If the one or the other maintainer of some <whateverfileformat>field module jumps on the bandwagon then we should easily be able to phase out like two or three modules, replaced by significantly smaller (and more maintainable) FileField extension modules. Let's fight code duplication and unify CCK managed files from a unified codebase! Anyone feel like helping out with the one or the other feature or with a new file handler in addition to the generic and the image ones?
So I'm now aggregated on Planet Drupal. I'm Jakob (hi!), and you may remember me from projects such as Version Control API (Summer of Code '07), Temporary Invitation or the Duration CCK field. Granted, the latter is brand new, so there's little chance you know me from that one.
But introducing myself is not the purpose of this post, of course. The real purpose is to introduce the student that I'll be mentoring (together with AjK) during this year's Summer of Code. In his own words:
Hi! My name is Markus Schanta, I'm 21 years old and I live in Eisenstadt, Austria. I study Software & Information Engineering at the Vienna University of Technology and Business Administration at the Vienna University of Economics.
During this summer I will be working on the Version Control API, preparing it for usage on Drupal 6 and drupal.org. The Version Control API, which originates from Jakob's last year's SoC project, has largely decoupled the Project module from cvs.module. This project will unleash the full potential of the Version Control API and make it production ready for a new set of users including drupal.org.
My project will lead to increased flexibility in handling projects, reduce future maintenance costs and last but not least make it possible to use a more modern revision control system (like Subversion which is frequently in the talks) on drupal.org.
I'm really looking forward to working on this project and I'm sure that it's going to be a really exciting summer!
In other words, this time we're going to kill cvs.module for good. Congrats to Markus and all the other students that were accepted for this year's Google Summer of Code. Little fits better than the wise words of certain KDevelop developers:
It's gonna be great, you're gonna love it!
And that's all there is to say :)