The reason corpora are so widely used is that they play an important, potentially crucial role in the scientific process. In the words of R. Dekker:
“data-sets [...] are becoming more important themselves and can sometimes be seen as the primary intellectual output of the research”.
In other areas such as the Life Sciences and the Humanities, there are large corpora of data, and it is easy to see that they are beneficial there.
Surely, in Model Based Software Development (MBSD) we could reap the same benefits. Still, we do not have such data readily available in our field. Why is this so? Can we overcome this impasse, and if so, how?

We are not the first to ask these questions, but earlier approaches have not remedied the situation. So we have launched “The Free Models Initiative”.
  • In April 2014, we held a workshop to kick off our initiative. We collected contributions from all participants and discussed issues and strategies.
  • We are currently setting up the Software Engineering Model Index (SEMI) to disseminate the workshop results and our own findings.
  • We are planning to organise a workshop at ICSE 2015 in Florence, to further raise awareness with
We hope that we have now reached the critical mass to initiate a cultural change process that will move MBSD ahead.
Here are some possible applications of models-as-datasets in MBSE.

Benchmarking:

New approaches and algorithms ought to be validated against their predecessors so that their contribution can be assessed accurately. Representative data sets will be indispensable for valid evaluations. For instance, evaluating clone detection or difference computation tools requires a benchmark.

Best Practices:

Model benchmarks and reference models may contribute to improving the state of the practice of modeling by making good (or bad) examples widely accessible. The prerequisite for this is a set of sample models that can be assessed by the community to develop a common understanding of model quality.

Validation:

A body of examples that is generally accepted as being representative allows researchers to validate new models against it, showing that they are equally valid in one aspect or another. This will allow us to expand the body of models available for research purposes. It will also allow us to assess new models and create a comprehensive overview of all pragmatic types of models and of the phenomena occurring in them.

Durability and Integrity:

The models in a repository must be available for a long time (several decades). Special care must be taken to avoid changing archived models.
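
One way to notice accidental changes to archived models is to record a cryptographic digest of every file at archiving time and to re-check it later. The following is a minimal sketch under that assumption, not a description of any existing repository; the function names and the directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its digest at archiving time."""
    return {
        p.relative_to(archive_dir).as_posix(): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(archive_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or whose content differs."""
    changed = []
    for rel_path, expected in manifest.items():
        target = archive_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel_path)
    return sorted(changed)
```

The manifest would have to be stored outside the archived directory (hypothetically, alongside the publication record), so that it cannot be altered together with the models it protects.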

Flexible Licensing:

There will likely be a wide range of licensing needs, as different providers have different demands. This will impact the access control.

Access control:

A certain amount of access control (in particular in relation to licensing, and for write access) needs to be exerted.

Search Features:

The benefit of the repository is to serve researchers, so discovery is its key feature. The performance, capability, and usability of the search features are therefore of prime importance.

Provisioning Features:

In order to attract as many contributions as possible, providing models should be made as easy and pleasant as possible.

Non-Requirements:

In contrast, storage capacity, availability, and security are likely of minor importance.

Archiving:

In order to be of scientific use, models need to be stored with the same durability, reliability, and accessibility as papers. Other disciplines are further along in this respect, and we may be able to take advantage of their infrastructure; ZENODO may be an option.

Searching:

What model meta-data are the right ones? Which ones can we afford/require to extract? Can we extract (some of them) automatically, and if so, how? What techniques are useful for searching model repositories?
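
To make the question of automatic extraction more concrete, here is a small sketch of what it could look like for models that are serialised as XML. It assumes, purely for illustration, that element kinds and `name` attributes are useful meta-data; the function names, the sample model, and its namespace are hypothetical and do not correspond to any particular modeling tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_metadata(xml_text: str) -> dict:
    """Collect the root kind, element-kind counts, and all 'name' values."""
    root = ET.fromstring(xml_text)
    kinds = Counter()
    names = set()
    for element in root.iter():
        kinds[local_name(element.tag)] += 1
        name = element.get("name")
        if name:
            names.add(name)
    return {
        "root": local_name(root.tag),
        "element_kinds": dict(kinds),
        "names": sorted(names),
    }


# Hypothetical sample model, for illustration only.
SAMPLE = """<model xmlns:x="http://example.org/hypothetical" name="Shop">
  <x:class name="Order"/>
  <x:class name="Customer"/>
  <x:association name="places"/>
</model>"""

metadata = extract_metadata(SAMPLE)
print(metadata["element_kinds"])  # {'model': 1, 'class': 2, 'association': 1}
```

A record like this could feed a keyword search over names or a filter on element kinds. Which meta-data are actually worth extracting remains the open question.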

Terminology:

Different communities use inconsistent terminology. E.g., “model” and “model repository” in BPM are the same as “diagram” and “model” in MBSD.

Measuring:

It is not quite clear how to measure the size of models, since there is no uniform terminology and the topic of model metrics has not been exhausted yet. Similarly, it is an open question how to visualize a collection of models with a view to comparing their size, contents, and .
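
As an illustration of why size is hard to pin down, the sketch below computes three candidate measures for an XML-serialised model: the number of elements, the number of attributes, and the nesting depth. These measures are assumptions chosen for simplicity, not established model metrics, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def size_measures(xml_text: str) -> dict[str, int]:
    """Return element count, attribute count, and nesting depth."""
    root = ET.fromstring(xml_text)

    def depth(element: ET.Element) -> int:
        # A leaf has depth 1; otherwise one more than the deepest child.
        return 1 + max((depth(child) for child in element), default=0)

    elements = list(root.iter())
    return {
        "elements": len(elements),
        "attributes": sum(len(e.attrib) for e in elements),
        "depth": depth(root),
    }
```

Two models can rank differently depending on which measure is used, which is one reason why a shared terminology would be needed before sizes can be compared across a collection.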

Intellectual Property:

Clearly, models are IP. What licensing schemes are suitable? Can we make more models publishable by techniques like obfuscation?
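
To give a rough idea of what obfuscation might mean here, the following sketch replaces every `name` attribute in an XML-serialised model with an opaque identifier, mapping equal names to the same identifier. This is only one conceivable technique under strong simplifying assumptions: it hides names but leaves the structure untouched, and names stored elsewhere (in text content or other attributes) are not covered. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def obfuscate_names(xml_text: str, attribute: str = "name") -> str:
    """Replace each distinct attribute value with 'n1', 'n2', ... consistently."""
    root = ET.fromstring(xml_text)
    mapping: dict[str, str] = {}
    for element in root.iter():
        original = element.get(attribute)
        if original is not None:
            # Reuse the identifier if this name was seen before.
            mapping.setdefault(original, f"n{len(mapping) + 1}")
            element.set(attribute, mapping[original])
    return ET.tostring(root, encoding="unicode")
```

Whether such a transformation is enough to make a model publishable depends on how much the structure alone reveals, and that would have to be judged by the model's owner.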

Incentives for Contribution:

Academic and industrial partners need incentives to publish. The minimum incentive is to award contributions the same recognition as publishing papers. This demands a cultural change in the community.

Spread the word!

Help make this initiative known: tell your colleagues and friends via a social network, by sharing the link, or by citing SEMI. The more people know about SEMI, the more can benefit and contribute.

Share your models!

If you have models that can be shared, share them via any of the repositories indexed by SEMI.

Use it for your research!

If you use a model from a SEMI-indexed repository, cite both the repository and the index to maximize publicity. Please refer to it as:

Harald Störrle, Regina Hebig, Alexander Knapp (eds.): “The Free Models Initiative”, DTU Compute Technical Report-2014-14, DTU, 2014

Start today!

There’s no excuse for procrastination! :-) Follow the link to the SEMI - do keep in mind that we’ve only just started. If you want to help coding or reviewing, you’re also most welcome!