Invenio Blog

Follow news and updates on Invenio world

InvenioRDM Partner Meeting Summary, March 2024

Sara Gonzales Apr 23, 2024 InvenioRDM

The InvenioRDM partner community met in Münster, Germany from March 18 - 22, 2024, for our first in-person annual meeting in over four years. Forty-six attendees from over fifteen institutions spent the five days making connections, planning, and diligently working on all things InvenioRDM.

Organizer Sarah Wiechers, software developer and research data manager for the Service Center for Data Management, ULB Münster, proposed the use of Open Space Technology, a meeting organization methodology in which the agenda and topics discussed are voted upon and implemented by the meeting’s attendees on the meeting date. This methodology was highly effective, allowing all community members the opportunity to pitch ideas and vote on their favorites for discussion.

University of Münster’s InvenioRDM Community GitHub site contains our full output of discussions, decisions, and plans. Key topics addressed were the timely handling of pull requests, planning for the v12 release, improvements to community workflows, translations, large file management, deployment, digital preservation, vocabularies and fixtures, Kubernetes Helm-charts requirements, and much more. To help manage feature requests going forward, we have agreed to implement GitHub Discussions, a tool which will allow for providing context, asking questions, and upvoting most-wanted features. GitHub Discussions is now part of our workflow for keeping the Roadmap updated.

We were thrilled to be able to work with so many old and new friends in beautiful Münster, and a huge thanks goes to our local hosts at University of Münster for their hard work and hospitality. We hope that you will be able to join us at next year’s meeting, with Hamburg currently slated as the host city. In the meantime, please take advantage of our updated community workflows, including:

  • Development-focused chats during half of each telecon
  • A new email list - coming soon
  • Newly established Interest Groups (long-standing) or Task Groups (deliverable-focused). For the current list of all Groups, see the new Onboarding page

Prism: The New Feinberg Repository for Global Dissemination of Research

Galter Health Sciences Library & Learning Center Nov 28, 2023 InvenioRDM

The Prism institutional repository has launched at Northwestern University. Prism preserves and makes available articles, conference presentations, preprints, datasets, and other items created by faculty, staff, and students. Prism promotes openness, maximizes reproducibility, and enhances research connections within Feinberg School of Medicine and across the globe.

Prism replaces the former DigitalHub and includes many much-anticipated features, such as the ability to create metadata-only records for offsite datasets, set embargo dates for releasing content to the public, create and curate communities of practice, and share private links to view and edit with colleagues. These new features complement existing features such as the ability to assign Digital Object Identifiers that make records citable, indexing by Google to make research widely discoverable, and a responsive staff at Galter Library to answer questions and provide support.

Kristi Holmes, the Director of Galter Health Sciences Library, Associate Dean of Knowledge Management and Strategy, and professor of Preventive Medicine, played a leading role in the development of Prism. According to Holmes, “It is essential to have a robust institutional repository that can keep up with the latest technologies and trends. As models for open access and data sharing continue to evolve, it's clear that institutional repositories will play an increasingly critical role to make research Findable, Accessible, Interoperable, and Reusable (FAIR).” She continues, “What makes this project particularly special is the strong collaborative approach we’ve taken both at Northwestern and also in partnership with the larger open source community. I’m grateful to our team at Galter Library for their incredible work.”

“We are excited to announce the launch of Prism as the institutional repository for Feinberg School of Medicine,” says Karen Gutzman, Head of Research Assessment and Communications at Galter Health Sciences Library and Learning Center. Prism builds on a strong research foundation first made possible in the DigitalHub repository. “One of the most exciting features of Prism is the ability to create communities on topics, projects, or events,” says Gutzman. Communities include open access research from Feinberg on COVID-19, training presentations from the Biostatistics Collaboration Center, and the NUCATS Grant Repository, a centralized resource for grant writers and investigators internal to Northwestern provided by the NUCATS Institute. “Prism is an excellent home for the NUCATS Grants Repository, allowing us to easily share exemplar grant templates and other resources with investigators across the Northwestern University community,” says Dr. Richard D’Aquila, Director of NUCATS.

Prism is built on the InvenioRDM software, which also forms a strong and sustainable foundation for Zenodo. “With its user-friendly interface and advanced features, InvenioRDM is truly a game-changer in the world of repositories. This platform is designed to make research more accessible and open to the public, promoting innovation and collaboration within the academic community,” says Holmes. Over the past several years, CERN and Northwestern have collaborated as core co-developers of the software, partnering with the global Invenio Open Source Community to develop InvenioRDM as a turnkey, scalable, and top-of-the-class user experience software for repositories. The InvenioRDM software offers a reliable environment for science, empowering preservation, credit, discovery, and sharing while maintaining integrity in its responsiveness to the evolving needs of the research community, including data sharing policy compliance. Northwestern contributions to the open source project are led by Matt Carson, Senior Data Scientist and Head of the Galter Library Digital Systems Department, with developer Guillaume Viger leading technical work and Sara Gonzales, Senior Data Librarian at Galter Library, contributing to a wide range of efforts, including serving as the Community Manager.

Notably, Northwestern and CERN recently expanded this collaboration through an award from the NIH Office of Data Science Strategy/Office of the NIH Director pursuant to OTA-21-009, Generalist Repository Ecosystem Initiative (GREI), to Zenodo to help researchers improve discoverability of their data and lead to greater reproducibility and reuse of data. Through this award and others, the infrastructure reflects Desirable Characteristics of Data Repositories for Federally Funded Research and continues to evolve to meet research needs and support a vibrant data ecosystem.

Conference Spotlight: InvenioRDM Workshop Day at Open Repositories 2023

Sara Gonzales Jul 10, 2023 InvenioRDM

Last month InvenioRDM project partners convened in Stellenbosch and Cape Town, South Africa for the 18th International Open Repositories Conference (OR2023). We were thrilled to offer our second InvenioRDM workshop on the 12th of June (see our wrap-up of the first workshop: OR2022 workshop). During three eventful hours we spoke with librarians, data managers, administrators, and developers about the basics of joining the InvenioRDM project and standing up an instance. We also shared information on customizations and add-ons for those interested in the platform’s advanced capabilities.

Following Nicola Tarocco’s introduction and demo of key InvenioRDM functions and features, our sessions included:

  • Maximilian Moser’s presentation on customizations for his local InvenioRDM instance TU Wien Research Data, as well as a CLI command enabling very large file upload
  • Dan Granville’s presentation on IIIF in InvenioRDM, exemplified through his work with Data Futures GmbH
  • Guillaume Viger’s presentation on the launch of Northwestern University’s Prism instance, with a particular focus on successful migration of awkward data
  • Matt Carson’s presentation on InvenioRDM’s support of the FAIR principles, with additional information on the platform’s support for data policy initiatives
  • Maximilian and Guillaume’s presentation on the details of, and endless possibilities for, deployment of InvenioRDM instances
  • Zacharodimos Zacharias’ presentation on his and the CERN team’s iterative and productive efforts to achieve fast and efficient data migration as a test for Zenodo’s move to InvenioRDM later this year.

The workshop attendees were engaged, brought great questions, and gave us much food for thought. Our conversations with potential new partners were so extensive that we were not able to include all planned presentations! Be on the lookout for future blog posts containing this bonus content.

But wait, there’s more! InvenioRDM’s OR2023 representation included Nicola’s fantastic lightning talk on behalf of InvenioRDM in the annual Repository Rodeo. In addition, Maximilian Moser and David Eckhard of TU Graz presented their work on a communication protocol to connect machine-actionable DMPs with InvenioRDM, and Zacharodimos Zacharias presented on InvenioRDM’s mature integration with ROR (the Research Organization Registry).

All of the InvenioRDM Workshop Day presentations are now available for download from Zenodo. We hope to host another InvenioRDM workshop at the 19th Annual Open Repositories Conference June 3-6th, 2024 in Gothenburg, Sweden. If we’ve missed you at our previous two workshops, we hope to see you there!

Introducing the InvenioRDM GitHub Archiver (IGA)

Mike Hucka and Tom Morrell Jun 26, 2023 InvenioRDM

The InvenioRDM GitHub Archiver (IGA) is a new software tool created by the Caltech Library. InvenioRDM is the basis for many institutional repositories, such as CaltechDATA, that enable users to preserve software and data sets in a long-term archive. The metadata contained in the record of a deposit is critical to making the record widely discoverable by other people. However, creating detailed records and uploading assets can be a tedious and error-prone process if done manually. This is where our new tool comes in.

IGA creates metadata records and sends releases automatically from GitHub to an InvenioRDM-based repository server. It constructs a metadata record using information it gathers from the software release, the GitHub repository, the GitHub API, and various other APIs as needed. Here are some of IGA’s other notable features:

  • Automatic metadata extraction from GitHub releases, repositories, and codemeta.json and CITATION.cff files
  • Thorough coverage of InvenioRDM record metadata using painstaking procedures
  • Recognition of identifiers that appear in CodeMeta and CFF files, including ORCiD, ROR, DOI, arXiv, and PMCID
  • Automatic lookup of publication data in DOI.org, PubMed, Google Books, & other sources if needed
  • Automatic lookup of organization names in ROR (assuming ROR IDs are provided)
  • Automatic lookup of human names in ORCiD.org if needed (assuming ORCID IDs are provided)
  • Automatic splitting of human names into family and given names using ML-based methods if necessary
  • Support for record versioning
  • Support for InvenioRDM communities
  • Support for overriding the metadata record IGA creates, for complete control if you need it
  • Ability to use the GitHub API without a GitHub access token in many cases
  • Extensive use of logging so you can see what’s going on under the hood
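
To make the idea of automatic metadata extraction more concrete, here is a minimal Python sketch of how fields from a CITATION.cff file might be mapped onto a draft record. It is not IGA's actual code. The function name `cff_to_record`, the sample values, and the exact shape of the output dictionary are assumptions for illustration only, and the input is assumed to be a CITATION.cff file that has already been parsed into a dictionary.

```python
def cff_to_record(cff: dict) -> dict:
    """Map an already-parsed CITATION.cff dictionary to a draft-record dictionary.

    The output layout is an illustrative assumption, not a guaranteed schema.
    """
    creators = []
    for author in cff.get("authors", []):
        person = {
            "type": "personal",
            "given_name": author.get("given-names", ""),
            "family_name": author.get("family-names", ""),
        }
        orcid = author.get("orcid", "")
        if orcid:
            # CFF stores ORCID identifiers as URLs; keep only the bare identifier.
            bare_id = orcid.rstrip("/").rsplit("/", 1)[-1]
            person["identifiers"] = [{"scheme": "orcid", "identifier": bare_id}]
        creators.append({"person_or_org": person})

    metadata = {
        "resource_type": {"id": "software"},
        "title": cff.get("title", ""),
        "creators": creators,
    }
    # Optional fields are copied only when the CFF file provides them.
    if "date-released" in cff:
        metadata["publication_date"] = str(cff["date-released"])
    if "version" in cff:
        metadata["version"] = str(cff["version"])
    return {"metadata": metadata}


# Hypothetical sample input, standing in for a parsed CITATION.cff file.
sample_cff = {
    "title": "Example Tool",
    "version": "1.2.0",
    "date-released": "2023-06-26",
    "authors": [
        {
            "family-names": "Doe",
            "given-names": "Jane",
            "orcid": "https://orcid.org/0000-0002-1825-0097",
        }
    ],
}
record = cff_to_record(sample_cff)
```

A real tool would also need to consult codemeta.json, the GitHub release, and external services such as ROR, which this sketch leaves out.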

Data and software archived in a repository need to be described thoroughly and richly cross-referenced in order to be widely discoverable by other people. Of particular interest to software developers is that a repository like CaltechDATA offers the means to preserve software projects in a long-term archive managed by their institution. IGA helps make the creation of metadata and InvenioRDM records for software and data managed in GitHub as easy as possible.

More information about the InvenioRDM GitHub Archiver is available at caltechlibrary.github.io/iga/.

How does InvenioRDM support proper resource curation and validation?

Matt Carson May 30, 2023 InvenioRDM

With mandates and directives such as the European Union’s Plan S, the recently-introduced National Institutes of Health Policy for Data Management and Sharing in the United States, and the White House Office of Science and Technology Policy Memo requiring free and immediate access to federally funded research in the U.S. by the end of 2025, increased emphasis is being placed on curation, validation, and preservation of research output. In addition, efforts such as the Generalist Repositories Ecosystem Initiative (GREI) aim to bring next-generation generalist repositories into alignment on key features required for discovery and reuse of research outputs. Zenodo has joined this initiative and will migrate to the InvenioRDM platform later this year, which will play a crucial role in helping to support best practices in research data management for this popular service.

From ensuring deposited records abide by FAIR principles, to granular access control, InvenioRDM supports resource curation and validation with a variety of functionalities. Below are some of the key features that make the platform a great option for sharing and preserving your work:

Deposits:

  • Deposited records check all FAIR principles (Findability, Accessibility, Interoperability, Reusability)

  • Configurable confirmation text before deposit to ensure agreement with security and privacy policies

  • Long-running record drafts for pre-publication edits

  • A wide variety of licensing options

  • Ability to set an embargo period for records or files with automated embargo removal at a specified date

  • Contextualization of records via secondary identifiers to publications, project sites, external data, and other associated research outputs

  • Citations in multiple formats provided for each record

  • Empowers communities to self-curate domain specific records

  • Ability to archive software via GitHub integration within records, allowing data to be linked to code [coming soon]

  • Usage metrics and analytics (COUNTER-compliant statistics gathering for record access and downloads) [coming soon]

Documentation:

  • Extensive support documentation

  • Possible addition of informative pages e.g., Terms of Use, Privacy Policy, Deposit Agreement.

Metadata and Discovery:

  • Supports industry standards for interoperability

  • Ability to broker metadata to search and discovery services

  • Hand curation by repository administrators for metadata quality assurance

  • Versioning of all deposits, with a per-version permanent identifier and a per-deposit (across all versions) concept permanent identifier

  • Configurable registration with external identifier systems e.g., DOI
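
The relationship between per-version identifiers and the concept identifier can be sketched in a few lines of Python. This is a toy model, not InvenioRDM's implementation: the `Deposit` class, its method names, and the identifier strings are hypothetical, and the sketch assumes that the concept identifier resolves to the most recently published version.

```python
from dataclasses import dataclass, field


@dataclass
class Deposit:
    """Toy model: one concept identifier shared by all versions of a deposit."""

    concept_id: str
    version_ids: list = field(default_factory=list)

    def publish_version(self, version_id: str) -> None:
        # Each published version receives its own permanent identifier.
        self.version_ids.append(version_id)

    def resolve(self, identifier: str) -> str:
        """Return the version identifier that the given identifier points to."""
        if identifier == self.concept_id:
            if not self.version_ids:
                raise LookupError("no version has been published yet")
            # Assumption: the concept identifier points at the newest version.
            return self.version_ids[-1]
        if identifier in self.version_ids:
            # A version identifier always points at exactly that version.
            return identifier
        raise LookupError(f"unknown identifier: {identifier}")


# Hypothetical identifiers, for illustration only.
deposit = Deposit(concept_id="example-concept")
deposit.publish_version("example-v1")
deposit.publish_version("example-v2")
```

Citing the version identifier pins a reference to exact content, while citing the concept identifier follows the deposit as it evolves.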

Access, Security, and Quality Control:

  • Authentication configurable with institutional credentials or third-parties (e.g., ORCiD, GitHub)

  • Checksums for uploaded files ensure file integrity throughout each file's lifetime

  • Restricted access capability

  • Can be configured for secure storage

  • Automated periodic file audits
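
As an illustration of how checksums and periodic audits work together, the following Python sketch computes a checksum when a file is stored and later verifies the file against it. It is a generic example built on the standard library, not InvenioRDM's code. The function names, the `algorithm:hexdigest` string format, and the choice of SHA-256 are assumptions.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Return "<algorithm>:<hexdigest>" for a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"{algorithm}:{digest.hexdigest()}"


def audit_file(path: str, expected: str) -> bool:
    """Recompute the checksum and compare it with the value stored at upload time."""
    algorithm = expected.split(":", 1)[0]
    return file_checksum(path, algorithm) == expected
```

An audit job would call `audit_file` for every stored file on a schedule and flag any file for which it returns `False`.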

New version of CaltechDATA launches on InvenioRDM

Tom Morrell Sep 21, 2022 Caltech

Caltech Library is pleased to announce that CaltechDATA, our institutional data and software repository, launched a major upgrade on Wednesday September 21, 2022.

CaltechDATA has served as critical research infrastructure for campus since 2017, and it hosts over 20,000 records containing datasets and software for a wide variety of disciplines. With this launch, CaltechDATA now runs on the open-source InvenioRDM platform and brings many new features that Caltech researchers have requested:

  • Easier record creation with autocomplete for creators, affiliations, subjects, and funders
  • Automatic record versioning
  • Private share link for reviewers
  • Improved record views, with dynamic citations and an expanded file previewer

This version of CaltechDATA also introduces communities, which enable groups at Caltech to create their own record curation and approval processes. Researchers can collect records into a single browse and search interface. A curation pipeline allows records to be submitted by Caltech users, and then approved by a defined set of curators. We’ve pre-seeded a small number of initial communities, and look forward to seeing what researchers create.

InvenioRDM is a customizable open-source repository platform developed by CERN and twenty partner organizations, including Caltech Library. It is built on the twenty-year history of the Invenio repository platform, whose most-successful implementation is the Zenodo generalist repository hosted by CERN. InvenioRDM takes the features of Zenodo and makes them customizable for institutions. InvenioRDM will enable Caltech Library to more rapidly roll out new features and collaborate with other institutions to establish repository best practices.

Conference Spotlight: InvenioRDM Workshop Day at Open Repositories 2022

Sara Gonzales Jul 11, 2022 InvenioRDM

On June 6, 2022 InvenioRDM partners from CERN, Northwestern University, TU Wien, NYU Libraries, and Cottage Labs assembled to present the InvenioRDM Workshop Day at Open Repositories 2022 (OR2022). At this first in-person meet-up in over two years, the team talked InvenioRDM with over 30 workshop attendees.

In the morning, participants were treated to Introduction and Installation presentations, followed by hands-on assistance with running InvenioRDM on their own machines. Attendees appreciated the one-on-one help and feedback and came away with a greater understanding of how to support the repository at their own institutions.

In the afternoon, participants were treated to show-and-tell sessions on local customizations at both TU Wien and Northwestern, followed by an introductory session on partner benefits and how to join the community. Next followed fascinating in-depth presentations on increasing accessibility in InvenioRDM, building off the robust work in this field undertaken by NYU Libraries and Northwestern. A case study in data migration followed, and the afternoon wrapped with a presentation on local implementation and adoption approaches at NYU Libraries, which incorporated feature and sprint-based development of repository policies. In addition, the InvenioRDM-based COVID-19 SSH Data Portal of the European University Institute was demoed by Cottage Labs.

The team was pleased to be able to share information and tips in person with potential adopters, and to experience some much-missed in-person teambuilding. Here’s hoping these meet-up opportunities increase in the future!

And although OR2022 unfortunately did not offer a hybrid option for attendance, the whole InvenioRDM community (and beyond) can experience the workshop day by visiting this Zenodo community to see individual presentations, the day’s itinerary, links to additional resources, and a feedback form.

If you’ll be attending a conference soon, and might be thinking of presenting on InvenioRDM, please let Sara or Lars know!

InvenioRDM Partner Contributions: Virtual Project Meeting, 2021

Sara Gonzales Oct 12, 2021 InvenioRDM

On September 28-29 the InvenioRDM project team gathered again (though sadly, again, without fondue) for the Virtual Project meeting. Over 50 partners and team members were present and a great deal of helpful feedback was shared as we worked to develop the roadmap for the next stage of the project. Thanks to everyone for your contributions to the discussion; they are much appreciated!

The InvenioRDM project team would not be so robust without the thoughtful contribution of each of its members. Thank you to all of you who were able to commit the requested 1.5 person months to the project for the next year, in some cases more. The work of each partner site results in unique and valuable contributions to the team. We’d like to use this post to highlight, in a summarized format, the contributions that each project partner highlighted for us at the meeting. Here’s the amazing compendium of work that you have contributed over the past year:

  • BNL: Integrating the Keycloak OAuth plugin module as part of the InvenioRDM framework.

  • Caltech: DataCite 4.3 support and JSON REST API for DOI Registration. Registering very complete metadata records with DataCite.

  • Data Futures GmbH: IIIF Presentation API record serializer, IIIF powered search view thumbnails, embedded Mirador 3.0 IIIF previewer.

  • EkoKonnect: InvenioRDM part of Open Science cloud platform.

  • Geo: Predefined queries for GEO Priority Areas, map preview/reference.

  • HZDR: Documentation of deployment steps, testing.

  • INFN: Community survey, requirements gathering, testing functionalities, participating in Metadata Working Group, commitment to REST API testing, writing of guides on how to integrate InvenioRDM APIs with programming languages used by research communities.

  • JRC: Invenio-records-resources feature to support one of January Release functionalities.

  • Northwestern: Various core contributions, including implementing OpenID Connect / OAuth at NU (includes SSO and MFA) and sharing local migration process and local implementation rollout.

  • NYU: Contributed usability study findings and presented to Telecon; proof of concept geo previewer / viewer for vector GIS data; working toward extending the concept of index sample metadata records via the command line tool.

  • NII: Testing based on local scenarios; knowledge sharing for InvenioRDM operation on Kubernetes (Monitoring, Logging and Backup strategies, etc.).

  • TIND: Simple k8s demo setup (helm-invenio).

  • Tubitak: Provided Turkish translations to InvenioRDM; will maintain them and take part in release testing and Interest Groups.

  • TU Graz: Ongoing work: i18n of the Python modules; i18n of the React modules; OAI-PMH, core sprint participation.

  • TU Wien: During development sprints, helped implement Access Control (v1.0), Share-by-Link (v2.0), and Keycloak SSO (v4.0). Would like to contribute to Communities code in future.

  • Universitat Hamburg: Contribution highlights: S3 plugin, initial SAML integration. Committed to: Help with testing, documentation, dissemination.

  • Universitat Munster: Development of helm chart, to deploy InvenioRDM inside a Kubernetes infrastructure; implementation/documentation of Kubernetes based architecture (https://github.com/ulbmuenster/inveniordm-2-production); tests of REST API by automated (OAI-PMH based) import of datasets from the WWU document repository miami (https://miami.uni-muenster.de) into InvenioRDM.

  • Universitat Tubingen: Contributions planned if production status reached: dissemination, acquiring funding for development needs in interdisciplinary research projects, participation in NFDI’s (nationwide infrastructural developments in domain-related RDM).

  • Universitat Freiburg: Regular testing of new releases and features; automatic deployment & recovery strategies; basic UI customization.

  • ZHB Luzern: Testing: usability of setup resources (e.g. invenio-cli) & customization and styling; evaluation of metadata standard.

Thanks again, everyone, for your contributions! Hope to see you in person at the next project meeting.

InvenioRDM reaches major milestone - v6.0 released

Lars Holm Nielsen Aug 5, 2021 InvenioRDM

We're very happy to announce that InvenioRDM v6.0 LTS has been released!!!

Try It

Want to try InvenioRDM? Just head over to our demo site: https://inveniordm.web.cern.ch

If you want to install it, follow the installation instructions on https://inveniordm.docs.cern.ch/install/

Production ready

InvenioRDM v6.0 is the first release to be suitable for production services, and therefore the first to receive the Long-Term Support release label. This marks the achievement of a major milestone for InvenioRDM.

Features

Following is a high-level overview of features currently supported by InvenioRDM. This is just the beginning as we have a packed road map with exciting features ahead of us.

Records

  • Any resource type. InvenioRDM allows you to store publications, datasets, software, images, videos or any other resource type you may have, and thus can serve as a single repository for all your records.

  • Any file format/size. InvenioRDM accepts any file format of any size, provided that your underlying infrastructure can support it.

  • Versioning support. Records and files are all versioned, with optimized storage for large files.

  • DOI registration via DataCite. InvenioRDM can register DOIs with DataCite for all records, and allows you to write plugins for other identifier schemes.

  • DataCite-based metadata. InvenioRDM's internal metadata is based on the DataCite Metadata Schema, which is a simple yet powerful format for describing nearly any research output (paper, data, software, ...).

  • Strong support for persistent identifiers. Authors, affiliations, licenses, related papers/datasets, etc. can all be identified via persistent identifiers such as ORCID and ROR.

  • Extended Date Time Format (EDTF) support. Publication dates and other dates support the EDTF format for recording imprecise dates and date ranges such as 1939/1945.

  • Previewers. InvenioRDM comes with previewers for common file formats such as PDFs, images, CSV, Markdown, XML and JSON.

  • Citation formatting. InvenioRDM can generate citation strings for your records using the Citation Style Language, with support for more than 800 journal citation styles.

  • Record preview. Before you publish your record, you can see a preview of what it looks like.

  • Metadata-only records. Records both with and without associated files are supported.

  • Identifier detection and validation. InvenioRDM comes with support for automatic detection and validation for a large number of persistent identifier schemes (i.e. less typing and clicking for end-users).
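
To show what an imprecise EDTF date such as 1939/1945 means in practice, here is a small Python sketch that turns a date or date range into its earliest and latest possible calendar days. It handles only a small subset of EDTF (a year, a year and month, a full date, or a closed interval of those) and is not the parser InvenioRDM uses. The function names are hypothetical.

```python
import calendar
from datetime import date


def _bounds(value: str) -> tuple:
    """Return (earliest, latest) day for YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(part) for part in value.split("-")]
    if len(parts) == 1:
        # A bare year covers the whole year.
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        # A year and month covers the whole month.
        last_day = calendar.monthrange(parts[0], parts[1])[1]
        return date(parts[0], parts[1], 1), date(parts[0], parts[1], last_day)
    if len(parts) == 3:
        day = date(parts[0], parts[1], parts[2])
        return day, day
    raise ValueError(f"unsupported date: {value!r}")


def edtf_bounds(value: str) -> tuple:
    """Return (earliest, latest) day for a single date or a start/end interval."""
    if "/" in value:
        start, end = value.split("/", 1)
        return _bounds(start)[0], _bounds(end)[1]
    return _bounds(value)
```

Storing both bounds is one way to let an imprecise date still take part in range searches and sorting.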

Search

  • Faceted search. InvenioRDM supports fully customizable faceted search.

  • Advanced query syntax. InvenioRDM has support for advanced querying via simple term search, phrase search, range search, regular expressions and custom ranking/sorting.

  • Auto-complete as you type. InvenioRDM exposes advanced APIs for search-as-you-type scenarios.

Auth, permissions and security

  • Login via institutional account. InvenioRDM makes it easy to integrate your institutional authentication provider, such as Keycloak or OAuth, or alternatively to use e.g. ORCID for login.

  • Restricted records. InvenioRDM supports restricting access to files only or to the entire record.

  • Share by link. Restricted records can be shared with peer-reviewers or your colleagues via secret links.

  • Embargo support. Restricted records can be embargoed so that they are automatically made publicly available on a specific date, so that you can comply with e.g. funders' Open Access mandates.

  • Logged in devices. InvenioRDM allows users to see a list of currently logged in devices on their account.
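
The embargo behaviour described in this list can be sketched as a simple date check. The Python function below is a simplified illustration, not InvenioRDM's code. The function name, the two access values, and the rule that a record becomes public on the embargo date itself are assumptions.

```python
from datetime import date
from typing import Optional


def effective_access(access: str, embargo_until: Optional[date], today: date) -> str:
    """Return the access level ("public" or "restricted") that applies today."""
    embargo_over = embargo_until is not None and today >= embargo_until
    if access == "restricted" and embargo_over:
        # The embargo date has been reached, so the restriction is lifted.
        return "public"
    # No embargo, or the embargo is still running: keep the configured level.
    return access
```

A restricted record without an embargo date stays restricted indefinitely, which is why the check requires the date to be present.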

Customizations

  • Styling and theming. InvenioRDM can be styled and themed to fit into your institutional visual identity.

  • Custom vocabularies. All vocabularies such as types for resources, dates, roles, relations, affiliations, etc. can be customized to your local instance.

  • Subjects. InvenioRDM can load external subject vocabularies used for classification, such as Medical Subject Headings (MeSH) and many others.

  • Permission system. InvenioRDM supports advanced customizations to the permission system for e.g. IP-based access control.

APIs and interoperability

  • REST API. InvenioRDM exposes a strong versioned REST API for all operations on the repository, which allows you to build your own integrations on top of InvenioRDM.

  • Export formats. InvenioRDM supports exporting record metadata in multiple formats such as JSON, Citation Style Language JSON, DataCite JSON/XML, Dublin Core.
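
As a rough idea of what building on the REST API can look like, the Python sketch below only assembles a search URL and sends no request. The `/api/records` path and the `q`, `sort`, `size` and `page` parameter names are assumptions that should be checked against the REST API documentation of your InvenioRDM version. The helper name `search_url` and the example query are hypothetical.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, sort: str = "bestmatch",
               size: int = 10, page: int = 1) -> str:
    """Build a record-search URL; endpoint and parameter names are assumed."""
    params = {"q": query, "sort": sort, "size": size, "page": page}
    # urlencode escapes quotes, colons and spaces in the query string.
    return f"{base_url.rstrip('/')}/api/records?{urlencode(params)}"


# Example: a phrase search against the demo site mentioned above.
url = search_url("https://inveniordm.web.cern.ch", 'title:"climate data"')
```

The resulting URL could then be fetched with any HTTP client and the JSON response processed by your own integration.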

Infrastructure

  • Large file support. InvenioRDM supports uploading and handling TB-sized files and can manage from MBs to PBs of data as long as your underlying storage system supports it.

  • Multi-storage systems. InvenioRDM allows you to integrate multiple backend storage systems in the same instance, such as S3, XRootD and more.

  • Deploy anywhere. InvenioRDM is a Python application and you can deploy it into your institutional infrastructure, whether it is on bare metal, VMs, containers, Kubernetes or OpenShift.

Partners

The development of InvenioRDM is the result of an Open Source project kicked off two years ago with a diverse set of partners from all over academia and research. Most of the development and testing work was conducted during the pandemic, in a fully online environment, which made it extra challenging for the people involved.

Just the start

This release is just the start. Our next major milestone is to bring Zenodo.org on top of InvenioRDM, which means that most larger features will have been shipped along the way, and that the system will have been fully tested against large-scale heavy production loads.

InvenioRDM Community Spotlight: Summer 2021

Sara Gonzales Jun 7, 2021

Summer 2021 is an exciting time in the development of InvenioRDM, as the team works towards the Long-Term Support (LTS) version. Here are just a few updates we have to share about recent work:

Contributions from the community

InvenioRDM partners are not only local implementers, but frequently contribute their coding expertise to the project. Read this recent news item about the efforts of TU Wien developer Maximilian Moser related to the share-by-link feature and the authentication modules.

Usability testing

One way that the entire community can contribute towards InvenioRDM development is to test the most recent version of the software either through your local implementation or at the CERN sandbox site. Report any bugs you find using this form. Timely and accurate bug reporting gives the entire project a boost!

Getting the word out

InvenioRDM will be featured in two sessions at Open Repositories 2021, June 7-10, “Poster Minute Madness” and the “Repository Rodeo”, both taking place on June 9. If you or someone you know would like an introduction to the software, encourage them to attend!

Are you presenting on InvenioRDM at an upcoming meeting? Please let us know using this form.