Open-Source Medical Datasets
Using artificial intelligence (AI) and machine learning (ML) in the medical field has led to countless innovations, from diagnosing patients to monitoring epidemics. However, research and development teams may hit a major roadblock when trying to use this technology for healthcare: the cost of commercial medical datasets.
AI and ML solutions are only as good as the data that they’re trained on. If an organization can’t access that data, it may not be able to bring its transformative product to market. However, there’s hope in the form of open-source medical datasets.
What Are Open-Source Medical Datasets?
Much like the open-source software movement spurred a major change in the software world, open-source medical datasets focus on democratizing access to the information needed to power AI and ML healthcare technology.
This movement emphasizes treating medical data as a public good. To support this approach, several organizations and universities have committed to opening up access to their datasets.
Stanford’s Center for Artificial Intelligence in Medicine and Imaging (AIMI) is one of the leading forces in this area. It launched a free repository filled with medical imaging datasets designed to support open-source AI, such as recommendation engines. It has over one million images and is one of the largest open-source medical datasets of its kind.
AIMI hasn’t stopped there, however. It recently partnered with Microsoft’s AI for Health program to continue expanding data accessibility. The two organizations are working together to create a new platform with more functionality.
The goals of this platform include:
- More automation
- Better accessibility
- Greater data visibility
- Hosting and organizing images from other institutions
- Collaborative features for sharing research
- Providing cloud-based computing power for researchers who don’t have access to the necessary resource-heavy infrastructure
This new platform represents another important step forward for the open-source medical dataset movement, especially since it’s configured to support repositories from external institutions as well.
AIMI has positioned itself as a centralized source for valuable medical data, and it is working to support the researchers making advances in ML and open-source AI applications for healthcare.
Benefits of Open-Source Medical Datasets
Open-source medical datasets have a wide range of benefits, from individual- to population-level healthcare.
- Opening up access to researchers with limited funding: Smaller research groups and organizations may not have sufficient resources to pay for commercial medical datasets. Even if their budget can cover those costs, they may have to cut back in other important areas of the project. Open-source medical datasets remove that cost burden and level the playing field for these researchers.
- Identifying potential biases or harm in AI applications: AI bias comes in many forms, and in healthcare, it can lead to life-threatening consequences. When researchers have access to more medical datasets, they can compare sources and spot gaps in representation, which helps reduce the biases inherent in any single dataset. The added perspective can go a long way toward preventing unintentional harm in AI-powered applications.
- Supporting niche research: Researchers who focus on niche areas may struggle to find funding for their projects, because that money often ends up allocated to areas with a broader focus. Open-source medical datasets support many types of research, which can lead to helping people with rare medical conditions, discovering new ways of using AI in healthcare and other innovations.
- Expanding available medical datasets: Researchers may discover important trends and patterns in their data when they have access to more datasets. By working with more information, they can search for factors they’ve overlooked, find connections that they never would have considered and otherwise expand their perspectives.
- Moving away from a dependence on commercial research: If corporations are the only ones creating large-scale medical datasets and offering them for sale, then the available data ends up covering only the areas that those corporations care about. If a company goes out of business or discontinues its medical datasets, then researchers could be left behind. Open-source medical datasets remove this worry from research teams and make it easy for everyone to use this data for the greater good.
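To make the bias point above concrete, here is a minimal Python sketch of one simple check: comparing how well each subgroup is represented before and after pooling a second dataset. Everything in it is an assumption for illustration, including the `age_band` field, the `site_a` and `site_b` records, and the 20% threshold. Real datasets will have their own schemas, and representation is only one of many sources of bias.

```python
from collections import Counter


def subgroup_shares(records, key):
    """Return each subgroup's share of the records, grouped by `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(shares, threshold=0.2):
    """List subgroups whose share falls below an (arbitrary) threshold."""
    return sorted(group for group, share in shares.items() if share < threshold)


# Hypothetical metadata rows, invented for this example only.
site_a = (
    [{"age_band": "18-39"}] * 6
    + [{"age_band": "40-64"}] * 3
    + [{"age_band": "65+"}] * 1
)
site_b = (
    [{"age_band": "18-39"}] * 2
    + [{"age_band": "40-64"}] * 3
    + [{"age_band": "65+"}] * 5
)

before = subgroup_shares(site_a, "age_band")
after = subgroup_shares(site_a + site_b, "age_band")

print("Underrepresented in site A alone:", underrepresented(before))
print("Underrepresented after pooling:", underrepresented(after))
```

In this toy data, the oldest age band makes up only a tenth of the first dataset, and pooling the second dataset brings every band above the threshold. A real audit would look at many more attributes and would not rely on a single cutoff.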
Other Open-Source Medical Datasets
AIMI’s open-source medical datasets are not the only ones available for researchers. This movement is continuing to gain popularity around the world, resulting in the growing availability of this data. Here are some examples of other open-source medical datasets that are well-suited for open-source AI and ML healthcare applications.
- HealthData.gov: The U.S. Department of Health & Human Services manages this repository filled with high-value health data. These datasets are US-centric, and the collection is frequently updated and expanded. Researchers can search by topic area to find exactly what they need for their projects.
- CDC Wide-ranging ONline Data for Epidemiologic Research: WONDER is the CDC’s repository for US public health data. This platform makes it easy to query this data and create reports focused on US population-level healthcare.
- World Health Organization: The WHO maintains the Global Health Observatory, which contains health data from all 194 of its member states. There are over 1,000 indicators that focus on a wide range of high-priority health topics. Since this data comes from around the world, it can be particularly helpful in addressing region-based AI biases that may be present in other datasets.
- Data.CMS.gov: The Centers for Medicare & Medicaid Services maintains this repository with datasets provided by institutions that accept Medicare and Medicaid. This is a US-centric resource that covers healthcare services data.
- Kent Ridge Biomedical Datasets: This dataset repository offers biomedical datasets that are featured in peer-reviewed journals. It primarily focuses on genomic sequence data and similar topics.
Open-source medical datasets represent an important movement toward making healthcare better for everyone. With organizations such as AIMI leading the way forward for the medical world, the future of healthcare AI and ML research has never looked brighter.