GeoGPT Development and Collaboration Meeting
Summary:
Date: Wednesday 12 March 2025
Location: Geological Society of London and online
All presentations can be found at geogpt.zero2x.org/news
Attendees: 8 in-person and 48 virtual, spanning interested organisations across DDE, IUGS, geoscience societies and publishers, academia, industry and governmental organisations. The meeting lasted for 5 hours from 11:00 – 16:00 GMT.
Chaired by the GeoGPT Governance Committee (GC)
Attended by the GeoGPT Executive (Exec)
The objective of the meeting was to present the progress of GeoGPT since the July 2024 meeting. It was the first meeting to be chaired by the GC and attended by the executive and open to stakeholders. The agenda included:
- Introductions and Objectives
- The role of GeoGPT & the LLM landscape
- Technical status of GeoGPT
- Overview of the Governance Committee
- Open Discussion
- Conclusions
Agenda and notes
1. Opening Remarks
The GC chairs outlined the meeting structure and encouraged active participation. All participants introduced themselves and their affiliation.
2. The Role of GeoGPT & the LLM Landscape
GeoGPT Executive: The presentation covered the following key points:
a. AI’s Transformative Impact and Nobel Prize Recognition:
- Discussed the recent Nobel Prize awarded to Geoffrey Hinton, noting the surprise of AI pioneers winning a physics prize.
- Likened AI’s current impact to historical breakthroughs such as penicillin and X-rays, calling it a transformative moment in science.
b. Paradigm Shift in Science:
- Referenced Thomas Kuhn’s The Structure of Scientific Revolutions, and how AI is driving a new paradigm shift in scientific research.
- Emphasized AI’s growing role in reshaping scientific methodologies and problem-solving approaches.
c. Evolution of AI:
- Using a timeline chart, traced AI’s development from the 1950s (starting with Turing’s paper on intelligent machines) to the pivotal 2017 Attention is All You Need paper by Google.
d. Transformers and LLM Foundations:
- The Attention is All You Need paper introduced two key concepts: attention mechanisms and tokenization, which became the foundation for large language models (LLMs).
- Referenced an interview with Geoffrey Hinton, who speculated that without Google’s publication of the paper, LLMs might have been delayed, though new technologies would still have ultimately emerged. Hinton also expressed his belief that younger generations should lead the exploration of new technologies.
e. From ImageNet to Transformers:
- Noted that the work on transformers can be traced back to ImageNet, a foundational dataset that preceded the transformer era.
f. The Third Paradigm:
- Discussed the Third Paradigm as the synergy of compute-intensive models, data cleaning, and model integration.
- Highlighted that the most significant change in the last five years has been the focus on scale, which has driven increases in speed, automation, and capabilities.
- Emphasised that the defining shift in artificial intelligence today is scale, a transformation that has driven significant advancements and led directly to initiatives like Deep-time Digital Earth (DDE) and GeoGPT. Collaborations with esteemed institutions such as AAAS and Springer Nature have further enhanced this evolution. For GeoGPT, feedback from geoscientists is particularly crucial, ensuring the relevance and effectiveness of AI applications within the geosciences community. The convergence of data-driven methodologies, intensive computational capabilities, and sophisticated modeling marks a third major paradigm shift in scientific research and technological innovation.
- Data driven + Intensive Computation + Innovative Models will lead to a paradigm shift
g. Introduction to GeoGPT:
- Introduced GeoGPT, outlining its vision and mission to address three core issues in geoscience.
- Underlined that GeoGPT could serve as a role model for other scientific disciplines.
- GeoGPT is not intended to replace scientists but to enhance human creativity and support geoscientific research.
h. History of LLM Development:
- A brief overview of the history of large language models (LLMs), tracing their development from pre-2018 to the scaling phase in 2019.
- Highlighted the rapid advancements in LLMs and their growing applications across various fields.
i. Geoscience-Focused LLMs:
- The emergence of geoscience-focused LLMs, positioning GeoGPT within this context.
- GeoGPT is designed to address the unique challenges and opportunities in geoscience research.
j. Challenges in Developing Geoscience LLMs:
- Outlined the specific challenges in creating LLMs for geoscience, including data complexity, standardization, and the need for domain-specific adaptations.
k. GeoGPT’s Strengths and Position:
- Compared GeoGPT to other similar initiatives, highlighting its unique strengths, such as its focus on geoscience and its integration of advanced AI tools.
- Emphasized GeoGPT’s potential to serve as a model for other scientific disciplines.
Key Announcements following session 1:
- An MOU between DDE and Zhejiang Lab has been signed and is available online.
- GeoGPT will be made publicly available at EGU 2025
- The Director of the Governing Committee of DDE (Deep-time Digital Earth) highlighted the overall objectives of DDE in partnership with GeoGPT, said that DDE and GeoGPT are on the same wavelength, expressed strong support for GeoGPT’s goals, and shared optimism about its future progress.
- Confirmed that the signed MOU between DDE and Zhejiang Lab will be made public.
- Emphasized the importance of open communication and collaboration moving forward.
- The GeoGPT team was commended for its commitment to open science and open data initiatives. The team was encouraged to adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) to ensure data is freely available and usable.
Comments and Discussions on session 1
- Questions were posed on data standardization and how geoscience is dealing with this complex issue.
- Comment in the chat that data standardisation does exist these days, but it mostly comes from the engineering geology sector. See also the IFC standard (open) from the AEC industry and BIM, which drives digitalisation. Perhaps data standardisation could be built along these lines. See also ISO 14688 and ISO 14689. IUGS has its CGI commission.
- An attendee sought clarification on whether the Third Paradigm refers to the integration of open models and data. This was confirmed to be part of the later discussion.
- Issues around data disclosure were briefly raised but were deferred for more detailed discussion later in the meeting.
- Ongoing challenges of data standardization in geoscience were noted, particularly within the data pipeline, along with its complexity and importance for the success of initiatives like GeoGPT.
- An attendee expressed admiration for the progress made so far, noting that they had thought the project was still in its early stages, and asked about the project timeline and how the Governance Committee could contribute before the final release.
3. Technical status of GeoGPT
The GeoGPT project leader delivered a detailed presentation on GeoGPT, its functionality, and future plans, followed by a live demo.
a. GeoGPT Overview:
- GeoGPT is a system based on large language models (LLMs) but is distinct from ChatGPT and other LLMs, as it includes specialized tools and functions designed to assist with geoscience-related tasks.
b. Timeline and Upcoming Upgrades:
- GeoGPT will be released in conjunction with the EGU meeting in Vienna (end of April 2025). The release will include the future development timeline of GeoGPT and upcoming upgrades, including release of:
- Open model weights.
- Data source sharing.
- Deep research functions.
- Feedback from the July 2024 meeting had been addressed in a detailed FAQ published online.
c. Training Data and Use Cases:
- Discussion on how GeoGPT obtains its training data.
- Introduced four specific use cases of GeoGPT, showcasing its practical applications in geoscience.
- Highlighted the global impact of GeoGPT’s work over the past few months.
Questions and Discussion:
a. Data Standardization Pipeline:
- Questions were asked about the pipeline for data collection, processing, and synthesis.
- How does GeoGPT deal with data standardization for geoscience?
b. Source Code Availability:
- Asked if GeoGPT’s source code would be made available, the GeoGPT exec clarified that only the beta version is currently available, with the full release planned for a later date.
- EGU release: In response to a question regarding the EGU 2025 release, it was stated that the EGU General Secretary had been explicitly informed that the release would coincide with the EGU conference, during which GeoGPT would have a dedicated booth and make specific arrangements. The intention was not to request a special session or any similar event from EGU.
c. Referencing and Licensing Issues
- Concerns about how GeoGPT handles referencing and licensing, particularly if papers are illegally obtained by a user.
- Discussed strategies to ensure data quality, such as collecting expert-interpreted data to fine-tune the model.
d. Copyright Concerns and Questions:
- An attendee highlighted issues around copyright, especially when using open-access materials for commercial purposes. For example, if CC-BY-NC content is used to train a model that is then used for commercial purposes, is this acceptable, or does it bypass the NC license?
- Question from an attendee on the implications of personal use of GeoGPT for commercial purposes, utilizing copyrighted materials obtained by the user.
- It was agreed that specific clarification on third-party use in commercial applications needed to be understood, although GeoGPT itself will not develop commercial products.
- Asked if GeoGPT could integrate GIS functionality to extract topological information, which would be a significant contribution.
e. Knowledge Graph and Copyrighted Documents:
- Questions were raised about using commercially licensed papers to build knowledge graphs and whether they could be published.
- Discussed how to handle copyrighted documents and referencing issues, emphasizing the need for clear guidelines.
- Comment that the availability of knowledge graphs based upon open published science needs further thought, as they could be hugely valuable to the wider community.
f. Map chat:
Discussion on:
- How accurate is the content generated from Map chat?
- Does the Map chat include GIS information?
- Will Map chat support 3D in the future?
4. GeoGPT Governance Committee
The GC introduced the GeoGPT Governance Committee, outlining its role, goals, and plans. The presentation focused on ensuring the responsible development and deployment of GeoGPT, addressing key challenges in AI governance:
a. AI Trust Problem:
- Highlighted the growing concerns around trust in AI systems, emphasizing the need for transparency, accountability, and ethical practices.
b. AI Paradoxes:
- Discussed some inherent paradoxes in AI development, such as the balance between innovation and regulation, and the tension between open access and intellectual property rights.
c. Controlling AI Through Guidelines, Laws, and Regulations:
- Stressed the importance of establishing clear guidelines, laws, and regulations to govern AI systems and ensure their responsible use.
d. Controlling AI Through the Courts:
- Mentioned the potential role of legal systems in addressing AI-related disputes and ensuring compliance with ethical and legal standards.
e. Lessons from GeoGalactica:
- Shared insights from the GeoGalactica initiative, highlighting lessons learned that inform GeoGPT’s governance framework.
f. Introduction to the GeoGPT Governance Committee:
- Outlined the committee’s role in overseeing GeoGPT’s development, ensuring ethical practices, and addressing stakeholder concerns.
- Emphasized the committee’s commitment to fostering collaboration, transparency, and accountability.
Questions and Discussions:
- Asked whether the Governance Committee had received feedback from publishers regarding data usage and other concerns.
Response:
- Confirmed that discussions with publishers are ongoing but no concrete agreements have been reached yet.
- Noted that the discussions have been positive so far, indicating a willingness to collaborate.
5. Open Discussion
Funding and Financial Longevity
a. Asked about the overall financial model of GeoGPT and its sustainability.
GeoGPT exec Response:
- Explained that GeoGPT is part of Zhejiang Lab, which is funded by the Zhejiang Provincial Government as a non-profit organization.
- Emphasized the importance of community support and partnerships to sustain the project, as GeoGPT is open-source and not intended to be a commercial product.
Concerns were raised about the project’s long-term financial sustainability.
GeoGPT executive Response:
- Stated that funding is secured for the next 3-5 years but acknowledged the need for broader community support to ensure long-term sustainability.
- A participant expressed concerns that uncertainty around future funding could lead to a decline in confidence and potential drops in quality.
GC Response:
- Highlighted the role of the Governance Committee in ensuring financial stability and maintaining quality standards.
- Proposed developing a business plan initiative in the future to address longer term funding challenges.
GeoGPT exec perspective:
- Stressed that the focus should be on ensuring the quality of GeoGPT rather than scaling up quickly.
- Believed that community support and collaboration are key to achieving this goal.
License Agreement and Copyright Issues
The Role of Publishers:
There was discussion about the role of publishers and the need to publish more science as open access, while acknowledging that publishing is a business with costs that must be considered.
Attendees
- Emphasized the need for a long-term plan to address licensing and copyright issues, as licenses require funding regardless of open science initiatives.
- Mentioned a license agreement framework linking the number of words/tokens to the licensing fee.
- Emphasised that some publishers/societies were using third-party brokers in discussions with LLM developers.
- Discussion on open release for older publications (e.g., more than 2 years since publication) or lower-cost licensing. Discussion on the financial value of content from decades ago. The view was expressed that legacy content has value which should be considered; it was recognised that this will be a part of licensing negotiations.
- Reiterated the importance of maintaining high-quality data to ensure the reliability of GeoGPT’s outputs.
- Participants raised concerns about proper attribution for authors whose work is used by GeoGPT.
- Asked if the tools developed for GeoGPT are available for other LLMs. The GeoGPT project leader clarified that the tools are purpose-built for GeoGPT to carry out geoscience-related tasks.
6. Conclusion
- The Governance Committee co-chairs thanked participants and expressed optimism that GeoGPT will be a valuable tool in meeting geoscientists’ needs.
- Emphasised that all presentations are available online. Key updates will be posted.
- Welcomed further ideas and feedback and use of the FAQ link.
- Emphasised GeoGPT’s commitment to open dialogue, communication, and collaboration
Outstanding issues to be followed up on by the GC:
- Development of licensing agreements with data owners
- More information on tools such as map chat
- Better outline of the overall business model for GeoGPT
- Data standardization issues to be refined
- Risk Assessment to be completed by EGU release
John Ludden & Richard Chuchla
Co-Chairs GeoGPT Governance Committee