sessions and coursework
• Computational exercises: paired with each lecture (due at the end of each computer lab)
• Research challenge: assignment to complete (details after Lecture 9)
Registration of absence or mitigation goes via the student office
sessions and coursework
• Computational exercises: completed, well done!
• Research challenge: individual assignment (details today)
Registration of absence or mitigation goes via the student office
An opportunity to develop your practical skills. Goals:
• [...] skills you have picked up so far
• To extend your knowledge through self-study, exploration, and cohort interactions
• To produce annotated code with a comparison to community benchmarks
Your job is to produce an original model for the given classification or regression task.
Some tasks use chemical composition only, while others use both composition and structure.
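For the composition-only tasks, the model input has to be built from a formula string. The sketch below shows one simple way to turn a formula into element fractions. The helper name `element_fractions` is hypothetical (it is not part of matbench or any package), and the parser assumes plain formulas without brackets; in practice a library class such as pymatgen's `Composition` is the more robust choice.

```python
import re
from collections import defaultdict

# Hypothetical helper for illustration only; assumes formulas without brackets.
# Each token is an element symbol followed by an optional (possibly decimal) amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def element_fractions(formula: str) -> dict:
    """Return {element symbol: atomic fraction} for a simple formula string."""
    tokens = _TOKEN.findall(formula)
    # Reject anything the simple pattern cannot fully account for (e.g. brackets).
    if not tokens or "".join(sym + num for sym, num in tokens) != formula:
        raise ValueError(f"Unsupported formula: {formula!r}")

    amounts = defaultdict(float)
    for sym, num in tokens:
        # A missing amount means one atom, e.g. the Si in SiO2.
        amounts[sym] += float(num) if num else 1.0

    total = sum(amounts.values())
    if total <= 0:
        raise ValueError(f"Formula has no atoms: {formula!r}")
    return {sym: amt / total for sym, amt in amounts.items()}


def to_vector(formula: str, elements: list) -> list:
    """Fixed-length feature vector: one fraction per element in `elements`."""
    fractions = element_fractions(formula)
    return [fractions.get(sym, 0.0) for sym in elements]


if __name__ == "__main__":
    print(element_fractions("Fe2O3"))
    print(to_vector("SiO2", ["Si", "O", "Fe"]))
```

Element fractions are only a starting point; a task that also provides structure needs additional descriptors that this sketch does not cover.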
Read the matbench paper and review the models that have already been tested: https://doi.org/10.1038/s41524-020-00406-3
I. Data Preparation
II. Model Selection, Training & Testing
III. Discussion of Results
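The three stages can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: the data are synthetic (generated with `make_regression`, not a matbench dataset), and the random forest, the 80/20 split, and the 5-fold cross-validation are placeholder choices rather than a recommended solution.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# I. Data preparation (synthetic stand-in, NOT a matbench dataset)
X, y = make_regression(
    n_samples=400, n_features=10, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# II. Model selection, training and testing
# Scaling lives inside the pipeline so it is refit on each training fold only.
model = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mae = -cross_val_score(
    model, X_train, y_train, cv=cv, scoring="neg_mean_absolute_error"
)
model.fit(X_train, y_train)
test_mae = mean_absolute_error(y_test, model.predict(X_test))

# III. Discussion of results: compare against a trivial baseline
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))

print(f"CV MAE:       {cv_mae.mean():.2f} +/- {cv_mae.std():.2f}")
print(f"Test MAE:     {test_mae:.2f}")
print(f"Baseline MAE: {baseline_mae:.2f}")
```

The held-out test set is touched only once, after cross-validation on the training data, so the reported test error is not used for model selection.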
unique solution for a given problem
You may be interested in speed or clarity, but ultimately you want robust code
• Check package manuals, e.g. https://matplotlib.org & https://scikit-learn.org
• Search https://stackexchange.com & https://github.com for ideas
you use an LLM (e.g. GPT-4, Gemini, Copilot)?
• Specify tasks (e.g. code assistance)
• Were any limitations/biases noted?
• How did you ensure ethical use?
Statement to be included in the submitted notebook
(4,764) | Regression (with structure) | Xia, Kinga
B | Experimental bandgap (4,604) | Regression (composition only) | Irea, Pan
C | Glass formation (5,680) | Classification (composition only) | Yifan, Fintan
Dataset details are provided in Notebook 9
One challenge per person has been randomly assigned
rooms:
Class 9: 14:00-15:30
Class 10: 14:00-15:30
The computer room is also booked on Feb 24th and 27th from 13:00 to 16:00 for self-study (no GTAs)
Submission deadline: 10th March 15:00
notebook (.ipynb) and
2. Recorded presentation* (max 5 min) where you introduce your code and your results on model training, selection, and performance
*Format is flexible. It could be recorded in PowerPoint, as a screenshare on Zoom, or as plain video
appropriate pre-processing steps
Model Selection, Training and Testing (20 %): Justify model based on the problem, with appropriate validation and testing
Model Analysis and Discussion (20 %): Analysis of model performance, including high-quality plots
Python Code Quality (20 %): Clearly structured code with meaningful annotations
Recorded Presentation (30 %): Clarity and conciseness in model choices, results, limitations
on decision-making processes
Transparency and Explainability: interpretation of model predictions
Privacy and Data Protection: collection, storage, and use of sensitive data
Social Impacts: from productivity increases to job displacements
How do these translate to the materials context?
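On the explainability point, one common and model-agnostic way to interpret predictions is permutation importance: shuffle one feature at a time on held-out data and measure how much the score drops. The sketch below is an illustration under stated assumptions, using synthetic data from `make_classification` and placeholder feature names rather than real materials descriptors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: with shuffle=False the two informative
# features are columns 0 and 1, and the remaining four are noise.
X, y = make_classification(
    n_samples=500,
    n_features=6,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Importance = mean drop in test accuracy when a single feature is shuffled.
result = permutation_importance(
    clf, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]

for i in ranking:
    mean = result.importances_mean[i]
    std = result.importances_std[i]
    print(f"{feature_names[i]}: {mean:.3f} +/- {std:.3f}")
```

Importances computed this way describe what the fitted model relies on, not a causal relationship in the underlying material, and correlated features can share or mask each other's importance.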