Dynamic Content Generation through Web Crawling
C. Lee Giles
David Reese Professor
Director of the Intelligent Systems Research Laboratory
College of Information Sciences and Technology
Description
Our early conversations with Lee explored several different areas related to his research: search engines, web crawling, knowledge management, data mining, machine learning… the list is long! Drawing on Lee’s expertise, along with that of the graduate students in his lab, we started to brainstorm ways that various technologies and web search methods could serve the needs of teaching and learning across Penn State. One specific example really stood out: SCIgen.
In a nutshell, SCIgen is a software program that generates ‘fake’ computer science papers that imitate the style of real ones. While the original application of this technology is rather devious, we started to talk with Lee about how we might take the same principles behind SCIgen and put them to use in a way that can enhance teaching and learning.
What we’re working on with Lee is a tool that crawls the web based on a list of criteria and attempts to generate a useful document or set of resources. Imagine a faculty member setting out to create a new course on learning analytics. In addition to conducting background research and exploring individual resources, what if the faculty member could enter a variety of keywords or phrases that represent discrete topics in the course? For instance, if I were teaching a course on learning analytics, I might begin by choosing several distinct content areas:
- Educational data mining
- Statistics
- Learning Science
- Educational Psychology
- Data mining
Then, within one topic area, I might select a few key phrases, people or tools I’d like to include in the course. Continuing our example:
- Educational data mining
  - Decision Trees
  - Predicting student persistence
  - Machine learning
  - Software: R
  - Software: RapidMiner
  - People: Ryan Baker
The software then works similarly to SCIgen, going out on the web to various places to look for relevant content to bring back to the faculty member. The technology can also be configured to estimate the complexity of the content, so the faculty member in this example can indicate whether the course is for novice or advanced students.
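To make the idea a little more concrete, here is a minimal sketch of how a topic and its key phrases might be used to filter pages that a crawler has already fetched. This is not the project's actual implementation: the names `TopicSpec`, `relevance`, `complexity`, and `select_pages`, the keyword-matching score, the words-per-sentence complexity proxy, and the threshold values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
import re


@dataclass
class TopicSpec:
    """Hypothetical container: one course topic plus its key phrases, tools, or people."""
    name: str
    key_phrases: list[str] = field(default_factory=list)


def relevance(text: str, topic: TopicSpec) -> float:
    """Fraction of the topic's terms (name + key phrases) found in the text."""
    haystack = text.lower()
    terms = [topic.name, *topic.key_phrases]
    hits = sum(1 for term in terms if term.lower() in haystack)
    return hits / len(terms)


def complexity(text: str) -> float:
    """Crude complexity proxy (an assumption): average words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def select_pages(pages, topic, level="novice",
                 min_relevance=0.3, novice_max_words=20.0):
    """Return (url, relevance) pairs, most relevant first.

    `pages` maps URL -> already-fetched page text. For a novice audience,
    pages whose sentences are long on average are filtered out.
    """
    results = []
    for url, text in pages.items():
        score = relevance(text, topic)
        if score < min_relevance:
            continue  # not enough of the requested terms appear
        if level == "novice" and complexity(text) > novice_max_words:
            continue  # likely too dense for an introductory course
        results.append((url, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

In this sketch, the faculty member's choices from the example above would become a `TopicSpec("Educational data mining", ["Decision trees", "Machine learning", "Ryan Baker"])`, and `level` would carry the novice or advanced setting. A real tool would need far better relevance and readability measures than simple phrase matching and sentence length.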
We believe that a tool like this will benefit a faculty member or instructional designer in the design and development of various learning environments. The tool might also be helpful for students reviewing specific topic areas. We look forward to working with Lee and his students throughout the academic year on this exciting project!
The Team
Bart Pursel – Project Lead
Kyle Bowen
Hannah Williams
Ben Brautigam
Shuting Wang
Chen Liang
Zhaohui Wu
Cohort
2014
Focus Area
Generative AI