Dynamic GEN AI-Powered Web Crawling on Azure Using Automation Account and GPT-3.5
Chandan Srinath1, Sakshi Srivastava2
1Chandan Srinath, Digital Aviation Solutions, Boeing India, Pvt. Ltd., Bangalore, India.
2Sakshi Srivastava, Digital Aviation Solutions, Boeing India, Pvt. Ltd., Bangalore, India.
Manuscript received on 19 September 2024 | Revised Manuscript received on 27 September 2024 | Manuscript Accepted on 15 December 2024 | Manuscript published on 30 December 2024 | PP: 6-10 | Volume-14 Issue-2, December 2024 | Retrieval Number: 100.1/ijeat.B455614021224 | DOI: 10.35940/ijeat.B4556.14021224
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The integration of AI-powered automation in web crawling marks a significant advancement over traditional methods, which were often labor-intensive, inflexible, and prone to security risks. This paper presents a case study on the implementation of a dynamic web crawling solution using Azure Automation Account, leveraging GPT-3.5 from Azure OpenAI services. This new approach allows for parameterized execution via automation variables, enabling user-defined requirements to guide the crawler’s behavior in a more flexible and intelligent manner. Unlike previous static methods that required constant manual adjustments, our system uses GPT-3.5’s Natural Language Processing (NLP) capabilities to interpret complex instructions and dynamically adapt to various web structures. Post-crawling, the data undergoes a security scan using ClamAV, ensuring its integrity before storage in Azure Blob Storage. SendGrid is employed for user alerts regarding the scan results and storage status. The system is scheduled to run at regular intervals, fully automating the process while maintaining robust security protocols. This paper includes a detailed comparison between traditional web crawling techniques and this AI-driven approach, demonstrating the improvements in efficiency, security, and adaptability.
Keywords: Azure Automation, ClamAV, GPT-3.5, Web Crawling.
Scope of the Article: Artificial Intelligence and Methods
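The pipeline summarized in the abstract (read parameters from automation variables, crawl, scan with ClamAV, store clean results, notify the user) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`CrawlConfig`, `load_config`, `run_pipeline`, and the variable names such as `CrawlStartUrl`) is hypothetical, and the crawl, scan, upload, and notify steps are passed in as plain callables rather than tied to the GPT-3.5, ClamAV, Azure Blob Storage, or SendGrid clients. The only external fact relied on is the `clamscan` exit-code convention: 0 means no virus found, 1 means a virus was found, and 2 means an error occurred.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CrawlConfig:
    """User-defined parameters; all field names are illustrative assumptions."""
    start_url: str
    instructions: str   # natural-language requirement handed to the model
    container: str      # destination storage container
    notify_email: str   # recipient of the scan/storage report


def load_config(get_variable: Callable[[str], str]) -> CrawlConfig:
    """Build the config from a variable lookup (hypothetical variable names)."""
    return CrawlConfig(
        start_url=get_variable("CrawlStartUrl"),
        instructions=get_variable("CrawlInstructions"),
        container=get_variable("BlobContainer"),
        notify_email=get_variable("NotifyEmail"),
    )


def interpret_clamscan_exit(code: int) -> str:
    """Map a clamscan exit code to a status: 0 clean, 1 infected, else error."""
    return {0: "clean", 1: "infected"}.get(code, "error")


def run_pipeline(
    config: CrawlConfig,
    crawl: Callable[[CrawlConfig], Dict[str, bytes]],
    scan: Callable[[bytes], int],
    upload: Callable[[str, str, bytes], None],
    notify: Callable[[str, Dict[str, str]], None],
) -> Dict[str, str]:
    """Crawl, scan each item, upload only clean items, then send one report."""
    results: Dict[str, str] = {}
    for name, payload in crawl(config).items():
        status = interpret_clamscan_exit(scan(payload))
        if status == "clean":
            upload(config.container, name, payload)
        results[name] = status
    notify(config.notify_email, results)
    return results
```

In an Azure Automation Python runbook, the `get_variable` argument would typically be a lookup such as `automationassets.get_automation_variable`, and the four callables would wrap the corresponding service clients. Keeping them injectable lets the control flow be exercised with stubs before any cloud resource is attached.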