10 Best Python Libraries for Natural Language Processing (2024)



nltk for python :: Article Creator

Python GitHub Token Leak Shows Binary Files Can Burn Developers Too

Scrubbing tokens from source code is not enough, as shown when a Python Software Foundation access token with administrator privileges was published inside a container image on Docker Hub.

A personal GitHub access token with administrative privileges to the official repositories for the Python programming language and the Python Package Index (PyPI) was exposed for over a year. The access token belonged to the Python Software Foundation's director of infrastructure and was accidentally included in a compiled binary file that was published as part of a container image on Docker Hub.

"Although we encounter many secrets that are leaked in the same manner, this case was exceptional because it is difficult to overestimate the potential consequences if it had fallen into the wrong hands — one could supposedly inject malicious code into PyPI packages (imagine replacing all Python packages with malicious ones), and even to the Python language itself," researchers from security firm JFrog, who found and reported the token, wrote in a report.

The incident shows that scrubbing access tokens from source code only, which some development tools do automatically, is not enough to prevent potential security breaches. Sensitive credentials can also be included in environment variables, configuration files and even binary artifacts as a result of automated build processes and developer mistakes.

The Python token leak was the result of laziness

Ee Durbin, the administrator of PyPI and director of infrastructure for the Python Software Foundation (PSF), wrote an incident report explaining how the leak happened. The leak involved the access token for Durbin's own account, which had administrative privileges due to his role in the organization.

In early 2023, Durbin was working on cabotage-app, a Docker-based tool developed by the PSF that is used to deploy PyPI and associated services on a Kubernetes cluster. While working on the build portion of the codebase, he kept running into API rate limits that GitHub enforces for anonymous access.

In what he calls "an act of laziness," Durbin decided to modify the source code locally to include an access token for his own account in order to bypass the default rate limits and finish the job faster. This was a quick fix, an alternative to configuring a localhost GitHub App to do the build instead of using the GitHub API.

While Durbin knew that adding personal access tokens (PATs) to source code is a bad security practice, the change was made only to his local copy of the codebase and was never intended to be pushed remotely. In fact, the automated build and deployment script was supposed to revert local changes, which should have scrubbed the token.

What Durbin didn't realize was that the token was also included in .pyc (Python compiled bytecode) files generated as part of the build process, and that those files, stored in the __pycache__ folder, were not configured to be excluded from the final Docker image uploaded to Docker Hub.
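To illustrate why reverting the source file does not remove the secret from its compiled form, here is a minimal sketch. It assumes a made-up placeholder token and a hypothetical file name (`build_helper.py`); it is not the cabotage-app code. It compiles a line of source and checks whether the literal survives in the serialized bytecode, which is the same kind of data CPython writes into a .pyc file after a short header.

```python
import marshal

# Hypothetical placeholder, not a real credential.
FAKE_TOKEN = "ghp_" + "x" * 36

# A line of source like one a developer might add locally "just for now".
source = 'TOKEN = "' + FAKE_TOKEN + '"\n'

# compile() builds a code object; CPython serializes such objects with
# marshal when it writes .pyc files into __pycache__.
code = compile(source, "build_helper.py", "exec")
bytecode_blob = marshal.dumps(code)

# String constants are stored verbatim, so a plain byte search finds them.
token_in_bytecode = FAKE_TOKEN.encode("ascii") in bytecode_blob
print("token present in compiled bytecode:", token_in_bytecode)
```

Even if the source file is restored afterwards, any bytecode already generated keeps the literal until it is deleted or regenerated.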

After being notified by JFrog in late June, the PyPI security team revoked the token and reviewed all GitHub audit logs and account activity for possible signs that the token might have been used maliciously. No evidence of malicious use was found. The cabotage-app version containing the token was published on Docker Hub on March 3, 2023, and was removed on June 21, 2024 — fifteen months later.

"Cabotage is now entirely self-hosting, which means that builds of the cabotage-app no longer utilize a public registry and deployment builds are initiated from clean checkouts of source only," Durbin wrote. "This mitigates the scenario of local edits making it into an image build outside of development environments, as well as removing the need to publish to public registries."

Durbin said he will avoid creating personal access tokens for his account in the future unless absolutely needed, because aside from this one case, he doesn't remember any other instances where such a long-lived token has been helpful.

"This is a great reminder to set aggressive expiration dates for API tokens (if you need them at all), treat .pyc files as if they were source code, and perform builds on automated systems from clean source only," he advised.

JFrog congratulated the PyPI security team for responding to their report and revoking the token within an impressive 17 minutes. While having perfect security is never possible, having a clear point of contact for security issues and a fast response time is critical to limiting the impact of security incidents for any organization.

Advice for developers

Aside from scanning binary artifacts and configuration files for potential secrets, developers should use the new fine-grained GitHub personal access tokens that were introduced two years ago instead of the classic ones. The new tokens enable users to choose the privilege levels and the specific repositories they provide access to.

"Creating the 'one ring to rule them all' is always a bad idea," the JFrog researchers wrote in their report. "We highly recommend using this feature, as we frequently encounter situations where a token providing ultimate access to the entire infrastructure gets leaked within a side project or temporary 'hello-world' application."

In addition, since 2021 GitHub tokens have used a new format that includes a ghp_ prefix and a checksum, making it easier for automated tools to detect them. Old GitHub tokens, which haven't been deprecated and are still around, are indistinguishable from SHA1 hashes. Because such hashes are common in source code and are not a security risk, scanners may skip old-format tokens along with them. Developers are strongly advised to switch to the new token format.
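The difference between the two formats can be sketched with a small byte-level scanner. This is an illustrative sketch rather than how JFrog's or GitHub's scanners work: the token length in the pattern (36 alphanumeric characters after the prefix) is an assumption, the function name `find_candidates` is hypothetical, and the checksum is not verified. Searching raw bytes means the same function works on binary artifacts such as .pyc files.

```python
import re

# Assumed shape of a new-style token: "ghp_" plus 36 alphanumeric characters.
NEW_STYLE = re.compile(rb"ghp_[A-Za-z0-9]{36}")

# An old-style token is 40 hex characters, the same shape as a SHA-1 digest,
# so a match here is only "ambiguous", not a confirmed secret.
HEX_40 = re.compile(rb"\b[0-9a-f]{40}\b")


def find_candidates(blob: bytes) -> dict:
    """Return token-like byte strings found in a text or binary blob."""
    return {
        "new_style": [m.decode("ascii") for m in NEW_STYLE.findall(blob)],
        "ambiguous_hex": [m.decode("ascii") for m in HEX_40.findall(blob)],
    }


# Made-up sample data: a fake token next to the SHA-1 digest of empty input.
sample = (
    b"\x00\x01auth=ghp_" + b"A" * 36 + b"\x00"
    b" commit da39a3ee5e6b4b0d3255bfef95601890afd80709"
)
result = find_candidates(sample)
print(result)
```

The prefixed token can be flagged with high confidence, while the 40-character hex string could equally be a commit hash, which is why scanners tend to ignore that shape.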



    Why Python Remains One Of The Best Programming Languages To Learn In 2024


    Python's continued relevance and flexibility make it one of the best programming languages to learn in 2024. Its simplicity, extensive libraries, and strong community support contribute to its widespread use across various domains. Whether you are a beginner looking to start your programming journey or an experienced developer seeking to expand your skill set, Python offers numerous opportunities. Its applications in web development, data science, automation, and more ensure that Python skills will remain in high demand. Invest your time in learning Python to stay ahead in the ever-evolving tech industry.

    Easy to Learn:

    Python's syntax is clean and straightforward, making it an ideal language for beginners. The readability of the code allows new programmers to grasp concepts without getting bogged down by complex syntax. This simplicity extends to experienced developers, who can write and maintain code efficiently.

    High-Level Language:

    Being a high-level language, Python abstracts many of the complex details of computer programming, such as memory management. This abstraction allows developers to focus on solving problems rather than worrying about the underlying hardware.

    Versatility Across Domains:

    Web Development:

    Python is a popular choice for web development, thanks to frameworks like Django and Flask. These frameworks streamline the process of building robust web applications. Django, in particular, follows the "batteries-included" philosophy, providing built-in features for various tasks like authentication, routing, and database management.

    Data Science and Machine Learning:

    Python is the de facto language for data science and machine learning. Libraries such as NumPy, pandas, and Scikit-learn facilitate data manipulation, analysis, and model building. TensorFlow and PyTorch, two leading frameworks for machine learning, further cement Python's position in this field.

    Automation and Scripting:

    Python excels in automation and scripting. Whether it's automating repetitive tasks or managing servers, Python scripts can handle it all. Tools like Ansible, written in Python, automate IT tasks such as configuration management, application deployment, and task automation.

    Strong Community and Ecosystem:

    Extensive Libraries and Frameworks:

    Python's rich ecosystem includes libraries and frameworks for virtually any task. From web development and data science to automation and artificial intelligence, Python has a library for it. This extensive collection reduces development time and effort, allowing developers to build applications more efficiently.

    Active Community:

    Python boasts an active and supportive community. This community-driven development ensures that Python evolves to meet modern needs. Numerous forums, blogs, and tutorials are available for learning and troubleshooting, making it easier for developers to get help and share knowledge.

    Regular Updates:

    The Python Software Foundation (PSF) ensures that Python is regularly updated with new features and improvements. Python 3.x, the current major version, continues to receive enhancements that improve performance and add functionality. This commitment to regular updates keeps Python relevant and modern.

    Integration and Compatibility:

    Cross-Platform Compatibility:

    Python runs on all major operating systems, including Windows, macOS, and Linux. This cross-platform compatibility makes it a versatile tool for developers who work in diverse environments. Python's ability to interface with other languages like C/C++ and Java further enhances its utility.

    Cloud and IoT Integration:

    Python is widely used in cloud computing and the Internet of Things (IoT). Cloud platforms like AWS, Google Cloud, and Azure support Python, allowing developers to build and deploy applications in the cloud. Python's lightweight nature and extensive libraries make it suitable for IoT development as well.

    Performance and Efficiency:

    Efficient Development:

    Python's simplicity and readability contribute to faster development times. Developers can write, debug, and maintain code more efficiently compared to other languages. This efficiency translates into cost savings and faster time-to-market for applications.

    Performance Optimization:

    While Python is not the fastest language in terms of execution speed, various tools and techniques can optimize performance. Alternative implementations with just-in-time (JIT) compilation, such as PyPy, can significantly improve execution speed for many workloads. Additionally, critical code sections can be rewritten in faster languages like C/C++ and integrated with Python.
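As a rough illustration of the gap between interpreted Python and compiled code, the sketch below times a hand-written loop against the built-in `sum()`, which is implemented in C in the standard CPython interpreter. The function name `python_loop_sum` and the data size are arbitrary choices for this example, and the timings will differ from machine to machine.

```python
import timeit


def python_loop_sum(values):
    """Add up values one at a time in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

# Run each version 20 times and report the total elapsed time.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"pure-Python loop: {loop_time:.4f}s")
print(f"built-in sum():   {builtin_time:.4f}s")
```

Both versions return the same result; only the place where the loop runs changes, which is the same idea behind moving hot code paths into a C/C++ extension.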

    Career Opportunities and Demand:

    High Demand for Python Developers:

    The demand for Python developers remains high across various industries. Companies value Python for its versatility and efficiency. Roles in web development, data science, machine learning, automation, and DevOps frequently require Python skills.

    Competitive Salaries:

    Python developers often enjoy competitive salaries. The high demand for their skills, combined with the language's widespread use, makes Python proficiency a valuable asset in the job market. Learning Python can open doors to lucrative career opportunities.

    Educational Resources:

    Abundant Learning Materials:

    Python's popularity means there is an abundance of learning materials available. Online courses, tutorials, books, and coding bootcamps offer comprehensive Python education. Platforms like Coursera, Udemy, and edX provide courses tailored to various skill levels, from beginner to advanced.

    Hands-On Projects:

    Hands-on projects are a crucial part of learning Python. The language's versatility allows learners to work on diverse projects, from web applications and data analysis to automation scripts and machine learning models. These projects help reinforce learning and build a portfolio.

    Conclusion:

    Python, a language whose development began in the late 1980s, continues to dominate the tech industry. Its relevance and versatility make it a top choice for developers, both beginners and experts. As this article has shown, its features, its wide range of applications, and the vibrant community that supports its growth are the reasons Python remains one of the best programming languages to learn in 2024.





