A Complete Collection of Data Science Free Courses – Part 1
Stanford NLP Group Releases Stanza: A Python NLP Toolkit
The Stanford NLP Group recently released Stanza, a new Python natural language processing toolkit. Stanza features both a language-agnostic, fully neural pipeline for text analysis (supporting 66 human languages) and a Python interface to Stanford's CoreNLP Java software.
Stanza version 1.0.0 is the next version of the library previously known as "stanfordnlp". Researchers and engineers building text analysis pipelines can use Stanza's tools for tasks such as tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named-entity recognition (NER). Compared to existing popular NLP toolkits that aid in similar tasks, Stanza aims to support more human languages, increase accuracy in text analysis tasks, and remove the need for any preprocessing by providing a unified framework for processing raw human language text. A table comparing Stanza's features with those of other NLP toolkits can be found in Stanza's associated research paper.
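As a rough sketch of what such a pipeline looks like in code, the example below walks a parsed document and collects each word's text, lemma, and universal part-of-speech tag. The helper names summarize and analyze are hypothetical. The analyze function assumes Stanza is installed and that an English model can be downloaded; the stand-in document at the bottom only mimics the sentences/words structure so the helper can be tried without Stanza.

```python
from types import SimpleNamespace


def summarize(doc):
    """Return (text, lemma, upos) for every word in a Stanza-style document."""
    return [
        (word.text, word.lemma, word.upos)
        for sentence in doc.sentences
        for word in sentence.words
    ]


def analyze(text, lang="en"):
    """Run the real pipeline (assumes `pip install stanza` and network access)."""
    import stanza  # imported lazily so summarize() works without Stanza

    stanza.download(lang)        # fetch the default models for this language
    nlp = stanza.Pipeline(lang)  # build the default neural pipeline
    return summarize(nlp(text))


# Stand-in object that only imitates the attributes summarize() reads.
fake_doc = SimpleNamespace(
    sentences=[
        SimpleNamespace(
            words=[SimpleNamespace(text="dogs", lemma="dog", upos="NOUN")]
        )
    ]
)
print(summarize(fake_doc))
```

With Stanza installed, calling analyze("Stanza supports many languages.") would run the same summary over a real parse.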
Stanza's pipeline is trained on 112 datasets, including many multilingual corpora like the Universal Dependencies (UD) treebanks. The UD project attempts to facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective by developing cross-linguistically consistent treebank annotation for over 70 languages. The fully neural architecture applied to Stanza generalizes well as it helps achieve competitive performance on all languages tested.
The research paper reports results from tests run on the UD treebanks and a multilingual NER dataset. On the UD treebanks, Stanza shows that its language-agnostic pipeline architecture can adapt to different languages, achieving the highest macro-averaged scores over 100 treebanks covering 66 languages.
On the NER component, Stanza achieves similar F1 scores to FLAIR (on 75% smaller NER models) and outperforms spaCy.
Stanza also offers a Python interface for accessing Stanford's Java CoreNLP software, which provides additional tools to NLP practitioners. Taking advantage of CoreNLP's existing server interface, Stanza adds a robust client that starts up the CoreNLP server automatically as a local process when the client is instantiated. The client communicates with the server through RESTful APIs.
In the future the team behind Stanza hopes to provide an interface for outside researchers to contribute their models, improve the computational efficiency, and extend the functionalities by implementing other processors. The team at spaCy quickly migrated spacy-stanza (which allows users to import Stanza models as spaCy pipelines) to work with this new API.
15 Of The Best Python Courses You Can Take For Free This Week
TL;DR: A wide range of Python courses are available for free on Udemy. Find some of the best examples here.
Python is a popular beginner-friendly option for anyone looking to take their first steps towards a career in coding. If you're interested in getting started, we can help.
You can find a wide range of beginner-friendly Python courses for free on Udemy. That might sound too good to be true, but we wouldn't lie to you. We've gone ahead and checked out the entire range of free online coding and programming courses on Udemy, and lined up a selection of standout free options to give you a gentle push in the right direction.
These are the best Python courses you can take for free:
We know you must be looking for a catch. It's true that these free courses do not include things like a certificate of completion or direct messaging with the instructor, but that shouldn't stop you from enrolling. You can still learn at your own pace with unlimited access to all the video content, so what's stopping you?
If you insist on needing a certificate of completion to stick on your CV, you can upgrade for a small fee.
Find the best free online Python courses on Udemy.
The Best New Features And Fixes In Python 3.12
The Python programming language releases new versions yearly, with a feature-locked beta release in the first half of the year and the final release toward the end of the year.
Python 3.12 beta 1 has just been released. Developers are encouraged to try out this latest version on non-production code, both to verify that it works with their programs and to get an idea of whether their code will benefit from its new features and performance enhancements.
Here's a rundown of the most significant new features in Python 3.12 and what they mean for Python developers.
Better error messages
Error messages have been getting more precise (exact positions in lines) and more detailed (better suggestions about what might be wrong) in recent Python versions. Python 3.12 brings additional enhancements.
The widely used Linux profiler tool perf works with Python, but only returns information about what's happening at the C level in the Python runtime. Information about actual Python program functions doesn't show up.
Python 3.12 enables an opt-in mode to allow perf to harvest details about Python programs. The opt-in can be done at the environment level or inside a Python program with the sys.activate_stack_trampoline function.
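As a rough illustration of the in-program opt-in, the sketch below wraps sys.activate_stack_trampoline("perf") in a context manager. The helper name perf_trampoline is hypothetical, and the guard assumes that on interpreters or platforms without perf support the function is either missing or raises an exception. Treat it as a sketch, not a recommended pattern.

```python
import contextlib
import sys


@contextlib.contextmanager
def perf_trampoline():
    """Turn on perf support for the enclosed block, if this interpreter has it."""
    activate = getattr(sys, "activate_stack_trampoline", None)  # Python 3.12+
    active = False
    if activate is not None:
        try:
            activate("perf")
            active = True
        except Exception:
            # Assumption: unsupported builds or platforms raise here.
            active = False
    try:
        yield active
    finally:
        if active:
            sys.deactivate_stack_trampoline()


with perf_trampoline() as enabled:
    # Work done here shows up with Python function names in perf, if enabled.
    total = sum(i * i for i in range(1000))
print("perf support active:", enabled)
```

The environment-level opt-in is the PYTHONPERFSUPPORT=1 environment variable or the -X perf interpreter option. Either way, the program still has to be run under perf itself for anything to be recorded.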
Faster debug/profile monitoring
Running a profiler or attaching a debugger to a Python program gives you visibility and insight into what the program's doing. It also comes at a performance cost. Programs can run as much as an order of magnitude slower when run through a debugger or profiler.
PEP 669 provides hooks for code object events that profilers and debuggers can attach to, such as the start or end of a function. A callback function could be registered by a tool to fire whenever such an event is triggered. There will still be a performance hit for profiling or debugging, but it'll be greatly reduced.
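A minimal sketch of such a callback, using the sys.monitoring namespace that PEP 669 adds in Python 3.12, might look like the following. The helper name count_calls is hypothetical. The sketch assumes the profiler tool slot is free, and on older interpreters (where sys.monitoring does not exist) it simply runs the function and returns None.

```python
import sys


def count_calls(func, *args):
    """Count Python function starts observed while func(*args) runs."""
    mon = getattr(sys, "monitoring", None)  # present on Python 3.12+
    if mon is None or mon.get_tool(mon.PROFILER_ID) is not None:
        func(*args)  # monitoring unavailable or slot taken: just run the code
        return None

    tool = mon.PROFILER_ID
    counts = {}

    def on_start(code, instruction_offset):
        # Called each time a Python function begins executing.
        counts[code.co_name] = counts.get(code.co_name, 0) + 1

    mon.use_tool_id(tool, "call-counter")
    try:
        mon.register_callback(tool, mon.events.PY_START, on_start)
        mon.set_events(tool, mon.events.PY_START)
        func(*args)
    finally:
        # Switch the event off and release the slot so other tools can use it.
        mon.set_events(tool, mon.events.NO_EVENTS)
        mon.register_callback(tool, mon.events.PY_START, None)
        mon.free_tool_id(tool)
    return counts


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(count_calls(fib, 10))
```

Only the PY_START event is enabled here, which is part of why the overhead can stay low: the interpreter does no extra work for events that no tool has asked for.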
Buffer protocol dunders
Python's buffer protocol provides a way to get access to the raw region of memory wrapped by many Python objects, like bytes or bytearray. But most interactions with the buffer protocol happen through C extensions. Up till now, it hasn't been possible for Python code to know whether a given object supports the buffer protocol, or to type-annotate code as being compatible with the protocol.
PEP 688 implements new dunder methods for objects that allow Python code to work with the buffer protocol. This makes it easier to write objects in Python that expose their data buffers, instead of having to write those objects in C. The __buffer__ method can be used for code that allocates new memory or simply accesses existing memory; it returns a memoryview object. The __release_buffer__ method is used to release the memory used for the buffer.
Right now the PEP 688 methods don't have a way to indicate if a given buffer is read-only or not—which is useful if you're dealing with data for an immutable object like bytes. But the door is open to add that feature if it's needed.
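To make the two methods concrete, here is a small sketch of a pure-Python class that exposes its storage through them. The class name Packet and the helper first_byte are hypothetical. The final call is guarded because memoryview() only honors a Python-level __buffer__ method on Python 3.12 and later.

```python
import sys


class Packet:
    """Hypothetical container that exposes its bytes via the buffer protocol."""

    def __init__(self, payload: bytes):
        self._data = bytearray(payload)

    def __buffer__(self, flags: int) -> memoryview:
        # Hand out a view over existing memory; no copy is made.
        return memoryview(self._data)

    def __release_buffer__(self, view: memoryview) -> None:
        # Called when the consumer is done with the view.
        view.release()


def first_byte(obj) -> int:
    """Read the first byte of anything that supports the buffer protocol."""
    with memoryview(obj) as view:
        return view[0]


print(first_byte(b"hi"))  # bytes has always supported the protocol
if sys.version_info >= (3, 12):
    print(first_byte(Packet(b"hi")))  # works through __buffer__ on 3.12+
```

On Python 3.12 and later, collections.abc.Buffer can be used in annotations and isinstance checks for objects like this.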
Typing improvements
Python's type-hinting syntax, added in Python 3.5, allows linting tools to catch a wide variety of errors ahead of time. With each new version, typing in Python gains features to cover a broader and more granular range of use cases.
TypedDict
In Python 3.12, you can use a TypedDict as a source of types to hint keyword arguments used in a function. The Unpack variadic generic, introduced in version 3.11, is used for this. Here's an example from the relevant PEP:
    from typing import TypedDict, Unpack

    class Movie(TypedDict):
        name: str
        year: int

    def foo(**kwargs: Unpack[Movie]) -> None: ...

Here, foo can take in keyword arguments of names and types that match the contents of Movie (name: str and year: int). One scenario where this is useful is type-hinting functions that take optional keyword-only arguments with no default values.
Type parameter syntax
The type parameter syntax provides a cleaner way to specify types in a generic class, function, or type alias. Here's an example taken from the PEP:
    # the old method
    from typing import TypeVar
    _T = TypeVar("_T")
    def func(a: _T, b: _T) -> _T: ...

    # the new type parameter method
    def func[T](a: T, b: T) -> T: ...

With the new method, one doesn't need to import TypeVar. One can just use the func[T] syntax to indicate generic type references. It's also possible to specify type bounds, such as whether a given type is one of a group of types, although such types can't themselves be generic. An example is func[T: (str, int)].
Finally, the new @override decorator can be used to flag methods that override methods in a parent class. This lets a type checker verify that any changes made to the parent during refactoring (renaming or deleting) are also reflected in its children.
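A small sketch of the decorator in use follows. The class names are made up for illustration, and the fallback branch is an assumption added so the snippet also runs on interpreters older than 3.12, where typing.override does not exist.

```python
try:
    from typing import override  # available in Python 3.12+
except ImportError:
    def override(method):
        """Fallback for older interpreters: a marker that does nothing."""
        return method


class Greeter:
    def greet(self) -> str:
        return "hello"


class CasualGreeter(Greeter):
    @override
    def greet(self) -> str:  # a type checker confirms Greeter.greet exists
        return "hi"


print(CasualGreeter().greet())
```

The decorator does not enforce anything at run time. If Greeter.greet were renamed, the program would still run, and it is a static type checker that would report that CasualGreeter.greet no longer overrides anything.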
Performance improvements
With Python 3.11, a number of allied projects got underway to improve Python's performance by leaps and bounds with each new version. The performance improvements in Python 3.12 aren't as dramatic, but they're still noteworthy.
Comprehension inlining
Comprehensions, a syntax that lets you quickly construct lists, dictionaries, and sets, are now constructed "inline" rather than by way of temporary objects. The speedup for this has been clocked at around 11% for a real-world case and up to twice as fast for a micro-benchmark.
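One way to see the change on a given interpreter is to look for nested code objects inside a function that uses a comprehension. The sketch below uses hypothetical helper names. Before 3.12 a list comprehension is compiled as a separate nested function, so a nested code object should appear; on 3.12 and later the list should be empty.

```python
import sys
import types


def doubled(values):
    # A plain list comprehension inside a function body.
    return [v * 2 for v in values]


def nested_code_names(func):
    """Names of code objects embedded in func's constants."""
    return [
        const.co_name
        for const in func.__code__.co_consts
        if isinstance(const, types.CodeType)
    ]


print(sys.version_info[:2], nested_code_names(doubled))
```

The result of the comprehension is the same either way. Only the way the interpreter builds it differs.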
Immortal objects
Every object in Python has a reference count that tracks how many times other objects refer to it, including built-in objects like None. PEP 683 allows objects to be treated as "immortal," so that they never have their reference count changed.
Making objects immortal has other powerful implications for Python in the long run. It makes it easier to implement multicore scaling, and to implement other optimizations (like avoiding copy-on-write) that would have been hard to implement before.
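The effect can be observed with sys.getrefcount. In the sketch below (the helper name refcount_delta is hypothetical), an ordinary object gains one reference per list slot that points at it, while on Python 3.12 and later the count reported for None should not move at all.

```python
import sys


def refcount_delta(obj, copies=1000):
    """How much obj's reference count grows when `copies` new references exist."""
    before = sys.getrefcount(obj)
    refs = [obj] * copies  # create `copies` additional references to obj
    after = sys.getrefcount(obj)
    del refs
    return after - before


print("ordinary object:", refcount_delta(object()))
print("None:", refcount_delta(None))
```

On interpreters older than 3.12 both lines report the same growth, because None is reference-counted like any other object there.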
Smaller object sizes
With earlier versions of Python, the base size of an object was 208 bytes. Objects have been refactored multiple times over the last few versions of Python to make them smaller, which doesn't just allow more objects to live in memory but helps with cache locality. As of Python 3.12, the base size of an object is now 96 bytes, less than half of what it used to be.
Subinterpreters
A long-awaited feature for Python is subinterpreters: the ability to have multiple instances of an interpreter, each with its own GIL, running side-by-side within a single Python process. This would be a big step toward better parallelism in Python.
However, version 3.12 only includes the CPython internals to make this possible. There's still no end-user interface to subinterpreters. A standard library module, interpreters, is intended to do this, but it's now slated to appear in Python 3.13.
Additional changes
Python 3.12 rolls out countless other little changes in addition to the big ones discussed so far. Here's a quick look.
Unstable API
A key ongoing project has been the refactoring of CPython's internals, especially its API sets, so that fewer of CPython's low-level functions need to be exposed. Python 3.12 introduced the unstable API tier, an API set marked specifically as being likely to change between versions. It's not intended to be used by most C extensions, but by low-level tools such as debuggers or JIT compilers.
Standard library deprecations and removals
With version 3.11, a number of standard library modules long known to be obsolete (so-called dead batteries) got flagged for removal as of Python 3.12 and 3.13. In version 3.12, one of the biggest removals was distutils, which has long been obviated by setuptools. Other modules removed in this version were asynchat, asyncore (both replaced by asyncio), and smtpd.
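A quick way to check which of these modules a given interpreter can still import is sketched below. The helper name missing_modules is hypothetical. Note that an installed setuptools may provide its own distutils replacement, so distutils can still appear importable on 3.12 in some environments.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules the article lists as removed in 3.12, plus asyncio for comparison.
candidates = ["distutils", "asynchat", "asyncore", "smtpd", "asyncio"]
print(missing_modules(candidates))
```

Running this under the interpreter a project targets gives a quick inventory of imports that need replacing before an upgrade.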
Garbage collection
Python's garbage collection mechanism (GC) used to be able to run whenever an object was allocated. As of Python 3.12, the GC runs only on the "eval breaker" mechanism in the Python bytecode loop, that is, between executing one bytecode and another. It also runs whenever CPython's signal-handler-checking mechanism is invoked. This makes it possible to run GC periodically on a long-running call to a C extension outside the runtime.
Copyright © 2023 IDG Communications, Inc.