Eingaben
GraphRAG unterstützt mehrere Eingabeformate, um die Aufnahme Ihrer Daten zu vereinfachen. Die Mechanik und die für Eingabedateien und Text-Chunking verfügbaren Funktionen werden hier besprochen.
Input Loading and Schema
Alle Eingabeformate werden innerhalb von GraphRAG geladen und als documents DataFrame an die Indizierungspipeline übergeben. Dieser DataFrame hat eine Zeile für jedes Dokument, das ein gemeinsames Spaltenschema verwendet.
| name | type | description |
|---|---|---|
| id | str | ID des Dokuments. Diese wird mithilfe eines Hashs des Textinhalts generiert, um die Stabilität über verschiedene Durchläufe hinweg zu gewährleisten. |
| text | str | Der vollständige Text des Dokuments. |
| title | str | Name des Dokuments. Einige Formate erlauben die Konfiguration dieses Werts. |
| creation_date | str | Das Erstellungsdatum des Dokuments, dargestellt als ISO8601-String. Dies wird aus dem Quell-Dateisystem extrahiert. |
| metadata | dict | Optionale zusätzliche Dokumentenmetadaten. Mehr Details unten. |
Siehe auch die Dokumentation zu den Ausgaben für das Schema der finalen Dokumententabelle, die nach Abschluss der Pipeline in Parquet gespeichert wird.
Bring-your-own DataFrame
Ab Version 2.6.0 ermöglicht die Indizierungs-API-Methode von GraphRAG, Ihren eigenen pandas DataFrame zu übergeben und die gesamte Eingabeladung/-analyse, die im nächsten Abschnitt beschrieben wird, zu umgehen. Dies ist praktisch, wenn Ihr Inhalt in einem Format oder Speicherort vorliegt, den wir nicht sofort unterstützen. Sie müssen sicherstellen, dass Ihr Eingabe-DataFrame dem oben beschriebenen Schema entspricht. Alle später beschriebenen Chunking-Verhaltensweisen werden exakt gleich ablaufen.
Formats
Wir unterstützen standardmäßig drei Dateiformate. Dies deckt die überwiegende Mehrheit der von uns angetroffenen Anwendungsfälle ab. Wenn Sie ein anderes Format haben, empfehlen wir, ein Skript zu schreiben, um es in eines dieser Formate zu konvertieren, da diese weit verbreitet sind und von vielen Werkzeugen und Bibliotheken unterstützt werden.
Plain Text
Plain-Text-Dateien (normalerweise mit der Dateierweiterung .txt). Bei Plain-Text-Dateien importieren wir den gesamten Dateiinhalt als text-Feld, und der title ist immer der Dateiname.
Comma-delimited
CSV-Dateien (normalerweise mit der Endung .csv). Diese werden mit der read_csv-Methode von pandas mit Standardoptionen geladen. Jede Zeile in einer CSV-Datei wird als einzelnes Dokument behandelt. Wenn Sie mehrere CSV-Dateien in Ihrem Eingabeordner haben, werden diese zu einem einzigen resultierenden documents DataFrame verkettet.
Mit dem CSV-Format können Sie die Spalte text_column und title_column konfigurieren, wenn Ihre Daten strukturierte Inhalte haben, die Sie bevorzugen. Wenn Sie diese nicht im input-Block Ihrer settings.yaml konfigurieren, ist der Titel der Dateiname, wie im obigen Schema beschrieben. Die text_column wird als "text" in Ihrer Datei angenommen, wenn sie nicht speziell konfiguriert ist. Wir suchen auch nach einer "id"-Spalte und verwenden diese, falls vorhanden, andernfalls wird die ID wie oben beschrieben generiert.
JSON
JSON-Dateien (normalerweise mit der Endung .json) enthalten strukturierte Objekte. Diese werden mit der json.loads-Methode von Python geladen, daher müssen Ihre Dateien ordnungsgemäß konform sein. JSON-Dateien können ein einzelnes Objekt in der Datei enthalten *oder* die Datei kann ein Array von Objekten an der Wurzel enthalten. Wir prüfen und handhaben beide Fälle. Wie bei CSV werden mehrere Dateien zu einer finalen Tabelle verkettet, und die Konfigurationsoptionen text_column und title_column werden auf die Eigenschaften jedes geladenen Objekts angewendet. Beachten Sie, dass das spezialisierte jsonl-Format, das von einigen Bibliotheken erzeugt wird (ein vollständiges JSON-Objekt pro Zeile, nicht in einem Array), derzeit nicht unterstützt wird.
Metadata
Bei den strukturierten Dateiformaten (CSV und JSON) können Sie beliebig viele Spalten konfigurieren, die einem gespeicherten metadata-Feld im DataFrame hinzugefügt werden. Dies wird konfiguriert, indem eine Liste von Spaltennamen angegeben wird, die gesammelt werden sollen. Wenn dies konfiguriert ist, enthält die Ausgabe-metadata-Spalte ein Wörterbuch mit einem Schlüssel für jede Spalte und dem Wert der Spalte für dieses Dokument. Diese Metadaten können später in der GraphRAG-Pipeline optional verwendet werden.
Beispiel
software.csv
text,title,tag
My first program,Hello World,tutorial
An early space shooter game,Space Invaders,arcade
settings.yaml
Documents DataFrame
| id | title | text | creation_date | metadata |
|---|---|---|---|---|
| (generated from text) | Hello World | My first program | (create date of software.csv) | { "title": "Hello World", "tag": "tutorial" } |
| (generated from text) | Space Invaders | An early space shooter game | (create date of software.csv) | { "title": "Space Invaders", "tag": "arcade" } |
Chunking and Metadata
Wie auf der Seite zum Standard-Datenfluss beschrieben, werden Dokumente für die Verarbeitung in kleinere "Text Units" gechukt. Dies geschieht, weil die Größe des Dokumenteninhalts oft das verfügbare Kontextfenster für ein bestimmtes Sprachmodell überschreitet. Es gibt eine Handvoll Einstellungen, die Sie für dieses Chunking anpassen können, am relevantesten sind chunk_size und overlap. Wir unterstützen jetzt auch ein Metadatenverarbeitungsschema, das die Indizierungsergebnisse für einige Anwendungsfälle verbessern kann. Wir werden diese Funktion hier im Detail beschreiben.
Stellen Sie sich folgendes Szenario vor: Sie indizieren eine Sammlung von Nachrichtenartikeln. Jeder Artikeltext beginnt mit einer Überschrift und einem Autor und fährt dann mit dem Inhalt fort. Wenn Dokumente gechukt werden, werden sie gleichmäßig gemäß Ihrer konfigurierten Chunk-Größe aufgeteilt. Mit anderen Worten, die ersten n Token werden in eine Texteinheit gelesen, dann die nächsten n, bis zum Ende des Inhalts. Das bedeutet, dass der vordere Teil am Anfang des Dokuments (wie die Überschrift und der Autor in diesem Beispiel) nicht in jeden Chunk kopiert wird. Er existiert nur im ersten Chunk. Wenn wir diese Chunks später zur Zusammenfassung abrufen, fehlen möglicherweise gemeinsame Informationen über das Quelldokument, die dem Modell immer zur Verfügung gestellt werden sollten. Wir haben Konfigurationsoptionen, um wiederholte Inhalte in jede Texteinheit zu kopieren, um dieses Problem zu lösen.
Input Config
Wie oben beschrieben, können Sie beim Importieren von Dokumenten eine Liste von metadata-Spalten angeben, die mit jeder Zeile einbezogen werden sollen. Dies muss konfiguriert werden, damit das Kopieren pro Chunk funktioniert.
Chunking Config
Als Nächstes muss der chunks-Block dem Chunker mitteilen, wie mit diesen Metadaten beim Erstellen von Texteinheiten umgegangen werden soll. Standardmäßig werden sie ignoriert. Wir haben zwei Einstellungen, um sie einzubeziehen:
prepend_metadata. Dies weist den Importeur an, den Inhalt dermetadata-Spalte für jede Zeile am Anfang jeder einzelnen Texteinheit zu kopieren. Diese Metadaten werden als Schlüssel:Wert-Paare auf neuen Zeilen kopiert.chunk_size_includes_metadata: Dies teilt dem Chunker mit, wie die Chunk-Größe bei Einbeziehung von Metadaten berechnet werden soll. Standardmäßig erstellen wir die Texteinheiten unter Verwendung Ihrer angegebenenchunk_size, und dann fügen wir die Metadaten hinzu. Dies bedeutet, dass die endgültigen Längen der Texteinheiten länger als Ihre konfiguriertechunk_sizesein können und je nach Länge der Metadaten für jedes Dokument variieren. Wenn diese EinstellungTrueist, berechnen wir den Rohtext unter Berücksichtigung des Rests nach der Messung der Metadatenlänge, sodass die resultierenden Texteinheiten immer Ihrer konfiguriertenchunk_sizeentsprechen.
Examples
Die folgenden Beispiele sollen verdeutlichen, wie die Chunking-Konfiguration und das Voranstellen von Metadaten für jedes Dateiformat funktionieren. Beachten Sie, dass wir hier die Wortanzahl als "Token" für die Illustration verwenden, aber Sprachmodell-Token sind nicht gleichbedeutend mit Wörtern.
Text files
Dieses Beispiel verwendet zwei einzelne Nachrichtenartikel-Textdateien.
--
File: US to lift most federal COVID-19 vaccine mandates.txt
Content
WASHINGTON (AP) The Biden administration will end most of the last remaining federal COVID-19 vaccine requirements next week when the national public health emergency for the coronavirus ends, the White House said Monday. Vaccine requirements for federal workers and federal contractors, as well as foreign air travelers to the U.S., will end May 11. The government is also beginning the process of lifting shot requirements for Head Start educators, healthcare workers, and noncitizens at U.S. land borders. The requirements are among the last vestiges of some of the more coercive measures taken by the federal government to promote vaccination as the deadly virus raged, and their end marks the latest display of how President Joe Biden's administration is moving to treat COVID-19 as a routine, endemic illness. "While I believe that these vaccine mandates had a tremendous beneficial impact, we are now at a point where we think that it makes a lot of sense to pull these requirements down," White House COVID-19 coordinator Dr. Ashish Jha told The Associated Press on Monday.
--
File: NY lawmakers begin debating budget 1 month after due date.txt
Content
ALBANY, N.Y. (AP) New York lawmakers began voting Monday on a $229 billion state budget due a month ago that would raise the minimum wage, crack down on illicit pot shops and ban gas stoves and furnaces in new buildings. Negotiations among Gov. Kathy Hochul and her fellow Democrats in control of the Legislature dragged on past the April 1 budget deadline, largely because of disagreements over changes to the bail law and other policy proposals included in the spending plan. Floor debates on some budget bills began Monday. State Senate Majority Leader Andrea Stewart-Cousins said she expected voting to be wrapped up Tuesday for a budget she said contains "significant wins" for New Yorkers. "I would have liked to have done this sooner. I think we would all agree to that," Cousins told reporters before voting began. "This has been a very policy-laden budget and a lot of the policies had to parsed through." Hochul was able to push through a change to the bail law that will eliminate the standard that requires judges to prescribe the "least restrictive" means to ensure defendants return to court. Hochul said judges needed the extra discretion. Some liberal lawmakers argued that it would undercut the sweeping bail reforms approved in 2019 and result in more people with low incomes and people of color in pretrial detention. Here are some other policy provisions that will be included in the budget, according to state officials. The minimum wage would be raised to $17 in New York City and some of its suburbs and $16 in the rest of the state by 2026. That's up from $15 in the city and $14.20 upstate.
--
settings.yaml
input:
file_type: text
metadata: [title]
chunks:
size: 100
overlap: 0
prepend_metadata: true
chunk_size_includes_metadata: false
Documents DataFrame
| id | title | text | creation_date | metadata |
|---|---|---|---|---|
| (generated from text) | US to lift most federal COVID-19 vaccine mandates.txt | (full content of text file) | (create date of article txt file) | { "title": "US to lift most federal COVID-19 vaccine mandates.txt" } |
| (generated from text) | NY lawmakers begin debating budget 1 month after due date.txt | (full content of text file) | (create date of article txt file) | { "title": "NY lawmakers begin debating budget 1 month after due date.txt" } |
Raw Text Chunks
| content | length |
|---|---|
| title: US to lift most federal COVID-19 vaccine mandates.txt WASHINGTON (AP) The Biden administration will end most of the last remaining federal COVID-19 vaccine requirements next week when the national public health emergency for the coronavirus ends, the White House said Monday. Vaccine requirements for federal workers and federal contractors, as well as foreign air travelers to the U.S., will end May 11. The government is also beginning the process of lifting shot requirements for Head Start educators, healthcare workers, and noncitizens at U.S. land borders. The requirements are among the last vestiges of some of the more coercive measures taken by the federal government to promote vaccination as |
109 |
| title: US to lift most federal COVID-19 vaccine mandates.txt the deadly virus raged, and their end marks the latest display of how President Joe Biden's administration is moving to treat COVID-19 as a routine, endemic illness. "While I believe that these vaccine mandates had a tremendous beneficial impact, we are now at a point where we think that it makes a lot of sense to pull these requirements down," White House COVID-19 coordinator Dr. Ashish Jha told The Associated Press on Monday. |
82 |
| title: NY lawmakers begin debating budget 1 month after due date.txt ALBANY, N.Y. (AP) New York lawmakers began voting Monday on a $229 billion state budget due a month ago that would raise the minimum wage, crack down on illicit pot shops and ban gas stoves and furnaces in new buildings. Negotiations among Gov. Kathy Hochul and her fellow Democrats in control of the Legislature dragged on past the April 1 budget deadline, largely because of disagreements over changes to the bail law and other policy proposals included in the spending plan. Floor debates on some budget bills began Monday. State Senate Majority Leader Andrea Stewart-Cousins said she expected voting to |
111 |
| title: NY lawmakers begin debating budget 1 month after due date.txt be wrapped up Tuesday for a budget she said contains "significant wins" for New Yorkers. "I would have liked to have done this sooner. I think we would all agree to that," Cousins told reporters before voting began. "This has been a very policy-laden budget and a lot of the policies had to parsed through." Hochul was able to push through a change to the bail law that will eliminate the standard that requires judges to prescribe the "least restrictive" means to ensure defendants return to court. Hochul said judges needed the extra discretion. Some liberal lawmakers argued that it |
111 |
| title: NY lawmakers begin debating budget 1 month after due date.txt would undercut the sweeping bail reforms approved in 2019 and result in more people with low incomes and people of color in pretrial detention. Here are some other policy provisions that will be included in the budget, according to state officials. The minimum wage would be raised to $17 in New York City and some of its suburbs and $16 in the rest of the state by 2026. That's up from $15 in the city and $14.20 upstate. |
89 |
In diesem Beispiel sehen wir, dass die beiden Eingabedokumente in fünf Ausgabe-Text-Chunks aufgeteilt wurden. Der Titel (Dateiname) jedes Dokuments wird vorangestellt, ist aber nicht in die berechnete Chunk-Größe einbezogen. Beachten Sie auch, dass der letzte Text-Chunk für jedes Dokument normalerweise kleiner ist, da er die letzten Token enthält.
CSV files
Dieses Beispiel verwendet eine einzige CSV-Datei mit denselben beiden Artikeln als Zeilen (beachten Sie, dass der Textinhalt für die tatsächliche CSV-Verwendung nicht korrekt maskiert ist).
--
File: articles.csv
Content
headline,article
US to lift most federal COVID-19 vaccine mandates,WASHINGTON (AP) The Biden administration will end most of the last remaining federal COVID-19 vaccine requirements next week when the national public health emergency for the coronavirus ends, the White House said Monday. Vaccine requirements for federal workers and federal contractors, as well as foreign air travelers to the U.S., will end May 11. The government is also beginning the process of lifting shot requirements for Head Start educators, healthcare workers, and noncitizens at U.S. land borders. The requirements are among the last vestiges of some of the more coercive measures taken by the federal government to promote vaccination as the deadly virus raged, and their end marks the latest display of how President Joe Biden's administration is moving to treat COVID-19 as a routine, endemic illness. "While I believe that these vaccine mandates had a tremendous beneficial impact, we are now at a point where we think that it makes a lot of sense to pull these requirements down," White House COVID-19 coordinator Dr. Ashish Jha told The Associated Press on Monday.
NY lawmakers begin debating budget 1 month after due date,ALBANY, N.Y. (AP) New York lawmakers began voting Monday on a $229 billion state budget due a month ago that would raise the minimum wage, crack down on illicit pot shops and ban gas stoves and furnaces in new buildings. Negotiations among Gov. Kathy Hochul and her fellow Democrats in control of the Legislature dragged on past the April 1 budget deadline, largely because of disagreements over changes to the bail law and other policy proposals included in the spending plan. Floor debates on some budget bills began Monday. State Senate Majority Leader Andrea Stewart-Cousins said she expected voting to be wrapped up Tuesday for a budget she said contains "significant wins" for New Yorkers. "I would have liked to have done this sooner. I think we would all agree to that," Cousins told reporters before voting began. "This has been a very policy-laden budget and a lot of the policies had to parsed through." Hochul was able to push through a change to the bail law that will eliminate the standard that requires judges to prescribe the "least restrictive" means to ensure defendants return to court. Hochul said judges needed the extra discretion. Some liberal lawmakers argued that it would undercut the sweeping bail reforms approved in 2019 and result in more people with low incomes and people of color in pretrial detention. Here are some other policy provisions that will be included in the budget, according to state officials. The minimum wage would be raised to $17 in New York City and some of its suburbs and $16 in the rest of the state by 2026. That's up from $15 in the city and $14.20 upstate.
--
settings.yaml
input:
file_type: csv
title_column: headline
text_column: article
metadata: [headline]
chunks:
size: 50
overlap: 5
prepend_metadata: true
chunk_size_includes_metadata: true
Documents DataFrame
| id | title | text | creation_date | metadata |
|---|---|---|---|---|
| (generated from text) | US to lift most federal COVID-19 vaccine mandates | (article column content) | (create date of articles.csv) | { "headline": "US to lift most federal COVID-19 vaccine mandates" } |
| (generated from text) | NY lawmakers begin debating budget 1 month after due date | (article column content) | (create date of articles.csv) | { "headline": "NY lawmakers begin debating budget 1 month after due date" } |
Raw Text Chunks
| content | length |
|---|---|
| title: US to lift most federal COVID-19 vaccine mandates WASHINGTON (AP) The Biden administration will end most of the last remaining federal COVID-19 vaccine requirements next week when the national public health emergency for the coronavirus ends, the White House said Monday. Vaccine requirements for federal workers and federal contractors, |
50 |
| title: US to lift most federal COVID-19 vaccine mandates federal workers and federal contractors as well as foreign air travelers to the U.S., will end May 11. The government is also beginning the process of lifting shot requirements for Head Start educators, healthcare workers, and noncitizens at U.S. land borders. |
50 |
| title: US to lift most federal COVID-19 vaccine mandates noncitizens at U.S. land borders. The requirements are among the last vestiges of some of the more coercive measures taken by the federal government to promote vaccination as the deadly virus raged, and their end marks the latest display of how |
50 |
| title: US to lift most federal COVID-19 vaccine mandates the latest display of how President Joe Biden's administration is moving to treat COVID-19 as a routine, endemic illness. "While I believe that these vaccine mandates had a tremendous beneficial impact, we are now at a point where we think that |
50 |
| title: US to lift most federal COVID-19 vaccine mandates point where we think that it makes a lot of sense to pull these requirements down," White House COVID-19 coordinator Dr. Ashish Jha told The Associated Press on Monday. |
38 |
| title: NY lawmakers begin debating budget 1 month after due date ALBANY, N.Y. (AP) New York lawmakers began voting Monday on a $229 billion state budget due a month ago that would raise the minimum wage, crack down on illicit pot shops and ban gas stoves and furnaces in new |
50 |
| title: NY lawmakers begin debating budget 1 month after due date stoves and furnaces in new buildings. Negotiations among Gov. Kathy Hochul and her fellow Democrats in control of the Legislature dragged on past the April 1 budget deadline, largely because of disagreements over changes to the bail law and |
50 |
| title: NY lawmakers begin debating budget 1 month after due date to the bail law and other policy proposals included in the spending plan. Floor debates on some budget bills began Monday. State Senate Majority Leader Andrea Stewart-Cousins said she expected voting to be wrapped up Tuesday for a budget |
50 |
| title: NY lawmakers begin debating budget 1 month after due date up Tuesday for a budget she said contains "significant wins" for New Yorkers. "I would have liked to have done this sooner. I think we would all agree to that," Cousins told reporters before voting began. "This has been |
50 |
| title: NY lawmakers begin debating budget 1 month after due date voting began. "This has been a very policy-laden budget and a lot of the policies had to parsed through." Hochul was able to push through a change to the bail law that will eliminate the standard that requires judges |
50 |
| title: NY lawmakers begin debating budget 1 month after due date the standard that requires judges to prescribe the "least restrictive" means to ensure defendants return to court. Hochul said judges needed the extra discretion. Some liberal lawmakers argued that it would undercut the sweeping bail reforms approved in 2019 |
50 |
| title: NY lawmakers begin debating budget 1 month after due date bail reforms approved in 2019 and result in more people with low incomes and people of color in pretrial detention. Here are some other policy provisions that will be included in the budget, according to state officials. The minimum |
50 |
| title: NY lawmakers begin debating budget 1 month after due date to state officials. The minimum wage would be raised to $17 in be raised to $17 in New York City and some of its suburbs and $16 in the rest of the state by 2026. That's up from $15 |
50 |
| title: NY lawmakers begin debating budget 1 month after due date 2026. That's up from $15 in the city and $14.20 upstate. |
22 |
In diesem Beispiel sehen wir, dass die beiden Eingabedokumente in vierzehn Ausgabe-Text-Chunks aufgeteilt wurden. Der Titel (Überschrift) jedes Dokuments wird vorangestellt und ist in die berechnete Chunk-Größe einbezogen, sodass jeder Chunk der konfigurierten Chunk-Größe entspricht (außer dem letzten für jedes Dokument). Wir haben auch eine Überlappung in diesen Text-Chunks konfiguriert, sodass die letzten fünf Token geteilt werden. Warum sollten Sie Überlappung in Ihren Text-Chunks verwenden? Betrachten Sie, dass beim Aufteilen von Dokumenten basierend auf Token hochwahrscheinlich Sätze oder sogar verwandte Konzepte in separate Chunks aufgeteilt werden. Jeder Text-Chunk wird separat vom Sprachmodell verarbeitet, was zu unvollständigen "Ideen" an den Grenzen des Chunks führen kann. Überlappung stellt sicher, dass diese geteilten Konzepte mindestens in einem der Chunks vollständig enthalten sind.
JSON files
Dieses letzte Beispiel verwendet eine JSON-Datei für jeden der beiden gleichen Artikel. In diesem Beispiel legen wir die Objektfelder zum Lesen fest, fügen aber keine Metadaten zu den Text-Chunks hinzu.
--
File: article1.json
Content
{
"headline": "US to lift most federal COVID-19 vaccine mandates",
"content": "WASHINGTON (AP) The Biden administration will end most of the last remaining federal COVID-19 vaccine requirements next week when the national public health emergency for the coronavirus ends, the White House said Monday. Vaccine requirements for federal workers and federal contractors, as well as foreign air travelers to the U.S., will end May 11. The government is also beginning the process of lifting shot requirements for Head Start educators, healthcare workers, and noncitizens at U.S. land borders. The requirements are among the last vestiges of some of the more coercive measures taken by the federal government to promote vaccination as the deadly virus raged, and their end marks the latest display of how President Joe Biden's administration is moving to treat COVID-19 as a routine, endemic illness. "While I believe that these vaccine mandates had a tremendous beneficial impact, we are now at a point where we think that it makes a lot of sense to pull these requirements down," White House COVID-19 coordinator Dr. Ashish Jha told The Associated Press on Monday."
}
File: article2.json
Content
{
"headline": "NY lawmakers begin debating budget 1 month after due date",
"content": "ALBANY, N.Y. (AP) New York lawmakers began voting Monday on a $229 billion state budget due a month ago that would raise the minimum wage, crack down on illicit pot shops and ban gas stoves and furnaces in new buildings. Negotiations among Gov. Kathy Hochul and her fellow Democrats in control of the Legislature dragged on past the April 1 budget deadline, largely because of disagreements over changes to the bail law and other policy proposals included in the spending plan. Floor debates on some budget bills began Monday. State Senate Majority Leader Andrea Stewart-Cousins said she expected voting to be wrapped up Tuesday for a budget she said contains "significant wins" for New Yorkers. "I would have liked to have done this sooner. I think we would all agree to that," Cousins told reporters before voting began. "This has been a very policy-laden budget and a lot of the policies had to parsed through." Hochul was able to push through a change to the bail law that will eliminate the standard that requires judges to prescribe the "least restrictive" means to ensure defendants return to court. Hochul said judges needed the extra discretion. Some liberal lawmakers argued that it would undercut the sweeping bail reforms approved in 2019 and result in more people with low incomes and people of color in pretrial detention. Here are some other policy provisions that will be included in the budget, according to state officials. The minimum wage would be raised to $17 in New York City and some of its suburbs and $16 in the rest of the state by 2026. That's up from $15 in the city and $14.20 upstate."
}
--
settings.yaml
Documents DataFrame
| id | title | text | creation_date | metadata |
|---|---|---|---|---|
| (generated from text) | US to lift most federal COVID-19 vaccine mandates | (article column content) | (create date of article1.json) | { } |
| (generated from text) | NY lawmakers begin debating budget 1 month after due date | (article column content) | (create date of article2.json) | { } |
Raw Text Chunks
| content | length |
|---|---|
| WASHINGTON (AP) The Biden administration will end most of the last remaining federal COVID-19 vaccine requirements next week when the national public health emergency for the coronavirus ends, the White House said Monday. Vaccine requirements for federal workers and federal contractors, as well as foreign air travelers to the U.S., will end May 11. The government is also beginning the process of lifting shot requirements for Head Start educators, healthcare workers, and noncitizens at U.S. land borders. The requirements are among the last vestiges of some of the more coercive measures taken by the federal government to promote vaccination as | 100 |
| measures taken by the federal government to promote vaccination as the deadly virus raged, and their end marks the latest display of how President Joe Biden's administration is moving to treat COVID-19 as a routine, endemic illness. "While I believe that these vaccine mandates had a tremendous beneficial impact, we are now at a point where we think that it makes a lot of sense to pull these requirements down," White House COVID-19 coordinator Dr. Ashish Jha told The Associated Press on Monday. | 83 |
| ALBANY, N.Y. (AP) New York lawmakers began voting Monday on a $229 billion state budget due a month ago that would raise the minimum wage, crack down on illicit pot shops and ban gas stoves and furnaces in new buildings. Negotiations among Gov. Kathy Hochul and her fellow Democrats in control of the Legislature dragged on past the April 1 budget deadline, largely because of disagreements over changes to the bail law and other policy proposals included in the spending plan. Floor debates on some budget bills began Monday. State Senate Majority Leader Andrea Stewart-Cousins said she expected voting to | 100 |
| Senate Majority Leader Andrea Stewart-Cousins said she expected voting to be wrapped up Tuesday for a budget she said contains "significant wins" for New Yorkers. "I would have liked to have done this sooner. I think we would all agree to that," Cousins told reporters before voting began. "This has been a very policy-laden budget and a lot of the policies had to parsed through." Hochul was able to push through a change to the bail law that will eliminate the standard that requires judges to prescribe the "least restrictive" means to ensure defendants return to court. Hochul said judges | 100 |
| means to ensure defendants return to court. Hochul said judges needed the extra discretion. Some liberal lawmakers argued that it would undercut the sweeping bail reforms approved in 2019 and result in more people with low incomes and people of color in pretrial detention. Here are some other policy provisions that will be included in the budget, according to state officials. The minimum wage would be raised to $17 in New York City and some of its suburbs and $16 in the rest of the state by 2026. That's up from $15 in the city and $14.20 upstate. | 98 |
In diesem Beispiel wurden die beiden Eingabedokumente in fünf Ausgabe-Text-Chunks aufgeteilt. Es werden keine Metadaten vorangestellt, daher entspricht jeder Chunk der konfigurierten Chunk-Größe (außer dem letzten für jedes Dokument). Wir haben auch eine Überlappung in diesen Text-Chunks konfiguriert, sodass die letzten zehn Token geteilt werden.