Evahan

BERT 4EVER@EvaHan 2022: Ancient Chinese Word Segmentation and Part
The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign
Vahan Info
eVAHAN – New age automobile maintenance startup
Glyph Features Matter: A Multimodal Solution for EvaHan in LT4HALA2022
GitHub

BERT 4EVER@EvaHan 2022: Ancient Chinese Word Segmentation and Part

Abstract: With the development of artificial intelligence (AI) and digital humanities, ancient Chinese resources and language technology have grown and become an increasingly important part of the study of historiography and traditional Chinese culture. To promote research on automatic analysis technology for ancient Chinese, we conduct various experiments on ancient Chinese word segmentation and part-of-speech (POS) tagging for the EvaHan 2022 shared task. We model word segmentation and POS tagging jointly as a single sequence tagging problem. In addition, we apply a series of training strategies on top of the provided ancient Chinese pre-trained model to improve performance. Concretely, we employ continual pre-training, adversarial training, and ensemble learning to mitigate the limited amount of training data and the imbalance between POS labels. Extensive experiments demonstrate that our proposed models achieve considerable performance on ancient Chinese word segmentation and POS tagging tasks.

Keywords: ancient Chinese, word segmentation, part-of-speech tagging, adversarial learning, continuing pre-training

Anthology ID: 2022.lt4hala-1.22
Month: June
Year: 2022
Address: Marseille, France
Publisher: European Language Resources Association
Pages: 150–154
Bibkey: zhang-etal-2022-bert
Cite (ACL): Hailin Zhang, Ziyu Yang, Yingw...
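The joint formulation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the tag scheme, so the B/I/E/S prefixes fused with a POS label are an assumption here, and the function names and example POS labels are hypothetical.

```python
from typing import List, Tuple

def to_joint_tags(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (word, POS) pairs into per-character (char, tag) pairs.

    Assumed scheme: S = single-character word, B/I/E = begin/inside/end.
    """
    tagged = []
    for word, pos in words:
        if len(word) == 1:
            tagged.append((word, f"S-{pos}"))
            continue
        for i, ch in enumerate(word):
            if i == 0:
                prefix = "B"
            elif i == len(word) - 1:
                prefix = "E"
            else:
                prefix = "I"
            tagged.append((ch, f"{prefix}-{pos}"))
    return tagged

def from_joint_tags(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Recover (word, POS) pairs; the POS of a word is taken from its first character."""
    words: List[Tuple[str, str]] = []
    buf, buf_pos = "", ""
    for ch, tag in tagged:
        prefix, pos = tag.split("-", 1)
        # A new B or S closes any word that was left open.
        if prefix in ("B", "S") and buf:
            words.append((buf, buf_pos))
            buf, buf_pos = "", ""
        if not buf:
            buf_pos = pos
        buf += ch
        # E and S both end the current word.
        if prefix in ("E", "S"):
            words.append((buf, buf_pos))
            buf, buf_pos = "", ""
    if buf:
        words.append((buf, buf_pos))
    return words

# Illustrative sentence; the POS labels are placeholders, not gold annotations.
sentence = [("春", "n"), ("王", "n"), ("正月", "t")]
joint = to_joint_tags(sentence)
```

With this encoding a single tagger predicts one label per character, and both the segmentation and the POS tags are read off the same output sequence.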

The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign

Abstract: This paper presents the results of the First Ancient Chinese Word Segmentation and POS Tagging Bakeoff (EvaHan), which was held at the Second Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) 2022, in the context of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022). We give the motivation for having an international shared contest, as well as the data and tracks. The contest consists of two modalities, closed and open. In the closed modality, participants may use only the training data; the highest F1 scores obtained were 96.03% in word segmentation and 92.05% in POS tagging. In the open modality, participants may use any resources they have; the highest F1 scores were 96.34% in word segmentation and 92.56% in POS tagging. Scores on the blind test dataset drop by about 3 points, which shows that out-of-vocabulary words are still the bottleneck for lexical analyzers.

Anthology ID: 2022.lt4hala-1.19
Month: June
Year: 2022
Address: Marseille, France
Publisher: European Language Resources Association
Pages: 135–140
Bibkey: li-etal-2022-first
Cite (ACL): Bin Li, Yiguo Yuan, Jingya Lu, Minxuan Feng, Chao Xu, Weiguang Qu, and Dongbo Wang. 2022. The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign. In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, pages 135–140, Marseille, France. European Language Resources Association.
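For readers unfamiliar with how a word segmentation F1 score is usually computed, here is a minimal sketch. The abstract does not describe the official EvaHan scorer, so this uses the conventional definition (a predicted word is correct only if both of its boundaries match the gold segmentation); the function names are hypothetical.

```python
from typing import List, Set, Tuple

def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to the set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans

def segmentation_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exactly matching word spans."""
    gold_spans, pred_spans = word_spans(gold), word_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Illustrative example: the prediction splits "正月" in the wrong place.
gold = ["春", "王", "正月"]
pred = ["春", "王正", "月"]
p, r, f1 = segmentation_prf(gold, pred)
```

POS tagging F1 can be computed the same way by adding the tag to each span, so that a word counts as correct only when both its boundaries and its tag match.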

Vahan Info

Vahan Info is an online portal for looking up RTO vehicle information. It is a quick and easy service for finding a vehicle's owner, RTO, registration status, and other details by number plate or registration number. The portal gives complete details of any vehicle registered in India, drawn from the RTO database, in a single click. It also provides driving license details, lets you verify your own license details online, and shows current petrol and diesel prices. You can also take a practice learner's driving license test online for free. The portal also lists all RTOs across India.

What is a Vehicle Number Plate?

A vehicle number plate is a combination of digits and letters that gives your vehicle a distinct identity and registers you with the Regional Transport Office (RTO). The Motor Vehicles Act, 1988 makes it compulsory for every vehicle owner to register the vehicle as per the number plate rules and to install a number plate on both the rear and the front of the vehicle. An unregistered plate without a number, in violation of the number plate rules, can attract a heavy fine.

Breakdown of a vehicle (vahan) number plate, e.g. AB06CD 1234:

AB : The first part indi...
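As a rough illustration of the breakdown above, the sketch below splits a plate shaped like the example AB06CD 1234 into four parts. The source's explanation of each part is cut off, so the field names (`state_code`, `district_code`, `series`, `number`) are assumptions based on the common Indian layout, and the pattern covers only plates shaped like the example.

```python
import re
from typing import Dict, Optional

# Assumed layout: 2 letters, 2 digits, 1-3 letters, 4 digits (as in "AB06CD 1234").
PLATE_RE = re.compile(r"^([A-Z]{2})(\d{2})([A-Z]{1,3})(\d{4})$")

def split_plate(plate: str) -> Optional[Dict[str, str]]:
    """Split a plate string into its parts, or return None if it does not fit."""
    compact = re.sub(r"[\s-]+", "", plate.upper())  # drop spaces and hyphens
    match = PLATE_RE.match(compact)
    if match is None:
        return None
    return {
        "state_code": match.group(1),     # hypothetical label for the first part
        "district_code": match.group(2),  # hypothetical label for the second part
        "series": match.group(3),         # hypothetical label for the third part
        "number": match.group(4),         # hypothetical label for the fourth part
    }

parts = split_plate("AB06CD 1234")
```

Real plates vary more than this pattern allows, so a production parser would need a looser pattern and a lookup table of valid codes.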

eVAHAN – New age automobile maintenance startup

eVahan®, a registered brand of VKARE Retail Ventures Pvt Ltd, is a new-age automobile maintenance start-up. With a vision to transform the unorganized vehicle maintenance market using new-age technology, eVahan delivers services ranging from vehicle servicing to the renewal of mandatory vehicle certificates and expert driving lessons at affordable prices. At eVahan, we help our customers securely maintain digital copies of their driving licence, vehicle registration certificate, vehicle insurance policy, PUC certificate, and vehicle maintenance record, along with warranty certificates for spare parts. Through our app and website, customers can locate the nearest centers and book vehicle service appointments or document renewals. If your vehicle breaks down on the highway, we are only a call away with all the required assistance. Our technologically updated centers and workshops provide the below-mentioned services.

eVAHAN® is inviting applications to own and operate eVAHAN Suvidha Kendra online pollution check centres, approved by the Transport Department, Government of Punjab, on a 50-50 partnership and revenue-sharing model. These centres will be placed at petrol pumps and shops in high-footfall areas and will be allotted pincode-wise on a first-come, first-served basis. The eligible candidates will be considered for allotment of eVAHAN Suv...

Glyph Features Matter: A Multimodal Solution for EvaHan in LT4HALA2022

Abstract: We participate in the LT4HALA2022 shared task EvaHan. This task has two subtasks. Subtask 1 is word segmentation, and subtask 2 is part-of-speech tagging. Each subtask consists of two tracks: a closed track that can only use the data and models provided by the organizer, and an open track without restrictions. We employ three pre-trained models, two of which are open-source pre-trained models for ancient Chinese (Siku-Roberta and roberta-classical-chinese), and one of which is our own pre-trained GlyphBERT combined with glyph features. Our methods include data augmentation, data pre-processing, model pretraining, downstream fine-tuning, k-fold cross validation and model ensemble. We achieve competitive P, R, and F1 scores on both our own validation set and the final public test set.

Anthology ID: 2022.lt4hala-1.28
Month: June
Year: 2022
Address: Marseille, France
Publisher: European Language Resources Association
Pages: 178–182
Bibkey: xinyuan-etal-2022-glyph
Cite (ACL): Wei Xinyuan, Liu Weihao, Qing Zong, Zhang Shaoqing, and Baotian Hu. 2022. Glyph Features Matter: A Multimodal Solution for EvaHan in LT4HALA2022. In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, pages 178–182, Marseille, France. European Language Resources Association.
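The combination of k-fold cross validation and model ensembling named in the abstract can be sketched as follows. The abstract does not say how the folds were built or how predictions were combined, so the round-robin split and per-position majority vote below are simple stand-ins rather than the authors' method; all names are hypothetical.

```python
from collections import Counter
from typing import List

def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Assign example indices 0..n-1 to k validation folds, round-robin."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def majority_vote(predictions: List[List[str]]) -> List[str]:
    """Combine tag sequences from several models by voting at each position.

    Ties go to the tag seen first, i.e. the earliest model in the list.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must predict the same number of tags")
    voted = []
    for tags_at_position in zip(*predictions):
        voted.append(Counter(tags_at_position).most_common(1)[0][0])
    return voted

# One model per fold, each predicting tags for the same 3-character input.
folds = k_fold_indices(5, 2)
ensemble = majority_vote([
    ["B", "E", "S"],
    ["B", "I", "S"],
    ["S", "E", "S"],
])
```

Each fold's held-out indices serve as the validation set for one model trained on the remaining examples, and the resulting models vote at test time.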

GitHub

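Introduction

Digital humanities research needs large-scale corpora and high-performance natural language processing tools for ancient Chinese. Pre-trained language models have greatly improved text-mining accuracy on English and modern Chinese text, and pre-trained models built specifically for the automatic processing of ancient Chinese are urgently needed. Using the proofread, high-quality full text of the Siku Quanshu (《四库全书》) as the training set, and building on the BERT deep language model framework, we constructed the SikuBERT and SikuRoBERTa pre-trained language models for intelligent processing of ancient Chinese. To validate the models, we designed four downstream tasks on the Zuo Zhuan (《左传》) corpus: automatic word segmentation, sentence segmentation and punctuation, part-of-speech tagging, and named entity recognition.

• SikuBERT and SikuRoBERTa are trained on the Siku Quanshu corpus. The Siku Quanshu, also known as the Qinding Siku Quanshu (《钦定四库全书》), is a large collection compiled during the Qianlong reign of the Qing dynasty. The experiments removed the annotations in the original and kept only the main text. The training set contains 536,097,588 characters in total, all in traditional Chinese.
• Following the idea of Domain-Adaptive Pretraining, SikuBERT and SikuRoBERTa start from the BERT architecture and continue training the BERT and RoBERTa models, respectively, on a large amount of ancient Chinese text to obtain pre-trained models for the automatic processing of ancient Chinese.

News

• 2021/5/8 Models added
• 2021/5/6 Paper accepted by the 5th National Forum on the Development of Future Smart Libraries
• 2021/8/20 Paper in the journal 《图书馆论坛》 (Library Tribune)
• 2021/9/13 sikuBERT and sikuRoberta updated. The newly released models have a new vocabulary that includes native words from the Siku Quanshu; it has more than 8,000 additional characters compared with the original bert-base vocabulary, and the new models outperform the previous ones on all tasks.
• 2021/9/15 The related Python toolkit sikufenci is officially released; it can be used for automatic word segmentation of ancient texts in traditional Chinese.
• 2021/11/6 The standalone open-source software sikuaip, related to this project, is officially released. It provides several ancient-text processing functions, including word segmentation, sentence segmentation, entity recognition, and text classification, and can be used directly after downloading and unzipping. The link is given below.
• 2021/12/10 EvaHan 2022, the first evaluation competition for NLP tools in the ancient Chinese domain, based on this model, is announced.
• 2022 Generative pre-trained models for ancient prose and classical poetry, based on the Siku Quanshu and Chinese-Poetry, are released.
• 2022/10/21
  • TfhBERT, an ancient Chinese pre-trained language model obtained by continued pre-training on the Twenty-Four Histories, is released.
  • BTfhBER, a cross-lingual pre-trained model for classical and vernacular Chinese, is released.

Version 1.0 of sikuaip, the intelligent ancient-text processing platform for digital humanities, has been officially released. Download link:
Instructions for the platform are in the "使用方法" (Usage) folder. The current version supports six functions: word segmentation, sentence segmentation, entity recognition, text classification, part-of-speech tagging, and automatic punctuation. It offers two processing modes, single-text processing and corpus processing. You are welcome to download and use it.

Pre-trained models for ancient prose generation and classical poetry generation

Ancient prose generation pre-trained model SikuGPT2:
Classical poetry generation pre-trained model SikuGPT2-poem:
Paper: A practical exploration of AIGC-assisted digital humanities research: a study of SikuGPT-driven classical poetry generation:

Usage

Huggingface Transformers

With Huggingface Transformers, the SikuBERT and SikuRoBERTa models can be fetched online directly through the from_pretrained method.

• SikuBERT

    from transformers import AutoTokenizer, AutoModel
    tokenizer = AutoTokenizer.from_pretrained("SIKU-BERT/sikuroberta")
    model = AutoModel.from_pretrained("SIKU-BERT/sikurob...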
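The domain-adaptive pretraining idea described in this README, continuing to train BERT and RoBERTa on ancient Chinese text, relies on a masked-language-model objective. The README does not state the masking recipe used for SikuBERT or SikuRoBERTa, so the sketch below assumes the original BERT settings (15% of positions selected; of those, 80% replaced with [MASK], 10% with a random token, 10% left unchanged). The function name and the toy vocabulary are hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"

def mask_for_mlm(
    chars: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Build one masked-LM training example from a character sequence.

    Returns (inputs, labels); labels hold the original character at selected
    positions and None elsewhere, so the loss is computed only where selected.
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for ch in chars:
        if rng.random() < mask_prob:
            labels.append(ch)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(ch)                 # 10%: keep the original
        else:
            inputs.append(ch)
            labels.append(None)
    return inputs, labels

# Toy vocabulary and sentence for illustration only.
toy_vocab = ["春", "王", "正", "月", "公", "即", "位"]
text = list("春王正月")
masked_inputs, masked_labels = mask_for_mlm(text, toy_vocab, seed=0)
```

In continued pretraining, examples like these are drawn from the new domain's corpus and used to keep updating an already pre-trained model, rather than training one from scratch.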